# Description

Functions are a fundamental building block of the R language. You've probably used dozens (or even hundreds) of functions written by others, but in order to take your R game to the next level, you'll need to learn to write your own functions. This course will teach you the fundamentals of writing functions in R so that, among other things, you can make your code more readable, avoid coding errors, and automate repetitive tasks.

## 1) A quick refresher

### 1.1) Writing a function

The function template is a useful way to start writing a function:

    my_fun <- function(arg1, arg2) {
      # body
    }
    
my_fun is the variable that you want to assign your function to, arg1 and arg2 are arguments to the function. The template has two arguments, but you can specify any number of arguments, each separated by a comma. You then replace # body with the R code that your function will execute, referring to the inputs by the argument names you specified.

For example, we can build a function that calculates the ratio (x / y) of two values:

    # Define ratio() function
    ratio <- function(x, y) {
      x / y
    }

    # Call ratio() with arguments 3 and 4
    ratio(3, 4)

Don't forget that there are two ways to specify arguments. In the previous example you probably wrote either ratio(3, 4), which relies on matching by **position**, or ratio(x = 3, y = 4), which relies on matching by **name**.

*When you call a function, place a space on each side of = and always put a space after a comma, not before (just like in regular English). Using whitespace makes it easier to skim the call for the important components.*
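
As a small sketch of the two matching styles, reusing the ratio() function defined above: the three calls below give the same result, and with named arguments the order no longer matters.

```r
# ratio() as defined earlier in this section
ratio <- function(x, y) {
  x / y
}

by_position <- ratio(3, 4)          # x = 3, y = 4, matched by position
by_name     <- ratio(x = 3, y = 4)  # matched by name
swapped     <- ratio(y = 4, x = 3)  # names make the order irrelevant

by_position
identical(by_position, by_name)
identical(by_position, swapped)
```

Matching by name is usually the safer choice for anything other than the first one or two arguments.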

### 1.2) Scoping
Scoping describes how R looks up a value when given a name. It is especially important to know how R looks up names inside a function: when a function is called, it begins execution in a new working environment (example 1).
    
    f <- function() {
      x <- 1
      y <- 2
      c(x, y)
    }

    f()

If a variable referred to inside a function **doesn't exist in the function's current environment, R looks in the environment one level up** (example 2).

Note: if the variable doesn't exist in the global environment either, calling the function produces an error, since the variable isn't defined locally or at any higher level.

Scoping describes only where, not when, to look up a variable. This means the return value of a function can depend on when you call it (example 3).


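    # Example 1: x and y are both defined inside the function
    f <- function() {
      x <- 1
      y <- 2
      c(x, y)
    }

    f()

    # Example 2: x is not defined inside g(), so R finds it one level up
    x <- 3
    g <- function() {
      y <- 4
      c(x, y)
    }
    g()

    # Example 3: the value of z is looked up when h() is called
    h <- function() z

    z <- 50
    h()

    z <- 100
    h()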


### 1.3) Data structures
Recall the two types of vector in R:

1. Atomic vectors, of six types: logical, integer, double, character, complex and raw.
2. Lists, also called recursive vectors because lists can contain other lists.

Likewise, the contents of an atomic vector are always of a single type, whereas a list can contain heterogeneous (multiple) types.

Note: it is important to remember how subsetting works; see the image file "list".
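
The difference between the two kinds of vector, and between single- and double-bracket subsetting, can be sketched with a few lines of base R. The object names here (atomic, lst) are only illustrative.

```r
# An atomic vector holds a single type; mixed inputs are coerced
atomic <- c(1, "a", TRUE)
typeof(atomic)          # "character"

# A list keeps each element's own type, and can contain other lists
lst <- list(num = 1, chr = "a", nested = list(flag = TRUE))
element_types <- sapply(lst, typeof)
element_types

# Single brackets return a (shorter) list; double brackets return the element
typeof(lst["num"])      # "list"
typeof(lst[["num"]])    # "double"
lst[["nested"]][["flag"]]
```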

## for loops

Let's take a look at the sequence component of our for loop:

    i in 1:ncol(df)

Each time our for loop iterates, i takes the next value in 1:ncol(df). This is a pretty common model for a sequence: a sequence of consecutive integers designed to index over one dimension of our data.

What might surprise you is that this isn't the best way to generate such a sequence, especially when you are using for loops inside your own functions. Let's look at an example where df is an empty data frame:

    df <- data.frame()
    1:ncol(df)

    for (i in 1:ncol(df)) {
      print(median(df[[i]]))
    }

Our sequence is now the somewhat nonsensical 1, 0. You might think you wouldn't be silly enough to use a for loop with an empty data frame, but once you start writing your own functions, there's no telling what the input will be.

A better method is to use the seq_along() function. This function generates a sequence along the index of the object passed to it, but handles the empty case much better.
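
A quick base R sketch of the difference, using an empty data frame (the counter variable iterations is only there to show how often the loop body runs):

```r
empty_df <- data.frame()

bad_seq  <- 1:ncol(empty_df)     # counts down from 1 to 0
good_seq <- seq_along(empty_df)  # integer(0): a loop over this runs zero times

bad_seq
good_seq

iterations <- 0
for (i in seq_along(empty_df)) {
  iterations <- iterations + 1
}
iterations
```

Because seq_along() returns a zero-length sequence, the loop body is simply skipped for empty input.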

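    # The original loop, using the 1:ncol(df) sequence
    for (i in 1:ncol(df)) {
      print(median(df[[i]]))
    }

    # The same loop, using seq_along(df) instead
    for (i in seq_along(df)) {
      print(median(df[[i]]))
    }

    # Create an empty data frame
    empty_df <- data.frame()

    # With 1:ncol(), the loop still tries to run over 1 and 0
    for (i in 1:ncol(empty_df)) {
      print(median(empty_df[[i]]))
    }

Recorded output:

    ERROR: Error in 1:ncol(df): argument of length 0

The message refers to 1:ncol(df), so it appears to come from the first loop being run in a session where df had not been defined as a data frame.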


### Keeping output

Our for loop does a good job displaying the column medians, but we might want to store these medians in a vector for future use.

Before you start the loop, you must always allocate sufficient space for the output, let's say an object called output. This is very important for efficiency: if you grow the output at each iteration (e.g. using c()), your for loop will be very slow.

**A general way of creating an empty vector of given length is the vector() function. It has two arguments: the type of the vector ("logical", "integer", "double", "character", etc.) and the length of the vector.**
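
As a brief illustration of vector(), here are a few calls with different types. The variable names are only for this sketch.

```r
dbl_out  <- vector("double", 3)     # 0 0 0
chr_out  <- vector("character", 2)  # "" ""
lgl_out  <- vector("logical", 2)    # FALSE FALSE
list_out <- vector("list", 2)       # a list of two NULL elements

dbl_out
length(list_out)
```

Each element starts at the type's "empty" value, ready to be overwritten inside the loop.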

Then, at each iteration of the loop you must store the output in the corresponding entry of the output vector, i.e. assign the result to output[[i]]. (You might ask why we are using double brackets here when output is a vector. It's primarily for generalizability: this subsetting will work whether output is a vector or a list.)

Let's edit our loop to store the medians, rather than printing them to the console.

    # Create new double vector: output
    output <- vector("double", ncol(df))

    # Alter the loop
    for (i in seq_along(df)) {
      # Change code to store result in output
      output[[i]] <- median(df[[i]])
    }

    # Print output
    output

## 2) When and how you should write a function
Writing your own functions is one way to reduce duplication in your code. In this chapter, you'll learn when to write a function, how to get started and what to keep in mind when you are writing. You'll also learn to appreciate that functions have two audiences: the computer (which runs the code) and humans (who need to be able to understand the code).


### 2.1) Why should you write a function?
We have a simple rule:

    If you have copied and pasted twice, it's time to write a function.
    
### 2.2) How should you write a function?

1. Start with a simple problem
2. Get a working snippet of code
3. Rewrite to use temporary variables
4. Rewrite for clarity
5. Finally, turn into a function.

### 2.3) How can you write a good function?
For this, we will work with a function that calculates a confidence interval for the mean:

    mean_ci <- function(x, level = 0.95) {
      se <- sd(x) / sqrt(length(x))
      alpha <- 1 - level
      mean(x) + se * qnorm(c(alpha / 2, 1 - alpha / 2))
    }

#### Argument order
Aside from giving your arguments good names, you should put some thought into what order your arguments are in and if they should have defaults or not.

Arguments are often one of two types:

1. Data arguments supply the data to compute on.
2. Detail arguments control the details of how the computation is done.

Generally, data arguments should come first. Detail arguments should go at the end, and usually should have default values.

#### Return statements
One of your colleagues has noticed if you pass mean_ci() an empty vector it returns a confidence interval with missing values at both ends (try it: mean_ci(numeric(0))). In this case, they decided it would make more sense to produce a warning "x was empty" and return c(-Inf, Inf) and have edited the function to be:

    mean_ci <- function(x, level = 0.95) {
      if (length(x) == 0) {
        warning("`x` was empty", call. = FALSE)
        interval <- c(-Inf, Inf)
      } else { 
        se <- sd(x) / sqrt(length(x))
        alpha <- 1 - level
        interval <- mean(x) + 
          se * qnorm(c(alpha / 2, 1 - alpha / 2))
      }
      interval
    }
    
Notice how hard it is now to follow the logic of the function. If you want to know what happens in the empty x case, you need to read the entire function to check if anything happens to interval before the function returns. There isn't much to read in this case, but if this was a longer function you might be scrolling through pages of code.

This is a case where an early return() makes sense. If x is empty, the function should immediately return c(-Inf, Inf).
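
One way the early-return version might look is sketched below. It keeps the same warning and the same calculation as the function above, and only moves the empty case to the top.

```r
mean_ci <- function(x, level = 0.95) {
  # Handle the special case first and leave the function immediately
  if (length(x) == 0) {
    warning("`x` was empty", call. = FALSE)
    return(c(-Inf, Inf))
  }
  se <- sd(x) / sqrt(length(x))
  alpha <- 1 - level
  mean(x) + se * qnorm(c(alpha / 2, 1 - alpha / 2))
}

# The empty case returns right away (the warning is suppressed here)
empty_ci <- suppressWarnings(mean_ci(numeric(0)))
empty_ci

# The usual case is unchanged
ci <- mean_ci(c(1, 2, 3, 4, 5))
ci
```

With the special case out of the way, the rest of the body reads as the plain calculation, with no else branch and no interval variable to track.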

## 3) Functional programming
You already know how to use a for loop. The goal of this chapter is to teach you how to use the map functions in the purrr package which remove the code that's duplicated across multiple for loops. After completing this chapter you'll be able to solve new iteration problems with greater ease (faster and with fewer bugs).

### 3.1.1) Why functional programming?
Let's use a for loop to remove duplication. Imagine we have a data frame called df:

    df <- data.frame(
      a = rnorm(10),
      b = rnorm(10),
      c = rnorm(10),
      d = rnorm(10)
    )
    
We want to compute the median of each column. You could do this with copy and paste.

    median(df[[1]])
    median(df[[2]])
    median(df[[3]])
    median(df[[4]])
    
But that's a lot of repetition! Let's start by seeing how we could reduce the duplication by using a for loop.

    # Initialize output vector
    output <- numeric(ncol(df))

    # Fill in the body of the for loop
    for (i in seq_along(df)) {
      output[[i]] <- median(df[[i]])
    }

    # View the result
    output
    
### 3.1.2) Turning the for loop into a function
Now, imagine you need to do this to another data frame df2. You copy and paste the for loop, and edit every reference to df to be df2 instead.

And then you realize you have another data frame df3 for which you also want the column medians. You copy and paste...and realize you've copied and pasted two times. Time to write a function!

    col_median <- function(x) {
      # sapply() applies median() to each column (and keeps the column names)
      sapply(x, median)
    }

or

    col_median <- function(df) {
      output <- numeric(ncol(df))
      for (i in seq_along(df)) {
        output[[i]] <- median(df[[i]])
      }
      output
    }
    
#### What about column means?

What if, instead of the median of every column, you actually want the mean? Let's write a col_mean() function that returns the vector of column means.

    # Change col_median() to a col_mean() function to find column means
    col_mean <- function(df) {
      output <- numeric(ncol(df))
      for (i in seq_along(df)) {
        output[[i]] <- mean(df[[i]])
      }
      output
    }

#### What about column standard deviations?

You now have functions for column medians and means. What about one for standard deviations?

    # Define col_sd() function
    col_sd <- function(df) {
      output <- numeric(length(df))
      for (i in seq_along(df)) {
        output[[i]] <- sd(df[[i]])
      }
      output
    }

#### Uh oh... time to write a function again

We just copied and pasted the function col_median() two times. That's a sure sign we need to write a function. How can we write a function that will take column summaries for any summary function we provide?

Let's look at a simpler example first. Consider the functions f1(), f2() and f3() that take a vector x and return deviations from the mean value raised to the powers 1, 2, and 3 respectively:

    f1 <- function(x) abs(x - mean(x)) ^ 1
    f2 <- function(x) abs(x - mean(x)) ^ 2
    f3 <- function(x) abs(x - mean(x)) ^ 3
    
How could you remove the duplication in this set of function definitions?

Hopefully, you would suggest writing a single function with two arguments: x and power. That way, one function reproduces all the functionality of f1(), f2() and f3(), and more.

    # Add a second argument called power
    f <- function(x, power) {
      # Edit the body to return absolute deviations raised to power
      abs(x - mean(x)) ^ power
    }
    

### 3.2.1) Functions can be arguments too

### 3.2.2) Using a function as an argument
You just saw that we can remove the duplication in our set of summary functions by requiring the function doing the summary as an input. This leads to creating the col_summary function:

    col_summary <- function(df, fun) {
      output <- numeric(ncol(df))
      for (i in seq_along(df)) {
        output[[i]] <- fun(df[[i]])
      }
      output
    }

It may be kind of surprising that you can pass a function as an argument to another function, so let's verify first that it worked. We found the column means and medians using our old col_mean() and col_median() functions; now repeat the calculations using col_summary() and verify that the results agree.

    # Find the column medians using col_median() and col_summary()
    # col_median(df)
    col_summary(df, fun = median)

    # Find the column means using col_mean() and col_summary()
    # col_mean(df)
    col_summary(df, fun = mean)

    # Find the column IQRs (interquartile ranges) using col_summary()
    col_summary(df, fun = IQR)
    
### 3.2.3) Introducing purrr
The purrr package has a function called map() that works like sapply() or lapply(), but with some advantages, such as type-consistent output and shortcuts for writing the function to apply.

All the map functions in purrr take a vector, .x, as the first argument and a function, .f, as the second, then return .f applied to each element of .x. The type of object that is returned is determined by the function suffix (the part after _):

    map() returns a list
    map_lgl() returns a logical vector
    map_int() returns an integer vector
    map_dbl() returns a double vector
    map_chr() returns a character vector
    
For example:

    # Load the purrr package
    library(purrr)

    # Use map_dbl() to find column means
    map_dbl(df, mean)

    # Use map_dbl() to find column medians
    map_dbl(df, median)

    # Use map_dbl() to find column standard deviations
    map_dbl(df, sd)
    
### 3.2.4) The ... argument to the map functions
The map functions use the ... ("dot dot dot") argument to pass along additional arguments to .f each time it’s called. For example, we can pass the trim argument to the mean() function:

    map_dbl(df, mean, trim = 0.5)

Multiple arguments can be passed along using commas to separate them. For example, we can also pass the na.rm argument to mean():

    map_dbl(df, mean, trim = 0.5, na.rm = TRUE)

You don't have to specify the arguments by name, but it is good practice!

**You may be wondering why the arguments to map() are .x and .f and not x and f. It's because .x and .f are very unlikely to be argument names you might pass through the ..., thereby preventing confusion about whether an argument belongs to map() or to the function being mapped.**

    # Find the mean of each column
    map_dbl(planes, mean)

    # Find the mean of each column, excluding missing values
    map_dbl(planes, mean, na.rm = TRUE)

    # Find the 5th percentile of each column, excluding missing values
    map_dbl(planes, quantile, probs = c(0.05), na.rm = TRUE)
    
### 3.2.5) Picking the right map function
Choosing the right map function is important. You can always use map(), which will return a list. However, if you know what type of output you expect, you are better off using the corresponding function. That way, if you expect one thing and get another, you'll know immediately because the map function will return an error.

For example, try running:  

    map_lgl(df, mean)

The map functions are what we call type consistent. This means you know exactly what type of output to expect regardless of the input. map_lgl() returns either a logical vector or an error. map_dbl() returns either a double vector or an error.

One way to check the output type is to run the corresponding function on the first element. For example, mean(df[[1]]) returns a single numeric value, suggesting map_dbl().
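
A minimal sketch of type consistency, assuming purrr is installed. The data frame here is hypothetical, and tryCatch() is used only so the failure can be shown without stopping the script.

```r
library(purrr)

set.seed(42)  # hypothetical data, only for illustration
df <- data.frame(a = rnorm(10), b = rnorm(10))

# mean() returns a single double, so map_dbl() is the matching variant
col_means <- map_dbl(df, mean)
typeof(col_means)

# Asking for a logical vector fails instead of silently converting
lgl_attempt <- tryCatch(map_lgl(df, mean), error = function(e) "error")
lgl_attempt
```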

    # Find the columns that are numeric
    map_lgl(df3, is.numeric)

    # Find the type of each column
    map_chr(df3, typeof)

    # Find a summary of each column
    map(df3, summary)
    

### 3.3) Shortcuts
Sometimes you want to call a series of functions, or a function where the argument you want to vary isn't the first argument. One way to do this is to define an anonymous function **on the fly**, for example:

    # Anonymous function defined on the fly
    map(df, function(x) sum(is.na(x)))
    
The first of our shortcuts is designed to achieve the same result with much less typing. Instead of passing an anonymous function, we pass a formula; the **tilde** signifies that **this** is a formula.

    # Anonymous function defined using the formula shortcut
    map(df, ~ sum(is.na(.)))

After the tilde we write an R expression as usual, using a dot as a placeholder for an element of .x. Notice that this expression is just the body of our anonymous function, with the dot taking the place of its argument.
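
To check that the two forms really are equivalent, here is a small sketch, assuming purrr is installed. The data frame is hypothetical, and map_int() is used because sum() of a logical vector returns an integer.

```r
library(purrr)

# Hypothetical data frame with a few missing values
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 6), c = c(7, 8, 9))

with_function <- map_int(df, function(x) sum(is.na(x)))
with_formula  <- map_int(df, ~ sum(is.na(.)))

with_formula
identical(with_function, with_formula)
```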

The other two shortcuts the map functions provide are associated with subsetting. Imagine we have a list with the following elements:

    list_of_results <- list(
      list(a = 1, b = "A"), 
      list(a = 2, b = "C"), 
      list(a = 3, b = "D")
    )
    
Now we want to extract all the "a" elements:

    # Anonymous function
    map_dbl(list_of_results, function(x) x[["a"]])
    # [1] 1 2 3

    # If .f is just a string, the element with that name is extracted from each element of .x
    map_dbl(list_of_results, "a")
    # [1] 1 2 3

    # Similarly, if you provide just an integer to .f, the element at that index is extracted from each element of .x
    map_dbl(list_of_results, 1)
    # [1] 1 2 3
    

### 3.3.1) Using an anonymous function
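Suppose cyl is a list of data frames obtained by splitting mtcars by number of cylinders (cyl <- split(mtcars, mtcars$cyl)), and we want to fit the regression lm(mpg ~ wt) to each one. We already have a snippet of code that performs the operation we want on one data frame. One option would be to turn this into a function, for example: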

    fit_reg <- function(df) {
      lm(mpg ~ wt, data = df)
    }

Then pass this function into map():

    map(cyl, fit_reg)

But it seems a bit much to define a function for such a specific model when we only want to do this once. Instead of defining the function in the global environment, we will just use the function anonymously inside our call to map().

What does this mean? Instead of referring to our function by name in map(), we define it on the fly in the .f argument to map().

    map(cyl, function(df) lm(mpg ~ wt, data = df))

### 3.3.2) Using a formula
Writing anonymous functions takes a lot of extra key strokes, so purrr provides a shortcut that allows you to write an anonymous function as a one-sided formula instead.

In R, a one-sided formula starts with a ~, followed by an R expression. In purrr's map functions, the R expression can refer to an element of the .x argument using the . character.

Let's take a look at an example. Imagine, instead of a regression on each data frame in cyl, we wanted to know the mean displacement for each data frame. One way to do this would be to use an anonymous function:

    map_dbl(cyl, function(df) mean(df$disp))
    
To perform the same operation using the formula shortcut, we replace the function definition (function(df)) with the ~, then when we need to refer to the element of cyl the function operates on (in this case df), we use a dot (.).

    map_dbl(cyl, ~ mean(.$disp))

Another example:

    # Rewrite to use the formula shortcut instead
    map(cyl, ~ lm(mpg ~ wt, data = .))

### 3.3.3) Using a string
There are also some useful shortcuts that come in handy when you want to subset each element of the .x argument. If the .f argument to a map function is set equal to a string, let's say "name", then purrr extracts the "name" element from every element of .x.

This is a really common situation you find yourself in when you work with nested lists. For example, suppose we have a list where every element contains an a and a b element:

    list_of_results <- list(
      list(a = 1, b = "A"), 
      list(a = 2, b = "C"), 
      list(a = 3, b = "D")
    )
    
We might want to pull out the **a** element from every entry. We could do it with the string shortcut like this:

    map(list_of_results, "a")
    
Now take our list of regression models:

    map(cyl, ~ lm(mpg ~ wt, data = .))
    
It might be nice to extract the slope coefficient from each model. You'll do this in a few steps: first fit the models, then get the coefficients from each model using the coef() function, then pull out the wt estimate using the string shortcut.

    # Save the result from the previous exercise to the variable models
    models <- map(cyl, ~ lm(mpg ~ wt, data = .))

    # Use map and coef to get the coefficients for each model: coefs
    coefs <- map(models, coef)

    # Use string shortcut to extract the wt coefficient 
    map(coefs, "wt")
    
### 3.3.4) Using a numeric vector
Another useful shortcut for subsetting is to pass a numeric vector as the .f argument. This works just like passing a string but subsets by index rather than name. For example, with your previous list_of_results:

    list_of_results <- list(
      list(a = 1, b = "A"), 
      list(a = 2, b = "C"), 
      list(a = 3, b = "D")
    )
    
Another way to pull out the a element from each list, is to pull out the first element:

    map(list_of_results, 1)
    
### 3.3.5) Putting it together with pipes
purrr also includes a pipe operator: %>%. The pipe operator is another shortcut that saves typing, but also increases readability. The explanation of the pipe operator is quite simple: x %>% f(y) is another way of writing f(x, y). That is, the left hand side of the pipe, x, becomes the first argument to the function, f(), on the right hand side of the pipe.

Take a look at our code to get our list of models:

    cyl <- split(mtcars, mtcars$cyl) 
    map(cyl, ~ lm(mpg ~ wt, data = .))

We split the data frame mtcars and save it as the variable cyl. We then pass cyl as the first argument to map to fit the models. We could rewrite this using the pipe operator as:

    split(mtcars, mtcars$cyl) %>% 
      map(~ lm(mpg ~ wt, data = .))
      
We read this as "split the data frame mtcars on cyl, then use map() on the result."

One of the powerful things about the pipe is we can chain together many operations. Here is our complete code, written with pipes, instead of assigning each step to a variable and using it in the next step:

    mtcars %>% 
      split(mtcars$cyl) %>%
      map(~ lm(mpg ~ wt, data = .)) %>%
      map(coef) %>% 
      map_dbl("wt")
      
The same pattern can pull out the R² from each model: map summary() over the models, then extract the "r.squared" element from each summary with map_dbl(), with both steps written as part of the pipe.
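
A minimal sketch of pulling out the R² of each model with pipes, assuming purrr is installed and the models are fitted as in the pipeline above. summary() on an lm fit returns a list whose r.squared element holds the R², so the string shortcut can extract it. The name r_squared is only illustrative.

```r
library(purrr)

r_squared <- mtcars %>%
  split(mtcars$cyl) %>%
  map(~ lm(mpg ~ wt, data = .)) %>%
  map(summary) %>%
  map_dbl("r.squared")

# One R-squared value per cylinder group
r_squared
```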

## 4) Advanced inputs and outputs

Now you've seen how useful the map functions are for reducing duplication, we'll introduce you to a few more functions in purrr that allow you to handle more complicated inputs and outputs. In particular, you'll learn how to deal with functions that might return an error, how to iterate over multiple arguments and how to iterate over functions that have no output at all.


### a) Dealing with failure

One downside of the map functions, compared to for loops, is that if one of the iterations fails, the whole thing fails. Luckily, purrr provides a solution in the form of the safely() function: when you wrap a function with **safely()**, it will always succeed.

For example:

    log_list <- map(long_list, log)
    # Error in .f(.x[[i]], ...) : non-numeric argument to mathematical function

    log_list <- map(long_list, safely(log))

safely() is like an adverb: it takes a function and returns a modified function that never throws an error. When we apply safely() to log(), the result is a new function whose return value is different, because it **always returns a list with two components: result and error**. If the function worked, result contains the result and error is NULL; otherwise result is NULL and error contains the error.
 

Other adverbs for unusual output are:

1. possibly(): always succeeds; you give it a default value to return when there is an error.
2. quietly(): captures printed output, messages, and warnings instead of capturing errors.
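
The three adverbs can be compared side by side in a short sketch, assuming purrr is installed. The input list mixed_list is hypothetical and is chosen so that one element makes log() fail.

```r
library(purrr)

# Hypothetical input: the second element makes log() fail
mixed_list <- list(1, "a", 10)

# safely(): every call returns a list with result and error components
safe_out <- map(mixed_list, safely(log))
safe_out[[2]]$result          # NULL, because log("a") failed
class(safe_out[[2]]$error)    # an error condition object

# possibly(): failures are replaced by the default value
with_default <- map_dbl(mixed_list, possibly(log, otherwise = NA_real_))
with_default

# quietly(): captures output, messages and warnings alongside the result
quiet_out <- quietly(log)(-1)
quiet_out$warnings            # the warning raised by log(-1)
```

possibly() is handy when a single placeholder value is enough; safely() keeps the error objects around for later inspection.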

#### a.1) Creating a safe function
safely() is an adverb; it takes a verb and modifies it. That is, it takes a function as an argument and it returns a function as its output. The function that is returned is modified so it never throws an error (and never stops the rest of your computation!).

Instead, it always returns a list with two elements:

1. result is the original result. If there was an error, this will be NULL.
2. error is an error object. If the operation was successful this will be NULL.

Let's try to make the readLines() function safe.

    # Create safe_readLines() by passing readLines() to safely()
    safe_readLines <- safely(readLines)

    # Call safe_readLines() on "http://example.org"
    example_lines <- safe_readLines("http://example.org")
    example_lines

    # Call safe_readLines() on "http://asdfasdasdkfjlda"
    nonsense_lines <- safe_readLines("http://asdfasdasdkfjlda")
    nonsense_lines
    
#### a.2) Using map safely
One feature of safely() is that it plays nicely with the map() functions. Consider this list containing the two URLs from the last exercise, plus one additional URL to make things more interesting:

    urls <- list(
      example = "http://example.org",
      rproj = "http://www.r-project.org",
      asdf = "http://asdfasdasdkfjlda"
    )

We are interested in quickly downloading the HTML files at each URL. You might try:

    map(urls, readLines)
    
But it results in an error, Error in file(con, "r") : cannot open the connection, and no output for any of the URLs. Go on, try it!

We can solve this problem by using our safe_readLines() instead.

    # Define safe_readLines()
    safe_readLines <- safely(readLines)

    # Use the safe_readLines() function with map(): html
    html <- map(urls, safe_readLines)

    # Call str() on html
    str(html)

    # Extract the result from one of the successful elements
    html[["example"]][["result"]]

    # Extract the error from the element that was unsuccessful
    html[["asdf"]][["error"]]
    
#### a.3) Working with safe output
We now have output that contains the HTML for each of the two URLs on which readLines() was successful and the error for the other. But the output isn't that easy to work with, since the results and errors are buried in the inner-most level of the list.

purrr provides a function, **`transpose()`**, that reshapes a list so the inner-most level becomes the outer-most level. In other words, it turns a list-of-lists "inside-out". Consider the following list:

    nested_list <- list(
       x1 = list(a = 1, b = 2),
       x2 = list(a = 3, b = 4)
    )
    
If I need to extract the a element in x1, I could do `nested_list[["x1"]][["a"]]`. However, if I transpose the list first, the order of subsetting reverses. That is, to extract the same element I could also do `transpose(nested_list)[["a"]][["x1"]]`.
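
The reversed subsetting order can be checked directly with a short sketch, assuming purrr is installed and using the nested_list from above:

```r
library(purrr)

nested_list <- list(
  x1 = list(a = 1, b = 2),
  x2 = list(a = 3, b = 4)
)

transposed <- transpose(nested_list)
names(transposed)        # "a" "b"
names(transposed$a)      # "x1" "x2"

# Both routes reach the same element
same_element <- identical(
  nested_list[["x1"]][["a"]],
  transposed[["a"]][["x1"]]
)
same_element
```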

This is really handy for safe output, since we can grab all the results or all the errors really easily.

    # Define safe_readLines() and html
    safe_readLines <- safely(readLines)
    html <- map(urls, safe_readLines)

    # Examine the structure of transpose(html)
    str(transpose(html))

    # Extract the results with transpose() and save them in the variable res
    res <- transpose(html)[["result"]]

    # Extract the errors: errs
    errs <- transpose(html)[["error"]]
    
#### a.4) Working with errors and results
What you do with the errors and results is up to you. But, commonly you'll want to collect all the results for the elements that were successful and examine the inputs for all those that weren't.

    # Initialize some objects
    safe_readLines <- safely(readLines)
    html <- map(urls, safe_readLines)
    res <- transpose(html)[["result"]]
    errs <- transpose(html)[["error"]]

    # Create a logical vector is_ok: TRUE where there was no error
    is_ok <- map_lgl(errs, is.null)

    # Extract the successful results
    res[is_ok]

    # Find the URLs that were unsuccessful
    urls[!is_ok]
    
### b) Maps over multiple arguments 
In this section we will see how to map over more than one argument:

    map2() - iterate over two arguments
    pmap() - iterate over many arguments
    invoke_map() - iterate over functions and arguments

A running example throughout the following sections is generating *random samples* (a sample is a representative subset) from known statistical distributions. We will work with the function rnorm(), which draws n samples from a normal distribution with mean `mean` and standard deviation `sd`.

Now imagine we want to take 3 samples of size 5, 10 and 20. We have two options:

    # First option: write 3 calls
    rnorm(5)
    rnorm(10)
    rnorm(20)

    # Second option: use our new map skills to write one call
    map(list(5, 10, 20), rnorm)

**But what if we want to vary the mean of the normal distribution for each sample?** For example:
    
    rnorm(5, mean = 1)
    rnorm(10, mean = 5)
    rnorm(20, mean = 10)    

The purrr function `map2()` is designed exactly for this iteration problem. map2() has an additional argument, .y, that allows us to specify another object to iterate over. The function .f is applied with the first element of .x as its first argument and the first element of .y as its second argument, then with the second element of .x and the second element of .y, and so on, until it has iterated through both lists.

    map2(.x, .y, .f, ...)
    # In our example, we have:
    map2(list(5, 10, 20), list(1, 5, 10), rnorm)
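
Because rnorm() is random, the pairing is easier to see with a deterministic stand-in. The sketch below assumes purrr is installed; the labelling function is hypothetical and only shows which elements are passed together.

```r
library(purrr)

# Deterministic stand-in for rnorm(), so the pairing is easy to see
labels <- map2_chr(list(5, 10, 20), list(1, 5, 10),
                   function(n, mean) paste0("n = ", n, ", mean = ", mean))
labels

# With rnorm(), .x is matched to n and .y to mean
samples <- map2(list(5, 10, 20), list(1, 5, 10), rnorm)
lengths(samples)
```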
    
Let's take it one step further: what if the sd argument should also vary across our three samples? Now we have 3 arguments to iterate over! Rather than having map3(), map4() and so on, purrr provides the **pmap()** function, which handles iterating over any number of arguments.

The first argument to pmap() is a **list, .l,** that should contain all the lists to iterate over. Using names for this list ensures the values are matched up to the right arguments of the function we are iteratively applying.

    rnorm(5, mean = 1, sd = 0.1)
    rnorm(10, mean = 5, sd = 0.5)
    rnorm(20, mean = 10, sd = 0.1)

    # Our new function
    pmap(.l, .f, ...)

    # Now
    pmap(list(n = list(5, 10, 20),
              mean = list(1, 5, 10),
              sd = list(0.1, 0.5, 0.1)), rnorm)

*Argument matching*

Compare the following two calls to pmap() (run them in the console and compare their output too!):

    pmap(list(n, mu, sd), rnorm)
    pmap(list(mu, n, sd), rnorm)
    
What's the difference? By default pmap() matches the elements of the list to the arguments in the function by position. In the first case, n is matched to the n argument of rnorm(), mu to the mean argument, and sd to the sd argument. In the second case mu gets matched to the n argument of rnorm(), which is clearly not what we intended!


Instead of relying on this positional matching, a safer alternative is to provide names in our list. The name of each element should be the argument name we want to match it to.

    # Name the elements of the argument list
    pmap(list(mean = mu, n = n, sd = sd), rnorm)
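
The effect of the matching rules shows up in the sample sizes. The sketch below assumes purrr is installed; the values given to n, mu and sd are hypothetical, chosen to match the earlier rnorm() calls.

```r
library(purrr)

# Hypothetical values for the n, mu and sd used above
n  <- list(5, 10, 20)
mu <- list(1, 5, 10)
sd <- list(0.1, 0.5, 0.1)

by_position_ok  <- pmap(list(n, mu, sd), rnorm)
by_position_bad <- pmap(list(mu, n, sd), rnorm)   # mu is used as n
by_name         <- pmap(list(mean = mu, n = n, sd = sd), rnorm)

lengths(by_position_ok)    # 5 10 20
lengths(by_position_bad)   # 1 5 10
lengths(by_name)           # 5 10 20
```

With names, the order of the list elements no longer matters, so a reordering mistake cannot silently change the sample sizes.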

Finally, you might want to iterate not over a vector of values, but over functions themselves. For example, we might be interested in simulating from 3 different distributions: normal, uniform and exponential. This is handled in purrr by the invoke_map() function. Its argument order is reversed: .f comes first and should be a list of functions, and .x is second and is a list where we can supply arguments for each function in .f.

    # For example:
    rnorm(5)
    runif(5)
    rexp(5)

    # Function signature
    invoke_map(.f, .x = list(NULL), ...)

    # In our example
    invoke_map(list(rnorm, runif, rexp), n = 5)
    
In more complicated cases, the functions may take different arguments, or we may want to pass different values to each function. In this case, we need to supply invoke_map() with a list, where each element specifies the arguments to the corresponding function.

Let's use this approach to simulate three samples from the following three distributions: Normal(10, 1), Uniform(0, 5), and Exponential(5).

    # Define list of functions
    funs <- list("rnorm", "runif", "rexp")

    # Parameter list for rnorm()
    rnorm_params <- list(mean = 10)

    # Add a min element with value 0 and max element with value 5
    runif_params <- list(min = 0, max = 5)

    # Add a rate element with value 5
    rexp_params <- list(rate = 5)

    # Define params for each function
    params <- list(
      rnorm_params,
      runif_params,
      rexp_params
    )

    # Call invoke_map() on funs supplying params and setting n to 5
    invoke_map(funs, params, n = 5)
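
In recent versions of purrr (1.0.0 and later) invoke_map() is deprecated. As a rough sketch of what the call above does, the same pairing of functions and argument lists can be written in base R with Map() and do.call(). This is an illustration of the idea, not how purrr implements it.

```r
funs <- list("rnorm", "runif", "rexp")

params <- list(
  list(mean = 10),
  list(min = 0, max = 5),
  list(rate = 5)
)

# Call each function with its own arguments plus the shared n = 5
sims <- Map(
  function(f, args) do.call(f, c(args, list(n = 5))),
  funs,
  params
)

lengths(sims)
```

do.call() accepts a function name as a string, so the character entries in funs work unchanged.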
    
### c) Maps with side effects
We'll introduce the walk functions, the functions in purrr designed for use with functions that are called for their side effects. But first, what is a side effect?

Side effects: anything that happens beyond the return value of a function, for example printing output, plotting, and saving files to disk.

In purrr, walk() works just like map(), except it's designed for iterating over functions that are called for their side effects.

#### c.1) Walk
walk() operates just like map() except it's designed for functions that don't return anything. You use walk() for functions with side effects like printing, plotting or saving.

    # Define list of functions
    funs <- list(Normal = "rnorm", Uniform = "runif", Exp = "rexp")

    # Define params
    params <- list(
      Normal = list(mean = 10),
      Uniform = list(min = 0, max = 5),
      Exp = list(rate = 5)
    )

    # Assign the simulated samples to sims
    sims <- invoke_map(funs, params, n = 50)

    # Use walk() to make a histogram of each element in sims
    walk(sims, hist)
    
#### c.2) Walking over two or more arguments

Those histograms were pretty good, but they really needed better breaks for the bins on the x-axis. That means we need to vary two arguments to hist(): x and breaks. Remember map2()? That allowed us to iterate over two arguments. Guess what? There is a walk2(), too!

    # Replace "Sturges" with reasonable breaks for each sample
    breaks_list <- list(
      Normal = seq(6, 16, 0.5),
      Uniform = seq(0, 5, 0.25),
      Exp = seq(0, 1.5, 0.1)
    )
    # Use walk2() to make histograms with the right breaks
    walk2(sims, breaks_list, hist)
    
#### c.3) Putting together writing functions and walk
In the previous exercise, we hard-coded the breaks, but that was a little lazy. Those breaks probably won't be great if we change the parameters of our simulation.

A better idea would be to generate reasonable breaks based on the actual values in our simulated samples. This is a great chance to review our function writing skills and combine our own function with purrr.

Let's start by writing our own function find_breaks(), which copies the default breaks in the ggplot2 package: break the range of the data in 30 bins.

How do we start? Simple, of course! Here's a snippet of code that works for the first sample:
    
    rng <- range(sims[[1]], na.rm = TRUE)
    seq(rng[1], rng[2], length.out = 30)
    
Turn the snippet above into a function called find_breaks(), which takes a single argument x and returns the sequence of breaks.
Check that your function works by calling find_breaks() on sims[[1]].
    
    # Turn this snippet into find_breaks()
    find_breaks <- function(x) {
      rng <- range(x, na.rm = TRUE)
      seq(rng[1], rng[2], length.out = 30)
    }

    # Call find_breaks() on sims[[1]]
    find_breaks(sims[[1]])
    
Now we use map() to apply find_breaks() to every sample:

    # Use map() to iterate find_breaks() over sims: nice_breaks
    nice_breaks <- map(sims, find_breaks)

    # Use nice_breaks as the second argument to walk2()
    walk2(sims, nice_breaks, hist)
    
#### c.4) Walking with many arguments: pwalk
Ugh! Nice breaks but those plots had UUUUGLY labels and titles. The x-axis labels are easy to fix if we don't mind every plot having its x-axis labeled the same way. We can use the ... argument to any of the map() or walk() functions to pass in further arguments to the function .f. In this case, we might decide we don't want any labels on the x-axis, in which case we need to pass an empty string to the xlab argument of hist():

    walk2(sims, nice_breaks, hist, xlab = "")

But, what about the titles? We don't want them to be the same for each plot. How can we iterate over the arguments x, breaks and main? You guessed it, there is a pwalk() function that works just like pmap().

Let's use pwalk() to tidy up these plots. Also, let's increase our sample size to 1000.

    # Increase sample size to 1000
    sims <- invoke_map(funs, params, n = 1000)

    # Compute nice_breaks (don't change this)
    nice_breaks <- map(sims, find_breaks)

    # Create a vector nice_titles
    nice_titles <- c("Normal(10, 1)", "Uniform(0, 5)", "Exp(5)")

    # Use pwalk() instead of walk2()
    pwalk(list(x = sims, breaks = nice_breaks, main = nice_titles), hist, xlab = "")
    
#### c.5) Walking with pipes
One of the nice things about the walk() functions is that they return the object you passed to them. This means they can easily be used in pipelines (a pipeline is just a short way of saying "a statement with lots of pipes").

To illustrate, we'll return to our first example of making histograms for each sample:

    walk(sims, hist)

Take a look at what gets returned:

    tmp <- walk(sims, hist)
    str(tmp)

It's our original sims object. That means we can pipe the sims object along to other functions. For example, we might want some basic summary statistics on each sample as well as our histograms.

    # Pipe this along to map(), using summary() as .f
    sims %>%
      walk(hist) %>%
      map(summary)
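
That walk() returns its input can also be checked directly. The following is a small sketch with a made-up list (samples is a hypothetical stand-in for sims) and cat() as the side-effect function:

```r
library(purrr)

samples <- list(a = 1:3, b = 4:6)

# cat() is called only for its side effect (printing to the console)
returned <- walk(samples, function(v) cat("sum:", sum(v), "\n"))

# walk() hands back its first argument unchanged
identical(returned, samples)
```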

## 5) Robust functions

In this chapter we'll focus on writing functions that don't surprise you or your users. We'll expose you to some functions that work 95% of the time, and 5% of the time fail in surprising ways. You'll learn which functions you should avoid using inside a function and which you should use with care.


### a) Robust functions
Robust functions either return the correct result or fail with a clear error message. We will see that a lot of problems arise because of a fundamental tension in R: R is both an environment for interactive data analysis and a programming language. When you are doing interactive analysis, you want to iterate as quickly as possible, and you check each result as you go. That means functions designed for interactive use can be helpful: they can guess what you want, and if they guess wrong it's no big deal. On the other hand, functions designed for programming should be robust. You're not working with them interactively, so sometimes being helpful isn't helpful!

There are three main classes of functions that are often helpful for interactive usage, but can cause problems when writing functions:

1. Type-unstable functions return different types of things. For example, with one type of input they might return a vector, but with another type they return a data frame or matrix.
2. Non-standard evaluation is a very important part of R's magic. It lets you use incredibly succinct APIs, like ggplot2 and dplyr, but introduces some ambiguity, which you need to be careful about when programming.
3. Hidden arguments: R has global options, which can affect the operation of certain functions. The most notorious of these is stringsAsFactors.

Before we start, we will learn how to make your functions throw an informative error message instead of returning incorrect or surprising results.

The stopifnot() function is a quick way to throw an error if a condition isn't met. The arguments to stopifnot() are logical expressions, and if any are FALSE, an error is thrown.

    # Example
    x <- 1:10
    stopifnot(is.character(x))

While stopifnot() is great for adding quick checks to a function, the error messages it generates are not that user friendly. Using a condition with the function **stop** is a more verbose alternative that allows you to specify a more helpful error message. Here is a general pattern for using stop():

    if (condition) {
      stop("Error", call. = FALSE)
    }
    
You substitute a logical expression for **condition** and your own message for "Error", for example

    if (!is.character(x)) {
      stop("'x' should be a character vector", call. = FALSE)
    }
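
To compare the two approaches, the sketch below captures the message each one produces for the same failing check. The helper error_message() is hypothetical and exists only to catch the error and return its text.

```r
x <- 1:10

# Hypothetical helper: evaluate an expression and return the error message, if any
error_message <- function(expr) {
  tryCatch(
    {
      expr
      NA_character_
    },
    error = function(e) conditionMessage(e)
  )
}

# Message generated automatically by stopifnot()
msg_stopifnot <- error_message(stopifnot(is.character(x)))

# Message written by hand with stop()
msg_stop <- error_message(
  if (!is.character(x)) {
    stop("'x' should be a character vector", call. = FALSE)
  }
)

msg_stopifnot  # "is.character(x) is not TRUE"
msg_stop       # "'x' should be a character vector"
```

The stopifnot() message repeats the code of the failed check, while the stop() message can say what the caller should do.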

#### a.1) An error is better than a surprise
Recall our both_na() function from Chapter 2, that finds the number of entries where vectors x and y both have missing values:

    both_na <- function(x, y) {
      sum(is.na(x) & is.na(y))
    }

We had an example where the behavior was a little surprising:

    x <- c(NA, NA, NA)
    y <- c( 1, NA, NA, NA)
    both_na(x, y)
    
The function works and returns 3, but we certainly didn't design this function with the idea that people could pass in different length arguments.
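
The result comes from R's recycling rule: the shorter logical vector is.na(x) is reused to match the length of is.na(y), and R signals only a warning. A small sketch that makes this visible by catching the warning (the warned flag is introduced here just for illustration):

```r
both_na <- function(x, y) {
  sum(is.na(x) & is.na(y))
}

x <- c(NA, NA, NA)
y <- c( 1, NA, NA, NA)

# Record whether a warning was raised, then let the computation continue
warned <- FALSE
result <- withCallingHandlers(
  both_na(x, y),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)

result  # 3
warned
```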

Using stopifnot() is a quick way to have your function stop, if a condition isn't met. stopifnot() takes logical expressions as arguments and if any are FALSE an error will occur.

    # Define troublesome x and y
    x <- c(NA, NA, NA)
    y <- c( 1, NA, NA, NA)

    both_na <- function(x, y) {
      # Add stopifnot() to check length of x and y
      stopifnot(length(x) == length(y))  

      sum(is.na(x) & is.na(y))
    }

    # Call both_na() on x and y
    both_na(x, y)
    
#### a.2) An informative error is even better
Using stop() instead of stopifnot() allows you to specify a more informative error message. Recall the general pattern for using stop() is:

    if (condition) {
      stop("Error", call. = FALSE)
    }  
    
Writing good error messages is an important part of writing a good function! We recommend that your error message tell the user what should be true, not what is false. For example, here a good error would be "x and y must have the same length", rather than the bad error "x and y don't have the same length".

    # Define troublesome x and y
    x <- c(NA, NA, NA)
    y <- c( 1, NA, NA, NA)

    both_na <- function(x, y) {
      # Replace condition with logical 
      if (length(x) != length(y)) {
        # Replace "Error" with better message
        stop("x and y must have the same length", call. = FALSE)
      }  

      sum(is.na(x) & is.na(y))
    }

    # Call both_na() 
    both_na(x, y)
    
### b) Unstable types
Type-inconsistent: the type of the returned object depends on the input.

#### b.1) sapply is another common culprit
sapply() is another common offender returning unstable types. The type of output returned from sapply() depends on the type of input.

Consider the following data frame and two calls to sapply():

    df <- data.frame(
      a = 1L,
      b = 1.5,
      y = Sys.time(),
      z = ordered(1)
    )

    A <- sapply(df[1:4], class) 
    B <- sapply(df[3:4], class)

What type of objects will A and B be?
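
The question can be answered by inspecting the results. The sketch below rebuilds the same data frame and checks the types with base R predicates:

```r
df <- data.frame(
  a = 1L,
  b = 1.5,
  y = Sys.time(),
  z = ordered(1)
)

A <- sapply(df[1:4], class)
B <- sapply(df[3:4], class)

# class() returns one value for a and b but two for y and z,
# so sapply() cannot simplify A and falls back to a list
is.list(A)

# For y and z alone, both results have length 2 and are
# simplified into a character matrix
is.matrix(B)
dim(B)
```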
    
#### b.2) Using purrr solves the problem
This unpredictable behaviour is a sign that you shouldn't rely on sapply() inside your own functions.

So, what do you do? Use alternate functions that are type consistent! And you already know a whole set: the map() functions in purrr.

In this example, when we call class() on the columns of the data frame we are expecting character output, so our function of choice should be map_chr():

    df <- data.frame(
      a = 1L,
      b = 1.5,
      y = Sys.time(),
      z = ordered(1)
    )

    A <- map_chr(df[1:4], class) 
    B <- map_chr(df[3:4], class)
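
With this particular data frame, map_chr() does not quietly change its output type: because columns y and z have two classes each, the calls above stop with an error. A sketch that shows both outcomes, using tryCatch() only so the script keeps running:

```r
library(purrr)

df <- data.frame(
  a = 1L,
  b = 1.5,
  y = Sys.time(),
  z = ordered(1)
)

# Columns with a single class each: always a character vector
ok <- map_chr(df[1:2], class)

# Columns with two classes each: an error, not a matrix or a list
failed <- tryCatch(
  map_chr(df[3:4], class),
  error = function(e) "error"
)

ok
failed
```

An explicit failure is exactly the robust behaviour we want: the function either returns a character vector or tells us it can't.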
    
Another example:

    # sapply calls
    A <- sapply(df[1:4], class) 
    B <- sapply(df[3:4], class)
    C <- sapply(df[1:2], class) 

    # Demonstrate type inconsistency
    str(A)
    str(B)
    str(C)

    # Use map() to define X, Y and Z
    X <- map(df[1:4], class) 
    Y <- map(df[3:4], class)
    Z <- map(df[1:2], class) 

    # Use str() to check type consistency
    str(X)
    str(Y)
    str(Z)

#### b.3) A type consistent solution
If we wrap our solution into a function, we can be confident that this function will always return a list because we've used a type consistent function, map():

    col_classes <- function(df) {
      map(df, class)
    }

But what if you wanted this function to always return a character string?

One option would be to decide what should happen if class() returns something longer than length 1. For example, we might simply take the first element of the vector returned by class().
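
A minimal sketch of that option, assuming we are happy to keep only the first class of each column: map() collects the full class vectors, and map_chr() with the position 1 extracts the first element of each.

```r
library(purrr)

col_classes <- function(df) {
  # Full class vector for every column (always a list)
  class_list <- map(df, class)
  # Keep only the first class of each column (always a character vector)
  map_chr(class_list, 1)
}

df <- data.frame(
  a = 1L,
  b = 1.5,
  y = Sys.time(),
  z = ordered(1)
)

col_classes(df)
```

Taking the first element is one possible rule among several; collapsing all classes into a single string would be another.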

    
### c) Non-standard evaluation (NSE)
NSE functions are functions that don't use the usual lookup rules for variables.

#### c.1) Programming with NSE functions

Let's take a look at a function that uses the non-standard evaluation (NSE) function filter() from the dplyr package:

    big_x <- function(df, threshold) {
      dplyr::filter(df, x > threshold)
    }
    
This big_x() function attempts to return all rows in df where the x column exceeds a certain threshold. Let's get a feel for how it might be used.

#### c.2) When things go wrong
Now, let's see how this function might fail. There are two instances in which the non-standard evaluation of filter() could cause surprising results:

1. The x column doesn't exist in df.
2. There is a threshold column in df.
 
#### c.3) What to do?
To avoid the problems caused by non-standard evaluation functions, you could avoid using them. In our example, we could achieve the same results by using standard subsetting (i.e. []) instead of filter(). For more insight into dealing with NSE and how to write your own non-standard evaluation functions, we recommend reading Hadley's vignette on the topic. Also, programming with the NSE functions in dplyr will be easier in a future version.
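
As a minimal sketch of the standard subsetting option, here is a version of big_x() written with []. The name big_x_base is hypothetical, and which() is used so that rows where x is NA are dropped, as filter() would do.

```r
big_x_base <- function(df, threshold) {
  # df$x always refers to the column; threshold always refers to the argument
  df[which(df$x > threshold), , drop = FALSE]
}

# A data frame that also has a column called threshold
df <- data.frame(x = c(1, 5, NA, 10), threshold = 100)

# The argument (4) is used, not the column (100)
res <- big_x_base(df, threshold = 4)
res$x  # 5 10
```

If df has no x column, df$x is NULL and this version returns zero rows without complaint, so an explicit check is still worthwhile.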

If you do need to use non-standard evaluation functions, it's up to you to provide protection against the problem cases. That means you need to know what the problem cases are, to check for them, and to fail explicitly.

To see what that might look like, let's rewrite big_x() to fail for our problem cases.
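
One possible rewrite is sketched below: it checks for each problem case before calling filter() and stops with a message that says what should be true. The exact wording of the messages is our own choice.

```r
big_x <- function(df, threshold) {
  # Problem case 1: the x column doesn't exist in df
  if (!"x" %in% names(df)) {
    stop("df must contain variable called x", call. = FALSE)
  }
  # Problem case 2: there is a threshold column in df
  if ("threshold" %in% names(df)) {
    stop("df must not contain variable called threshold", call. = FALSE)
  }
  dplyr::filter(df, x > threshold)
}

# Both problem cases now fail explicitly; tryCatch() just captures the message
no_x <- tryCatch(
  big_x(data.frame(y = 1:3), 1),
  error = function(e) conditionMessage(e)
)
has_threshold <- tryCatch(
  big_x(data.frame(x = 1:3, threshold = 4), 1),
  error = function(e) conditionMessage(e)
)

no_x
has_threshold
```

The checks run before dplyr is touched, so the failure cases behave the same whether or not the filtering itself would have succeeded.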

### d) Hidden arguments

#### d.1) A hidden dependence
A classic example of a hidden dependence is the stringsAsFactors argument to the read.csv() function (and a few other data frame functions.)

When you see the following code, you don't know exactly what the result will be:

    pools <- read.csv("swimming_pools.csv")
    
That's because if the argument stringsAsFactors isn't specified, it inherits its value from getOption("stringsAsFactors"), a global option that a user may change.

To show that this is the case, let's illustrate the problem:

    # This was the default behavior before R 4.0.0
    options(stringsAsFactors = TRUE)

    # Read in the swimming_pools.csv to pools
    pools <- read.csv("swimming_pools.csv")

    # Examine the structure of pools
    str(pools)

    # Change the global stringsAsFactors option to FALSE
    options(stringsAsFactors = FALSE)

    # Read in the swimming_pools.csv to pools2
    pools2 <- read.csv("swimming_pools.csv")

    # Examine the structure of pools2
    str(pools2)
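
One way to remove the hidden dependence is to always pass stringsAsFactors explicitly, so the result no longer depends on the global option. The sketch below uses a small inline CSV as a hypothetical stand-in for swimming_pools.csv (the column names are made up):

```r
# Hypothetical stand-in for swimming_pools.csv
csv_text <- "name,length\nPool A,25\nPool B,50"

# The argument is stated explicitly, so the global option is irrelevant
pools_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
pools_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)

class(pools_chr$name)  # "character"
class(pools_fct$name)  # "factor"
```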

#### d.2) Legitimate use of options
In general, you want to avoid having the return value of your own functions depend on any global options. That way, you and others can reason about your functions without needing to know the current state of the options.

It is, however, okay to have side effects of a function depend on global options. For example, the print() function uses getOption("digits") as the default for the digits argument. This gives users some control over how results are displayed, but doesn't change the underlying computation.

Let's take a look at an example function that uses a global default sensibly. The print.lm() function has the argument digits with default max(3, getOption("digits") - 3).

    # Start with this
    options(digits = 8)

    # Fit a regression model
    fit <- lm(mpg ~ wt, data = mtcars)

    # Look at the summary of the model
    summary(fit)

    # Set the global digits option to 2
    options(digits = 2)

    # Take another look at the summary
    summary(fit)
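
To confirm that the option changes only what is displayed, the sketch below fits the model under both settings and compares the stored coefficients, using format(pi) as a simple stand-in for printed output:

```r
# Remember the current setting so it can be restored afterwards
old <- options(digits = 8)
fit_8 <- lm(mpg ~ wt, data = mtcars)
shown_8 <- format(pi)

options(digits = 2)
fit_2 <- lm(mpg ~ wt, data = mtcars)
shown_2 <- format(pi)

options(old)

# The displayed text differs...
shown_8  # "3.1415927"
shown_2  # "3.1"

# ...but the underlying computation does not
identical(coef(fit_8), coef(fit_2))
```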

### e) Wrap-up

