# Introduction to Writing Functions in R
link: https://www.datacamp.com/courses/introduction-to-function-writing-in-r

### Course Description
Being able to write your own functions makes your analyses more readable, with fewer errors, and more reusable from project to project. Function writing will increase your productivity more than any other skill! In this course you'll learn the basics of function writing, focusing on the arguments going into the function and the return values. You'll be writing useful data science functions, and using real-world data on Wyoming tourism, stock price/earnings ratios, and grain yields.

### Note 1 - Resizing plots in the R kernel for Jupyter notebooks
https://blog.revolutionanalytics.com/2015/09/resizing-plots-in-the-r-kernel-for-jupyter-notebooks.html

    library(repr)

    # Change plot size to 4 x 3
    options(repr.plot.width=4, repr.plot.height=3)
    
### Note 2 - Generating Markdown tables

https://www.tablesgenerator.com/markdown_tables



### Note 3 - DataFrames


In [5]:
library(dplyr)
library(ggplot2)
library(readr)
library(tidyr)
library(datasets)
library(openintro)

# Passing the URL to readRDS() as a plain string fails with
# "cannot open compressed file '...', probable reason 'Invalid argument'" and
# "Error in gzfile(file, "rb"): cannot open the connection".
# Wrapping the URL in gzcon(url(...)) lets readRDS() read the compressed stream.

# Snake River visits
snake_river_visits <- readRDS(gzcon(url("https://assets.datacamp.com/production/repositories/5028/datasets/a55843f83746968c7f118d82ed727db9c71e891f/snake_river_visits.rds")))

# Standard & Poor 500 price/earnings ratios
std_and_poor500 <- readRDS(gzcon(url("https://assets.datacamp.com/production/repositories/5028/datasets/675d6348dbcc81bcb4bb9d25e827f5b63d034771/std_and_poor500_with_pe_2019-06-21.rds")))

# NASS corn yields
corn <- readRDS(gzcon(url("https://assets.datacamp.com/production/repositories/5028/datasets/495f9e5fa1ae333cd013568412df4e7c663c2192/nass.corn.rds")))

# NASS wheat yields
wheat <- readRDS(gzcon(url("https://assets.datacamp.com/production/repositories/5028/datasets/8dde0453e7b53c630546e5b9723ce279ac6e4901/nass.wheat.rds")))

# NASS barley yields
barley <- readRDS(gzcon(url("https://assets.datacamp.com/production/repositories/5028/datasets/a5eddd39a47abc8efbdb419c54882112dc28785b/nass.barley.rds")))


## 1) How to write a function
Learn why writing your own functions is useful, how to convert a script into a function, and what order you should include the arguments.

### 1.1) (video) Why you should use functions
Benefits of writing functions

Functions eliminate repetition from your code, which

- can reduce your workload, and
- help avoid errors

Functions also allow code reuse and sharing.
 
#### 1.1.1) Calling functions
One way to make your code more readable is to be careful about the order you pass arguments when you call functions, and whether you pass the arguments by position or by name.

`gold_medals`, a numeric vector of the number of gold medals won by each country in the 2016 Summer Olympics, is provided.

For convenience, the arguments of `median()` and `rank()` are displayed using `args()`. Setting `rank()`'s `na.last` argument to `"keep"` means "keep the rank of NA values as NA".

Best practice for calling functions is to pass arguments in the order shown by `args()`, and to name only the rarely used arguments.
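
As a small illustration of this guideline, the sketch below calls `median()` two ways on a made-up vector. `medal_counts` is a hypothetical stand-in for `gold_medals`, not the course data.

```r
# Hypothetical medal counts; NA marks a missing value
medal_counts <- c(46, 27, 26, NA)

# Harder to read: everything named, and in a different order from args(median)
verbose <- median(na.rm = TRUE, x = medal_counts)

# Preferred: data argument by position, rarely used detail argument by name
concise <- median(medal_counts, na.rm = TRUE)

# Both calls compute the same value
identical(verbose, concise)
```

Both forms are valid R; the second simply matches what most readers expect to see.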

**Exercise**
Rewrite the call to median(), following best practices

*Answer*

    # Look at the gold medals data
    # gold_medals

    # Note the arguments to median()
    args(median)

    # Rewrite this function call, following best practices
    median(gold_medals, na.rm = TRUE)

### 1.2) (video) Converting scripts into functions

All functions in R have the same structure: a name, a signature, and a body.

    name_of_function <- signature {
      # the body
    }

    my_fun <- function(arg1, arg2) {
      # do something
    }

In R, the value of the last expression evaluated inside the function is its result, so you don't need to assign it or call `return()`; it automatically becomes the return value.
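
A minimal sketch of the template in action follows. The function name and the temperature conversion are illustrative assumptions, not part of the course material.

```r
# Hypothetical example: convert a temperature from Celsius to Fahrenheit
celsius_to_fahrenheit <- function(temp_c) {
  # No assignment or return() needed: the last expression is the result
  temp_c * 9 / 5 + 32
}

# Call the function like any built-in
celsius_to_fahrenheit(100)
```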

#### 1.2.1) Your first function: tossing a coin
Time to write your first function! It's a really good idea when writing functions to start simple. You can always make a function more complicated later if it's really necessary, so let's not worry about arguments for now.

**Exercise**
- Simulate a single coin toss by using sample() to sample from `coin_sides` once.
- Write a template for your function, naming it toss_coin. The function should take no arguments.

*Answer*

In [2]:
coin_sides <- c("head", "tail")

# Sample from coin_sides once
#sample(coin_sides, size = 1)


# Your functions, from previous steps
toss_coin <- function() {
  coin_sides <- c("head", "tail")
  sample(coin_sides, 1)
}

# Call your function
toss_coin()

#### 1.2.2) Inputs to functions
Most functions require some sort of input to determine what to compute. The inputs to functions are called arguments. You specify them inside the parentheses after the word "function."

As mentioned in the video, the following exercises assume that you are using sample() to do random sampling.

**Exercise**

- Sample from coin_sides n_flips times with replacement
- Update the definition of toss_coin() to accept a single argument, n_flips. The function should sample coin_sides n_flips times with replacement. Remember to change the signature and the body.

*Answer*

In [None]:
coin_sides <- c("head", "tail")
n_flips <- 10

# Sample from coin_sides n_flips times with replacement
sample(coin_sides, n_flips, TRUE)

# Update the function to return n coin tosses
toss_coin <- function(n_flips) {
  coin_sides <- c("head", "tail")
  sample(coin_sides, n_flips, TRUE)
}

# Generate 10 coin tosses
toss_coin(10)

#### 1.2.3) Multiple inputs to functions
If a function should have more than one argument, list them in the function signature, separated by commas.

To solve this exercise, you need to know how to specify sampling weights to `sample()`. Set the prob argument to a numeric vector with the same length as `x`. Each value of prob is the probability of sampling the corresponding element of x, so their values add up to one. In the following example, each sample has a 20% chance of "bat", a 30% chance of "cat" and a 50% chance of "rat".

    sample(c("bat", "cat", "rat"), 10, replace = TRUE, prob = c(0.2, 0.3, 0.5))
    
**Exercise**
- Bias the coin by weighting the sampling. Specify the prob argument so that heads are sampled with probability p_head (and tails are sampled with probability 1 - p_head).
- Update the definition of toss_coin() so it accepts an argument, p_head, and weights the samples using the code you wrote in the previous step.
- Generate 10 coin tosses with an 80% chance of heads on each toss.

*Answer*    

In [3]:
coin_sides <- c("head", "tail")
n_flips <- 10
p_head <- 0.8

# Define a vector of weights: P(head), then P(tail)
weights <- c(p_head, 1 - p_head)

# Update so that heads are sampled with prob p_head
sample(coin_sides, n_flips, replace = TRUE, prob = weights)



# Update the function so heads have probability p_head
toss_coin <- function(n_flips, p_head) {
  coin_sides <- c("head", "tail")
  # Define a vector of weights
  weights <- c(p_head, 1 - p_head)
  # Modify the sampling to be weighted 
  sample(coin_sides, n_flips, replace = TRUE, prob = weights)
}

# Generate 10 coin tosses
toss_coin(10, p_head = 0.8)


#### 1.2.4) Renaming GLM
R's generalized linear regression function, `glm()`, suffers the same usability problems as `lm()`: its name is an acronym, and its formula and data arguments are in the wrong order.

To solve this exercise, you need to know two things about generalized linear regression:

- `glm()` formulas are specified like `lm()` formulas: response is on the left, and explanatory variables are added on the right.
- To model count data, set `glm()`'s `family` argument to `poisson`, making it a Poisson regression.

Here you'll use data on the number of yearly visits to Snake River at Jackson Hole, Wyoming, `snake_river_visits`.

**Exercise**

- Run a generalized linear regression by calling glm(). Model n_visits vs. gender, income, and travel on the snake_river_visits dataset, setting the family to poisson.
- Define a function, run_poisson_regression(), to run a Poisson regression. This should take two arguments: data and formula, and call glm(), passing those arguments and setting family to poisson.
- Recreate the Poisson regression model from the first step, this time by calling your run_poisson_regression() function.

*Answer*

    # Run a generalized linear regression 
    glm(
      # Model no. of visits vs. gender, income, travel
       n_visits ~ gender + income + travel, 
      # Use the snake_river_visits dataset
      data = snake_river_visits, 
      # Make it a Poisson regression
      family = "poisson"
    )
    
    
    # Write a function to run a Poisson regression
    run_poisson_regression <- function(data, formula) {
      glm(
        # Use the formula supplied by the caller
        formula, 
        # Use the dataset supplied by the caller
        data = data, 
        # Make it a Poisson regression
        family = "poisson"
      )
    }
    
    # From previous step
    run_poisson_regression <- function(data, formula) {
      glm(formula, data, family = poisson)
    }

    # Re-run the Poisson regression, using your function
    model <- snake_river_visits %>%
      run_poisson_regression(n_visits ~ gender + income + travel)

    # Run this to see the predictions
    snake_river_explanatory %>%
      mutate(predicted_n_visits = predict(model, ., type = "response"))%>%
      arrange(desc(predicted_n_visits))
    

## 2) All about arguments
Learn how to set defaults for arguments, how to pass arguments between functions, and how to check that users specified arguments correctly.

### 2.1) (video) Default arguments
Perhaps you noticed a problem with your `toss_coin()` function. Unless you regularly work with crooks, you will typically want to toss a fair coin; that is, you usually want `p_head` to be 0.5. Wouldn't it be great if you didn't have to specify it every time?

Setting a default is easy: assign the value to the argument in the signature. Here's an update to the function template from the previous chapter.

    my_fun <- function(data_arg1, data_arg2, detail_arg1 = default1) {
      # Do something
    }

Data arguments don't take a default value, but detail arguments do. You aren't limited to numeric defaults (logical defaults are common, for example), and a default can even refer to another argument. For example, compare the signatures of `median()` and `jsonlite::fromJSON()`:

    args(median)
    library(jsonlite)
    args(fromJSON)
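
To make the "default set to another argument" idea concrete, here is a minimal sketch. `make_matrix()` is a hypothetical helper invented for this note, not a function from the course or from any package.

```r
# Hypothetical helper: n_cols defaults to whatever n_rows is
make_matrix <- function(n_rows, n_cols = n_rows, fill = 0) {
  matrix(fill, nrow = n_rows, ncol = n_cols)
}

# Only n_rows given, so the result is square
dim(make_matrix(3))

# Both given, so the default is ignored
dim(make_matrix(2, 4))
```

The default is evaluated when it is first needed inside the function, which is why it can refer to `n_rows`.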

There are two special cases of defaults worth mentioning: `NULL` defaults and categorical defaults.

`NULL` is seldom used in calculations, so a `NULL` default is rarely useful as a value. Instead, by convention, a `NULL` default means that the argument gets special handling that is too complicated to include in the function signature, so you need to read the documentation.
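
The convention can be sketched as follows. `describe_values()` and its `label` argument are hypothetical names chosen for illustration.

```r
# Hypothetical function: label = NULL means "work out a label in the body"
describe_values <- function(x, label = NULL) {
  # Special handling that would be awkward to express in the signature
  if (is.null(label)) {
    label <- paste("vector of length", length(x))
  }
  paste0(label, ": mean = ", mean(x))
}

# Label generated automatically
describe_values(c(2, 4, 6))

# Label supplied by the caller
describe_values(c(2, 4, 6), label = "evens")
```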

The second case is categorical defaults. These use a two-step process: first you list all the choices as a character vector in the signature, then you call `match.arg()` in the function body. For example, look at the `alternative` argument of `prop.test()`:

    args(prop.test)

Inside the body of `prop.test()`, there is a line that calls `match.arg()` on `alternative` and reassigns it.

    alternative<-match.arg(alternative)
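
The same two-step pattern can be tried on a small function. `round_to()` and its `direction` argument are hypothetical; only `match.arg()` and `switch()` are real base R functions.

```r
# Hypothetical function with a categorical argument; the first choice is the default
round_to <- function(x, direction = c("nearest", "up", "down")) {
  # Step 2 of the pattern: collapse the vector of choices to the one selected
  direction <- match.arg(direction)
  switch(direction, nearest = round(x), up = ceiling(x), down = floor(x))
}

round_to(2.4)           # uses the default, "nearest"
round_to(2.4, "up")

# A value outside the listed choices makes match.arg() throw an error
result <- tryCatch(round_to(2.4, "sideways"), error = function(e) "error")
result
```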

#### 2.1.1) Numeric defaults
`cut_by_quantile()` converts a numeric vector into a categorical variable where quantiles define the cut points. This is a useful function, but at the moment you have to specify five arguments to make it work. This is too much thinking and typing.

By specifying default arguments, you can make it easier to use. Let's start with `n`, which specifies how many categories to cut `x` into.

A numeric vector of the number of visits to Snake River is provided as `n_visits`.

Note that `cut_by_quantile()` has these arguments:

- `x`: a numeric vector to cut
- `n`: the number of categories to cut `x` into
- `na.rm`: should missing values be removed?
- `labels`: character labels for the categories
- `interval_type`: should ranges be open on the left or right?

In [9]:
x<-rnorm(n = 410, mean = 24.99024, sd = 46.62775)

# Set the default for n to 5
cut_by_quantile <- function(x, n, na.rm, labels, interval_type) {
  probs <- seq(0, 1, length.out = n + 1)
  qtiles <- quantile(x, probs, na.rm = na.rm, names = FALSE)
  right <- switch(interval_type, "(lo, hi]" = TRUE, "[lo, hi)" = FALSE)
  cut(x, qtiles, labels = labels, right = right, include.lowest = TRUE)
}

# Remove the n argument from the call
result<-cut_by_quantile(
  x, 
  n = 5, 
  na.rm = FALSE, 
  labels = c("very low", "low", "medium", "high", "very high"),
  interval_type = "(lo, hi]"
)

**Exercise**
- Update the definition of cut_by_quantile() so that the n argument defaults to 5.
- Remove the n argument from the call to cut_by_quantile().

**Note**: Nice numeric default setting! Remember to only set defaults for numeric detail arguments, not data arguments.    

#### 2.1.2) Logical defaults
`cut_by_quantile()` is now slightly easier to use, but you still always have to specify the `na.rm` argument. This removes missing values – it behaves the same as the `na.rm` argument to `mean()` or `sd()`.

Where functions have an argument for removing missing values, the best practice is to not remove them by default (in case you hadn't spotted that you had missing values). That means that the default for `na.rm` should be `FALSE`.

**Exercise**
- Update the definition of cut_by_quantile() so that the na.rm argument defaults to FALSE.
- Remove the na.rm argument from the call to cut_by_quantile()

#### 2.1.3) NULL defaults
The `cut()` function used by `cut_by_quantile()` can automatically provide sensible labels for each category. The code to generate these labels is pretty complicated, so rather than appearing in the function signature directly, its `labels` argument defaults to `NULL`, and the calculation details are shown on the `?cut` help page.

**Exercise**

- Update the definition of cut_by_quantile() so that the labels argument defaults to NULL.
- Remove the labels argument from the call to cut_by_quantile().

#### 2.1.4)  Categorical defaults
When cutting up a numeric vector, you need to worry about what happens if a value lands exactly on a boundary. You can either put this value into a category of the lower interval or the higher interval. That is, you can choose your intervals to include values at the top boundary but not the bottom (in mathematical terminology, "open on the left, closed on the right", or `(lo, hi]`). Or you can choose the opposite ("closed on the left, open on the right", or `[lo, hi)`). `cut_by_quantile()` should allow these two choices.

The pattern for categorical defaults is:

    function(cat_arg = c("choice1", "choice2")) {
      cat_arg <- match.arg(cat_arg)
    }
    
Free hint: In the console, type `head(rank)` to see the start of `rank()`'s definition, and look at the ties.method argument.

**Exercise**
- Update the signature of `cut_by_quantile()` so that the `interval_type` argument can be `"(lo, hi]"` or `"[lo, hi)"`. Note the space after each comma.
- Update the body of `cut_by_quantile()` to match the `interval_type` argument.
- Remove the `interval_type` argument from the call to `cut_by_quantile()`.

**Note**: `match.arg()` throws an error if the user passes a value that isn't one of the listed choices.

*Answer* 

Combined answer for 2.1.1, 2.1.2, 2.1.3, and 2.1.4:

In [22]:
# Definition with defaults for n, na.rm, labels, and interval_type
cut_by_quantile <- function(x, n = 5, na.rm = FALSE, labels = NULL, 
                            interval_type = c("(lo, hi]","[lo, hi)")) {
  # Match the interval_type argument
  interval_type <- match.arg(interval_type)
  probs <- seq(0, 1, length.out = n + 1)
  qtiles <- quantile(x, probs, na.rm = na.rm, names = FALSE)
  right <- switch(interval_type, "(lo, hi]" = TRUE, "[lo, hi)" = FALSE)
  cut(x, qtiles, labels = labels, right = right, include.lowest = TRUE)
}

# Call with only the data argument; every other argument uses its default
head(cut_by_quantile(x))

### 2.2) (video) Passing arguments between functions
Passing an argument from one function to another is straightforward. For example, start with a geometric mean function:

    cal_geom_mean <- function(x) {
      x %>% 
        log() %>% 
        mean() %>% 
        exp()
    }

To handle missing values, add an `na.rm = FALSE` argument to the signature and pass it on to `mean()`. (Note: `na.rm` is the conventional argument name for removing missing values in R.)

    cal_geom_mean <- function(x, na.rm = FALSE) {
      x %>% 
        log() %>% 
        mean(na.rm = na.rm) %>% 
        exp()
    }

The ellipsis argument, `...`, simplifies the code a little. Instead of explicitly naming `na.rm`, setting a default, and passing it to `mean()`, you just say "accept any other arguments into `cal_geom_mean()`, then pass them to `mean()`".

    cal_geom_mean<- function(x, ...) {
    x %>% 
        log() %>% 
        mean(...) %>% 
        exp()
    }
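
To see the ellipsis at work without loading any packages, here is a base R sketch. `geom_mean_sketch()` is a hypothetical name, and nested calls are used in place of the pipe.

```r
# Hypothetical base R version: anything passed in ... is forwarded to mean()
geom_mean_sketch <- function(x, ...) {
  exp(mean(log(x), ...))
}

values <- c(1, 10, 100, NA)

# No extra arguments: mean() keeps the NA, so the result is NA
geom_mean_sketch(values)

# na.rm is not in the signature, but ... hands it on to mean();
# the result is approximately 10
geom_mean_sketch(values, na.rm = TRUE)
```

One trade-off of `...` is that the function signature no longer documents which extra arguments are accepted, so users have to read the documentation of the inner function.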

#### 2.2.1) Harmonic mean
The harmonic mean is the reciprocal of the arithmetic mean of the reciprocal of the data. That is:

    harmonic_mean(x)=1/arithmetic_mean(1/x)
    
The harmonic mean is often used to average ratio data. You'll be using it on the price/earnings ratio of stocks in the Standard and Poor's 500 index, provided as `std_and_poor500`. Price/earnings ratio is a measure of how expensive a stock is.
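
As a quick worked example of the formula, the sketch below uses two made-up price/earnings ratios (hypothetical values, not taken from `std_and_poor500`).

```r
# Hypothetical P/E ratios for two stocks
pe_ratios <- c(10, 40)

# Step by step: reciprocals, their arithmetic mean, then the reciprocal again
reciprocals <- 1 / pe_ratios           # 0.1 and 0.025
harmonic <- 1 / mean(reciprocals)      # 1 / 0.0625, about 16

# For comparison, the arithmetic mean is pulled up by the larger value
arithmetic <- mean(pe_ratios)          # 25

c(harmonic = harmonic, arithmetic = arithmetic)
```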

The dplyr package is loaded.

**Exercise**

- Look at `std_and_poor500` (you'll need this later). Write a function, `get_reciprocal`, to get the reciprocal of an input `x`. Its only argument should be `x`, and it should return one over `x`.
- Write a function, calc_harmonic_mean(), that calculates the harmonic mean of its only input, x.
- Using std_and_poor500, group by sector, and summarize to calculate the harmonic mean of the price/earning ratios in the pe_ratio column.

*Answer*


In [1]:
library(dplyr)

# Look at the Standard and Poor 500 data
#glimpse(std_and_poor500)

# From previous steps
get_reciprocal <- function(x) {
  1 / x
}

calc_harmonic_mean <- function(x) {
  x %>%
    get_reciprocal() %>%
    mean() %>%
    get_reciprocal()
}

#std_and_poor500 %>% 
#  # Group by sector
#  group_by(sector) %>% 
#  # Summarize, calculating harmonic mean of P/E ratio
#  summarize(hmean_pe_ratio = calc_harmonic_mean(pe_ratio))


#### 2.2.2) Dealing with missing values
In the last exercise, many sectors had an NA value for the harmonic mean. It would be useful for your function to be able to remove missing values before calculating.

Rather than writing your own code for this, you can outsource this functionality to `mean()`.

The dplyr package is loaded.

**Exercise**
- Modify the signature and body of calc_harmonic_mean() so it has an na.rm argument, defaulting to false, that gets passed to mean().
- Using std_and_poor500, group by sector, and summarize to calculate the harmonic mean of the price/earning ratios in the pe_ratio column, removing missing values.

*Answer*

In [None]:
# From previous step
calc_harmonic_mean <- function(x, na.rm = FALSE) {
  x %>%
    get_reciprocal() %>%
    mean(na.rm = na.rm) %>%
    get_reciprocal()
}

#std_and_poor500 %>% 
#  # Group by sector
#  group_by(sector) %>% 
#  # Summarize, calculating harmonic mean of P/E ratio
#  summarize(hmean_pe_ratio = calc_harmonic_mean(pe_ratio, na.rm = TRUE))

#### 2.2.3) Passing arguments with ...
Rather than explicitly giving `calc_harmonic_mean()` an `na.rm` argument, you can use `...` to simply "pass other arguments" to `mean()`.

The dplyr package is loaded.

**Exercise**
- Replace the na.rm argument with ... in the signature and body of calc_harmonic_mean().
- Using std_and_poor500, group by sector, and summarize to calculate the harmonic mean of the price/earning ratios in the pe_ratio column, removing missing values.

*Answer*

    calc_harmonic_mean <- function(x, ...) {
      x %>%
        get_reciprocal() %>%
        mean(...) %>%
        get_reciprocal()
    }

    std_and_poor500 %>% 
      # Group by sector
      group_by(sector) %>% 
      # Summarize, calculating harmonic mean of P/E ratio
      summarize(hmean_pe_ratio =  calc_harmonic_mean(pe_ratio, na.rm = TRUE))

### 2.3) (video) Checking arguments
When something fails, the cause is either the user or the programmer. You can guard against bad inputs by checking argument types with simple `if`/`else` statements; such checks are called "assertions". Writing them by hand is tedious, but fortunately R has many fine packages for writing assertions. Here we will use the `assertive` package.

The `assertive` package contains over seventy checks on variable types, from common types to more specialized ones:

- `assert_is_numeric()`
- `assert_is_character()`
- `is_data.frame()`
- ...
- `is_two_sided_formula()`
- `is_tskernel()`
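
For comparison, a hand-written assertion in base R might look like the sketch below. `mean_of_numbers()` is a hypothetical function, and this is the "simple if/else" approach rather than the `assertive` one.

```r
# Hypothetical function with a hand-written type check
mean_of_numbers <- function(x) {
  # Fail at the top of the function, with a message that says what went wrong
  if (!is.numeric(x)) {
    stop("x must be numeric, but it has class '", class(x)[1], "'.")
  }
  mean(x)
}

mean_of_numbers(c(1, 2, 3))

# Capture the error message instead of halting the script
msg <- tryCatch(
  mean_of_numbers(c("a", "b")),
  error = function(e) conditionMessage(e)
)
msg
```

Packages such as `assertive` replace the `if`/`stop()` pair with a single call, which is where the saving comes from when a function has many arguments to check.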

#### 2.3.1) Throwing errors with bad arguments
If a user provides a bad input to a function, the best course of action is to throw an error letting them know. The two rules are

- Throw the error message as soon as you realize there is a problem (typically at the start of the function).
- Make the error message easily understandable.

You can use the `assert_*()` functions from `assertive` to check inputs and throw errors when they fail.

**Exercise**
- Add a line to the body of calc_harmonic_mean() to assert that x is numeric.
- Look at what happens when you pass a character argument to calc_harmonic_mean().

*Answer*

In [2]:
library(assertive)


calc_harmonic_mean <- function(x, na.rm = FALSE) {
  # Assert that x is numeric
  assert_is_numeric(x)
  x %>%
    get_reciprocal() %>%
    mean(na.rm = na.rm) %>%
    get_reciprocal()
}

# See what happens when you pass it strings
#calc_harmonic_mean(std_and_poor500$sector)


calc_harmonic_mean(c("Luis,","Meza"))


ERROR: Error in calc_harmonic_mean(c("Luis,", "Meza")): is_numeric : x is not of class 'numeric'; it has class 'character'.


#### 2.3.2) Custom error logic
Sometimes the `assert_*()` functions in `assertive` don't give the most informative error message. For example, the assertions that check if a number is in a numeric range will tell the user that a value is out of range, but they won't say why that's a problem. In that case, you can use the `is_*()` functions in conjunction with messages, warnings, or errors to define custom feedback.

The harmonic mean only makes sense when `x` has all positive values. (Try calculating the harmonic mean of one and minus one to see why.) Make sure your users know this!
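
The suggested experiment takes two lines of base R:

```r
# The reciprocals of 1 and -1 cancel out, so their mean is exactly 0
x <- c(1, -1)
mean_of_reciprocals <- mean(1 / x)

# Taking the reciprocal of 0 gives Inf, which is not a meaningful average
harmonic <- 1 / mean_of_reciprocals
harmonic
```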

**Exercise**
- If any values of `x` are non-positive (ignoring `NA`s) then throw an error.
- Look at what happens when you pass a vector containing non-positive values to `calc_harmonic_mean()`.

*Answer*

In [None]:
calc_harmonic_mean <- function(x, na.rm = FALSE) {
  assert_is_numeric(x)
  # Check if any values of x are non-positive
  if(any(is_non_positive(x), na.rm = TRUE)) {
    # Throw an error
    stop("x contains non-positive values, so the harmonic mean makes no sense.")
  }
  x %>%
    get_reciprocal() %>%
    mean(na.rm = na.rm) %>%
    get_reciprocal()
}

# See what happens when you pass it negative numbers
# calc_harmonic_mean(std_and_poor500$pe_ratio - 20)

#### 2.3.3) Fixing function arguments
The harmonic mean function is almost complete. However, you still need to provide some checks on the `na.rm` argument. This time, rather than throwing errors when the input is in an incorrect form, you are going to try to fix it.

`na.rm` should be a logical vector with one element (that is, TRUE, or FALSE).

The assertive package is loaded for you.

**Exercise**

- Update calc_harmonic_mean() to fix the na.rm argument. Use use_first() to select the first element, and coerce_to() to change it to logical.

*Answer*

In [None]:
# Update the function definition to fix the na.rm argument
calc_harmonic_mean <- function(x, na.rm = FALSE) {
  assert_is_numeric(x)
  if(any(is_non_positive(x), na.rm = TRUE)) {
    stop("x contains non-positive values, so the harmonic mean makes no sense.")
  }
  # Use the first value of na.rm, and coerce to logical
  na.rm <- coerce_to(use_first(na.rm), target_class = "logical")
  x %>%
    get_reciprocal() %>%
    mean(na.rm = na.rm) %>%
    get_reciprocal()
}

# See what happens when you pass it malformed na.rm
# calc_harmonic_mean(std_and_poor500$pe_ratio, na.rm = 1:5)

## 3) Return values and scope
Learn how to return early from a function, how to return multiple values, and understand how R decides which variables exist.

### 3.1) (video) Returning values from functions
The value returned from a function is the last value calculated when the end of the function body is reached. Sometimes it is useful to return earlier, which you can do by calling `return()`. When handling problems with the input, you don't always have to stop with an error: if a result can still be returned, you can raise a `warning()` and return that value instead.
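
Both ideas can be sketched in one function. `geom_mean_or_nan()` is a hypothetical example written for this note, and returning `NaN` for non-positive input is an assumption about what a sensible fallback would be.

```r
# Hypothetical geometric mean with an early return and a warning
geom_mean_or_nan <- function(x) {
  # Early return: nothing to average
  if (length(x) == 0) {
    return(NA_real_)
  }
  # A warning reports the problem but still lets the function return a value
  if (any(x <= 0, na.rm = TRUE)) {
    warning("x contains non-positive values; returning NaN.")
    return(NaN)
  }
  # Reached only when the checks above pass; the last expression is the result
  exp(mean(log(x)))
}

geom_mean_or_nan(numeric(0))
geom_mean_or_nan(c(1, 100))

# Capture the warning text to show that a warning, not an error, was raised
warning_text <- tryCatch(
  geom_mean_or_nan(c(1, -1)),
  warning = function(w) conditionMessage(w)
)
warning_text
```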

#### 3.1.1) Returning early
Sometimes, you don't need to run through the whole body of a function to get the answer. In that case you can return early from that function using `return()`.

To check if `x` is divisible by `n`, you can use `is_divisible_by(x, n)` from assertive.

Alternatively, use the modulo operator, `%%`. `x %% n` gives the remainder when dividing `x` by `n`, so `x %% n == 0` determines whether `x` is divisible by `n`. Try `1:10 %% 3 == 0` in the console.

To solve this exercise, you need to know that a leap year is every 400th year (like the year 2000) or every 4th year that isn't a century (like 1904 but not 1900 or 1905).
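
The rule above can also be written with the modulo operator alone. The sketch below is a base R alternative to the `assertive` version; the name `is_leap_year_base()` is made up to keep it distinct from the exercise function.

```r
# Hypothetical base R variant using %% instead of is_divisible_by()
is_leap_year_base <- function(year) {
  # Every 400th year is a leap year
  if (year %% 400 == 0) {
    return(TRUE)
  }
  # Other century years are not
  if (year %% 100 == 0) {
    return(FALSE)
  }
  # Otherwise, every 4th year is a leap year
  year %% 4 == 0
}

# The four years mentioned in the rule
sapply(c(2000, 1900, 1904, 1905), is_leap_year_base)
```

The order of the checks matters: testing divisibility by 4 first would wrongly classify 1900 as a leap year.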

assertive is loaded.

**Exercise**

- Complete the definition of is_leap_year(), checking for the cases of year being divisible by 400, then 100, then 4, returning early from the function in each case.

*Answer*


In [None]:
library(assertive)

In [2]:
is_leap_year <- function(year) {
  # If year is div. by 400 return TRUE
  if(is_divisible_by(year, 400)) {
    return(TRUE)
  }
  # If year is div. by 100 return FALSE
  if(is_divisible_by(year, 100)) {
    return(FALSE)
  }  
  # If year is div. by 4 return TRUE
  if(is_divisible_by(year, 4)) {
    return(TRUE)
  }
  # Otherwise return FALSE
  FALSE
}

is_leap_year(2000)

#### 3.1.2) Returning invisibly
When the main purpose of a function is to generate output, like drawing a plot or printing something in the console, you may not want a return value to be printed as well. In that case, the value should be returned invisibly, using `invisible()`.
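
A minimal sketch of an invisible return, using console output instead of a plot. `print_summary()` is a hypothetical function.

```r
# Hypothetical function whose main job is to print something
print_summary <- function(x) {
  cat("n =", length(x), "mean =", mean(x), "\n")
  # Return the input without auto-printing it at the console
  invisible(x)
}

# Prints the summary line only; the returned vector is not echoed
print_summary(c(1, 2, 3))

# The value is still there, so it can be assigned or used in further code
doubled <- print_summary(c(1, 2, 3)) * 2
doubled
```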

The base R plot function returns `NULL`, since its main purpose is to draw a plot. This isn't helpful if you want to use it in piped code: instead it should invisibly return the plot data to be piped on to the next step.

Recall that `plot()` has a formula interface: instead of giving it vectors for `x` and `y`, you can specify a formula describing which columns of a data frame go on the `x` and `y` axes, and a data argument for the data frame. Note that just like `lm()`, the arguments are the wrong way round because the detail argument, formula, comes before the data argument.

    plot(y ~ x, data = data)

**Exercise**
- Use the cars dataset and the formula interface to plot(), draw a scatter plot of dist versus speed.
- Give pipeable_plot() data and formula arguments (in that order). Make it draw the plot, then invisibly return data.

*Answer*

    # Define a pipeable plot fn with data and formula args
    pipeable_plot <- function(data, formula) {
      # Call plot() with the formula interface
      plot(formula, data)
      # Invisibly return the input dataset
      invisible(data)
    }

    # Draw the scatter plot of dist vs. speed again
    plt_dist_vs_speed <- cars %>% 
      pipeable_plot(dist ~ speed)

    # Now the plot object has a value
    plt_dist_vs_speed

### 3.2) (video) Returning multiple values from functions
R functions can only return a single value, but there are two ways to get around this rule:

- return several objects in a list
- store objects as attributes

Let's write a function that retrieves details about the current R session and returns them in a list.

In [3]:
#R.version.string
#Sys.info()[c("sysname","release")]
#loadedNamespaces()

session<-function(){
    list(
    r_version = R.version.string,
    operating_system = Sys.info()[c("sysname","release")],
    loaded_pkgs = loadedNamespaces()
    )
}

session()

Suppose the user actually wants each of the return values separately, instead of in a list. They can do this using the `zeallot` package's multi-assignment operator. In Python, this is called unpacking variables.

The multi-assignment operator, `%<-%`, is the usual left-arrow assignment wrapped in percent signs. After running the code below, three variables are available, each containing one of the individual values.

In [4]:
library(zeallot) # multiple, unpacking, and destructuring assignment
c(vrsn, os, pkgs) %<-% session()

vrsn

os

pkgs

The other technique for returning multiple values involves attributes. Use `attributes()` to see all the attributes of an object, and `attr()` to retrieve a specific attribute. `attr()` can also be used to set an attribute.

In [None]:
month_no<-setNames(1:12, month.abb)
str(month_no)
# See all attributes
attributes(month_no)

# Set a specific attribute: replace the abbreviated names with full month names
attr(month_no, "names") <- month.name

# Retrieve the attribute to confirm that it was reassigned
attr(month_no, "names")

Now let's look at another example, this time with a data frame. First, load some packages:

In [13]:
#data(Orange, package = "datasets")
library(tibble)
library(datasets)
data(Orange)

Naranja <-Orange
Naranja<-as_tibble(Naranja)
attributes(Naranja)

# Note that grouping the data adds the grouping information as an attribute
library(dplyr)
Naranja %>%
group_by(Tree) %>%
attributes()




$names
[1] "Tree"          "age"           "circumference"

$row.names
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30 31 32 33 34 35

$formula
circumference ~ age | Tree
<environment: R_EmptyEnv>

$labels
$labels$x
[1] "Time since December 31, 1968"

$labels$y
[1] "Trunk circumference"


$units
$units$x
[1] "(days)"

$units$y
[1] "(mm)"


$class
[1] "tbl_df"     "tbl"        "data.frame"


$names
[1] "Tree"          "age"           "circumference"

$row.names
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30 31 32 33 34 35

$formula
circumference ~ age | Tree
<environment: R_EmptyEnv>

$labels
$labels$x
[1] "Time since December 31, 1968"

$labels$y
[1] "Trunk circumference"


$units
$units$x
[1] "(days)"

$units$y
[1] "(mm)"


$class
[1] "grouped_df" "tbl_df"     "tbl"        "data.frame"

$groups
# A tibble: 5 x 2
  Tree  .rows    
  <ord> <list>   
1 3     <int [7]>
2 1     <int [7]>
3 5     <int [7]>
4 2     <int [7]>
5 4     <int [7]>


#### 3.2.1) Returning many things
Functions can only return one value. If you want to return multiple things, then you can store them all in a list.

If users want to have the list items as separate variables, they can assign each list element to its own variable using `zeallot`'s multi-assignment operator, `%<-%`.

`glance()`, `tidy()`, and `augment()` each take the model object as their only argument.

The Poisson regression model of Snake River visits is available as `model`. `broom` and `zeallot` are loaded.

**Exercise**
- Examine the structure of model.
- Use broom functions to create a list containing the model-, coefficient-, and observation-level parts of model.
- Wrap the code into a function, groom_model(), that accepts model as its only argument.
- Call groom_model() on model, multi-assigning the result to three variables at once: mdl, cff, and obs.

*Answer*

    # Look at the structure of model (it's a mess!)
    str(model)

    # Use broom tools to get a list of 3 data frames
    list(
      # Get model-level values
      model = glance(model),
      # Get coefficient-level values
      coefficients = tidy(model),
      # Get observation-level values
      observations = augment(model)
    )
    
    
    # Wrap this code into a function, groom_model
    groom_model<-function(model) {
      list(
        model = glance(model),
        coefficients = tidy(model),
        observations = augment(model)
      )
    }
    
    
    # Call groom_model on model, assigning to 3 variables
    # library(zeallot)  # provides %<-% for multiple, unpacking, and destructuring assignment
    c(mdl, cff, obs) %<-% groom_model(model)

    # See these individual variables
    mdl; cff; obs


#### 3.2.2) Returning metadata
Sometimes you want to return multiple things from a function, but you want the result to have a particular class (for example, a data frame or a numeric vector), so returning a list isn't appropriate. This is common when you have a result plus metadata about the result. (Metadata is "data about the data". For example, it could be the file a dataset was loaded from, or the username of the person who created the variable, or the number of iterations for an algorithm to converge.)

In that case, you can store the metadata in attributes. Recall the syntax for assigning attributes is as follows.

    attr(object, "attribute_name") <- attribute_value
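A small sketch of this pattern, using a hypothetical function `scale_to_unit()` that is not part of the course: the result stays a plain numeric vector, and the original range travels along with it as an attribute (the attribute name `"original_range"` is an assumption chosen for illustration).

```r
# Hypothetical example: a numeric result that carries metadata about itself
scale_to_unit <- function(x) {
  result <- (x - min(x)) / (max(x) - min(x))
  # Store the original range as metadata; the result is still a numeric vector
  attr(result, "original_range") <- range(x)
  result
}

scaled <- scale_to_unit(c(10, 15, 20))

# The class is unchanged, and the metadata can be read back with attr()
is.numeric(scaled)
attr(scaled, "original_range")
```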
    
**Exercise**
- Update pipeable_plot() so the result has an attribute named "formula" with the value of formula.
- plt_dist_vs_speed, that you previously created, is shown. Examine its updated structure.

*Answer*   



    pipeable_plot <- function(data, formula) {
      plot(formula, data)
      # Add a "formula" attribute to data
      attr(data, "formula") <- formula
      invisible(data)
    }

    # From previous exercise
    plt_dist_vs_speed <- cars %>% 
      pipeable_plot(dist ~ speed)

    # Examine the structure of the result
    str(plt_dist_vs_speed)

### 3.3) (video) Environments
Environments are a type of variable used to store other variables. Most of the time you can think of them as special lists. The main difference between lists and environments is that environments have a parent (note: the parent environment also has a parent, so they form a sequence). You can find an environment's parent with `parent.env()`, as the next cell shows.

In [20]:
# Define a list
datacamp_lst <- list(
  name = "DataCamp",
  founding_year = 2013,
  website = "www.datacamp.com"
)

# Convert the list to an environment
datacamp_env <- list2env(datacamp_lst)

# At first glance, the list and the environment look the same
ls.str(datacamp_lst)
ls.str(datacamp_env)

# Find the parent of the environment
parent <- parent.env(datacamp_env)
environmentName(parent)

# Then find the grandparent
grandparent <- parent.env(parent)
environmentName(grandparent)

founding_year :  num 2013
name :  chr "DataCamp"
website :  chr "www.datacamp.com"

founding_year :  num 2013
name :  chr "DataCamp"
website :  chr "www.datacamp.com"

#### 3.3.1) Creating and exploring environments
Environments are used to store other variables. Mostly, you can think of them as lists, but there's an important extra property that is relevant to writing functions. Every environment has a parent environment (except the empty environment, at the root of the environment tree). This determines which variables R knows about at different places in your code.

Facts about the Republic of South Africa are contained in `capitals`, `national_parks`, and `population`.

**Exercise**

- Create rsa_lst, a named list from capitals, national_parks, and population. Use those values as the names.
- List the structure of each element of rsa_lst using ls.str().
- Convert the list to an environment, rsa_env, using list2env().
- List the structure of each element of rsa_env
- Find the parent environment of rsa_env and print its name.

*Answer*

    # Add capitals, national_parks, & population to a named list
    rsa_lst <- list(
      capitals = capitals,
      national_parks = national_parks,
      population = population
    )

    # List the structure of each element of rsa_lst
    ls.str(rsa_lst)


    # Convert the list to an environment
    rsa_env <- list2env(rsa_lst)

    # List the structure of each variable
    ls.str(rsa_env)
    
    # Find the parent environment of rsa_env
    parent <- parent.env(rsa_env)

    # Print its name
    environmentName(parent)

#### 3.3.2) Do variables exist?
If R cannot find a variable in the current environment, it will look in the parent environment, then the grandparent environment, and so on until it finds it.
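The lookup rule can be sketched with two small environments. The names `parent_env`, `child_env`, and `greeting` are hypothetical and chosen only for illustration; the variable lives in the parent, so the child finds it only when inheritance is allowed.

```r
# Build a two-level chain: child_env -> parent_env -> empty environment
parent_env <- new.env(parent = emptyenv())
child_env <- new.env(parent = parent_env)

# Define the variable in the parent only
assign("greeting", "hello", envir = parent_env)

# With default inheritance, R walks up from child_env and finds it in the parent
found_with_inheritance <- exists("greeting", envir = child_env)

# Ignoring inheritance, R looks in child_env alone and does not find it
found_without_inheritance <- exists("greeting", envir = child_env, inherits = FALSE)

# get() follows the same inheritance rule as exists()
value <- get("greeting", envir = child_env)
```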

`rsa_env` has been modified so it includes capitals and national_parks, but not population.

**Exercise**
- Check if population exists in rsa_env, using default inheritance rules.
- Check if population exists in rsa_env, ignoring inheritance.

*Answer*

    # Compare the contents of the global environment and rsa_env
    ls.str(globalenv())
    ls.str(rsa_env)

    # Does population exist in rsa_env?
    exists("population", envir = rsa_env)

    # Does population exist in rsa_env, ignoring inheritance?
    exists("population", envir = rsa_env, inherits = FALSE)

In [15]:
?list2env

### 3.4) (video) Scope and Precedence

## 4) Case study on grain yields
Apply your function writing skills to a case study involving data preparation, visualization, and modeling.

### 4.1 (video) Grain yields and unit conversion
The `magrittr` package has some functions that replace arithmetic and subsetting operators, which makes your code more pipe-friendly. For example:

    x*y = x %>% multiply_by(y)
    x^y = x %>% raise_to_power(y)
    x[y] = x %>% extract(y)
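As a quick check of these equivalences, the sketch below applies each alias to a small vector (the vector `x` is an arbitrary example) and compares it with the operator it stands in for. It assumes `magrittr` is installed.

```r
library(magrittr)

x <- c(2, 3, 4)

# Each piped call should match the plain operator version
doubled <- x %>% multiply_by(2)             # same as x * 2
squared <- x %>% raise_to_power(2)          # same as x^2
second <- x %>% magrittr::extract(2)        # same as x[2]

identical(doubled, x * 2)
identical(squared, x^2)
identical(second, x[2])
```

`tidyr` also exports a function named `extract()`, so writing `magrittr::extract()` avoids picking up the wrong one when both packages are loaded.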
    

#### 4.1.1 Converting areas to metric 1
In this chapter, you'll be working with grain yield data from the United States Department of Agriculture, National Agricultural Statistics Service. Unfortunately, they report all areas in acres. So, the first thing you need to do is write some utility functions to convert areas in acres to areas in hectares.

To solve this exercise, you need to know the following:

    There are 4840 square yards in an acre.
    There are 36 inches in a yard and one inch is 0.0254 meters.
    There are 10000 square meters in a hectare.
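Chaining these three facts gives the overall conversion factor. The sketch below is plain arithmetic with hypothetical variable names, shown only to make the steps explicit before they are wrapped in functions.

```r
# Facts from the text
sq_yards_per_acre <- 4840
meters_per_yard <- 36 * 0.0254        # 36 inches per yard, 0.0254 m per inch
sq_meters_per_hectare <- 10000

# A square yard is a yard squared, so square the length factor
sq_meters_per_sq_yard <- meters_per_yard^2

# One acre expressed in hectares (about 0.405)
hectares_per_acre <- sq_yards_per_acre * sq_meters_per_sq_yard / sq_meters_per_hectare
hectares_per_acre
```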
    
**Exercise**

- Write a function, acres_to_sq_yards(), to convert areas in acres to areas in square yards. This should take a single argument, acres.
- Write a function, yards_to_meters(), to convert distances in yards to distances in meters. This should take a single argument, yards.
- Write a function, sq_meters_to_hectares(), to convert areas in square meters to areas in hectares. This should take a single argument, sq_meters.


*Answer*    

In [2]:
acres_to_sq_yards<-function(acres) {
    acres*4840
}

yards_to_meters<-function(yards){
    yards*36*.0254
}

sq_meters_to_hectares<- function(sq_meters) {
    sq_meters/10000
}


#### 4.1.2) Converting areas to metric 2
You're almost there with creating a function to convert acres to hectares. You need another utility function to deal with getting from square yards to square meters. Then, you can bring everything together to write the overall acres-to-hectares conversion function. Finally, in the next exercise you'll be calculating area conversions in the denominator of a ratio, so you'll need a harmonic acre-to-hectare conversion function.

Free hints: magrittr's raise_to_power() will be useful here. The last step is similar to Chapter 2's Harmonic Mean.

The three utility functions from the last exercise (acres_to_sq_yards(), yards_to_meters(), and sq_meters_to_hectares()) are available, as is your get_reciprocal() from Chapter 2. magrittr is loaded.

**Exercise**
- Write a function to convert areas in square yards to square meters. It should take the square root of the input, then convert yards to meters, then square the result.
- Write a function to convert areas in acres to hectares. The function should convert the input from acres to square yards, then to square meters, then to hectares.
- Write a function to harmonically convert areas in acres to hectares. The function should get the reciprocal of the input, then convert from acres to hectares, then get the reciprocal again.

*Answer*

In [5]:
library(magrittr)
get_reciprocal <- function(x) {
  1 / x
}

# Write a function to convert sq. yards to sq. meters
sq_yards_to_sq_meters <- function(sq_yards) {
  sq_yards %>%
    # Take the square root
    sqrt() %>%
    # Convert yards to meters
    yards_to_meters() %>%
    # Square it
    raise_to_power(2)
}


# Load the function from the previous step

# Write a function to convert acres to hectares
acres_to_hectares <- function(acres) {
  acres %>%
    # Convert acres to sq yards
    acres_to_sq_yards() %>%
    # Convert sq yards to sq meters
    sq_yards_to_sq_meters()  %>% 
    # Convert sq meters to hectares
    sq_meters_to_hectares()
}


# Define a harmonic acres to hectares function
harmonic_acres_to_hectares <- function(acres) {
  acres %>% 
    # Get the reciprocal
    get_reciprocal() %>%
    # Convert acres to hectares
    acres_to_hectares() %>% 
    # Get the reciprocal again
    get_reciprocal()
}
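
To see why the harmonic version is needed, consider a quantity measured "per acre". The sketch below uses compact base-R stand-ins with hypothetical names (`acres_to_ha_compact()` and `harmonic_acres_to_ha_compact()`) rather than the piped functions above, so that it runs on its own. Because the area sits in the denominator, the value is divided by the acre-to-hectare factor instead of multiplied by it.

```r
# Hypothetical compact stand-in for the piped acres_to_hectares() chain
acres_to_ha_compact <- function(acres) {
  acres * 4840 * (36 * 0.0254)^2 / 10000
}

# Harmonic version: take the reciprocal, convert, take the reciprocal again
harmonic_acres_to_ha_compact <- function(acres) {
  1 / acres_to_ha_compact(1 / acres)
}

# An area of 1 acre is about 0.405 hectares
area_ha <- acres_to_ha_compact(1)

# A quantity of 100 "per acre" becomes 100 / 0.405, about 247 "per hectare"
per_hectare <- harmonic_acres_to_ha_compact(100)
```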

#### 4.1.3) Converting yields to metric
The yields in the NASS corn data are also given in US units, namely bushels per acre. You'll need to write some more utility functions to convert this unit to the metric unit of kg per hectare.

Bushels historically meant a volume of 8 gallons, but in the context of grain, they are now defined as masses. This mass differs for each grain! To solve this exercise, you need to know these facts.

    One pound (lb) is 0.45359237 kilograms (kg).
    One bushel is 48 lbs of barley, 56 lbs of corn, or 60 lbs of wheat.
    magrittr is loaded.
    
**Exercise**
- Write a function to convert masses in lb to kg. This should take a single argument, lbs. 
- Write a function to convert masses in bushels to lbs. This should take two arguments, bushels and crop. It should define a lookup vector of scale factors for each crop (barley, corn, wheat), extract the scale factor for the crop, then multiply this by the number of bushels.
- Write a function to convert masses in bushels to kgs. This should take two arguments, bushels and crop. It should convert the mass in bushels to lbs then to kgs.
- Write a function to convert yields in bushels/acre to kg/ha. The arguments should be bushels_per_acre and crop. Three choices of crop should be allowed: "barley", "corn", and "wheat". It should match the crop argument, then convert bushels to kgs, then convert harmonic acres to hectares.
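
The two building blocks named above, a named lookup vector and `match.arg()`, can be sketched in base R as follows. The function name `pounds_per_bushel()` is hypothetical and is not one of the functions the exercise asks for; the scale factors are the ones listed in the text.

```r
# Hypothetical helper combining match.arg() with a named lookup vector
pounds_per_bushel <- function(crop = c("barley", "corn", "wheat")) {
  # Restrict crop to the allowed choices; an unknown crop raises an error
  crop <- match.arg(crop)
  # Named vector acting as a lookup table of lbs per bushel
  scale_factors <- c(barley = 48, corn = 56, wheat = 60)
  # Subset by name, then drop the name so a bare number is returned
  unname(scale_factors[crop])
}

corn_factor <- pounds_per_bushel("corn")
# With no argument, match.arg() picks the first choice, "barley"
default_factor <- pounds_per_bushel()
```

Listing the choices in the function signature documents the valid inputs and lets `match.arg()` reject anything else with a clear error message.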


*Answer*

In [None]:
# Write a function to convert lb to kg
lbs_to_kgs <- function(lbs){
  lbs*.45359237
}

# Write a function to convert bushels to lbs
bushels_to_lbs <- function(bushels, crop) {
  # Define a lookup table of scale factors
  c(barley = 48, corn = 56, wheat = 60) %>%
    # Extract the value for the crop
    extract(crop) %>%
    # Multiply by the no. of bushels
    multiply_by(bushels)
}


bushels_to_kgs <- function(bushels,crop) {
  bushels %>%
    # Convert bushels to lbs for this crop
    bushels_to_lbs(crop) %>%
    # Convert lbs to kgs
    lbs_to_kgs()
}


# Write a function to convert bushels/acre to kg/ha
bushels_per_acre_to_kgs_per_hectare <- function(bushels_per_acre, crop = c("barley", "corn", "wheat")) {
  # Match the crop argument
  crop <- match.arg(crop)
  bushels_per_acre %>%
    # Convert bushels to kgs for this crop
    bushels_to_kgs(crop) %>%
    # Convert harmonic acres to ha
    harmonic_acres_to_hectares()
}

In [None]:

library(repr)
# Change plot size to 4 x 3
options(repr.plot.width=4, repr.plot.height=3)

**Exercise**

*Answer*