# A quick refresher

## Writing a function in R

### Function fundamentals

* Defining your own function

```
my_fun <- function(arg1, arg2) {
    body  
}
```

```
add <- function(x, y = 1) {
    x + y
}
```

* Anatomy of a function

    - formals
    - body
    - environment : where it's defined

In [1]:
add <- function(x, y = 1) {
    x + y
}

formals(add)

body(add)

environment(add)

$x


$y
[1] 1


{
    x + y
}

<environment: R_GlobalEnv>

* Output: return value

```
f <- function(x) {
    if (x < 0) {
        -x
    } else {
        x
    }
}
```

    - The last expression evaluated in a function is the return value
    - return(value) forces the function to stop execution and return value
    
* Functions are objects
    - mean2 <- mean : mean2 will do exactly the same job that mean does
    - function(x) { x + 1 }
    - (function(x) { x + 1 })(2) : An example of an anonymous function

In [2]:
(function(x) { x + 1})(2)
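
The points above can be tried out directly. This is a minimal sketch; `add_one` is a name introduced here for illustration only.

```r
# Assign an existing function to a new name; both names refer to the same function
mean2 <- mean
mean2(c(1, 2, 3))    # same result as mean(c(1, 2, 3))

# Store an anonymous function under a name, then call it
add_one <- function(x) { x + 1 }
add_one(2)

# Pass an anonymous function straight to another function
sapply(1:3, function(x) x + 1)
```

Because functions are ordinary objects, they can be assigned, stored, and passed as arguments like any other value.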

### Summary

* Three parts of a function:
    - Arguments
    - Body
    - Environment
* Return value is the last executed expression or the first executed return() statement
* Functions can be treated like usual R objects
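
The two ways a function produces its return value can be seen side by side in a small sketch. The function `sign_label` is hypothetical and exists only for this illustration.

```r
# Hypothetical helper: describe the sign of a single number
sign_label <- function(x) {
    if (is.na(x)) {
        return("missing")   # early exit: nothing below this line runs
    }
    if (x < 0) {
        "negative"          # last expression evaluated in this branch
    } else {
        "non-negative"      # last expression evaluated in this branch
    }
}

sign_label(NA)
sign_label(-3)
sign_label(2)
```

The first call leaves through `return()`; the other two return whatever expression was evaluated last.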

## Writing a function

In [3]:
# Define ratio() function
# my_fun <- function(arg1, arg2) {
#   # body
# }
ratio <- function(x, y) {
  x / y
}

# Call ratio() with arguments 3 and 4
ratio(3, 4)

## Arguments

In [4]:
# Rewrite the call to follow best practices
# mean(0.1,x=c(1:9, NA),TRUE)
mean(c(1:9, NA), trim = 0.1, na.rm = TRUE)
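
To check that the rewritten call is only a change of style, both forms can be compared. This is a small sketch; the variable names `hard_to_read` and `clear` are introduced here for illustration.

```r
x <- c(1:9, NA)

# Hard to read: x is matched by name, the rest by position
hard_to_read <- mean(0.1, x = x, TRUE)

# Clearer: data argument first and by position, detail arguments by full name
clear <- mean(x, trim = 0.1, na.rm = TRUE)

identical(hard_to_read, clear)
```

Both calls compute the same thing; the second simply makes it obvious which value is the data and which values are options.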

## Environments

### Scoping describes how R looks up values by name

### If a name isn't defined inside a function, R will look one level up

```
x <- 2

g <- function() {
    y <- 1
    c(x, y)
}
```

Here, x is not defined inside g, so R looks for a variable named x one level up, in the environment where g was defined. Calling g() therefore returns c(2, 1).

### If a name isn't defined locally, or at a higher level, an error occurs
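
A minimal sketch of this failure, assuming no object called `undefined_name_for_demo` exists anywhere (the name, like `h`, is made up for the example):

```r
# Hypothetical function that refers to a name defined nowhere
h <- function() {
    undefined_name_for_demo + 1
}

# tryCatch() captures the error so its message can be inspected
result <- tryCatch(h(), error = function(e) conditionMessage(e))
result
```

Without `tryCatch()`, the call to `h()` would stop with the same "object not found" error.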

### Scoping describes where, not when, to look for a value

```
f <- function() x

x <- 15
f() # Will return 15

x <- 20
f() # Will return 20
```

### Lookup works the same for functions

```
l <- function(x) x + 1

m <- function() {
    l <- function(x) x * 2
    l(10)
}

m() # Will return 20
```

```
c <- 10
c(c = c) # R looks for a function when a name is called, so c() is the function and the last c is the variable
```

### Each call to a function has its own clean environment

```
j <- function() {
    if (!exists("a")) {
        a <- 1
    } else {
        a <- a + 1
    }
    print(a)
}
```

* Local variables created inside a function exist only for that call: they do not persist between calls and are not visible in the global environment.
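
The sketch below repeats the logic of j() under a distinctive variable name, on the assumption that no global object called `n_calls_demo` exists. Both `count_calls` and `n_calls_demo` are hypothetical names.

```r
# Same logic as j(), but returning the value instead of printing it
count_calls <- function() {
    if (!exists("n_calls_demo")) {
        n_calls_demo <- 1
    } else {
        n_calls_demo <- n_calls_demo + 1
    }
    n_calls_demo
}

first  <- count_calls()
second <- count_calls()

c(first, second)          # both calls start from scratch
exists("n_calls_demo")    # the local variable did not leak out
```

If the variable survived between calls, the second call would return 2; it returns 1 because each call gets a fresh environment.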


## Summary

* When you call a function, a new environment is made for the function to do its work.
* The new environment is populated with the argument values.
* Objects are looked for first in this environment.
* If they are not found, they are looked for in the environment that the function was created in.
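
The last point is easiest to see with a function created inside another function. This is a sketch only; `make_adder` and `add_two` are names invented for the example.

```r
# Hypothetical function factory: each call creates a new environment holding n
make_adder <- function(n) {
    # n is not local to the inner function, so R looks in the
    # environment where the inner function was created
    function(x) x + n
}

add_two <- make_adder(2)
n <- 100    # a global n does not affect add_two()

add_two(5)
```

The inner function finds `n` in the environment it was created in, not in the global environment it is called from.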

## Data structures

### Two types of vectors in R

* Atomic vectors of six types: logical, integer, double, character, complex, and raw.
* Lists, a.k.a. recursive vectors, because lists can contain other lists.
* Atomic vectors are homogeneous; lists can be heterogeneous.

### Every vector has two key properties

```
# Its type, find with typeof()
typeof(letters)

# Its length, find with length()
length(letters)
```
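
For reference, here is what the two functions report for a few common vectors:

```r
typeof(letters)        # "character"
length(letters)        # 26

typeof(1:10)           # "integer"
typeof(c(1.5, 2))      # "double"
typeof(TRUE)           # "logical"

typeof(list(1, "a"))   # "list"
length(list(1, "a"))   # 2
```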

### Missing values

* NULL : often used to indicate the absence of a vector. Its type is NULL and length is 0.
* NA : used to indicate the absence of a value in a vector, a.k.a a missing value. Its type is logical and length is 1.
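
These properties can be checked with typeof() and length(). The last two lines are an extra illustration of how each value behaves inside c():

```r
typeof(NULL)            # "NULL"
length(NULL)            # 0

typeof(NA)              # "logical"
length(NA)              # 1

# NA is coerced to the type of the vector it sits in
typeof(c(1.5, NA))      # "double"

# NULL simply disappears when combined with other values
length(c(1, NULL, 2))   # 2
```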

### NAs inside vectors

```
x <- c(1, 2, 3, NA, 5)

is.na(x) # Will return FALSE FALSE FALSE TRUE FALSE
```

### Missing values are contagious

* Most operations involving NA, including arithmetic and comparisons, return NA.

```
NA + 10

NA / 2

NA > 5

10 == NA

NA == NA
```
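
Every expression in the block above evaluates to NA. A short sketch of the practical consequence, reusing the vector x from the earlier example:

```r
results <- c(NA + 10, NA / 2, NA > 5, 10 == NA, NA == NA)
results                # every element is NA

x <- c(1, 2, 3, NA, 5)
x == NA                # all NA, so this cannot locate missing values
is.na(x)               # use is.na() to find them instead
sum(x, na.rm = TRUE)   # many functions can drop NAs on request
```

Since even `NA == NA` is NA, missing values have to be tested with `is.na()` rather than with `==`.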

### Lists

* Useful because they can contain heterogeneous objects.
* Complicated return objects are often lists, e.g. the result of lm()
* Created with list()
* Subset with [, [[ or \$
    - [ extracts a sublist.
    - [[ and \$ extract elements, removing a level of hierarchy.
    
### Subsetting lists

```
a <- list(
    a = 1:3,
    b = "a string",
    c = pi,
    d = list(-1, -5)
)

str(a[4])
str(a[[4]])
```
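
The difference between the subsetting operators shows up in the type and length of what they return. A small sketch using the same list:

```r
a <- list(
    a = 1:3,
    b = "a string",
    c = pi,
    d = list(-1, -5)
)

typeof(a["a"])             # "list": [ keeps the list wrapper
typeof(a[["a"]])           # "integer": [[ returns the element itself
identical(a$a, a[["a"]])   # TRUE: $ is shorthand for [[ with a name

length(a[4])               # 1: a list containing the element d
length(a[[4]])             # 2: the element d itself, a list of two numbers
a[[4]][[1]]                # -1
```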

## Subsetting lists

In [7]:
# Define a variable tricky_list
tricky_list <- list(nums = c(-0.99, -0.30, -0.31, 0.83, -1.66,
                             -0.97, 1.96, -0.38, -0.91, -1.35),
                   y = c(F, F, F, F, F, T, T, T, T, T),
                   x = list("hello!", "hi!", "goodbye!", "bye!"),
                   model = lm(mpg ~ wt, data = mtcars))

# What type of object is the...
# 2nd element in tricky_list
typeof(tricky_list[[2]])

# Element called x in tricky_list
typeof(tricky_list[["x"]])

# 2nd element inside the element called x in tricky_list
typeof(tricky_list[["x"]][[2]])

## Exploring lists

In [8]:
# Guess where the regression model is stored
names(tricky_list)

# Use names() and str() on the model element
names(tricky_list[["model"]])
str(tricky_list[["model"]])

# Subset the coefficients element
tricky_list[["model"]][["coefficients"]]

# Subset the wt element
tricky_list[["model"]][["coefficients"]][["wt"]]

List of 12
 $ coefficients : Named num [1:2] 37.29 -5.34
  ..- attr(*, "names")= chr [1:2] "(Intercept)" "wt"
 $ residuals    : Named num [1:32] -2.28 -0.92 -2.09 1.3 -0.2 ...
  ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
 $ effects      : Named num [1:32] -113.65 -29.116 -1.661 1.631 0.111 ...
  ..- attr(*, "names")= chr [1:32] "(Intercept)" "wt" "" "" ...
 $ rank         : int 2
 $ fitted.values: Named num [1:32] 23.3 21.9 24.9 20.1 18.9 ...
  ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
 $ assign       : int [1:2] 0 1
 $ qr           :List of 5
  ..$ qr   : num [1:32, 1:2] -5.657 0.177 0.177 0.177 0.177 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
  .. .. ..$ : chr [1:2] "(Intercept)" "wt"
  .. ..- attr(*, "assign")= int [1:2] 0 1
  ..$ qraux: num [1:2] 1.18 1.05
  ..$ pivot: int [1:2] 1 2
  ..$ tol 

## for loops

### for loops in R

```
primes_list <- list(2, 3, 5, 7, 11, 13)

for (i in 1:length(primes_list)) {
    print(primes_list[[i]])
}
```

### Parts of a for loop

* Sequence : for (**i in 1:length(primes_list)**)
* Body : The code within {}, which describes the operations to repeat and refers back to the index i.
* Output : Here the result is printed to the console rather than saved.


### Looping over columns in a data frame

```
df <- data.frame(a = rnorm(10),
                 b = rnorm(10),
                 c = rnorm(10),
                 d = rnorm(10))

for (i in 1:ncol(df)) {
    print(median(df[[i]]))
}                 
```

### Moving forward

* A safer way to generate the sequence using seq_along()
* Saving output instead of printing it

## A safer way to create the sequence

In [15]:
# Define df with random values from normal distribution
df <- data.frame(a = rnorm(10),
                n = rnorm(10),
                c = rnorm(10),
                d = rnorm(10))

# Replace the 1:ncol(df) sequence
# for (i in 1:ncol(df)) {
#   print(median(df[[i]]))
# }
for (i in seq_along(df)) {
  print(median(df[[i]]))
}

[1] 0.4254134
[1] 0.05132088
[1] -0.08057442
[1] 0.8371726


In [16]:
# Change the value of df
df <- data.frame()

# Repeat for loop to verify there is no error
for (i in seq_along(df)) {
  print(median(df[[i]]))
}
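
Why seq_along() is safer becomes clear when the two sequences are compared for an empty data frame. In this sketch, `empty_df` and `iterations` are names introduced for illustration.

```r
empty_df <- data.frame()

1:ncol(empty_df)       # counts down: 1 0
seq_along(empty_df)    # integer(0), so a loop body never runs

# Count how many times the loop body executes
iterations <- 0
for (i in seq_along(empty_df)) {
    iterations <- iterations + 1
}
iterations
```

With `1:ncol(empty_df)` the loop would run twice, for i = 1 and i = 0, and try to access columns that do not exist.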

## Keeping output

In [19]:
# Redefine df
df <- data.frame(a = rnorm(10),
                n = rnorm(10),
                c = rnorm(10),
                d = rnorm(10))


# Create new double vector: output
output <- vector(mode = "double", length = ncol(df))

# Alter the loop
for (i in seq_along(df)) {
  # Change code to store result in output
  # print(median(df[[i]]))
  output[i] <- median(df[[i]])
}

# Print output
print(output)

[1]  0.1743221 -0.3786583 -0.2917193 -0.3254997
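
The pattern above (preallocate, loop with seq_along(), store by index) can also be wrapped in a function. This is one possible sketch, not the only way to do it; `col_medians` and `small_df` are hypothetical names.

```r
# Hypothetical helper: median of every column, stored in a preallocated vector
col_medians <- function(df) {
    output <- vector(mode = "double", length = ncol(df))
    for (i in seq_along(df)) {
        output[i] <- median(df[[i]])
    }
    output
}

# Fixed values make the result easy to check by hand
small_df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
col_medians(small_df)

# Empty input gives an empty double vector instead of an error
col_medians(data.frame())
```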
