# Intro to R Review

### August Guang
### Ellen Duong
### 2024-09-16

## Structure

We will review some (not all) of the material again, starting from R Data Structures, and give everyone more time to practice writing their own code rather than just executing existing code. At the end of each review section there is a set of coding practice questions to answer. You are welcome to work in pairs or on your own. We will work through all of the questions in class.

# 1. R Data Structures

 * We went over:
   * `vector`
   * `matrix`
   * `list`
   * `dataframe`

## 1.1 `vector`

 * A `vector` in R is a 1D array whose elements all have the same type.
 * The `c()` function combines elements to form a vector.

In [1]:
my_vec <- c(42, 137, 34)
my_vec
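Because every element of a vector must share one type, `c()` converts mixed inputs to a common type. A minimal sketch of that behavior (the variable names `mixed` and `nums` are made up for illustration):

```r
# Mixing a number, a string, and a logical: everything becomes character
mixed <- c(42, "137", TRUE)
mixed         # "42" "137" "TRUE"
class(mixed)  # "character"

# Mixing numbers and logicals: TRUE becomes 1 and FALSE becomes 0
nums <- c(1.5, TRUE, FALSE)
nums          # 1.5 1.0 0.0
class(nums)   # "numeric"
```

If a vector unexpectedly prints with quotes around its numbers, an accidental string in the input is a likely cause.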

## 1.2 Indexing - accessing elements

There are multiple ways to access the elements of a vector (or any other data structure).

 * The `[]` notation is what is used for access.

In [18]:
people <- c("august", "ellen", "paul")
people[1]

### 1.2.1 Range-based indexing

Range-based indexing allows you to get a "slice" of a vector (i.e. several contiguous elements) at once.

In [19]:
people[2:3]

### 1.2.2 Vector indexing

We can use a vector of integers for indexing when we want multiple elements that are not all one after another

In [20]:
people[c(1,3)]

### 1.2.3 Logical (boolean) indexing [new]

We can use logical (or boolean) values to index as well

In [21]:
people[c(TRUE,FALSE,FALSE)]
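In practice the logical vector is usually produced by a comparison rather than typed out by hand. A small sketch, assuming a made-up `ages` vector that lines up with a copy of the names (`people_demo`, `ages`, and `over_30` are illustrative names, not from the class notebook):

```r
people_demo <- c("august", "ellen", "paul")
ages <- c(34, 29, 41)      # made-up values for illustration

over_30 <- ages > 30       # comparison returns TRUE FALSE TRUE
over_30
people_demo[over_30]       # keeps elements where over_30 is TRUE: "august" "paul"
```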

### 1.2.4 Modifying elements with indexing [new]

We can use indexing to modify a vector's elements.

In [22]:
people[3] <- "not paul"
people

You can replace multiple elements all at once:

In [32]:
people[2:3] <- "no one"
people

In [33]:
people[2:3] <- c("ellen d", "perhaps paul")
people

### 1.2.5 Adding elements or concatenating vectors

You can add an element to a vector by assigning to the index where you want it to go. If that index is past the current end of the vector, R fills any skipped positions with `NA` (missing).

You can also concatenate vectors with the `c()` function.

In [23]:
people[6] <- "paul h"
people

In [25]:
people2 <- c(people, c("paul c", "paul s", "paul x"))
people2
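The sketch below shows the `NA` fill on a fresh vector, plus base R's `append()` function, which is one way to insert an element in the middle of a vector. `append()` was not covered in class, and `letters_vec` is a made-up name:

```r
letters_vec <- c("a", "b", "d")

# append() inserts the new value after the given position
letters_vec <- append(letters_vec, "c", after = 2)
letters_vec             # "a" "b" "c" "d"

# Assigning past the end fills the skipped position (5) with NA
letters_vec[6] <- "f"
letters_vec             # "a" "b" "c" "d" NA "f"
is.na(letters_vec)      # TRUE only at position 5
```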

## 1.3 `matrix`

* 2-dimensional array
* Has to store data of all the same types
* First argument is vector of data to be arranged as a matrix.
* Additional arguments:
    * You use `nrow` to specify the number of rows, and `ncol` to specify the number of columns. You only need to supply one of them; R works out the other from the length of the data.
    * The `byrow` argument will determine whether the values are arranged by row or by column. The default is by column.

In [10]:
m <- matrix(1:12, nrow=3) # specifying nrow as 3
m

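| | [,1] | [,2] | [,3] | [,4] |
| --- | --- | --- | --- | --- |
| [1,] | 1 | 4 | 7 | 10 |
| [2,] | 2 | 5 | 8 | 11 |
| [3,] | 3 | 6 | 9 | 12 |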


In [37]:
m <- matrix(1:12, ncol=3) # can also have 3 columns instead
m

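| | [,1] | [,2] | [,3] |
| --- | --- | --- | --- |
| [1,] | 1 | 5 | 9 |
| [2,] | 2 | 6 | 10 |
| [3,] | 3 | 7 | 11 |
| [4,] | 4 | 8 | 12 |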


In [38]:
matrix(1:12, ncol=3, byrow=TRUE) # will set it so that it goes by row instead

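| | [,1] | [,2] | [,3] |
| --- | --- | --- | --- |
| [1,] | 1 | 2 | 3 |
| [2,] | 4 | 5 | 6 |
| [3,] | 7 | 8 | 9 |
| [4,] | 10 | 11 | 12 |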


### 1.3.1 Non-numeric data (new)

We didn't go over this last class, but matrices can store non-numeric data as well. The elements just all have to be the same type.

In [27]:
matrix(people2, nrow=3)

| | [,1] | [,2] | [,3] |
| --- | --- | --- | --- |
| [1,] | august | NA | paul c |
| [2,] | ellen | NA | paul s |
| [3,] | not paul | paul h | paul x |


### 1.3.2 Matrix indexing

Matrix indexing works much like `vector` indexing, except that you now choose elements in each dimension with `[row, column]` notation. Leaving one side of the comma empty selects a whole row or a whole column.

In [28]:
m[1,2] # first row, second column

In [29]:
m[2,] # just second row
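The vector-indexing and modification tricks from section 1.2 also work in each dimension of a matrix. A short sketch on a separate, made-up matrix called `grid` (so it does not overwrite `m`):

```r
grid <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns, filled by column
dim(grid)                       # 2 3

second_col <- grid[, 2]         # whole second column: 3 4
picked <- grid[2, c(1, 3)]      # row 2, columns 1 and 3: 2 6

grid[1, 3] <- 0L                # indexing also lets us replace a value
grid
```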

## Review Questions

### 1. How would we make a vector with the following elements?

 * `dog`
 * `cat`
 * `bird`
 * `potato`

In [None]:
# code for review q
animal_vec <- 0 # replace 0 with what?

### 2. Matrix indexing

Suppose we have the matrix `m` below. How do we get the 2nd and 4th elements from the 3rd column? 

In [39]:
m <- matrix(1:12, ncol=3)
m

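| | [,1] | [,2] | [,3] |
| --- | --- | --- | --- |
| [1,] | 1 | 5 | 9 |
| [2,] | 2 | 6 | 10 |
| [3,] | 3 | 7 | 11 |
| [4,] | 4 | 8 | 12 |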


In [None]:
# your code here

Let's say we want to replace these 2 elements with 100 and 100, respectively. How would we do that?

In [40]:
# your code here

### 3. Lists and Dataframes

Last class we also went over `list` and `dataframe` as data structures.

 * How would we make a list with `dog`, `cat`, `bird`, `potato` instead of a vector?
 * How about a list with `dog`, `cat`, `bird`, `potato`, 1, 2, 3?
 * How would we make a dataframe with the same number of columns as `m` above?

In [None]:
# your code here

### 4. Indexing conceptuals

What is the result of indexing a vector with positive integers, negative integers (this was touched on briefly last week but if you've forgotten, create a vector and try it out), a logical vector, or a character vector?

### 5. Challenge Problem

Given a linear model, e.g., `mod <- lm(mpg ~ wt, data = mtcars)`, extract the residual degrees of freedom. Then extract the R squared from the model summary (`summary(mod)`). Hint: what does `attributes(mod)` return?

# 2. Functions

 * Small, re-usable code chunks
 * Takes input (arguments) and returns output

In [1]:
print("print is a function!") # what is the argument into print?
v <- c(4, 137, 151) # c() is a function.
v

[1] "print is a function!"


## 2.1 Defining a function

 * Goals: reusability, code clarity, scope cleanliness

In [2]:
add_two <- function(n) {
    z <- n + 2
    return(z)
}

In [3]:
add_two(5021)
add_two(137)

## 2.2 More complex functions

 * Functions can have multiple arguments
 * They can also have variable numbers of arguments (known as variadic)
 * They can also have default arguments, which are used unless you supply a different value when calling the function

In [5]:
area_of_square <- function(length, width) {
    area <- length * width
    return(area)
}

area_of_square(4,5)

In [8]:
area_of_square <- function(length, width=length) {
    area <- length * width
    return(area)
}
area_of_square(4)
area_of_square(4,5)
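The cells above show multiple and default arguments. For the variadic case, R uses the special `...` argument. A minimal sketch (the function name `sum_of_squares` is made up for illustration):

```r
# ... accepts any number of arguments
sum_of_squares <- function(...) {
    values <- c(...)          # collect everything passed in into one vector
    return(sum(values^2))
}

two_args <- sum_of_squares(3, 4)        # 9 + 16 = 25
three_args <- sum_of_squares(1, 2, 3)   # 1 + 4 + 9 = 14
two_args
three_args
```

`c()` itself is variadic, which is why it can combine any number of elements.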

## 2.3 Function Scope and Best Practices

 * A `local` variable is defined inside a local scope (i.e. inside a function).
 * A `global` variable is defined outside of any function.
 * A `free` variable is used inside a function but is neither an argument nor defined there; R looks it up in the environment where the function was defined.

In [15]:
a <- 5 # a is a global variable

f <- function(x, y) {
    # x and y are local variables that exist in the function body
    x^2 + y / z  # z is a free variable
}
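To see why free variables can be hard to track, here is a small sketch in which the same call gives two different answers because a global value changed in between (`scale_factor` and `scale_value` are made-up names):

```r
scale_factor <- 10

scale_value <- function(x) {
    x * scale_factor   # scale_factor is a free variable, looked up when the function runs
}

first_result <- scale_value(2)    # 20
scale_factor <- 100               # someone changes the global later on
second_result <- scale_value(2)   # 200: same call, different answer
first_result
second_result
```

Passing `scale_factor` in as an argument would make the dependency visible at every call.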

 * For modularity, function should be given all the variables it needs as arguments.
 * Free variables are discouraged because they are hard to track down in code.
 * When updating a global variable, assign to it in the scope where it was defined (e.g. `a <- f(a, 2)` at the top level) rather than changing it from inside a function.
 * Additionally for functions: best to provide descriptive function names and arguments, just like you would want to provide descriptive variable names

 What distinguishes these functions below? Why is one bad practice, one ok practice, and one best practice?

In [14]:
# bad practice
a <- 5
f <- function(x) {
    return(a + x)
}
a <- f(2)
print(a)

# ok practice
a <- 5
f <- function(x, y) {
    return(x + y)
}
a <- f(a,2)
print(a)

# best practice
num_students <- 5
update_students <- function(current_num, new_num) {
    return(current_num + new_num)
}
num_students <- update_students(num_students, 2)
print(num_students)

[1] 7
[1] 7
[1] 7


## 2.4 Variable shadowing

Variable shadowing occurs when a variable is declared in a certain scope that has the same name as a variable declared in an outer scope.

In [18]:
x <- 0 # Global Variable

outer <- function() {
    x <- 1 # outer x
    
    inner <- function() {
        # print(paste("before assignment in inner:", x))
        
        x <- 2 # inner x
        print(paste("inner:", x))
    }
    
    inner() # Call function inner
    
    print(paste("outer:", x))
}

outer() # Call function outer
print(paste("global:", x))

[1] "inner: 2"
[1] "outer: 1"
[1] "global: 0"
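The commented-out `print` in `inner` hints at a subtle point: before `inner` assigns its own `x`, a lookup of `x` finds the one from `outer`. A sketch of that behavior, using made-up names `outer_demo` and `inner_demo` and returning the values instead of printing them:

```r
x <- 0   # global x

outer_demo <- function() {
    x <- 1   # shadows the global x inside outer_demo

    inner_demo <- function() {
        before <- x   # no local x yet, so R finds outer_demo's x (1)
        x <- 2        # now inner_demo has its own x
        c(before = before, after = x)
    }

    inner_demo()
}

result <- outer_demo()
result   # before = 1, after = 2
x        # the global x is still 0
```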


## Review Questions

### 1. Basic function creation

Write a function named `square` that takes one argument and returns the square of that argument. Then write a test: what line of code will output `TRUE` for `square(5)`?

### 2. More complex functions

Define a function named `greet` that takes one argument, `name`, and prints a greeting message. The function should also have an optional argument, `punctuation`, which defaults to `"!"`. Test the function by calling it with and without the `punctuation` argument.

### 3. Function best practices

Look at the code block below. Answer the following questions:

 * What does `rnorm(10)` do? If you don't know, try running it in its own code block, as well as running `?rnorm`
 * What does the code do?
 * What mistake can you see in the code? How could this have been prevented?
 * Which lines of code are candidates for their own function? Why?
 * Write a function to modularize the code block and make it easier to read and run. Try to follow function best practices.

In [16]:
df <- tibble::tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)

df$a <- (df$a - min(df$a, na.rm = TRUE)) / 
  (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$b <- (df$b - min(df$b, na.rm = TRUE)) / 
  (max(df$b, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$c <- (df$c - min(df$c, na.rm = TRUE)) / 
  (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))
df$d <- (df$d - min(df$d, na.rm = TRUE)) / 
  (max(df$d, na.rm = TRUE) - min(df$d, na.rm = TRUE))

### 4. Variable Shadowing

Create a function `shadow_example` that takes one argument `x`. Inside the function, define a local variable `x` with a different value. The function should return the local `x`. Test the function with the argument 5 and print the result.

In [None]:
# your code here

# 3. Control Flow Part 1: Conditional Logic

## 3.1 Comparison Operators

Used to compare 2 values.

| Operator | Description | Example |
| --- | --- | --- |
| == | Equal to | a == b |
| != | Not equal | a != b |
| > | Greater than | a > b |
| < | Less than | a < b |
| >= | Greater than or equal to | a >= b |
| <= | Less than or equal to | a <= b |

## 3.2 Logical Operators in R

| Operator | Description |
| --- | --- |
| & | Element-wise logical AND |
| && | Logical AND with short-circuiting. Compares two single (length-one) logical expressions and returns TRUE only if both are TRUE |
| &#124; | Element-wise logical OR |
| &#124;&#124; | Logical OR with short-circuiting. Compares two single (length-one) logical expressions and returns TRUE if at least one of them is TRUE |
| ! | Logical NOT. Returns FALSE if the statement is TRUE, and TRUE if it is FALSE |

Short circuiting means that if any of the values from left-to-right are determinative, the rest will not be computed. For example, with `FALSE && some_value`, since `FALSE &&` anything else will always evaluate to `FALSE`, the rest of the expression is not computed. This can lead to performance improvements and is why `&&` and `||` are preferred for control-flow operations (covered next).
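A quick way to see short-circuiting is to put something that would raise an error on the right-hand side. In this sketch the `stop()` calls are never run (the variable names are made up for illustration):

```r
# && stops at the first FALSE, so stop() is never evaluated
and_result <- FALSE && stop("never evaluated")
and_result   # FALSE

# || stops at the first TRUE, so stop() is never evaluated
or_result <- TRUE || stop("never evaluated")
or_result    # TRUE

# & and | do not short-circuit; they compare whole vectors element by element
elementwise <- c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)
elementwise  # TRUE FALSE FALSE
```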

### 3.2.1 Order of operations

The following is the order of operations from highest priority to lowest priority. Use parentheses whenever the intended order might be unclear.

| Operator | Description |
| --- | --- |
| ! | Logical NOT |
| &, && | Logical AND |
| &#124;, &#124;&#124; | Logical OR |
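The precedence rules in the table can be checked with a few one-liners. This sketch compares each expression with and without parentheses (variable names are made up):

```r
# AND binds tighter than OR: this is TRUE | (FALSE & FALSE)
no_parens <- TRUE | FALSE & FALSE
no_parens     # TRUE

# Parentheses force the OR to happen first
with_parens <- (TRUE | FALSE) & FALSE
with_parens   # FALSE

# NOT binds tighter than AND: this is (!TRUE) & FALSE
not_first <- !TRUE & FALSE
not_first     # FALSE

not_last <- !(TRUE & FALSE)
not_last      # TRUE
```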

## 3.3 `if` and `else`

`if` and `else` statements allow you to conditionally execute code. They look like this:

```
if (condition) {
  # code executed when condition is TRUE
} else {
  # code executed when condition is FALSE
}
```

`condition` is usually an expression formed from comparison and logical operators.

In [20]:
# this function uses if to evaluate whether names(x) is null or not
has_name <- function(x) {
  nms <- names(x)
  if (is.null(nms)) {
    rep(FALSE, length(x))
  } else {
    !is.na(nms) & nms != ""
  }
}

### 3.3.1 Multiple conditions and `switch` (new)

 * You can nest `if` statements inside of each other, as well as chain several conditions together with `else if`.
 * The vectorized `ifelse()` function is a shorthand that returns one value where a condition is `TRUE` and another where it is `FALSE`.
 * There is also a function called `switch()`, which selects one of several alternatives based on position or name.

In [22]:
# chained if / else if statements
x_option <- function(x) {
  if (x == "a") {
    "option 1"
  } else if (x == "b") {
    "option 2" 
  } else if (x == "c") {
    "option 3"
  } else {
    stop("Invalid `x` value")
  }
}

In [23]:
# same function, but with switch instead
x_option <- function(x) {
  switch(x,
    a = "option 1",
    b = "option 2",
    c = "option 3",
    stop("Invalid `x` value")
  )
}
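The two cells above cover chained conditions and `switch()` by name. The remaining cases from the list, `ifelse()` and `switch()` by position, can be sketched as follows (the `scores` values and variable names are made up for illustration):

```r
scores <- c(55, 72, 90)   # made-up values

# ifelse() checks the condition for every element and picks a value for each
outcome <- ifelse(scores >= 60, "pass", "fail")
outcome         # "fail" "pass" "pass"

# With a number as the first argument, switch() picks by position
second_choice <- switch(2, "red", "green", "blue")
second_choice   # "green"
```

Unlike `if`, `ifelse()` works on a whole vector at once, which makes it handy for creating new columns.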

## Review Questions

### 1. Functions with conditional logic

Create a function named `classify_number` that takes one argument and returns "positive", "negative", or "zero" depending on whether the argument is greater than, less than, or equal to zero, respectively. Test the function with different values.

In [24]:
# your code here

### 2. Functions with multiple conditional statements

Implement a `fizzbuzz` function. It takes a single number as input. If the number is divisible by three, it returns “fizz”. If it’s divisible by five it returns “buzz”. If it’s divisible by three and five, it returns “fizzbuzz”. Otherwise, it returns the number. Make sure you first write working code before you create the function.

In [25]:
# your code here

### 3. Trying out `switch()`

What does this `switch()` call do? What happens if `x` is "e"?

Note 1: Testing out this `switch` call will require you to either set `x` as a global variable first, or create a function with this statement.

Note 2: There are lines in this `switch` statement that won't make sense because we didn't cover them. R (like every other programming language) is full of syntax details that we won't get to in class because they are not core to learning how to write code for data science. You can always learn more about what they do by executing the code yourself, or by looking at the help with `?function`, in this case `?switch`.

In [36]:
switch(x, 
  a = ,
  b = "ab",
  c = ,
  d = "cd"
)

### 4. Combining conditional logic with indexing

Since we can do logical indexing, we can use conditional logic to subset data structures. The following code block attempts to do so, but contains errors. Fix the errors so that the indexing works.

In [None]:
mtcars[mtcars$cyl = 4, ]
mtcars[-1:4, ]
mtcars[mtcars$cyl <= 5]
mtcars[mtcars$cyl == 4 | 6, ]

# 4. Control Flow Part 2: Iteration with loops

 * Iteration is the process of repeating an operation many times.
 * There are many ways to complete tasks with iteration, but looping is the most intuitive.

## 4.1 `for` Loops

`for` loops are used to iterate over items in a vector. They have the following basic form:

```for (item in vector) perform_action```

For each item in `vector`, `perform_action` is called once. `item` is the variable that gets updated with a new value from `vector` on each pass through the loop.

In [37]:
for (i in 1:5) {
    i2 <- i*i             # this code block is executed for each iteration
    print(i2)
}

[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
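Often we want to keep the results of a loop rather than print them. One common pattern, sketched here under the assumption that the results are numeric, is to create an empty result vector first and fill it in by index using `seq_along()` (the names `values` and `squares` are made up):

```r
values <- c(2, 4, 6)
squares <- numeric(length(values))   # empty numeric vector of the right length

for (i in seq_along(values)) {       # i takes the values 1, 2, 3
    squares[i] <- values[i]^2        # store each result at the matching index
}

squares   # 4 16 36
```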


## 4.2 `while` Loops

`while` loops are similar to `for` loops, but do not have a pre-defined stopping point. Instead they rely on a condition being fulfilled.

In [41]:
m <- 1

while (m > 0) {
    print(m) 
    m <- m - sample(-1:1, 1) # -1, 0, 1 
}

[1] 1
[1] 2
[1] 1
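Because the example above uses `sample()`, its output changes from run to run. A deterministic sketch of the same idea, counting how many times 100 must be halved before it is no longer greater than 1 (the names `n` and `steps` are made up):

```r
n <- 100
steps <- 0

while (n > 1) {          # keep looping as long as the condition is TRUE
    n <- n / 2
    steps <- steps + 1
}

steps   # 7
n       # 0.78125
```

Here we do not know the number of iterations in advance, which is the typical reason to choose `while` over `for`.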


In [43]:
m <- 5


while (m > 0) {
    print(m)
    m <- m - sample(-2:2, 1)
    
    if (m > 6) {
        print("uh, oh... exiting")
        break
    }
}

[1] 5
[1] 6
[1] 6
[1] 6
[1] "uh, oh... exiting"


In [49]:
for (x in 1:10) { # go back here
    if (x < 3 | x == 7) {
        next  
    }
    print(x)
}

[1] 3
[1] 4
[1] 5
[1] 6
[1] 8
[1] 9
[1] 10


## 4.3 Using `break` and `next`

 * `break`: exit a loop at will. In some cases, it's important to be able to exit a loop "early"
 * `next`: skip the current iteration and go to the next. In other cases, we just want to skip the current iteration.

## Review Questions

### 1. `for` loop

`mtcars` is a preloaded dataset in R. Write a function that uses a `for` loop to compute the mean of every column in `mtcars`. Make sure to use best practices for function writing.

In [38]:
head(mtcars, 6) # to give you an idea of what's in mtcars

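| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.9 | 2.62 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.32 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.44 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.46 | 20.22 | 1 | 0 | 3 | 1 |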


### 2. `while` loop

Create a function called `generate_until_target` that continuously generates random integers between 1 and 10 and adds them to the total until a specified target number is reached or exceeded. The function should keep track of the total sum of the generated numbers and the count of how many numbers were generated. Once the target is reached, the function should print the total sum, the number of generated numbers, and the final value that reached or exceeded the target.

### 3. Using `break` and `next`

Create a function named `filter_even` that takes a vector of numbers and prints only the even numbers. Use a `for` loop, with `break` to stop the loop if you encounter a negative number and `next` to skip odd numbers.

In [None]:
# your code here

### 4. Synthesizing it all

You are tasked with simulating an ATM withdrawal system. Write a function called `atm_withdrawal` that simulates the following scenario:
 * The ATM starts with a balance, `user_balance`, given as the first argument.
 * A vector of withdrawal amounts, `withdrawals`, is given as the second argument, simulating the user making multiple withdrawals.
 * For each withdrawal, check whether the withdrawal amount would leave the remaining balance below $0. If so, print a message that the balance is insufficient and move on to the next amount. If not, subtract the amount from the current balance and print the remaining balance.
 * If the remaining balance is $0, print that all funds have been withdrawn and exit the loop.
 * The function should also check that each withdrawal amount is valid.

In [None]:
# your code here

In [None]:
atm_withdrawal(1000, c(100,800,900,-50,100))