<center><h1>Numeric and String Variables in R</h1></center>
<center><h3>Ellen Duong</h3></center>
<center><h3>August Guang</h3></center>
<center><h3>2024-09-11</h3></center>

# 1. Basic Math in R

In [1]:
13 + 7 + 10 + 12 - 85

In [3]:
13 + 7 * 100         # R knows about order of operations

In [4]:
(13 + 7) * 100       # Can also use parentheses

## 1.1 Floating-Point (i.e., decimal) Numbers

In [None]:
2.15^3                # taking cube of a float

In [None]:
86/pi                 # R knows about Pi 

In [None]:
tan(pi/2)            # R knows about trig, too

# 2. Variable Assignment

It is generally more useful to assign your values to **variables**, which are R objects.

In [5]:
a <- 13 + 7         # Use the `<-` for assignment

 * You can assign to R variables using `=`, but we suggest sticking to `<-` as it will give more of the behavior you expect.
 * R comes with a `print()` function that we can use to look at our variables.
 * We will talk more about functions later, but a **function** is a series of statements that work together for a specific task.
 * All functions need pieces of information (or **arguments**) to perform their particular function. Some arguments are required and some are optional.
 * `print()` takes a single required argument - the thing you want to print.

<div class="alert alert-block alert-info"> <b>Tip:</b> You can use `?function` in R to learn more about a particular function (for example `?print`)</div>

In [6]:
print(a)            # Now `a` stores value of `13 + 7` expression

[1] 20


In [7]:
5 + a               # We can continue to use `a` in subsequent code

In [8]:
b <- 5 + a          # We can use `a` as part of new assignment expressions

In [None]:
a <- 17

print(a)

In [None]:
a <- a + 1

print(a)

In [None]:
b <- 23

k <- 1 + b            # `k` is now 24

b <- k + 2            # `b` is now 26

print(b) 

**We can name our variable any combination of letters, numbers, or underscores (_) with a few exceptions:**

 * R has a few reserved words that cannot be used as variable names:
    * `if`, `else`, `repeat`, `while`, `function`, `for`, `in`
    * see the whole list of reserved words here: https://stat.ethz.ch/R-manual/R-devel/library/base/html/Reserved.html
 * Variables **cannot** start with a number or an underscore.
 * You can technically use `.` in your variable names, but this is best avoided for now.

Here are some examples of appropriate variable names:

In [None]:
fruit_string <- "mango peach"
dog_ <- TRUE
egg <- (1L)
foo <- 2i

In [None]:
upper_fruit <- toupper(fruit_string)   # convert to uppercase

In [None]:
print(upper_fruit)                     # uppercase copy; `fruit_string` itself is unchanged

In [None]:
print(a + foo)

### Variable naming conventions

Generally descriptive names like `fruit_type` will be better than things like `a`, `b` and `x`. In R, `snake_case_names` are preferred over `camelCaseNames`, although this is not the case in all languages.
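
The naming rules above can also be checked in code. The sketch below leans on base R's `make.names()`, which rewrites a string into a syntactically valid name; a string that comes back unchanged was already valid. The helper name `is_valid_name` and the candidate names are our own illustrations, not part of R.

```r
# `is_valid_name` is a hypothetical helper, not a base R function.
# make.names() returns its input unchanged only when the input is
# already a syntactically valid, non-reserved name.
is_valid_name <- function(candidate) {
    make.names(candidate) == candidate
}

candidates <- c("fruit_string", "dog_", "2nd_place", "_hidden", "for")
valid <- is_valid_name(candidates)
names(valid) <- candidates
print(valid)
```

Names that start with a number or an underscore, and reserved words such as `for`, come back as `FALSE`.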

# 3. Data Classes and Data Structures

## 3.1 Data Classes

The variables you assign have a data class associated with them. Data classes affect how functions interact with your variables. R has 5 basic data classes:
 * character, which is text (a string) written in quotes, such as `"mango peach"`
 * numeric, which is a real number; it can be a whole number (`5`) or include a decimal point (`5.3`)
 * integer, which is a whole number with no decimal part, written with an `L` suffix (`1L`)
 * logical, which can be either TRUE or FALSE
 * complex, which allows you to use imaginary numbers (`2i`)

You can also have missing data, which we will also talk about later.

Let's look at the data classes of our variables we have just assigned:

In [None]:
class(fruit_string)
class(dog_)
class(egg)
class(foo)

* The `L` we included when we assigned `egg` tells R that this object is an integer (as opposed to a numeric).
* The `i` we included when we assigned `foo` indicates an imaginary number, making `foo` a complex data class object.
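
As a quick check of the five classes, the sketch below builds one literal of each and asks R for its class. The variable names `values` and `classes` are illustrative only.

```r
# One literal of each basic class
values <- list("mango", 3.14, 7L, TRUE, 2i)
classes <- sapply(values, class)
print(classes)

# A whole number typed without `L` is still stored as numeric
print(class(7))
print(is.integer(7))
print(is.integer(7L))
```

The `L` suffix is what separates an integer from a numeric; the value alone does not decide the class.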

## 3.2 R Data Structures

 * R objects can also contain more than one element.
 * Objects that contain more than one element are organized into different data structures.
 * Data structures in R include vectors (also referred to as 'atomic vectors' in R), lists, matrices, arrays, and data frames.

### 3.2.1 Vectors

 * Probably the simplest R object that contains more than one element is a vector.
 * You can create a vector using the concatenate function, c(), or directly assigning them.
 * The c() function will coerce all of the arguments to a common data type and combine them to form a vector.
 
Here are a few examples of how you can assign vectors:

In [None]:
numeric_vector <- c(12,11,10,9,8) 
character_vector <- c('one', 'two', 'three', 'four', 'five') 
integer_vector <- (7:1) 
logical_vector <- c(TRUE, TRUE, FALSE)
character_vector_2 <- c('a', 'pug', 'is', 'not', 'a', 'big', 'dog')

Note that we used `:` when assigning `integer_vector`. The expression `7:1` generates the sequence of integers from 7 down to 1 (an integer vector, not a list).

In [None]:
print(numeric_vector)
print(character_vector)
print(integer_vector)
print(logical_vector) 
print(character_vector_2)

Vectors also have class:

In [None]:
print(class(numeric_vector))
print(class(character_vector))
print(class(integer_vector))
print(class(logical_vector)) 
print(class(character_vector_2))
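
Because `c()` coerces its arguments to a common type, mixing types changes the class of the result. The following sketch (with made-up vector names `mixed_1` through `mixed_3`) shows the direction of the coercion: logical gives way to numeric, and everything gives way to character.

```r
mixed_1 <- c(1L, 2.5)          # integer + numeric
mixed_2 <- c(1, TRUE, FALSE)   # numeric + logical
mixed_3 <- c(1, "two", TRUE)   # anything + character

print(class(mixed_1))
print(mixed_2)
print(mixed_3)
```

In `mixed_3` even the number and the logical value end up in quotes, which is a common source of surprises when one stray string sneaks into numeric data.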

You can combine vectors using `c()`:

In [None]:
combined_vector <- c(numeric_vector, integer_vector)

In [None]:
print(combined_vector)

You can use the `length()` function to see how long your vectors are:

In [None]:
print(length(combined_vector))

You can also access elements of the vector based on the index (or its position in the vector):

In [None]:
print(combined_vector[2])

You can combine these operations, but note that R code evaluates from the inside out:

In [None]:
print(combined_vector[length(combined_vector)])

Here, R is reading `length(combined_vector)` first. The value returned by the `length()` function is then used to access the last entry in the `combined_vector` vector.

You can also name vector elements and then access them by their names:

In [None]:
names(numeric_vector) <- c('one', 'two', 'three', 'four', 'five')
print(numeric_vector)

In [None]:
print(numeric_vector['three'])

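We can use a negative index, such as `-c(4)`, to drop the elements at those positions. This returns a new vector; `combined_vector` itself is unchanged:

In [None]:
print(combined_vector[-c(4)])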

If a vector is numerical, we can also perform some math operations on the entire vector. Here, we can calculate the sum of a vector:

In [None]:
print(combined_vector)
print(sum(combined_vector))

In [None]:
print(combined_vector/sum(combined_vector)) 

Use the `round()` function to specify you only want 3 digits reported and assign it to a variable called `rounded`

In [None]:
rounded <- round((combined_vector/sum(combined_vector)), digits = 3)
print(rounded)

You can also perform math operations on two vectors...

In [None]:
print(rounded + combined_vector)

but you'll get weird results if the vectors are different lengths:

In [None]:
print(combined_vector)
print(numeric_vector)

In [None]:
print(combined_vector + numeric_vector)

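R gives you a warning (not an error) and *recycles* the shorter vector: when it runs out of elements, it goes back to the start of the shorter vector. The warning appears because the length of `combined_vector` (12) is not a multiple of the length of `numeric_vector` (5).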
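
A smaller example makes the recycling easier to follow. This is a sketch with made-up vectors (`long_vector`, `short_vector`); `suppressWarnings()` is used only to keep the output tidy.

```r
long_vector  <- c(1, 2, 3, 4)
short_vector <- c(10, 20)

# Length 4 is a multiple of length 2: short_vector is reused silently
# as 10, 20, 10, 20
recycled <- long_vector + short_vector
print(recycled)

# Length 3 is not a multiple of length 2: R still recycles (10, 20, 10)
# but issues a warning; suppressWarnings() hides it here
uneven <- suppressWarnings(c(1, 2, 3) + short_vector)
print(uneven)
```

Note that the first case produces no warning at all, so it is worth checking `length()` before doing math on two vectors.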

**Coercing between classes**

Let's say you're trying to import some data into R, maybe a vector of measurements:

In [None]:
your_data <- c('6','5','3','2','11','0','9','9')
class(your_data)

Your vector is a character vector because the elements of the vector are in quotes. You can coerce them back into numeric values using `as.numeric()`:

In [None]:
your_new_data <- as.numeric(your_data)
print(your_new_data)
class(your_new_data)

What happens if we try to `as.numeric` things that aren't numbers?

In [None]:
as.numeric(character_vector_2)

<div class="alert alert-block alert-warning">
<b>Warning:</b> <b>NA</b> indicates a missing value. <code>as.numeric()</code> returns <b>NA</b> (with a warning) for anything it cannot read as a number, so be careful when converting between classes.
</div>

### 3.2.2 Missing values
Missing values can result from things like inappropriate coercion, Excel turning everything into a date, encoding format problems, etc.

In [None]:
here_is_a_vector <- as.numeric(c(4/61, 35/52, '19-May', 3/40))

We can use the `is.na()` function to see if our vector has any `NA` values in it:

In [None]:
is.na(here_is_a_vector)

You can combine this with the `table()` function to see some tabulated results from `is.na()`:

In [None]:
table( is.na( here_is_a_vector ) )
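
Once you know a vector contains `NA` values, you usually need to count them or work around them. The sketch below uses a made-up vector, `measurements`, and shows that `NA` propagates through `mean()` unless you pass `na.rm = TRUE`.

```r
measurements <- c(4.2, NA, 3.9, 5.1)

n_missing <- sum(is.na(measurements))                 # each TRUE counts as 1
mean_with_na <- mean(measurements)                    # NA propagates
mean_without_na <- mean(measurements, na.rm = TRUE)   # drop NA first

print(n_missing)
print(mean_with_na)
print(mean_without_na)
```

Many summary functions (`sum()`, `min()`, `max()`, `sd()`) accept the same `na.rm` argument.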

You might also encounter `NaN`, which means 'not a number' and is the result of invalid math operations:

In [None]:
0/0

`NULL` is another one you might encounter, and it is the result of trying to query a parameter that is undefined for a specific object. For example, you can use the `names()` function to retrieve names assigned to an object. What happens when you try to use this function on an object you haven't named?

In [None]:
names(here_is_a_vector)

You might also see `Inf` or `-Inf`, which are positive or negative infinity. They result from dividing a nonzero number by zero or from operations whose results grow without bound:

In [None]:
1/0


### 3.2.3 Matrices
- A matrix in R is a collection of elements organized into rows and columns.
- All elements must be the same data type, and all columns must be the same length.
- Generate a matrix using the following general format:

```
my_matrix <- matrix(
    vector, 
    nrow = r, 
    ncol = c, 
    byrow = FALSE)
```

For example:

In [None]:
my_matrix <- matrix(
    c(1:12), 
    nrow = 3, 
    ncol = 4, 
    byrow = FALSE)

print(my_matrix)

In the above code, we made `my_matrix` from the vector `c(1:12)`. We specified 3 rows (`nrow = 3`) and 4 columns (`ncol = 4`), and asked R to fill the matrix column by column rather than row by row (`byrow = FALSE`).
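
To see what `byrow` changes, it helps to build the same data both ways. This sketch uses two illustrative matrices, `by_column` and `by_row`, filled from `1:6`.

```r
by_column <- matrix(1:6, nrow = 2, ncol = 3, byrow = FALSE)
by_row    <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

print(by_column)   # first column is 1, 2
print(by_row)      # first row is 1, 2, 3
```

With `byrow = FALSE` the values run down each column before moving to the next one; with `byrow = TRUE` they run across each row.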

We can access the rows and columns by their numerical index using a `[row, column]` format.
For example, here's how we access row 3 and column 4:

In [None]:
my_matrix[3,4]

Access entire row 3:

In [None]:
print(my_matrix[3,])

Access entire column 4:

In [None]:
print(my_matrix[,4])

You can also name the rows and columns and then access them by name. 
For example, let's name the rows and columns of `my_matrix`:

In [None]:
dimnames(my_matrix) <- list(
    c('row_A', 'row_B', 'row_C'), 
    c('column_1', 'column_2', 'column_3', 'column_4'))

You can also name the rows and columns separately using `rownames()` and `colnames()` 

In [None]:
rownames(my_matrix) <- c('row_1', 'row_2', 'row_3')
colnames(my_matrix) <- c('column_1', 'column_2', 'column_3', 'column_4')

In [None]:
print(my_matrix)

In [None]:
print(my_matrix['row_2',])

In [None]:
print(my_matrix[,'column_2'])

We can use `%in%` to filter matrix rows. Here we filter `my_matrix` to get the rows where `rownames(my_matrix)` are in the vector `c('row_2', 'row_3')`. Note the position of the comma, which indicates we want rows back.

In [None]:
my_matrix[rownames(my_matrix) %in% c('row_2', 'row_3'),]

### 3.2.4 Lists

- Lists in R are very flexible. They are collections of elements that can be of different classes and structures. You can even have lists of lists.
- You make lists using the `list()` function (or by coercion using `as.list()`).

In [None]:
my_list <- list(character_vector, my_matrix)
print(my_list)

Use `[[]]` to access list elements:

In [None]:
print(my_list[[2]])

Add more brackets to access sub-elements of a list:

In [None]:
print(my_list[[2]][1])

In [None]:
print(my_list[[2]][1,])
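
Single brackets also work on lists, but they behave differently from double brackets. As a small sketch (using a made-up list, `example_list`): `[ ]` returns a shorter list, while `[[ ]]` returns the element itself.

```r
example_list <- list(c("a", "b"), matrix(1:4, nrow = 2))

single_bracket <- example_list[1]     # still a list, of length 1
double_bracket <- example_list[[1]]   # the character vector itself

print(class(single_bracket))
print(class(double_bracket))
```

If a function complains that it was given a list when you expected a vector, a missing second bracket is a likely cause.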

Name the list elements:

In [None]:
names(my_list) <- c('character_vector', 'my_matrix')
print(my_list)

Use `unlist()` if you want to convert a list to a vector. Let's make a new list (`list_1`):

In [None]:
list_1 <- list(1:5)
print(list_1)

Use `str()` to look at the structure

In [None]:
str(list_1)

Then `unlist()` and look at the structure

In [None]:
print(unlist(list_1))
str(unlist(list_1))

### 3.2.5 Data Frames 

- A data frame is another way to organize a collection of rows and columns.
- It is a tabular data structure (i.e., like an Excel spreadsheet).
- It is the canonical data structure for data analysis.
- Internally, it is a list of equal-length vectors, one per column.
- It is similar to a matrix, except data frames allow different data types in different columns.
- We can use the `data.frame()` function to create a data frame from vectors using the following format:

```
dataframe <- data.frame(column_1, column_2, column_3)
```

In [8]:
example_df <- data.frame(
    c('a','b','c'), 
    c(1, 3, 5), 
    c(TRUE, TRUE, FALSE))

print(example_df)

  c..a....b....c.. c.1..3..5. c.TRUE..TRUE..FALSE.
1                a          1                 TRUE
2                b          3                 TRUE
3                c          5                FALSE


Use `names()` or `colnames()` to name columns, `rownames()` to name rows, or `dimnames()` to assign both column and row names to the data frame:

In [9]:
colnames(example_df) <- c('letters', 'numbers', 'boolean')
rownames(example_df) <- c('first', 'second', '')
print(example_df)

       letters numbers boolean
first        a       1    TRUE
second       b       3    TRUE
             c       5   FALSE


In [10]:
names(example_df) <- c('_letters_', '_numbers_', '_boolean_')
print(example_df)

       _letters_ _numbers_ _boolean_
first          a         1      TRUE
second         b         3      TRUE
               c         5     FALSE


In [11]:
dimnames(example_df) <- list(c('__first', '__second', '__third'), c('__letters', '__numbers', '__boolean'))
print(example_df)

         __letters __numbers __boolean
__first          a         1      TRUE
__second         b         3      TRUE
__third          c         5     FALSE


We can use the `attributes()` and `str()` functions to get some information about our data frame:

In [12]:
attributes(example_df)

In [13]:
str(example_df)

'data.frame':	3 obs. of  3 variables:
 $ __letters: chr  "a" "b" "c"
 $ __numbers: num  1 3 5
 $ __boolean: logi  TRUE TRUE FALSE


The remainder of our discussion surrounding data frames can be found in our next lecture, `03-1_tidydata`.

### 3.2.6 Factors
- In some situations, you might be dealing with a categorical variable, which is known as a factor variable in R.
- A factor is a type of variable that has a set number of distinct categories into which all observations fall, which are the levels.
- First we will create a data frame containing a factor variable, then we'll try to add a row to the data frame.
- In addition to `cbind()` for adding columns, there is another function in R called `rbind()`, which adds new rows to a data frame.
- Let's see what happens when we create a data frame and then try to add a new row to our data frame:

In [None]:
patients_1 <- data.frame(
    as.factor(c('Boo','Rex','Chuckles')), 
    c(1, 3, 5), 
    c('dog', 'dog', 'dog'))
names(patients_1) <- c('name', 'number_of_visits', 'type')
print(patients_1)
patients_1_rbind <- rbind(patients_1, c('Fluffy', 2, 'dog'))
print(patients_1_rbind)

- The `patients_1$name` column is classed as a `factor`, and the factor's levels are `Boo`, `Chuckles`, and `Rex`.
- Recall that a factor is a type of variable that has a **set number of distinct categories into which all observations fall, which are the levels.**
- `Fluffy` is not one of the existing levels, so R cannot store it in the factor: the new row typically gets `NA` in the `name` column, along with an "invalid factor level" warning. To add the row properly, we first have to turn the factor into strings.

We can convert the `patients_1$name` column to a character as follows:

In [None]:
patients_1$name <- as.character(patients_1$name)
str(patients_1)

Now we can use `rbind()` to add a new row:

In [None]:
patients_1 <- rbind(patients_1, c('Fluffy', 2, 'dog'))
print(patients_1)
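
One thing to watch: `c('Fluffy', 2, 'dog')` is a character vector, so binding it to a data frame can turn a numeric column such as `number_of_visits` into character as well. A sketch of one way around this is to bind a one-row data frame instead, so that each column keeps its own class. The names `patients_2` and `new_patient` are illustrative.

```r
patients_2 <- data.frame(
    name = c("Boo", "Rex", "Chuckles"),
    number_of_visits = c(1, 3, 5),
    type = c("dog", "dog", "dog"),
    stringsAsFactors = FALSE)

# The new row is itself a data frame, so 2 stays a number
new_patient <- data.frame(
    name = "Fluffy",
    number_of_visits = 2,
    type = "dog",
    stringsAsFactors = FALSE)

patients_2 <- rbind(patients_2, new_patient)
str(patients_2)
```

Here `str()` should still report `number_of_visits` as `num`, which means you can keep doing math on that column.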

### Re-ordering factor levels
- You might have ordinal data, like the following:

In [3]:
sizes <- factor(c('extra small', 'small', 'large', 'extra large', 'large', 'small', 'medium', 'medium', 'medium', 'medium', 'medium'))

Use the `table()` function to look at the vector:

In [None]:
table(sizes)

We might not necessarily want the factor levels in alphabetical order. You can re-order them like so:

In [None]:
sizes_sorted <- factor(sizes, levels = c('extra small', 'small', 'medium', 'large', 'extra large'))
table(sizes_sorted)

You can also use the `relevel()` function to specify a single level you'd like to use as the reference level, which will now be the first level:

In [None]:
sizes_releveled <- relevel(sizes, 'medium')
table(sizes_releveled)

You can also coerce a factor to a character:

In [None]:
character_vector <- as.character(sizes)
class(character_vector)
print(character_vector)

Notice that `print()` no longer shows the `Levels` line and each element of the vector is now in quotes.
It is also possible to convert a factor into a numeric vector if you want to:

In [None]:
print(sizes)
numeric_vector <- as.numeric(sizes)
print(numeric_vector)

This returns the integer code of each element's level. The codes follow the order of the levels of `sizes`, which is alphabetical by default.

**Warning:** If you have a factor variable where the levels are numbers, `as.numeric()` is not appropriate, because it returns the level codes rather than the numbers shown in the labels! Please see `?factor` for more information about this problem (under "Warning" section).
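
The sketch below illustrates the problem with a made-up factor of years, `year_factor`, and shows a common workaround: convert to character first, then to numeric.

```r
year_factor <- factor(c("2021", "2019", "2021", "2020"))

codes  <- as.numeric(year_factor)                 # internal level codes
values <- as.numeric(as.character(year_factor))   # the labels as numbers

print(codes)
print(values)
```

The levels are sorted as `2019`, `2020`, `2021`, so `codes` contains only 1, 2, and 3, while `values` recovers the original years.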

# 4. Functions
- We have already used a few functions (like print). 
- Functions are a series of statements that work together to perform a specific task.
- All functions need pieces of information (or arguments) to perform their particular function. 
- Sometimes arguments are required, sometimes arguments are optional -- for example, `print()` requires only one argument -- the thing you want to print.
- R comes with some pre-loaded data sets -- you can see the list by typing `print(data())`, but it is quite long.



The summary function, which can be applied to either a vector or a data frame (in the latter case, R applies it separately to each column in the data frame) yields a variety of summary statistics about each variable. 

In [8]:
# head provides the first n rows (n=6 by default) of a data frame
head(iris, n=6)

| | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
| --- | --- | --- | --- | --- | --- |
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<fct>` |
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |


In [9]:
# can also use tail to look at the last n rows (n=6 by default)
tail(iris)

| | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
| --- | --- | --- | --- | --- | --- |
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<fct>` |
| 145 | 6.7 | 3.3 | 5.7 | 2.5 | virginica |
| 146 | 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 147 | 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 148 | 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 149 | 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 150 | 5.9 | 3.0 | 5.1 | 1.8 | virginica |


In [2]:
summary(iris)

  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
       Species  
 setosa    :50  
 versicolor:50  
 virginica :50  
                
                
                

`summary()` is informative for numerical data, but only provides counts for `factor` data, as in the `Species` column in the `iris` dataset.

## 4.1 Control flow functions

### 4.1.1 Comparison Operators
Used to compare two values

| Operator | Description | Example |
| --- | --- | --- |
| == | Equal to | a == b |
| != | Not equal | a != b |
| > | Greater than | a > b |
| < | Less than | a < b |
| >= | Greater than or equal to | a >= b |
| <= | Less than or equal to | a <= b |

### 4.1.2 Logical Operators
Used to combine conditional statements

AND - return TRUE when both conditions are TRUE

| P | Q | P AND Q |
| --- | --- | --- |
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

OR - return TRUE when either condition is TRUE

| P | Q | P OR Q |
| --- | --- | --- |
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |

NOT - return TRUE when the condition is FALSE

| P | NOT P |
| --- | --- |
| 0 | 1 | 
| 1 | 0 |

### 4.1.3 Combining Logical Operators

In groups of 2, work on combining logical operators

https://tinyurl.com/mpa2065-fa2023-br-3

### 4.1.4 Logical Operators in R

| Operator | Description |
| --- | --- |
| & | Element-wise logical AND |
| && | Logical AND with short-circuiting. Compares two length-one logical expressions and returns TRUE if both statements are TRUE |
| &#124; | Element-wise logical OR |
| &#124; &#124; | Logical OR with short-circuiting. Compares two length-one logical expressions and returns TRUE if at least one of the statements is TRUE |
| ! | Logical NOT. Returns FALSE if the statement is TRUE |

Short circuiting means that if any of the values from left-to-right are determinative, the rest will not be computed. For example, with `FALSE && some_value`, since `FALSE &&` anything else will always evaluate to `FALSE`, the rest of the expression is not computed. This can lead to performance improvements and is why `&&` and `||` are preferred for control-flow operations (covered next).
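
Short circuiting is also useful for guarding a check that would otherwise fail. In this sketch, `maybe_missing` is a made-up variable that happens to be `NULL`; putting the `is.null()` test first means the comparison on the right is never evaluated.

```r
maybe_missing <- NULL

# `maybe_missing > 0` on NULL gives logical(0), which `if` cannot use.
# Checking is.null() first means `&&` never evaluates the right-hand side.
is_positive <- !is.null(maybe_missing) && maybe_missing > 0
print(is_positive)

# The right-hand side is skipped entirely, so this stop() never runs
skipped <- FALSE && stop("this is never evaluated")
print(skipped)
```

The element-wise `&` gives no such guarantee: it evaluates both sides before combining them.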

In [None]:
c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)

In [None]:
a <- c(1, 2, 3) > 2
print(a)

[1] FALSE FALSE  TRUE


In [None]:
b <- c(3, 4, 5) < 4
print(b)

[1]  TRUE FALSE FALSE


In [None]:
a & b

In [None]:
# `&&` expects operands of length 1, but `a` and `b` have length 3.
# R 4.2 warns: “'length(x) = 3 > 1' in coercion to 'logical(1)'”
# R 4.3 and later stop with an error instead.
a && b

“'length(x) = 3 > 1' in coercion to 'logical(1)'”


### 4.1.5 Order of Operations - Logical Operators

The following is the order of operations from highest priority to lowest priority. Use parentheses as necessary.

| Operator | Description |
| --- | --- |
| ! | Logical NOT |
| &, && | Logical AND |
| &#124;, &#124; &#124; | Logical OR |
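
A few one-liners show how this ordering plays out. The variable names below are illustrative; each pair contrasts R's default grouping with an explicit one.

```r
no_parens   <- TRUE | FALSE & FALSE     # read as TRUE | (FALSE & FALSE)
with_parens <- (TRUE | FALSE) & FALSE   # parentheses force the OR first

negate_first <- !TRUE & FALSE           # read as (!TRUE) & FALSE
negate_last  <- !(TRUE & FALSE)         # parentheses force the AND first

print(c(no_parens, with_parens, negate_first, negate_last))
```

When an expression mixes several operators, adding parentheses costs nothing and makes the intent explicit.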

### 4.1.6 The `if` Statement

 - The `if` statement lets us execute code given that a condition is true. 
 - Similar to many other languages and tools (e.g., Excel)

In [None]:
if (5 > 4) {
    print("yep!!")
    x <- 555
    print(x)
}

[1] "yep!!"
[1] 555


In [None]:
x <- c("foo", "bar", "baz")

if (length(x) == 3) {
    print("Yes, this is a vector of length 3! Hooray!!")
}

[1] "Yes, this is a vector of length 3! Hooray!!"


If the condition inside `if` is FALSE, what happens?

In [None]:
if ("potato" == "fries") {
    print("you will never see this print")      # this never gets executed
}

In [20]:
z <- 171

if (z > 10 && z %% 2 != 0) {
    print(z)
}

[1] 171


### 4.1.7 The `else` Statement
  - `else` gives us a way to execute code when the `if` block doesn't get executed

In [None]:
if ("potato" == "fries") {
    print("you will never see this print")      # this never gets executed
} else {
    print("both are delicious")                 # this DOES get executed
    v <- rnorm(10)
    print(v)
}       

[1] "both are delicious"
 [1]  0.203588116 -0.004069187 -0.544309143  0.604663966 -0.920926788
 [6] -0.598511931 -0.992327171  0.420509953  0.389283796 -0.030583657


### 4.1.8 Combining `else` and `if`
  - We can use `else` and `if` together in sequence 

In [None]:
coin_value <- 5

if (coin_value == 25) {
    print("washington")
} else if (coin_value == 10) {
    print("fdr")
} else if (coin_value == 5) {
    print("jefferson")
} else {
    print("lincoln")
}

[1] "jefferson"


### 4.1.9 The `ifelse()` function

The `ifelse()` function is a vectorized shorthand for the traditional `if`...`else` statement. It takes a vector of conditions as input and returns a vector of the same length. The general syntax is as follows:

`returned_vector <- ifelse(test_expression, x, y)`   

The returned vector (`returned_vector`) takes its element from `x` where the corresponding value of `test_expression` is TRUE and from `y` where it is FALSE.
Specifically, the i-th element of `returned_vector` is `x[i]` if `test_expression[i]` is TRUE; otherwise it is `y[i]`. In the example that follows, an element of the input vector that is even (evenly divisible by 2) returns `even`, and an odd element returns `odd`.

### Example of ifelse() use 

In [None]:
a <- c(5, 7, 2, 9)
ifelse(a %% 2 == 0, "even", "odd")


We can also ask only whether the value is even and otherwise report back what was in `a` originally. Because a vector holds a single class, the numbers come back as character strings:

In [None]:
ifelse(a %% 2 == 0,"even",a)

## 4.2 User defined functions
- In addition to the already available functions in R, you can also create your own functions. 
- Generally, if you find yourself re-writing the same pieces of code over and over again, it might be time to write a function. 

Functions take the following basic format:

```
myfunction <- function(argument_name){
  stuff <- this is the body of the function(
    it contains statements that use argument_names
    to do things and make stuff)
  return(stuff)
}
```

More formally, R functions are broken up into 3 pieces:
1. formals() - the list of arguments
2. body() - code inside the function
3. environment() - determines how the function finds the values associated with the names it uses

Here's an example of a function called `roll()` that rolls any number of 6-sided dice:

In [None]:
roll <- function(number_of_dice){
    rolled_dice <- sample(
        x = 6, 
        size = number_of_dice, 
        replace = TRUE)
    return(rolled_dice)
}

- The built-in R function `sample()` is nested inside our `roll()` function.
- `roll()` uses the argument `number_of_dice` as the `size`, `x` is the number of sides on the die, which we have hard-coded as `6`, and `replace = TRUE` means that we are sampling the space of all potential die roll outcomes with replacement.
- Lastly, we tell the function what it should return (`rolled_dice`).

To call that function and print the output:

In [None]:
print(roll(number_of_dice = 10))

Let's look at the `formals()`:

In [None]:
formals(roll)

What about `body()`?

In [None]:
body(roll)

What about `environment()`? 

In [None]:
environment(roll)

So, the function itself is called `roll`, it takes the argument or formals `number_of_dice` and the body of the function uses the built-in `sample` function in R to simulate dice rolls (use `?sample` to learn more about the `sample()` function). 

### 4.2.1 More on user defined functions
- We can also have functions that take more than one argument. 
- Let's say we want to roll different numbers of dice (`number_of_dice`) and we want to change the size of the dice we roll (`number_of_sides`).

In [None]:
roll <- function(
    number_of_dice, 
    number_of_sides){
    rolled_dice <- sample(
        x = number_of_sides, 
        size = number_of_dice, 
        replace = TRUE)
    return(rolled_dice)
}

- The new `roll()` uses the `sample()` function again, but this time it uses the `number_of_dice` and `number_of_sides`

In [None]:
print(roll(number_of_dice = 5, number_of_sides = 20))

### Default values in functions

In [4]:
roll <- function(
    number_of_dice=2, 
    number_of_sides=6){
    rolled_dice <- sample(
        x = number_of_sides, 
        size = number_of_dice, 
        replace = TRUE)
    return(rolled_dice)
}

In [5]:
roll()      # call our function without an argument

In [6]:
roll(number_of_dice=3)     # call function with `number_of_dice` equal to 3

### 4.2.2 Function Scope
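* A *local variable* is a variable defined in a local scope (i.e., inside a function), including the function's arguments.
* A *global variable* is a variable defined outside of any function.
* A *free variable* is a variable used inside a function that is neither an argument nor assigned in the function body. R searches for it in the environment in which the function was defined.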

In [None]:
a <- 5 # a is a global variable

f <- function(x, y) {
    # x and y are local variables that exist in the function body
    x^2 + y / z  # z is a free variable
}
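
To see a free variable in action, consider the sketch below. The names `scale_factor`, `scale_free`, and `scale_explicit` are made up for illustration. The first function depends on a free variable, so its result changes whenever the global changes; the second receives everything it needs as arguments.

```r
scale_factor <- 10   # defined at the top level (global)

# `scale_factor` is a free variable here: not an argument, not assigned
# in the body, so R finds it in the environment where the function
# was defined
scale_free <- function(x) {
    x * scale_factor
}

# Same computation with everything passed in as an argument
scale_explicit <- function(x, scale_factor) {
    x * scale_factor
}

first_result <- scale_free(2)
scale_factor <- 100                  # changing the global changes the result
second_result <- scale_free(2)
explicit_result <- scale_explicit(2, scale_factor = 10)

print(c(first_result, second_result, explicit_result))
```

The same call, `scale_free(2)`, returns two different values, which is exactly what makes free variables hard to track down.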

### 4.2.3 Function Scope -- Best Practices
* To maintain modularity, a function should be given all the variables it needs as arguments.
* Free variables are discouraged because they are hard to track down in code.
* When updating global variables, best practice is to assign at the top level (the variable's own scope) using the function's return value, rather than modifying the global from inside the function. For example:

In [None]:
a <- 5 # Global Variable

f <- function(x, y) {
    return(x + y) # Returns x + y, with no mention of `a`. 
}
a <- f(a, 2) # Update `a` as a result of the function

print(a)

### 4.2.4 Variable Shadowing
Variable shadowing occurs when a variable declared in a certain scope has the same name as a variable declared in an outer scope.

In [7]:
x <- 0 # Global Variable

outer <- function() {
    x <- 1 # outer x
    
    inner <- function() {
        # print(paste("before assignment in inner:", x))
        
        x <- 2 # inner x
        print(paste("inner:", x))
    }
    
    inner() # Call function inner
    
    print(paste("outer:", x))
}

outer() # Call function outer
print(paste("global:", x))

[1] "inner: 2"
[1] "outer: 1"
[1] "global: 0"


## Challenge problem

In this problem we will write a function that computes the Euclidean (i.e., $L^2$) norm of a `vector` object. The formula for the Euclidean norm is given below.

Let's write a function called `l2_norm()` that takes a single argument, a vector, `v`, and computes the $L^2$ norm of that vector. The formula below describes the computation that our function should complete. Note that we will likely want to use the `sum()` and `sqrt()` functions as part of our function.  

$$ \left\| v \right\|_2 = \sqrt{v_1^2 + v_2^2 + v_3^2 + ... + v_n^2} $$
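
As a worked check of the formula, take $v = (3, 4)$: the norm is $\sqrt{3^2 + 4^2} = \sqrt{25} = 5$. One possible solution is sketched below; it is only one of several reasonable ways to write the function, so try the problem yourself before reading it.

```r
# One possible l2_norm(): square each element, add them up,
# then take the square root
l2_norm <- function(v) {
    sqrt(sum(v^2))
}

print(l2_norm(c(3, 4)))
```

Because `v^2` and `sum()` both operate on whole vectors, no loop is needed.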

# 5. Iteration

 - What is iteration?
 - It is often important to repeat some operation many times
 - There are many methods of completing tasks that require iteration
   + but looping is often the most intuitive
 - _caveat emptor_
   + R is known as a "slow" language when it comes to loops
   + Generally, it is faster to use vectorized operations, and often more concise to use variations of `apply`, but this is rather specific to R
   + We will cover `apply` and variations next lecture.

## 5.1 The `for` Loop

In [None]:
for (i in 1:5) {
    print(i)               # this code block is executed for each iteration
}

In [None]:
for (i in 1:5) {
    i2 <- i*i             # this code block is executed for each iteration
    print(i2)
}
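
For comparison, here is a sketch of the same squaring task written three ways: with a loop that stores its results, with vectorized arithmetic, and with `sapply()`. The variable names are illustrative.

```r
numbers <- 1:5

# Loop version: fill a pre-allocated result one element at a time
squares_loop <- numeric(length(numbers))
for (i in seq_along(numbers)) {
    squares_loop[i] <- numbers[i]^2
}

# Vectorized version: arithmetic operators work on whole vectors
squares_vectorized <- numbers^2

# sapply() version: apply a function to each element
squares_sapply <- sapply(numbers, function(n) n^2)

print(squares_loop)
print(squares_vectorized)
print(squares_sapply)
```

All three give the same values. The vectorized form is the shortest and is usually the fastest in R.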

### 5.1.1 Counting Down using `for` Loop 

In [None]:
for (j in 5:-5) {
    print(j)
}

### 5.1.2 Range-based `for` Loop

In [None]:
animals <- c("cat", "dog", "bird", "fish")

for (a in animals) {
    print(a)
}

## 5.2. Examples

### 5.2.1 Printing Even Numbers

 - Demonstrate use of function and `if` statement in loop

In [None]:
is_even <- function(m) {
    ans <- m %% 2 == 0             # if m/2 has remainder 0, then m is even
    return(ans)
}

In [None]:
max_num <- 16

for (n in 1:max_num) {
    if (is_even(n)) {              # this entire block is executed each iteration
        print(n)
    }
}

### 5.2.2 Example: Function to Sum Even Numbers

  - Demonstrate loop inside of function

In [None]:
# This function has a single parameter, `max_num`. The function iterates from 1 
# to `max_num` and takes the sum of all the even values between 1 and `max_num`.

sum_even_nums <- function(max_num = 5) {
    res <- 0
 
    for (n in 1:max_num) {
        if (is_even(n)) {
            res <- res + n
        }
    }
    return(res)
}

In [None]:
sum_even_nums()                      # sum of 2, 4
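
The same sum can be computed without an explicit loop. The sketch below is a hypothetical vectorized variant, `sum_even_nums_vectorized()`, which is not part of the lecture code. It uses `seq_len()` to build the candidates, which also behaves sensibly when `max_num` is 0 (it returns an empty vector, whereas `1:0` counts down through 1 and 0).

```r
# Hypothetical vectorized counterpart to sum_even_nums()
sum_even_nums_vectorized <- function(max_num = 5) {
    candidates <- seq_len(max_num)          # 1, 2, ..., max_num
    sum(candidates[candidates %% 2 == 0])   # keep the even values, then add
}

print(sum_even_nums_vectorized())      # 2 + 4
print(sum_even_nums_vectorized(16))    # 2 + 4 + ... + 16
```

The logical vector `candidates %% 2 == 0` is used as an index, so only the even values reach `sum()`.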

## 5.3 The `while` Loop

  - The second way to iterate is using a `while` loop
  - Very similar to a `for` loop, but without a pre-defined stopping point

In [None]:
m <- 10

while (m > 0) {
    print(m) 
    m <- m - 1
} 

In [None]:
for (i in 10:1) {
    print(i)
}

### 5.3.1 Another `while` Loop Example

  - Introduce random component

In [None]:
m <- 1

while (m > 0) {
    print(m) 
    m <- m - sample(-1:1, 1) # -1, 0, 1 
}

## 5.4 Using `break` and `next`

  - In some cases, it's important to be able to exit a loop "early"
      + The `break` keyword allows us to exit a loop at will
  - In other cases, we want to merely skip an iteration
      + We can use `next` to accomplish this

### 5.4.1 Exiting a Loop with `break`

In [None]:
m <- 5


while (m > 0) {
    print(m)
    m <- m - sample(-2:2, 1)
    
    if (m > 6) {
        print("uh, oh... exiting")
        break
    }
}

In [None]:
sample(-2:2, 1)

### 5.4.2 Skipping an Iteration with `next`

In [None]:
for (x in 1:10) { # `next` jumps back here and starts the next iteration
    if (is_even(x)) {
        next  
    }
    print(x)
}

# 6. Importing and Exporting files
- There are a few different ways to read and write files in R. This will be covered more in later lectures.
- We will use `read.table()` and `write.table()`.
- Let's use some of the pre-loaded data that comes with R.
- First, let's import the `iris` data as a data frame and use `head()` to look at the first few lines

In [1]:
iris <- data.frame(iris)
head(iris)

| | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
| --- | --- | --- | --- | --- | --- |
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<fct>` |
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |


You can write the output to a file using `write.table`:

In [None]:
write.table(iris, file = '~/iris_table.txt')

Use `read.table()` to pull data into R:

In [None]:
iris_table_2 <- read.table('~/iris_table.txt')

In [None]:
head(iris_table_2)
str(iris_table_2)

Another convenient function is `list.files()`, which returns the names of the files in a directory (specified in `path =`). Its `pattern` argument is a regular expression, not a shell wildcard, so we use `^iris_` to match all file names that start with `iris_`:

In [None]:
list.files(path = '~', pattern = '^iris_')
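
The `pattern` argument of `list.files()` is treated as a regular expression, which is easy to confuse with a shell-style wildcard. The sketch below works in a temporary directory with made-up file names, and uses `glob2rx()` (from the `utils` package that ships with R) to translate a wildcard into the equivalent regular expression.

```r
# Work in a throwaway directory so nothing in the home directory is touched
scratch_dir <- file.path(tempdir(), "iris_demo")
dir.create(scratch_dir, showWarnings = FALSE)
invisible(file.create(
    file.path(scratch_dir, c("iris_table.txt", "my_iris.txt", "notes.txt"))))

# `^iris_` anchors the match at the start of the file name
starts_with_iris <- list.files(path = scratch_dir, pattern = "^iris_")
print(starts_with_iris)

# glob2rx() translates a shell-style wildcard into a regular expression
print(glob2rx("iris_*"))
glob_matches <- list.files(path = scratch_dir, pattern = glob2rx("iris_*"))
print(glob_matches)
```

Without the `^` anchor, a pattern such as `iris` would also match `my_iris.txt`, because a regular expression can match anywhere in the name.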

# 7. R packages
- Although R comes with many built in functions, you will probably want to install and use various R packages.
- You can install the packages using `install.packages('package_name_here')` (where you would replace 'package_name_here' with your package of choice, in quotes). 
- This will download the package and any additional required dependencies. 
- Uncomment and run the next cell to install the `ggplot2` package:

In [None]:
#install.packages('ggplot2')

Before you can actually use the package, you have to load it as follows:

In [2]:
library('ggplot2')