# Week 3: User-Defined Functions (UDFs) and Lists

## Lesson Summary: 

**Built-in Functions**
    - Numerical / Statistical
    - Segue into students making the functions themselves.

**UDFs**
    - Make your code DRY
    - Divide and conquer

**Exercise: Mean and Variance**

**Lists**
    - Generating
        - Named lists
    - Subsetting
    - Use cases

## Functions:


### What are functions? 

- Functions are what make coding, *coding*. The primary purpose of a function is to manipulate the objects you create in R.

- Creating your own functions helps you finish repetitive tasks faster, and with less code, than writing out every step each time.

- Functions have an input and an output. 

- Functions act as an operator, just like basic arithmetic symbols (e.g. +, -, /, etc.).

- You call a function when you want to complete a specified task.
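
As a small illustration of the points above, here is a sketch that uses a made-up vector called `scores`. It shows one input going in and one output coming out, and that an arithmetic operator can be called just like any other function.

```r
# A made-up numeric vector to use as input.
scores <- c(4, 8, 15)

# Input: the vector scores. Output: a single number.
total <- sum(scores)
total

# Arithmetic operators are functions too: these two lines do the same thing.
1 + 2
`+`(1, 2)
```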



## Built-In Functions

To begin, let's first see what functions can do in the R programming language. 



**General**

- length(x)      *# Returns the number of elements in vector x*

- range(x)       *# Returns the minimum and maximum of x*




**Math**

- abs(x)         *# The absolute value of "x"*

- log(x), exp(x)  *# Natural logarithm of x, and e raised to the power x*

- cos(),sin(),tan(),acos(),asin(),atan(),atan2()       *# Trigonometric functions and their inverses (angles in radians)*

- eigen()      *# Computes eigenvalues and eigenvectors*

- deriv()      *# Symbolic and algorithmic derivatives of simple expressions*

- integrate()  *# Adaptive quadrature over a finite or infinite interval*




**Graphical**

- plot()                *# Generic function for plotting of R objects*

- curve(5*x^3,add=TRUE) *# Plot an equation as a curve (add=TRUE draws on the existing plot)*

- points(x,y)           *# Add another set of points to an existing graph*

- hist(x)               *# Plot a histogram of x*

- pdf()                 *# Send plots to a PDF file (finish with dev.off())*

- png()                 *# Send plots to a PNG file (finish with dev.off())*

- jpeg()                *# Send plots to a JPEG file (finish with dev.off())*




**Statistical**

- mean(x)

- var(x)

- median(x)

- min(x)

- max(x)

- summary(x)            *# Returns a summary of x: minimum, quartiles, median, mean, and maximum*
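
To see a few of the functions listed above in action, here is a minimal sketch. The vector `x` is a made-up sample, not data from this lesson.

```r
# A small made-up sample.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

length(x)   # number of elements
range(x)    # smallest and largest value
mean(x)     # arithmetic average
median(x)   # middle value
var(x)      # sample variance (divides by n - 1)
summary(x)  # minimum, quartiles, median, mean, and maximum
```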



Visit ["A list of useful functions in R"](http://www.sr.bham.ac.uk/~ajrs/R/r-function_list.html) for more advanced and detailed built-in functions!



## User-Defined Functions -- Creating Your Own Functions

Sometimes, R just won't have the function you need. Thus, you must make the function you *deserve*. 

How do you do so? Look no further!



**Note**

- Make sure that the variable that you place in the input is defined, either in the local environment of the function or the global environment. 
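
The note above can be sketched as follows. The names `rate`, `scale_by_rate`, and `amount` are made up for illustration. The argument lives only inside the function, while a variable that is not defined inside the function is looked up in the global environment.

```r
# A variable defined in the global environment.
rate <- 2

# `amount` is local: it exists only while the function runs.
# `rate` is not defined inside the function, so R finds it in the global environment.
scale_by_rate <- function(amount) {
    amount * rate
}

scale_by_rate(5)
```

Relying on global variables works, but passing everything a function needs as an argument usually makes the function easier to reuse.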



### UDFs


In [None]:
# anatomy of a function

function_name <- function(argument_1, argument_2) {
    body
}
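
Here is one concrete instance of that anatomy, as a sketch. The function `rectangle_area` and its arguments are hypothetical names chosen for this example.

```r
# function name: rectangle_area
# arguments:     width and height
# body:          the expression between the braces; its value is what the function returns
rectangle_area <- function(width, height) {
    width * height
}

rectangle_area(3, 4)
rectangle_area(width = 2, height = 5)  # arguments can also be passed by name
```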

### How To Use Your Functions

**It's really simple. Follow the steps below!**



In [None]:
# 1) Make your function using the structure above.
# Note: multiplication runs before addition, so the body computes x + (1 * 10), i.e. x + 10.
function_name <- function(x) {
    x + 1 * 10
}

# 2) Once you are done, run the definition above to send it to your environment.
# Running the definition only stores the function; nothing is computed yet.

# 3) Create an arbitrary variable. It can literally be anything.
anything <- 'literally'
valid <- 3

# 4) Now call the function with the variables made earlier.
function_name(valid)          # returns 13
try(function_name(anything))  # error: non-numeric argument to binary operator

# 5) Now look at your environment, scroll to the function category, and check to see if your function appeared! It appears as soon as the definition has run, regardless of whether the variable you pass later is a valid input.

## Useful Things To Know When Creating A Function



**Ctrl + C**

- Pressing 'Ctrl + C' in a terminal R session (or 'Esc' in RStudio) interrupts a running function, for loop, while loop, or any other operation that keeps you from using R. (This is also called killing a process or killing running code.)


**return(value)**

- return() ==> under certain circumstances, you might want to use this to quit a function early


**?(function_name)**

- ?(function_name) ==> will bring you to R documentation and show you usage, **arguments**, values, and *examples*


**args(function_name)**

- args() ==> shortcut which gives you arguments of a function


**Miscellaneous**

- You can manipulate a function in the same way you would treat any other object. 
    - Examples of this would be:
        - Reassigning the function to a variable with a different name
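
The tools in this section can be tried out together in a short sketch. The function `safe_divide` is a hypothetical helper written for this example, not a built-in.

```r
# Hypothetical helper: divide a by b, but quit early when b is zero.
safe_divide <- function(a, b) {
    if (b == 0) {
        return(NA)  # leave the function right here
    }
    a / b           # otherwise the value of the last expression is returned
}

safe_divide(10, 2)
safe_divide(10, 0)

# Show the arguments of a function.
args(safe_divide)

# A function is an object, so it can be assigned to another name.
divide <- safe_divide
divide(9, 3)
```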
        

## Exercise #1:

### Create your own function: 

Write a function that converts a temperature from Celsius to Fahrenheit. 

Once you are done, write a second function that converts Fahrenheit back to Celsius.

You are given the following formulas:

![Image of C to F/F to C Formula](http://www.101computing.net/wp/wp-content/uploads/fahrenheit_to_celsius_formulas.png)



## Exercise #2:

**Create a function that helps you calculate the average of the values in a vector more quickly. (Yes, this is already a built-in function. Try to figure out its anatomy on your own though! It's good practice.)**

In [None]:
#  _________ EXERCISE _________


# Create a function that makes calculating the average of the values in a 
 # vector quicker. (Yes, this is already built in.)


# Take a look at the function, mean(), break down how the function works, then
 # reconstruct it in function form.



# Keep in mind that there are always multiple ways to create a function! 
 # Some may take some more time to execute; other strategies may be quicker to 
 # execute, even by just a few nanoseconds!



# Let's break down the function, ' mean() ':
 # As you might know, the function, mean, *means* average--the sum of a group 
 # of numbers, divided by the number of numbers in the group. 



# Thus you might think of something similar to the following:

mean_UDF <- function(<FILL IN>) {
  <FILL IN>
}

## Exercise #3:

Now let's step it up a notch. Let's try **variance**! [The variance is the average of the squared differences from the mean.](http://www.mathsisfun.com/data/standard-deviation.html) If you are not familiar with variance or need a refresher, click on the link above! Keep in mind that R's built-in var() computes the *sample* variance, which divides by n - 1 instead of n, so a function that follows the definition above will give a slightly different number.

Try to figure out how to make a function for variance. Remember, there are multiple solutions for this exercise!

## Exercise #4:

### Challenge Problem:

Create a function that outputs the second largest number in a matrix. 


**Hint 1:**

Here's a quick way to generate a random matrix to see if your function works!
- <code> replicate(5, rnorm(5)) </code> 
- Expect something along the lines of this, but with different numbers


In [None]:
           [,1]       [,2]       [,3]        [,4]        [,5]
[1,] -0.4386910 -1.6518913  0.6486602  0.35054001  0.37456409
[2,] -0.5263067 -0.1038436 -0.6189868  1.60853842 -1.03342478
[3,] -0.7413896  0.3776236  0.3189630 -0.13850185 -0.70093245
[4,] -0.4696661  0.6092193  0.6877337  1.68501153 -0.02589409
[5,]  0.9611014  0.0975538  0.8344805 -0.05673202  0.94647790

**Hint 2:** 

If you don't quite understand the question, here's some clarification. Above is a 5x5 matrix. You are asked to find the *second largest* value in the matrix. By observation, you'll find that 1.60853842 is the second largest number. How did you find that? Can you turn that line of reasoning into code?


**Hint 3:** 

Can you find the second largest number in a vector?



**Hint 4:**

In [None]:
# you use a 'for loop' when you don't want to keep writing a command over and over and over...
# use 'for loops' when you're lazy or smart :3

for (index in array) {
  statement
}
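
As a concrete instance of the pattern above, here is a minimal sketch that adds up the elements of a made-up vector called `values`. It is deliberately not the solution to the challenge problem.

```r
# A made-up vector to loop over.
values <- c(3, 7, 1)

running_total <- 0
for (v in values) {
    # v takes each element of values in turn: 3, then 7, then 1
    running_total <- running_total + v
}

running_total
```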

**Hint 5:**

In [None]:
# an 'if statement' is used when you want a command to be executed only when a condition is fulfilled.
# you can combine this tool with the 'for loop'

if (test_expression) {
   statement
}
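
Here is a sketch of an 'if statement' placed inside a 'for loop'. It counts how many elements of a made-up vector are positive. The names `values` and `positive_count` are chosen for this example only.

```r
values <- c(-2, 5, 0, 8, -1)

positive_count <- 0
for (v in values) {
    # Only add to the counter when the condition is TRUE.
    if (v > 0) {
        positive_count <- positive_count + 1
    }
}

positive_count
```

The same shape (loop over the elements, test each one, update a variable) is a useful starting point for the challenge problem.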

In [None]:
# ONLY LOOK AT THIS IF YOU ARE COMPLETELY STUCK AND YOU HAVE NO IDEA WHAT TO DO!!
# Below is a fill-in-the-blank hint!
 







# ARE YOU SURE? THINK A LITTLE HARDER IF YOU'RE STRUGGLING.









# Hint 6:
second_largest <- <FILL IN>(rand)
  
rand <- replicate(5, rnorm(5))

largest <- <FILL IN>
for (i in <FILL IN>){
  if(<FILL IN>) {
    <FILL IN>
  }
}

second <- <FILL IN>
for (<FILL IN>){
  if(<FILL IN>) {
    <FILL IN>
  }
}

print(rand)
print(second)

#### Your output should look something like this (but probably with different numbers):

Check yourself to see if your code actually output the second largest number!


In [None]:
>     print(rand)
            [,1]       [,2]        [,3]       [,4]       [,5]
[1,]  1.22459413 -0.6863706 -0.05042041 -0.1114614  0.3662244
[2,] -0.19765781  1.9087161  0.60332108  0.3450471  0.9855776
[3,] -0.01638604  0.8648472  1.78158580  0.7674530 -2.1440592
[4,]  0.16362135  1.5585552  1.45021696  0.5705995 -1.0773751
[5,] -0.82824386  1.0014212 -1.00584427 -1.3149331 -0.3898746

>     print(second)
[1] 1.781586

## Lists:

Moving on from the basics of R, we will now learn about lists, a fundamental building block for future data science endeavors.

Lists will open a gateway to group different objects, whether individually or in a vector, into a succinct bundle. This can be especially helpful when you want to compare differences in a neat format.



### Quick Review of Vectors:

Each vector is made up of elements of a single type: character, numeric, integer, boolean (called "logical" in R), or complex. They will follow the syntax below...

    - <code> vector_variable <- c('objects', 'separated', 'by', 'commas') </code>

==========================================================

- character ... <code> dms <- c("s", "e", "n", "d", "n", "u") </code>
- numeric ..... <code> int <- c(1.1, 2.2, 3.3, 5.5, 8.8) </code>
- integer ..... <code> num <- c(1L, 2L, 3L, 5L) </code> (the L suffix stores whole numbers as integers; without it, R stores them as numerics)
- boolean ..... <code> boo <- c(TRUE, FALSE, T, F) </code> (T and F are shorthand for TRUE and FALSE)
- complex ..... <code> cpx <- c(3 + 1i, -1 + 3i) </code> (the imaginary part needs a digit before the i)

Notice how dms has 6 elements, int has 5 elements, num has 4 elements, boo has 4 elements, and cpx has 2 elements.
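
If you are ever unsure which type a vector has, class() will tell you. The sketch below re-creates a few of the vectors from this review so that it runs on its own.

```r
dms <- c("s", "e", "n", "d", "n", "u")
int <- c(1.1, 2.2, 3.3, 5.5, 8.8)
boo <- c(TRUE, FALSE, TRUE, FALSE)

class(dms)   # "character"
class(int)   # "numeric"
class(boo)   # "logical"

# Whole numbers are stored as numeric unless you add the L suffix.
class(c(1, 2, 3, 5))
class(c(1L, 2L, 3L, 5L))

# Complex numbers need a digit before the i.
class(c(3 + 1i, -1 + 3i))

length(dms)  # number of elements
```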



### Lists: "Bundling" Vectors

When you hear the word list, you may think of a group of individual items written one after another, like a grocery list! In R, a "list" is a container whose components can each be a different kind of object: a single value, a vector, or even another list.
    
Each list will follow the format 

- <code> list_variable <- list(vector_1, vector_2, vector_3) </code>

For example,

(input)  ==> <code> bundle <- list(dms, int, num, boo, cpx) </code>

(output) ==> <code> bundle
                    [[1]]
                    [1] "s" "e" "n" "d" "n" "u"
                    [[2]]
                    [1] 1.1 2.2 3.3 5.5 8.8
                    [[3]]
                    [1] 1 2 3 5
                    [[4]]
                    [1]  TRUE FALSE  TRUE FALSE
                    [[5]]
                    [1]  3+1i -1+3i
            </code>


**NOTE**: By deduction, you can see that dms corresponds to [[1]], int to [[2]], and so on. Also notice that the output contains numbers in double square brackets and numbers in single square brackets. The double brackets give the position of each component in the list. The single brackets give the position, within that component's vector, of the first element printed on that line.


For example, if you want to pull the fourth vector from "bundle", simply type...
<code> 
    bundle[[4]] 
    
    [1]  TRUE FALSE  TRUE FALSE
</code>


If you want to extract the second element from the fourth vector of the list, simply type...
<code>  
    bundle[[4]][2]
    
    [1] FALSE    
</code>


If you be slippin' and forget the double brackets, R will not complain. Single brackets hand you back a smaller *list* that contains the vector, not the vector itself. That is fine if you only want to look at it. However, know that you will not get the element you wanted if you then try to extract an element from it: there is no error message, just a NULL. See below...

    You is GUCCI:
    INPUT : bundle[4]
    OUTPUT: [[1]]
            [1]  TRUE FALSE  TRUE FALSE

    You is NOT gucci:
    INPUT : bundle[4][2]
    OUTPUT: [[1]]
            NULL
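
The difference between single and double brackets can be checked with class(). This sketch uses a shortened, made-up version of `bundle` so that it runs on its own.

```r
# A shortened stand-in for the bundle list used in this lesson.
bundle <- list(c("s", "e"), c(1.1, 2.2), c(TRUE, FALSE, TRUE, FALSE))

class(bundle[3])     # "list": a smaller list holding one component
class(bundle[[3]])   # "logical": the vector itself

bundle[[3]][2]       # second element of the third vector
length(bundle[3])    # 1, which is why bundle[3][2] has nothing to return
```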
    



### Subscripting/Subsetting:

In programming, you use a subscript to identify elements in an array. In this context, we will also learn how to use subscripts in a list. 

Now say you want to have a "matrix-like" structure to organize your data. Now you want to extract some data. 

In [None]:
# _____Example_____

#Given:
people <- list(name = "Danny", friend = "Kenny", common_interests = 3, interest_names = c("basketball", "video games", "21"))

# **Notice how 21 in the vector, interest_names, is stored as a character, not a numeric!

# Remember that each component of the list has a position number, and each element of a vector has a position number,
## based on the order in which they are written.
# A subscript goes in brackets after the object's name. The output is whatever is stored
## at that position.

people[[1]]      # name:             "Danny"
people[[2]]      # friend:           "Kenny"
people[[3]]      # common_interests: 3
people[[4]]      # interest_names:   "basketball" "video games" "21"

people[[4]][1]   # "basketball"
people[[4]][2]   # "video games"
people[[4]][3]   # "21"

**WAIT!!!** Okay, this is cool if you have a small list with not a lot of items. What if your list is enormous??
Luckily, R has a way to solve this problem if you know the name of the vector in your list, but not its position.

====================================================================

**The format follows,** 

<code> list_name$component_name </code>



In [None]:
#For example, let's say we forget how to count to 4 but we know that "interest_names" is SOMEWHERE in our list! Simply type...
people$interest_names
#...and the vector stored under interest_names is printed!

#For more emphasis, each pair below refers to the same component:
identical(people$name, people[[1]])              # TRUE, both are "Danny"
identical(people$friend, people[[2]])            # TRUE, both are "Kenny"
identical(people$common_interests, people[[3]])  # TRUE, both are 3
identical(people$interest_names, people[[4]])    # TRUE, both are the vector of interests

#additionally,
people$interest_names[1]   # "basketball"
people$interest_names[2]   # "video games"
people$interest_names[3]   # "21"
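
Two related tools are worth knowing once a list has names. This is a minimal sketch that re-creates the `people` list so it runs on its own. The variable `wanted` is a made-up name for illustration.

```r
people <- list(name = "Danny", friend = "Kenny", common_interests = 3,
               interest_names = c("basketball", "video games", "21"))

names(people)                   # all component names, in order
people[["interest_names"]]      # same result as people$interest_names

# Double brackets also work when the name is stored in a variable.
wanted <- "friend"
people[[wanted]]

str(people)                     # compact overview of the whole list
```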

### Modifying Lists

When you want to add an object to a pre-existing list, follow the steps below...


In [None]:
# _____Example______

# 1) current list:
list_name <- list(name = "Danny", 
                  friend = "Kenny", 
                  common_interests = 3, 
                  interest_names = c("basketball", "video games", "21"))


# 2) you want your new list to look like this
#    (shown as a comment so that it does not overwrite the list above)
# list(name = "Danny", 
#      friend = "Kenny", 
#      common_interests = 3, 
#      interest_names = c("basketball", "video games", "21", "coding R"))


# 3) What do you do?? Append to the vector stored inside the list, then assign it back.
#    The general pattern is:
#    modified_list <- original_list
#    modified_list$vector_name <- c(original_list$vector_name, element_to_add)


# 4) which, for our example, is...
new_list <- list_name
new_list$interest_names <- c(list_name$interest_names, "coding R")


# 5) which should output...
new_list$interest_names
# [1] "basketball"  "video games" "21"          "coding R"

# Careful: c(list_name, interest_names = "coding R") does NOT do this. It adds a fifth
# component that is also named interest_names and leaves the original vector unchanged.
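
Besides extending a vector inside a list, you can also add, change, or remove whole components. The sketch below uses a shortened `people` list, and the values "Springfield" and "Stan" are made up for illustration.

```r
people <- list(name = "Danny", friend = "Kenny")

# Add a brand-new component by assigning to a name that does not exist yet.
people$city <- "Springfield"
names(people)

# Change an existing component by assigning to its name.
people$friend <- "Stan"

# Remove a component by assigning NULL to it.
people$city <- NULL
names(people)
```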



## Congratulations!
You're done with tonight's exercises! Check back to [the syllabus](https://github.com/JasonFreeberg/R_Tutorials/blob/master/README.md) for this week's homework and a quick look at next week's topics. 

**Suggested Homework:**
    - DataCamp: **Introduction** to R
        - Chapters 4 and 5: Factors and Data Frames
    - DataCamp: **Intermediate** R
        - Chapters 1 to 3: Conditionals and Control Flow, Loops, and Functions