## Introduction to R Programming

## Setup 

This guide was written for R 3.4. If you do not currently have R installed, please download it [here](https://www.r-project.org/).

### What is R?

R is a powerful language used primarily for data analysis and statistical computing. R has what we call `packages`, which can be used for almost any data science task. Packages such as `dplyr`, `tidyr`, `readr`, `data.table`, `SparkR`, and `ggplot2` have made data manipulation, visualization and computation much easier *and* faster.

### Why use R?

- It's open source
- 7800 packages available for computation tasks
- High performance computing experience 

### Comments 

In the context of computer science, comments are used for providing details throughout your code. They're particularly useful when you're working on something complex and want to remember why or what you did, as well as for when other people need to read your code and don't have you to explain it to them. 

In R, we denote comments with the `#` symbol, as follows:

``` R
# This is a comment!
```


### Print 

When interacting with a user, we might want to send messages to them. To do that, we send messages to the console via the `print` command.


In [1]:
print("The Dog is Cute")

[1] "The Dog is Cute"


# Data in R

At the core of R is the data we use, and its different forms. In this section, we'll review the different data types R supports and when to use each. But first, we'll begin with variables.


## Variables

Imagine you had no memory to store information you need on a regular basis. That would be miserable, right? You'd have to relearn everything so you can reference it in whatever context you need it for. In R and any other programming language, the 'memory' we use to reference information (or data) comes in the form of *variables*.

A variable is composed of two parts: its name and its value. The variable name is how you reference whatever piece or collection of data you need. Variables are names we assign values to. Why do we want to do this? Because without variables, we don't have a way of referencing and using data. A value can be many things, including another variable, but in most cases, the value is a <b>data type</b>.

In R, there are actually <b>two</b> ways of assigning values: `=` and `<-`. Typically though, we use `<-`, such as `my_val <- 4`.


### Data Types and Operators

Every programming language needs to store data and a way to work with this data. R, like other languages, breaks these data into types and provides different ways to interact with them. 

Everything you see or create in R is an <b>object</b>. A vector, a matrix, a data frame, even a variable is an object, and R treats it that way. R has 5 basic classes of objects:

- Character
- Numeric (Real Numbers)
- Integer (Whole Numbers)
- Complex
- Logical (True/False)

Objects of these classes can have attributes, such as the following:

- names 
- dimension names
- dimensions
- class
- length

Attributes of an object can be accessed using the `attributes()` function. We will get into what functions are later.
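
As a minimal sketch of how attributes show up in practice (the object names `scores` and `grid` are made up for illustration), note that `attributes()` only lists attributes that have been set explicitly, such as names or dimensions. Class and length are always available through `class()` and `length()`:

```R
# A named numeric vector: "names" is stored as an attribute
scores <- c(math = 90, art = 75)
attributes(scores)   # lists $names

# A matrix carries a "dim" attribute
grid <- matrix(1:6, nrow = 2)
attributes(grid)     # lists $dim

# class() and length() work on any object
class(scores)
length(scores)
```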


In [3]:
my_var <- -4
my_var = -4
print(my_var)
my_var = 8

[1] -4


# Challenge

Assign three variables called `var1`, `var2`, and `var3` to the values `1`, `"Byte"`, and `5.43`. 

# Data Collections

Frequently, your program will require that you store multiple data items together. This might be because you have a group of data that should be referenced together, or even to reduce the number of variables you have to define. Regardless of why, there are four data collections that you can utilize in R: Vectors, Matrices, Lists, and DataFrames. 

## Vectors

The most basic object in R is known as a vector, which contains objects of the same class. Let's try creating vectors of different classes. We can create a vector using `c()`:

In [2]:
a <- c(1.8, 4.5)   # numeric
b <- c(1 + 2i, 3 - 6i) # complex
d <- c(23L, 44L)   # integer (without the L suffix these would be numeric)

# Challenge

Using the variable `vec1`, create a vector with <b>5</b> numerical values.

### Matrices

When a vector is given rows and columns (the dimension attribute), it becomes a matrix. A matrix consists of elements of the same class, such as the following:

In [3]:
my_matrix <- matrix(1:6, nrow=3, ncol=2)
print(my_matrix)

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
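
To make the link between vectors and matrices concrete, here is a small sketch (the name `v` is arbitrary) that turns a plain vector into a matrix simply by setting its dimension attribute:

```R
v <- 1:6           # a plain integer vector
dim(v) <- c(3, 2)  # give it 3 rows and 2 columns
print(v)           # now prints as a 3 x 2 matrix, filled column by column
is.matrix(v)       # check whether v is now treated as a matrix
```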


# Challenge

1. Create two vectors with the values 1 to 5 and 10.5 to 12.5, respectively. Then concatenate these two vectors into one vector, named `vec1`. What is the class? Call the `class()` function and assign its result to the variable `class1`.

2. Change the 4th element of the above vector to the word 'four' and assign the result to the vector `vec2`. Did this change the class? Call the `class()` function and assign its result to the variable `class2`.

3. Using the `rep()` function, create a vector that repeats the values 1 2 3 twice. Assign this vector the variable `vec3`. (Result: 1 2 3 1 2 3)

4. Create a 3 by 4 matrix where each row has the same value. Assign this to the variable `matrix1`. (hint: use the `rep()` function)

5. Create a 4 by 3 matrix where each row has the same value. Assign this matrix to the variable `matrix2`. (hint: use the `rep()` function)

In [1]:
# Question 1

In [2]:
# Question 2

In [3]:
# Question 3

In [4]:
# Question 4

In [5]:
# Question 5

## Lists

Lists are present in R, as well as most other programming languages. A list is a data structure that can hold any number of other data structures of any type. For example, if you have a vector, a dataframe, and a character object, you can put all of those into one list object.

### Constructing a List

To begin constructing a list, we'll create three variables with different data types. Since lists support mixed types, we'll use these to add to a list.

In [1]:
vec <- 1:4
num <- 17
char <- "Hello!"

Then you can add all three objects to one list using the `list()` function:


In [3]:
list1 <- list(vec, num, char)

print(list1)

[[1]]
[1] 1 2 3 4

[[2]]
[1] 17

[[3]]
[1] "Hello!"



You can also turn an object into a list by using the `as.list()` function. Notice how every element of the vector becomes a different component of the list.
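
For instance, a quick sketch using the same `vec` as above (the name `vec_as_list` is just for illustration):

```R
vec <- 1:4
vec_as_list <- as.list(vec)  # each element becomes its own component
print(vec_as_list)
length(vec_as_list)          # one component per element of vec
```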

#### Manipulating a List

We can put names on the components of a list using the `names()` function, which is useful for extracting components. We could have also named the components when we created the list.


In [4]:
names(list1) <- c("Numbers", "Some.data", "Letters")
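
Naming the components at creation time might look like the following sketch, which assumes the same three values as above and uses a hypothetical name, `list2`, so that `list1` is left untouched:

```R
vec <- 1:4
num <- 17
char <- "Hello!"

# name = value pairs inside list() set the component names directly
list2 <- list(Numbers = vec, Some.data = num, Letters = char)
names(list2)
```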

#### Extracting Components

The first way you can extract an object from the list is by using the `[[ ]]` operator.

In [5]:
list1[[3]]

It's also possible to extract components using the component’s name, as shown below:

In [6]:
list1$Letters

#### Subsetting a List

If you want to take a subset of a list, you can use the `[ ]` operator and `c()` to choose the components: 

In [8]:
list1[c(1, 3)]

We can also add a new component to the list or replace a component using the `[[ ]]` operator, such as the following example:


In [9]:
list1[[4]] <- "new component"   # index 4 is the next free position; skipping ahead would leave a NULL gap

Finally, we can delete a component of a list by setting it equal to NULL:

In [10]:
list1$Letters <- NULL

### Describing Lists

Now we'll go over ways in which we can extract list properties.  

#### Class

Use `class()` to check the class of the list itself and the class of one of its components.

In [11]:
class(list1)

In [12]:
class(list1[[1]])

#### Size 

You can find the size of a list with the `length()` function, as in the following:


In [13]:
length(list1)

#### Converting

Finally, we can convert a list into a matrix, dataframe, or vector in a number of different ways. The first, most basic way is to use `unlist()`, which turns the whole list into one long vector. Because a vector holds a single class, components of different classes are coerced to a common class:

In [14]:
unlist(list1)

# Challenge 

1. Create a new vector that contains 0 to 6. Assign this vector to the variable `f`.

2. Create a new vector that contains the value 0 repeated 5 times. Assign this vector to the variable `r`.

3. Create a list with vectors `f` and `r`, as well as with the element, 'hello'. Assign this list to the variable `list1`.


In [None]:
# Question 1

In [None]:
# Question 2

In [None]:
# Question 3

## DataFrame

DataFrames are used to store tabular data. A DataFrame is similar to a matrix in that it has rows and columns, but it differs because its elements do <b>not</b> all have to be the same class. A DataFrame is built from a list of equal-length vectors, one per column, so each column can hold a different class while the values within a single column share one class.


In [15]:
df <- data.frame(name = c("ojas","jacob","mary","helen"), score = c(67,56,87,91))
print(df)

   name score
1  ojas    67
2 jacob    56
3  mary    87
4 helen    91
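
To see the mixed column classes for yourself, a sketch like the following can help. It rebuilds the same data under the hypothetical name `df_check` and passes `stringsAsFactors = FALSE`, because versions of R before 4.0 would otherwise store the `name` column as a factor:

```R
df_check <- data.frame(
  name = c("ojas", "jacob", "mary", "helen"),
  score = c(67, 56, 87, 91),
  stringsAsFactors = FALSE
)

sapply(df_check, class)  # class of each column
df_check$score           # a single column is an ordinary vector
dim(df_check)            # number of rows, then number of columns
```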


DataFrame objects are incredibly useful when working with tabular data, such as the contents of a csv file. You'll see the extent to which they become useful soon enough!


# Challenge

Using the variable `df1`, create a 3x3 dataframe using three lists.


To summarize this succinctly, 


| Structure | Multidimension | Multiple Types |
| --------- |:--------------:| --------------:|
| Vector    | Not Capable    | Not Capable    |
| Matrix    | Capable        | Not Capable    |
| List      | Not Capable    | Capable        |
| DataFrame | Capable        | Capable        |



## Functions

A function is a block of code that we invoke by using the function name followed by parentheses `()`.

We create a function with the reserved keyword `function`. In the parentheses after it, we state the parameters the function takes when called.

We call this "function declaration". It looks like this: `y <- function(x)`. Here `x` is the input, and `y` is the name we use to call the function; the output is whatever the function's body returns.

More broadly put, we have something like: 

```R
function_name <- function( arguments ) {
  # body: code that returns some computation of the arguments
}
```

The arguments are the input, and the value of the last expression evaluated in the body (or the value passed to an explicit `return()`) is the output.
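
As a concrete sketch (the name `square` is just an example and is not part of the challenges below), here is a function with one argument:

```R
# square takes one argument, x, and returns x multiplied by itself
square <- function(x) {
  x * x
}

result <- square(4)  # call the function by name, passing 4 as x
print(result)
```

The last expression evaluated in the body is the return value, so no explicit `return()` is needed here.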

# Challenge 

1. Write a function called `fun1` that takes x as an input and returns 2x. What value do we get when we run 5 through the function?

2. Write a function called `fun2` that takes two inputs, a and b, and returns (a + b)^2.

3. Write a function called `fun3` that takes two inputs, a and b, and returns a list. The first element of the list holds the result of the operation a + b (call this `add`) and the second element holds the result of the operation a - b (call it `sub`).


In [None]:
# Question 1

In [None]:
# Question 2

In [None]:
# Question 3

# Control Flow

In the context of programming, control flow allows a programmer to specify when a specific block of code is executed. Similar to how functions execute their block of code whenever they're called, control flow statements are executed according to our specifications.

### Booleans

We briefly mentioned booleans (the logical class) as a data type in R. Let's look at what this means in practice. Booleans are `TRUE` or `FALSE` values, and are generally produced as the result of a **condition**.
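
A few conditions and the logical values they produce, as a quick sketch (`x` and `is_small` are arbitrary names):

```R
x <- 5
x > 3            # is x greater than 3?
x == 10          # is x equal to 10?
is_small <- x < 3
class(is_small)  # conditions produce logical values
```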

## If/Else

R, among many other programming languages, provides us with an if/else statement to test a <b>condition</b> in our code. Conditions allow us to have control flow in our R programs, which means we have control over whether a particular piece of code is run or not. Our code decides this by testing the condition. 

A condition is an expression that works like a question and evaluates to either `TRUE` or `FALSE`. Below is the syntax:

``` R
if (condition) {
         ## statement 1
} else {
         ## statement 2 (note: else must sit on the same line as the closing brace)
}
```

Multiple conditions can be chained by repeating `else if`. Below is the syntax for four branches (three conditions plus a final `else`):

```R
if (condition_1) {
         ## statement 1
} else if (condition_2) {
         ## statement 2
} else if (condition_3) {
         ## statement 3
} else {
         ## statement 4
}
```
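
Filling in the template with real conditions might look like this sketch, where the variable `temperature` and the thresholds are invented for illustration:

```R
temperature <- 18

if (temperature > 25) {
  label <- "hot"
} else if (temperature > 15) {
  label <- "mild"
} else {
  label <- "cold"
}

print(label)
```

Only the first branch whose condition is `TRUE` runs; the remaining branches are skipped.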
   

# Challenge 

In the following challenges, write if-else statements and test them with the values -1, 0, 1.

1. If an input is greater than zero, print "this value is positive", else print "this value is negative or zero".

2. If an input is greater than zero, print double the value, else print triple the value.

3. If x is greater than 0, print "positive"; if x is less than 0, print "negative"; and if x is equal to 0, print "zero".

4. If x is greater than 0, print the value doubled; if x is less than 0, print the value tripled. Lastly, if x is equal to 0, print -100.


In [None]:
# Question 1

In [None]:
# Question 2

In [None]:
# Question 3

In [None]:
# Question 4

### For Loops

Remember those lists and vectors we made earlier? The ones that hold multiple values? If you don't, here is an example.

In [17]:
list1 <- list("dog", "cat", "bird", "turtle", "fish", "hamster", "lizard")

What if we  wanted to print every value in that list? We could do something like this:


In [18]:
print(list1[1])
print(list1[2])
print(list1[3])
print(list1[4])
print(list1[5])
print(list1[6])

[[1]]
[1] "dog"

[[1]]
[1] "cat"

[[1]]
[1] "bird"

[[1]]
[1] "turtle"

[[1]]
[1] "fish"

[[1]]
[1] "hamster"



But there is an easier way to do this: For loops. Loops are fundamental to all programming languages. Their purpose is to iterate through a data structure and interact with each element one by one.

Loops are very powerful programming tools, and you'll use them fairly frequently. They're useful because computers are very good at repeating identical or similar tasks without making errors.

Let's use a for loop to print each pet from the above example:

In [19]:
for (i in list1) { 
   print(i)
}

[1] "dog"
[1] "cat"
[1] "bird"
[1] "turtle"
[1] "fish"
[1] "hamster"
[1] "lizard"


Awesome! Let's look at another example below. 

#### Example

Let's write a for loop to print all numbers from 1 to 10.

In [20]:
for (i in 1:10) {
    print(i)
}

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10


# Challenge

FizzBuzz is a common interview brain teaser. 

- The `fizz_buzz` function will take a number as an argument.
- The function should print all integers starting at one, and going up to, and including, the input number.
- When you print the numbers, if the number you're printing is divisible by 3, print "Fizz" instead.
- When you print the numbers, if the number you're printing is divisible by 5, print "Buzz" instead.
- If the number is divisible by both 3 and 5, print "FizzBuzz".
- If the number is not divisible by 3 or 5, simply print the integer.

Your program's output should look like this:

```
1
2
Fizz
4
Buzz
...
```

### While Loops

Another common loop is the while loop.

This is similar to a for loop, but instead of iterating through a data structure, this loop will continue to run until a condition is no longer true.

Let's look at an example:

In [21]:
i <- 0
while (i < 10) {
   print(i)
   i <- i + 1
}

[1] 0
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9


Here's the flow of execution in a while loop:

1. Evaluate the condition, `i < 10`, yielding `TRUE` or `FALSE`.
2. If the condition is false, exit the while statement and continue execution at the next statement.
3. If the condition is true, execute each of the statements in the body and then go back to step 1.

In the example above, notice that at the end of every turn the loop increments `i` by 1, so eventually `i` reaches 10, the condition `i < 10` becomes false, and the loop stops.

Be careful when using while loops. If your condition always remains true, the loop will never end. This is known as an infinite loop.

Below is a similar example except the condition was changed and it has now become an infinite loop:

``` R
i <- 10
while (i > 5) {
   print(i)
   i <- i + 1
}
```

# Challenge

Using a while loop, print all the even numbers between 1 and 10 (including 10)

## Input/Output

Input and output refer to how our code interacts with a user or computer. There are many different methods to input and output data in R. In this section, we'll review the different ways in which R handles input and output.

### Source Command


The `source()` function loads and executes a script of R commands. For instance, if you have saved functions in another file, you can use `source()` to load that file instead of rewriting the functions. Make sure both files are in the current working directory, or include the path to the script.

In [22]:
print ("Hello World")

[1] "Hello World"


If a script containing the line above is loaded with `source()`, it prints "Hello World" just as if the line had been typed at the console.
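
Here is a self-contained sketch of `source()` in action. It writes a tiny script to a temporary file first, so the file path and the function `add_ten` are assumptions made purely for illustration:

```R
# Create a small script on disk (a temporary file stands in for your own .R file)
script_path <- tempfile(fileext = ".R")
writeLines("add_ten <- function(x) { x + 10 }", script_path)

# Load and execute the script; add_ten is now defined in this session
source(script_path)
answer <- add_ten(5)
print(answer)
```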


### Read a CSV

There are many formats and standards of text documents for storing data. One common family of formats is delimiter-separated values, such as comma-separated (CSV) or tab-delimited files. In R, CSV files can be read with the `read.csv()` function.
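
A minimal sketch of reading a CSV file with `read.csv()`. The file is created on the fly in a temporary location, so its contents and the names `csv_path` and `cities` are made up for this example:

```R
# Write a small comma-separated file to read back in
csv_path <- tempfile(fileext = ".csv")
writeLines(c("city,population", "Springfield,30000", "Shelbyville,25000"), csv_path)

# header = TRUE (the default) treats the first line as column names
cities <- read.csv(csv_path, header = TRUE)
print(cities)
```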


### Read a Table

The `read.table()` function can read delimited files and store the results in a data frame.
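
As a sketch of `read.table()` with an explicit delimiter, the example below reads a tab-delimited file. The file is again written to a temporary location, and the names `tsv_path` and `items` are illustrative assumptions:

```R
# Write a small tab-delimited file to read back in
tsv_path <- tempfile(fileext = ".txt")
writeLines(c("item\tprice", "pen\t1.5", "notebook\t3.25"), tsv_path)

# sep sets the delimiter; header = TRUE uses the first line as column names
items <- read.table(tsv_path, header = TRUE, sep = "\t")
print(items)
```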


# Challenge

1. Use the source command to load the `using_source_example.R` script and define a new variable `var` that runs this function for the value 7. Also print `var`. 

2. Load the file `values_squared.csv` as a variable `csv_var`. Print `csv_var` to see the results. Make sure to not include the first line of the csv file in the header. (hint: look at the header argument)

3. Load the file `values_squared.csv` as a variable `table_var`. Print `table_var` to see the results. Make sure to not include the first line of the csv file in the header. (hint: look at the header argument)

