## Binder link to this notebook

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ciakovx/ciakovx.github.io/master?filepath=RBasics.ipynb)

# R Basics

<div>
    <br>
    <a>
        <img src="images/R.png" style="width: 200px;">
    </a>
    <br>
</div>


## Writing & Evaluating Expressions

The *prompt* is the blinking cursor in a Code cell prompting you to take action. We type *expressions* into the prompt, and press Ctrl + Enter to *evaluate* those expressions.

Evaluate this expression:

In [None]:
# Press Ctrl + Enter or click the Run button to evaluate 2 + 2
2 + 2

## Assigning values

The first operator you're going to come across is the assignment operator, `<-`. Type it as a left angle bracket `<` (the "less than" symbol, which you'll get by pressing **Shift + comma**) followed immediately by a hyphen `-` (the key next to the zero key). There is no space between the two characters, and together they are designed to look like a left-pointing arrow.



In [None]:
# assign 5 to y
y <- 5

Here I am creating a symbol called `y` and I'm *assigning* it the numeric value 5. Some R users would say "y *gets* 5." Lowercase `y` is now a *numeric vector* with one element. Or you could say `y` is a numeric vector, and the first element is the number 5. When you assign something to a symbol, nothing is printed. If you are working in RStudio, you will notice a new object, `y`, in the Environment pane in the upper right.

If you now type `y` and evaluate it, R will print the elements that are assigned to `y` (the number 5). We can do this easily since `y` only has one element, but if you do this with a large dataset loaded into R, it will print the entire thing and flood your console. The `[1]` at the start of the output indicates that the number 5 is the first element of this vector.

In [None]:
# evaluate y
y

You can also use the `print()` function:

In [None]:
# using print() will do the same thing as just typing the variable in, it just makes it explicit
print(y)
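
The description of `y` as a numeric vector with one element can be checked directly. The following is a minimal sketch using only base R functions; `length()` and `class()` are covered in more detail later in this notebook.

```r
# Assign 5 to y, as above
y <- 5

# How many elements does y hold?
length(y)

# What type of data does y hold?
class(y)

# Is y a vector?
is.vector(y)
```

Each call returns a value rather than changing `y`, so you can run them in any order.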

---
**TRY IT YOURSELF**

1. Use the new code cells below. Make sure that 5 is assigned to `y` by typing in `y <- 5`
2. Assign the number 10 to variable `x`. Add `x` and `y` and evaluate the expression.
3. Assign `x + y` to variable `myTotal`. 

In [None]:
# assign 5 to y


In [None]:
# assign 10 to x


In [None]:
# assign the sum of x and y to myTotal. Print myTotal to the console.




### Tips for assigning values

* **Do not use names of functions that already exist in R:** The assignment operator assigns a value to a symbol. We can pick pretty much any symbol, or name, for that variable, but it is best to avoid names that already belong to a function in R. For example, you wouldn't want to name a variable `sum`, because you might end up in a confusing situation writing `sum(sum)`.
* **R is case sensitive**: It is important to note that R is *case sensitive.* If you try evaluating a capital `Y`, you will be told `Error in eval(expr, envir, enclos): object 'Y' not found`.
* **No blank spaces or symbols other than underscores and periods**: Names cannot contain spaces, so R users separate words either through capitalization (e.g. `myData`) or underscores (e.g. `my_data`). Periods are also allowed, as in the built-in `state.name`.
* **Do not begin with numbers or symbols**: Try to evaluate `1z <- 4` or `%z <- 4` and read the error message.
* **Be descriptive, but make your variable names short**: It's good practice to be descriptive with your variable names. If you're loading in a lot of data, choosing `myData` or `x` as a name may not be as helpful as, say, `ebookUsage`. Finally, keep your variable names short, since you will likely be typing them in frequently.
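
The naming tips above can be tried out with a short sketch. It assumes the base R function `make.names()`, which is not otherwise used in this notebook: it rewrites a string into a syntactically valid name, so a name that comes back unchanged was already valid. The names `candidates`, `is_valid`, and `total` are made up for illustration.

```r
# Candidate variable names: two valid, three that break the rules above
candidates <- c("myData", "my_data", "my data", "1z", "%z")

# A name is valid if make.names() leaves it unchanged
is_valid <- make.names(candidates) == candidates
is_valid

# Why reusing a function name is confusing: this still works, because R
# looks for a function when a name is followed by parentheses, but it is
# hard to read
sum <- c(1, 2, 3)
total <- sum(sum)
total

# Remove the variable so that `sum` refers only to the function again
rm(sum)
```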



## Calling a function

R is often described as a “functional programming language”: nearly everything you do with your data happens by calling a function. Call a function by typing its name followed by parentheses containing the values or variables (the *arguments*) you want it to work on.

In [None]:
# take the sum of 2 + 2
sum(2, 2)

In [None]:
?sum

Typing a question mark before a function name, as in `?sum`, will pull up its help page (in RStudio, it opens in the pane in the lower right). You can also call `help(sum)`. The help page provides the description of the function, how it is to be used, and its arguments.

In the case of `sum()`, the ellipsis `...` represents an unlimited number of numeric elements. `sum()` also takes the argument `na.rm`. This is a logical (`TRUE/FALSE`) argument specifying whether NA values (missing data) should be removed before the sum is computed.

The function `is.function()` will check if an argument is a function in R. If it is a function, it will print `TRUE` to the console.


In [None]:
# confirm that sum is a function
is.function(sum)

In [None]:
# sum takes an unlimited number (...) of numeric elements
sum(3, 4, 5, 6, 7)

In [None]:
# evaluating a sum with missing values will return NA
sum(3, 4, NA)

In [None]:
# look at the help file for sum
?sum

In [None]:
# but setting the argument na.rm to TRUE will remove the NA before summing
sum(3, 4, NA, na.rm = TRUE)

Functions can be nested within each other. For example, `sqrt()` takes the square root of the number provided in the function call. Therefore you can run `sum(sqrt(9), 4)` to take the square root of 9 (which is 3) and add it to 4. Or, with `a`, `b`, and `c` already assigned, you could write one root of the quadratic formula: `(-b + sqrt(b^2 - 4*a*c)) / (2*a)`.
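
Here is a small sketch of both nested calls written as runnable R. The coefficient names `coef_a`, `coef_b`, and `coef_c` are made up for this example (chosen so that the built-in `c()` function is not masked), and the equation x^2 - 3x + 2 = 0 is just a convenient test case.

```r
# Nest sqrt() inside sum(): the inner call is evaluated first
nested <- sum(sqrt(9), 4)
nested

# Coefficients of x^2 - 3x + 2 = 0
coef_a <- 1
coef_b <- -3
coef_c <- 2

# One root from the quadratic formula, with sqrt() nested in the arithmetic
root <- (-coef_b + sqrt(coef_b^2 - 4 * coef_a * coef_c)) / (2 * coef_a)
root
```

Note that R needs every multiplication written out with `*`, and uses parentheses rather than square brackets for grouping.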

## The `c()` function

A vector is a sequence of elements of the same *type*. Vectors are "homogeneous": every element must be of the same type. The type of a vector determines what kind of analysis you can do on it. For example, you can perform mathematical operations on `numeric` objects, but not on `character` objects.

Another important function is `c()`, which will combine arguments to form a vector. In some programs, such as Excel, this is called *concatenation*. If you read the help files for `c()` by calling `help(c)`, you can see that it takes an unlimited number of arguments, shown as `...`.

In [None]:
# use c() to combine three numbers into a vector, myFives
myFives <- c(5, 10, 15)

In [None]:
# call str() to see that myFives is a numeric vector of length 3
str(myFives)

In [None]:
# adding 5 will operate on each element of the vector. More on vectors below.
myFives + 5
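
Because a vector can only hold one type, `c()` converts mixed inputs to a single common type. The sketch below illustrates this with made-up vectors (`mixed`, `nums`) and uses base R's `tryCatch()` to capture the error message from attempting arithmetic on a character value.

```r
# Mixing numbers, text, and logicals: everything becomes character
mixed <- c(5, "ten", TRUE)
class(mixed)
mixed

# Mixing numbers and logicals: TRUE becomes 1
nums <- c(5, 10, TRUE)
class(nums)
nums

# Arithmetic on a character value fails; capture the message instead of stopping
result <- tryCatch("5" + 1, error = function(e) conditionMessage(e))
result
```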

## Missing values

If our data is missing values, we can use `NA` to represent those. R functions have special actions when they encounter NA. How you deal with missing data in your analysis is a decision you will have to make--do you remove it entirely? Do you replace it with zeros? That will depend on your own methodological questions.

You can use `is.na()` to test if a value is NA or not. Conversely, use `complete.cases()` to test if a value is not missing.

In [None]:
# Create a vector combining the first 5 letters of the alphabet with NA repeated five times, `rep(NA, 5)`
my_nas <- c(letters[1:5], rep(NA, 5))

# print my_nas to the console
my_nas

In [None]:
# Which values are NA?
is.na(my_nas)

In [None]:
# Which values are complete? 
complete.cases(my_nas)

In [None]:
# use the brackets to subset the data and include complete cases only
my_nas[complete.cases(my_nas)]
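
The choice between removing missing values and replacing them with zeros changes your results. The sketch below shows the difference on a made-up numeric vector, `usage`; the numbers are invented for illustration only.

```r
# A hypothetical vector of counts with two missing values
usage <- c(12, NA, 7, NA, 3)

# Count how many values are missing
sum(is.na(usage))

# Option 1: drop the NAs when computing the mean
mean_removed <- mean(usage, na.rm = TRUE)
mean_removed

# Option 2: replace the NAs with zero, then compute the mean
usage_zero <- usage
usage_zero[is.na(usage_zero)] <- 0
mean_zero <- mean(usage_zero)
mean_zero
```

The two means differ because the second one counts the missing observations as real zeros.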

Helpful functions to provide information about vectors:

* `length()` : number of elements in the vector
* `class()` : returns the data type
* `str()` : compactly displays information about the vector
* `is.logical()`, `is.numeric()`, `is.character()`, etc. : verifies the data type (TRUE or FALSE)
* `as.logical()`, `as.numeric()`, `as.character()`, etc. : coerces the vector from one data type to another
* `is.na()` or `complete.cases()` : returns logical (TRUE/FALSE) vector of values that are NA or, conversely, that are complete
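
As a quick sketch, here are several of the functions listed above applied to one made-up character vector, `scores`. The `suppressWarnings()` wrapper is an addition for this example: `as.numeric()` warns when it cannot convert a value and returns `NA` for it.

```r
# A hypothetical vector of numbers stored as text, with one bad entry
scores <- c("10", "20", "thirty")

length(scores)        # number of elements
class(scores)         # data type
str(scores)           # compact summary
is.numeric(scores)    # is it numeric?

# Coerce to numeric; "thirty" cannot be converted and becomes NA
scores_num <- suppressWarnings(as.numeric(scores))
scores_num
is.na(scores_num)
```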

---
**TRY IT YOURSELF**

1. Use `is.function()` to check if `average` is a function. Use it to see if `mean` is a function.
2. Combine 5, 10, 15, NA into a vector `my_vec`
3. Write an expression to get the average of the numbers in `my_vec`. Remove the NA if necessary.

In [None]:
# Assign the sum of 20 and 40 to variable `mySum`


In [None]:
# average is not defined in base R, so this call raises an error instead of returning FALSE
is.function(average)

In [None]:
is.function(mean)

In [None]:
# Combine 5, 10, 15, NA into a vector `my_vec`


In [None]:
# Write an expression to get the average of the numbers in `my_vec`. Remove the NA if necessary.


## Subsetting vectors

You can use brackets to subset a vector. Brackets can take either numeric values (each number is the position of an element in the vector) or logical (T/F) values. You can also use functions that return numeric positions, such as `which()`.

In [None]:
# state.name is a built-in vector in R of all U.S. states
state.name

In [None]:
# use [1] after the state.name vector to print the first state
state.name[1]

In [None]:
# use [1:5] to print the first 5 states
state.name[1:5]

In [None]:
# You must use the `c()` function if you have more than one non-consecutive value
state.name[c(1, 10, 20)]

In [None]:
# Using two equals == will scan through each value in the vector and check to see if the "Alabama" string matches.
# Notice it returns only one TRUE: the first item
state.name == "Alabama"

In [None]:
# which elements in the state.name vector match "Alabama"?
which(state.name == "Alabama")

In [None]:
# subset state.name using the same TRUE/FALSE index
state.name[state.name == "Alabama"]
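
The same bracket techniques work on numeric vectors, where the logical test can be a comparison such as `>`. This is a minimal sketch on a made-up vector, `checkouts`; it also shows negative indexing, which drops an element rather than selecting it.

```r
# A hypothetical numeric vector
checkouts <- c(5, 10, 15, 20, 25)

# A negative index drops that element: everything except the first
all_but_first <- checkouts[-1]
all_but_first

# A logical test inside the brackets keeps the elements where it is TRUE
large <- checkouts[checkouts > 12]
large

# which() returns the positions where the test is TRUE
large_positions <- which(checkouts > 12)
large_positions

# A logical vector written by hand works the same way
every_other <- checkouts[c(TRUE, FALSE, TRUE, FALSE, TRUE)]
every_other
```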

---
**TRY IT YOURSELF**

In [None]:
# Subset `state.name` to include only element number 25. What state is element 25?



In [None]:
# Subset `state.name` to include elements 3, 12, and 40. You will need to use the `c()` function.


In [None]:
# Alabama is element number 1. What element number is New York?
