<div style="display:flex;margin: 20px 10%">
<img src="http://i.imgur.com/BV0CdHZ.png?2" alt="ADSA logo" style="height:200px;vertical-align:middle"/>
</div>

# Fall 2018 ADSA Workshop - Intro to R Programming

The workshop is adapted from:
* http://www.tutorialspoint.com/r/r_tutorial.pdf
* https://stat.ethz.ch/R-manual/R-devel/library/graphics/html/hist.html
* https://www.tutorialspoint.com/r/r_scatterplots.htm
* https://stat.ethz.ch/R-manual/R-devel/library/graphics/html/abline.html
* https://stat.ethz.ch/R-manual/R-devel/library/base/html/replace.html

-----

## Data Types in R:

There are several data types in R, including logical, numeric, integer, complex, and character.
 
### Logical:

In [None]:
v <- TRUE
print(class(v))

### Integer and Numeric:

The difference between these two types is that integers cannot have a fractional part. By default, R treats any number you type as numeric (double precision) unless you add the suffix "L", which makes it an integer.

In [None]:
v <- 2
print(class(v))

In [None]:
v <- 2L
print(class(v))
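As a small sketch of the distinction above, `typeof()` and `is.integer()` show how R stores each value. The variable names `a` and `b` are illustrative only.

```r
# A bare number is stored as a double, even with no fractional part
a <- 2
# The L suffix asks R for an integer instead
b <- 2L

typeof(a)        # "double"
typeof(b)        # "integer"
is.integer(a)    # FALSE
is.integer(b)    # TRUE

# Both still count as numbers in the general sense
is.numeric(a)    # TRUE
is.numeric(b)    # TRUE

# as.integer() drops the fractional part (truncation toward zero)
as.integer(2.9)  # 2
```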

### Complex:

In [None]:
v <- 3+2i
print(class(v))

### Character:

In [None]:
v <- "ADSA"
print(class(v))


------
## Operations in R:

R supports many different operations, such as:

### Simple Arithmetic:

In [None]:
z <- 2
y <- 4

In [None]:
y + z

In [None]:
y - z

### More complex operations:

In [None]:
sqrt(y)

In [None]:
# Both lines compute y to the power (z + y); ** is an alternative spelling of ^
y ^ (z + y)
y ** (z + y)
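The cells above can be extended with a few more operators. This is only a sketch: the integer-division and remainder operators (`%/%` and `%%`) are not used elsewhere in this workshop and are shown purely for illustration.

```r
z <- 2
y <- 4

y * z          # 8
y / z          # 2

# The two power operators give the same result
y ^ (z + y)    # 4096
y ** (z + y)   # 4096

# Integer division and remainder
y %/% 3        # 1
y %% 3         # 1
```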

------

## Assignments in R:

Assign values in R using the '<-' symbol. Values of the same data type can be stored together in a vector using the combine function 'c'.

In [None]:
values <- c(1,2,3,4)
values

In [None]:
names <- c("Jake", "Jessica", "Rachel", "Alex")
names

In [None]:
# Mixed types are coerced to a single type: here TRUE/FALSE become 1/0
vector <- c(TRUE,FALSE,23,24,25)
vector
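Because a vector holds a single data type, `c()` converts mixed inputs to the most general type present. A minimal sketch, with illustrative variable names:

```r
# Logical values mixed with numbers become numbers
mixed_num <- c(TRUE, FALSE, 23)
mixed_num          # 1 0 23
class(mixed_num)   # "numeric"

# Anything mixed with a string becomes a string
mixed_chr <- c(1, "a", TRUE)
mixed_chr          # "1" "a" "TRUE"
class(mixed_chr)   # "character"
```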

Pass arguments to functions using the '=' symbol. Inside a function call, '=' only names an argument, while '<-' also creates a variable in your workspace.

In [None]:
median(x=1:10)
x # Error: object 'x' not found ('=' only named the argument; no variable was created)

In [None]:
median(x<-1:10)
x # has been defined in the user workspace, so it will return 1 2 3 4 5 6 7 8 9 10
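The two cells above can be checked without triggering an error by asking whether `x` exists. This sketch first removes any `x` left over from earlier cells so the check is meaningful.

```r
# Start from a clean slate
if (exists("x", inherits = FALSE)) rm(x)

median(x = 1:10)              # 5.5; x only names the argument
exists("x", inherits = FALSE) # FALSE

# The assignment happens when median() evaluates its argument
median(x <- 1:10)             # 5.5
exists("x", inherits = FALSE) # TRUE
x                             # 1 2 3 4 5 6 7 8 9 10
```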

Square brackets [] access values within a vector, which lets us fetch and modify individual elements. Indexing starts at 1.

In [None]:
values[3]
names[2]

In [None]:
values[3] <- 2
values
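Square brackets also accept several positions, negative positions, and logical conditions. The sketch below uses a made-up vector named `scores` so that it does not overwrite `values`.

```r
scores <- c(10, 20, 30, 40)

scores[1]            # 10 (the first element has index 1)
scores[c(2, 4)]      # 20 40
scores[-1]           # 20 30 40 (everything except the first element)
scores[scores > 15]  # 20 30 40 (elements where the condition is TRUE)

# Modify one element in place
scores[2] <- 99
scores               # 10 99 30 40
```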

------

## Lists, Matrices, and Arrays:

Lists are an R object type that can hold elements of different types, such as vectors, functions, and even other lists.

In [None]:
list1 <- list("Hello", c(1,2,3), 22, cos)
print(list1)
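To pull a single element out of a list, use double square brackets; single brackets return a smaller list instead. A short sketch using the same list as above:

```r
list1 <- list("Hello", c(1, 2, 3), 22, cos)

list1[[1]]       # "Hello"
list1[[2]][3]    # 3 (third value of the vector stored in slot 2)
class(list1[2])  # "list" (single brackets keep the list wrapper)

# The fourth element is the cos function, so it can be called
list1[[4]](0)    # 1
```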


Matrices are two-dimensional data sets that can be created from a vector.

In [None]:
m <- matrix(c('a','b','c','d'), nrow = 2, ncol = 2, byrow = TRUE)
print(m)
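The `byrow` argument controls whether the input vector fills the matrix row by row or column by column (the default). A minimal sketch comparing the two, with illustrative names:

```r
by_row <- matrix(c('a','b','c','d'), nrow = 2, ncol = 2, byrow = TRUE)
by_col <- matrix(c('a','b','c','d'), nrow = 2, ncol = 2)

# Elements are addressed as [row, column]
by_row[1, 2]   # "b"
by_col[1, 2]   # "c"

dim(by_row)    # 2 2
by_row[2, ]    # "c" "d" (the whole second row)
```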

Arrays generalize matrices to any number of dimensions.

In [None]:
a <- array(m, dim = c(2,2,2))
print(a)
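In the cell above, the matrix has only four values but the array needs eight, so R recycles the values to fill the second 2x2 layer. A sketch that checks this, using the illustrative name `arr`:

```r
m <- matrix(c('a','b','c','d'), nrow = 2, ncol = 2, byrow = TRUE)
arr <- array(m, dim = c(2, 2, 2))

dim(arr)        # 2 2 2

# Elements are addressed as [row, column, layer]
arr[1, 2, 1]    # "b"
arr[1, 2, 2]    # "b" (the second layer repeats the first)

identical(arr[, , 1], arr[, , 2])  # TRUE
```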

----

# Using R to Visualize Data


R is mostly used on large data sets rather than a handful of values, and such data is often stored in CSV files. CSV stands for comma-separated values: each line holds one row, and the columns are separated by commas. By default, `read.csv` treats the first line as the column names. The data can be visualized in many different ways, as we will explore below.

`setwd(dir)`

*dir*: path to the folder you want to be working in

In [None]:
# Set up the current working directory
setwd("/home/nbuser/library")

In [None]:
# Read csv into a data frame
data = read.csv("tips.csv")
head(data)
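If `tips.csv` is not at hand, the CSV layout can be tried out with a tiny temporary file. The rows below are made up for illustration and are not taken from the real data set; only the column names `totbill`, `size`, and `day` come from this workshop.

```r
# Write a three-row CSV to a temporary file (values are invented)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("totbill,size,day",
             "16.99,2,Sun",
             "10.34,3,Sun",
             "21.01,3,Sat"),
           csv_path)

# The first line becomes the column names
demo <- read.csv(csv_path)
names(demo)    # "totbill" "size" "day"
nrow(demo)     # 3
demo$totbill   # 16.99 10.34 21.01

# str() summarizes the type of each column
str(demo)
```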



Single Boxplot:

In [None]:
boxplot(data$totbill, main="How Much People Spent For a Meal", ylab = "Amount of money spent in US dollars")

Exercise: Boxplot of the tips people gave for a meal

Exercise: Class-conditional boxplot of money spent for a meal by day

Exercise: Class-conditional boxplot of money spent by time (day or night)

Class-conditional boxplot of money spent by table size: total and per person

In [None]:
boxplot(totbill~size, data = data, main = "How Much People Spent For a Meal - Table Size", xlab = "Number of people for one table", ylab = "Amount of money spent in US dollars")
data$billPerPerson = data$totbill / data$size
boxplot(billPerPerson~size, data = data, main = "How Much People Spent For a Meal - Table Size", xlab = "Number of people for one table", ylab = "Amount of money spent per person")
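The formula `y ~ group` tells `boxplot` to draw one box per value of `group`. The sketch below uses a small made-up data frame named `toy` and asks `boxplot` for its statistics instead of a picture, so the group medians can be read directly.

```r
# Invented bills for tables of 2 and 4 people
toy <- data.frame(totbill = c(10, 12, 14, 30, 34, 38),
                  size    = c(2, 2, 2, 4, 4, 4))
toy$billPerPerson <- toy$totbill / toy$size

# plot = FALSE returns the numbers behind each box without drawing
bp <- boxplot(billPerPerson ~ size, data = toy, plot = FALSE)
bp$names        # "2" "4" (one box per table size)
bp$stats[3, ]   # 6.0 8.5 (row 3 holds the median of each group)

# The same medians computed directly
tapply(toy$billPerPerson, toy$size, median)
```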

### Constructing Histograms:

In [None]:
hist(data$totbill)

Setting up breakpoints in the histogram

In [None]:
breakpoints <- seq(0, 56, 2)
hist(data$totbill, breaks = breakpoints, main = "How Much People Spent For a Meal", xlab = "Amount of Money Spent in US Dollars", ylab = "Number of Transactions")
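To see what the breakpoints do, it can help to look at the bin counts that `hist` computes. This sketch uses made-up bills and the illustrative names `bills` and `toy_breaks`. Note that the breakpoints must cover the whole range of the data, otherwise `hist` stops with an error.

```r
bills <- c(3.5, 7.2, 8.0, 12.4, 15.1, 19.9)  # invented values
toy_breaks <- seq(0, 20, 5)                  # 0 5 10 15 20

# plot = FALSE returns the bins and counts without drawing
h <- hist(bills, breaks = toy_breaks, plot = FALSE)
h$breaks   # 0 5 10 15 20
h$counts   # 1 2 1 2
```

Each count covers one interval between neighboring breakpoints, so five breakpoints give four bars.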

In [None]:
# Setting Up the limit on x/y-axis
hist(data$totbill, breaks = breakpoints, main = "How Much People Spent For a Meal", xlab = "Amount of Money Spent in US Dollars", ylab = "Number of Transactions", xlim = c(0, 20))
hist(data$totbill, breaks = breakpoints, main = "How Much People Spent For a Meal", xlab = "Amount of Money Spent in US Dollars", ylab = "Number of Transactions", ylim = c(0, 40))

Creating conditional histograms of how much was spent for a meal, across days and across time of day

In [None]:
library(lattice)
histogram(~totbill | day, data = data, main = "How Much People Spent For a Meal - Across Days", xlab = "Amount of Money Spent", ylab = "Percent of Total")
histogram(~totbill | time, data = data, main = "How Much People Spent For a Meal - Across Time", xlab = "Amount of Money Spent", ylab = "Percent of Total")

### Constructing Scatter Plots:

In [None]:
data = read.csv("Stat100_2013fall_survey01.csv")
head(data)
# Scatter plot
plot(data$height, data$weight, xlab = "Height", ylab = "Weight", main = "Height-Weight Relationship for Students in Stat 100")
# Regression Line
# Note the argument order: plot(x, y) but abline(lm(y ~ x))
abline(lm(data$weight~data$height))
# lm(data$weight~data$height)
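The line drawn by `abline` comes from the intercept and slope that `lm` estimates. A minimal sketch with made-up, exactly linear data (not the survey data) makes those two numbers easy to verify by hand:

```r
# Invented points that lie exactly on weight = 5 * height - 190
height <- c(60, 62, 64, 66, 68)
weight <- c(110, 120, 130, 140, 150)

# Fit weight as a function of height
fit <- lm(weight ~ height)
coef(fit)   # intercept about -190, slope about 5

# Draw the points, then the fitted line on top
plot(height, weight, xlab = "Height", ylab = "Weight")
abline(fit)
```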

In [None]:
# Label for Scatter Plots
data$genderChar = substr(data$gender, 1, 1)
data$genderChar = toupper(data$genderChar)
plot(data$height, data$weight, xlab = "Height", ylab = "Weight", main = "Height-Weight Relationship for Students in Stat 100 - Gender", type="n")
text(data$height, data$weight, labels=as.character(data$genderChar))

In [None]:
# Bonus: Color Different Categories in Scatter Plot
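# replace() swaps the letters for the characters "1" and "2"
data$genderChar = replace(data$genderChar, data$genderChar == 'M', 1)
data$genderChar = replace(data$genderChar, data$genderChar == 'F', 2)
# Convert to numbers so they can index the color vector below
data$genderChar = as.numeric(data$genderChar)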
plot(data$height, data$weight, xlab = "Height", ylab = "Weight", main = "Height-Weight Relationship for Students in Stat 100", col = c("blue", "red")[data$genderChar])
legend("topright", legend = c("M", "F"), col=c("blue", "red"), pch=1)
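The coloring trick relies on indexing a two-element color vector with a vector of 1s and 2s, which returns one color per point. A sketch of just that step, using a made-up vector of labels and illustrative variable names:

```r
labels <- c("M", "F", "F", "M")   # invented labels

# Same recoding steps as above: letters to "1"/"2", then to numbers
codes <- replace(labels, labels == "M", 1)
codes <- replace(codes, codes == "F", 2)
codes                  # "1" "2" "2" "1" (still character)
codes <- as.numeric(codes)
codes                  # 1 2 2 1

# Indexing repeats each color as often as its code appears
point_colors <- c("blue", "red")[codes]
point_colors           # "blue" "red" "red" "blue"
```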

In [None]:
plot(data$ageMother, data$ageFather, main = "Age for Mother and Father", xlab="Age of Mother", ylab = "Age of Father", col = c("blue", "red")[data$genderChar])
legend("bottomright", legend = c("M", "F"), col=c("blue", "red"), pch=1)