## Topics Covered

1. The case for R
2. The command line + variables
3. Types of objects - numeric, int, character, data frames
    - indexing
4. Functions
    - creating functions
5. Loops + if/else blocks
6. Break! (I think)
7. Importing data
8. Tidying data + intro to the tidyverse packages
9. Data vis

### Why R?

While spreadsheet environments like Excel are still quite powerful and useful for data work, the ability to programmatically manipulate data brings with it some unique advantages.

- **Scalable** - Code that analyses a certain dataset is size-agnostic - the same code could theoretically be used to work with larger inputs that still have the same structure. 
- **Reproducible** - A code-based analysis inherently documents its own workflow, which is useful to refer back to.
- **Efficient** - Code can easily be converted into an executable script, which helps automate tasks.

Why R specifically?

- it was designed by statisticians specifically as a statistical programming environment
- its base functionality allows for publication-quality figures
- its functionality has been expanded with thousands of user-contributed packages
- it's entirely free and open source!

### RStudio or Jupyter Notebook?

Either works, honestly - it's all a matter of preference.

Jupyter is arguably more conducive to documenting a research workflow with its inline outputs and Markdown functionality, but RStudio's interface makes it great for putting together scripts and debugging. RStudio also contains a host of useful point-and-click features for things like importing data and viewing data frame structures.

### The Command Line

In [None]:
1 + 1

In [None]:
my.num <- 1

In [None]:
my.num + 2

In [None]:
class(my.num) 

### Types of Objects

In [None]:
class(my.num) 

In [None]:
class(2) 

In [None]:
my.word <- 'hey'
class(my.word)

In [None]:
as.integer(2) 

In [None]:
as.integer(2.5) 

In [None]:
as.numeric(2) 

In [None]:
as.character(2) 
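
The conversion functions above can change a value quietly, so a few edge cases are worth seeing side by side. This is a minimal sketch; the variable names are made up for illustration.

```r
# as.integer() drops the fractional part (truncation toward zero); it does not round
trunc.pos <- as.integer(2.9)    # 2
trunc.neg <- as.integer(-2.9)   # -2

# a plain number like 2 is stored as a double; the L suffix makes an integer literal
class(2)     # "numeric"
class(2L)    # "integer"

# character strings that look like numbers convert cleanly...
parsed <- as.numeric('3.14')    # 3.14

# ...but anything else becomes NA (R also raises a warning, silenced here)
not.a.num <- suppressWarnings(as.numeric('hey'))
is.na(not.a.num)                # TRUE
```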

In [None]:
# R allows for easy vectorization
my.nums <- c(1,2,3) 
my.nums

In [None]:
length(my.nums) 

In [None]:
class(my.nums) 

In [None]:
my.nums[1] 
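
The vectorization mentioned a few cells up means that arithmetic and comparisons apply to every element at once. A short sketch, using a fresh vector `nums` (an illustrative name) with the same values as `my.nums`:

```r
nums <- c(1, 2, 3)   # same values as my.nums above

# arithmetic is applied element by element, no loop needed
doubled <- nums * 2                 # 2 4 6
shifted <- nums + c(10, 20, 30)     # 11 22 33

# comparisons are vectorized too and return a logical vector
big <- nums > 1                     # FALSE TRUE TRUE

# indexing starts at 1, not 0; negative indices drop elements
first <- nums[1]                    # 1
rest <- nums[-1]                    # 2 3
picked <- nums[nums > 1]            # 2 3
```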

In [None]:
# can also check object type
is.numeric(my.nums) 

In [None]:
class(TRUE) 

In [None]:
class(is.numeric(my.nums)) 

In [None]:
my.list <- list(1, 2, 'hey')

In [None]:
class(my.list)

In [None]:
mydf <- data.frame(col1 = c(1,2,3), col2 = c(5,6,7)) 

In [None]:
mydf

In [None]:
mydf[1] 

In [None]:
mydf[1,2] 

In [None]:
mydf$col1

In [None]:
mydf[,1] 

In [None]:
mydf[1,] 
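
The indexing forms above look similar but return different kinds of objects. The sketch below rebuilds the same data frame under the illustrative name `df` and checks what each form gives back.

```r
df <- data.frame(col1 = c(1, 2, 3), col2 = c(5, 6, 7))   # same shape as mydf

class(df[1])        # "data.frame": single brackets keep the data frame wrapper
class(df[, 1])      # "numeric": the row-comma-column form drops to a vector
class(df[[1]])      # "numeric": double brackets extract the column itself
class(df$col1)      # "numeric": same result as df[['col1']]

one.cell <- df[1, 2]   # row 1, column 2 -> 5
one.row <- df[1, ]     # still a data frame, with 1 row and 2 columns
dim(one.row)           # 1 2

# drop = FALSE keeps a single column as a data frame
class(df[, 1, drop = FALSE])   # "data.frame"
```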

In [None]:
otherdf <- data.frame(a = c(1,5,7), b = c(2,5,8)) 
otherdf

In [None]:
otherdf[which(otherdf$a == otherdf$b),] 
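
The `which()` call above can be broken into steps to see what it does. A small sketch, with `df`, `matches` and `positions` as illustrative names:

```r
df <- data.frame(a = c(1, 5, 7), b = c(2, 5, 8))   # same values as otherdf

matches <- df$a == df$b        # FALSE TRUE FALSE
positions <- which(matches)    # 2: the row numbers where the test is TRUE

by.position <- df[positions, ]   # row 2 only
by.logical <- df[matches, ]      # the logical vector selects the same row

# the two approaches differ when the data contain NA:
with.na <- c(1, NA, 3)
which(with.na > 2)       # 3: which() silently skips the NA
with.na[with.na > 2]     # NA 3: logical indexing keeps an NA placeholder
```

In other words, `which()` is optional for clean data but is the safer choice when a column may contain missing values.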

In [None]:
# a brief aside on comparison operators
1 == 1

In [None]:
2 < 3

In [None]:
2 <= 3

In [None]:
# careful! =< is not a valid operator, so this line throws a syntax error (use <= instead)
2 =< 3

### Functions

In [None]:
my.nums

In [None]:
mean(my.nums) 

In [None]:
?seq

In [None]:
seq(from = 0, to = 10, by = 2) 

In [None]:
# args can also be positional
seq(0, 10, 2) 

In [None]:
# creating functions

sumsquares <- function(num1, num2){
    out <- num1^2 + num2^2
    return(out)
} 

In [None]:
sumsquares(2, 3) 
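
Functions can also take default argument values, like the `by` argument of `seq()`. The function below, `sumpowers`, is a hypothetical variant of `sumsquares` written only to illustrate this.

```r
# hypothetical variant of sumsquares: the exponent defaults to 2
sumpowers <- function(num1, num2, power = 2) {
    # the value of the last expression is returned, so return() is optional
    num1^power + num2^power
}

squares <- sumpowers(2, 3)              # 13, same as sumsquares(2, 3)
cubes <- sumpowers(2, 3, power = 3)     # 35

# because ^ and + are vectorized, the function also accepts vectors
pairs <- sumpowers(c(1, 2), c(3, 4))    # 10 20
```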

### Loops

In [None]:
my.nums

In [None]:
for (num in my.nums){
    print(num) 
}

In [None]:
for (num in my.nums){
    print(num * 2)
} 

In [None]:
my.nums <- c(my.nums,4,5,6) 

In [None]:
my.nums

In [None]:
for (num in my.nums){
    if (num < 5){
        print(num)
    }
} 

In [None]:
for (num in my.nums){
    if (num %% 2 == 0){
        print(num)
    } else {
        print('odd')
    }
} 

In [None]:
for (num in my.nums){
    if (num %% 2 == 0){
        print(num)
    } else {
        next
    }
} 
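
Many loops like the ones above can be replaced with a single vectorized expression. The sketch below shows that alternative alongside an index-based loop; `nums`, `evens`, `labels` and `doubled` are illustrative names.

```r
nums <- c(1, 2, 3, 4, 5, 6)   # same values as my.nums at this point

# filtering without a loop: keep only the even numbers
evens <- nums[nums %% 2 == 0]                      # 2 4 6

# ifelse() is the vectorized form of an if/else block
labels <- ifelse(nums %% 2 == 0, 'even', 'odd')

# when a loop is needed to fill a result, preallocate it and loop over indices
doubled <- numeric(length(nums))
for (i in seq_along(nums)) {
    doubled[i] <- nums[i] * 2
}

# note: outside of a { } block, else must sit on the same line as the
# closing brace of the if, i.e. `} else {`, or R reports a syntax error
```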

### Importing Data + Tidying Data

Now for the actual fun stuff!

In [None]:
mydata <- read.csv('weather072003-122006.csv') 

In [None]:
head(mydata)

Well, this doesn't look good: the file starts with lines of metadata, and `read.csv` has read them in as if they were data. How can we import the file without them?

In [None]:
mydata <- read.csv('weather072003-122006.csv', skip = 18, header = TRUE) 
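
To see what `skip` does without needing the weather file, the sketch below writes a small stand-in CSV to a temporary file. Its contents are made up for illustration and are not taken from the real dataset.

```r
# a stand-in file: 2 lines of metadata, then a header row and the data
tmp <- tempfile(fileext = '.csv')
writeLines(c('Station Name,Example',
             'Province,Example',
             'Date,Mean Temp',
             '2003-07,20.1',
             '2003-08,19.4'), tmp)

# peeking at the raw lines is a quick way to count how many to skip
readLines(tmp, n = 4)

# without skip, the metadata rows end up mixed in with the data
raw <- read.csv(tmp, header = FALSE)
nrow(raw)          # 5

# skip = 2 jumps past the metadata; header = TRUE then reads the column names
clean <- read.csv(tmp, skip = 2, header = TRUE)
colnames(clean)    # "Date" "Mean.Temp" (the space is replaced with a dot)
nrow(clean)        # 2
```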

In [None]:
head(mydata) 

In [None]:
dim(mydata) 

In [None]:
colnames(mydata) 

In [None]:
library(tidyverse) 

In [None]:
?select

In [None]:
mydata <- select(mydata, -ends_with('Flag')) 
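
The same `select()` pattern can be tried on a toy data frame. The column names below are invented for illustration; only the `Flag` suffix mirrors the weather data.

```r
library(dplyr)

# toy data: each measurement has a companion quality-flag column
toy <- data.frame(Temp = c(1, 2), Temp.Flag = c('E', ''),
                  Rain = c(0, 3), Rain.Flag = c('', 'M'))

# the minus sign drops every column whose name ends with 'Flag'
kept <- select(toy, -ends_with('Flag'))
colnames(kept)        # "Temp" "Rain"

# for comparison, one way to do the same thing in base R
kept.base <- toy[, !grepl('Flag$', colnames(toy))]
colnames(kept.base)   # "Temp" "Rain"
```

Other helpers such as `starts_with()` and `contains()` work the same way inside `select()`.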

In [None]:
colnames(mydata) 

In [None]:
head(mydata) 

In [None]:
colnames(mydata)[1:6] <- c('date', 'year', 'month', 'meanmax', 'meanmin', 'mean')

In [None]:
head(mydata) 

In [None]:
ggplot(mydata, aes(x = date, y = mean)) +
    geom_point() 

In [None]:
# lots of data - let's filter it
unique(mydata$year) 

In [None]:
data03 <- filter(mydata, year == 2003) 

In [None]:
head(data03)

In [None]:
str(data03) 
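
If `str()` reports the `date` column as character or factor, ggplot treats it as a set of categories, not a continuous time axis. One possible fix is converting it to the `Date` class. The sketch below assumes a `YYYY-MM` format, which is a guess and should be checked against the actual file.

```r
# hypothetical date strings in an assumed 'YYYY-MM' format
dates.chr <- c('2003-07', '2003-08', '2003-09')

# as.Date() needs a day, so append the first of the month before converting
dates <- as.Date(paste0(dates.chr, '-01'))
class(dates)     # "Date"

# Date objects support arithmetic, which character strings do not
gaps <- diff(dates)   # 31 and 31 days between consecutive months
```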

In [None]:
plot03 <- ggplot(data03, aes(x = date, y = mean))

plot03 + geom_col() 

In [None]:
ggplot(mydata, aes(x = date, y = mean, group = 1)) +
    geom_line() +
    geom_line(aes(y = meanmax), col = 'red') +
    geom_line(aes(y = meanmin), col = 'green') +
    geom_smooth(method = lm, se = FALSE)

In [None]:
# looking at outbreak trends in Ontario

outbreaks <- read.csv('outbreaks20112017.csv', header = TRUE) 

In [None]:
dim(outbreaks) 

In [None]:
head(outbreaks) 

In [None]:
outbreaks <- separate(outbreaks, Year.Week, c('year', 'week')) 

head(outbreaks) 
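
A toy example shows what `separate()` does to a combined column. The `2011-01` style values below are an assumed format used only for illustration; by default `separate()` splits on any run of non-alphanumeric characters.

```r
library(tidyr)

# toy data with a combined year-week column (format is an assumption)
toy <- data.frame(Year.Week = c('2011-01', '2011-02', '2012-01'),
                  cases = c(3, 5, 2))

# one column becomes two; the new columns are character by default
parts <- separate(toy, Year.Week, c('year', 'week'))
colnames(parts)       # "year" "week" "cases"
class(parts$year)     # "character"

# convert = TRUE asks separate() to guess a better type for the new columns
parts.num <- separate(toy, Year.Week, c('year', 'week'), convert = TRUE)
class(parts.num$year) # "integer"
```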

In [None]:
colnames(outbreaks) 

In [None]:
outs2013 <- filter(outbreaks, year == 2013) 

In [None]:
head(outs2013) 

In [None]:
yearouts <- outbreaks %>%
    select(year:end.date, entero.rhinovirus.outbreaks) %>%
    group_by(year) %>%
    summarise(count = sum(as.numeric(entero.rhinovirus.outbreaks))) 

yearouts
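
The pipeline above groups the rows by year and adds up one column within each group. Here is the same pattern on made-up data, with a caveat about `as.numeric()` that applies if a column was read in as a factor (the default for text columns in `read.csv` before R 4.0).

```r
library(dplyr)

# made-up counts stored as text, two rows per year
toy <- data.frame(year = c('2011', '2011', '2012', '2012'),
                  outbreaks = c('1', '4', '2', '0'),
                  stringsAsFactors = FALSE)

totals <- toy %>%
    group_by(year) %>%
    summarise(count = sum(as.numeric(outbreaks)))
totals$count                      # 5 2

# caveat: as.numeric() on a factor returns the level codes, not the labels
f <- factor(c('10', '20', '10'))
codes <- as.numeric(f)                   # 1 2 1
values <- as.numeric(as.character(f))    # 10 20 10
```

Running `str()` on the data frame first shows whether the column is a factor and needs the `as.character()` step.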

In [None]:
ggplot(yearouts, aes(x = year, y = count)) + geom_col() 

In [None]:
ggplot(yearouts, aes(x = year, y = count, group = 1)) + geom_line() 

In [None]:
fluA <- outbreaks %>%
    select(year:end.date, influenza.A.outbreaks) %>%
    group_by(year) %>%
    summarise(count = sum(as.numeric(influenza.A.outbreaks))) 

fluA

In [None]:
ggplot(yearouts, aes(x = year, y = count, group = 1)) + 
    geom_line(col = 'green') +
    geom_line(data = fluA, aes(x = year, y = count, group = 1), col = 'red') 