# Introduction to R
## Chris Hodapp <hodapp87@gmail.com>

## CincyFP, 2016 December 13

## Initial notes

This is all done in Jupyter (formerly IPython Notebook) with IRkernel, the R kernel for Jupyter.
- https://jupyter.org/
- https://irkernel.github.io/
- Or "`docker run -d -p 8888:8888 jupyter/r-notebook`" and knock yourself out

![](r-matey.png)

## What is R?

- An interpreted, dynamically-typed language based on S and made mainly for interactive use in statistics and visualization

- Sort of like MATLAB, except statistics-flavored and open source

- A train-wreck that is sometimes confused with a real programming language.
  - *"R is a dynamic language for statistical computing that combines lazy functional features and object-oriented programming. This rather unlikely linguistic cocktail would probably never have been prepared by computer scientists, yet the language has become surprisingly popular."*
  - The R Inferno (Patrick Burns), http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

## So... why use it at all?

- Stable and documented extensively!
- Excellent for exploratory use interactively!
- Epic visualization!
- Magical, fast, and elegant for arrays, tables, vectors, and linear algebra!
- Huge standard library!
- Packages for everything else on CRAN!
- Still sort of FP!
- Excellent tooling! (Sweave, Emacs & ESS mode, RStudio, Jupyter...)

## How do I use R?

*Do you need plotting or visualization?*
Use [ggplot2](http://ggplot2.org/). Completely ignore built-in plotting.

*Do you need to transform tabular/vector/list/array/matrix/DataFrame data somehow?*
Just use [dplyr](https://cran.r-project.org/package=dplyr) or [reshape2](http://seananderson.ca/2013/10/19/reshape.html). Completely ignore built-in `*apply` functions.

*Do you need something else?* Search [CRAN](https://cran.r-project.org/).

*Does no CRAN package solve your problems? Do you need to write "real"(tm) software for production?* Strongly consider giving up.

# Rough Outline

## Core
- Vectors, Boolean vectors, indexing
- Matrices & indexing
- Defining functions
  - Function composition
  - Function scoping
  - Pass-by-value
- Dataframes (mtcars)
- Missing values
- Factors

## Data transformation
- Wide-format vs. long-format

## Visualization

## reshape2, dplyr

## Odds and ends
- `help(...)` or `? ...`
- Tab-completion in Jupyter
- SparkR (if possible)

# Other references

- Official R intro: https://cran.r-project.org/doc/manuals/R-intro.html
- Evaluating the Design of the R Language (Morandat, Hill, Osvald, Vitek): http://r.cs.purdue.edu/pub/ecoop12.pdf
- Impatient R, http://www.burns-stat.com/documents/tutorials/impatient-r/
- R: The Good Parts, http://blog.datascienceretreat.com/post/69789735503/r-the-good-parts
- ISLR (An Introduction to Statistical Learning with Applications in R): http://www-bcf.usc.edu/~gareth/ISL/
- ESL (Elements of Statistical Learning): http://statweb.stanford.edu/~tibs/ElemStatLearn/

In [5]:
x <- 1:10

In [6]:
x < 5

In [7]:
x[x < 5]

In [8]:
sum(x)

In [9]:
x*10

In [10]:
x + 10

In [11]:
x * x

In [13]:
intersect(x, 5:20)

In [14]:
union(x, 5:20)

In [15]:
x

In [16]:
mean(x)

In [54]:
sqrt(-1)

Warning message in sqrt(-1): “NaNs produced”

In [57]:
NA

[1] NA

In [60]:
v_na <- c(0,1,2,3,4,5,6,7,NA,NA,NA)

In [67]:
mean(v_na, na.rm = TRUE)
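
The `na.rm = TRUE` above matters because `NA` propagates through most computations. A minimal base-R sketch of that behavior, redefining `v_na` so the block runs on its own; the other variable names are just for illustration:

```r
# Same vector as in the cell above, redefined so this block is self-contained.
v_na <- c(0, 1, 2, 3, 4, 5, 6, 7, NA, NA, NA)

# NA is contagious: a summary over a vector containing NA is itself NA.
mean_with_na <- mean(v_na)

# is.na() returns a logical vector marking the missing entries.
n_missing <- sum(is.na(v_na))

# Dropping the NAs by logical indexing gives the same mean as na.rm = TRUE.
mean_clean <- mean(v_na[!is.na(v_na)])

print(mean_with_na)
print(n_missing)
print(mean_clean)
```

Note that `v_na == NA` is not a useful test, since comparing anything with `NA` yields `NA`; `is.na()` is the way to ask.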

In [19]:
matrix(c(0,1,2,3), nrow=2)

0,2
1,3
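
The outline lists matrix indexing, which the cell above stops short of. A rough sketch in base R on the same matrix; the variable names are only there so the results can be inspected:

```r
# matrix() fills column by column, so c(0, 1, 2, 3) becomes
# column 1 = (0, 1) and column 2 = (2, 3).
m <- matrix(c(0, 1, 2, 3), nrow = 2)

elem     <- m[1, 2]     # single element: row 1, column 2
row2     <- m[2, ]      # whole second row, returned as a plain vector
col1     <- m[, 1]      # whole first column
big      <- m[m > 1]    # logical indexing works as it does for vectors
gram     <- t(m) %*% m  # %*% is matrix multiplication; * is element-wise

print(elem)
print(row2)
print(col1)
print(big)
print(gram)
```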


In [20]:
class(x)

In [21]:
typeof(x)

In [22]:
v <- c(0,1,2)

In [23]:
names(v)

NULL

In [24]:
names(v) <- c("x", "y", "z")

In [25]:
v

In [26]:
v

In [27]:
0:2

In [28]:
v["x"]

In [87]:
c(0,20)

In [90]:
1:10

In [91]:
c(0,20) + 1:10
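
The last cell works because R recycles the shorter vector to match the longer one; `x * 10` earlier is the same rule with a length-1 vector. A small sketch that spells this out (the names `short`, `long`, `recycled`, `explicit`, and `partial` are illustrative):

```r
short <- c(0, 20)
long <- 1:10

# The shorter vector is repeated until it matches the longer one,
# so this is the same as rep(short, times = 5) + long.
recycled <- short + long
explicit <- rep(short, times = 5) + long

print(recycled)
print(identical(recycled, explicit))

# When the longer length is not a multiple of the shorter one, R still
# recycles but emits a warning (suppressed here to keep the output clean).
partial <- suppressWarnings(c(0, 20, 40) + long)
print(partial)
```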

In [30]:
mtcars

,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
Hornet Sportabout,18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2
Valiant,18.1,6,225.0,105,2.76,3.46,20.22,1,0,3,1
Duster 360,14.3,8,360.0,245,3.21,3.57,15.84,0,0,3,4
Merc 240D,24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2
Merc 230,22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
Merc 280,19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4


In [35]:
mtcars["gear"]

,gear
Mazda RX4,4
Mazda RX4 Wag,4
Datsun 710,4
Hornet 4 Drive,3
Hornet Sportabout,3
Valiant,3
Duster 360,3
Merc 240D,4
Merc 230,4
Merc 280,4
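
Columns like `gear` are numbers that really label categories, which is what factors (listed in the outline) are for. A minimal sketch in base R, assuming nothing beyond the built-in `mtcars`; `gear_f` and the other names are illustrative:

```r
# gear is stored as a number, but it really labels a category.
gear_f <- factor(mtcars$gear)

lv     <- levels(gear_f)            # the distinct categories, as strings
counts <- table(gear_f)             # number of cars per category
codes  <- as.integer(gear_f)[1:5]   # internal codes (1, 2, ...), not the original values

# Getting the original numbers back has to go through the labels.
values <- as.numeric(as.character(gear_f))[1:5]

print(lv)
print(counts)
print(codes)
print(values)
```

The gap between `codes` and `values` is a classic trap: calling `as.integer()` or `as.numeric()` directly on a factor returns the level codes, not the numbers the labels spell out.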


In [39]:
class(mtcars)

In [40]:
L <- mtcars$am == 0 

In [41]:
L

In [43]:
mtcars[L,]

,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
Hornet Sportabout,18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2
Valiant,18.1,6,225.0,105,2.76,3.46,20.22,1,0,3,1
Duster 360,14.3,8,360.0,245,3.21,3.57,15.84,0,0,3,4
Merc 240D,24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2
Merc 230,22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
Merc 280,19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4
Merc 280C,17.8,6,167.6,123,3.92,3.44,18.9,1,0,4,4
Merc 450SE,16.4,8,275.8,180,3.07,4.07,17.4,0,0,3,3
Merc 450SL,17.3,8,275.8,180,3.07,3.73,17.6,0,0,3,3


In [44]:
? mtcars

In [68]:
f <- function(x) {
    x^2
}

In [71]:
compose <- function(f1, f2) {
    # Returns a new function that applies f2 first, then f1
    function(x) { f1(f2(x)) }
}

In [73]:
compose(f, function(x) { x + 2 })(5)
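
`compose` works because the function it returns remembers `f1` and `f2` from the environment where it was created. The outline calls this function scoping; a rough sketch of the rules follows, with `make_adder`, `shadow`, and `make_counter` being hypothetical names invented for illustration:

```r
# A function remembers the environment it was defined in (a closure).
make_adder <- function(n) {
    function(x) { x + n }   # n is looked up in make_adder's environment
}
add2 <- make_adder(2)
print(add2(5))

# Assignment inside a function creates a local variable;
# the global y is untouched.
y <- 1
shadow <- function() {
    y <- 100
    y
}
print(shadow())
print(y)

# <<- assigns in an enclosing environment instead, which allows
# a closure to keep mutable state between calls.
make_counter <- function() {
    count <- 0
    function() {
        count <<- count + 1
        count
    }
}
counter <- make_counter()
print(counter())
print(counter())
```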

In [52]:
sum(mtcars$carb)

In [77]:
library(ggplot2)
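
With ggplot2 loaded, a first plot might look like the sketch below. It assumes ggplot2 is installed; the choice of columns and the axis labels (taken from the units in `? mtcars`) are illustrative, not something from the talk:

```r
library(ggplot2)

# Map columns of the data frame to aesthetics, then add one layer per geometry.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
    geom_point() +
    labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

print(p)  # in Jupyter, evaluating p on its own line also draws the plot
```

Wrapping `cyl` in `factor()` makes ggplot2 treat it as a discrete category rather than a continuous number, so each cylinder count gets its own colour.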

In [82]:
? save

In [83]:
cbind(mtcars, mtcars)

,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,⋯,cyl.1,disp.1,hp.1,drat.1,wt.1,qsec.1,vs.1,am.1,gear.1,carb
Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,⋯,6,160.0,110,3.9,2.62,16.46,0,1,4,4
Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,⋯,6,160.0,110,3.9,2.875,17.02,0,1,4,4
Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,⋯,4,108.0,93,3.85,2.32,18.61,1,1,4,1
Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,⋯,6,258.0,110,3.08,3.215,19.44,1,0,3,1
Hornet Sportabout,18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,⋯,8,360.0,175,3.15,3.44,17.02,0,0,3,2
Valiant,18.1,6,225.0,105,2.76,3.46,20.22,1,0,3,⋯,6,225.0,105,2.76,3.46,20.22,1,0,3,1
Duster 360,14.3,8,360.0,245,3.21,3.57,15.84,0,0,3,⋯,8,360.0,245,3.21,3.57,15.84,0,0,3,4
Merc 240D,24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,⋯,4,146.7,62,3.69,3.19,20.0,1,0,4,2
Merc 230,22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,⋯,4,140.8,95,3.92,3.15,22.9,1,0,4,2
Merc 280,19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,⋯,6,167.6,123,3.92,3.44,18.3,1,0,4,4


In [84]:
rbind(mtcars, mtcars)

,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Mazda RX4,21.0,6,160.0,110,3.90,2.620,16.46,0,1,4,4
Mazda RX4 Wag,21.0,6,160.0,110,3.90,2.875,17.02,0,1,4,4
Datsun 710,22.8,4,108.0,93,3.85,2.320,18.61,1,1,4,1
Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
Hornet Sportabout,18.7,8,360.0,175,3.15,3.440,17.02,0,0,3,2
Valiant,18.1,6,225.0,105,2.76,3.460,20.22,1,0,3,1
Duster 360,14.3,8,360.0,245,3.21,3.570,15.84,0,0,3,4
Merc 240D,24.4,4,146.7,62,3.69,3.190,20.00,1,0,4,2
Merc 230,22.8,4,140.8,95,3.92,3.150,22.90,1,0,4,2
Merc 280,19.2,6,167.6,123,3.92,3.440,18.30,1,0,4,4


In [97]:
pbv <- function(df) {
    # df is a local copy; this assignment never reaches the caller's data frame
    df$carb <- 0
}

In [98]:
pbv(mtcars)

In [100]:
head(mtcars)

,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Mazda RX4,21.0,6,160,110,3.9,2.62,16.46,0,1,4,4
Mazda RX4 Wag,21.0,6,160,110,3.9,2.875,17.02,0,1,4,4
Datsun 710,22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
Hornet 4 Drive,21.4,6,258,110,3.08,3.215,19.44,1,0,3,1
Hornet Sportabout,18.7,8,360,175,3.15,3.44,17.02,0,0,3,2
Valiant,18.1,6,225,105,2.76,3.46,20.22,1,0,3,1
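
`head(mtcars)` still shows the original `carb` values because arguments behave as if passed by value: `pbv` changed only its own copy. To keep a change, return the modified data frame and assign it. A minimal sketch, where `zero_carb` and `cars2` are hypothetical names:

```r
# Returns a modified copy; the argument itself is never changed.
zero_carb <- function(df) {
    df$carb <- 0
    df
}

cars2 <- zero_carb(mtcars)

print(sum(mtcars$carb))  # original is unchanged
print(sum(cars2$carb))   # the copy has carb zeroed
```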


In [105]:
mtcars[,c("mpg", "cyl")]

,mpg,cyl
Mazda RX4,21.0,6
Mazda RX4 Wag,21.0,6
Datsun 710,22.8,4
Hornet 4 Drive,21.4,6
Hornet Sportabout,18.7,8
Valiant,18.1,6
Duster 360,14.3,8
Merc 240D,24.4,4
Merc 230,22.8,4
Merc 280,19.2,6


In [110]:
mtcars[1:3,]

,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Mazda RX4,21.0,6,160,110,3.9,2.62,16.46,0,1,4,4
Mazda RX4 Wag,21.0,6,160,110,3.9,2.875,17.02,0,1,4,4
Datsun 710,22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
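
Those three rows are enough to show the wide-format vs. long-format distinction from the outline. The sketch below assumes reshape2 is installed; the `model` column and the names `wide`, `long`, and `back` are made up for illustration:

```r
library(reshape2)

# Wide format: one row per car, one column per measurement.
wide <- data.frame(model = rownames(mtcars)[1:3],
                   mtcars[1:3, c("mpg", "cyl")])

# Long format: one row per (car, measurement) pair.
# melt() keeps id.vars as-is and stacks the remaining columns
# into a 'variable' column and a 'value' column.
long <- melt(wide, id.vars = "model")
print(long)

# dcast() goes the other way: one row per model, one column per variable.
back <- dcast(long, model ~ variable)
print(back)
```

Long format is generally what ggplot2 wants when several measurements should share one plot, which is why the two packages tend to show up together.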


In [113]:
mtcars$disp > 100

In [117]:
mtcars[mtcars$disp > 150 & mtcars$disp < 200,]

,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
Merc 280,19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4
Merc 280C,17.8,6,167.6,123,3.92,3.44,18.9,1,0,4,4
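
The same filter in dplyr, which the slides recommend for this kind of work, might look like the following. This is a sketch assuming dplyr is installed; `mid_disp`, `base_disp`, and `top` are illustrative names:

```r
library(dplyr)

# filter() takes the data frame first, then conditions on bare column names;
# multiple conditions are combined with AND.
mid_disp <- filter(mtcars, disp > 150, disp < 200)

# Base R version from the cell above, for comparison.
base_disp <- mtcars[mtcars$disp > 150 & mtcars$disp < 200, ]
print(nrow(mid_disp) == nrow(base_disp))

# Verbs chain with %>%: pick columns, then sort by mpg, highest first.
top <- mid_disp %>%
    select(mpg, cyl, disp) %>%
    arrange(desc(mpg))
print(top)
```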
