# Introduction to ![R][Rlogo]

[Rlogo]:https://jupyterhub.med.utah.edu/user/halpo/kernelspecs/ir/logo-64x64.png



## What is R?

* R is what comes after S
* Homepage and downloads: https://cran.r-project.org
* R is a language for data analysis.
* R is an ecosystem of packages
    + [Comprehensive R Archive Network](https://cran.r-project.org)
    + Currently over 10,000 packages
    + Wild Wild West.
    
![Wild Wild West](https://media.giphy.com/media/fKKFjFX2CJd7y/giphy.gif)

Wondering whether to use R or Python, or which is better?  The great folks at [DataCamp](https://www.datacamp.com) gave us the rundown.

![R-vs-Python](http://blog.datacamp.com/wp-content/uploads/2015/05/R-vs-Python-216-2.png)


They just got one thing wrong.  
R is clearly the better language.

## How I'll teach R

### I will teach
* state of the aRt
* tidyverse
* ggplot2
* jupyter
* through examples

### I will **not** teach
* legacy methods
* base graphics
* bad form
* less than optimal methods

## Using R

Just use [RStudio](http://rstudio.com).

* Full IDE
* R Notebooks ↔ Jupyter
* Includes performance enhancements

## The [tidyverse](http://tidyverse.org) ![tidyverse](http://tidyverse.tidyverse.org/logo.png)

The tidyverse is the state of the art for data analysis in R.

### ![dplyr](http://dplyr.tidyverse.org/logo.png)

For all your data manipulation needs.

* `mutate()` - data transformations
* `select()` - select variables of data
* `filter()` - subsetting
* `summarise()` - summarizing data or groups of data.
* `arrange()` - sorting data
* `group_by()` - perform operations on subsets.
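As a rough sketch of how these verbs chain together, the following uses the built-in `iris` data set. The column name `Sepal.Ratio` and the cutoff of 5 are made up for illustration only.

```r
library(dplyr)

iris_summary <- iris %>%
    filter(Sepal.Length > 5) %>%                            # keep rows matching a condition
    mutate(Sepal.Ratio = Sepal.Length / Sepal.Width) %>%    # add a derived column
    select(Species, Sepal.Ratio) %>%                        # keep only the needed variables
    group_by(Species) %>%                                   # later steps run once per species
    summarise(mean_ratio = mean(Sepal.Ratio), n = n()) %>%  # one row per group
    arrange(desc(mean_ratio))                               # sort, largest ratio first

print(iris_summary)
```

Each verb takes a data frame and returns a data frame, which is why they compose so easily with the pipe.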

#### Addendum: dbplyr

The dbplyr package is also needed for our work.  It provides the database backend to perform many data operations in place on the server.
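To get a feel for what "in place on the server" means, here is a minimal sketch that translates dplyr code to SQL without contacting any database. It assumes a simulated MySQL connection from dbplyr; the table name `visits` and its columns are hypothetical.

```r
library(dplyr)
library(dbplyr)

# A lazy table backed by a simulated MySQL connection: no server is contacted.
# The table name "visits" and its columns are hypothetical.
visits <- lazy_frame(id = 1:3, score = c(10, 20, 30),
                     con = simulate_mysql(), .name = "visits")

query <- visits %>%
    filter(score > 15) %>%
    summarise(mean_score = mean(score, na.rm = TRUE))

# Inspect the SQL that would be sent to the server
sql_text <- as.character(sql_render(query))
cat(sql_text, "\n")
```

With a real connection the same pipeline would run on the server, and only the final result would travel back to R.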

### ![forcats](http://forcats.tidyverse.org/logo.png)
**For cat**egorical variables.

* relabels
* grouping
* setting control levels
* etc.
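The tasks listed above might look like this in practice. This is only a sketch; the `severity` factor and its level names are invented for the example.

```r
library(forcats)

severity <- factor(c("low", "high", "medium", "low"))
levels(severity)   # levels are alphabetical by default

# Relabel a level (new name = "old name")
relabeled <- fct_recode(severity, minimal = "low")

# Group several levels into one
collapsed <- fct_collapse(severity, elevated = c("medium", "high"))

# Move the control (reference) level to the front
releveled <- fct_relevel(severity, "low")
levels(releveled)
```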

### ![ggplot2](http://ggplot2.tidyverse.org/logo.png)

The grammar of graphics package.  It provides publication-ready, high-quality graphs through an intuitive interface that puts the emphasis on the data, not on the plot.
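A minimal sketch of that idea, using the built-in `iris` data: you describe which columns map to which visual properties, then add layers. The axis labels are my own wording.

```r
library(ggplot2)

# Map data columns to aesthetics, then add layers with `+`
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
    geom_point() +
    labs(x = "Sepal length (cm)", y = "Sepal width (cm)")

print(p)  # draws the plot on the active graphics device
```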

### ![haven](http://haven.tidyverse.org/logo.png)

To read in data from:

* SAS
* SPSS
* Stata

*We won't use this*

### ![lubridate](http://lubridate.tidyverse.org/logo.png)

Date manipulations:

* reading dates
* extracting parts
* intervals
* calculating duration (age)
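A short sketch of those four tasks, with made-up dates:

```r
library(lubridate)

birth <- ymd("1990-05-17")        # reading a date from text
visit <- mdy("01/01/2020")        # a different input format

year(birth)                       # extracting parts
month(birth)

span <- interval(birth, visit)    # interval between two dates
age_years <- span %/% years(1)    # completed years, i.e. age at the visit
age_years
```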

### ![magrittr](http://magrittr.tidyverse.org/logo.png)

* The pipe `%>%`
* building unary function pipelines
    + `. %>% f1 %>% f2`

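Both uses of the pipe can be sketched as follows. The function name `root_mean_square` is just an illustration; `raise_to_power()` is one of the operator aliases that magrittr exports.

```r
library(magrittr)

# x %>% f() is the same as f(x)
c(3, 4) %>% sum()

# Starting a pipeline with `.` builds a one-argument function instead of a value
root_mean_square <- . %>% raise_to_power(2) %>% mean() %>% sqrt()

root_mean_square(c(3, 4))
```
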
### ![purrr](http://purrr.tidyverse.org/logo.png)

Functional programming extensions

* `map()` - functional map apply
* `partial()` - partial argument specification
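A small sketch of both functions; the names `squares`, `sizes`, and `rnorm_wide` are invented for the example.

```r
library(purrr)

# map() applies a function to each element and returns a list;
# map_dbl() does the same but returns a numeric vector
squares <- map_dbl(1:3, function(x) x^2)
sizes   <- map(list(a = 1:3, b = letters), length)

# partial() fixes some arguments ahead of time and returns a new function
rnorm_wide <- partial(rnorm, sd = 30)
draws <- rnorm_wide(5)   # same as rnorm(5, sd = 30)
```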

### ![readr](http://readr.tidyverse.org/logo.png)

Reading rectangular data.

*We won't use this*

### ![readxl](http://readxl.tidyverse.org/logo.png)

Guess!

### ![stringr](http://stringr.tidyverse.org/logo.png)

String manipulations.

#### Note: stringr is built on [`stringi`](https://cran.r-project.org/package=stringi), which is also the foundation of many other packages.

### ![tibble](http://tibble.tidyverse.org/logo.png)

Smart data abstraction, foundation of the tidyverse, and dplyr in particular.

### ![tidyr](http://tidyr.tidyverse.org/logo.png)

Data reshaping:

* `gather()` - from wide to long
* `spread()` - from long to wide
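Here is a sketch of a round trip on a toy data frame (the `wide` table is made up). Note that more recent tidyr releases also offer `pivot_longer()` and `pivot_wider()` for the same job.

```r
library(tidyr)

wide <- data.frame(id = 1:2, height = c(150, 160), weight = c(50, 60))

# Wide to long: one row per (id, measure) pair
long <- gather(wide, key = "measure", value = "value", height, weight)

# Long to wide: back to one column per measure
back <- spread(long, key = measure, value = value)

long
back
```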

### ![broom](https://d21ii91i3y6o6h.cloudfront.net/gallery_images/from_proof/13592/small/1466619575/rstudio-hex-broom.png)

Cleans up model output by turning it into tidy data frames.

* `tidy()`
* `fix_data_frame()`
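For instance, a sketch of `tidy()` applied to a simple linear model on the built-in `iris` data (the model itself is arbitrary):

```r
library(broom)

fit <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# One row per model term, with estimates and test statistics as columns
coefs <- tidy(fit)
coefs
```

The result is an ordinary data frame, so it can go straight into dplyr or ggplot2.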


![It All fits together](https://aberdeenstudygroup.github.io/studyGroup/lessons/SG-T2-JointWorkshop/tidyverse.png)


## Other packages we need or will use


### [`RMySQL`](https://cran.r-project.org/package=RMySQL)

Provides the MySQL DBI-compatible backend used by dbplyr and dplyr.

### [`tidytext`](https://cran.r-project.org/package=tidytext)

Text/NLP processing in R


# What you need to know

### Assignment
When you want to assign a value to a name, use the arrow `<-`.

In [2]:
# this is assignment
x <- 1

### Function calls
Calls to functions act very much the way they do in C, C++, Python, etc.

In [3]:
rnorm(5)

Calls can pass arguments by position as above, or by name as below.

In [4]:
rnorm(n=5, mean=100, sd=30)

You can mix positional and named arguments, and named arguments can be given in any order.

In [5]:
rnorm(5, sd=30)
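To check that the three calling styles really are equivalent, this sketch resets the random seed before each call so that the draws can be compared (the seed value is arbitrary):

```r
set.seed(42)
by_position <- rnorm(3, 100, 30)              # n, mean, sd in positional order

set.seed(42)
by_name <- rnorm(sd = 30, n = 3, mean = 100)  # all named, reordered

set.seed(42)
mixed <- rnorm(3, sd = 30, mean = 100)        # first by position, rest by name

identical(by_position, by_name)
identical(by_name, mixed)
```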

### Variables
+ Names can contain letters, digits, periods, and underscores
+ Most Unicode letters are fine
+ No spaces
+ Cannot start with a digit or underscore; can start with a period, as long as a digit does not follow it


In [6]:
परिवर्तनशील <- "hello world"
परिवर्तनशील

Variables are how everything is stored: data, functions, class definitions, etc.
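The naming rules can be tried out with a few throwaway names (all of them invented here). Backticks are the escape hatch when a name breaks the rules, and `make.names()` shows how R itself would repair an invalid name.

```r
.hidden  <- 1   # a name may start with a period
my_var.2 <- 2   # digits, underscores and periods are fine after the first character

# Backticks allow names that break the rules, such as one with a space
`my var` <- 3
`my var` + 1

# make.names() converts arbitrary strings into valid names
fixed <- make.names(c("2fast", "my var", "ok_name"))
fixed
```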

### Functions

* Functions are created with the `function` keyword and stored in a variable.
* Body of the function is enclosed in curly braces `{}`
* Call it by `name(arguments)`

In [8]:
percent <- function(x){
    sprintf("%2.1f%%", x*100)
}
percent(0.10)
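As a small extension of the same idea, function arguments can have default values. In this sketch the `digits` argument is my own addition, not part of the example above.

```r
percent <- function(x, digits = 1) {
    # Build a format string such as "%.1f%%" from the requested precision
    fmt <- paste0("%.", digits, "f%%")
    sprintf(fmt, x * 100)   # the last evaluated expression is the return value
}

percent(0.10)                        # uses the default digits = 1
percent(c(0.256, 0.5), digits = 0)   # works on whole vectors
```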

Functions are just variables, so this is cool.

In [12]:
ᚙ <- function(a,b){
    c <- matrix( nrow = nrow(a) * nrow(b)
           , ncol = ncol(a) * ncol(b)
           )
    for (i in seq_len(nrow(c))) for (j in seq_len(ncol(c))) {
        # %/% (integer division) picks the element of a,
        # %% (remainder) picks the element of b
        c[i,j] <- 
            a[(i-1) %/% nrow(b) + 1, (j-1) %/% ncol(b) + 1]  *
            b[(i-1) %%  nrow(b) + 1, (j-1) %%  ncol(b) + 1]
    }
    return(c) #< the return statement could have been written as just c.
}
a <- matrix(1:3, 3, 1)
b <- matrix(4:7, 2, 2)

ᚙ(a,b)

|    |    |
|----|----|
| 4  | 6  |
| 5  | 7  |
| 8  | 12 |
| 10 | 14 |
| 12 | 18 |
| 15 | 21 |
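The function above computes a Kronecker product by hand. Base R already ships this operation as `kronecker()` and the `%x%` operator, so the result can be cross-checked with a quick sketch:

```r
a <- matrix(1:3, 3, 1)
b <- matrix(4:7, 2, 2)

k1 <- kronecker(a, b)   # function form
k2 <- a %x% b           # operator form, same result

k1
```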


### Element Access
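* `x[i]` by position or name; returns the same kind of object as `x` (for a data frame, a one-column data frame)
* `x[[i]]` extracts the element itself (for a data frame, the column as a vector), so it can return something different from `x[i]`
* **`x$name` by name**
* `x[["name"]]` equivalent to above
* `x[,"name"]` equivalent to above for data frames (a tibble returns a one-column tibble instead)
* `x[row,]` for row(s) (use `x[start:finish,]` for a range of rows)
* `x[row,col]` for specific location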


In [13]:
# x is a data frame, we only deal with 
# data structures that act the same
x <- head(iris)
# gives the first column
x[1]

| Sepal.Length |
|--------------|
| 5.1          |
| 4.9          |
| 4.7          |
| 4.6          |
| 5.0          |
| 5.4          |


In [15]:
# gives the first row
x[1,]

| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|--------------|-------------|--------------|-------------|---------|
| 5.1          | 3.5         | 1.4          | 0.2         | setosa  |


In [19]:
# first 5 rows
x[1:5,]

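| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|--------------|-------------|--------------|-------------|---------|
| 5.1          | 3.5         | 1.4          | 0.2         | setosa  |
| 4.9          | 3.0         | 1.4          | 0.2         | setosa  |
| 4.7          | 3.2         | 1.3          | 0.2         | setosa  |
| 4.6          | 3.1         | 1.5          | 0.2         | setosa  |
| 5.0          | 3.6         | 1.4          | 0.2         | setosa  |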


In [16]:
# first column as a vector
x$Sepal.Length

In [17]:
# Same
x[["Sepal.Length"]]

In [18]:
# same
x[, "Sepal.Length"]
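The difference between the access forms is easiest to see by checking what each one returns. A short sketch on the same six rows of `iris`:

```r
x <- head(iris)

one_col_df  <- x[1]     # single brackets keep the container: a data frame
one_col_vec <- x[[1]]   # double brackets extract the column itself: a vector
class(one_col_df)
class(one_col_vec)

# The three by-name forms give the same vector for a data frame
identical(x$Sepal.Length, x[["Sepal.Length"]])
identical(x$Sepal.Length, x[, "Sepal.Length"])

# Row and column together pick out a specific cell or block
cell  <- x[2, "Species"]
block <- x[2:3, c("Sepal.Length", "Species")]
```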

## Major differences between R and Python

* In R you cannot subset with `x[start:]` as you can in Python; it is a syntax error.
* Everything in R is a vector:
    + everything has a `length`
    + but only some things have a `dim` (dimension): data frames, tables, matrices, etc.
* In R indentation means absolutely nothing to the interpreter, but it is good for code readability.
* Indexing with negative numbers
    + in Python `x[-1]` gives the last element.
    + in R `x[-1]` drops the first element.
    + it is better form to use `head` and `tail` for these.
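These points can be checked with a toy vector (the values are arbitrary):

```r
x <- c(10, 20, 30, 40)

dropped_first <- x[-1]            # drops the first element: 20 30 40
last_one      <- tail(x, 1)       # the last element: 40
all_but_last  <- head(x, -1)      # everything except the last: 10 20 30
from_second   <- x[2:length(x)]   # R's spelling of Python's x[1:]

length(x)      # every object has a length
dim(x)         # NULL: a plain vector has no dimensions
length(iris)   # for a data frame, length is the number of columns
dim(iris)      # rows and columns
```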

## Getting help for R
* The `?` operator gives quick help, but it is designed for interactive sessions and is less helpful in Jupyter.
* RStudio, which you should be using for lots of R programming, has help built in as a pane.
* [RDocumentation.org](https://rdocumentation.org), a helpful site run by DataCamp that gives the documentation for all packages on CRAN.
* [StackOverflow](https://stackoverflow.com/questions/tagged/r)
* [Utah R Users Group](https://sites.google.com/site/utahrug/)
* Mailing lists and special interest groups.
* JFGI (Just <span title="we are still in utah right?">*freaking*</span> [Google](https://google.com) it)
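A few built-in helpers are worth knowing alongside the resources above. This is only a sketch of the function forms; how the help page is displayed depends on where you run it.

```r
# help() is the function form of the ? operator
topic <- help("rnorm")   # printing `topic` displays the help page

# args() shows a function's arguments and their defaults
args(rnorm)

# apropos() finds objects whose names match a pattern
matches <- apropos("^read\\.csv")
matches
```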