# R workshop - types and syntax

* Dots in identifier names are just part of the identifier. They are not scope operators. They are not operators at all. They are just a legal character to use in the names of things. 

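* seq_along(x) - returns the indices 1, 2, ..., length(x); a rough equivalent of the index half of Python's enumerate
* typeof() - the internal storage type of an object
* class() - the class of an object, which is what method dispatch sees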
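
A minimal sketch of the three points above. The names `my.data`, `fruits`, and `m` are made up for illustration.

```r
# A dot is an ordinary character in a name; nothing is being "accessed" here.
my.data <- c(10, 20, 30)

# seq_along() yields the index sequence 1..length(x).
fruits <- c("apple", "banana", "cherry")
for (i in seq_along(fruits)) {
  cat(i, fruits[i], "\n")
}

# typeof() reports the internal storage type;
# class() reports what method dispatch sees.
m <- matrix(1:4, nrow = 2)
typeof(m)  # "integer"
class(m)   # includes "matrix"
```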

## Resources
* [aRrgh](http://arrgh.tim-smith.us/atomic.html)
* [Hyperpolyglot: Matlab, R , Python](http://hyperpolyglot.org/numerical-analysis)
* [Advanced R](https://adv-r.hadley.nz/) - by Hadley Wickham
    * "According to Wickham's 'tidy' approach, each variable should be a column, each observation should be a row, and each type of observational unit should be a table."
* [The R Inferno](http://www.burns-stat.com/pages/Tutor/R_inferno.pdf) - "If you are using R and you think you're in hell, this is a map for you"

# Data types

## Five Main Data Types in R
1. Atomic vector
2. Matrix
3. Array
4. List
5. Dataframe


* Everything in R is referred to as an object.
* All data in R consists of a header of metadata - the object's attributes - and the data structure itself.
* The fundamental data structure in R is the vector, which is essentially a one-dimensional array with attributes. Even the primitive data types in R are vectors. For example, 2 is a single-element vector.
* To reference a single vector element, use v[[i]].
* To reference a sub-vector, use v[i].
* For an atomic vector, v[i] with a single index and v[[i]] return almost the same thing, because a single value is itself a length-1 vector. The difference is that v[[i]] drops names.
* All arithmetic in R is vector-oriented.
* If one vector in an expression is shorter than the other, its elements are recycled (reused from the start) until the lengths match.
* Attributes can be used to change the way data structures are used by the system.
* The dim attribute can be used to interpret a one dimensional vector as an n dimensional array.
* A matrix is a two-dimensional array.
* A one-dimensional vector is not the same as a one-dimensional array because it lacks a dim attribute.
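
The bullets above can be sketched in a few lines. The vector `v` is an illustrative example, not taken from the workshop data.

```r
v <- c(10, 20, 30, 40, 50, 60)

# A bare number is already a vector of length 1.
length(2)          # 1

# Arithmetic is element-wise; the shorter operand is recycled.
recycled <- v + c(1, 2)   # 11 22 31 42 51 62

# [ returns a sub-vector, [[ returns exactly one element.
v[2:3]             # 20 30
v[[2]]             # 20

# Setting dim reinterprets the same data as a 2 x 3 matrix (column-major).
dim(v) <- c(2, 3)
v[2, 1]            # 20
is.matrix(v)       # TRUE

# A plain vector has no dim attribute; a one-dimensional array does.
is.null(dim(1:3))          # TRUE
dim(array(1:3, dim = 3))   # 3
```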


# vector: a 1-D array with homogeneous datatype

* "atomic vector" - the simplest R data type.
* Linear vectors of a single primitive type
    * numeric vector - double by default; integer literals are suffixed by L (e.g. 1L)
    * character vector
    * logical - TRUE, FALSE, or NA (NA means "not available")
        * aRrgh: "Do not use T and F for TRUE and FALSE. You will see people doing it but they’re not your friend; T and F are just variables with default values. Set T <- F and source their code and laugh as it burns."
    * complex
* Extend a vector by assigning past its end; any skipped positions are filled with NA
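
A short sketch of the primitive types and of growing a vector by assignment. All values are illustrative.

```r
typeof(1)      # "double"  (plain numeric literal)
typeof(1L)     # "integer" (L suffix)
typeof("a")    # "character"
typeof(TRUE)   # "logical"
typeof(1+2i)   # "complex"

# Assigning past the end grows the vector; the gap is filled with NA.
v <- c(1L, 2L, 3L)
v[5] <- 10L
v              # 1 2 3 NA 10

# T is an ordinary variable and can be reassigned; TRUE cannot.
T <- FALSE
isTRUE(T)      # FALSE
rm(T)          # remove our binding so T resolves to base R's TRUE again
```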

## "Combine" Functions
* c() - combine into a vector
* cbind() - combine objects as columns
* rbind() - combine objects as rows
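
As a quick comparison of the three functions, here is a sketch using two made-up vectors `a` and `b`.

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)

combined <- c(a, b)  # 1 2 3 4 5 6 (still a plain vector)
cols <- cbind(a, b)  # 3 x 2 matrix, columns named "a" and "b"
rows <- rbind(a, b)  # 2 x 3 matrix, rows named "a" and "b"

dim(cols)            # 3 2
dim(rows)            # 2 3
colnames(cols)       # "a" "b"
```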


In [1]:
a<-c(4,5,1,3,4,5)

In [2]:
class(a)

In [3]:
a<-c(4,5,'asfd',1,3,4,5)

In [4]:
class(a)

In [5]:
a

In [28]:
attributes(a)

NULL

In [8]:
dim( cbind( 2,3, 4,5,6) )

In [9]:
class( cbind( 2,3, 4,5,6) )

In [59]:
length( "hello" ) # 1: a character vector with one element; length() counts elements, not characters

In [60]:
nchar( 'hello')

# matrix: 2-D array with optional row/column names

In [30]:
y<-matrix(1:20, nrow=5,ncol=4)

In [31]:
cells <- c(1,26,24,68)
rnames <- c("R1", "R2")
cnames <- c("C1", "C2")
mymatrix <- matrix(cells, nrow=2, ncol=2, byrow=TRUE,
  dimnames=list(rnames, cnames))

In [32]:
mymatrix

   C1 C2
R1  1 26
R2 24 68


In [33]:
attributes( mymatrix)

## Reshape a matrix by assigning to dim

In [36]:
x<-c(1,2,3,4)
dim(x)<- c(2,2)

In [37]:
x

     [,1] [,2]
[1,]    1    3
[2,]    2    4


# array: 3+-D array

see help(array)

In [44]:
x<- array(0,c(2,3,4))

In [45]:
x

In [49]:
class( attributes(x) )

# List: an ordered collection of objects

In [None]:
w <- list(name="Fred", mynumbers=a, mymatrix=y, age=5.3)

In [None]:
as.matrix(w)

In [None]:
cbind( w, 'hello')

In [50]:
n = c(2, 3, 5) 
s = c("aa", "bb", "cc", "dd", "ee") 
b = c(TRUE, FALSE, TRUE, FALSE, FALSE) 
x = list(n, s, b, 3)   # x contains copies of n, s, b

## List Slicing with single bracket

In [11]:
x[2]

In [12]:
x[2:3]

## List member reference using double bracket [[]]

In [15]:
x[[2]][4] <- "ppppp"

In [16]:
x

In [20]:
x[2][4]   # x[2] is a one-element list, so position 4 does not exist and the result is a list holding NULL

## Subsetting

* \[ - when applied to a list, always returns a list
* [[ - only returns a single value
    * use to pull pieces out of a list
    * use when the var name is stored in a variable
* \$ - shorthand for [[ combined with character subsetting
    * use it for partial matching

### Integer vs Logical Subsetting
* positive integers - return elements at specified positions
* negative integers omit elements at the specified positions
* logical vectors get you the elements where TRUE
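
The three index types side by side, on a small named vector invented for this sketch:

```r
x <- c(a = 5, b = 12, c = 7, d = 20)

pos <- x[c(1, 3)]      # positive: elements 1 and 3      -> a = 5, c = 7
neg <- x[-c(1, 3)]     # negative: all except 1 and 3    -> b = 12, d = 20
lgl <- x[x > 6]        # logical: elements where TRUE    -> b = 12, c = 7, d = 20
rec <- x[c(TRUE, FALSE)]  # a short logical index is recycled -> a = 5, c = 7
```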


### Simplifying vs. preserving: Comparing "[" and "[["

Simplifying returns the simplest structure that can hold the result; preserving keeps the structure of the input. When simplifying:

* Atomic vector - "[[" removes names, whereas "[" keeps them
* List - returns the object inside the list, not a single-element list
* Factor - drops any unused levels
* Matrix or array - if any of the dimensions has length 1, drops that dimension
* Data frame - if the output is a single column, returns a vector and not a data frame
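
A sketch of the list, factor, matrix, and data frame cases, using small throwaway objects. The `drop` argument switches between the two behaviours for factors, matrices, and data frames.

```r
# List: [ preserves (returns a list), [[ simplifies (returns the contents).
lst <- list(a = 1:3, b = "text")
class(lst["a"])    # "list"
class(lst[["a"]])  # "integer"

# Factor: drop = TRUE removes unused levels.
f <- factor(c("x", "y", "z"))
levels(f[1])               # "x" "y" "z"
levels(f[1, drop = TRUE])  # "x"

# Matrix: a length-1 dimension is dropped unless drop = FALSE.
m <- matrix(1:6, nrow = 2)
dim(m[1, ])                # NULL (plain vector)
dim(m[1, , drop = FALSE])  # 1 3

# Data frame: a single column simplifies to a vector unless drop = FALSE.
df <- data.frame(p = 1:2, q = c("u", "v"))
class(df[, "p"])                # "integer"
class(df[, "p", drop = FALSE])  # "data.frame"
```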

In [1]:
nx <- c(Abc = 123, pi = pi)
nx[1] ; nx["pi"] 
nx[[1]] ; nx[["pi"]]

In [64]:
class( nx[1] )

In [65]:
typeof( nx[1] )

In [67]:
attributes( nx[1]  )

In [68]:
class( nx[[1]] )

In [69]:
typeof( nx[[1]] )

In [70]:
attributes( nx[[1]] )

NULL

## str() function is like glimpse

In [52]:
str( x )

List of 4
 $ : num [1:3] 2 3 5
 $ : chr [1:5] "aa" "bb" "cc" "dd" ...
 $ : logi [1:5] TRUE FALSE TRUE FALSE FALSE
 $ : num 3


## unlist() flattens list into a vector

In [56]:
unlist( x )

## Extraction operator

* The \$ allows you to extract elements by name from a named list
* The main difference from [[ is that $ does not allow computed indices, whereas [[ does.
* see ?Extract
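
A small sketch of the difference, using a made-up list `person`:

```r
person <- list(name = "Fred", age = 5.3)

person$name        # "Fred"
person[["name"]]   # "Fred"

key <- "age"
person[[key]]      # 5.3:  [[ evaluates key and uses its value
person$key         # NULL: $ looks for an element literally called "key"

person$na          # "Fred": $ partially matches "name"
person[["na"]]     # NULL:   [[ matches names exactly by default
```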

# data.frame - lists of columns
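
A brief sketch of what "list of columns" means in practice. The data frame `df` is invented for the example.

```r
df <- data.frame(id = 1:3, label = c("a", "b", "c"))

typeof(df)     # "list": underneath, a data frame is a list
class(df)      # "data.frame"
length(df)     # 2: one list element per column
names(df)      # "id" "label"

# The list subsetting operators therefore work on columns.
df[["id"]]     # the column itself, as a vector
df$label       # same idea, by name
df["id"]       # single bracket preserves: a one-column data frame
```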

# Attributes - Object Metadata

* names, dimensions, dimnames, classes, time series attributes

## Get and set attributes with attributes() and attr()

* attributes() function returns a list
* length(); nrow() and ncol() for matrices; dim() for arrays
* names(); rownames(), colnames(), and dimnames() for matrices and arrays
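
A sketch of reading and writing attributes on a small matrix. The `"units"` attribute is an arbitrary custom attribute made up for this example.

```r
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

names(attributes(m))   # "dim" "dimnames"
attr(m, "dim")         # 2 3
nrow(m); ncol(m)       # 2, then 3
length(m)              # 6: total number of elements

# attr<- can attach any metadata to an object.
attr(m, "units") <- "cm"
attr(m, "units")       # "cm"

# names() is itself stored as an attribute.
v <- c(1, 2)
names(v) <- c("x", "y")
attributes(v)          # a list with one element, $names
```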

In [25]:
attr( list, 'names')

NULL

# Casting

* as.integer()
* as.character()
* as.numeric()
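
A few illustrative conversions. The inputs are arbitrary; the point is the truncation and the NA behaviour.

```r
as.integer("42")      # 42
as.integer(3.9)       # 3: truncates toward zero, does not round
as.character(1.5)     # "1.5"
as.numeric("1e3")     # 1000
as.numeric(TRUE)      # 1

# Values that cannot be converted become NA, with a warning.
bad <- suppressWarnings(as.integer("abc"))
is.na(bad)            # TRUE
```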

# Formula data Type

* express relationship between variables 
* typeof = language, class = formula
* Captures an unevaluated expression
    * The data values that have been assigned to the symbols in the formula are not accessed when the formula itself is created
    * "capture the meaning of this code without evaluating it right away."
    * Captures the context or environment in which the expression was created, so the function that receives the formula can look up the variables later
* Characterized by the tilde operator
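    * two-sided formula
        * dependent variable on the left-hand side of the tilde, independent variables on the right-hand side
    * one-sided formula has no left-hand side
    * check sidedness using length(): 3 for a two-sided formula, 2 for a one-sided one
    * access elements of a formula using the [[ operator: index 1 is the tilde, 2 is the left-hand side, and 3 is the right-hand side (for a one-sided formula, index 2 is the right-hand side)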
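
The properties listed above can be checked directly. In this sketch `y`, `x1`, and `x2` are placeholder symbols that are never defined, which is the point: creating the formula does not evaluate them.

```r
f2 <- y ~ x1 + x2   # two-sided
f1 <- ~ x1 + x2     # one-sided

class(f2)    # "formula"
typeof(f2)   # "language"

length(f2)   # 3
length(f1)   # 2

f2[[1]]      # `~`
f2[[2]]      # y
f2[[3]]      # x1 + x2
f1[[2]]      # x1 + x2

# The formula remembers where it was created.
env <- environment(f2)
```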

## Symbols
### Operators built into R
* \+ - for using multiple independent variables
* \- - for ignoring variables
* : - for interaction
* \* - for crossing
* %in% - for nesting
* ^ - limits crossing to the specified degree
* I() - the "as-is" operator - "inhibit the interpretation of operators such as "+", "-", "*" and "^" as formula operators, so they are used as arithmetical operators"
* . operator - everything else: all the variables in the matrix/data.frame that are not already in the formula
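
One way to see what each operator does is to ask terms() which model terms a formula expands to. This is a sketch: `labels_of` is a hypothetical helper defined here, and `a`, `b`, `c`, `y` are placeholder variable names.

```r
# Hypothetical helper: the term labels a formula expands to.
labels_of <- function(f, ...) attr(terms(f, ...), "term.labels")

labels_of(y ~ a + b)          # "a" "b"
labels_of(y ~ a:b)            # "a:b"
labels_of(y ~ a * b)          # "a" "b" "a:b"  (crossing = main effects + interaction)
labels_of(y ~ a * b - a:b)    # "a" "b"        (minus removes a term)
labels_of(y ~ (a + b + c)^2)  # "a" "b" "c" "a:b" "a:c" "b:c"
labels_of(y ~ I(a + b))       # "I(a + b)"     (+ is arithmetic inside I())

# The dot needs data to know what "everything else" is.
d <- data.frame(y = 1, a = 2, b = 3)
labels_of(y ~ ., data = d)    # "a" "b"
```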
### Additional operators/functionality provided by 3rd party packages
* Multi-response formulas
* |
* ||

## Inspecting Formulas in R
* terms()
* all.vars()
* update( y ~ x1 + x2, ~ . + x3 )  # y ~ x1 + x2 + x3
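
A short sketch of the three inspection functions applied to the same placeholder formula:

```r
f <- y ~ x1 + x2

all.vars(f)                # "y" "x1" "x2"

tt <- terms(f)
attr(tt, "term.labels")    # "x1" "x2"
attr(tt, "response")       # 1: the response is the first variable
attr(tt, "intercept")      # 1: an intercept is included

# In update(), a dot stands for the corresponding side of the old formula.
f3 <- update(f, ~ . + x3)  # y ~ x1 + x2 + x3
```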

# Magrittr pipes

* %>%
* %\$%
* . placeholder
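
A minimal sketch of the three items, assuming the magrittr package is installed. The data frame `d` is invented for the example.

```r
library(magrittr)

# %>% passes the left-hand side as the first argument of the next call.
total <- c(1, 4, 9) %>% sqrt() %>% sum()       # 6

# The . placeholder puts the left-hand side somewhere other than first.
rounded <- 2 %>% round(3.14159, digits = .)    # 3.14

# %$% exposes the columns of the left-hand side by name.
d <- data.frame(a = 1:3, b = 4:6)
dot_product <- d %$% sum(a * b)                # 32
```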

# foreach
* [A Guide to parallelism in R](https://privefl.github.io/blog/a-guide-to-parallelism-in-r/)

## foreach + %do%
* roughly equivalent to lapply: runs sequentially and returns a list by default
* nested foreach's with %:%

### Return lists
* foreach(i=1:3) %do% sqrt(i)
* foreach(a=1:3, b=rep(10, 3)) %do% (a + b)
* The expression after %do% can be wrapped in parentheses, as in (a + b), or in braces
### Return things other than lists using .combine arg
* .combine='c' makes vector
* .combine='cbind' makes matrix: foreach(i=1:4, .combine='cbind') %do% rnorm(4)
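
The return shapes above, collected into one runnable sketch. It assumes the foreach package is installed; the variable names are illustrative.

```r
library(foreach)

# Default: results are collected in a list, like lapply().
roots_list <- foreach(i = 1:3) %do% sqrt(i)

# Several iteration variables advance in lockstep.
sums <- foreach(a = 1:3, b = rep(10, 3)) %do% (a + b)   # 11, 12, 13 as a list

# .combine = 'c' concatenates the results into a vector.
roots_vec <- foreach(i = 1:3, .combine = 'c') %do% sqrt(i)

# .combine = 'cbind' binds each result as a column of a matrix.
mat <- foreach(i = 1:4, .combine = 'cbind') %do% rnorm(4)
dim(mat)   # 4 4
```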

## foreach + %dopar%
* tell children which packages to require using .packages arg
* Or better to be explicit and use :: scoping like dplyr::count

## List comprehensions - when() allows you to add an if clause
* foreach(a=irnorm(1, count=10), .combine='c') %:% when(a >= 0) %do% sqrt(a)  # irnorm() comes from the iterators package
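
The same pattern with a fixed input instead of random numbers, so the result is predictable. This sketch assumes the foreach package is installed; `vals` is a made-up vector.

```r
library(foreach)

vals <- c(-1, 4, 9, -16)

# when() filters the iterations: only non-negative values reach sqrt().
pos_roots <- foreach(a = vals, .combine = 'c') %:% when(a >= 0) %do% sqrt(a)
pos_roots   # 2 3
```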

# doParallel

## Ex. 1
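```
library(foreach)  # provides foreach() and %dopar%

# Fork clusters are only available on Unix-alikes (not Windows)
cl <- parallel::makeForkCluster(2)
doParallel::registerDoParallel(cl)  # make the cluster the %dopar% backend
foreach(i = 1:3, .combine = 'c') %dopar% {
  sqrt(i)
}
parallel::stopCluster(cl)  # always release the workers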
```

## Ex. 2: Using parallel::parLapply
```
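library(doParallel)  # also attaches the parallel package
no_cores <- detectCores() - 1
# Registers a foreach backend; parLapply() below does not use it
registerDoParallel(cores=no_cores)
# FORK clusters are only available on Unix-alikes (not Windows)
cl <- makeCluster(no_cores, type="FORK")
# getPrimeNumbers is assumed to be a user-defined function; it is not shown in these notes
result <- parLapply(cl, 10:10000, getPrimeNumbers)
stopCluster(cl)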
```

# Dplyr do()

* always returns a dataframe
* refer to the current group through the . placeholder, which must be written explicitly
* use with group_by()
* can extract out of . placeholder, i.e., .\$varname
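
A small sketch of the points above, assuming the dplyr package is installed and using the built-in mtcars data. The column name `mean_mpg` is made up for the example.

```r
library(dplyr)

by_cyl <- mtcars %>% group_by(cyl)

# . stands for the rows of the current group.
top2 <- by_cyl %>% do(head(., 2))

# .$varname pulls a column out of the current group; wrapping the result
# in data.frame() keeps the output a data frame.
means <- by_cyl %>% do(data.frame(mean_mpg = mean(.$mpg)))
```

With three distinct `cyl` values in mtcars, `top2` has six rows and `means` has three.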