In [7]:
library(xtable)
library(IRdisplay)
library(repr)
data(tli)
tli.table <- xtable(tli[1:20, ])
digits(tli.table) <- matrix( 0:4, nrow = 20, ncol = ncol(tli)+1 )
options(repr.vector.quote=FALSE);

In [5]:
display_html('<link href="https://fonts.googleapis.com/css?family=Open+Sans" rel="stylesheet">
<style>#notebook-container{font-size: 13pt;font-family:\'Open Sans\', sans-serif;} div.text_cell{max-width: 104ex;}</style>')

# R Book 2012

The book can be downloaded with this link:

http://cmas.siu.buap.mx/portal_pprd/work/sites/biologia/resources/PDFContent/168/R.%20Book%202012.pdf

## Week 1

Essentials of the R-language:

* 2.1 Calculations
* 2.3 Generating sequences
* 2.6 Vectors
* 2.7 Vector functions
* 2.8 Matrices and arrays

### Calculations

We can run simple calculations:

In [1]:
log(42,7.3)

In [2]:
5+6+7+8+9

In [3]:
2+3; 5*7; 3-7

#### Complex numbers

In [4]:
z = 3.5-8i
z

In [5]:
Re(z)

In [6]:
Im(z)

In [7]:
Mod(z)

In [8]:
Arg(z)

In [9]:
Conj(z)

In [10]:
is.complex(z)

In [11]:
as.complex(3.8)
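
The modulus and argument returned by `Mod` and `Arg` are the polar coordinates of `z`, so the number can be rebuilt from them. A minimal sketch (the name `rebuilt` is introduced here for illustration only):

```r
z <- 3.5 - 8i
# Polar form: z = Mod(z) * exp(i * Arg(z))
rebuilt <- Mod(z) * exp(1i * Arg(z))
# Compare with a tolerance, because floating-point arithmetic is inexact
isTRUE(all.equal(rebuilt, z))
# The modulus is the Euclidean length of the point (Re(z), Im(z))
isTRUE(all.equal(Mod(z), sqrt(Re(z)^2 + Im(z)^2)))
```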

#### Rounding

In [12]:
floor(5.7)

In [13]:
ceiling(5.7)

In [14]:
rounded = function(x) floor(x+0.5)

In [15]:
rounded(5.7)

In [16]:
rounded(5.4)

You have to decide how to handle negative numbers, because the concept of up and down is more subtle.

In [17]:
ceiling(-5.7)

In [18]:
floor(-5.7)

To simply strip off the decimal part, use `trunc`:

In [19]:
trunc(5.7)

In [20]:
trunc(-5.7)

Rounding can be done with `round(x, decimals)`.

In [21]:
round(5.7, 0)

In [22]:
round(5.75, 1)

The number of decimal places is not the same as the number of significant digits. You can control the number of significant digits using the function `signif`.

In [23]:
signif(12345678,4)

In [24]:
signif(12345678,3)
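
The hand-written `rounded` function and the built-in `round` differ on values exactly halfway between two integers: `rounded` always goes up, whereas `round` follows the IEC 60559 convention of rounding halves to the even digit. The sketch below also contrasts decimal places with significant digits; the vector name `halves` is only illustrative.

```r
rounded <- function(x) floor(x + 0.5)
halves <- c(0.5, 1.5, 2.5)
rounded(halves)          # 1 2 3
round(halves)            # 0 2 2
# Two decimal places versus two significant digits
round(0.000123456, 2)    # 0
signif(0.000123456, 2)   # 0.00012
```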

#### Arithmetic

In [25]:
7 + 3 - 5 * 2

In [26]:
3^2 / 2

In [27]:
log(10)

In [28]:
exp(1)

In [29]:
log10(6)

In [30]:
log(9,3)

Mathematical functions in R are:

* `log(x)`
* `exp(x)`
* `log(x,n)`
* `log10(x)`
* `sqrt(x)`
* `factorial(x)`
* `choose(n, x)`
* `gamma(x)`
* `lgamma(x)`
* `floor(x)`
* `ceiling(x)`
* `trunc(x)`
* `round(x, digits=0)`
* `signif(x, digits=6)`
* `runif(n)`, generates $n$ random numbers between 0 and 1 from a uniform distribution.
* `sin(x)`, `cos(x)`, `tan(x)`
* `asin(x)`, `acos(x)`, `atan(x)`
* `asinh(x)`, `acosh(x)`, `atanh(x)`
* `abs(x)`
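
Several of the functions listed above are related to one another. The following sketch checks a few of these identities numerically; the values of `x` and `n` are arbitrary choices for illustration.

```r
x <- 5
n <- 7
# Logarithm to base n via change of base
all.equal(log(x, n), log(x) / log(n))
# factorial is the gamma function shifted by one
all.equal(factorial(x), gamma(x + 1))
# lgamma is the natural log of gamma
all.equal(lgamma(x), log(gamma(x)))
# Binomial coefficient: number of ways of choosing x items out of n
all.equal(choose(n, x), factorial(n) / (factorial(x) * factorial(n - x)))
```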

Trigonometric functions in R take their angles in radians:

In [31]:
pi

In [32]:
sin(pi/2)

In [33]:
cos(pi/2)
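
Angles given in degrees must be converted before they are passed to the trigonometric functions. A short sketch, with `degrees` and `radians` as illustrative names; it also shows that `cos(pi/2)` is only approximately zero in floating-point arithmetic.

```r
degrees <- c(0, 30, 90, 180)
radians <- degrees * pi / 180
sin(radians)
# cos(pi/2) is not exactly zero, so compare with a tolerance
cos(pi / 2) == 0                    # FALSE
isTRUE(all.equal(cos(pi / 2), 0))   # TRUE
```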

#### Modulo and integer quotients

Integer quotients and remainders are obtained using the notation `%/%` and `%%` respectively.

In [34]:
21 %/% 10

The remainder (modulo) is obtained like this:

In [35]:
21 %% 10

Modulo is very useful for testing whether numbers are odd or even.

In [36]:
9 %% 2

In [37]:
8 %% 2

In [38]:
15421 %% 7 == 0
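
With negative numbers, `%/%` rounds toward minus infinity and `%%` returns a remainder with the sign of the divisor. The sketch below illustrates this; `is.even` is a hypothetical helper, not a built-in function.

```r
x <- c(7, -7)
y <- 3
x %/% y   # 2 -3  (rounds toward minus infinity)
x %% y    # 1  2  (remainder takes the sign of the divisor)
# Quotient and remainder always recombine to the original value
all(x == (x %/% y) * y + x %% y)
# A small helper built on the odd/even test
is.even <- function(k) k %% 2 == 0
is.even(c(8, 9, -4))   # TRUE FALSE TRUE
```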

#### Variable names and assignment

* Variable names in R are **case sensitive**.
* Variable names should not begin with numbers.
* Variable names should not contain blank spaces.

In [39]:
x = 5; x

#### Operators

R uses the following operator tokens:

* Arithmetic: `+`, `-`, `*`, `/`, `%/%`, `%%`, `^`.
* Relational: `>`, `>=`, `<`, `<=`, `==`, `!=`.
* Logical (not, and, or): `!`, `&`, `|`.
* Model formulae (is modelled as a function of): `~`.
* Assignment: `<-`, `->`.
* List indexing: `$`.
* Create a sequence: `:`.

Several of these operators have different meanings inside model formulae. Thus `*` indicates the main effects plus interaction (rather than multiplication), `:` indicates the interaction between two variables (rather than generating a sequence) and `^` means all interactions up to the indicated power (rather than raising to the power).
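
One way to see how R expands these operators inside a formula is to inspect the term labels of the formula. This is a sketch only; the variable names `y`, `a`, `b` and `c` are hypothetical and no data are needed.

```r
# term.labels lists the terms that a formula expands into
attr(terms(y ~ a * b), "term.labels")      # "a" "b" "a:b"
attr(terms(y ~ a:b), "term.labels")        # "a:b"
two.way <- attr(terms(y ~ (a + b + c)^2), "term.labels")
two.way                                    # "a" "b" "c" "a:b" "a:c" "b:c"
# Wrap an expression in I() to get ordinary arithmetic inside a formula
attr(terms(y ~ I(a * b)), "term.labels")   # "I(a * b)"
```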

#### Integers

The range of integers is roughly from $-2,000,000,000$ to $+2,000,000,000$; the exact limit is $\pm 2,147,483,647$, which is stored in `.Machine$integer.max`.

In [40]:
x = c(5,4,7,8); x

In [41]:
is.numeric(x)

Note that `integer(n)` creates a vector of `n` zeros rather than converting its argument, so `x <- integer(x)` is definitely not what you intended. To convert existing numbers to integers, use `as.integer`:

In [42]:
x = c(5,3,7,8)
x = as.integer(x)
is.integer(x)
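
The limits of the integer type and the difference between `integer` and `as.integer` can be checked directly. A brief sketch using only base R:

```r
.Machine$integer.max   # 2147483647
# Whole numbers typed without the L suffix are stored as doubles
is.integer(5)          # FALSE
is.integer(5L)         # TRUE
# Integer overflow produces NA (with a warning, suppressed here)
suppressWarnings(.Machine$integer.max + 1L)   # NA
# integer(n) creates n zeros; it does not convert its argument
integer(3)             # 0 0 0
```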

The `as.integer` function works like `trunc` when applied to real numbers, and removes the imaginary part (with a warning) when applied to complex numbers.

In [43]:
as.integer(5.7)

In [44]:
as.integer(-5.7)

In [45]:
as.integer(5.7 -3i)

Warning message: "imaginary parts discarded in coercion"

#### Factors

Factors are categorical variables that have a fixed number of levels. A simple example of a factor might be a variable called `gender` with two levels: `female` and `male`.

In [46]:
gender = factor(c('female', 'male', 'female', 'male', 'female'))
class(gender)

In [47]:
mode(gender)

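More often, you will create a dataframe by reading your data from a file using `read.table`. When you do this with `stringsAsFactors=TRUE`, all the variables containing one or more character strings are converted automatically into factors. This was the default behaviour before R 4.0.0; in later versions the default is `FALSE`, so the argument must be given explicitly. The file has a header row, hence `header=TRUE`.

In [48]:
data = read.table('daphnia.txt', header=TRUE, stringsAsFactors=TRUE)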

In [49]:
attach(data)

In [50]:
head(data)

Growth.rate,Water,Detergent,Daphnia
2.919086,Tyne,BrandA,Clone1
2.492904,Tyne,BrandA,Clone1
3.021804,Tyne,BrandA,Clone1
2.350874,Tyne,BrandA,Clone2
3.148174,Tyne,BrandA,Clone2
4.423853,Tyne,BrandA,Clone2


To check if a variable is a factor:

In [51]:
is.factor(Water)

To discover the *names* of the factor levels:

In [52]:
levels(Detergent)

To discover the *number* of levels of a factor:

In [53]:
nlevels(Detergent)

In [54]:
length(levels(Detergent))
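
The same functions can be tried without the data file. The sketch below uses a small hypothetical factor, `water`, standing in for a column read from a file.

```r
# Hypothetical data standing in for a column of a dataframe
water <- factor(c("Tyne", "Wear", "Tyne", "Wear", "Tyne"))
levels(water)       # "Tyne" "Wear"  (sorted alphabetically by default)
nlevels(water)      # 2
as.integer(water)   # 1 2 1 2 1  (the internal integer codes)
table(water)        # counts per level
# Levels can be given an explicit order
water2 <- factor(water, levels = c("Wear", "Tyne"))
levels(water2)      # "Wear" "Tyne"
```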

### Generating sequences

An important way of creating vectors is to generate a sequence of numbers.

In [55]:
0:10

In [56]:
15:5

In [57]:
seq(0, 1.5, 0.5)

In [58]:
seq(6, 4, -0.2)

In many cases, you want to generate a sequence to match an existing vector in length.

In [59]:
N <- c(55,76,92,103,84,88,121,91,65,77,99)
seq(from=0.04, by=0.01, length=11)

In [60]:
seq(0.04, by=0.01, along=N)

Alternatively, you can get R to work out the increment (0.01 in this example).

In [61]:
seq(0.04, to=0.14, along=N)

An important application of the last option is to get the $x$ values for drawing smooth lines through a scatterplot of data using predicted values from a model.
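
As a rough sketch of that application, the code below fits a straight line to a few made-up points and evaluates the fitted model on an evenly spaced grid. The data and the names `model`, `xv` and `yv` are assumptions made for illustration.

```r
# Made-up data for illustration
x <- c(1, 3, 4, 7, 9)
y <- c(2.1, 3.9, 5.2, 8.1, 9.8)
model <- lm(y ~ x)
# 50 evenly spaced x values spanning the range of the data
xv <- seq(min(x), max(x), length.out = 50)
# Predicted y values at those points
yv <- predict(model, newdata = data.frame(x = xv))
# plot(x, y); lines(xv, yv) would then draw the fitted line
```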

Notice that when the increment does not match the final value, the generated sequence stops short of the last value (rather than overstepping it).

In [62]:
seq(1.4, 2.1, 0.3)

The `sequence` function takes a vector of lengths and, for each element `n`, generates `1:n`, concatenating the results:

In [63]:
sequence(c(4,3,4,4,4,5))

#### Generating repeats

You will often want to generate repeats of numbers or characters, for which the function is `rep`.

In [64]:
rep(9,5)

In [65]:
rep(1:4, 2)

In [66]:
rep(1:4, each = 2)

In [67]:
rep(1:4, each = 2, times = 3)

In [68]:
rep(1:4, 1:4)

In [69]:
rep(1:4, c(4,1,4,2))

In [70]:
rep(c("cat", "dog", "gerbill", "goldfish", "rat"),  c(2,3,2,1,3))

This is the most general, and also the most useful form of the `rep` function.

#### Generating factor levels

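The function `gl` (generate levels) is useful when you want to encode long vectors of factor levels. Its three arguments, in order, are:

* the number of levels ("up to")
* the number of times each level is repeated ("with repeats of")
* the total length of the result ("to total length")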

In [71]:
gl(4,3)

In [72]:
gl(4,3,24)
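
To make the three arguments concrete, the sketch below builds the same factor by hand with `rep`. The names `by.gl` and `by.rep` are illustrative.

```r
# gl(n, k, length): levels 1..n, each repeated k times, recycled to 'length'
by.gl <- gl(4, 3, 24)
by.rep <- factor(rep(rep(1:4, each = 3), length.out = 24))
all(as.character(by.gl) == as.character(by.rep))   # TRUE
identical(levels(by.gl), levels(by.rep))           # TRUE
```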

If you want text for the factor levels, rather than numbers, use labels like this:

In [74]:
Temp = gl(2,2,24, labels=c('Low','High'))
Soft = gl(3,8,24, labels=c('Hard', 'Medium', 'Soft'))
M.user = gl(2,4,24, labels=c('N', 'Y'))
Brand = gl(2,1,24, labels=c('X', 'M'))

In [75]:
data.frame(Temp,Soft,M.user,Brand)

Temp,Soft,M.user,Brand
Low,Hard,N,X
Low,Hard,N,M
High,Hard,N,X
High,Hard,N,M
Low,Hard,Y,X
Low,Hard,Y,M
High,Hard,Y,X
High,Hard,Y,M
Low,Medium,N,X
Low,Medium,N,M


#### Membership: Testing and coercing in R

The concepts of membership and coercion may be unfamiliar. Membership relates to the class of an object in R. Coercion changes the class of an object. For instance, a logical variable has class `logical` and mode `logical`. This is how we create the variable.

In [80]:
lv = c(T,F,T)

In [81]:
is.logical(lv)

In [82]:
levels(lv)

NULL

In [83]:
(fv = as.factor(lv))

In [85]:
is.factor(fv)

We can coerce a logical variable to be numeric: `TRUE` evaluates to $1$ and `FALSE` evaluates to $0$.

In [86]:
(nv = as.numeric(lv))

R provides functions for testing (`is.`) the attributes of different categories of object (arrays, lists, etc.) and for coercing (`as.`) the attributes of an object into a specified form. Neither operation changes the attributes of the object unless you overwrite its name.

These functions exist for the following types: `array`, `character`, `complex`, `data.frame`, `double`, `factor`, `list`, `logical`, `matrix`, `numeric`, `raw`, `ts` (time series) and `vector`. Each can be used as `is.<type>` and `as.<type>`.
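
The following sketch illustrates both points: coercion returns a new object and leaves the original alone, and coercing a factor to numeric gives the internal codes rather than the labels. The factor `f` is a made-up example.

```r
lv <- c(TRUE, FALSE, TRUE)
as.numeric(lv)    # 1 0 1
is.logical(lv)    # TRUE: lv itself is unchanged
# Coercing a factor to numeric returns the internal codes, not the labels
f <- factor(c("10", "20", "10"))
as.numeric(f)                 # 1 2 1
as.numeric(as.character(f))   # 10 20 10
# Coercion that cannot succeed yields NA (with a warning, suppressed here)
suppressWarnings(as.numeric("abc"))   # NA
```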

#### Missing values, infinity and things that are not numbers

Calculations can lead to answers that are plus infinity, represented in R by `Inf`, or minus infinity, which is represented as `-Inf`.

In [87]:
3/0

In [88]:
-12/0

In [89]:
exp(-Inf)

In [90]:
0/Inf

In [92]:
(0:3)^Inf

Other calculations lead to quantities that are not numbers, represented in R by `NaN` ("not a number"):

In [93]:
0/0

In [94]:
Inf/Inf

In [95]:
is.finite(10)

In [96]:
is.infinite(10)

In [97]:
is.infinite(Inf)
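
`NaN` can be tested for with `is.nan`, and it interacts with the missing-value test `is.na` in a way that is easy to overlook. A short sketch:

```r
Inf - Inf       # NaN
is.nan(0 / 0)   # TRUE
# NaN counts as missing, but NA is not NaN
is.na(NaN)      # TRUE
is.nan(NA)      # FALSE
# Inf, -Inf, NaN and NA are all "not finite"
is.finite(c(10, Inf, -Inf, NaN, NA))   # TRUE FALSE FALSE FALSE FALSE
```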

#### Missing values

Missing values are represented in R by `NA`. The function `is.na` tests for them, and negating it with `!` is useful for editing out rows containing missing values from large dataframes. Here is a simple example:

In [100]:
y1 = c(1,2,3,NA)
y2 = c(5,6,NA,8)
y3 = c(9,NA,11,12)
y4 = c(NA,14,15,16)
full.frame = data.frame(y1,y2,y3,y4)
full.frame

y1,y2,y3,y4
1.0,5.0,9.0,
2.0,6.0,,14.0
3.0,,11.0,15.0
,8.0,12.0,16.0


In [101]:
reduced.frame = full.frame[!is.na(full.frame$y1),]
reduced.frame

y1,y2,y3,y4
1,5.0,9.0,
2,6.0,,14.0
3,,11.0,15.0
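
Base R also offers `complete.cases` and `na.omit` for filtering on several columns at once. The sketch below rebuilds the same dataframe so that it runs on its own; because every row contains one `NA`, no row is complete.

```r
full.frame <- data.frame(y1 = c(1, 2, 3, NA), y2 = c(5, 6, NA, 8),
                         y3 = c(9, NA, 11, 12), y4 = c(NA, 14, 15, 16))
# Rows with no NA in any column: none here
complete.cases(full.frame)   # FALSE FALSE FALSE FALSE
nrow(na.omit(full.frame))    # 0
# Restrict the check to the columns that matter
kept <- full.frame[complete.cases(full.frame[, c("y1", "y2")]), ]
kept                         # rows 1 and 2
# Count missing values per column
colSums(is.na(full.frame))   # 1 1 1 1
```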


Some functions do not work with their default settings when there are missing values in the data, and `mean` is a classic example of this.

In [102]:
x = c(1:8,NA)
mean(x)

In [103]:
mean(x,na.rm=T)

Here is an example where we want to find the locations of missing values within a vector called `vmv`.

In [105]:
vmv = c(1:6,NA,NA,9:12)
vmv

In [106]:
seq(along=vmv)[is.na(vmv)]

However, the result is achieved more simply using the `which` function like this:

In [107]:
which(is.na(vmv))

To replace the missing values with zeros (appropriate only when they genuinely represent zero), use the `ifelse` function like this:

In [109]:
vmv = c(1:6,NA,NA,9:12)
ifelse(is.na(vmv),0,vmv)
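
The same replacement can be written with logical subscripts. In this sketch, `filled` and `vmv2` are illustrative names; note that `ifelse` returns a new vector and leaves `vmv` untouched.

```r
vmv <- c(1:6, NA, NA, 9:12)
# ifelse returns a new vector; vmv itself still contains the NAs
filled <- ifelse(is.na(vmv), 0, vmv)
# Logical subscripts overwrite the selected elements of a copy
vmv2 <- vmv
vmv2[is.na(vmv2)] <- 0
all(filled == vmv2)   # TRUE
sum(is.na(vmv))       # 2
```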

### Vectors and subscripts