# R - Week 1 (sequences, vectors, matrices)

The book can be downloaded with this link:

http://cmas.siu.buap.mx/portal_pprd/work/sites/biologia/resources/PDFContent/168/R.%20Book%202012.pdf

## Week 1

Essentials of the R-language:

* 2.1 Calculations
* 2.3 Generating sequences
* 2.6 Vectors
* 2.7 Vector functions
* 2.8 Matrices and arrays

### Calculations

We can run simple calculations:

In [4]:
log(42,7.3)

In [5]:
5+6+7+8+9

In [6]:
2+3; 5*7; 3-7

#### Complex numbers

In [7]:
z = 3.5-8i
z

In [8]:
Re(z)

In [9]:
Im(z)

In [10]:
Mod(z)

In [11]:
Arg(z)

In [12]:
Conj(z)

In [13]:
is.complex(z)

In [14]:
as.complex(3.8)

#### Rounding

In [15]:
floor(5.7)

In [16]:
ceiling(5.7)

In [17]:
rounded = function(x) floor(x+0.5)

In [18]:
rounded(5.7)

In [19]:
rounded(5.4)

You have to decide how to handle negative numbers, because the concept of up and down is more subtle.

In [20]:
ceiling(-5.7)

In [21]:
floor(-5.7)
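
As a sketch of one possible choice: the `rounded` helper above sends ties toward plus infinity, so `-5.5` becomes `-5`. The variant below, under the hypothetical name `rounded.sym` (not from the book), sends ties away from zero instead.

```r
# Round half up: ties go toward +Inf (same definition as above)
rounded <- function(x) floor(x + 0.5)

# Hypothetical alternative: ties go away from zero
rounded.sym <- function(x) sign(x) * floor(abs(x) + 0.5)

rounded(-5.5)      # -5
rounded.sym(-5.5)  # -6
rounded(-5.7)      # -6
rounded.sym(-5.7)  # -6
```

The two rules only disagree on negative ties; which one is appropriate depends on the application.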

To simply strip off the decimal part, use `trunc`:

In [22]:
trunc(5.7)

In [23]:
trunc(-5.7)

Rounding can be done with `round(x, decimals)`.

In [24]:
round(5.7, 0)

In [25]:
round(5.75, 1)

The number of decimal places is not the same as the number of significant digits. You can control the number of significant digits using the function `signif`.

In [26]:
signif(12345678,4)

In [27]:
signif(12345678,3)
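
To make the difference concrete, here is a small comparison on a single illustrative value (the number `123.456` is chosen only for this sketch).

```r
x <- 123.456

round(x, 1)    # 123.5  (one decimal place)
signif(x, 1)   # 100    (one significant digit)
signif(x, 4)   # 123.5  (four significant digits)
round(x, -1)   # 120    (negative digits round to tens, hundreds, ...)
```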

#### Arithmetic

In [28]:
7 + 3 - 5 * 2

In [29]:
3^2 / 2

In [30]:
log(10)

In [31]:
exp(1)

In [32]:
log10(6)

In [33]:
log(9,3)

Commonly used mathematical functions in R include:

* `log(x)`
* `exp(x)`
* `log(x,n)`
* `log10(x)`
* `sqrt(x)`
* `factorial(x)`
* `choose(n, x)`
* `gamma(x)`
* `lgamma(x)`
* `floor(x)`
* `ceiling(x)`
* `trunc(x)`
* `round(x, digits=0)`
* `signif(x, digits=6)`
* `runif(n)`, generates $n$ random numbers between 0 and 1 from a uniform distribution.
* `sin(x)`, `cos(x)`, `tan(x)`
* `asin(x)`, `acos(x)`, `atan(x)`
* `asinh(x)`, `acosh(x)`, `atanh(x)`
* `abs(x)`

Trigonometric functions in R work in radians, not degrees:

In [34]:
pi

In [35]:
sin(pi/2)

In [36]:
cos(pi/2)

#### Modulo and integer quotients

Integer quotients and remainders are obtained using the operators `%/%` and `%%` respectively.

In [37]:
21 %/% 10

The modulo (remainder) is computed like this:

In [38]:
21 %% 10

Modulo is very useful for testing whether numbers are odd or even.

In [39]:
9 %% 2

In [40]:
8 %% 2

In [41]:
15421 %% 7 == 0
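
The following sketch wraps the parity test in a helper and checks how the two operators fit together. The name `is.even` is hypothetical, introduced only for illustration.

```r
# Hypothetical helper: TRUE where n is divisible by 2
is.even <- function(n) n %% 2 == 0
is.even(c(8, 9, 15421))   # TRUE FALSE FALSE

# Quotient and remainder always reassemble the original number
x <- 21; y <- 10
(x %/% y) * y + x %% y == x   # TRUE

# With a negative dividend, %/% rounds down and %% takes the sign of the divisor
-7 %/% 2   # -4
-7 %% 2    # 1
```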

#### Variable names and assignment

* Variable names in R are **case sensitive**.
* Variable names should not begin with numbers.
* Variable names should not contain blank spaces.

In [42]:
x = 5; x

#### Operators

R uses the following operator tokens:

* Arithmetic: `+`, `-`, `*`, `/`, `%/%`, `%%`, `^`.
* Relational: `>`, `>=`, `<`, `<=`, `==`, `!=`.
* Logical (not, and, or): `!`, `&`, `|`.
* Model formulae (is modelled as a function of): `~`.
* Assignment: `<-`, `->`, `=`.
* List indexing: `$`.
* Create a sequence: `:`.

Several of these operators have a different meaning inside model formulae. Thus `*` indicates the main effects plus interaction (rather than multiplication), `:` indicates the interaction between two variables (rather than generating a sequence) and `^` means all interactions up to the indicated power (rather than raising to a power).
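
One way to see how a formula is expanded is to ask `terms` for its term labels. This is a sketch with placeholder variable names (`y`, `a`, `b`, `c`); no data is needed because only the formula is inspected.

```r
# a * b expands to both main effects plus their interaction
attr(terms(y ~ a * b), "term.labels")          # "a" "b" "a:b"

# a:b is the interaction only
attr(terms(y ~ a:b), "term.labels")            # "a:b"

# ^2 gives all main effects and all two-way interactions
attr(terms(y ~ (a + b + c)^2), "term.labels")  # "a" "b" "c" "a:b" "a:c" "b:c"
```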

#### Integers

R stores integers in 32 bits, so their range is roughly $-2,000,000,000$ to $+2,000,000,000$ (the exact limit is $\pm 2,147,483,647$).
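
The limit can be inspected directly. The sketch below also shows what happens when integer arithmetic goes past it; the variable name `big` is only illustrative.

```r
big <- .Machine$integer.max
big                          # 2147483647
class(big)                   # "integer"

# Integer arithmetic past the limit gives NA (with a warning)
suppressWarnings(big + 1L)   # NA

# Adding a double instead promotes the result to double, so nothing is lost
class(big + 1)               # "numeric"
```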

In [43]:
x = c(5,4,7,8); x

In [44]:
is.numeric(x)

Do not use `integer(x)` for conversion: `integer(n)` creates a new vector of `n` zeros rather than converting its argument, which is definitely not what you intended. Use `as.integer` instead:

In [45]:
x = c(5,3,7,8)
x = as.integer(x)
is.integer(x)

The `as.integer` function works like `trunc` when applied to real numbers, and removes the imaginary part when applied to complex numbers.

In [46]:
as.integer(5.7)

In [47]:
as.integer(-5.7)

In [48]:
as.integer(5.7 -3i)

"imaginary parts discarded in coercion"

#### Factors

Factors are categorical variables that have a fixed number of levels. A simple example of a factor might be a variable called `gender` with two levels: `female` and `male`.

In [49]:
gender = factor(c('female', 'male', 'female', 'male', 'female'))
class(gender)

In [50]:
mode(gender)

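More often, you will create a dataframe by reading your data from a file using `read.table`. With `stringsAsFactors=TRUE`, all variables containing one or more character strings are converted automatically into factors (this was the default before R 4.0.0; since then it has to be requested explicitly). The file starts with a row of column names, so `header=TRUE` is needed as well.

In [51]:
data = read.table('daphnia.txt', header=TRUE, stringsAsFactors=TRUE)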

In [52]:
attach(data)

In [53]:
head(data)

Growth.rate,Water,Detergent,Daphnia
2.919086,Tyne,BrandA,Clone1
2.492904,Tyne,BrandA,Clone1
3.021804,Tyne,BrandA,Clone1
2.350874,Tyne,BrandA,Clone2
3.148174,Tyne,BrandA,Clone2
4.423853,Tyne,BrandA,Clone2


To check if a variable is a factor:

In [54]:
is.factor(Water)

To discover the *names* of the factor levels:

In [55]:
levels(Detergent)

To discover the *number* of levels of a factor:

In [56]:
nlevels(Detergent)

In [57]:
length(levels(Detergent))
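
The same checks can be tried without the data file. The sketch below reads a tiny inline table through the `text` argument of `read.table`; the values are invented for illustration and are not taken from `daphnia.txt`. `stringsAsFactors = TRUE` is passed explicitly because it is not the default from R 4.0.0 onward.

```r
# Invented three-row table, only to illustrate factor conversion
txt <- "Growth.rate Water Detergent
2.92 Tyne BrandA
2.49 Tyne BrandB
3.02 Wear BrandA"

d <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)

is.factor(d$Water)         # TRUE
is.factor(d$Growth.rate)   # FALSE: numeric columns are left alone
levels(d$Detergent)        # "BrandA" "BrandB"
nlevels(d$Water)           # 2
```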

### Generating sequences

An important way of creating vectors is to generate a sequence of numbers.

In [58]:
0:10

In [59]:
15:5

In [60]:
seq(0, 1.5, 0.5)

In [61]:
seq(6, 4, -0.2)

In many cases, you want to generate a sequence to match an existing vector in length.

In [62]:
N <- c(55,76,92,103,84,88,121,91,65,77,99)
seq(from=0.04, by=0.01, length=11)

In [63]:
seq(0.04, by=0.01, along=N)

Alternatively, you can get R to work out the increment (0.01 in this example) by supplying the end value instead.

In [64]:
seq(0.04, to=0.14, along=N)

An important application of the last option is to get the $x$ values for drawing smooth lines through a scatterplot of data using predicted values from a model.
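
A minimal sketch of that application, assuming a quadratic model fitted with `lm`. The data values and the names `model`, `xv` and `yv` are made up for illustration.

```r
# Invented data with a curved trend
x <- c(1, 2, 3, 4, 5, 6)
y <- c(1.2, 3.9, 9.1, 15.8, 25.3, 35.9)

model <- lm(y ~ x + I(x^2))

# A fine grid of x values spanning the data: R works out the increment
xv <- seq(min(x), max(x), length.out = 100)

# Predicted y for every grid point
yv <- predict(model, newdata = data.frame(x = xv))

# To draw the result: plot(x, y); lines(xv, yv)
```

Using many closely spaced grid points is what makes the fitted curve look smooth when drawn with `lines`.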

Notice that when the increment does not match the final value, the generated sequence stops short of the last value (rather than overstepping it).

In [65]:
seq(1.4, 2.1, 0.3)

In [66]:
sequence(c(4,3,4,4,4,5))

#### Generating repeats

You will often want to generate repeats of numbers or characters, for which the function is `rep`.

In [67]:
rep(9,5)

In [68]:
rep(1:4, 2)

In [69]:
rep(1:4, each = 2)

In [70]:
rep(1:4, each = 2, times = 3)

In [71]:
rep(1:4, 1:4)

In [72]:
rep(1:4, c(4,1,4,2))

In [73]:
rep(c("cat", "dog", "gerbill", "goldfish", "rat"),  c(2,3,2,1,3))

This is the most general, and also the most useful form of the `rep` function.

#### Generating factor levels

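The function `gl` (generate levels) is useful when you want to encode long vectors of factor levels. Its three arguments, as in `gl(n, k, length)`, are:

* `n`: the number of levels (the levels run from 1 up to `n`)
* `k`: the number of repeats of each level
* `length`: the total length of the result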

In [74]:
gl(4,3)

In [75]:
gl(4,3,24)
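
As a rough check of what `gl` produces, the sketch below compares it with the equivalent construction from `rep` and `factor`. The names `a` and `b` are only illustrative.

```r
a <- gl(4, 3)                      # levels 1..4, each repeated 3 times
b <- factor(rep(1:4, each = 3))    # the same pattern built by hand

all(as.character(a) == as.character(b))   # TRUE
nlevels(a)                                # 4

# With a total length of 24, the 12-element pattern is repeated twice
length(gl(4, 3, 24))                      # 24
```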

If you want text for the factor levels, rather than numbers, use labels like this:

In [76]:
Temp = gl(2,2,24, labels=c('Low','High'))
Soft = gl(3,8,24, labels=c('Hard', 'Medium', 'Soft'))
M.user = gl(2,4,24, labels=c('N', 'Y'))
Brand = gl(2,1,24, labels=c('X', 'M'))

In [77]:
data.frame(Temp,Soft,M.user,Brand)

Temp,Soft,M.user,Brand
Low,Hard,N,X
Low,Hard,N,M
High,Hard,N,X
High,Hard,N,M
Low,Hard,Y,X
Low,Hard,Y,M
High,Hard,Y,X
High,Hard,Y,M
Low,Medium,N,X
Low,Medium,N,M


#### Membership: Testing and coercing in R

The concepts of membership and coercion may be unfamiliar. Membership relates to the class of an object in R. Coercion changes the class of an object. For instance, a logical variable has class `logical` and mode `logical`. This is how we create the variable:

In [78]:
lv = c(T,F,T)

In [79]:
is.logical(lv)

In [80]:
levels(lv)

NULL

In [81]:
(fv = as.factor(lv))

In [82]:
is.factor(fv)

We can coerce a logical variable to be numeric: `TRUE` evaluates to $1$ and `FALSE` evaluates to $0$.

In [83]:
(nv = as.numeric(lv))

There are functions for testing (`is.`) the attributes of different categories of object (arrays, lists, etc.) and for coercing (`as.`) an object into a specified form. Neither operation changes the object itself unless you assign the result back to its name.

The following types can be used with `is.<type>` and `as.<type>`: `array`, `character`, `complex`, `data.frame`, `double`, `factor`, `list`, `logical`, `matrix`, `numeric`, `raw`, `ts` (time series), `vector`.
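
The point that testing and coercing leave the original object untouched can be seen in a short sketch (the values in `x` are arbitrary):

```r
x <- c(1.7, 2.3, 3.9)

as.integer(x)        # 1 2 3
class(x)             # "numeric": x itself has not changed

x <- as.integer(x)   # overwrite the name to keep the coerced version
class(x)             # "integer"
is.numeric(x)        # TRUE: integers still count as numeric
```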

#### Missing values, infinity and things that are not numbers

Calculations can lead to answers that are plus infinity, represented in R by `Inf`, or minus infinity, which is represented as `-Inf`.

In [84]:
3/0

In [85]:
-12/0

In [86]:
exp(-Inf)

In [87]:
0/Inf

In [88]:
(0:3)^Inf

In [89]:
0/0

In [90]:
Inf/Inf

In [91]:
is.finite(10)

In [92]:
is.infinite(10)

In [93]:
is.infinite(Inf)
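
Calculations such as `0/0` give `NaN` (not a number), which is distinct from both infinity and the missing value `NA`. The sketch below applies the four test functions to one illustrative vector.

```r
x <- c(1, Inf, -Inf, NaN, NA)

is.finite(x)     # TRUE  FALSE FALSE FALSE FALSE
is.infinite(x)   # FALSE TRUE  TRUE  FALSE FALSE
is.nan(x)        # FALSE FALSE FALSE TRUE  FALSE
is.na(x)         # FALSE FALSE FALSE TRUE  TRUE
```

Note that `is.na` is `TRUE` for `NaN` as well, whereas `is.nan` is `FALSE` for `NA`.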

#### Missing values

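Missing values are represented in R by `NA`, and the function `is.na` tests for them. Negating the test with `!` is useful for editing out rows containing missing values from large dataframes. Here is a simple example: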

In [94]:
y1 = c(1,2,3,NA)
y2 = c(5,6,NA,8)
y3 = c(9,NA,11,12)
y4 = c(NA,14,15,16)
full.frame = data.frame(y1,y2,y3,y4)
full.frame

y1,y2,y3,y4
1.0,5.0,9.0,
2.0,6.0,,14.0
3.0,,11.0,15.0
,8.0,12.0,16.0


In [95]:
reduced.frame = full.frame[!is.na(full.frame$y1),]
reduced.frame

y1,y2,y3,y4
1,5.0,9.0,
2,6.0,,14.0
3,,11.0,15.0
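
Filtering on `y1` alone leaves the missing values in the other columns in place. To drop every row that has a missing value in any column, `complete.cases` or `na.omit` can be used. The sketch uses a separate small dataframe with the hypothetical name `df`, because in `full.frame` every row contains an `NA` and nothing would remain.

```r
df <- data.frame(a = c(1, 2, NA, 4),
                 b = c(5, NA, 7, 8))

complete.cases(df)         # TRUE FALSE FALSE TRUE
df[complete.cases(df), ]   # keeps rows 1 and 4
nrow(na.omit(df))          # 2
```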


Some functions do not work with their default settings when there are missing values in the data, and `mean` is a classic example of this.

In [96]:
x = c(1:8,NA)
mean(x)

In [97]:
mean(x,na.rm=T)

Here is an example where we want to find the locations of missing values within a vector called `vmv`.

In [98]:
vmv = c(1:6,NA,NA,9:12)
vmv

In [99]:
seq(along=vmv)[is.na(vmv)]

However, the result is achieved more simply using the `which` function like this:

In [100]:
which(is.na(vmv))

To replace the missing values (here with zeros), use the `ifelse` function like this:

In [101]:
vmv = c(1:6,NA,NA,9:12)
ifelse(is.na(vmv),0,vmv)

### Vectors and subscripts

In [113]:
peas <- c(4,7,6,5,6,7)

In [114]:
class(peas)

In [115]:
length(peas)

In [116]:
mean(peas)

In [117]:
max(peas)

In [118]:
min(peas)

In [119]:
quantile(peas)

In [120]:
peas

In [121]:
peas[4]

In [122]:
pods <- c(2,3,6)
peas[pods]

In [123]:
peas[c(2,3,6)]

In [124]:
peas[-1]

In [125]:
peas[-length(peas)]

In [126]:
trim <- function(x) sort(x) [-c(1,2,length(x)-1,length(x))]

In [127]:
trim(peas)

In [128]:
peas[1:3]

In [129]:
peas[1:length(peas) %% 2 == 0]

In [130]:
y <- 4.3

In [131]:
z <- y[-1]

In [132]:
length(z)

In [133]:
x <- 0:10

In [134]:
x

In [135]:
sum(x)

In [136]:
sum(x<5)

In [137]:
sum(x[x<5])

In [138]:
x<5

In [139]:
x*(x<5)

In [140]:
sum(x*(x<5))

In [141]:
y <- c(8,3,5,7,6,6,8,9,2,3,9,4,10,4,11)

In [142]:
sort(y)

In [143]:
rev(sort(y))

In [144]:
rev(sort(y))[1:3]

In [145]:
sum(rev(sort(y))[1:3])

In [146]:
which(x == max(x))

In [147]:
which(x == min(x))

In [148]:
which.max(x)

In [149]:
which.min(x)

### Vector functions

In [150]:
x = 0:10

In [151]:
max(x)

In [152]:
min(x)

In [153]:
sum(x)

In [154]:
mean(x)

In [155]:
median(x)

In [156]:
range(x)

In [157]:
var(x)

In [158]:
cor(x,x)

In [159]:
sort(x)

In [160]:
order(x)

In [161]:
quantile(x)

In [162]:
cumsum(x)

In [163]:
cumprod(x)

In [165]:
cummax(x)

In [166]:
cummin(x)

In [167]:
pmax(x,x,x)

In [168]:
pmin(x,x,x)

In [169]:
colMeans(x)

ERROR: Error in colMeans(x): 'x' must be an array of at least two dimensions


In [174]:
(poisson <- rpois(105,0.7))

In [175]:
rle(poisson)

Run Length Encoding
  lengths: int [1:64] 1 1 1 2 4 3 2 1 2 1 ...
  values : int [1:64] 1 0 2 1 0 1 0 1 0 1 ...

In [176]:
max(rle(poisson)[[1]] == 7)

In [181]:
run.and.value <- function(x) {
    # Compute the run-length encoding once, from the argument x
    r <- rle(x)
    a <- max(r$lengths)
    b <- r$values[which(r$lengths == a)]
    cat("length = ", a," value = ", b,"\n")
}

In [182]:
run.and.value(poisson)

length =  8  value =  0 


In [183]:
A <- c("a", "b", "c", "d", "e")
B <- c("d", "e", "f", "g")

union(A,B)

In [184]:
intersect(A,B)

In [185]:
setdiff(A,B)

In [186]:
setdiff(B,A)

In [187]:
A %in% B

In [188]:
B %in% A

In [189]:
A[A %in% B]

In [190]:
intersect(A,B)

### Matrices and arrays

In [191]:
y <- 1:24
dim(y) <- c(2,4,3)
y

In [194]:
dim(y) <- c(3,2,4)
y

#### Matrices

In [197]:
X <- matrix(c(1,0,0,0,1,0,0,0,1),nrow=3)

In [198]:
class(X)

In [199]:
X

0,1,2
1,0,0
0,1,0
0,0,1


In [200]:
attributes(X)

In [201]:
vector <- c(1,2,3,4,4,3,2,1)
V <- matrix(vector, byrow=T, nrow=2)
V

0,1,2,3
1,2,3,4
4,3,2,1


In [202]:
dim(vector) <- c(4,2)

In [203]:
is.matrix(vector)

In [204]:
vector

0,1
1,4
2,3
3,2
4,1


In [205]:
(vector <- t(vector))

0,1,2,3
1,2,3,4
4,3,2,1


In [206]:
X <- matrix(rpois(20,1.5), nrow=4)
X

0,1,2,3,4
0,3,1,4,0
1,1,0,1,2
3,1,2,2,1
3,1,3,3,3


In [207]:
rownames(X) <- rownames(X,do.NULL=FALSE,prefix="Trial.")

In [208]:
X

0,1,2,3,4,5
Trial.1,0,3,1,4,0
Trial.2,1,1,0,1,2
Trial.3,3,1,2,2,1
Trial.4,3,1,3,3,3


In [210]:
drug.names <- c("aspirin", "paracetamol", "nurofen", "hedex", "placebo")
colnames(X) <- drug.names
X

,aspirin,paracetamol,nurofen,hedex,placebo
Trial.1,0,3,1,4,0
Trial.2,1,1,0,1,2
Trial.3,3,1,2,2,1
Trial.4,3,1,3,3,3


In [211]:
dimnames(X) <- list(NULL,paste("drug.", 1:5,sep=""))

In [212]:
X

drug.1,drug.2,drug.3,drug.4,drug.5
0,3,1,4,0
1,1,0,1,2
3,1,2,2,1
3,1,3,3,3


#### Calculations on rows or columns of a matrix

In [213]:
mean(X[,5])

In [214]:
var(X[4,])

In [215]:
rowSums(X)

In [216]:
colSums(X)

In [217]:
rowMeans(X)

In [218]:
colMeans(X)

In [219]:
apply(X,2,mean)

In [220]:
X

drug.1,drug.2,drug.3,drug.4,drug.5
0,3,1,4,0
1,1,0,1,2
3,1,2,2,1
3,1,3,3,3


In [222]:
group=c("A","B","B","A")
rowsum(X, group)

,drug.1,drug.2,drug.3,drug.4,drug.5
A,3,4,4,7,3
B,4,2,2,3,3


In [223]:
tapply(X, list(group[row(X)], col(X)), sum)

,1,2,3,4,5
A,3,4,4,7,3
B,4,2,2,3,3


In [224]:
aggregate(X, list(group), sum)

Group.1,drug.1,drug.2,drug.3,drug.4,drug.5
A,3,4,4,7,3
B,4,2,2,3,3


In [225]:
apply(X,2,sample)

drug.1,drug.2,drug.3,drug.4,drug.5
1,3,0,3,0
0,1,1,1,2
3,1,3,4,3
3,1,2,2,1


In [226]:
apply(X,2,sample)

drug.1,drug.2,drug.3,drug.4,drug.5
3,1,2,3,1
0,3,1,4,3
1,1,0,2,0
3,1,3,1,2


#### Apply functions with `apply`, `sapply` and `lapply`

In [227]:
(X <- matrix(1:24, nrow=4))

0,1,2,3,4,5
1,5,9,13,17,21
2,6,10,14,18,22
3,7,11,15,19,23
4,8,12,16,20,24


In [228]:
apply(X,1,sum)

In [229]:
apply(X,2,sum)

In [230]:
apply(X,1,sqrt)

0,1,2,3
1.0,1.414214,1.732051,2.0
2.236068,2.44949,2.645751,2.828427
3.0,3.162278,3.316625,3.464102
3.605551,3.741657,3.872983,4.0
4.123106,4.242641,4.358899,4.472136
4.582576,4.690416,4.795832,4.898979


In [231]:
apply(X,2,sqrt)

0,1,2,3,4,5
1.0,2.236068,3.0,3.605551,4.123106,4.582576
1.414214,2.44949,3.162278,3.741657,4.242641,4.690416
1.732051,2.645751,3.316625,3.872983,4.358899,4.795832
2.0,2.828427,3.464102,4.0,4.472136,4.898979


In [232]:
apply(X,1,sample)

0,1,2,3
5,6,11,24
9,14,15,16
1,10,7,8
21,18,19,12
17,22,3,4
13,2,23,20


In [233]:
apply(X,1,function(x) x^ 2+x)

0,1,2,3
2,6,12,20
30,42,56,72
90,110,132,156
182,210,240,272
306,342,380,420
462,506,552,600


In [234]:
sapply(3:7, seq)
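
`lapply` always returns a list, while `sapply` tries to simplify the result to a vector or matrix. A short sketch of the difference, with the illustrative name `res`:

```r
lapply(1:3, function(i) i^2)   # a list with elements 1, 4, 9
sapply(1:3, function(i) i^2)   # simplified to the vector 1 4 9

# The sequences here have different lengths, so sapply cannot
# simplify them and returns a list instead
res <- sapply(3:7, seq)
class(res)                     # "list"
sapply(res, length)            # 3 4 5 6 7
```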