In [None]:
x <- c(1,2,3)
y <- x
y[3] <- 4

str(x)
str(y)

Variables are references. Assigning x to y does not copy the data: both names refer to the same object. Only when y is later modified does R make a copy of the data (copy-on-modify), so x is left unchanged.
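The same copy-on-modify behaviour applies to function arguments. The sketch below uses only base R; the helper name `set_third` is hypothetical and exists only for illustration.

```r
# Hypothetical helper: modifies its argument and returns the modified copy.
set_third <- function(v, value) {
  v[3] <- value   # modifying v triggers a copy; the caller's object is untouched
  v
}

x <- c(1, 2, 3)
y <- set_third(x, 4)

x  # still 1 2 3
y  # 1 2 4
```

Because the function works on its own copy, the only way to see the change is to use the returned value.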

To inspect the internal data structure and memory allocation we can use the package lobstr.

In [None]:
install.packages("lobstr")

In [None]:
lobstr::obj_addr(x) #"package::name" calls a function from a package without attaching the package

In [None]:
help(package=lobstr) #Access documentation

In [None]:
library(lobstr) #attaches the package, so its functions can be called without the lobstr:: prefix

In [None]:
#So, let's redo the above experiment and inspect it with lobstr

In [None]:
a <- c(1,2,3)
b <- a

obj_addr(a); obj_addr(b) #same address

In [None]:
b[1] <- 2

obj_addr(a); obj_addr(b) #address of b changes

## Lists

In [None]:
l1 <- list( 1:3, "list element", c(TRUE, FALSE, FALSE), c(3.5, 4, 6.2, -1.75))
typeof(l1)

A list can hold objects of different types; internally it stores a reference to each of them. R has a garbage collector, so memory is freed automatically once an object is no longer referenced by anything.
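As a small base R illustration of the points above, the sketch below inspects the type and length of each element, then shows that changing an element in a copy of the list leaves the original alone. The variable names other than `l1` are introduced here for illustration.

```r
l1 <- list(1:3, "list element", c(TRUE, FALSE, FALSE), c(3.5, 4, 6.2, -1.75))

# Each element keeps its own type and length
element_types   <- sapply(l1, typeof)
element_lengths <- lengths(l1)
element_types    # "integer" "character" "logical" "double"
element_lengths  # 3 1 3 4

# Replacing one element of a copy leaves the original list unchanged
l2 <- l1
l2[[2]] <- "changed"
l1[[2]]  # "list element"
```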

In [None]:
ref(l1) #inspect the list: gives the address to each of its objects

## Matrices

In [None]:
X <- matrix (c(1,0,0,0,1,0,0,0,1) , nrow =3) ; X

In [None]:
class(X)
attributes(X)
str(X)

In [None]:
vct <- c(1,2,3,4,4,3,2,1)
V <- matrix ( vct , byrow =T , nrow =2) #fills row by row; the result is 2x4, like NumPy's vct.reshape((2,4))
V

In [None]:
V <- matrix ( vct , byrow =F , nrow =2)
V

In [None]:
X <- matrix(rpois(n=20, lambda=1.5), nrow=4) #Fill matrix with samples from Poisson distribution
X

In [None]:
X[3,3]

In [None]:
X[4,] #row 4

In [None]:
X[,5] #column 5

In [None]:
rowSums(X)

In [None]:
rowMeans(X)

In [None]:
#Adding rows and columns

vct <- matrix(c(1,0,2,5,1,1,3,1,3,1,0,2,1,0,2,1), byrow=T, nrow=4)
vct

In [None]:
vct <- rbind(vct, apply(vct, 2, mean)) #rbind binds "by row": the column means returned by apply are appended as a new row

In [None]:
vct <- cbind(vct, apply(vct, 1, var))

In [None]:
#apply(matrix, margin, function): applies the function to each row (margin=1) or each column (margin=2) of the matrix
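A minimal sketch of the two margins on a small matrix, using only base R. The matrix `M` and the result names are introduced here for illustration.

```r
M <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns, filled column by column

row_totals <- apply(M, 1, sum)  # one value per row
col_totals <- apply(M, 2, sum)  # one value per column

row_totals  # 9 12
col_totals  # 3 7 11

# For sums the dedicated functions give the same values
all(row_totals == rowSums(M))
all(col_totals == colSums(M))
```

For sums and means, `rowSums`, `colSums`, `rowMeans` and `colMeans` are usually the more direct choice; `apply` is the general tool for any other function.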

In [None]:
colnames(vct) <- c(1:4, "variance") #Name the fields
rownames(vct) <- c(1:4, "mean")
#Note that 1:4 are converted to characters (names are always characters)

In [None]:
vct

In [None]:
(Y <- matrix(rbinom(20, 9, 0.45), nrow=4)) #wrapping an assignment in parentheses also prints the result

In [None]:
apply(Y, MARGIN=2, FUN=sum) #sum the values in columns

In [None]:
apply(Y, 1, function(x) x^2+x) #anonymous functions work too; each row's result becomes a COLUMN of the output

In [None]:
sapply(12:14, seq) #applies the function to each element of a vector: here it generates the three sequences 1:12, 1:13 and 1:14 (seq(12), seq(13), seq(14))
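The shape of what `sapply` returns depends on the results. The sketch below (base R only, with illustrative variable names) contrasts the two cases and shows `vapply`, which states the expected result type explicitly.

```r
# Equal-length results are simplified to a matrix (one column per input)
squares <- sapply(1:3, function(n) c(n, n^2))
dim(squares)  # 2 3

# Unequal-length results cannot be simplified, so a list is returned
seqs <- sapply(12:14, seq)
class(seqs)    # "list"
lengths(seqs)  # 12 13 14

# vapply() fails early if a result does not match the declared template
seq_sizes <- vapply(12:14, function(n) length(seq(n)), integer(1))
seq_sizes      # 12 13 14
```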

In [None]:
#Set random seed
set.seed(2019)
runif(3) #uniform U(0,1) distribution

In [None]:
#at any moment we can save the current state of the random number generator
current.seed <- .Random.seed

In [None]:
runif(3)

In [None]:
runif(3)

In [None]:
current.seed -> .Random.seed #restores the saved generator state
runif(5) #the first 3 values repeat the first runif(3) after the save, the last 2 repeat the start of the second one
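The save-and-restore experiment can be checked programmatically. This is a sketch under one assumption: it reads and writes `.Random.seed` explicitly in the global environment with `get` and `assign`, so that it also works when run from inside a function. The variable names are illustrative.

```r
set.seed(2019)

# Save the generator state (it lives in the global environment)
saved_state <- get(".Random.seed", envir = globalenv())

first  <- runif(3)
second <- runif(3)

# Restore the saved state and draw again
assign(".Random.seed", saved_state, envir = globalenv())
replay <- runif(5)

identical(replay, c(first, second[1:2]))  # TRUE
```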

In [None]:
y <- c(8,3,5,7,6,6,8,9,2,3,9,4,10,4,11)

sample(y) #permutes the elements (sampling WITHOUT replacement)

In [None]:
sample(y, 3) #samples only 3 elements

In [None]:
sample (y , replace =T ) #with replacement

In [1]:
x <- 1:10
sample(x[x>8])


In [6]:
sample(x[x>9]) #x[x>9] has only one element (10). In this case, sample generates a sequence 1:10 and samples from it instead

#to avoid this behaviour, use resample:
library(gdata) #need this package for resample function

resample(x[x>8])
resample(x[x>9])

In [7]:
?resample

From the resample docs: resample differs from the S/R sample function in resample always considers x to be a vector of elements to select from, while sample treats a vector of length one as a special case and samples from 1:x. Otherwise, the functions have identical behavior.
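If installing gdata is not an option, the same effect can be sketched in base R by sampling positions instead of values. The helper name `safe_sample` is hypothetical; it is not part of base R or gdata.

```r
# Hypothetical helper (base R only): always treats x as the set of values
# to choose from, even when x has length one.
safe_sample <- function(x, ...) x[sample.int(length(x), ...)]

x <- 1:10
one   <- safe_sample(x[x > 9])  # always 10
two   <- safe_sample(x[x > 8])  # 9 and 10 in random order
three <- safe_sample(x, 3)      # three values drawn from x without replacement
```

Extra arguments such as the sample size or `replace = TRUE` are passed straight through to `sample.int`.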

## R Subsetting

In [8]:
x <- c(2.1, 4, 6.7, 1.75)

In [9]:
x[c(1,3)] #get first and third element

In [10]:
x[c(1,1,3,3)] #duplicate indices work

In [11]:
x[sort(x)] #doubles are truncated (not rounded) to integers for indexing: the indices used here are 1, 2, 4, 6, and the out-of-range index 6 gives NA

In [12]:
#negative integers exclude elements
x[-c(1,3)]

In [13]:
#cannot use positive and negative at the same time
x[c(1,-3)]

ERROR: Error in x[c(1, -3)]: only 0's may be mixed with negative subscripts


In [14]:
x[c(T, T, F, T)] #logical vectors work as masks

In [15]:
x[c(TRUE, FALSE)] #if the selection vector is too short, it is repeated (as c(T,F,T,F))

In [16]:
x[0] #returns a 0-length vector

In [18]:
#Named vectors can be used somewhat like dictionaries (names act as keys, but they need not be unique)
#some useful constant vectors: LETTERS (capital letters), letters (lowercase), month.abb, month.name

In [19]:
y <- setNames(x, LETTERS[1:length(x)])
y

In [21]:
y["A"] #access through the character indices

In [22]:
y[c('A', 'A', 'D')]

In [24]:
#Subsetting with factors uses the underlying integer vector, not the character!
y[factor("B")] #returns the first element, because "B" in the factor is stored as a 1
#In general, avoid using factors for subsetting
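A short sketch of the factor pitfall and its usual workaround, converting the factor to character before subsetting. It rebuilds `x` and `y` as defined above; `f` is an illustrative name.

```r
x <- c(2.1, 4, 6.7, 1.75)
y <- setNames(x, LETTERS[seq_along(x)])

f <- factor("B")
as.integer(f)        # 1: the internal code of the only level
y[f]                 # element named "A" (position 1)
y[as.character(f)]   # element named "B"
```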

In [25]:
#outer = outer product: applies a function (multiplication by default) to every pair of elements
outer(1:3, 1:3)

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    2    4    6
[3,]    3    6    9


In [27]:
#a function can be specified to edit the generated elements
#paste = converts arguments to characters, and concatenates them (with a separator if specified)

v <- outer(1:5, 1:5, FUN="paste", sep=",")
v

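     [,1]  [,2]  [,3]  [,4]  [,5] 
[1,] "1,1" "1,2" "1,3" "1,4" "1,5"
[2,] "2,1" "2,2" "2,3" "2,4" "2,5"
[3,] "3,1" "3,2" "3,3" "3,4" "3,5"
[4,] "4,1" "4,2" "4,3" "4,4" "4,5"
[5,] "5,1" "5,2" "5,3" "5,4" "5,5"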


In [29]:
v[seq(3, 23, 5)] #a matrix can be indexed as if it were a "flattened" vector
#(matrices are stored column by column, so indices 3, 8, 13, 18, 23 pick out row 3)
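The link between a linear index and a (row, column) pair follows from column-major storage: element (i, j) of a matrix with `nrow` rows sits at position (j - 1) * nrow + i. A small sketch, rebuilding `v` as above, with `i`, `j` and `k` as illustrative names:

```r
v <- outer(1:5, 1:5, FUN = "paste", sep = ",")

# Linear index of element (i, j) in column-major order
i <- 3; j <- 4
k <- (j - 1) * nrow(v) + i

v[k]      # "3,4"
v[i, j]   # "3,4"

# seq(3, 23, 5) therefore walks along row 3
identical(v[seq(3, 23, 5)], v[3, ])  # TRUE
```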

In [30]:
#Preserve the original dimensions when subsetting: drop=FALSE (argument names are case-sensitive)
(S <- matrix(1:6, nrow=2))

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6


In [31]:
S[1,] #normally returns a vector

In [33]:
S[1, , drop=FALSE] #returns a 1x3 matrix instead

     [,1] [,2] [,3]
[1,]    1    3    5


In [34]:
#Lists are like "trains". [[]] gets the content of a wagon, [] returns the wagon itself (still a list)
xl <- list(1:3, "one", c(T,F,F)) #3 "wagons"

In [37]:
xl[1] #first wagon, returned as a LIST

In [39]:
xl[[1]] #first wagon, returned as a VECTOR (=content of the "wagon")

In [40]:
xl[1:2] #subset the list

In [None]:
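xl[[c(1, 3)]] #a vector inside [[ ]] indexes recursively: same as xl[[1]][[3]] (xl[[1:3]] would mean xl[[1]][[2]][[3]], which fails here)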

In [66]:
xl[[1]][[3]]
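The difference between `[` and `[[` can be verified by looking at what each one returns. A minimal sketch in base R, rebuilding `xl` as above:

```r
xl <- list(1:3, "one", c(TRUE, FALSE, FALSE))

class(xl[1])       # "list": a shorter train with one wagon
class(xl[[1]])     # "integer": the content of the wagon
length(xl[1:2])    # 2

# A vector inside [[ ]] indexes recursively
xl[[c(1, 3)]]      # 3, same as xl[[1]][[3]]
```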

## Loops

In [50]:
#Define a factorial function
fact1 <- function(x)
{
    f <- 1
    if (x < 2) return (1)
    for (i in 2:x) {
        f <- f*i
    }
    return (f)
}

In [51]:
sapply(1:5, fact1)

In [52]:
#with a while loop instead
fact2 <- function(x) {
    f <- 1; t <- x
    while (t > 1) {
        f <- f*t
        t <- t-1
    }
    return(f)
}

In [53]:
sapply(1:5, fact2)

In [54]:
fac3 <- function(x) {
    f <- 1; t <- x
    repeat { #same as while (TRUE): loops until break
        if (t<2) break
        f <- f*t
        t <- t-1
    }
    return(f)
}
sapply(1:5, fac3)

In [55]:
cumprod(1:5) #cumulative product

In [56]:
#Better way: use already defined (vectorized) functions!
fac4 <- function(x) max(cumprod(1:x))

In [57]:
sapply(1:5, fac4)

In [58]:
#R also provides a built-in factorial function:
sapply(1:5, factorial)

In [59]:
#AVOID LOOPS!

In [67]:
#E.g. use the vectorized ifelse function
y <- log(rpois(20,1.5))
y

In [68]:
mean(y)

In [71]:
(y <- ifelse(y<0, NA, y)) #negative values (here the -Inf produced by log(0)) are set to NA

mean(y, na.rm=TRUE) #so that functions can skip them with na.rm=TRUE
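To make the point about vectorization concrete, the sketch below writes the same replacement once as an explicit loop and once with `ifelse`, and checks that the two agree. The seed value and the names `counts`, `y_loop` and `y_vec` are arbitrary choices made for this example.

```r
set.seed(1)                   # arbitrary seed, only for reproducibility
counts <- rpois(20, 1.5)
y <- log(counts)              # log(0) is -Inf

# Loop version: visit every element and replace negative values by NA
y_loop <- y
for (i in seq_along(y_loop)) {
  if (y_loop[i] < 0) y_loop[i] <- NA
}

# Vectorized version: one call handles the whole vector
y_vec <- ifelse(y < 0, NA, y)

all.equal(y_loop, y_vec)
mean(y_vec, na.rm = TRUE)
```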

In [72]:
x <- runif(50000000)
str(x)

 num [1:50000000] 0.0641 0.4256 0.8458 0.555 0.9636 ...


In [73]:
head(x)
tail(x)

In [74]:
system.time(max(x))

   user  system elapsed 
   0.06    0.00    0.06 

In [75]:
pc <- proc.time()
cmax <- x[1]
for (i in 2:length(x)) { if (x[i]>cmax) cmax <- x[i]}
proc.time()-pc

   user  system elapsed 
   2.68    0.00    2.67 
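The comparison above can be repeated on a smaller vector, wrapping the loop in a function and confirming that it returns the same value as `max`. This is a sketch: `loop_max` is a hypothetical name, the vector is deliberately much shorter than in the text so it runs quickly, and the timings depend on the machine, so none are quoted.

```r
set.seed(42)            # arbitrary seed
x <- runif(1e5)         # smaller than in the text

# Explicit loop: keep track of the largest value seen so far
loop_max <- function(v) {
  cmax <- v[1]
  for (i in seq_along(v)[-1]) {
    if (v[i] > cmax) cmax <- v[i]
  }
  cmax
}

loop_max(x) == max(x)   # same result

# Repeat each call a few times so the difference is measurable
system.time(for (k in 1:20) loop_max(x))
system.time(for (k in 1:20) max(x))
```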

In [76]:
test1 <- function(n) {
    y <- 1:n #optimized
}

test2 <- function(n) {
    y <- numeric(n) #preallocate a vector of n zeros
    for (i in 1:n)  #fill it element by element with a for loop
        y[i] <- i
}

test3 <- function(n) { 
    y <- NULL
    for (i in 1:n) #even worse: the vector grows at every iteration
        y <- c(y,i) #creates a new, longer vector every time!
}

In [None]:
system.time(test1(10000000))
system.time(test2(10000000))
#system.time(test3(10000000)) #commented out: takes too long to run

   user  system elapsed 
      0       0       0 

   user  system elapsed 
   0.71    0.00    0.71 
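The cost of growing a vector can be seen even at a modest size. The sketch below uses hypothetical helper names (`fill_prealloc`, `fill_grow`) and a value of `n` small enough for the slow version to finish; it also uses `seq_len(n)` rather than `1:n`, which is an addition to the code in the text that avoids a surprise when `n` is 0. Timings are machine-dependent and not quoted.

```r
# Preallocate the result, then fill it in place
fill_prealloc <- function(n) {
  y <- numeric(n)
  for (i in seq_len(n)) y[i] <- i
  y
}

# Grow the result one element at a time (a new vector at every step)
fill_grow <- function(n) {
  y <- NULL
  for (i in seq_len(n)) y <- c(y, i)
  y
}

n <- 2e4
all.equal(fill_prealloc(n), fill_grow(n))  # same values

system.time(fill_prealloc(n))
system.time(fill_grow(n))

# seq_len(0) is empty, whereas 1:0 would iterate over c(1, 0)
length(seq_len(0))  # 0
length(1:0)         # 2
```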