# Section 1 - R Basics

## First Steps in R - Some Basics

### 1. Objects

Objects are "things" stored in R, e.g. variables.

Conventions for variable names: start with a letter, no spaces, lower case only, words separated by underscores. Names that are already predefined in R should not be reused.

In [1]:
# assign values to variables
a <- 1     # standard: assignments via "<-"
b = 1      # "=" works too, but should be avoided
c <- -1

In [2]:
# print value of a variable
print(a)   # more explicit than the following two
b
c

[1] 1


### 2. Workspace

The set of all currently defined variables and functions. These symbols can be used in the console.

In RStudio: see the `Environment` tab.

In [3]:
# print all symbols in the workspace
ls()

### 3. Functions

Functions often use parentheses for evaluation, but some functions (e.g. operators) can be evaluated without parentheses.

Functions might have necessary and optional arguments. If an argument in a function call is specified by name, one must use `=`, not `<-`.

These function names should not be used to define other objects.

In [4]:
# some functions with parentheses
print("Hi!")
ls()
sqrt(4)

[1] "Hi!"


In [5]:
# some functions without parentheses
2 ^ 3
2 + 3
2 > 3

In [6]:
# how to determine which arguments a function needs and which might be optional
args(log)              # the argument `base` is optional, `x` is necessary

In [7]:
# different ways to execute `log` with different arguments
log(8)                 # implicitly setting `x=8`
log(x = 8)             # setting and specifying `x` (alternative)

log(8, base = 2)       # setting both arguments but specifying the optional one
log(base = 2, x = 8)   # specifying all arguments allows to change the input order of the arguments (alternative)
log(8, 2)              # implicitly setting `x=8` and `base=2` (alternative)
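
The rule about `=` inside function calls can be illustrated with a short sketch. The variable name `base_value` is hypothetical and chosen only for this illustration. Inside a call, `<-` performs an ordinary assignment in the workspace and passes the result by position, which is rarely what is intended.

```r
# `=` inside a call names an argument; nothing is created in the workspace
named <- log(8, base = 2)
named                                  # 3

# `<-` inside a call assigns a workspace variable and passes the value
# by position (here as the second argument, which happens to be `base`)
assigned <- log(8, base_value <- 2)    # `base_value` is a hypothetical name
assigned                               # 3
exists("base_value")                   # TRUE: created as a side effect
```

Both calls return the same value here only because the assigned value lands in the position of `base`.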

In [8]:
# get more information about a function
help(log)
?log

# see all arithmetic operators
?"+"

# see all relational operators
?">"

# see all logical operators
?Logic

### 4. Prebuilt Objects

R includes some prebuilt datasets and mathematical quantities. These object names should not be used to define other objects.

In [9]:
# some examples
pi       # pi
Inf+1    # infinity

cars     # a data set

data()   # all available data sets

speed,dist
4,2
4,10
7,4
7,22
8,16
9,10
10,18
10,26
10,34
11,17


### 5. Packages

In [10]:
# how to install and import a package
# install.packages("vegan")    # install new package -> DO IT ONLY ONCE!
library(vegan)                 # import/load the package -> in every script
browseVignettes("vegan")       # learn more about a package (not available for all packages)

Loading required package: permute
Loading required package: lattice
This is vegan 2.5-7
starting httpd help server ... done


In [11]:
# --------------------

## Data Types

In [12]:
# how to determine the type of an object
class(1)
class("Hi!")

# advanced: change class via e.g. `as.integer()` or by adding an `L` (1 -> 1L)
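
The comment about changing classes can be made concrete with a small sketch. It only uses base R functions; the values are arbitrary.

```r
class(1)                # "numeric": plain numbers are stored as doubles
class(1L)               # "integer": the `L` suffix creates an integer
class(as.integer(1))    # "integer": explicit conversion
class(as.character(1))  # "character"
as.integer(3.9)         # 3: the conversion truncates, it does not round
```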

### 1. Data Frames

Tables with rows (observations) and columns (variables). A data frame can combine different data types in one object, but each column usually holds a single data type.

In [13]:
# examples for (prebuilt) data frames
install.packages("dslabs")
library(dslabs)

data(murders)     # load the `murders` data set

“unable to access index for repository https://cran.r-project.org/src/contrib:
“package ‘dslabs’ is not available (for R version 3.6.1)”
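
If installing `dslabs` fails, a data frame can also be built by hand to experiment with. The following is a minimal sketch; the object name `pets` and all its values are invented for illustration.

```r
# a small hand-made data frame; names and values are invented
pets <- data.frame(
  name   = c("Milo", "Luna", "Rex"),
  age    = c(3, 7, 5),
  indoor = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE    # keep strings as characters in every R version
)

class(pets)           # "data.frame"
nrow(pets)            # 3 observations (rows)
ncol(pets)            # 3 variables (columns)
sapply(pets, class)   # one class per column: character, numeric, logical
```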

### 2. Examining an Object

There are different ways to find out more about a specific object, e.g. get information about the structure or the first elements.

In [14]:
# find out about the structure of an object
str(1)
str(murders)     # number of rows and columns, type of objects in the 5 columns, ...

 num 1
'data.frame':	51 obs. of  5 variables:
 $ state     : chr  "Alabama" "Alaska" "Arizona" "Arkansas" ...
 $ abb       : chr  "AL" "AK" "AZ" "AR" ...
 $ region    : Factor w/ 4 levels "Northeast","South",..: 2 4 4 2 4 4 1 2 2 2 ...
 $ population: num  4779736 710231 6392017 2915918 37253956 ...
 $ total     : num  135 19 232 93 1257 ...


In [15]:
# get the first 6 or n elements of an object
head(murders)         # get first six rows of a data frame
head(murders, n=3)    # get first 3 rows of a data frame

state,abb,region,population,total
Alabama,AL,South,4779736,135
Alaska,AK,West,710231,19
Arizona,AZ,West,6392017,232
Arkansas,AR,South,2915918,93
California,CA,West,37253956,1257
Colorado,CO,West,5029196,65


state,abb,region,population,total
Alabama,AL,South,4779736,135
Alaska,AK,West,710231,19
Arizona,AZ,West,6392017,232


In [16]:
# access the variable names
names(murders)

### 3. The Accessor: `$`

Access variables represented by columns.

Hint: the order of the rows in the data table is preserved!

In [17]:
murders$population     # use `names(murders)` to get all variable names! (and use auto-completion via `tab`)

In [18]:
# hint: brackets with variable names or indices also work!
murders[["population"]]
murders[,4]
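
The three access styles can be compared on the `cars` data set that ships with R, so the sketch runs without any extra package. The variable names `by_dollar`, `by_name` and `by_position` are chosen only for this illustration.

```r
by_dollar   <- cars$speed        # accessor
by_name     <- cars[["speed"]]   # double brackets with the column name
by_position <- cars[, 1]         # row/column indexing; `speed` is column 1

identical(by_dollar, by_name)        # TRUE
identical(by_dollar, by_position)    # TRUE

# single brackets with a name keep the data frame structure
class(cars["speed"])                 # "data.frame"
```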

### 4. Vectors: Numerics, Characters and Logical

Array containing one or several objects of the same type.

Later: How to create a vector/list and more.

In [19]:
# types of vectors
class(murders$state)     # character strings
murders$state            # character strings

class(murders$total)     # numeric
murders$total            # numeric

class(3 == 2)            # logical
3 == 2                   # logical

### 5. Factors

Used to store categorical data, such as directions.

In the background, the levels (= categories) are stored as integers, as this is more memory efficient.

Order of levels: alphabetical by default. (It can be specified via the `levels` argument when creating a `factor`. Reorder via the `reorder` function.)

Hint: factors are confusing and a common source of bugs!

In [20]:
# create a factor
factor(c('A', 'B', 'B', 'C', 'D'))     # more examples: https://www.statology.org/create-categorical-variable-in-r/

In [21]:
# example for factor
class(murders$region)
murders$region             # here: printing also shows the levels! (not always the case)

levels(murders$region)     # levels of the factor

In [22]:
# example: reorder values of a factor based on a sum
region <- murders$region
value <- murders$total
region <- reorder(region, value, FUN = sum)
levels(region)
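
The effect of the `levels` argument and the integer codes in the background can be seen in a small sketch. The vector `sizes` and the factor names are hypothetical examples.

```r
sizes <- c("low", "high", "medium", "low")

# default: levels are sorted alphabetically
default_factor <- factor(sizes)
levels(default_factor)        # "high" "low" "medium"

# explicit order via the `levels` argument
custom_factor <- factor(sizes, levels = c("low", "medium", "high"))
levels(custom_factor)         # "low" "medium" "high"

# the integer codes stored in the background
as.integer(custom_factor)     # 1 3 2 1
```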

### 6. Lists

Used to store any combination of (different) types; elements are accessed similarly to data frames. (Lists are similar to vectors, but their elements may differ in type.)

In [23]:
# create a list
with_variables <- list(name = "Donata", age = 22, tired = TRUE)
without_variables <- list("Donata", 22, TRUE)

# extract components
with_variables$age          # use accessor (like data frames)
with_variables[["age"]]     # or use brackets
without_variables[[2]]      # use brackets and the index (starting with 1 instead of 0!) instead of the variable name

### 7. Matrices

Similar to data frames, but not as useful and powerful. Two-dimensional objects based on rows and columns, but all elements must have the same type. Can be converted to data frames via `as.data.frame()`.

In [24]:
# create a matrix and access elements
mat <- matrix(1:12, 4, 3)     # matrix containing the numbers from 1 to 12 spread over 4 rows and 3 columns
mat

mat[2,3]                      # access a specific element
mat[ ,3]                      # access a specific column
mat[2, ]                      # access a specific row
mat[2:3,]                     # access multiple rows
mat[1:2,2:3]                  # access a sub-matrix

0,1,2
1,5,9
2,6,10
3,7,11
4,8,12


0,1,2
2,6,10
3,7,11


0,1
5,9
6,10


In [25]:
# --------------------

## Vectors

### 1. Creating Vectors

There are different ways to create vectors (arrays with only one type of elements).

In [26]:
c(1,2,3)                # numeric vector
c(a,b,c)                # numeric vector
c("Hello", "World")     # string vector

### 2. Names

Name the entries of a vector!

In [27]:
# assign names directly ...
ages <- c(donata = 22, philipp = 25)     # numeric vector
ages

# ... or via `names()`
ages2 <- c(22, 25)                       # numeric vector
name <- c("donata", "philipp")
names(ages2) <- name
ages2

# access the names
names(ages)

### 3. Sequences

Create vectors based on sequences (faster).

In [28]:
# how to create a sequence
seq(1,10)         # sequence from 1 to 10 (integer)
seq(1,10,2)       # sequence from 1 to 10 with stepsize 2 (numeric; yields 1, 3, 5, 7, 9)
seq(1,10,0.5)     # sequence from 1 to 10 with stepsize 0.5 (numeric)

1:10              # sequence from 1 to 10 (integer)

Or create repeating vectors and sequences (faster)!

In [29]:
rep(1:3, times = 4)
rep(c("Cats", "are"), times = 10)     # repeat the input in the given order x times

rep(1:3, each = 4)
rep(c("Cats", "are"), each = 10)      # repeat each element of the input x times
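
A few more ways to build sequences in base R, as a short sketch. `length.out` fixes the number of elements instead of the step size; `seq_len()` and `seq_along()` are convenient for generating indices.

```r
seq(0, 1, length.out = 5)           # 0.00 0.25 0.50 0.75 1.00
seq_len(3)                          # 1 2 3
seq_along(c("a", "b", "c", "d"))    # 1 2 3 4: one index per element
class(1:10)                         # "integer"
class(seq(1, 10, 0.5))              # "numeric"
```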

### 4. Subsetting

Accessing elements in vectors is similar to accessing elements in matrices.

In [30]:
ages[2]                          # access the second element of the vector
ages[1:2]                        # access the elements from the first to the second
ages["donata"]                   # access an element by its name
ages[c("donata", "philipp")]     # access several elements by their names

In [31]:
# --------------------

## Coercion

Attempt to be flexible with data types.

Example: wrong input type -> R tries to guess which type the object should have before throwing an error.

Problem: this may lead to confusion, as R doesn't behave like other languages here.

In [32]:
# example based on a vector
x <- c(1, "2", 3)            # character vector although some inputs are numbers
class(x)
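
A few more coercion cases, as a minimal sketch with arbitrary values. When types are mixed in `c()`, R converts everything to the most general type involved (logical, then numeric, then character).

```r
class(c(1, TRUE))       # "numeric": TRUE becomes 1
class(c(1, "a"))        # "character": 1 becomes "1"
class(c(TRUE, "a"))     # "character": TRUE becomes "TRUE"
TRUE + TRUE             # 2: logicals are coerced to numbers in arithmetic
as.numeric("2") + 1     # 3: explicit conversion avoids surprises
```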

### 1. Not Availables (NA)

Result of coercing an object to an impossible type, e.g. a `character` such as `"b"` to `numeric`.

In [33]:
x <- c(1, "b", 3)            # character vector although some inputs are numbers
as.numeric(x)                # convert string vector to numeric vector, but "b" cannot be converted!

“NAs introduced by coercion”
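
Once a vector contains `NA`, many functions return `NA` as well. The following sketch shows how to detect and skip missing values with base R; the vector `values` is a made-up example.

```r
values <- c(1, NA, 3)

is.na(values)                # FALSE TRUE FALSE
sum(values)                  # NA: any NA propagates
sum(values, na.rm = TRUE)    # 4
mean(values, na.rm = TRUE)   # 2
```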

In [34]:
# --------------------

## Sorting

### 1. `sort`

Sort a vector in increasing or decreasing order.

In [35]:
x <- c(1,2,5,4,3)
sort(x)

sort(x, decreasing = TRUE)

### 2. `order`

Returns the indices that sort a vector in increasing or decreasing order, i.e. "returns the former indices of the elements of the sorted vector".

In [36]:
# sort(x) is equivalent to
x <- c(1,2,5,4,3)
index <- order(x)
x[index]

index <- order(x, decreasing = TRUE)
x[index]

In [37]:
# one use case: order the abbreviations according to the total murders per state
index <- order(murders$total)
murders$abb[index]

### 3. `max` and `which.max` /  `min` and `which.min`

The same relationship as between `sort` and `order`: `max` returns the largest value of a vector, `which.max` returns the index of that value.

`min` and `which.min` can be used in the same way.

In [38]:
x <- c(1,2,6,4,5)

max(x)

i_max <- which.max(x)
x[i_max]

In [39]:
# similar use case as above: get maximal murders and determine according state
max(murders$total)
i_max <- which.max(murders$total)
murders$state[i_max]

### 4. `rank`

Returns a vector with the rank of each entry, i.e. "the position each entry would take in the sorted (ascending!) vector".

In [40]:
x <- c(9,8,7,6,5,4)
r_x <- rank(x)
x[r_x]                # equals sort(x) here only because x is strictly decreasing; in general use x[order(x)]
r_x
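
The difference between `rank` and `order` is easier to see on a vector that is not monotone. A minimal sketch with an arbitrary example vector `v`:

```r
v <- c(31, 4, 15, 92, 65)

sort(v)             # 4 15 31 65 92
order(v)            # 2 3 1 5 4: where each sorted value sits in v
rank(v)             # 3 1 2 5 4: where each value of v lands after sorting

v[order(v)]         # 4 15 31 65 92, always equal to sort(v)
v[rank(v)]          # 15 31 4 65 92, not sorted
sort(v)[rank(v)]    # 31 4 15 92 65, recovers v
```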

### 5. Beware of Recycling!

In [41]:
# an example
x <- c(10, 20)
y <- c(1,2,3,4,5,6,7)
x + y                     # x and y have different lengths -> entries of x are "recycled"

“longer object length is not a multiple of shorter object length”
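
The warning only appears when the longer length is not a multiple of the shorter one. Otherwise R recycles silently, which is where the real danger lies. A minimal sketch with arbitrary values; the names `short` and `long` are chosen for illustration.

```r
# length 4 is a multiple of length 2: recycled silently, no warning
c(10, 20) + c(1, 2, 3, 4)              # 11 22 13 24

# a simple length check before combining vectors
short <- c(10, 20)
long  <- c(1, 2, 3, 4, 5, 6, 7)
length(long) %% length(short) == 0     # FALSE: recycling would be incomplete
```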

In [42]:
# --------------------

## Vector Arithmetic

### 1. Rescaling a Vector

Arithmetic operations on vectors are elementwise.

In [43]:
x <- c(1,2,3,4,5)
x + 10                # e.g. for checking the difference of each element compared to the average
x * 2                 # e.g. for converting units

### 2. Two Vectors

Arithmetic operations are entry by entry, i.e.
a     d     a+d
b  +  e  =  b+e
c     f     c+f

In [44]:
x <- c(1,2,3,4,5)
y <- c(10,20,30,40,50)
y / x
x + y

In [45]:
# --------------------

## Indexing

### 1. Subsetting with Logicals

Use logical vectors to index other vectors, i.e. "build a logical vector of the same length from a comparison and keep the entries where it is `TRUE`".

This can also be used to count how many elements of a vector meet a certain condition, because `sum()` coerces `TRUE` to `1` and `FALSE` to `0`.

In [46]:
x <- c(1,2,3,10,20,30,4,5,6)
index <- x > 9
index
x[index]
sum(index)

In [47]:
# use case: get all states with a murder rate of at most 0.71
murder_rate <- murders$total / murders$population * 100000
index <- murder_rate <= 0.71
murders$state[index]
sum(index)

### 2. Logical Operators

"Apply the idea of subsetting with logicals several times by combining the logical vectors with logical operators".

In [48]:
# logical operators in general
TRUE & TRUE       # only TRUE & TRUE evaluates to TRUE
TRUE & FALSE
FALSE & TRUE
FALSE & FALSE

In [49]:
# example
x <- c(1,2,3,10,20,30,4,5,6)
greater_3 <- x > 3
smaller_10 <- x < 10
between_4_and_9 <- greater_3 & smaller_10
x[between_4_and_9]
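
Besides `&`, the operators `|` (or) and `!` (not) combine logical vectors in the same elementwise way. A short sketch; the names `values`, `small` and `large` are chosen only for this illustration.

```r
values <- c(1, 2, 3, 10, 20, 30, 4, 5, 6)

small <- values < 3
large <- values > 10

values[small | large]    # 1 2 20 30: OR keeps elements meeting either condition
values[!small]           # 3 10 20 30 4 5 6: NOT inverts the logical vector
any(large)               # TRUE: at least one element is greater than 10
all(large)               # FALSE: not every element is greater than 10
```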

### 3. `which`

Similar to "subsetting with logicals", but keeps only the indices of the elements that meet the requirement instead of the whole logical vector (more efficient!).

In [50]:
# use case: get murder rate of certain state
index <- which(murders$state == "California")
murder_rate[index]
index

### 4. `match`

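Similar to `which`, but several values can be looked up at once: for each entry of the first vector, `match` returns the index of its first match in the second vector.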

In [51]:
# use case: get murder rate of certain states
index <- match(c("New York", "Florida", "Texas"), murders$state)
murder_rate[index]
index

### 5. `%in%`

Similar to `match`, but returns a logical vector (is each entry of the first vector contained in the second?) instead of the indices.

In [52]:
c(9,10,11) %in% c(1,2,3,10,20,30,4,5,6)

### 6. Some Notes

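Be careful with the returned indices/vectors. Depending on the function, the result refers to the "first" or the "second" input only! `match` and `%in%` return one entry per element of the first input; the indices returned by `match` point into the second input.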

`match`, `%in%` and `which` are connected and can be combined in different ways.

In [53]:
# example: same output with different order
match(c("New York", "Florida", "Texas"), murders$state)
which(murders$state %in% c("New York", "Florida", "Texas"))
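
The same comparison works on a small hand-made example, which also shows what happens when a value is not found. The vectors `states` and `query` are made up for this sketch and do not depend on the `murders` data set.

```r
states <- c("Alabama", "Alaska", "Arizona", "Texas")   # made-up lookup vector
query  <- c("Texas", "Alaska", "Ohio")

match(query, states)        # 4 2 NA: positions in `states`, in the order of `query`
query %in% states           # TRUE TRUE FALSE: one entry per element of `query`
which(states %in% query)    # 2 4: positions in `states`, in the order of `states`
```

`match` returns `NA` for values without a match, whereas `which` simply drops them.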

In [54]:
# --------------------

##### End of Section 1!