# Variables and basic types in R

## Creating variables

R is a dynamically typed programming language. This means that you do not need to declare the type of a variable, and that you can assign values of different types to the same variable, one after another. In addition, a variable does not need to be explicitly created before it is first assigned. This makes the language initially very easy to learn and work with.

In [1]:
x <- 5 # here I assign the variable x with the value 5 using the arrow operator
x

In [2]:
6 -> x # not many people know that the arrow operator works both ways
x

In [3]:
x = 7 # to make things more interesting you can also use the equals operator to assign values to variables
x

People choose between these operators according to preference and organisational coding standards. This course sticks with the arrow operator because it was the original assignment operator in R, always works for assignment, and is thus safer. *(The equals operator can cause unforeseen errors, because it is also used to name arguments in function calls.)*
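
One place where the two operators behave differently is inside a function call. The following is a minimal sketch rather than part of the original course material; the variable names `a`, `total_a` and `avg` are arbitrary choices for illustration.

```r
# `<-` inside a call assigns in the calling environment and passes the value on
total_a <- sum(a <- c(1, 2, 3))
a                                # the vector 1 2 3 now exists as a variable
total_a                          # 6

# `=` inside a call only matches an argument by name; nothing is assigned
x <- "unchanged"
avg <- mean(x = c(4, 5, 6))      # x here is the name of mean()'s first argument
x                                # still "unchanged"
avg                              # 5
```

Outside of function calls, at the top level of a script, the two operators give the same result.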

## Class, mode, typeof and Vectors: numeric, integer, character, logical, complex; matrices, arrays

The makers of R took the approach that all the basic data structures would be vectors from the start. Unlike many programming languages, R has no scalar primitive types such as `integer`, `double` and so on. The most basic type in R is a vector, and the shortest possible vector has length zero. Individual values exist as vectors of length one.

In R, the basic vectors described here are atomic, meaning that all the elements in a vector have the same type.
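
A short sketch of what "everything is a vector" looks like in practice. The variable names `v`, `empty` and `w` are illustrative only.

```r
v <- 5
is.vector(v)          # TRUE: a single number is already a vector
length(v)             # 1
v[1]                  # indexing works just as on a longer vector

empty <- numeric(0)   # a numeric vector with no elements
length(empty)         # 0

# Every element of an atomic vector shares the vector's type
w <- c(1.5, 2, 3)
typeof(w)             # "double"
typeof(w[2])          # "double", even though 2 was typed without a decimal
```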

## Numeric vectors

In R, the `numeric` type is equivalent to an array of type `double` in C/C++. Most of the numbers you work with in R will be in this format, even if they first appear to be integers. The `class` function returns the class of a variable or value. The `mode` function returns the mode of an object (useful for introspection of complex objects).

In [4]:
class(1) # by default numbers are "numeric" class

In [5]:
mode(1) # the mode of an object

The combine function `c` allows us to create vectors:

In [6]:
x <- c(0, 1, 1, 2, 3, 5, 8, 21)
x
# Print the class
class(x)

The main difference between the `class` and `mode` functions is that `class` is a loose label that can easily be "hacked", whereas `mode` is always informative about the underlying type. The `typeof` function returns R's internal storage type of an object:

In [7]:
class(x) <- "turtle"
class(x)
mode(x)
typeof(x)

From the above example it is clear that R has an even looser type system than most dynamic languages. It does not have a dedicated type for the type label, as is common in programming languages; here the type label is of type `character`, which we will come to later.
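
To compare the three functions side by side, the sketch below tabulates their results for a few basic vectors. The names `i`, `d`, `s`, `type_summary` and `relabelled` are illustrative and not part of the original notes.

```r
i <- 1:3
d <- c(1.5, 2.5)
s <- c("a", "b")

# One row per object: class, mode and typeof side by side
type_summary <- rbind(
  integer   = c(class = class(i), mode = mode(i), typeof = typeof(i)),
  double    = c(class = class(d), mode = mode(d), typeof = typeof(d)),
  character = c(class = class(s), mode = mode(s), typeof = typeof(s))
)
type_summary

# unclass() strips a custom class label and restores the implicit class
relabelled <- c(0, 1, 1, 2)
class(relabelled) <- "turtle"
class(unclass(relabelled))    # "numeric"
```

Note that an integer vector has class `"integer"` and type `"integer"` but mode `"numeric"`, so the three functions do not always agree.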

In addition, you can see the traditional syntax for nesting function calls in R, where one function is applied to the result of another:

In [8]:
mode(class(x))

In [9]:
# The seq function is very useful for creating a sequence of values:
y <- seq(from = 1, to = 5, by = 0.5)
y

In [10]:
# We can initialize numeric vectors by using the "vector" function
y <- vector(mode = "numeric", length = 10)
y

In [11]:
# Or by using the "numeric" function
z <- numeric(length = 5)
z

## Integer vectors

In R, integers can be created either by using the colon operator `":"` or by coercing a numeric vector to integer:

In [12]:
x <- 1:10
class(x)

In [13]:
# The storage mode is also an integer
typeof(x)

In [14]:
# Here we create an integer vector by coercing a numeric vector using the "as.integer" function (fractional parts are dropped)
x <- seq(1, 5, by = .5)
x
x <- as.integer(x)
x
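
Two details are worth checking when working with integers: a number typed without a suffix is stored as a double, and `as.integer` truncates rather than rounds. The sketch below illustrates both; the `L` suffix is standard R syntax for an integer literal, and the variable names are illustrative.

```r
typeof(5)         # "double": a plain number is stored as a double
typeof(5L)        # "integer": the L suffix requests an integer
is.integer(5)     # FALSE

# as.integer() drops the fractional part (truncation toward zero)
values    <- c(1.7, 2.9, -1.7, -2.9)
truncated <- as.integer(values)
truncated         # 1 2 -1 -2

# round() goes to the nearest whole number instead (and still returns doubles)
rounded <- round(values)
rounded           # 2 3 -2 -3
```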

In [15]:
# The "vector" function can be used to create integer vectors
vector(mode = "integer", length = 10)

In [16]:
# The "integer" function can also be used to create integer vectors
integer(length = 5)

## Character vectors

Character vectors in R can be likened to string arrays in C++. They can be created using single or double quotes:

In [17]:
x <- c("a", "b", "c")
x
y <- c('x', 'y', 'z')
y

In [18]:
# Character vectors can be created by coercing from all other basic types:
x <- 1:4
x
# By including "hi" into my original integer vector, R automatically converts the vector to character.
c(x, "hi")

As we go through the course, one of the themes that will emerge is that R makes a lot of assumptions about what the analyst wants. As a result, there are many situations where the output is not what was expected. For instance, in the above case R assumes that appending a character to the integer vector is not a mistake and that the rest of the vector should also be character (since vectors are atomic).
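
The automatic conversion follows a fixed order among the types covered so far: logical, then integer, then double, then character. A minimal sketch, with `back` as an illustrative variable name:

```r
# Combining two types yields the more general of the two
typeof(c(TRUE, 2L))      # "integer"
typeof(c(2L, 3.5))       # "double"
typeof(c(3.5, "hi"))     # "character"

# Going back is lossy: text that is not a number becomes NA (R also warns)
back <- suppressWarnings(as.integer(c("1", "2", "hi")))
back                     # 1 2 NA
```

Checking `typeof` after combining vectors is a cheap way to catch an unintended conversion early.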

The `vector` and `character` functions are also used to create character vectors:

In [19]:
# Creating a character vector using the vector function
vector(mode = "character", length = 10)

In [20]:
# Creating a character vector using the character function
character(length = 5)

## Logical vectors

In R, logical vectors are the equivalent of boolean arrays in C/C++. They contain the values `TRUE` and `FALSE`. R is case sensitive, so remember to write these values in all caps.

In [21]:
# Creating a logical vector manually
x <- c(TRUE, FALSE, FALSE, TRUE)
x

As in the previous cases, logical vectors can be created using the `vector` and `logical` functions. In addition, note that the initialized value is always `FALSE`.

In [22]:
vector(mode = "logical", length = 10)

In [23]:
logical(length = 5)

`"T"` and `"F"` are shorthand for `"TRUE"` and `"FALSE"`:

In [24]:
T
F
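
One caveat with the shorthand: `TRUE` and `FALSE` are reserved words, whereas `T` and `F` are ordinary variables that can be reassigned. The sketch below shows why the full names are the safer choice; `t_was_true` is an illustrative variable name.

```r
T <- 0                      # legal: T is just a variable name
t_was_true <- isTRUE(T)     # FALSE, because T now holds 0
t_was_true

# TRUE <- 0                 # by contrast, this line would be an error

rm(T)                       # remove our variable so T refers to base R's value again
T                           # TRUE
```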

## Complex vectors

R also has a type for complex numbers which can be created using the `vector` and `complex` functions.
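
Complex values can also be written directly with the `i` suffix. A brief sketch using the base functions `Re`, `Im` and `Mod`; the names `z` and `root` are illustrative.

```r
z <- 3 + 4i                    # the i suffix marks the imaginary part
class(z)                       # "complex"
Re(z)                          # 3, the real part
Im(z)                          # 4, the imaginary part
Mod(z)                         # 5, the modulus

# Square roots of negative numbers need a complex input
root <- sqrt(as.complex(-1))
root                           # 0+1i
```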

# Exercise 1.2

**Question 1.**

Create two complex vectors of length 5 and 10 using the `vector` and `complex` functions, assigning them to `cVec1` and `cVec2`. Hint: `?complex`

**Question 2.**

Search for help on the `rep` function and use it to create a vector `vec1` of length 9 containing 1, 2, 3 repeatedly. Now create a vector `vec2` of length 3 containing `1, 2, 3`. Sample 9 numbers using the `runif` function and assign them to `vec3` (accept the defaults of the `runif` function). What happens when you enter `set.seed(0)` beforehand?

**Question 3**

Amend `vec3` in the previous question by using `set.seed(1)` before sampling 9 numbers using the `runif` function (accept the defaults of the `runif` function).

**Question 4**

Multiply `vec1` created previously with `vec3` using the multiply operator `"*"`. Now multiply `vec2` with `vec3`. What is happening here?

**Question 5**

There is a table called `iris` on your console. The first column is numeric and can be selected using `iris[,1]`. Evaluate the number of elements in this vector using the `length` function.

**Question 6**

Use R help to look up the `"rm"` function and use it to remove `vec1`, `vec2`, and `vec3`.

## Matrices and arrays

Matrices and arrays can be created containing any of the basic elements we just discussed. In R, the `matrix` function is used to create matrices and the `array` function is used to create arrays. Arrays can also be created by imposing a dimension on a vector using the `dim` function. You can think of vectors, matrices, and arrays of each type as being essentially the same construction underneath, presented in a different way.
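
The "same construction underneath" can be seen by inspecting attributes: a matrix is a vector that carries a `dim` attribute. A minimal sketch, with `v` and `m` as illustrative names:

```r
v <- 1:6
attributes(v)                 # NULL: a plain vector carries no dim attribute

m <- v
dim(m) <- c(2, 3)             # impose 2 rows and 3 columns on the same data
is.matrix(m)                  # TRUE
attributes(m)                 # a single attribute, dim, holding 2 3

# Dropping the attribute recovers the original vector
identical(as.vector(m), v)    # TRUE

# A matrix can still be indexed like the underlying vector
m[4]                          # 4
```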

### Creating matrices

Matrices are created using the `matrix` function:

In [25]:
# Creating a matrix using a single value
matrix(1, nrow = 3, ncol = 1)

     [,1]
[1,]    1
[2,]    1
[3,]    1


In [26]:
# Creating a matrix using repeated values
matrix(1:3, nrow = 3, ncol = 3)

     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    2    2    2
[3,]    3    3    3


In [27]:
# The use of the "byrow" parameter
matrix(1:3, nrow = 3, ncol = 3, byrow = TRUE)

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    2    3
[3,]    1    2    3


In [28]:
# Initializing a matrix using a vector
matrix(runif(12), nrow = 3, ncol = 4)

          [,1]       [,2]      [,3]      [,4]
[1,] 0.7627644 0.36351269 0.5202621 0.9586567
[2,] 0.1509929 0.14361724 0.3572365 0.2102690
[3,] 0.5558784 0.07595839 0.9163972 0.8896437


In [29]:
# Specifying only the number of columns or rows, R will make an assumption based on the number of items
# Notice the abbreviated use of "nc" for "ncol"
matrix(runif(10), nc = 2)

          [,1]       [,2]
[1,] 0.9767636 0.94389083
[2,] 0.7390843 0.45050734
[3,] 0.9094890 0.70580405
[4,] 0.1060696 0.03151858
[5,] 0.1827690 0.86224706
0.182769,0.86224706


In [30]:
# Specifying the number of rows only
x <- matrix(runif(10), nr = 2)
x

          [,1]      [,2]      [,3]      [,4]      [,5]
[1,] 0.9868853 0.1318381 0.5771300 0.4412107 0.9614771
[2,] 0.5055724 0.5383223 0.6960913 0.5174018 0.8925233


In [31]:
# The dimension of the matrix
dim(x)

In [32]:
# The number of rows
nrow(x)

In [33]:
# The number of columns
ncol(x)

### Creating arrays


In [34]:
# Creating arrays by coercing a vector
x <- seq(1, 27)
x
dim(x) <- c(3, 3, 3)
print(x)
class(x)

, , 1

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

, , 2

     [,1] [,2] [,3]
[1,]   10   13   16
[2,]   11   14   17
[3,]   12   15   18

, , 3

     [,1] [,2] [,3]
[1,]   19   22   25
[2,]   20   23   26
[3,]   21   24   27



In [35]:
# Creating arrays using the array function
x <- array(1:27, dim = rep(3,3))
print(x)

, , 1

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

, , 2

     [,1] [,2] [,3]
[1,]   10   13   16
[2,]   11   14   17
[3,]   12   15   18

, , 3

     [,1] [,2] [,3]
[1,]   19   22   25
[2,]   20   23   26
[3,]   21   24   27



In [36]:
# The dimension of the array
dim(x)

# Exercise 1.3

**Question 1**

Create a matrix with 10 rows and 4 columns populated with numbers from the `"runif"` function and assign it to `x`. Now use the matrix multiplication operator `"%*%"` to multiply the transpose (function `"t"`) of the matrix `x` by `x` itself and assign the output to the variable `y`. Use the `dim` function to obtain the dimension of `y`. Now use the `"solve"` function to obtain the matrix inverse of `y`.

**Question 2**

The first 5 elements of a vector `x` can be selected using `x[1:5]`. A variable `letters` exists on your console. What is the `class`, `mode`, and `typeof` of this variable? Use the first 24 elements of this vector to create an array of dimension `c(2, 4, 3)` using the coercion method with `dim`. Create another array of the same dimension by reversing the first 24 elements of `letters` using the `rev` function and use the `array` function this time.

## Operators in R



Below is a table describing the operators in R:

In [37]:
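.operator <- c("( and {", "[ and [[", ":: and :::", "$ and @", "-, +, /, *", "!", "~", "?", ":",
               "^ or **", "%xx%", "%%", "%/%", "%*%", "%x%", "%in%", "<, >, ==, >=, <=, !=", "& and &&", "| and ||",
               "<-, ->, <<-, =")
.descriptions <- c("Function and expression parentheses", "Subsetting and extraction", "Namespace variable access",
                   "Subset lists and data.frames ($), access S4 slots (@)", "Subtract, add, divide, multiply", "Logical negation",
                   "Formula operator", "Help", "Sequencing items/interactions", "Exponentiation",
                   "User-defined operator (xx varies)", "Modulus", "Integer divide", "Matrix product",
                   "Kronecker product", "Match operator",
                   "Less than, greater than, equal, greater or equal, less or equal, not equal", "Logical AND",
                   "Logical OR", "Assignment")
.DF <- data.frame(Operator = .operator, Description = .descriptions)
.DF

| Operator | Description |
|---|---|
| `(` and `{` | Function and expression parentheses |
| `[` and `[[` | Subsetting and extraction |
| `::` and `:::` | Namespace variable access |
| `$` and `@` | Subset lists and data.frames ($), access S4 slots (@) |
| `-`, `+`, `/`, `*` | Subtract, add, divide, multiply |
| `!` | Logical negation |
| `~` | Formula operator |
| `?` | Help |
| `:` | Sequencing items/interactions |
| `^` or `**` | Exponentiation |
| `%xx%` | User-defined operator (xx varies) |
| `%%` | Modulus |
| `%/%` | Integer divide |
| `%*%` | Matrix product |
| `%x%` | Kronecker product |
| `%in%` | Match operator |
| `<`, `>`, `==`, `>=`, `<=`, `!=` | Less than, greater than, equal, greater or equal, less or equal, not equal |
| `&` and `&&` | Logical AND |
| `\|` and `\|\|` | Logical OR |
| `<-`, `->`, `<<-`, `=` | Assignment |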


### Comparison operators

Comparison operators return logical values:

In [38]:
# Equality operator
5/2 == 2.5

In [39]:
# Greater than and greater than or equal to
4 > 6
6 >= 30/6

### Logical operators

In [40]:
TRUE & TRUE

In [41]:
!TRUE

The `"&&"` and `"||"` operators are short-circuit operators: they work on single logical values and stop evaluating as soon as the result is known.
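
A minimal sketch contrasting the short-circuit operators with their element-wise counterparts `&` and `|`. The names `val`, `guarded` and `elementwise` are illustrative.

```r
val <- NULL

# && stops at the first FALSE, so `val > 0` is never evaluated here
guarded <- !is.null(val) && val > 0
guarded                                        # FALSE

# The right-hand side is skipped entirely once the answer is known
FALSE && stop("never reached")                 # FALSE
TRUE  || stop("never reached")                 # TRUE

# & and | evaluate both sides and work element by element
elementwise <- c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)
elementwise                                    # TRUE FALSE FALSE
```

This makes `&&` and `||` the natural choice inside `if` conditions, and `&` and `|` the choice for building logical vectors.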

# Exercise 1.4

**Question 1**

Use the `sample` function to sample 50 numbers from 1 to 10 and name the output `vec1`. Hint: use the option `replace = TRUE`.

**Question 2**

How many of the numbers in `vec1` are less than or equal to 7? Hint: you'll also need the `sum` function.

**Question 3**

How many of the numbers in `vec1` are in `c(2, 5, 8)`? Hint: use the `"%in%"` operator for comparing content.

**Question 4**

Create a numeric vector containing the numbers `23, 44, 32` and assign it to the variable `vec2`. Use the `names` function to get the names of the items in the vector. Now use the assignment operator `"<-"` to set the names to `"boyd"`, `"bright"`, `"kate"`. Print the vector and return the names again.

## Subsetting in vectors, matrices and arrays

It is useful to be able to extract parts of vectors, matrices and arrays as well as over-writing those parts. There are three main ways of indexing these data structures:

1. Using numeric vectors to select positions explicitly
2. Using logical vectors to subset
3. Using character vectors to subset by names.

The bracket operator `"["` is used for subsetting. 
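
The same three indexing styles also work on the left-hand side of an assignment, which is how parts of a vector are overwritten. A minimal sketch using a hypothetical named vector `scores`:

```r
scores <- c(a = 4, b = 9, c = 2, d = 7)   # hypothetical data for illustration

scores[2] <- 10            # 1. by position: the second element becomes 10
scores[scores < 5] <- 0    # 2. by logical vector: every value below 5 becomes 0
scores["d"] <- 1           # 3. by name: the element named "d" becomes 1

scores                     # a = 0, b = 10, c = 0, d = 1
```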

### Subsetting vectors


In [1]:
# Consider the following vector
x <- sample(1:10, 20, replace = TRUE)
x

In [43]:
# We select items in position 1, 3, 5, 7 using a numeric vector
x[c(1, 3, 5, 7)]

In [4]:
# Negative indices omit the given positions
x[-(1:5)]

In [44]:
# We select items that are greater than 5
x[x > 5]

In [45]:
x <- sample(1:5, 10, replace = TRUE)
names(x) <- letters[1:10]
print(x)
names(x)

a b c d e f g h i j 
2 1 1 5 3 4 5 4 4 3 


In [46]:
# Selection by names
print(x[c("a", "c", "e", "g")])

a c e g 
2 1 3 5 


### Subsetting matrices

Subsetting matrices using integers

In [6]:
x <- matrix(1:20, nc = 4)
x

     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20


In [7]:
# Selecting rows 2, 4, 5
x[c(2, 4, 5), ] # empty index returns all values from that dimension

     [,1] [,2] [,3] [,4]
[1,]    2    7   12   17
[2,]    4    9   14   19
[3,]    5   10   15   20


In [8]:
# Selecting columns 1, and 3
x[, c(1, 3)]

     [,1] [,2]
[1,]    1   11
[2,]    2   12
[3,]    3   13
[4,]    4   14
[5,]    5   15


In [9]:
# Selecting rows 1 to 3 and columns 3 to 4
x[1:3, 3:4]

     [,1] [,2]
[1,]   11   16
[2,]   12   17
[3,]   13   18


In [10]:
# Negative values omit row/column selection
x[-(1:2), -(3:4)]

     [,1] [,2]
[1,]    3    8
[2,]    4    9
[3,]    5   10


In [51]:
# Single column selections collapse to vectors by default
x[, 1]
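
If the matrix structure should be kept when selecting a single row or column, the bracket operator accepts a `drop = FALSE` argument. A short sketch; the names `m`, `col_vec` and `col_mat` are illustrative.

```r
m <- matrix(1:20, ncol = 4)

col_vec <- m[, 1]                  # default: the result collapses to a plain vector
dim(col_vec)                       # NULL

col_mat <- m[, 1, drop = FALSE]    # drop = FALSE keeps a 5 x 1 matrix
dim(col_mat)                       # 5 1
```

Keeping the dimensions is useful when later code expects a matrix, for example with `%*%` or `nrow`.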

Subsetting using logical vectors

In [52]:
# Selecting rows using logical vector
x[x[,1] > 2,]

     [,1] [,2] [,3] [,4]
[1,]    3    8   13   18
[2,]    4    9   14   19
[3,]    5   10   15   20


### Subsetting arrays


In [53]:
x <- array(1:27, dim = rep(3, 3))
print(x)

, , 1

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

, , 2

     [,1] [,2] [,3]
[1,]   10   13   16
[2,]   11   14   17
[3,]   12   15   18

, , 3

     [,1] [,2] [,3]
[1,]   19   22   25
[2,]   20   23   26
[3,]   21   24   27



In [54]:
# Subsetting arrays using numeric indices
x[1,,]

     [,1] [,2] [,3]
[1,]    1   10   19
[2,]    4   13   22
[3,]    7   16   25


In [55]:
print(x[, 1:2,])

, , 1

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

, , 2

     [,1] [,2]
[1,]   10   13
[2,]   11   14
[3,]   12   15

, , 3

     [,1] [,2]
[1,]   19   22
[2,]   20   23
[3,]   21   24



In [56]:
print(x[,,1:2])

, , 1

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

, , 2

     [,1] [,2] [,3]
[1,]   10   13   16
[2,]   11   14   17
[3,]   12   15   18



# Exercise 1.5

**Question 1**

There is a dataset called `Titanic` loaded into your R session. What is its type? What is its dimension? Print it in your console.

**Question 2**

Select the part of the data where `Age` is `Child` and `Survived` is `No` and assign this to `mat1` using the `assign` function.

**Question 3**

Select all the men and women in 2nd and 3rd class from `mat1`. Then select all the women from the `Titanic` dataset.

**Question 4**

How many female adults did not survive? Hint: use the `sum` function.

**Question 5**

Create a 2-D array or a matrix with `Sex` and `Class` where the number of people in each category represents all those that survived. Hint: you can add matrices or arrays together using the `+` operator.

## Missing values (NA) in vectors

R supports missing values in all its basic types, which can be generically identified using the `"NA"` symbol. Under the hood, you should be aware that NA values are always typed:

In [57]:
# NA defaults as logical
typeof(NA)

In [58]:
# NA integer type
typeof(NA_integer_)

In [59]:
# NA real type
typeof(NA_real_)

In [60]:
# NA character type
typeof(NA_character_)

In [61]:
# NA complex type
typeof(NA_complex_)

However, for practical uses, you will only need to use `"NA"` and you will not see the other types given explicitly.

Note also that the availability of `"NA"` for all the basic types is quite unusual in programming languages. Since R is primarily a statistical programming language where the emphasis is to work with data that could easily have missing values, this feature is available and very useful.

In [70]:
x <- NA
# The is.na() function allows us to test values for NA
print(is.na(x))

[1] TRUE


In [69]:
x <- 1:6
x[c(2, 4, 6)] <- NA
print(x)
# Returns a logical vector for NA comparison
print(is.na(x))
# Returns NA values from x
print(x[is.na(x)])

[1]  1 NA  3 NA  5 NA
[1] FALSE  TRUE FALSE  TRUE FALSE  TRUE
[1] NA NA NA


Note: `"NA"` should not be confused with `"NULL"` which is an undefined "placeholder" and has no type. It should also not be confused with `"NaN"` (not a number) specific to the `"numeric"` class.
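
The differences between the three are easiest to see side by side. A minimal sketch using only base R functions; `v` is an illustrative variable name.

```r
# NA is a value that occupies a position; NULL is the absence of a value
length(NA)                # 1
length(NULL)              # 0
length(c(1, NULL, 3))     # 2: NULL disappears when combined

# NaN arises from undefined numeric operations
0 / 0                     # NaN

# is.na() is TRUE for both NA and NaN; is.nan() is TRUE only for NaN
v <- c(1, NA, NaN)
is.na(v)                  # FALSE  TRUE  TRUE
is.nan(v)                 # FALSE FALSE  TRUE

# Comparisons with NA yield NA, which is why is.na() is needed
NA == NA                  # NA
```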

# Exercise 1.6

**Question 1**

Sample 100 numbers from the uniform distribution with `min = -5` and `max = 5` and assign the output into `vec1`. Select 30 positions at random and insert `"NA"` into those positions.

**Question 2**

Return all the values in `vec1` that are not `"NA"` using subsetting.

**Question 3**

Replace the `"NA"` values in `vec1` with the `mean` of `vec1`. Hint: you may find the option `"na.rm = TRUE"` useful when using the `mean` function.

## Factor type in R

Factors are a type for storing categorical variables; you can think of them as a lookup. They are not a basic type and are composed of an `integer` array that denotes the position and a `character` array (`levels`) that denotes the value. They are only available as vectors.
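
The lookup structure can be inspected directly. The sketch below uses a small hypothetical factor `f` to show the integer codes and the levels they point to.

```r
f <- factor(c("low", "high", "low", "mid"))   # hypothetical categories

levels(f)                   # "high" "low" "mid": sorted by default
as.integer(f)               # 2 1 2 3: each value's position in the levels
typeof(f)                   # "integer": the underlying storage
class(f)                    # "factor"

# Looking the codes up in the levels recovers the original values
levels(f)[as.integer(f)]    # "low" "high" "low" "mid"
```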

Consider the products below, where `"productA"` is more popular than `"productC"`, which is more popular than `"productB"`:

In [64]:
products <- sample(c("productA", "productB", "productC"), 100, replace = TRUE, prob = c(3, 1, 2))

In [65]:
# The table() function returns counts
table(products)

products
productA productB productC 
      48       15       37 

In [66]:
# The unique() function returns unique values in the vector
unique(products)

In [67]:
# We can transform variables into factors using the factor function
# This ...
x <- factor(products, levels = c("productA", "productB", "productC"))
# is equivalent to this:
y <- factor(products)
sum(x != y)

In [68]:
# The levels of x:
levels(x)

# Exercise 1.7

**Question 1**

The chief marketing officer has decided to revive the flagging sales of `"productB"` by rebranding it as `"razorX"` in her marketing campaign. Use the `"<-"` operator to make this change to the `levels` in the `x` variable above.

**Question 2**

As part of your analysis, you would like `"razorX"` to be the first item in the levels of the factor. Search for the `"relevel"` function with R help and use it to do this.

**Question 3**

The `"iris"` table has a column called `"Species"`. Subsetting on `data.frames` is very similar to subsetting on matrices. Use this column from `"iris"` to create a frequency `table` of the species. Which one has the highest frequency in the table?