###Matrices

In R, a matrix is a collection of elements of the same data type (numeric, character, or logical) arranged into a fixed number of rows and columns. Since you are only working with rows and columns, a matrix is called two-dimensional.

You can construct a matrix in R with the matrix() function. Consider the following example: matrix(1:9, byrow = TRUE, nrow = 3, ncol = 3)

In the matrix() function:

The first argument is the collection of elements that R will arrange into the rows and columns of the matrix. Here, we use 1:9 which constructs the vector c(1, 2, 3, 4, 5, 6, 7, 8, 9).

The argument byrow indicates that the matrix is filled by the rows. This means that the matrix is filled from left to right and when the first row is completed, the filling continues on the second row. If we want the matrix to be filled by the columns, we just set byrow = FALSE.

The third argument nrow indicates that the matrix should have three rows.

The fourth argument ncol indicates the number of columns that the matrix should have; here, three.
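To make the effect of byrow concrete, the short sketch below builds the same 3-by-3 matrix twice, once filled by rows and once by columns. The variable names by_row and by_col are chosen here purely for illustration.

```r
# Fill a 3 x 3 matrix row by row
by_row <- matrix(1:9, byrow = TRUE, nrow = 3, ncol = 3)

# Fill the same values column by column (byrow = FALSE is the default)
by_col <- matrix(1:9, byrow = FALSE, nrow = 3, ncol = 3)

by_row[1, ]  # first row when filled by rows: 1 2 3
by_col[1, ]  # first row when filled by columns: 1 4 7

# The two matrices are transposes of each other
identical(by_row, t(by_col))  # TRUE
```

Note that nrow and ncol must be consistent with the number of elements; if you supply only one of them, R works out the other from the length of the input.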

#####Instructions
Construct a matrix with 5 rows and 4 columns containing the numbers 1 up to 20 and assign it to the variable m. Set the byrow argument to TRUE.


In [9]:
# Construct a matrix with 5 rows and 4 columns containing the numbers 1 up to 20, filled by row, and assign it to m
m <- matrix(1:20, byrow = TRUE, nrow = 5, ncol = 4)

# print m to the console
m

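| | [,1] | [,2] | [,3] | [,4] |
|---|---|---|---|---|
| [1,] | 1 | 2 | 3 | 4 |
| [2,] | 5 | 6 | 7 | 8 |
| [3,] | 9 | 10 | 11 | 12 |
| [4,] | 13 | 14 | 15 | 16 |
| [5,] | 17 | 18 | 19 | 20 |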


###Factors

In this exercise you dive into the wonderful world of factors.

The term factor refers to a statistical data type used to store categorical variables. The difference between a categorical variable and a continuous variable is that a categorical variable can take only a limited number of values, called categories. A continuous variable, on the other hand, can take any of an infinite number of values.

It is important that R knows whether it is dealing with a continuous or a categorical variable, as the statistical models you will develop in the future treat both types differently.

A good example of a categorical variable is the variable student_status. An individual can either be "student" or "not student". This means that "student" and "not student" are the two values of the categorical variable student_status, and every observation is assigned one of these values. We can do this using the factor() function; for a character vector, as.factor() gives the same result.
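As a minimal sketch of what a factor stores, the example below converts a character vector into a factor and inspects its categories. The names status and status_factor are hypothetical and used only here.

```r
# Hypothetical vector of categorical observations
status <- c("student", "not student", "student", "not student")

# Convert the character vector into a factor
status_factor <- factor(status)

levels(status_factor)   # the distinct categories, sorted alphabetically
nlevels(status_factor)  # number of categories: 2
table(status_factor)    # number of observations per category
```

The levels are what tell R that the variable is categorical rather than free text.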

#####Instructions
Turn the vector student_status into a factor and put this in a variable called categorical_student.

Print the variable categorical_student.

In [10]:
# a vector called student_status
student_status <- c("student", "not student", "student", "not student")

# turn student_status into a factor and save it in the variable categorical_student
categorical_student <- as.factor(student_status)

# print categorical_student to the console
categorical_student

###Dataframes: What's a data frame?

You may remember the matrix, a two-dimensional object that we discussed earlier. All the elements that you put in a matrix should be of the same type. However, when performing a market research survey, you often have questions such as:

'Are you married?' or other 'yes/no' questions (= logical data type)

'How old are you?' (= numeric data type)

'What is your opinion on this product?' or other 'open-ended' questions (= character data type)

The output, namely the respondents' answers to the questions formulated above, is a data set of different data types. You will often find yourself working with data sets that contain different data types instead of only one. A data frame has the variables of a data set as columns and the observations as rows. This will be a familiar concept for those coming from other statistical software packages such as SAS or SPSS.

####Inspecting dataframes

There are several functions you can use to inspect your dataframe. To name a few:

head: this by default prints the first 6 rows of the dataframe

tail: this by default prints the last 6 rows to the console

str: this prints the structure of your dataframe

dim: this prints the dimensions, that is, the number of rows and columns of your dataframe

colnames: this prints the names of the columns of your dataframe
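The inspection functions listed above can be tried on mtcars, a data set that ships with R. This is only an illustrative sketch; any data frame would work in its place.

```r
# mtcars is a built-in data frame with 32 rows and 11 columns
head(mtcars)      # first 6 rows
tail(mtcars, 3)   # last 3 rows; the second argument overrides the default of 6
str(mtcars)       # compact overview: column names, types and first values
dim(mtcars)       # 32 11 (rows, then columns)
colnames(mtcars)  # names of the columns
```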

####Constructing a dataframe yourself

Since using built-in data sets is not even half the fun of creating your own data sets, the rest of this chapter is based on your personally developed data set.

As a first goal, you want to construct a data frame that describes the main characteristics of the eight planets in our solar system. The main features of a planet are:

The type of planet (Terrestrial or Gas Giant).

The planet's diameter relative to the diameter of the Earth.

The planet's rotation period about its own axis relative to that of the Earth.

Whether the planet has rings or not (TRUE or FALSE).

You construct a data frame with the data.frame() function. As arguments, you pass the vectors described above; each vector becomes a column of the data frame. Because every column must have the same number of rows, it is important that all the vectors have the same length. They can, and most likely will, contain different types of data.
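As a small sketch of data.frame() with mixed column types, the example below stores made-up answers to the three survey questions mentioned earlier. The data frame survey and its column names are hypothetical.

```r
# Hypothetical answers from three survey respondents
survey <- data.frame(
  married = c(TRUE, FALSE, TRUE),                  # yes/no question (logical)
  age     = c(34, 27, 45),                         # numeric answer
  opinion = c("Great", "Too expensive", "Fine"),   # open-ended answer (character)
  stringsAsFactors = FALSE                         # keep text as character in any R version
)

# Each column keeps its own type
sapply(survey, class)

# All columns have the same length: 3 rows, 3 columns
dim(survey)
```

Passing name = vector pairs sets the column names explicitly; if you pass bare variables instead, data.frame() uses the variable names as column names.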

#####Instructions
Use the function data.frame() to construct a data frame. Call this variable planet_df.

In [11]:
# planets vector
planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")

# type vector
type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet", "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant")

# diameter vector
diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)

# rotation vector
rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67)

# rings vector
rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)

# construct a dataframe planet_df from all the above variables
planet_df <- data.frame(planets, type, diameter, rotation, rings)

####Indexing and selecting columns from a dataframe

In the same way as you indexed your vectors, you can select elements from your dataframe using square brackets. Unlike with vectors, however, you now have two dimensions: rows and columns. That's why you use a comma inside the brackets to separate the row index from the column index. For instance, the code planet_df[1,2] would select the element in the first row and the second column of the dataframe planet_df. Leaving one side of the comma empty selects every row or every column.

You can also use the $ operator to select an entire column from a dataframe. For instance, planet_df$planets would select the entire planets column from the dataframe planet_df.
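The following sketch shows the common indexing patterns on a tiny data frame. The data frame df and its columns are made up for this illustration and are not part of the exercise.

```r
# Small hypothetical data frame, used only for this illustration
df <- data.frame(
  name  = c("a", "b", "c"),
  value = c(10, 20, 30),
  flag  = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

df[1, 2]                    # element in row 1, column 2: 10
df[2, ]                     # entire second row (a one-row data frame)
df[, 2]                     # entire second column, returned as a vector
df$value                    # the same column, selected by name
df[1:2, c("name", "flag")]  # rows 1 and 2, columns chosen by name
```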

#####Instructions
Select the elements in the first row and the second and third columns of planet_df.

Select the entire third column of planet_df.

In [12]:
# select the values in the first row and second and third columns
planet_df[1,c(2,3)]
# select the entire third column
planet_df[,3]

| | type | diameter |
|---|---|---|
| 1 | Terrestrial planet | 0.382 |


###Lists

A list in R is similar to your to-do list at work or school: the different items on that list most likely differ in length, characteristics, and the type of activity that has to be done.

A list in R allows you to gather a variety of objects under one name (that is, the name of the list) in an ordered way. These objects can be matrices, vectors, data frames, even other lists, etc. It is not even required that these objects are related to each other.

You can easily construct a list using the list() function. In this function you can wrap the different elements like so: list(item1, item2, item3).
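As a minimal sketch, the example below builds one list without names and one with names. All object names (numbers, small_matrix, greeting, plain_list, named_list) are hypothetical.

```r
# Hypothetical objects of different kinds
numbers <- 1:5
small_matrix <- matrix(1:4, nrow = 2)
greeting <- "hello"

# Unnamed list: components can be reached only by position
plain_list <- list(numbers, small_matrix, greeting)

# Named list: each component gets a label
named_list <- list(vec = numbers, mat = small_matrix, txt = greeting)

length(plain_list)  # 3 components
names(plain_list)   # NULL, because no names were given
names(named_list)   # "vec" "mat" "txt"
```

Naming the components is optional, but it is what makes selection by name possible later on.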

#####Instructions
Put the objects my_vector, my_matrix and my_df into a list called my_list.

Make sure to print my_list.

In [13]:
# Vector with numerics from 1 up to 10
my_vector <- 1:10 

# Matrix with numerics from 1 up to 9
my_matrix <- matrix(1:9, ncol = 3)

# First 10 rows of the built-in data frame 'mtcars'
my_df <- mtcars[1:10,]

# Construct my_list with these different elements:
my_list <- list(my_vector, my_matrix, my_df)

# print my_list to the console
my_list

| | [,1] | [,2] | [,3] |
|---|---|---|---|
| [1,] | 1 | 4 | 7 |
| [2,] | 2 | 5 | 8 |
| [3,] | 3 | 6 | 9 |

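| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.62 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.32 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.44 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.46 | 20.22 | 1 | 0 | 3 | 1 |
| Duster 360 | 14.3 | 8 | 360.0 | 245 | 3.21 | 3.57 | 15.84 | 0 | 0 | 3 | 4 |
| Merc 240D | 24.4 | 4 | 146.7 | 62 | 3.69 | 3.19 | 20.0 | 1 | 0 | 4 | 2 |
| Merc 230 | 22.8 | 4 | 140.8 | 95 | 3.92 | 3.15 | 22.9 | 1 | 0 | 4 | 2 |
| Merc 280 | 19.2 | 6 | 167.6 | 123 | 3.92 | 3.44 | 18.3 | 1 | 0 | 4 | 4 |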


###Selecting elements from a list

Your list will often be built out of numerous elements and components. Therefore, getting a single element, multiple elements, or a component out of it is not always straightforward. One way to select a component is using the numbered position of that component. For example, to "grab" the first component of my_list you type my_list[[1]].

Another way is to refer to the names of the components: my_list[["my_vector"]] selects the my_vector vector. This only works if the components were given names when the list was built, for example with list(my_vector = my_vector, my_matrix = my_matrix, my_df = my_df); for an unnamed list it returns NULL.

A last way to grab a component from a named list is using the $ sign. The following code would select my_df from my_list: my_list$my_df.

Besides selecting components, you often need to select specific elements out of these components. For example, with my_list[[1]][1] you select from the first component of my_list the first element. This would select the number 1.
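The three selection styles can be compared side by side on a small named list. This is a sketch under the assumption that the components are named; example_list and the names vec and mat are hypothetical.

```r
# Hypothetical named list for illustration
example_list <- list(vec = 1:10, mat = matrix(1:9, ncol = 3))

example_list[[1]]         # first component, by position
example_list[["vec"]]     # same component, by name
example_list$mat          # second component, using $
example_list[[1]][1]      # first element of the first component: 1
example_list$mat[, 1]     # first column of the matrix: 1 2 3

# Single brackets return a list, double brackets return the component itself
class(example_list[1])    # "list"
class(example_list[[1]])  # "integer"
```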

#####Instructions
Grab the second component of my_list and print it to the console.

Grab the first column of the third component of my_list and print it to the console.

In [14]:
# Vector with numerics from 1 up to 10
my_vector <- 1:10 

# Matrix with numerics from 1 up to 9
my_matrix <- matrix(1:9, ncol = 3)

# First 10 rows of the built-in data frame 'mtcars'
my_df <- mtcars[1:10,]

# Construct list with these different elements:
my_list <- list(my_vector, my_matrix, my_df)

# Grab the second component of my_list and print it to the console
my_list[[2]]

# Grab the first column of the third component of `my_list` and print it to the console
my_list[[3]][,1]

| | [,1] | [,2] | [,3] |
|---|---|---|---|
| [1,] | 1 | 4 | 7 |
| [2,] | 2 | 5 | 8 |
| [3,] | 3 | 6 | 9 |
