# Time Series Analysis With R

## Objectives

This course is a practical introduction to time series analysis with R.

It will introduce students to:

* The specific features of time series data;
* The free statistical software R, used to conduct time series analysis;
* Some of the main univariate and multivariate techniques for analyzing time series data.

At the end of the course, students are expected to understand the specific features of time series data and to be able to use R to perform simple time series analyses by applying the techniques described during the course.

## Lectures

Structure of the course:

* Theoretical concepts: this part of the course introduces students to the main theoretical concepts of time series analysis;
* R tutorial: this part of the course consists of a hands-on tutorial on the R functions needed to perform time series analysis. Every part of a time series analysis project is covered, including data wrangling, visual representation, and statistical analysis;
* Individual/group work: this part of the course consists of individual and group work that applies the theoretical and practical knowledge from the previous parts of the course.

# Getting started with R

## RStudio Interface and Data

### Download and Install RStudio

This course is based on the statistical software R. R is easier to use within the RStudio development environment, which runs on Windows, macOS, and other operating systems.

You can download a free version of RStudio Desktop from the official website.

You can also use a free online version of RStudio by registering for the RStudio Cloud free plan. However, the free plan gives you just 15 hours per month. Our lessons take 4.5 hours per month, and since you also need to practice, the best choice is to install RStudio and R on your computer.

Now we are going to see how to get started with RStudio Desktop (alternatively, you can sign in to Google Colab).

First, download and install a free version of RStudio Desktop and open the software.

### Create an RStudio Project and Import Data

When starting a data analysis project with RStudio, we create a new dedicated environment where we will keep all the scripts (files containing the code to perform the analysis), data sets, and outputs of the analysis (such as plots and tables). This dedicated workspace is simply called a project.

To create a new project with RStudio, follow these steps:

* click on File (on the top left);
* then, click on New Project;
* select New Directory, and then New Project;
* choose a folder for the project, and give your project a name. You can use the name Time-Series-Analysis-With-R.

This creates a new folder for the project inside the folder chosen in the previous step. In this folder, you will find a .Rproj file whose name is the name you assigned to your project. To work on this project, you just need to open the .Rproj file.
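
As a quick check that a project is open, the following minimal sketch prints the current working directory and lists the files it contains. It only uses the base R functions `getwd()` and `list.files()`; the assumption is that you run it from the RStudio console after opening the .Rproj file, in which case the working directory should be the project folder.

```r
# Store and print the current working directory
# (assumption: inside an open RStudio project this is the project folder)
project_dir <- getwd()
project_dir

# List the files in that folder; the .Rproj file should appear here
# when the working directory is the project folder
list.files(project_dir)
```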

### Create a Script

Once the project has been created, we can open a new script and save it.

A script is a file containing code. We can create a first script named basic-r-syntax, where we will test the basic code we are going to see. The script will be saved with the extension .R.

You can open, change, and save the file every time you work on it. Saving your code is important; otherwise, you would have to write the same code again every time you work on the project!

## Basic R

### Objects

An object is an R entity composed of a name and a value.

The arrow (<-) sign is used to create objects and assign a value to an object (or to change or “update” its previous value).

Example: create an object with name “object_consisting_of_a_number” and value equal to 2:

In [None]:
object_consisting_of_a_number <- 2

Enter the name of the object in the console and run the command: the value assigned to the object will be printed.

In [None]:
object_consisting_of_a_number

The object is equal to its value. Therefore, for instance, an object with a numerical value can be used to perform arithmetical operations.

In [None]:
object_consisting_of_a_number * 10

The value of an object can be transformed:

In [None]:
object_consisting_of_a_number <- object_consisting_of_a_number * 10

object_consisting_of_a_number

An object can also represent a **function**.

Example: create an object for the sum (addition) function:

In [None]:
function_sum <- function(x, y){
  z <- x + y
  return(z)
}

The function can now be applied to two numerical values:

In [None]:
function_sum(5, 2)

Actually, we don’t need this function, since mathematical functions are already implemented in R.

In [None]:
sum(5, 2)

In [None]:
5 + 7

In [None]:
2 * 3

In [None]:
3^2

In [None]:
sqrt(25)

The value of an object can be a number, a function, but also a vector. Vectors are sequences of values.

In [None]:
vector_of_numbers <- c(1, 7, 3, 4, 5, 6, 7, 8, 23, 21)

In [None]:
vector_of_numbers

A vector of numbers can be the argument of mathematical operations.

In [None]:
vector_of_numbers * 2

In [None]:
vector_of_numbers + 3
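
Besides element-wise arithmetic, a vector can be passed to functions that summarize it, and its elements can be selected by position. The sketch below uses only base R functions and redefines the vector so that it can be run on its own.

```r
# The same vector used in the text above
vector_of_numbers <- c(1, 7, 3, 4, 5, 6, 7, 8, 23, 21)

length(vector_of_numbers)  # number of elements: 10
sum(vector_of_numbers)     # sum of all elements: 85
mean(vector_of_numbers)    # arithmetic mean: 8.5

# Square brackets select elements by position (positions start at 1)
vector_of_numbers[2]       # second element: 7
vector_of_numbers[1:3]     # first three elements: 1 7 3
```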

Other types of R objects are the matrix, the list, and the data.frame.

A matrix is a table composed of rows and columns whose values all have the same type (in this example, numerical values).

In [None]:
a_matrix <- matrix(data = 1:60, nrow = 10, ncol = 6)

a_matrix

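|       | [,1] | [,2] | [,3] | [,4] | [,5] | [,6] |
|-------|------|------|------|------|------|------|
| [1,]  | 1    | 11   | 21   | 31   | 41   | 51   |
| [2,]  | 2    | 12   | 22   | 32   | 42   | 52   |
| [3,]  | 3    | 13   | 23   | 33   | 43   | 53   |
| [4,]  | 4    | 14   | 24   | 34   | 44   | 54   |
| [5,]  | 5    | 15   | 25   | 35   | 45   | 55   |
| [6,]  | 6    | 16   | 26   | 36   | 46   | 56   |
| [7,]  | 7    | 17   | 27   | 37   | 47   | 57   |
| [8,]  | 8    | 18   | 28   | 38   | 48   | 58   |
| [9,]  | 9    | 19   | 29   | 39   | 49   | 59   |
| [10,] | 10   | 20   | 30   | 40   | 50   | 60   |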
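
The elements of a matrix can be selected with square brackets, giving the row first and the column second. The following minimal sketch rebuilds the same matrix and uses only base R.

```r
# The same matrix as above: values 1 to 60 filled column by column
a_matrix <- matrix(data = 1:60, nrow = 10, ncol = 6)

dim(a_matrix)    # number of rows and columns: 10 6

a_matrix[2, 3]   # element in row 2, column 3: 22
a_matrix[1, ]    # the whole first row: 1 11 21 31 41 51
a_matrix[, 2]    # the whole second column: 11 12 ... 20
```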


A list is a collection of other objects, which can be of different types. For instance, this list includes a numerical value, a vector of numbers, and a matrix.

In [None]:
a_list <- list(object_consisting_of_a_number, vector_of_numbers, a_matrix)

a_list

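
The elements of a list can be extracted with double square brackets or, if the elements are named, with the dollar sign. The sketch below recreates the objects so that it runs on its own; the element names used in `named_list` are hypothetical and chosen only for illustration.

```r
# Recreate the objects used in the text (the number has the value 20 after the update)
object_consisting_of_a_number <- 20
vector_of_numbers <- c(1, 7, 3, 4, 5, 6, 7, 8, 23, 21)
a_matrix <- matrix(data = 1:60, nrow = 10, ncol = 6)

a_list <- list(object_consisting_of_a_number, vector_of_numbers, a_matrix)

length(a_list)    # number of elements in the list: 3
a_list[[1]]       # first element, the number: 20
a_list[[2]][3]    # third value of the second element (the vector): 3

# Naming the elements makes them accessible with the dollar sign
named_list <- list(number = object_consisting_of_a_number,
                   vector = vector_of_numbers,
                   matrix = a_matrix)
named_list$vector
```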


A data.frame is like a matrix that can contain numbers but also other types of data, such as characters (a textual type of data), or factors (unordered categorical variables, such as gender, or ordered categories, such as low, medium, high).

Data sets are usually stored in a data.frame. For instance, if you import a CSV or an Excel file into R, the corresponding R object is a data.frame.
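
As a minimal sketch of the import step, the code below writes a small made-up data set to a temporary CSV file and reads it back with the base R function `read.csv`. The file, the column names, and the values are hypothetical and only serve the example; in practice you would pass the path of your own file.

```r
# Create a small example data set (made-up values, for illustration only)
example_data <- data.frame(day = c("Monday", "Tuesday", "Wednesday"),
                           value = c(10.5, 12.1, 9.8))

# Write it to a temporary CSV file, so the example does not depend on existing files
csv_path <- tempfile(fileext = ".csv")
write.csv(example_data, csv_path, row.names = FALSE)

# Import the CSV file: the result is a data.frame
imported_data <- read.csv(csv_path)
class(imported_data)   # "data.frame"
str(imported_data)     # compact overview of the columns and their types
```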

In [None]:
# this is an object (vector) consisting of a series of numerical values
numerical_vector <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
numerical_vector

In [None]:
# this is another object (vector) consisting of a series of categorical values
categorical_vector <- c("Monday", "Tuesday", "Monday", "Tuesday", "Monday", "Wednesday","Thursday", "Wednesday", "Thursday", "Saturday", "Sunday", "Friday", "Saturday", "Sunday")
categorical_vector

In [None]:
# this is an object consisting of a data.frame, created combining vectors through the function "data.frame"
a_dataframe <- data.frame("first_variable" = numerical_vector,
                          "second_variable" = categorical_vector)
a_dataframe

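| first_variable | second_variable |
|----------------|-----------------|
| `<dbl>`        | `<chr>`         |
| 1              | Monday          |
| 2              | Tuesday         |
| 3              | Monday          |
| 4              | Tuesday         |
| 5              | Monday          |
| 6              | Wednesday       |
| 7              | Thursday        |
| 8              | Wednesday       |
| 9              | Thursday        |
| 10             | Saturday        |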


To access a specific column of a data.frame, you can use the name of the data.frame, the dollar symbol $, and the name of the column.

In [None]:
a_dataframe$first_variable

In [None]:
a_dataframe$second_variable

It is possible to add a column to a data.frame by writing:

* the name of the data.frame;
* the dollar sign $;
* a name for the new column;
* the arrow sign <-;
* a vector of values to be stored in the new column (its length must equal the number of rows of the data.frame).

In [None]:
a_dataframe$a_new_variable <- c(12, 261, 45, 29, 54, 234, 45, 42, 6, 267, 87, 3, 12, 9)

In [None]:
a_dataframe

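| first_variable | second_variable | a_new_variable |
|----------------|-----------------|----------------|
| `<dbl>`        | `<chr>`         | `<dbl>`        |
| 1              | Monday          | 12             |
| 2              | Tuesday         | 261            |
| 3              | Monday          | 45             |
| 4              | Tuesday         | 29             |
| 5              | Monday          | 54             |
| 6              | Wednesday       | 234            |
| 7              | Thursday        | 45             |
| 8              | Wednesday       | 42             |
| 9              | Thursday        | 6              |
| 10             | Saturday        | 267            |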
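
A new column can also be computed from an existing one, and R raises an error when the new vector does not fit the number of rows. The sketch below uses a reduced, five-row version of the data.frame; the column names `doubled_variable` and `wrong_length` are hypothetical.

```r
# A reduced version of the data.frame used in the text (five rows only)
a_dataframe <- data.frame(first_variable = c(1, 2, 3, 4, 5),
                          second_variable = c("Monday", "Tuesday", "Monday",
                                              "Tuesday", "Monday"))

nrow(a_dataframe)   # number of rows: 5

# A new column derived from an existing column
a_dataframe$doubled_variable <- a_dataframe$first_variable * 2
a_dataframe$doubled_variable   # 2 4 6 8 10

# A vector with 3 values does not fit 5 rows: the assignment fails.
# try() catches the error so that the script keeps running.
result <- try(a_dataframe$wrong_length <- c(1, 2, 3), silent = TRUE)
inherits(result, "try-error")  # TRUE
```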


It is possible to visualize the first few rows of a data.frame by using the function head.

In [None]:
head(a_dataframe)

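|   | first_variable | second_variable | a_new_variable |
|---|----------------|-----------------|----------------|
|   | `<dbl>`        | `<chr>`         | `<dbl>`        |
| 1 | 1              | Monday          | 12             |
| 2 | 2              | Tuesday         | 261            |
| 3 | 3              | Monday          | 45             |
| 4 | 4              | Tuesday         | 29             |
| 5 | 5              | Monday          | 54             |
| 6 | 6              | Wednesday       | 234            |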
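
By default `head` shows six rows; the optional argument `n` changes this number. The sketch below also shows the related base R functions `tail`, `nrow`, and `str`, on a data.frame rebuilt from two of the columns used above.

```r
# Rebuild a data.frame with two of the columns used in the text
a_dataframe <- data.frame(first_variable = as.numeric(1:14),
                          a_new_variable = c(12, 261, 45, 29, 54, 234, 45,
                                             42, 6, 267, 87, 3, 12, 9))

head(a_dataframe, n = 3)   # first three rows
tail(a_dataframe, n = 3)   # last three rows
nrow(a_dataframe)          # total number of rows: 14
str(a_dataframe)           # compact overview of columns and types
```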


## Functions

A function is a coded operation that applies to an object (e.g., a number or a text string) and transforms it according to specific rules. A function has a name and some arguments. One of the arguments is typically the object or value the function is applied to (for instance, a numerical value); the other arguments, which can be mandatory or optional, control how the function behaves.

Functions are operations applied to objects that give a certain output. For example, the arithmetical operation “addition” is a function that applies to two or more numbers to give, as its output, their sum. The arguments of the “sum” function are the numbers that are added together.

The name of the function is written before the parentheses, and the arguments of the function are written inside the parentheses:

In [None]:
sum(5, 3)

Arguments of functions can be numbers but also text strings. For instance, the function paste joins the strings it takes as arguments into a single string.

In [None]:
paste("the", "cat", "is", "at", "home")
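
Optional arguments are usually passed by name. As a small sketch, the `sep` argument of `paste` sets the separator placed between the strings (a space by default), and the `digits` argument of `round` sets the number of decimal places. Both functions and arguments are part of base R.

```r
# Default separator: a single space
paste("the", "cat", "is", "at", "home")              # "the cat is at home"

# The optional argument sep changes the separator
paste("the", "cat", "is", "at", "home", sep = "-")   # "the-cat-is-at-home"

# paste0 is a shortcut for paste with an empty separator
paste0("time", "series")                             # "timeseries"

# Another example of an optional, named argument
round(3.14159, digits = 2)                           # 3.14
```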

In R you can sometimes find a “nested” syntax, which can be confusing. The best practice is to keep things as simple as possible.

In [None]:
# This comment, written after the hash mark, describes what is going on here.
# Two "paste" functions are nested to show how functions can be nested together.
# The nesting is improper here, because it makes the code more complicated than
# necessary: it would have been better to use the "paste" function just once!
paste(paste("the", "cat", "is", "at", "home"), "and", "sleeps", "on", "the", "sofa")
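
For comparison, here is a sketch of two simpler ways to obtain the same string: a single call to `paste`, and a two-step version that stores the intermediate result in an object. The object names are hypothetical.

```r
# The nested version from the text
nested <- paste(paste("the", "cat", "is", "at", "home"),
                "and", "sleeps", "on", "the", "sofa")

# Simpler: a single call to paste
single <- paste("the", "cat", "is", "at", "home",
                "and", "sleeps", "on", "the", "sofa")

# Also readable: store the intermediate result in an object
first_part <- paste("the", "cat", "is", "at", "home")
two_steps <- paste(first_part, "and", "sleeps", "on", "the", "sofa")

# All three versions give the same string
identical(nested, single)     # TRUE
identical(nested, two_steps)  # TRUE
```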

To sum up, functions manipulate and transform objects. Data wrangling, data visualization, and data analysis are all performed through functions. (These topics are covered in the next lesson.)

## Data Types

Variables can have different types in R, such as:

* double: numbers that include decimals (0.1, 5.676, 121.67). This type is appropriate for continuous variables;
* integer: whole numbers such as 1, 2, 3, 10, 400. This type is suitable for count data;
* factor: for categorical variables. Factors can be ordered (e.g., level of agreement: “high”, “medium”, “low”) or unordered (e.g., hair color: “blond”, “dark brown”, “brown”);
* character: textual labels;
* logical: logical values (i.e., TRUE and FALSE);
* Date: used to represent calendar days;
* POSIXct and POSIXlt: classes used to represent dates together with times.
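
The type of an object can be inspected with the base R functions `typeof` and `class`. The sketch below creates one object for each type in the list above; the object names, the example date, and the `"UTC"` time zone are arbitrary choices made for illustration.

```r
a_double    <- 5.676
an_integer  <- 10L                      # the suffix L marks an integer
a_factor    <- factor(c("low", "high", "medium"),
                      levels = c("low", "medium", "high"),
                      ordered = TRUE)   # ordered categorical variable
a_character <- "a textual label"
a_logical   <- TRUE
a_date      <- as.Date("2024-01-15")
a_datetime  <- as.POSIXct("2024-01-15 10:30:00", tz = "UTC")

typeof(a_double)     # "double"
typeof(an_integer)   # "integer"
class(a_factor)      # "ordered" "factor"
class(a_character)   # "character"
class(a_logical)     # "logical"
class(a_date)        # "Date"
class(a_datetime)    # "POSIXct" "POSIXt"
```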

