# Hands-On Exercise 3.1: Working with Time Series

## Objectives

In this exercise, you will learn how to manipulate time series in R.

## Overview

You will use the `xts` package (along with others) to load, view, summarize and analyze two time series:

- Oil prices
- COVID cases and deaths

## Load libraries

Load the libraries you'll be using in this exercise.

In [None]:
library(dplyr)
library(lubridate)
library(readr)
library(TTR)
library(xts)

## Analyze oil prices

In RStudio, create a new script (e.g. `Ex31.R`). Add commands to the file according to the instructions that follow in this exercise, and execute each command as you move through the steps.

Read the oil price data (`data/brent_spot_price.csv`) as `oil_price_data`.

<font color="red">**Set the working directory to the course root folder using `setwd("/home/user/course/")`.**</font>

#### <font color="green">Solution...</font>

In [None]:
oil_price_data <- read_csv("data/brent_spot_price.csv")

Review the data.

#### <font color="green">Solution...</font>

In [None]:
View(oil_price_data)

Create an `xts` object (`oil_prices`) from `price` indexed on `month`.

#### <font color="green">Solution...</font>

In [None]:
# ym() parses the year-month values into dates, which become the time index.
oil_prices <- xts(oil_price_data$price, order.by = ym(oil_price_data$month))
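
As a rough illustration of what `xts()` and `ym()` do in this step, the sketch below builds a tiny series from made-up values. The `toy_data` data frame, its prices and the `"YYYY-MM"` month format are assumptions for illustration only; they are not taken from `brent_spot_price.csv`.

```r
library(lubridate)
library(xts)

# Hypothetical data standing in for the CSV (values are made up and
# deliberately out of chronological order).
toy_data <- data.frame(
  month = c("2020-03", "2020-01", "2020-02"),
  price = c(30, 60, 50)
)

# ym() parses year-month strings into Date objects (first day of the month).
toy_index <- ym(toy_data$month)

# xts() pairs the values with the index and sorts the rows by time.
toy_prices <- xts(toy_data$price, order.by = toy_index)

print(toy_prices)
print(class(index(toy_prices)))
```

Because the index is a proper date, the rows come out in chronological order even though the input was not sorted.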

Examine the `oil_prices` object.

#### <font color="green">Solution...</font>

In [None]:
str(oil_prices)

How many data points are there?

#### <font color="green">Solution...</font>

396

Examine the `head` and `tail` of the time series.

#### <font color="green">Solution...</font>

In [None]:
head(oil_prices)
tail(oil_prices)

Plot the time series.

#### <font color="green">Solution...</font>

In [None]:
plot(oil_prices)

Extract the prices for the year 2000.

#### <font color="green">Solution...</font>

In [None]:
oil_prices["2000"]

Extract the prices for the years 2000 through 2005.

#### <font color="green">Solution...</font>

In [None]:
oil_prices["2000/2005"]

Extract the prices for the years 2020 onwards.

#### <font color="green">Solution...</font>

In [None]:
oil_prices["2020/"]

Extract prices for the 1990s and 2010s. Store them in `oil_prices_1990s` and `oil_prices_2010s`, respectively.

#### <font color="green">Solution...</font>

In [None]:
oil_prices_1990s <- oil_prices["1990/1999"]
oil_prices_2010s <- oil_prices["2010/2019"]
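
The date strings used for subsetting above follow an ISO 8601 style: a single period, a `start/end` range, or a range left open on one side. The sketch below tries each form on a hypothetical monthly series (`toy`, with made-up values), so the row counts are easy to check by hand.

```r
library(xts)

# Hypothetical monthly series: 48 made-up values from Jan 1998 to Dec 2001.
toy_index <- seq(as.Date("1998-01-01"), by = "month", length.out = 48)
toy <- xts(1:48, order.by = toy_index)

nrow(toy["1999"])        # one whole year
nrow(toy["1999/2000"])   # closed range: start of 1999 to end of 2000
nrow(toy["2001/"])       # open end: 2001 to the end of the series
nrow(toy["/1998-06"])    # open start: everything up to June 1998
```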

Show the prices for the first three periods.

#### <font color="green">Solution...</font>

In [None]:
xts::first(oil_prices, 3)

Show the prices for the first two *years*.

#### <font color="green">Solution...</font>

In [None]:
xts::first(oil_prices, "2 years")
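
`xts::first()` accepts either a number of rows or a period description such as `"2 years"`; `xts::last()` works the same way from the other end. A minimal sketch on a hypothetical monthly series (`toy`, made-up values):

```r
library(xts)

# Hypothetical monthly series: 48 made-up values from Jan 1998 to Dec 2001.
toy_index <- seq(as.Date("1998-01-01"), by = "month", length.out = 48)
toy <- xts(1:48, order.by = toy_index)

first_rows  <- xts::first(toy, 3)          # first three observations
first_years <- xts::first(toy, "2 years")  # all observations in 1998 and 1999
last_months <- xts::last(toy, "6 months")  # the final six months

nrow(first_rows)
nrow(first_years)
nrow(last_months)
```

The `xts::` prefix avoids any ambiguity with `dplyr::first()`, which is also attached in this exercise.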

Make a copy of `oil_prices` called `tmp`. Remove the prices for 1991 (i.e. set them to `NA`).

#### <font color="green">Solution...</font>

In [None]:
tmp <- oil_prices
tmp["1991"] <- NA

View `tmp`.

#### <font color="green">Solution...</font>

In [None]:
View(tmp)

Forward fill the 1991 prices using the last value from 1990.

#### <font color="green">Solution...</font>

In [None]:
na.locf(tmp)

Interpolate the missing 1991 values.

#### <font color="green">Solution...</font>

In [None]:
na.approx(tmp)
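
To see how the two approaches differ, the sketch below applies both to a hypothetical four-day series (`toy`, made-up values) with a two-day gap.

```r
library(xts)  # also attaches zoo, which provides na.locf() and na.approx()

# Hypothetical daily series with two missing values in the middle.
toy <- xts(c(10, NA, NA, 40),
           order.by = as.Date("2021-01-01") + 0:3)

filled_forward <- na.locf(toy)    # repeats the last observed value
interpolated   <- na.approx(toy)  # draws a straight line across the gap

cbind(toy, filled_forward, interpolated)
```

Both functions return a new object and leave their input unchanged, so assign the result (for example `tmp <- na.approx(tmp)`) if you want to keep it.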

Combine the values for the 1990s (`oil_prices_1990s`) and 2010s (`oil_prices_2010s`).

#### <font color="green">Solution...</font>

In [None]:
rbind(oil_prices_2010s, oil_prices_1990s)

Note that the order of the arguments doesn't matter: `rbind` on `xts` objects sorts the combined rows by their time index, so the data is lined up correctly.
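
A small check of this behaviour, using two hypothetical series (`early` and `late`, made-up values):

```r
library(xts)

# Two hypothetical series covering different periods.
early <- xts(c(1, 2), order.by = as.Date(c("1999-01-01", "1999-02-01")))
late  <- xts(c(3, 4), order.by = as.Date(c("2010-01-01", "2010-02-01")))

# The later series is passed first, yet the result is in time order.
combined <- rbind(late, early)

index(combined)
coredata(combined)
```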

Increase all the prices in 1999 by 10%.

#### <font color="green">Solution...</font>

In [None]:
oil_prices["1999"] * 1.1

Inner join the 1999 and 2000 prices.

#### <font color="green">Solution...</font>

In [None]:
oil_prices["1999"] + oil_prices["2000"]

This results in an _empty_ time series: arithmetic on two `xts` objects keeps only the timestamps present in both, and the 1999 and 2000 prices have no time periods in common.
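
The alignment is easier to see when the two series do overlap. The sketch below uses two hypothetical three-day series (`a` and `b`, made-up values) that share two dates, and also shows `merge()` with `join = "inner"`, which keeps the same dates but places the values side by side instead of adding them.

```r
library(xts)

# Hypothetical series: a covers 1-3 Jan, b covers 2-4 Jan.
a <- xts(c(1, 2, 3),    order.by = as.Date("2021-01-01") + 0:2)
b <- xts(c(10, 20, 30), order.by = as.Date("2021-01-01") + 1:3)

total  <- a + b                        # only the shared dates (2 and 3 Jan) survive
joined <- merge(a, b, join = "inner")  # same dates, one column per series

total
joined

# With no shared dates the result has no observations at all.
length(a["2021-01-01"] + b["2021-01-04"])
```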

Calculate opening, high, low and closing oil prices for each year.

#### <font color="green">Solution...</font>

In [None]:
to.yearly(oil_prices)
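
For each year, `to.yearly()` reports the first value (open), the largest (high), the smallest (low) and the last (close). The sketch below runs it on a hypothetical two-year monthly series (`toy`, made-up values) small enough to verify by eye.

```r
library(xts)

# Hypothetical monthly values for 2020 and 2021 (24 made-up points).
toy_index <- seq(as.Date("2020-01-01"), by = "month", length.out = 24)
toy <- xts(c(5, 9, 1, 4, 6, 7, 3, 8, 2, 6, 5, 4,
             4, 2, 8, 9, 3, 1, 7, 6, 5, 5, 2, 10),
           order.by = toy_index)

# One row per year, with Open, High, Low and Close columns.
ohlc <- to.yearly(toy)
ohlc
```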

## Analyze COVID-19 infections and deaths

Read the COVID-19 data (`data/covid_19.csv`) as `covid_data`.

#### <font color="green">Solution...</font>

In [None]:
covid_data <- read_csv("data/covid_19.csv")

Filter out everything _except_ the UK data.

#### <font color="green">Solution...</font>

In [None]:
covid_data <- filter(covid_data, iso_code == "GBR")

Create an `xts` time series from the `new_cases` field. Call it `new_cases_raw`.

Visualize the `new_cases_raw` time series.

#### <font color="green">Solution...</font>

In [None]:
new_cases_raw <- xts(covid_data$new_cases, order.by=covid_data$date)
plot(new_cases_raw)

Create an `xts` time series from the `new_cases_smoothed` field. Call it `new_cases`.

Visualize the `new_cases` time series.

#### <font color="green">Solution...</font>

In [None]:
new_cases <- xts(covid_data$new_cases_smoothed, order.by=covid_data$date)
plot(new_cases)

Recreate the smoothed time series from the raw data by applying a window function that calculates a 7-day rolling `mean`.

Plot this new time series.

#### <font color="green">Solution...</font>

In [None]:
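# 7-day rolling mean of the raw daily counts
# (named rolling_cases to avoid shadowing the base function ts()).
rolling_cases <- rollapply(new_cases_raw, width = 7, FUN = mean)
plot(rolling_cases)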

Compare this time series with the smoothed data from the original data set.
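
When comparing, keep in mind how the window is positioned. The sketch below uses a hypothetical ten-day series (`toy`, made-up values) and a 3-day window with `align = "right"` set explicitly, so each result averages the current day and the two days before it. Whether the data set's own smoothed column was computed with the same alignment is not something this exercise states, so treat any small offset between the two curves with that in mind.

```r
library(xts)

# Hypothetical daily series: the values 1 to 10 on consecutive days.
toy <- xts(1:10, order.by = as.Date("2021-01-01") + 0:9)

# Right-aligned 3-day rolling mean: the first two days have no full window.
rolled <- rollapply(toy, width = 3, FUN = mean, align = "right")

cbind(toy, rolled)
```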

Create an `xts` time series from the `new_deaths_smoothed` field. Call it `new_deaths`.

Visualize the `new_deaths` time series.

#### <font color="green">Solution...</font>

In [None]:
new_deaths <- xts(covid_data$new_deaths_smoothed, order.by=covid_data$date)
plot(new_deaths)

Rescale the `new_cases` and `new_deaths` time series values to range between 0 and 1 (i.e. put them on the same scale) by dividing by the maximum value in each of the time series.

Name the normalized time series `normalized_cases` and `normalized_deaths`, respectively.

#### <font color="green">Solution...</font>

In [None]:
normalized_cases <- new_cases / max(new_cases, na.rm=TRUE)
normalized_deaths <- new_deaths / max(new_deaths, na.rm=TRUE)
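
The `na.rm = TRUE` argument matters whenever the series contains missing values. A minimal sketch on a hypothetical series (`toy`, made-up values) with one `NA`:

```r
library(xts)

# Hypothetical series with one missing value.
toy <- xts(c(2, NA, 8, 4), order.by = as.Date("2021-01-01") + 0:3)

max(toy)                # NA: a single missing value hides the maximum
max(toy, na.rm = TRUE)  # 8

# Dividing by the maximum makes the largest value exactly 1.
scaled <- toy / max(toy, na.rm = TRUE)
scaled
```

Dividing by the maximum fixes the top of the range at 1; the bottom is 0 only if the original series actually reaches 0.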

Combine the normalized series using `cbind` naming the result `covid_ts`.

#### <font color="green">Solution...</font>

In [None]:
covid_ts <- cbind(normalized_cases, normalized_deaths)

Visualize the two time series (`covid_ts`).

#### <font color="green">Solution...</font>

In [None]:
plot(covid_ts)

How long does it take for increases in cases to manifest as deaths?

Compare the distance between peaks.

Shift the deaths back by the number of days it takes for cases to result in deaths. Store the result in `shifted_deaths`.

#### <font color="green">Solution...</font>

In [None]:
# A negative k moves each value to an earlier date (here, 14 days earlier).
shifted_deaths <- lag.xts(normalized_deaths, k = -14)

Combine and plot `normalized_cases` and `shifted_deaths`.

#### <font color="green">Solution...</font>

In [None]:
plot(cbind(normalized_cases, shifted_deaths))

Do they now line up? If not, try other values for the `k` parameter.
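
Instead of trying values by eye, one option is to score each candidate shift by the correlation between the two series and keep the best one. The sketch below does this on synthetic data: `toy_cases` and `toy_deaths` are hypothetical bell-shaped curves constructed so that deaths trail cases by exactly 5 days. Real data is noisier, so treat the result as a starting point and not as a definitive estimate.

```r
library(xts)

days <- as.Date("2021-01-01") + 0:59

# Hypothetical "cases" curve peaking on day 20, and a "deaths" curve
# with the same shape peaking 5 days later.
toy_cases  <- xts(exp(-((0:59 - 20)^2) / 50), order.by = days)
toy_deaths <- xts(exp(-((0:59 - 25)^2) / 50), order.by = days)

candidate_k <- 0:10

# Correlation between cases and deaths shifted k days earlier.
correlations <- sapply(candidate_k, function(k) {
  shifted <- lag.xts(toy_deaths, k = -k)
  cor(coredata(toy_cases), coredata(shifted), use = "complete.obs")
})

best_k <- candidate_k[which.max(correlations)]
best_k
```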

Visualize the rate of change of `new_cases`.

#### <font color="green">Solution...</font>

In [None]:
plot(ROC(new_cases))
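
`ROC()` comes from the `TTR` package. By default it returns the continuous (log) rate of change, `log(x[t] / x[t-1])`; with `type = "discrete"` it returns the simple percentage change. A minimal sketch on a hypothetical series (`toy`, made-up values):

```r
library(TTR)
library(xts)

# Hypothetical series: two 10% increases followed by no change.
toy <- xts(c(100, 110, 121, 121), order.by = as.Date("2021-01-01") + 0:3)

roc_log      <- ROC(toy)                     # log(x[t] / x[t-1])
roc_discrete <- ROC(toy, type = "discrete")  # x[t] / x[t-1] - 1

cbind(roc_log, roc_discrete)
```

The first value is `NA` because there is no earlier observation to compare with. Since both variants divide by the previous value, zeros in the series produce infinite or undefined results.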

Calculate the opening, closing, low and high numbers of new cases (`new_cases_raw`) for each week covered by the data.

#### <font color="green">Solution...</font>

In [None]:
to.weekly(new_cases_raw)

## Congratulations!

You have successfully manipulated time series using R.