# Hands-On Exercise 4.1: Visualizing Time Series Data

## Objectives

In this exercise, you will learn different ways of visualizing time series data in R.

## Overview

You will use a variety of packages and tools to visualize time series and their rates of change, and to compare time series with one another.

## Load libraries

Load the libraries you'll be using in this exercise.

In [None]:
library(corrplot)
library(dplyr)
library(forecast)
library(lubridate)
library(PerformanceAnalytics)
library(readr)
library(RColorBrewer)
library(TTR)
library(xts)

## Visualize champagne sales

In RStudio, create a new script (e.g. `Ex41.R`). Add commands to the file according to the instructions that follow in this exercise, and execute each command as you move through the steps.

Read the champagne sales data in `data/champagne_sales.csv` as `champagne_sales_data`.

<font color="red">**Before reading the file, set the working directory to the course root folder using `setwd("/home/user/course/")`.**</font>

#### <font color="green">Solution...</font>

In [None]:
champagne_sales_data <- read_csv("data/champagne_sales.csv")

Review the data.

#### <font color="green">Solution...</font>

In [None]:
View(champagne_sales_data)

This reports monthly sales of champagne from 1964 to 1972.

Create an `xts` object (`champagne_sales`) from `sales` indexed on `month`.

#### <font color="green">Solution...</font>

In [None]:
champagne_sales <- xts(
  champagne_sales_data$sales, 
  order.by=ym(champagne_sales_data$month)
)
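`ym()` from lubridate turns year-month strings into `Date` values on the first day of each month, and `xts()` then uses those dates as the ordered index. The sketch below assumes the `month` column holds strings like `"1964-01"` (check this in the `View()` output). The three rows are made-up toy values, not the real sales figures.

```r
library(lubridate)
library(xts)

# Toy stand-in for champagne_sales_data (hypothetical values)
toy_data <- data.frame(
  month = c("1964-01", "1964-02", "1964-03"),
  sales = c(100, 120, 90)
)

# ym() parses "YYYY-MM" into a Date on the first day of that month
toy_dates <- ym(toy_data$month)

# xts() pairs each value with its date and keeps the rows in time order
toy_sales <- xts(toy_data$sales, order.by = toy_dates)
print(toy_sales)
```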

Plot `champagne_sales` as a line chart.

#### <font color="green">Solution...</font>

In [None]:
plot(champagne_sales, main="Champagne sales")

Plot `champagne_sales` as a bar chart.

#### <font color="green">Solution...</font>

In [None]:
plot(champagne_sales, type="h", main="Champagne sales") 

Display a line chart of `champagne_sales` highlighting sales in 1969.

In [None]:
chart.TimeSeries(
  champagne_sales, 
  period.areas = c("1969"),
  period.color = "#0000FF22",
  event.lines = c("Jan 64"), # Appears to be needed for the shading to render (possible bug)
  event.labels = c("")       # Appears to be needed alongside event.lines (possible bug)
)
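If the `event.lines`/`event.labels` workaround is a nuisance, a period can also be shaded with base graphics: `plot.zoo()` draws a univariate series on a `Date` x-axis, so `rect()` can take dates as coordinates. This is a sketch on a synthetic monthly series (a made-up seasonal curve, not the real sales); substitute `champagne_sales` to use it on the exercise data.

```r
library(xts)

# Synthetic monthly series, 1964-1972 (hypothetical values with a yearly cycle)
months <- seq(as.Date("1964-01-01"), as.Date("1972-12-01"), by = "month")
toy_sales <- xts(5000 + 2000 * sin(2 * pi * seq_along(months) / 12), order.by = months)

# Draw the line chart on a Date x-axis
plot.zoo(toy_sales, main = "Synthetic sales, 1969 shaded", ylab = "Sales", xlab = "")

# Shade calendar year 1969 across the full height of the plotting region
usr <- par("usr")
rect(
  xleft = as.Date("1969-01-01"), ybottom = usr[3],
  xright = as.Date("1969-12-31"), ytop = usr[4],
  col = "#0000FF22", border = NA
)
```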

## Visualize COVID-19 data

Read the COVID-19 data (`data/covid_19.csv`) as `covid_data`.

Filter it so that it only contains data for the UK (`iso_code` is `GBR`).

In [None]:
covid_data <- read_csv("data/covid_19.csv") |>
  filter(iso_code == "GBR")

Review the data.

#### <font color="green">Solution...</font>

In [None]:
View(covid_data)

This is daily COVID-19 data, including cases and deaths.

Create an `xts` object called `new_cases` from `new_cases_smoothed` indexed on `date`.

Create an `xts` object called `new_deaths` from `new_deaths_smoothed` indexed on `date`.

#### <font color="green">Solution...</font>

In [None]:
new_cases <- xts(covid_data$new_cases_smoothed, order.by=covid_data$date)
new_deaths <- xts(covid_data$new_deaths_smoothed, order.by=covid_data$date)

Compare `new_cases` and `new_deaths` by plotting them one above the other.

#### <font color="green">Solution...</font>

In [None]:
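op <- par(mfrow = c(2, 1))  # two plots, one above the other
plot(new_cases, main = "New cases")
plot(new_deaths, main = "New deaths")
par(op)  # restore the previous layout (dev.off() would close the device and clear the plots)

# Alternatively, stack both series in one lattice plot:
# lattice::xyplot(cbind(new_cases, new_deaths))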
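Another option is to let `plot.xts()` stack the series itself: merge them into one object and set `multi.panel = TRUE`. Setting `yaxis.same = FALSE` gives each panel its own y-axis, which matters here because deaths are far smaller than cases. The sketch uses two synthetic bell-shaped waves (hypothetical values) so that it runs on its own.

```r
library(xts)

# Synthetic daily waves (hypothetical): deaths peak later and are much smaller
days <- seq(as.Date("2020-12-01"), as.Date("2021-02-28"), by = "day")
day_num <- seq_along(days)
toy_cases  <- xts(60000 * exp(-((day_num - 40) / 12)^2), order.by = days)
toy_deaths <- xts(1200 * exp(-((day_num - 54) / 12)^2), order.by = days)

both <- merge(toy_cases, toy_deaths)
colnames(both) <- c("cases", "deaths")

# One panel per column, each with its own y-axis scale
plot(both, multi.panel = TRUE, yaxis.same = FALSE, main = "Synthetic cases and deaths")
```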

Note the dates of the peaks in the two charts.

Plot vertical lines for 10 Jan 2021 and 24 Jan 2021 on both plots.

#### <font color="green">Solution...</font>

In [None]:
plot_panel <- function(x, ...) {
  lines(x, ...)
  abline(v = as.Date("2021-01-10"), col = "red")
  abline(v = as.Date("2021-01-24"), col = "red")
}

plot.zoo(cbind(new_cases, new_deaths), main = "COVID 19", panel=plot_panel)

What does this suggest about the time taken for cases to result in deaths?
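To put a rough number on that delay, you can shift the deaths series back by k days and find the k at which it correlates best with cases. The sketch below does this on synthetic data built with a known 14-day lag (an assumption chosen for illustration, not a claim about the real data). Passing `as.numeric(new_cases)` and `as.numeric(new_deaths)` to the same helper would give an estimate for the UK series, though a correlation peak is only a rough guide.

```r
# Synthetic daily cases: a single bell-shaped wave (hypothetical values)
day_num <- 1:120
cases <- 1000 * exp(-((day_num - 50) / 10)^2) + 10

# Synthetic deaths: 2% of cases, arriving exactly 14 days later
true_lag <- 14
deaths <- c(rep(0.2, true_lag), 0.02 * cases[1:(length(cases) - true_lag)])

# Correlate cases on day t with deaths on day t + k
lag_correlation <- function(x, y, k) {
  n <- length(x)
  cor(x[1:(n - k)], y[(1 + k):n], use = "complete.obs")
}

lags <- 0:30
correlations <- sapply(lags, function(k) lag_correlation(cases, deaths, k))
best_lag <- lags[which.max(correlations)]
print(best_lag)
```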

Calculate the rate of change of new cases (and assign it to `new_cases_change`).

#### <font color="green">Solution...</font>

In [None]:
new_cases_change <- ROC(new_cases)
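By default, `TTR::ROC()` returns continuously compounded (log) changes, `diff(log(x))`; `type = "discrete"` gives simple changes, `x / lag(x) - 1`. With either type, a zero in the input produces an infinite change on the following day, so it is worth checking for zeros in early-pandemic counts. The toy series below is made up to show both effects.

```r
library(xts)
library(TTR)

# Toy daily counts (hypothetical), including a zero
toy <- xts(c(100, 110, 121, 0, 50), order.by = as.Date("2021-01-01") + 0:4)

log_change    <- ROC(toy)                     # log(x_t / x_{t-1})
simple_change <- ROC(toy, type = "discrete")  # x_t / x_{t-1} - 1
print(merge(toy, log_change, simple_change))

# Count the infinite values caused by the zero
print(sum(is.infinite(coredata(log_change))))
```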

Compare `new_cases` and `new_cases_change` by plotting them one above the other.

#### <font color="green">Solution...</font>

In [None]:
lattice::xyplot(cbind(new_cases, new_cases_change))

Note that large rises in cases are often the result of _sustained_ periods of small daily growth.

Create a stacked bar chart relating new cases and new deaths for March and April 2020.

#### <font color="green">Solution...</font>

In [None]:
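bar_colors <- c("red", "black")  # new cases, new deaths
covid <- merge(new_cases, new_deaths, all = FALSE)
covid_spring_2020 <- covid["2020-03/2020-04"]
# barplot() draws one bar per column of a matrix, so transpose to get one bar per day,
# with new cases (row 1) at the bottom and new deaths (row 2) stacked on top
barplot(
  t(coredata(covid_spring_2020)),
  names.arg = format(index(covid_spring_2020), "%d %b"),
  col = bar_colors
)
graphics::legend(
  "topleft",
  c("New deaths", "New cases"),
  col = rev(bar_colors),
  lwd = 5
)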

## Visualize Amazon revenues

Read the Amazon revenue data (`data/amazon_revenue.csv`) as `amazon_revenue_data`.

#### <font color="green">Solution...</font>

In [None]:
amazon_revenue_data <- read_csv("data/amazon_revenue.csv")

Review the data.

#### <font color="green">Solution...</font>

In [None]:
View(amazon_revenue_data)

This table contains quarterly revenue data for Amazon.

Notably, it ends in Q3 2019, just before the pandemic.

Create an `xts` object called `amazon_revenues` from `revenue` indexed on `quarter`.

#### <font color="green">Solution...</font>

In [None]:
amazon_revenues <- xts(
  amazon_revenue_data$revenue, 
  order.by=ym(amazon_revenue_data$quarter)
)

Plot the revenue time series.

#### <font color="green">Solution...</font>

In [None]:
plot(amazon_revenues)

Calculate the rate of change for the revenue (assigning it to `amazon_revenue_change`).

#### <font color="green">Solution...</font>

In [None]:
amazon_revenue_change <- ROC(amazon_revenues)

Plot revenues above revenue change for comparison.

#### <font color="green">Solution...</font>

In [None]:
lattice::xyplot(cbind(amazon_revenues, amazon_revenue_change))

Growth slows in the later years. If you look closely at the revenue data, it appears to be flattening out.

The rate of change chart makes it much easier to see this.

Display a histogram of the rate of change.

Overlay a density plot.

#### <font color="green">Solution...</font>

In [None]:
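# ROC() leaves an NA in the first position; drop it and use a plain numeric vector
change_values <- as.numeric(na.omit(amazon_revenue_change))
# freq = FALSE plots densities rather than counts, so the density curve shares the scale
hist(change_values, freq = FALSE, main = "Amazon revenue rate of change")
lines(density(change_values), col = "red", lwd = 2)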

Clearly, there is more positive change than negative change.
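To put a number on this, you can count the share of periods with a positive change. The vector here is a hypothetical stand-in; with the exercise data you would pass `as.numeric(na.omit(amazon_revenue_change))` instead.

```r
# Hypothetical quarterly rates of change
toy_change <- c(0.12, 0.08, -0.05, 0.20, 0.10, -0.02, 0.15, 0.07)

share_positive <- mean(toy_change > 0)  # fraction of quarters that grew
share_negative <- mean(toy_change < 0)  # fraction of quarters that shrank
print(c(positive = share_positive, negative = share_negative))
```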

Plot the autocorrelation function (ACF) of the rate of change.

#### <font color="green">Solution...</font>

In [None]:
Acf(
  amazon_revenue_change, 
  na.action = na.pass, 
  main = "Amazon revenue change ACF"
)

The quarterly seasonality of the data (a cycle of 4 periods) is apparent: look for spikes at lags that are multiples of 4.
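Because the cycle repeats every 4 quarters, comparing each quarter with the same quarter a year earlier (`ROC()` with `n = 4`) removes the seasonal swing and leaves the underlying growth. In the synthetic series below (hypothetical: 20% annual growth times a fixed quarterly pattern), the quarter-on-quarter change oscillates while the year-on-year change is constant.

```r
library(xts)
library(TTR)

# Synthetic quarterly revenue (hypothetical): 20% growth per year times a
# fixed seasonal pattern in which Q4 is always the strongest quarter
quarter_dates <- seq(as.Date("2015-01-01"), by = "quarter", length.out = 16)
year_index <- rep(0:3, each = 4)
seasonal_factor <- rep(c(1.0, 1.1, 1.2, 1.5), times = 4)
toy_revenue <- xts(100 * 1.2^year_index * seasonal_factor, order.by = quarter_dates)

qoq <- ROC(toy_revenue, type = "discrete")         # vs the previous quarter
yoy <- ROC(toy_revenue, n = 4, type = "discrete")  # vs the same quarter last year
print(round(merge(qoq, yoy), 3))
```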

Decompose the _revenue_ data into its components and plot them.

#### <font color="green">Solution...</font>

In [None]:
decomposition <- decompose(ts(amazon_revenues, frequency = 4))
plot(decomposition)
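Two refinements you may want to try: giving the `ts` a calendar start, so the decomposition's x-axis shows years rather than 1, 2, 3, ..., and using `type = "multiplicative"`, which assumes the seasonal swings grow in proportion to the level. That assumption is plausible for a fast-growing revenue series but should be checked against the plot. The sketch uses a synthetic quarterly series; replace `toy_revenue` with `amazon_revenues` to apply it.

```r
library(xts)
library(lubridate)

# Synthetic quarterly revenue (hypothetical): steady growth times a seasonal pattern
quarter_dates <- seq(as.Date("2015-01-01"), by = "quarter", length.out = 16)
toy_revenue <- xts(100 * 1.05^(0:15) * rep(c(1.0, 1.1, 1.2, 1.5), 4), order.by = quarter_dates)

# Start the ts at the calendar year and quarter of the first observation
first_quarter <- start(toy_revenue)
revenue_ts <- ts(
  as.numeric(toy_revenue),
  frequency = 4,
  start = c(year(first_quarter), quarter(first_quarter))
)

# Multiplicative model: observed = trend * seasonal * random
decomposition <- decompose(revenue_ts, type = "multiplicative")
plot(decomposition)
```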

## Visualize tech stocks

Read the tech stock data (`data/tech_stocks.csv`) as `tech_stock_data`.

#### <font color="green">Solution...</font>

In [None]:
tech_stock_data <- read_csv("data/tech_stocks.csv")

Review the data.

#### <font color="green">Solution...</font>

In [None]:
View(tech_stock_data)

This data contains the daily closing prices of 8 well-known tech stocks.

Create an `xts` object called `tech_stocks` from the data indexed on `date`.

#### <font color="green">Solution...</font>

In [None]:
tech_stocks <- xts(
  select(tech_stock_data, -date), 
  order.by = ymd(tech_stock_data$date)
)

Plot `tech_stocks`.

#### <font color="green">Solution...</font>

In [None]:
plot(tech_stocks)

IBM's price history starts much earlier than the other companies', which compresses their data into the right-hand side of the chart.

Display the data from 1995 onwards.

#### <font color="green">Solution...</font>

In [None]:
plot(tech_stocks["1995/"])
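`xts` objects accept ISO-8601-style date ranges inside `[ ]`: a single period (`"1969"`), an open start or end (`"1995/"`, `"/1994"`), or a closed range (`"1995-03/1995-05"`). The toy monthly series below is made up just to show how many rows each pattern selects.

```r
library(xts)

# Toy monthly series covering 1993-1996 (hypothetical values 1..48)
months <- seq(as.Date("1993-01-01"), as.Date("1996-12-01"), by = "month")
toy <- xts(seq_along(months), order.by = months)

from_1995 <- toy["1995/"]            # 1995 onwards
to_1994   <- toy["/1994"]            # everything up to the end of 1994
spring_95 <- toy["1995-03/1995-05"]  # March to May 1995 inclusive
year_1994 <- toy["1994"]             # a single calendar year
sapply(list(from_1995, to_1994, spring_95, year_1994), nrow)
```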

Calculate the returns for the stocks (i.e., their rate of change) and assign them to `tech_stock_returns`.

#### <font color="green">Solution...</font>

In [None]:
tech_stock_returns <- ROC(tech_stocks)

Create a boxplot comparing Apple and IBM returns.

#### <font color="green">Solution...</font>

In [None]:
# coredata() gives a plain matrix, so boxplot() draws one box per column
boxplot(
  coredata(tech_stock_returns[, c("aapl", "ibm")]),
  horizontal = TRUE,
  col = "red"
)

Apple appears to be the more volatile stock.
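One way to check that impression numerically is to compare the standard deviation of each stock's returns, ignoring the `NA`s from before a stock's first trading day. The two-column toy returns here are hypothetical; with the exercise data, pass `coredata(tech_stock_returns[, c("aapl", "ibm")])`.

```r
# Hypothetical daily returns for two stocks (NA where a stock has no history yet)
toy_returns <- cbind(
  aapl = c(NA, NA, 0.04, -0.05, 0.06, -0.03),
  ibm  = c(0.01, -0.01, 0.02, -0.01, 0.01, 0.00)
)

# Standard deviation of returns per column: a simple volatility measure
volatility <- apply(toy_returns, 2, sd, na.rm = TRUE)
print(round(volatility, 4))
```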

Plot the autocorrelation function (ACF) of IBM stock returns.

#### <font color="green">Solution...</font>

In [None]:
Acf(
  tech_stock_returns$ibm, 
  na.action = na.pass, 
  main = "IBM returns ACF"
)

Are there any obvious patterns?
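A formal companion to the ACF plot is the Ljung-Box test (`Box.test()` in base R's stats package), which checks whether the first few autocorrelations are jointly zero; a small p-value suggests there is structure left in the series. The sketch contrasts a simulated AR(1) series, autocorrelated by construction, with simulated white noise. With the exercise data you would pass `as.numeric(na.omit(tech_stock_returns$ibm))`; the choice of 10 lags is an arbitrary assumption.

```r
set.seed(42)

# Simulated series: a strongly autocorrelated AR(1) vs independent noise
ar_series    <- arima.sim(model = list(ar = 0.9), n = 500)
noise_series <- rnorm(500)

# Ljung-Box test over the first 10 lags
ar_test    <- Box.test(ar_series, lag = 10, type = "Ljung-Box")
noise_test <- Box.test(noise_series, lag = 10, type = "Ljung-Box")
print(c(ar = ar_test$p.value, noise = noise_test$p.value))
```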

Assess the IBM returns for normality using a Q-Q plot.

#### <font color="green">Solution...</font>

In [None]:
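# Use a plain numeric vector without the leading NA from ROC()
ibm_returns <- as.numeric(na.omit(tech_stock_returns$ibm))
qqnorm(ibm_returns, main = "IBM returns Q-Q plot")
qqline(ibm_returns, col = "red")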

Plot IBM returns against Microsoft returns and fit a regression line.

#### <font color="green">Solution...</font>

In [None]:
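ibm_msft <- as.data.frame(coredata(tech_stock_returns[, c("ibm", "msft")]))
# Microsoft returns on the x-axis, IBM on the y-axis, matching the model ibm ~ msft
plot(ibm ~ msft, data = ibm_msft)
abline(
  reg = lm(ibm ~ msft, data = ibm_msft),
  col = "red",
  lwd = 2
)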

Do they appear highly correlated?
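To answer this with a number rather than by eye, compute the correlation coefficient; for a simple regression with one predictor, the fitted model's R² equals the squared correlation. The toy returns below are made up; with the exercise data, use the `ibm` and `msft` columns of `coredata(tech_stock_returns)`.

```r
# Hypothetical daily returns for two stocks
msft <- c(0.010, -0.020, 0.015, 0.030, -0.010, 0.005, -0.025, 0.020)
ibm  <- c(0.008, -0.010, 0.005, 0.020, -0.012, 0.000, -0.015, 0.010)

r <- cor(ibm, msft, use = "complete.obs")
fit <- lm(ibm ~ msft)

# In simple linear regression, R-squared is the square of the correlation
print(c(correlation = r, r_squared = summary(fit)$r.squared))
```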

Visualize pairwise scatter plots of all the tech stock returns.

#### <font color="green">Solution...</font>

In [None]:
pairs(coredata(tech_stock_returns))

Calculate the pairwise correlation matrix for the stocks. Assign it to `tech_stock_correlations`.

#### <font color="green">Solution...</font>

In [None]:
tech_stock_correlations <- cor(
  coredata(tech_stock_returns), 
  use = "pairwise.complete.obs"
)
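`use = "pairwise.complete.obs"` computes each pair's correlation from the dates on which *both* stocks have returns, so a younger stock's missing early history does not shorten the sample for older pairs such as IBM and Microsoft. `use = "complete.obs"` would instead drop every row containing any `NA`. The toy matrix below is hypothetical.

```r
# Hypothetical returns: column b starts two periods late (leading NAs)
toy <- cbind(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(NA, NA, 3, 5, 4, 6),
  c = c(1, 3, 2, 5, 4, 6)
)

pairwise <- cor(toy, use = "pairwise.complete.obs")  # a-c uses all 6 rows
complete <- cor(toy, use = "complete.obs")           # every pair uses rows 3-6 only
print(c(pairwise = pairwise["a", "c"], complete = complete["a", "c"]))
```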

View the correlations.

#### <font color="green">Solution...</font>

In [None]:
View(tech_stock_correlations)

Visualize the correlation matrix.

#### <font color="green">Solution...</font>

In [None]:
corrplot(tech_stock_correlations)

Visualize the correlation matrix as a heatmap.

#### <font color="green">Solution...</font>

In [None]:
corrplot(tech_stock_correlations, method = "color", type = "upper")
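`RColorBrewer` is loaded at the start but not otherwise used. One option is to build a diverging palette from it for the heatmap, so that negative and positive correlations get contrasting hues. The correlation matrix below comes from made-up data, and the palette choice (`"RdBu"`, 200 steps) is only a suggestion.

```r
library(corrplot)
library(RColorBrewer)

# Hypothetical data for three series, to get a small correlation matrix
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 1, 4, 3, 6, 5),
  z = c(6, 5, 4, 3, 2, 1)
)
toy_correlations <- cor(toy)

# Interpolate the 11-colour RdBu scheme into a smooth 200-step palette
red_blue <- colorRampPalette(brewer.pal(11, "RdBu"))(200)

corrplot(toy_correlations, method = "color", type = "upper", col = red_blue)
```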

## Congratulations!

You have successfully visualized time series data in R.