# Time Series Analysis: Tutorial 6

## Import packages

In [19]:
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')
plt.rc('text', usetex=True)
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.arima.model import ARIMAResults
from sklearn.linear_model import LinearRegression

## Box-Jenkins Method

The Box-Jenkins Method consists of the following steps:

1. Identification
2. Estimation
3. Diagnostic Checking
4. Forecasting

Steps 2 and 3 may be repeated until a suitable model is found. We apply this procedure to a dataset of average UK house prices from January 1991 to May 2007.

### Data

In [1]:
# Load the Stata .dta file into a DataFrame, compute the relative price change, and generate a time variable.
# Collect everything in a single DataFrame.
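A minimal sketch of this data step, assuming synthetic prices in place of the tutorial's Stata file, whose name and column names are not given here. The column names `price`, `dprice`, and `t` are hypothetical. With the real data one would start from `pd.read_stata` rather than from simulated values.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the house price data (NOT the tutorial's dataset).
rng = np.random.default_rng(0)
dates = pd.date_range("1991-01-01", "2007-05-01", freq="MS")  # month starts
growth = 0.005 + 0.01 * rng.standard_normal(len(dates))       # assumed monthly growth
price = 50_000 * np.cumprod(1 + growth)

df = pd.DataFrame({"price": price}, index=dates)
df["dprice"] = df["price"].pct_change() * 100  # relative price change in percent
df["t"] = np.arange(len(df))                   # simple integer time variable
df = df.dropna()                               # the first change is undefined

print(df.head())
```

Expressing the change in percent is one way to bring it closer to a readable scale when plotting it next to the price level.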

In [2]:
# Plot both series with matplotlib. The relative price changes may need rescaling to be visible next to the price level.

## 1. Identification

In [3]:
# Identify serial correlation via ACF and PACF.

In [5]:
# Compute the numerical values for the ACF and PACF.

In [5]:
# As a refresher, compute the partial autocorrelation (PAC) at lag k=3 via an OLS regression.
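The partial autocorrelation at lag k is the coefficient on the k-th lag in a regression of the series on a constant and its first k lags. A minimal sketch with `numpy.linalg.lstsq`, assuming a simulated AR(2) series `y` instead of the tutorial's data:

```python
import numpy as np

# Simulated AR(2) series (illustrative stand-in, not the tutorial's data).
rng = np.random.default_rng(2)
n = 2000
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + rng.standard_normal()

k = 3
# Columns: constant, y_{t-1}, y_{t-2}, y_{t-3}; rows correspond to t = k, ..., n-1.
X = np.column_stack([np.ones(n - k)] + [y[k - j:n - j] for j in range(1, k + 1)])
target = y[k:]

coef, *_ = np.linalg.lstsq(X, target, rcond=None)
pac3 = coef[k]  # coefficient on the third lag = PAC at lag 3
print(f"PAC at lag 3: {pac3:.4f}")
```

Because the simulated process is an AR(2), the estimate at lag 3 should be close to zero.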

## 2. Estimation

In [4]:
# Estimate candidate models according to ACF and PACF.
# Let's try an AR(2) and ARMA(1,1).

# AR(2) via OLS

In [7]:
# AR(2) via ARIMA

## Remark!

The constants differ! OLS estimates the intercept $\delta$ in $y=\delta+\alpha x_1+\beta x_2+\varepsilon$, where for the AR(2) the regressors are the lags $x_1=y_{t-1}$ and $x_2=y_{t-2}$. The ARIMA command instead reports $\mu=\frac{\delta}{1-\alpha-\beta}$ as its constant, which is the unconditional mean of the process. This follows from taking expectations on both sides under stationarity: $\mu=\delta+\alpha\mu+\beta\mu$.
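The relation between the two constants can be checked numerically. The sketch below fits an AR(2) by OLS with `numpy.linalg.lstsq` (used here in place of any particular regression library) on a simulated series with assumed parameters, then converts the OLS intercept into the implied mean.

```python
import numpy as np

# Simulated AR(2) with known parameters (illustrative values only).
rng = np.random.default_rng(4)
n, burn = 5000, 200
delta, alpha, beta = 1.0, 0.5, 0.3
mu = delta / (1 - alpha - beta)  # theoretical unconditional mean = 5
y = np.full(n + burn, mu)
for t in range(2, n + burn):
    y[t] = delta + alpha * y[t - 1] + beta * y[t - 2] + rng.standard_normal()
y = y[burn:]

# OLS regression of y_t on a constant, y_{t-1}, and y_{t-2}.
X = np.column_stack([np.ones(n - 2), y[1:-1], y[:-2]])
coef, *_ = np.linalg.lstsq(X, y[2:], rcond=None)
d_hat, a_hat, b_hat = coef

mu_hat = d_hat / (1 - a_hat - b_hat)  # implied unconditional mean
print(f"OLS intercept: {d_hat:.3f}")
print(f"implied mean:  {mu_hat:.3f}")
print(f"sample mean:   {y.mean():.3f}")
```

The OLS intercept estimates $\delta$, while the implied mean lines up with the sample mean of the series, which is the quantity the ARIMA output labels as the constant.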

In [8]:
# ARMA(1,1) via ARIMA

## 3. Diagnostic Checking

In [9]:
# Check the residuals of the AR(2) and ARMA(1,1). What should they look like?

In [10]:
# Check if the residuals are still serially correlated.
# ACF and PACF of AR(2) residuals
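Residuals of an adequate model should look like white noise, so their sample autocorrelations should mostly stay within the approximate 95% band. The sketch below illustrates this on simulated AR(2) data by comparing a correctly specified AR(2) with an underspecified AR(1). Both are fitted by OLS with `numpy`, and the helpers `ols_ar_residuals` and `sample_acf` are hypothetical names introduced for this example.

```python
import numpy as np

def ols_ar_residuals(y, p):
    """Fit an AR(p) with intercept by OLS and return the residuals."""
    n = len(y)
    X = np.column_stack([np.ones(n - p)] + [y[p - j:n - j] for j in range(1, p + 1)])
    target = y[p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ coef

def sample_acf(x, nlags):
    """Sample autocorrelations for lags 0..nlags."""
    x = x - x.mean()
    denom = x @ x
    return np.array([1.0] + [(x[k:] @ x[:-k]) / denom for k in range(1, nlags + 1)])

# Simulated AR(2) series (illustrative stand-in, not the tutorial's data).
rng = np.random.default_rng(5)
n = 2000
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + rng.standard_normal()

nlags = 10
band = 1.96 / np.sqrt(n)
acf_ar2 = sample_acf(ols_ar_residuals(y, 2), nlags)  # correctly specified
acf_ar1 = sample_acf(ols_ar_residuals(y, 1), nlags)  # underspecified

for name, r in [("AR(2)", acf_ar2), ("AR(1)", acf_ar1)]:
    spikes = [k for k in range(1, nlags + 1) if abs(r[k]) > band]
    print(f"{name} residuals: lags outside the band -> {spikes}")
```

Remaining spikes at low lags, as in the AR(1) case here, indicate that the model has not captured all of the serial correlation.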

In [11]:
# ACF and PACF of ARMA(1,1)

In [12]:
# Significant spikes at lag k=2 suggest that we should look for a different model.
# Let's try an ARMA(2,1)

In [13]:
# Restrict the model by dropping the insignificant ar.L1 term.

In [14]:
# Check the residuals again.

## Result

Which model do you choose?

## 4. Forecasting

In [15]:
# Estimate the AR(2), but without the last 5 observations.

In [16]:
# Compute forecasts five periods into the future.

In [17]:
# Plot them against the observed series. What can you observe?