
# Case Study 2 - Time Series

Forecasting of House Price Index (HPI).

ARIMA (autoregressive integrated moving average) is a class of statistical models for analyzing and forecasting time series data. The data used in this study are the monthly house price index for Canberra over 21 years, from 1990 to 2011. The series contains 253 observations, each an index value.

Time series analysis requires libraries for data manipulation and visualization, so we import those first.

**Import Libraries**

In [0]:
# Import Library for Data Manipulation
import pandas as pd
import numpy as np

In [0]:
# Import Library for Visualization
import seaborn as sns
import matplotlib.pyplot as plt

**Import Data**

We import the house price index data and parse the dates.

In [0]:
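# Set Date Format (day, abbreviated month name, two-digit year)
date_format = "%d-%b-%y"

In [0]:
# Import Dataset (first column becomes the index; squeeze the single data column into a Series)
# pd.datetime, squeeze=True and date_parser are no longer available in recent pandas releases
df_house = pd.read_csv('https://raw.githubusercontent.com/dianrdn/data/master/House-index-canberra.csv', sep=',', index_col=0).squeeze('columns')
df_house.index = pd.to_datetime(df_house.index, format=date_format)
df_house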
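
To make the date format concrete, here is a minimal sketch of how the format string `%d-%b-%y` is interpreted. The sample string is a made-up value for illustration, not a row read from the CSV file.

```python
from datetime import datetime

# Hypothetical sample value; the real file may contain different dates.
sample = "01-Jan-90"

# %d = day of month, %b = abbreviated month name, %y = two-digit year
parsed = datetime.strptime(sample, "%d-%b-%y")
print(parsed)
```

Note that `%y` has to guess the century: Python maps two-digit years 69 to 99 onto 1969 to 1999 and 00 to 68 onto 2000 to 2068, which suits a series running from 1990 to 2011.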

**Explore Dataset**

The data is plotted as a time series, with the date along the x-axis and the house price index on the y-axis.

In [0]:
# Explore Dataset
plt.rcParams['figure.figsize'] = (16, 9)
plt.style.use('ggplot')
df_house.plot()
plt.xlabel('year')
plt.ylabel('house price index')
plt.title('House Price Index Over Time')

We can see that the house price index dataset has a clear trend.

Let’s also take a quick look at an autocorrelation plot of the time series. This is built into pandas.

In [0]:
# Import Module
from pandas.plotting import autocorrelation_plot

# Visualize Autocorrelation Plot
autocorrelation_plot(df_house)
plt.show()
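
For intuition about what the plot shows, here is a minimal sketch of the usual sample autocorrelation estimate, computed by hand with NumPy. The helper name `autocorr` and both toy series are assumptions made for illustration; they are not part of the notebook's data.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag (0 <= lag < len(x))."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mean = x.mean()
    # Variance (lag-0 autocovariance) and lag-k autocovariance, both divided by n
    c0 = np.sum((x - mean) ** 2) / n
    ck = np.sum((x[: n - lag] - mean) * (x[lag:] - mean)) / n
    return ck / c0

# Toy series: a steady upward trend and a series that flips sign every step
trend = np.arange(100, dtype=float)
alternating = np.array([(-1.0) ** t for t in range(100)])

print(autocorr(trend, 1))        # close to +1: neighbouring values are similar
print(autocorr(alternating, 1))  # close to -1: neighbouring values are opposite
```

A trending series such as the house price index typically shows high positive autocorrelation that decays slowly with the lag, which is one hint that differencing may be needed.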

**Time Series - ARIMA Modeling**

Modeling ARIMA

First, we fit an ARIMA(5,1,0) model. This sets the autoregressive order to 5 (five lagged values), uses a difference order of 1 to make the time series stationary, and sets the moving-average order to 0 (no moving-average terms).

In [0]:
# Import Module (the older statsmodels.tsa.arima_model API has been removed from recent statsmodels releases)
from statsmodels.tsa.arima.model import ARIMA

# Modeling and Show Result Summary
arima = ARIMA(df_house, order=(5,1,0))
arima_fit = arima.fit()
print(arima_fit.summary())
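
The difference order of 1 means the model works on the changes between consecutive observations rather than on the raw values. A minimal sketch with NumPy, using a made-up linear series (the numbers are illustrative and not taken from the Canberra data):

```python
import numpy as np

# Hypothetical series with a constant upward trend of 2.5 per step
t = np.arange(24, dtype=float)
series = 100.0 + 2.5 * t

# First-order differencing: diff1[i] = series[i + 1] - series[i]
diff1 = np.diff(series, n=1)

print(len(series), len(diff1))  # differencing shortens the series by one
print(diff1[:5])                # the trend is gone; only the step size remains
```

Real data is noisier than this, so the differenced series will fluctuate rather than stay constant, but the same idea applies: differencing removes the trend that the raw plot shows.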

Residuals

In [0]:
# Visualize Residuals Over Time
residuals = pd.DataFrame(arima_fit.resid)
residuals.plot()
plt.show()

In [0]:
# Visualize Residuals Distribution
residuals.plot(kind='kde')
plt.show()

In [0]:
# Show Residuals Description
residuals.describe()

Visualize Actual vs Predicted House Price Index

We hold out the last 40% of the observations as a test set and forecast them one step at a time: the model is refit on all values seen so far, predicts the next value, and the actual value is then appended to the history (walk-forward validation).

In [0]:
# Import Modules
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.arima.model import ARIMA

# Split the Series: first 60% for training, remaining 40% for testing
df_house_values = df_house.values
size = int(len(df_house_values) * 0.6)
train, test = df_house_values[:size], df_house_values[size:]
history = list(train)
predictions = []

In [0]:
# Walk-Forward Validation: refit on the history, then forecast one step ahead
for t in range(len(test)):
	arima = ARIMA(history, order=(5,1,0))
	arima_fit = arima.fit()
	predicted = arima_fit.forecast()[0]
	predictions.append(predicted)
	actual = test[t]
	history.append(actual)
	print('predicted=%f, actual=%f' % (predicted, actual))

# Show Test Error
error = mean_squared_error(test, predictions)
print('Test MSE: %.3f' % error)
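
The structure of the loop above can be sketched without statsmodels. The example below swaps the ARIMA model for a naive "last value" forecaster purely to keep the sketch small and deterministic; the helper names `walk_forward` and `naive_forecast` and the toy values are assumptions for illustration, not part of the original analysis.

```python
import numpy as np

def walk_forward(values, train_fraction, forecast_fn):
    """One-step-ahead walk-forward validation.

    forecast_fn receives the history so far (a list) and returns a forecast
    for the next value.
    """
    split = int(len(values) * train_fraction)
    history = list(values[:split])
    actuals = list(values[split:])
    predictions = []
    for actual in actuals:
        predictions.append(forecast_fn(history))
        history.append(actual)  # reveal the true value only after forecasting
    return np.array(actuals), np.array(predictions)

def naive_forecast(history):
    """Stand-in model: predict that the next value equals the last one seen."""
    return history[-1]

# Toy series rising by 2 each step
values = [100.0, 102.0, 104.0, 106.0, 108.0, 110.0, 112.0, 114.0, 116.0, 118.0]
actuals, predictions = walk_forward(values, 0.6, naive_forecast)

mse = np.mean((actuals - predictions) ** 2)
rmse = np.sqrt(mse)  # same units as the index itself
print('MSE: %.3f, RMSE: %.3f' % (mse, rmse))
```

A naive forecaster like this also makes a useful baseline: if the ARIMA test error is not clearly lower than the baseline's, the extra model complexity is not buying much. Taking the square root of the MSE gives an error in index points, which is easier to interpret than the squared value.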

In [0]:
# Visualize Actual vs Predicted House Price Index
plt.plot(test, color='blue', label='actual')
plt.plot(predictions, color='red', label='prediction')
plt.xlabel('month (position within test set)')
plt.ylabel('house price index')
plt.title('Actual vs Predicted House Price Index')
plt.legend()
plt.show()