# CHAPTER 3 - Deep Neural Networks for Time Series Forecasting the Easy Way

_pg. 31-50_
  
   
   
# PART 1 - Prepare the Data
_pg. 32-36_

## Getting the data from the internet  
This example uses data from the internet, which we need to download and clean first.

In [None]:
import numpy as np
import pandas as pd
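# urlretrieve lives in different modules in Python 2 and Python 3
try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2

In [None]:
# See p. 32
url = "http://ww2.amstat.org/publications/jse/datasets/COE.xls"

# !!! You need to update this location
loc = "/home/ubuntu/CSU/Notebooks/csu-2017/COE.xls"
urlretrieve(url, loc)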

## Cleaning Up Downloaded Spreadsheet Files

In [None]:
Excel_file = pd.ExcelFile(loc)

## Worksheet Names

In [None]:
print(Excel_file.sheet_names)

In [None]:
spreadsheet = Excel_file.parse('COE data')
# info() prints its summary itself and returns None, so no print is needed
spreadsheet.info()

In [None]:
data = spreadsheet['COE$']

## View `data` Values

In [None]:
print(data.head())

## Adjusting Data

There are some errors in the data below. Can you find them?

In [None]:
print(spreadsheet['DATE'][193:204])

Use the following to fix the year errors...

In [None]:
# .at replaces DataFrame.set_value, which was removed in pandas 1.0
spreadsheet.at[194, 'DATE'] = '2004-02-15'
spreadsheet.at[198, 'DATE'] = '2004-04-15'
spreadsheet.at[202, 'DATE'] = '2004-06-15'
print(spreadsheet['DATE'][193:204])
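
One way to look for this kind of error programmatically, rather than by eye, is to check that the dates increase from row to row. The sketch below uses a small made-up series (`dates` is illustrative, not the COE data) in which one entry has the wrong year, and flags the positions where a date is earlier than its predecessor.

```python
import pandas as pd

# Hypothetical dates, NOT the COE data: the third entry has the wrong year.
dates = pd.Series(pd.to_datetime([
    "2004-01-01", "2004-01-15", "2002-02-01", "2004-02-15", "2004-03-01",
]))

# A date that is earlier than the one before it breaks chronological order.
step = dates.diff()
out_of_order = dates.index[step < pd.Timedelta(0)].tolist()

print(out_of_order)
```

Note that this only flags backward jumps. If a wrong year were too late rather than too early, the backward jump would show up on the following row instead.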

## Saving the Data

As shown in the book, the following code saves the data as a comma-separated values (`.csv`) file for later use.

In [None]:
# !!! You need to update this location
loc = "/home/ubuntu/CSU/Notebooks/csu-2017/COE.csv"
spreadsheet.to_csv(loc)

## Visualizing the Data

Let's recreate Figure 3.1 from page 32.

This will give us a better intuition of the observed data, and then later we will have something to compare with our predictions.

In [None]:
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
matplotlib.style.use('ggplot')
plt.rcParams['figure.figsize'] = (8,6)

# Observed data
time = spreadsheet['DATE'].tolist()
price = data.tolist()

plt.plot(time, price, "-", label="COE Price", color="darkblue", linewidth=0.5)
plt.title("Figure 3.1: Certificate of Entitlement Price")
plt.xlabel("Date")
plt.ylabel("Singaporean dollars")
plt.legend()
plt.show()

# PART 2
_pg. 39-42_
## How to Scale the Input Attributes

In [None]:
from sklearn import preprocessing
x = data
scaler = preprocessing.MinMaxScaler(feature_range=(0,1))

In [None]:
print(scaler)

In [None]:
print(type(x))

In [None]:
x = np.array(x).reshape(-1,1)
# NUMPY
# Why did we use -1 here?
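
As a hint for the question above, here is a minimal sketch on a toy array (not the COE data) showing what NumPy does with `-1` in `reshape`: that dimension is inferred from the number of elements.

```python
import numpy as np

toy = np.array([10.0, 20.0, 30.0, 40.0])   # toy values, 1-D with shape (4,)

column = toy.reshape(-1, 1)   # -1 lets NumPy infer the row count: shape (4, 1)
flat = column.reshape(-1)     # back to 1-D: shape (4,)

print(toy.shape, column.shape, flat.shape)
```

scikit-learn transformers such as `MinMaxScaler` expect 2-D input of shape `(n_samples, n_features)`, which is why the column shape is needed before scaling.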

In [None]:
print(type(x))

## Log Transform

Here the author log-transforms the data because, as he says, "it helps" with this specific data set. __However__, this is __NOT__ a general rule: whether a log transform helps depends on the data.

In [None]:
x = np.log(x)
x[0:5]

## Scale `x`

In [None]:
x = scaler.fit_transform(x)

In [None]:
x = x.reshape(-1)
print(x.shape)

In [None]:
print(np.min(x))

In [None]:
print(np.max(x))
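
The rescaling to the range (0, 1) can be written out by hand as `(x - min) / (max - min)`. The sketch below applies that formula with NumPy to toy values (not the COE data). It is meant to illustrate the arithmetic, not to replace `MinMaxScaler`.

```python
import numpy as np

toy = np.array([2.0, 4.0, 6.0, 10.0])   # toy values

lo, hi = toy.min(), toy.max()
scaled = (toy - lo) / (hi - lo)          # smallest value -> 0, largest -> 1

print(scaled)
```

This is why the minimum and maximum printed above should come out as 0 and 1.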

# PART 3
_pg. 42-49_

## Working with `statsmodels` Library

In [None]:
from statsmodels.tsa.stattools import pacf

In [None]:
x_pacf = pacf(x, nlags=5, method='ols')

In [None]:
print(x_pacf)
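
To build some intuition for what the partial autocorrelation measures, the sketch below computes it by regression on a synthetic AR(1) series. The helper `pacf_by_regression` is a hypothetical name introduced for illustration. It follows one common regression-based definition (the lag-k value is the coefficient on the lag-k term when regressing on lags 1 to k) and is not the `statsmodels` implementation, which may differ in details.

```python
import numpy as np

def pacf_by_regression(series, nlags):
    """Illustrative helper (not the statsmodels implementation)."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    values = [1.0]  # lag 0 is 1 by convention
    for k in range(1, nlags + 1):
        # Columns: intercept, then the series lagged by 1..k
        lags = [series[k - j : n - j] for j in range(1, k + 1)]
        design = np.column_stack([np.ones(n - k)] + lags)
        target = series[k:]
        coefs = np.linalg.lstsq(design, target, rcond=None)[0]
        values.append(coefs[-1])  # coefficient on the lag-k term
    return np.array(values)

# Synthetic AR(1) series: each value depends only on the previous one.
rng = np.random.RandomState(0)
phi = 0.7
noise = rng.normal(size=2000)
ar1 = np.zeros(2000)
for t in range(1, 2000):
    ar1[t] = phi * ar1[t - 1] + noise[t]

toy_pacf = pacf_by_regression(ar1, nlags=3)
print(np.round(toy_pacf, 2))
```

For an AR(1) process, the lag-1 value should be close to `phi` and the higher lags close to zero. A pattern like that is one way to justify a small `lag` setting in a forecasting model.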

## Import `nnet_ts`

The book's example runs `keras` on the `theano` backend, but the choice of backend is not important for this exercise.

Use `pip install theano` from the command line if you have not already installed `theano`.

In [None]:
from nnet_ts import *
count = 0
ahead = 12
pred = list()

## The `while` Loop

We will use the same NN architecture as in Figure 2.5 on page 27.

In [None]:
while count < ahead:
    np.random.seed(2016)
    
    # Try to understand this line, where are we in the series?
    end = len(x) - ahead + count
    
    # Set the NN parameters
    fit1 = TimeSeriesNnet(hidden_layers=[7,3], activation_functions=["tanh", "tanh"])

    # What is the lag parameter doing?
    fit1.fit(x[0:end], lag=1, epochs=100)

    # What are we predicting?
    out = fit1.predict_ahead(n_ahead=1)
    
    # out may be array-like, so take its single value before formatting it as a float
    print("Obs {:02d}: x={:0.4f}  prediction={:0.4f} ".format(count + 1, x[count], float(np.ravel(out)[0])))
    pred.append(out)   
    count += 1
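
To see where the loop is in the series on each pass, the sketch below reproduces only its indexing on a toy series of length 10 with `ahead = 3`. The neural network is swapped for a naive "repeat the last training value" forecast. That substitution is an assumption made purely to keep the example self-contained, and it is not what `TimeSeriesNnet` does.

```python
toy = [0.1 * i for i in range(10)]   # toy series, NOT the COE data
ahead = 3
steps = []

for count in range(ahead):
    end = len(toy) - ahead + count   # index of the first value NOT used for training
    train = toy[0:end]               # everything before position `end`
    forecast = train[-1]             # naive stand-in for fit + predict_ahead(n_ahead=1)
    steps.append((end, len(train), forecast))
    print("count={} end={} train=toy[0:{}] target=toy[{}]".format(count, end, end, end))
```

Assuming a one-step-ahead forecast refers to the value immediately after the training data, the forecast made from `toy[0:end]` corresponds to position `end`, and the training window grows by one observation on each pass.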

### QUESTION: Do the prediction values make sense?

## Realized and Predicted Values

Now we need to undo the scaling and log transformation we used to preprocess the data.

In [None]:
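# The scaler expects a 2-D array with one column, like the data it was fitted on
pred1 = scaler.inverse_transform(np.array(pred).reshape(-1, 1))
pred1 = np.exp(pred1)
print(np.round(pred1, 1))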
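
The order of the inversion matters: the data were log-transformed first and scaled second, so the scaling is undone first and the exponential is applied last. Here is a minimal NumPy sketch of that round trip on toy prices (not the COE data), with the min-max step written by hand instead of using `scaler`.

```python
import numpy as np

toy_price = np.array([9000.0, 12000.0, 15000.0])   # toy values

# Forward: log first, then min-max scale to (0, 1)
logged = np.log(toy_price)
lo, hi = logged.min(), logged.max()
scaled = (logged - lo) / (hi - lo)

# Inverse: undo the scaling first, then undo the log
unscaled = scaled * (hi - lo) + lo
recovered = np.exp(unscaled)

print(np.round(recovered, 1))
```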

## Visualizing the Results

In [None]:
# FIGURE 3.6 (p. 49)
%matplotlib inline
plt.rcParams['figure.figsize'] = (8,6)

# Original Series
plt.plot(time[-12:], price[-12:], linestyle='solid', label="Observed", color="darkblue", linewidth=1)

# Predictions
plt.plot(time[-12:], pred1, linestyle='solid', label="Predicted", color="red", linewidth=1)

# Desired Tolerance
max_price = [p + 1500 for p in price]
min_price = [p - 1500 for p in price]
# Why are we using -12 for the index?
# Answer:
plt.plot(time[-12:], max_price[-12:], linestyle='solid', label="Tolerance", color="grey", linewidth=0.5)
plt.plot(time[-12:], min_price[-12:], linestyle='solid', color="grey", linewidth=0.5)

# Figure Settings
plt.title("Figure 3.6: Observed and predicted values for COE")
plt.xlabel("Date")
plt.ylabel("Singaporean dollars")
plt.ylim(9000,17000)
plt.legend()
plt.show()
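
The figure shows the tolerance band visually, but a numeric check is also possible. The sketch below counts how many predictions fall within 1,500 dollars of the observed value. The arrays `observed` and `predicted` hold made-up numbers for illustration and are not results from the model.

```python
import numpy as np

observed = np.array([12000.0, 12500.0, 13000.0, 11000.0])   # hypothetical
predicted = np.array([12800.0, 14500.0, 12900.0, 9400.0])   # hypothetical
tolerance = 1500.0

# True where the absolute error is no larger than the tolerance
within = np.abs(predicted - observed) <= tolerance
print("{} of {} predictions within tolerance".format(within.sum(), len(within)))
```

With the notebook's own variables, the same comparison could be applied to the last 12 observed prices and the flattened `pred1` values.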