<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Timeseries Forecasting with LSTM on Stock Data

_Authors: Kiefer Katovich (SF)_

---

In this lab, you will practice building LSTMs with Keras to fit and forecast timeseries data. You will pull down stock price data using the `matplotlib.finance` module (code provided).

This lab largely mirrors the Keras LSTM code from the lecture, so refer to that as needed.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime

sns.set_style('whitegrid')

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

### 1. Import the required Keras packages

In [2]:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler

Using TensorFlow backend.


### 2. Pull down the stock price data using `matplotlib.finance`

The code to pull the stock data is provided for you. This code will pull the daily close prices for a variety of companies in the year 2010. 

In [3]:
from matplotlib.finance import quotes_historical_yahoo_ochl
import datetime

symbol_dict = {
    'TOT': 'Total',
    'XOM': 'Exxon',
    'CVX': 'Chevron',
    'COP': 'ConocoPhillips',
    'VLO': 'Valero Energy',
    'MSFT': 'Microsoft',
    'IBM': 'IBM',
    'TWX': 'Time Warner',
    'CMCSA': 'Comcast',
    'CVC': 'Cablevision',
    'YHOO': 'Yahoo',
    'DELL': 'Dell',
    'HPQ': 'HP',
    'AMZN': 'Amazon',
    'TM': 'Toyota',
    'CAJ': 'Canon',
    'MTU': 'Mitsubishi',
    'SNE': 'Sony',
    'F': 'Ford',
    'HMC': 'Honda',
    'NAV': 'Navistar',
    'NOC': 'Northrop Grumman',
    'BA': 'Boeing',
    'KO': 'Coca Cola',
    'MMM': '3M',
    'MCD': "McDonald's",
    'PEP': 'Pepsi',
    'MDLZ': 'Kraft Foods',
    'K': 'Kellogg',
    'UN': 'Unilever',
    'MAR': 'Marriott',
    'PG': 'Procter Gamble',
    'CL': 'Colgate-Palmolive',
    'GE': 'General Electric',
    'WFC': 'Wells Fargo',
    'JPM': 'JPMorgan Chase',
    'AIG': 'AIG',
    'AXP': 'American Express',
    'BAC': 'Bank of America',
    'GS': 'Goldman Sachs',
    'AAPL': 'Apple',
    'SAP': 'SAP',
    'CSCO': 'Cisco',
    'TXN': 'Texas Instruments',
    'XRX': 'Xerox',
    'LMT': 'Lockheed Martin',
    'WMT': 'Wal-Mart',
    'WBA': 'Walgreen',
    'HD': 'Home Depot',
    'GSK': 'GlaxoSmithKline',
    'PFE': 'Pfizer',
    'SNY': 'Sanofi-Aventis',
    'NVS': 'Novartis',
    'KMB': 'Kimberly-Clark',
    'R': 'Ryder',
    'GD': 'General Dynamics',
    'RTN': 'Raytheon',
    'CVS': 'CVS',
    'CAT': 'Caterpillar',
    'DD': 'DuPont de Nemours'}

symbols, names = np.array(list(symbol_dict.items())).T

# Choose a time period (calendar year 2010)
d1 = datetime.datetime(2010, 1, 1)
d2 = datetime.datetime(2011, 1, 1)

quotes = [quotes_historical_yahoo_ochl(symbol, d1, d2, asobject=True)
          for symbol in symbols]

stock_close = {symb:q.close for symb, q in zip(symbols, quotes)}


### 3. Convert the stock closes to a dataframe and perform first-order differencing

This converts the data into daily changes in stock price.
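
As a rough illustration of the idea, the sketch below differences a small synthetic price table with pandas. The `toy_close` dictionary and its tickers are hypothetical stand-ins for `stock_close`, not real quotes.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for `stock_close` (hypothetical prices, not real quotes).
toy_close = {
    'AAA': np.array([10.0, 10.5, 10.2, 11.0]),
    'BBB': np.array([50.0, 49.0, 49.5, 51.5]),
}

closes = pd.DataFrame(toy_close)

# First-order differencing: each row minus the previous row.
# The first row has no predecessor, so it becomes NaN and is dropped.
changes = closes.diff().dropna()
print(changes)
```

Each remaining row is one day's change in price for every stock, which is the quantity the rest of the lab models.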

In [4]:
# A:

### 4. Select a handful of stock symbols to look at and subset the data.

In [5]:
# A:

### 5. Normalize the data to the range -1 to 1
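
One possible way to picture the scaling is the hand-written min-max mapping below, applied per column to a hypothetical `changes` array. The `MinMaxScaler` imported earlier should perform the same per-column mapping when created with `feature_range=(-1, 1)` and applied with `fit_transform`; the NumPy version is only a sketch of what happens underneath.

```python
import numpy as np

# Hypothetical daily changes: rows are days, columns are stocks.
changes = np.array([
    [ 0.5, -1.0],
    [-0.3,  0.5],
    [ 0.8,  2.0],
    [ 0.1, -0.5],
])

col_min = changes.min(axis=0)
col_max = changes.max(axis=0)

# Map each column linearly so its minimum lands on -1 and its maximum on 1.
scaled = 2.0 * (changes - col_min) / (col_max - col_min) - 1.0
print(scaled.min(axis=0), scaled.max(axis=0))
```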

In [6]:
# A:

### 6. Split the data into training and testing subsets

Make sure the test set is all future data (timepoints beyond the training data).
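
A minimal sketch of a chronological split, assuming the rows are already in date order. The `data` array and the 75% `train_fraction` are illustrative choices, not values prescribed by the lab.

```python
import numpy as np

# Hypothetical scaled changes: 8 days (rows) by 3 stocks (columns).
data = np.arange(24, dtype=float).reshape(8, 3)

train_fraction = 0.75  # assumed split point, not prescribed by the lab
n_train = int(len(data) * train_fraction)

# No shuffling: the earliest rows train the model, the latest rows test it.
train = data[:n_train]
test = data[n_train:]
print(train.shape, test.shape)
```

Slicing by position rather than sampling at random is what keeps every test row later in time than every training row.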

In [7]:
# A:

### 7. Create X and Y matrices, where the predictor matrix X contains the changes lagged one day behind Y.

A cool aspect of LSTMs and neural network architectures in general is that it is trivial to fit models with multiple outputs.

Here our target will be a *matrix* of tomorrow's price changes, and the predictors will be a matrix of today's price changes. Thus we are going to optimize the neural network to minimize the error in predicting the next price change for all of the stocks at the same time!

Create the X and Y below.
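
The one-step lag can be sketched with two offset slices of the same array. The `train` values below are hypothetical; only the slicing pattern matters.

```python
import numpy as np

# Hypothetical training changes: 5 days (rows) by 2 stocks (columns).
train = np.array([
    [ 0.1, -0.2],
    [ 0.3,  0.0],
    [-0.5,  0.4],
    [ 0.2,  0.1],
    [ 0.0, -0.3],
])

# X holds day t and Y holds day t+1, so row i of X predicts row i of Y.
Xtrain = train[:-1]
Ytrain = train[1:]
print(Xtrain.shape, Ytrain.shape)
```

Both matrices lose one row relative to `train`, because the last day has no "tomorrow" and the first day has no "yesterday".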

In [8]:
# A:

### 8. Reshape the training and testing X to be 3D

Recall that the LSTM takes a tensor in format:

    [observations, timesteps, features]
    
We lagged the data by only 1 step, so each observation has just 1 timestep (we will be using a stateful LSTM for fitting, so we don't actually need more than 1 timestep).

> *Hint: the `np.reshape` function can add the 3rd dimension to your data.*
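
A small sketch of the reshape on a hypothetical 2D matrix `X2d`, adding a timestep axis of length 1 between the observation and feature axes:

```python
import numpy as np

# Hypothetical 2D predictor matrix: 4 observations by 3 features.
X2d = np.arange(12, dtype=float).reshape(4, 3)

# Insert a length-1 timestep axis: (observations, 1, features).
X3d = np.reshape(X2d, (X2d.shape[0], 1, X2d.shape[1]))
print(X3d.shape)  # (4, 1, 3)
```

The values themselves are unchanged; `X3d[:, 0, :]` recovers the original matrix.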

In [9]:
# A:

### 9. Build a Keras LSTM model

Below is some code that will build an LSTM model for this forecasting problem:

```python
model = Sequential()
# Remember: a stateful LSTM needs "batch_input_shape", which includes the batch size
model.add(LSTM(32, batch_input_shape=(1, 1, Xtrain.shape[2]), stateful=True))
model.add(Dense(Xtrain.shape[2]))
model.compile(loss='mean_squared_error', optimizer='adam')
model.summary()
```

Some things to note:
- The `stateful=True` indicates that our LSTM layer is stateful and will need to be manually reset. This allows us to feed in all the observations sequentially and allow the LSTM to (possibly) find signal in temporal patterns.
- We have 32 LSTM neurons. The `batch_input_shape=(1, 1, Xtrain.shape[2])` specifies the expected input to this layer. There will be 1 observation at a time, with 1 timestep, and as many features as there are in Xtrain.
- The `model.add(Dense(Xtrain.shape[2]))` adds the final output layer with as many output neurons as there are features. When we fit the model we will be predicting the matrix Y that has the same number of columns. This is the portion that allows us to fit on multiple outputs.
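
As a sanity check on what `model.summary()` reports, the parameter counts can be worked out by hand. The sketch below assumes the standard LSTM layout (four weight blocks, each with input weights, recurrent weights, and a bias) and a hypothetical selection of 5 stocks.

```python
n_features = 5   # hypothetical number of selected stocks
units = 32       # LSTM units, as in the model above

# Each of the 4 weight blocks has input weights, recurrent weights, and a bias.
lstm_params = 4 * (units * (n_features + units) + units)

# Dense output layer: one weight per (unit, output) pair plus one bias per output.
dense_params = units * n_features + n_features

print(lstm_params, dense_params, lstm_params + dense_params)
```

If the numbers in your own summary differ, the most likely cause is a different number of feature columns in `Xtrain`.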

In [10]:
# A:

### 10. Fit the stateful LSTM model

Below is code to help you fit the model outlined above:

```python
for i in range(100):
    if (i % 5) == 0:
        print('ITER:', i)
        model.fit(Xtrain, Ytrain, epochs=1, batch_size=1, verbose=1, shuffle=False)
    else:
        model.fit(Xtrain, Ytrain, epochs=1, batch_size=1, verbose=0, shuffle=False)
    model.reset_states()
```

Here we will iterate over the entire training data 100 times. Each time we will fit the model without shuffling the data, feeding in single observations sequentially (batch_size=1).

The if-else statement inside allows us to print out every 5 iterations. Notice the `model.reset_states()`. This is where we manually clear the internal state of the LSTM neurons after each pass through the data.

In [11]:
# A:

### 11. Plot the forecast on the testing data

Plot out the testing data and the forecast from the LSTM prediction. You will need to predict on the X test data.

Remember that the LSTM is now making a prediction for each of the stocks, so the prediction will be a matrix. To plot an individual stock, you will need to pull out the column corresponding to that stock.
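
Pulling out one stock's column and scoring it might look like the sketch below. The `Ytest` and `test_hat` arrays are hypothetical, and R² is computed by hand with NumPy here only to keep the example dependency-free; the template code uses `r2_score` from scikit-learn for the same purpose.

```python
import numpy as np

# Hypothetical targets and predictions: 4 test days (rows) by 2 stocks (columns).
Ytest = np.array([[0.1, -0.2], [0.3, 0.0], [-0.5, 0.4], [0.2, 0.1]])
test_hat = np.array([[0.0, -0.1], [0.2, 0.1], [-0.4, 0.3], [0.1, 0.1]])

stock_idx = 0                     # column of the stock to inspect
y_true = Ytest[:, stock_idx]      # actual changes for that stock
y_pred = test_hat[:, stock_idx]   # forecast changes for that stock

# R^2 = 1 - SS_res / SS_tot for the selected column.
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
print(r2)
```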

> **Note:** as this is a fairly challenging plotting problem, I have provided template plotting code below. It is similar to the plotting code outlined in the LSTM forecasting lecture.

In [12]:
# from sklearn.metrics import r2_score

# def plot_all(model, Xtrain, Xtest, Ytest, bdfn, bank_names):
#     train_hat = model.predict(Xtrain, batch_size=1)
#     model.reset_states()
#     test_hat = model.predict(Xtest, batch_size=1)
#     model.reset_states()
    
#     train_plot = np.empty_like(bdfn.values)
#     train_plot[:, :] = np.nan
#     train_plot[1:Xtrain.shape[0]+1, :] = train_hat
    
#     test_plot = np.empty_like(bdfn.values)
#     test_plot[:, :] = np.nan
#     test_plot[-Xtest.shape[0]:, :] = test_hat
    
#     fig, axarr = plt.subplots(Ytest.shape[1], 1, figsize=(12,24))
    
#     for i in range(Ytest.shape[1]):
#         axarr[i].plot(bdfn.iloc[:,i], color='grey', alpha=0.7)
#         axarr[i].plot(train_plot[:,i])
#         axarr[i].plot(test_plot[:,i])
#         test_r2 = r2_score(Ytest[:,i], test_hat[:,i])
#         axarr[i].set_title(bank_names[bdfn.columns[i]]+' test R2: '+str(test_r2))
        
#     plt.show()

In [13]:
# A:

### 12. [Bonus] Build another LSTM model of your choice.

There are many ways you could change things up. Consider:
1. Changing the stocks in the predictor matrix or target.
2. Adding layers to the neural network architecture (check out the Keras documentation or examples!)

In [14]:
# A: