## Lab 11: Recurrent and LSTM Neural Networks

#### CSC 215 Artificial Intelligence (Spring 2019)

#### Dr. Haiquan Chen, California State University, Sacramento

# Helpful Functions for TensorFlow (Little Gems)

The following functions will be used with TensorFlow to help preprocess the data.  They allow you to build the feature vector for a neural network. 

* Predictors/Inputs 
    * Fill any missing inputs with the median for that column.  Use **missing_median**.
    * Encode textual/categorical values with **encode_text_dummy**.
    * Encode numeric values with **encode_numeric_zscore**.
* Output
    * Discard rows with missing outputs.
    * Encode textual/categorical values with **encode_text_index**.
    * Do not encode output numeric values.
* Convert dataframe to numpy array to create feature vectors (x) and expected output (y) with **to_xy**.
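The z-score and dummy encodings listed above can be sketched with plain pandas on a small, made-up DataFrame. This is a standalone illustration of what `encode_numeric_zscore` and `encode_text_dummy` do to a column, not a call to those helpers.

```python
import pandas as pd

# Hypothetical toy data: one categorical column and one numeric column
df = pd.DataFrame({'color': ['red', 'green', 'blue', 'green'],
                   'size': [1.0, 2.0, 3.0, 4.0]})

# Same transformation as encode_numeric_zscore: (x - mean) / sd.
# pandas' .std() is the sample standard deviation (ddof=1).
mean, sd = df['size'].mean(), df['size'].std()
df['size'] = (df['size'] - mean) / sd

# Same idea as encode_text_dummy: one 0/1 column per category,
# named "color-<value>", then the original text column is dropped
dummies = pd.get_dummies(df['color'], prefix='color', prefix_sep='-')
df = pd.concat([df.drop(columns='color'), dummies], axis=1)

print(df.columns.tolist())  # ['size', 'color-blue', 'color-green', 'color-red']
print(df['size'].round(4).tolist())
```

Recent pandas versions (2.0 and later) return boolean dummy columns instead of 0/1 integers; since `to_xy` casts everything to `float32`, either form ends up as 0.0/1.0.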

In [1]:
import collections.abc
from sklearn import preprocessing
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import shutil
import os


# Encode text values to dummy variables(i.e. [1,0,0],[0,1,0],[0,0,1] for red,green,blue)
def encode_text_dummy(df, name):
    dummies = pd.get_dummies(df[name])
    for x in dummies.columns:
        dummy_name = "{}-{}".format(name, x)
        df[dummy_name] = dummies[x]
    df.drop(name, axis=1, inplace=True)


# Encode text values to indexes(i.e. [1],[2],[3] for red,green,blue).
def encode_text_index(df, name):
    le = preprocessing.LabelEncoder()
    df[name] = le.fit_transform(df[name])
    return le.classes_


# Encode a numeric column as zscores
def encode_numeric_zscore(df, name, mean=None, sd=None):
    if mean is None:
        mean = df[name].mean()

    if sd is None:
        sd = df[name].std()

    df[name] = (df[name] - mean) / sd


# Convert all missing values in the specified column to the median
def missing_median(df, name):
    med = df[name].median()
    df[name] = df[name].fillna(med)


# Convert all missing values in the specified column to the default
def missing_default(df, name, default_value):
    df[name] = df[name].fillna(default_value)


# Convert a Pandas dataframe to the x,y inputs that TensorFlow needs
def to_xy(df, target):
    result = []
    for x in df.columns:
        if x != target:
            result.append(x)
    # find out the type of the target column. 
    target_type = df[target].dtypes
    target_type = target_type[0] if isinstance(target_type, collections.abc.Sequence) else target_type
    # Encode to int for classification, float otherwise. TensorFlow likes 32 bits.
    if target_type in (np.int64, np.int32):
        # Classification
        dummies = pd.get_dummies(df[target])
        return df[result].values.astype(np.float32), dummies.values.astype(np.float32)
    else:
        # Regression
        return df[result].values.astype(np.float32), df[target].values.astype(np.float32)

# Nicely formatted time string
def hms_string(sec_elapsed):
    h = int(sec_elapsed / (60 * 60))
    m = int((sec_elapsed % (60 * 60)) / 60)
    s = sec_elapsed % 60
    return "{}:{:>02}:{:>05.2f}".format(h, m, s)


# Regression chart.
def chart_regression(pred, y, sort=True):
    # model.predict returns shape (n, 1); flatten both so each DataFrame column is 1D
    t = pd.DataFrame({'pred': np.asarray(pred).flatten(), 'y': np.asarray(y).flatten()})
    if sort:
        t.sort_values(by=['y'], inplace=True)
    plt.plot(t['y'].tolist(), label='expected')
    plt.plot(t['pred'].tolist(), label='prediction')
    plt.ylabel('output')
    plt.legend()
    plt.show()

# Remove all rows where the specified column is at least sd standard deviations away from the mean
def remove_outliers(df, name, sd):
    drop_rows = df.index[(np.abs(df[name] - df[name].mean()) >= (sd * df[name].std()))]
    df.drop(drop_rows, axis=0, inplace=True)


# Encode a column to a range between normalized_low and normalized_high.
def encode_numeric_range(df, name, normalized_low=-1, normalized_high=1,
                         data_low=None, data_high=None):
    if data_low is None:
        data_low = df[name].min()
    if data_high is None:
        data_high = df[name].max()

    df[name] = ((df[name] - data_low) / (data_high - data_low)) \
               * (normalized_high - normalized_low) + normalized_low


# Data Structure for Recurrent Neural Networks

### RNNs are good at predicting something from a sequence of vectors

For example, we might take as input a stock price and volume, to predict if we should buy (1), sell (-1), or hold (0).

In [2]:
x = [
    [32,1383],
    [41,2928],
    [39,8823],
    [20,1252],
    [15,1532]
]

y = [
    1,
    -1,
    0,
    -1,
    1
]

print(x)
print(y)

[[32, 1383], [41, 2928], [39, 8823], [20, 1252], [15, 1532]]
[1, -1, 0, -1, 1]


Put the data into a DataFrame.

In [3]:
# from IPython.display import display, HTML
import pandas as pd
import numpy as np

x = np.array(x)

df = pd.DataFrame({'price':x[:,0], 'volume':x[:,1], 'y':y})
df

|   | price | volume | y |
|---|---|---|---|
| 0 | 32 | 1383 | 1 |
| 1 | 41 | 2928 | -1 |
| 2 | 39 | 8823 | 0 |
| 3 | 20 | 1252 | -1 |
| 4 | 15 | 1532 | 1 |


Now we move to the ***sequence*** format. Because we want to predict something from a sequence, each sample becomes a sequence of time steps, which adds a dimension to the data.

### Notice that x must have 3 dimensions: (samples, time steps, features).

In [4]:
x = [
    [[32,1383],[41,2928],[39,8823],[20,1252],[15,1532]],
    [[35,8272],[32,1383],[41,2928],[39,8823],[20,1252]],
    [[37,2738],[35,8272],[32,1383],[41,2928],[39,8823]],
    [[34,2845],[37,2738],[35,8272],[32,1383],[41,2928]],
    [[32,2345],[34,2845],[37,2738],[35,8272],[32,1383]],
]

y = [
    1,
    -1,
    0,
    -1,
    1
]

print(x)
print(y)

[[[32, 1383], [41, 2928], [39, 8823], [20, 1252], [15, 1532]], [[35, 8272], [32, 1383], [41, 2928], [39, 8823], [20, 1252]], [[37, 2738], [35, 8272], [32, 1383], [41, 2928], [39, 8823]], [[34, 2845], [37, 2738], [35, 8272], [32, 1383], [41, 2928]], [[32, 2345], [34, 2845], [37, 2738], [35, 8272], [32, 1383]]]
[1, -1, 0, -1, 1]
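Converting the nested list above to a NumPy array makes the three dimensions explicit. The sketch below also checks a property of this particular data: consecutive samples overlap, each one shifting the window by one time step.

```python
import numpy as np

# Same nested list as above: 5 samples, each a sequence of 5 time steps,
# each time step holding 2 features (price, volume)
x = [
    [[32,1383],[41,2928],[39,8823],[20,1252],[15,1532]],
    [[35,8272],[32,1383],[41,2928],[39,8823],[20,1252]],
    [[37,2738],[35,8272],[32,1383],[41,2928],[39,8823]],
    [[34,2845],[37,2738],[35,8272],[32,1383],[41,2928]],
    [[32,2345],[34,2845],[37,2738],[35,8272],[32,1383]],
]
x = np.array(x, dtype=np.float32)

samples, time_steps, features = x.shape
print(x.shape)  # (5, 5, 2)

# Each sample's last 4 steps equal the previous sample's first 4 steps
print(np.array_equal(x[1:, 1:], x[:-1, :-1]))  # True
```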


Even if there is only one feature (stock price), the 3rd dimension must be used:

In [5]:
x = [
    [[32],[41],[39],[20],[15]],
    [[35],[32],[41],[39],[20]],
    [[37],[35],[32],[41],[39]],
    [[34],[37],[35],[32],[41]],
    [[32],[34],[37],[35],[32]],
]

y = [
    1,
    -1,
    0,
    -1,
    1
]

print(x)
print(y)

[[[32], [41], [39], [20], [15]], [[35], [32], [41], [39], [20]], [[37], [35], [32], [41], [39]], [[34], [37], [35], [32], [41]], [[32], [34], [37], [35], [32]]]
[1, -1, 0, -1, 1]
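If the single-feature data starts out as a 2D array of shape (samples, time steps), the feature axis can be added with NumPy instead of writing the extra brackets by hand. A minimal sketch using the prices above:

```python
import numpy as np

# The single-feature data from above as a 2D array: (samples, time steps)
prices = np.array([
    [32, 41, 39, 20, 15],
    [35, 32, 41, 39, 20],
    [37, 35, 32, 41, 39],
    [34, 37, 35, 32, 41],
    [32, 34, 37, 35, 32],
], dtype=np.float32)

# Add a trailing feature axis of size 1 -> (samples, time steps, features)
x = prices[:, :, np.newaxis]          # equivalently: prices.reshape(5, 5, 1)
print(prices.shape, '->', x.shape)    # (5, 5) -> (5, 5, 1)
```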


# Recurrent Neural Networks

So far the neural networks that we’ve examined have always had forward connections. This way of connecting layers is why ***these networks are called “feedforward.”***


In recurrent neural networks, "backward/recurrent connections" are also allowed. A "backward/recurrent connection" occurs when a neuron is connected to a neuron in the same layer or in a previous layer.


Most recurrent neural network architectures maintain "state" in the recurrent connections. ***A recurrent neural network’s state acts as a short-term memory (context) for the neural network.*** Consequently, a recurrent neural network will not always produce the same output for a given input: the output also depends on the inputs seen earlier in the sequence.
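A one-unit "vanilla" RNN is enough to see the effect of state. In this sketch the weights `w_x`, `w_h`, `b` are hypothetical fixed numbers, and the recurrence is $h_t = \tanh(w_x x_t + w_h h_{t-1} + b)$. Feeding the same input three times gives three different outputs because the state carries over.

```python
import numpy as np

# Hypothetical fixed weights for a 1-unit vanilla RNN:
#   h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)
w_x, w_h, b = 1.0, 0.5, 0.0

def step(x_t, h_prev):
    """One time step: mix the current input with the previous state."""
    return np.tanh(w_x * x_t + w_h * h_prev + b)

def run(sequence):
    """Feed a whole sequence through the cell, returning the output at each step."""
    h = 0.0                      # the state starts empty
    outputs = []
    for x_t in sequence:
        h = step(x_t, h)         # the state is carried to the next time step
        outputs.append(h)
    return np.array(outputs)

out = run([1.0, 1.0, 1.0])
# Same input at every step, yet the outputs grow as the state accumulates
print(out.round(3))
```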

# Understanding LSTM

LSTM (Long Short-Term Memory) is ***a type of recurrent unit***. In Keras (used here on top of TensorFlow), LSTM is provided as a layer type that can be combined with other layer types, such as Dense.

https://keras.io/layers/recurrent/


 ***The following diagram shows an LSTM unit over three time slices***: the current time slice (t), as well as the previous (t-1) and next (t+1) slice:

![LSTM Layers](images/lab11_lstm1.png "LSTM Layers")

The values $\hat{y}$ are the outputs of the unit, the values $x$ are its inputs, and the values $c$ are the context values. Both the output and the context values are always fed to the next time slice.
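The following NumPy sketch writes out one step of the common LSTM cell formulation (input gate $i$, forget gate $f$, output gate $o$, candidate $g$). Here `h` plays the role of $\hat{y}$. The weights are random and the gate stacking order is an assumption made for illustration; this is not Keras's internal code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    x_t:    input at time t, shape (n_in,)
    h_prev: previous output y_hat_{t-1}, shape (n_units,)
    c_prev: previous context c_{t-1}, shape (n_units,)
    W: (4*n_units, n_in), U: (4*n_units, n_units), b: (4*n_units,)
       stacked in a hypothetical order: input, forget, candidate, output
    """
    z = W @ x_t + U @ h_prev + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # gates in (0, 1)
    g = np.tanh(g)                                 # candidate context
    c_t = f * c_prev + i * g     # keep part of the old context, add new
    h_t = o * np.tanh(c_t)       # output y_hat_t, read from the context
    return h_t, c_t

# Tiny usage: 1 input feature, 2 units, random (hypothetical) weights
rng = np.random.default_rng(42)
n_in, n_units = 1, 2
W = rng.normal(size=(4 * n_units, n_in))
U = rng.normal(size=(4 * n_units, n_units))
b = np.zeros(4 * n_units)

h, c = np.zeros(n_units), np.zeros(n_units)
for x_t in [[0.0], [1.0], [1.0]]:
    # both h (y_hat) and c are handed to the next time slice
    h, c = lstm_step(np.array(x_t), h, c, W, U, b)
print(h.shape, c.shape)  # (2,) (2,)
```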



### Dropout in RNN

Keras provides parameters for using dropout in recurrent layers. In an LSTM layer you can define two types of dropout.

***Regular dropout*** is applied to the inputs and/or the outputs. It masks (or "drops") the vertical connections from x_t and to h_t in the picture below. In the Keras `LSTM` layer, the `dropout` argument covers the inputs; dropout on the outputs can be added with a separate `Dropout` layer.

***Recurrent dropout*** (the `recurrent_dropout` argument) masks (or "drops") the horizontal connections between the recurrent units in the picture below.


![LSTM Layers](images/lab11_lstm3.png "Dropout in LSTM")
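To show where the two masks act, here is a sketch using a plain tanh RNN step instead of a full LSTM (to keep it short) and inverted dropout (kept units are scaled by 1/(1-rate)). Sampling one mask per sequence and reusing it at every time step is an illustrative choice made here, not a description of Keras's internals. Dropout is only active during training; at prediction time no units are dropped.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_mask(shape, rate):
    """Inverted-dropout mask: 0 with probability `rate`, else 1/(1-rate)."""
    keep = rng.random(shape) >= rate
    return keep / (1.0 - rate)

n_in, n_units, time_steps = 3, 4, 5
rate_in, rate_rec = 0.2, 0.2   # cf. LSTM(..., dropout=0.2, recurrent_dropout=0.2)

W_x = rng.normal(size=(n_units, n_in))     # input ("vertical") weights
W_h = rng.normal(size=(n_units, n_units))  # recurrent ("horizontal") weights
xs = rng.normal(size=(time_steps, n_in))   # one hypothetical input sequence

# One mask per sequence, reused at every time step (illustrative choice)
m_in = dropout_mask(n_in, rate_in)         # regular dropout: masks x_t
m_rec = dropout_mask(n_units, rate_rec)    # recurrent dropout: masks h_{t-1}

h = np.zeros(n_units)
for x_t in xs:
    h = np.tanh(W_x @ (x_t * m_in) + W_h @ (h * m_rec))
print(h.shape)  # (4,)
```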

# LSTM Example for Classification

The following code creates and trains an LSTM network for classification. The data set (x) has 6 training samples (rows), and each sample is a sequence of 6 time steps (columns) with 1 feature per step.

In [7]:
import numpy as np
import keras

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

# Labels are 1, 2 and 3; to_categorical needs num_classes > largest label,
# so we use 4 classes (class 0 is never used)
num_classes = 4

x = [
    [[0],[1],[1],[0],[0],[0]],
    [[0],[0],[0],[2],[2],[0]],
    [[0],[0],[0],[0],[3],[3]],
    [[0],[2],[2],[0],[0],[0]],
    [[0],[0],[3],[3],[0],[0]],
    [[0],[0],[0],[0],[1],[1]]
]


# Tensorflow likes float32 and int32
x = np.array(x, dtype=np.float32)
y = np.array([1,2,3,2,3,1], dtype=np.int32)


# Convert y to dummy variables (one-hot encoding for a classification problem)

y_2 = keras.utils.to_categorical(y, num_classes)
y_2

array([[0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 1., 0., 0.]], dtype=float32)
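For integer labels, the one-hot matrix above can be reproduced in plain NumPy by indexing rows of an identity matrix. This is a quick check of what `to_categorical` produces here, not its implementation; `np.argmax` reverses it, which is how predicted classes are recovered later.

```python
import numpy as np

y = np.array([1, 2, 3, 2, 3, 1], dtype=np.int32)
num_classes = 4   # labels run 1..3, so index 0 is an unused class

# Row k of the identity matrix is the one-hot vector for label k
y_onehot = np.eye(num_classes, dtype=np.float32)[y]
print(y_onehot)

# Going back from one-hot rows to labels, as done after model.predict
print(np.argmax(y_onehot, axis=1))  # [1 2 3 2 3 1]
```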

### As with CNNs, input_shape is the shape of each sample

In [8]:
print('Build model...')
model = Sequential()

# each sequence has 6 time steps and each time step is 1-dimensional

# As with CNNs, input_shape is the shape of each sample: (time steps, features)

model.add(LSTM(128, activation='tanh', dropout=0.2, recurrent_dropout=0.2, input_shape=(6, 1)))
# softmax outputs one probability per class, matching categorical_crossentropy
model.add(Dense(4, activation='softmax'))

# try using different optimizers and different optimizer configs
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])


print('Train...')
model.fit(x, y_2,verbose=2, epochs=100)
pred = model.predict(x)

predict_classes = np.argmax(pred,axis=1)
print("Predicted classes:",predict_classes)
print("Expected classes:",y)

Build model...
Train...
Epoch 1/100
 - 1s - loss: 1.3891 - acc: 0.0000e+00
Epoch 2/100
 - 0s - loss: 1.3786 - acc: 0.6667
Epoch 3/100
 - 0s - loss: 1.3834 - acc: 0.3333
Epoch 4/100
 - 0s - loss: 1.3718 - acc: 0.3333
Epoch 5/100
 - 0s - loss: 1.3583 - acc: 0.6667
Epoch 6/100
 - 0s - loss: 1.3567 - acc: 0.5000
Epoch 7/100
 - 0s - loss: 1.3495 - acc: 0.3333
Epoch 8/100
 - 0s - loss: 1.3566 - acc: 0.3333
Epoch 9/100
 - 0s - loss: 1.3670 - acc: 0.1667
Epoch 10/100
 - 0s - loss: 1.3438 - acc: 0.3333
Epoch 11/100
 - 0s - loss: 1.3195 - acc: 0.3333
Epoch 12/100
 - 0s - loss: 1.3348 - acc: 0.3333
Epoch 13/100
 - 0s - loss: 1.3151 - acc: 0.5000
Epoch 14/100
 - 0s - loss: 1.3169 - acc: 0.1667
Epoch 15/100
 - 0s - loss: 1.2788 - acc: 0.3333
Epoch 16/100
 - 0s - loss: 1.2660 - acc: 0.3333
Epoch 17/100
 - 0s - loss: 1.2616 - acc: 0.3333
Epoch 18/100
 - 0s - loss: 1.2516 - acc: 0.3333
Epoch 19/100
 - 0s - loss: 1.2761 - acc: 0.6667
Epoch 20/100
 - 0s - loss: 1.2521 - acc: 0.5000
Epoch 21/100
 - 0s - 
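As a sanity check on model size, the number of trainable weights can be worked out by hand. A Keras LSTM layer has four gate blocks, each with input weights, recurrent weights and a bias, giving $4 \cdot (\text{units} \cdot (\text{input\_dim} + \text{units}) + \text{units})$. `model.summary()` should report the same numbers for the model above.

```python
def lstm_params(units, input_dim):
    """Trainable weights in one LSTM layer: 4 gate blocks, each with
    input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    """Weights plus one bias per output unit."""
    return units * input_dim + units

lstm = lstm_params(units=128, input_dim=1)    # LSTM(128, input_shape=(6, 1))
dense = dense_params(units=4, input_dim=128)  # Dense(4)
print(lstm, dense, lstm + dense)              # 66560 516 67076
```

The sequence length (6) does not appear in the count: the same weights are reused at every time step.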

Let's predict an ad hoc sequence using the trained model.

For example, `[[0],[0],[0],[0],[0],[1]]`:

In [9]:
x = np.array([[0],[0],[0],[0],[0],[1]])


In [10]:
x = np.array(x, dtype=np.float32)  

pred = model.predict(x)

print(x)

print("Prediction:", np.argmax(pred[0]))

ValueError: Error when checking input: expected lstm_1_input to have 3 dimensions, but got array with shape (6, 1)

### Why?  

### Remember: x must be a 3D array (samples, time steps, features), even for a single sample!

In [11]:
x = np.array([[[0],[0],[0],[0],[0],[1]]])

x = np.array(x, dtype=np.float32)  

pred = model.predict(x)

print(x)

print("Prediction:", np.argmax(pred[0]))

[[[0.]
  [0.]
  [0.]
  [0.]
  [0.]
  [1.]]]
Prediction: 1
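Instead of typing the extra pair of brackets, the batch axis can be added programmatically. A minimal NumPy sketch; the resulting `batch` array has the shape `model.predict` expects (the prediction itself is not run here):

```python
import numpy as np

seq = np.array([[0], [0], [0], [0], [0], [1]], dtype=np.float32)  # (6, 1): one sequence
batch = np.expand_dims(seq, axis=0)                                 # (1, 6, 1): a batch of one
print(seq.shape, '->', batch.shape)  # (6, 1) -> (1, 6, 1)

# seq[np.newaxis] or seq.reshape(1, 6, 1) give the same result
assert np.array_equal(batch, seq[np.newaxis])
```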


# LSTM Example for Regression

An example of RNN regression to predict sunspots.  The data files needed for this example can be found at the following location.

* [Sunspot Data Files](http://www.sidc.be/silso/datafiles#total)

http://www.sidc.be/silso/infosndtot

The following code is used to load the sunspot file:

In [12]:
import pandas as pd
import os

path = "./data/"
    
filename = os.path.join(path,"SN_d_tot_V2.0.csv")   
names = ['year', 'month', 'day', 'dec_year', 'sn_value' , 'sn_error', 'obs_num']
df = pd.read_csv(filename, sep=';', header=None, names=names, index_col=False)

# index_col=False forces pandas not to use the first column as the index

df[0:10]

# -1 means NA

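|   | year | month | day | dec_year | sn_value | sn_error | obs_num |
|---|---|---|---|---|---|---|---|
| 0 | 1818 | 1 | 1 | 1818.001 | -1 | -1.0 | 0 |
| 1 | 1818 | 1 | 2 | 1818.004 | -1 | -1.0 | 0 |
| 2 | 1818 | 1 | 3 | 1818.007 | -1 | -1.0 | 0 |
| 3 | 1818 | 1 | 4 | 1818.01 | -1 | -1.0 | 0 |
| 4 | 1818 | 1 | 5 | 1818.012 | -1 | -1.0 | 0 |
| 5 | 1818 | 1 | 6 | 1818.015 | -1 | -1.0 | 0 |
| 6 | 1818 | 1 | 7 | 1818.018 | -1 | -1.0 | 0 |
| 7 | 1818 | 1 | 8 | 1818.021 | 65 | 10.2 | 1 |
| 8 | 1818 | 1 | 9 | 1818.023 | -1 | -1.0 | 0 |
| 9 | 1818 | 1 | 10 | 1818.026 | -1 | -1.0 | 0 |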


In [13]:
print("Ending file:")
df[-10:]

Ending file:


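|   | year | month | day | dec_year | sn_value | sn_error | obs_num |
|---|---|---|---|---|---|---|---|
| 73159 | 2018 | 4 | 21 | 2018.303 | 28 | 1.4 | 9 |
| 73160 | 2018 | 4 | 22 | 2018.305 | 22 | 1.9 | 30 |
| 73161 | 2018 | 4 | 23 | 2018.308 | 23 | 1.3 | 27 |
| 73162 | 2018 | 4 | 24 | 2018.311 | 22 | 1.1 | 16 |
| 73163 | 2018 | 4 | 25 | 2018.314 | 18 | 1.7 | 37 |
| 73164 | 2018 | 4 | 26 | 2018.316 | 14 | 1.2 | 38 |
| 73165 | 2018 | 4 | 27 | 2018.319 | 14 | 2.9 | 25 |
| 73166 | 2018 | 4 | 28 | 2018.322 | 0 | 0.0 | 32 |
| 73167 | 2018 | 4 | 29 | 2018.325 | 0 | 0.0 | 23 |
| 73168 | 2018 | 4 | 30 | 2018.327 | 0 | 0.0 | 31 |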


In [14]:
# Missing values are marked by -1 in sn_value; also drop days with no observations (obs_num == 0)

df = df[(df['sn_value'] != -1) & (df['obs_num'] != 0)]

In [15]:
df.shape

(69922, 7)

In [16]:
df[0:12]

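|   | year | month | day | dec_year | sn_value | sn_error | obs_num |
|---|---|---|---|---|---|---|---|
| 7 | 1818 | 1 | 8 | 1818.021 | 65 | 10.2 | 1 |
| 12 | 1818 | 1 | 13 | 1818.034 | 37 | 7.7 | 1 |
| 16 | 1818 | 1 | 17 | 1818.045 | 77 | 11.1 | 1 |
| 17 | 1818 | 1 | 18 | 1818.048 | 98 | 12.6 | 1 |
| 18 | 1818 | 1 | 19 | 1818.051 | 105 | 13.0 | 1 |
| 24 | 1818 | 1 | 25 | 1818.067 | 25 | 6.3 | 1 |
| 27 | 1818 | 1 | 28 | 1818.075 | 38 | 7.8 | 1 |
| 28 | 1818 | 1 | 29 | 1818.078 | 20 | 5.7 | 1 |
| 33 | 1818 | 2 | 3 | 1818.092 | 17 | 5.2 | 1 |
| 35 | 1818 | 2 | 5 | 1818.097 | 20 | 5.7 | 1 |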


### Now, we want to predict a sunspot number (SN) value from the $N$ preceding values.

In [17]:
df_train = df[df['year']<2000]
df_test = df[df['year']>=2000]

spots_train = df_train['sn_value'].tolist()
spots_test = df_test['sn_value'].tolist()

print("Training set has {} records.".format(len(spots_train)))
print("Test set has {} records.".format(len(spots_test)))

Training set has 63227 records.
Test set has 6695 records.


### Sequentialize to create x and y in the format RNN likes.

In [18]:
import numpy as np

def to_sequences(seq_size, data):
    x = []
    y = []

    # Slide a window of seq_size values over the data; the value right after
    # each window is the target. (The extra -1 skips the last possible window.)
    for i in range(len(data) - seq_size - 1):
        window = data[i:(i + seq_size)]
        after_window = data[i + seq_size]
        window = [[v] for v in window]  # add the feature dimension (1 feature)
        x.append(window)
        y.append(after_window)

    return np.array(x), np.array(y)

In [19]:
SEQUENCE_SIZE = 10
x_train,y_train = to_sequences(SEQUENCE_SIZE,spots_train)
x_test,y_test = to_sequences(SEQUENCE_SIZE,spots_test)

print("Shape of x_train: {}".format(x_train.shape))
print("Shape of x_test: {}".format(x_test.shape))
print("Shape of y_train: {}".format(y_train.shape))
print("Shape of y_test: {}".format(y_test.shape))

Shape of x_train: (63216, 10, 1)
Shape of x_test: (6684, 10, 1)
Shape of y_train: (63216,)
Shape of y_test: (6684,)
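The shapes follow from the loop bound in `to_sequences`: it yields `len(data) - seq_size - 1` windows, i.e. 63227 - 10 - 1 = 63216 for training and 6695 - 10 - 1 = 6684 for testing. The same windows can be built without a Python loop using NumPy's `sliding_window_view` (available in NumPy 1.20 and later). `to_sequences_np` is a hypothetical helper written for this sketch, checked here on a tiny list rather than the sunspot data.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def to_sequences_np(seq_size, data):
    """Vectorized sketch equivalent to to_sequences above (including its -1)."""
    data = np.asarray(data)
    n = len(data) - seq_size - 1                 # number of windows kept
    x = sliding_window_view(data, seq_size)[:n]  # x[i] = data[i:i+seq_size]
    y = data[seq_size:seq_size + n]              # y[i] = data[i+seq_size]
    return x[:, :, np.newaxis], y                # add the feature axis

x, y = to_sequences_np(3, [10, 20, 30, 40, 50, 60])
print(x[:, :, 0])  # windows: [10 20 30] and [20 30 40]
print(y)           # [40 50]
```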


In [20]:
x_train[0:5]

array([[[ 65],
        [ 37],
        [ 77],
        [ 98],
        [105],
        [ 25],
        [ 38],
        [ 20],
        [ 17],
        [ 20]],

       [[ 37],
        [ 77],
        [ 98],
        [105],
        [ 25],
        [ 38],
        [ 20],
        [ 17],
        [ 20],
        [ 25]],

       [[ 77],
        [ 98],
        [105],
        [ 25],
        [ 38],
        [ 20],
        [ 17],
        [ 20],
        [ 25],
        [ 87]],

       [[ 98],
        [105],
        [ 25],
        [ 38],
        [ 20],
        [ 17],
        [ 20],
        [ 25],
        [ 87],
        [192]],

       [[105],
        [ 25],
        [ 38],
        [ 20],
        [ 17],
        [ 20],
        [ 25],
        [ 87],
        [192],
        [ 73]]])

In [21]:
y_train[0:5]

array([ 25,  87, 192,  73,  82])

### Ready to train an RNN model

In [22]:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.callbacks import EarlyStopping
import numpy as np

print('Build model...')
model = Sequential()

model.add(LSTM(64, dropout=0.1, recurrent_dropout=0.1, input_shape=(SEQUENCE_SIZE, 1)))
model.add(Dense(32))
model.add(Dense(1))

model.compile(loss='mean_squared_error', optimizer='adam')

monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=5, verbose=1, mode='auto')
print('Train...')

model.fit(x_train,y_train,validation_data=(x_test,y_test),callbacks=[monitor],verbose=2, epochs=10)  

Build model...
Train...
Train on 63216 samples, validate on 6684 samples
Epoch 1/10
 - 10s - loss: 1405.1794 - val_loss: 335.3967
Epoch 2/10
 - 9s - loss: 625.6281 - val_loss: 281.1883
Epoch 3/10
 - 9s - loss: 621.4315 - val_loss: 249.5758
Epoch 4/10
 - 9s - loss: 610.0931 - val_loss: 362.8855
Epoch 5/10
 - 9s - loss: 603.6360 - val_loss: 278.9313
Epoch 6/10
 - 9s - loss: 604.6408 - val_loss: 323.5103
Epoch 7/10
 - 9s - loss: 604.1804 - val_loss: 257.7712
Epoch 8/10
 - 9s - loss: 608.8451 - val_loss: 350.9319
Epoch 00008: early stopping


<keras.callbacks.History at 0x1f1e8d0b748>

In [23]:
from sklearn import metrics

pred = model.predict(x_test)
score = np.sqrt(metrics.mean_squared_error(y_test, pred))
print("Score (RMSE): {}".format(score))

Score (RMSE): 18.733176621473262
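RMSE is $\sqrt{\frac{1}{n}\sum_i (\hat{y}_i - y_i)^2}$. Computing it by hand in NumPy needs one extra step, because `model.predict` returns shape (n, 1) while `y_test` has shape (n,). The values below are made up for illustration.

```python
import numpy as np

# Hypothetical values standing in for y_test and model.predict(x_test)
y_true = np.array([10.0, 20.0, 30.0])        # shape (n,)
y_pred = np.array([[12.0], [18.0], [33.0]])  # shape (n, 1), like Keras output

rmse = np.sqrt(np.mean((y_pred.ravel() - y_true) ** 2))
print(rmse)  # sqrt((4 + 4 + 9) / 3)

# Without ravel(), (n, 1) - (n,) broadcasts to an (n, n) matrix: a silent bug
print((y_pred - y_true).shape)  # (3, 3)
```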


## Advanced topics when using LSTM

https://keras.io/layers/recurrent/


### Accessing the hidden state output $\hat{y}$ for each time slice


It is possible to access the hidden state output $\hat{y}$ (the cell output) for each time slice, which can be useful when developing sophisticated recurrent neural network architectures, such as the encoder-decoder model. This can be done by setting the ***return_sequences parameter to True*** when defining the LSTM layer. The layer then outputs a 3D array of shape (samples, time steps, units) instead of only the output of the last time slice, of shape (samples, units).

***You must set return_sequences=True when stacking multiple LSTM layers.***



model = Sequential()

model.add(LSTM(..., return_sequences=True, input_shape=(...)))

model.add(LSTM(..., return_sequences=True))

model.add(LSTM(..., return_sequences=True))

model.add(LSTM(...))

model.add(Dense(...))
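A toy SimpleRNN-style layer written in NumPy (hypothetical, not Keras code) shows why stacking needs `return_sequences=True`. A recurrent layer consumes a 3D (batch, time steps, features) input. Without `return_sequences`, the previous layer hands over only a 2D (batch, units) array.

```python
import numpy as np

def simple_rnn(xs, W_x, W_h, return_sequences=False):
    """Toy RNN layer over a batch. xs: (batch, time_steps, features)."""
    batch, time_steps, _ = xs.shape
    h = np.zeros((batch, W_h.shape[0]))
    outputs = []
    for t in range(time_steps):
        h = np.tanh(xs[:, t, :] @ W_x.T + h @ W_h.T)
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)  # (batch, time_steps, units)
    return h                              # (batch, units): last time step only

rng = np.random.default_rng(0)
xs = rng.normal(size=(2, 6, 1))           # 2 sequences, 6 steps, 1 feature
W1x, W1h = rng.normal(size=(4, 1)), rng.normal(size=(4, 4))
W2x, W2h = rng.normal(size=(3, 4)), rng.normal(size=(3, 3))

seq = simple_rnn(xs, W1x, W1h, return_sequences=True)
print(seq.shape)   # (2, 6, 4): still a sequence, so it can feed another RNN
last = simple_rnn(seq, W2x, W2h)
print(last.shape)  # (2, 3)
```

The last time slice of `seq` is exactly what the same layer returns with `return_sequences=False`.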

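### Accessing the context values (internal state) $c$ of the last time slice

***The return_state argument provides access to the final context value (internal state) $c$, together with the final hidden state output $\hat{y}$. Only the values from the last time slice are returned, not one per time slice.***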


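For example, we can access both the sequence of hidden state outputs and the internal states at the same time. With both flags set, the layer returns three values: the sequence of hidden state outputs, the last hidden state output, and the last context value.

This can be done as follows: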

LSTM(..., return_sequences=True, return_state=True)
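With the Keras functional API, the three returned values are typically unpacked as `outputs, state_h, state_c = LSTM(units, return_sequences=True, return_state=True)(inputs)`. The NumPy sketch below mimics those three values with a toy LSTM (random weights, hypothetical gate order). It shows that `state_h` is simply the last entry of the output sequence, while the context $c$ is only available for the final step.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_layer(xs, W, U, b):
    """Toy LSTM over xs of shape (time_steps, features).

    Mimics return_sequences=True, return_state=True by returning
    (all outputs y_hat, last y_hat, last context c).
    """
    units = U.shape[1]
    h, c = np.zeros(units), np.zeros(units)
    outputs = []
    for x_t in xs:
        i, f, g, o = np.split(W @ x_t + U @ h + b, 4)  # hypothetical gate order
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs), h, c

rng = np.random.default_rng(1)
units, features, steps = 3, 1, 6
W = rng.normal(size=(4 * units, features))
U = rng.normal(size=(4 * units, units))
b = np.zeros(4 * units)

seq, state_h, state_c = lstm_layer(rng.normal(size=(steps, features)), W, U, b)
print(seq.shape, state_h.shape, state_c.shape)  # (6, 3) (3,) (3,)
print(np.allclose(seq[-1], state_h))            # True: state_h is the last output
```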

### References:

* [Google Colab](https://colab.research.google.com/) - Free web based platform that includes Python, Jupyter Notebooks, and TensorFlow with free GPU support.  No setup needed.
* [IBM Cognitive Class Labs](https://www.datascientistworkbench.com) - Free web based platform that includes Python, Jupyter Notebooks, and TensorFlow.  No setup needed.
* [Python Anaconda](https://www.continuum.io/downloads) - Python distribution that includes many data science packages, such as Numpy, Scipy, Scikit-Learn, Pandas, and much more.
* [TensorFlow](https://www.tensorflow.org/) - Google's mathematics package for deep learning.
* [Kaggle](https://www.kaggle.com/) - Competitive data science.  Good source of sample data.
* T81-558: Applications of Deep Neural Networks. Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/)