# simple LSTM

so basic it runs on pumpkin spice

## generating data

### echo sequence prediction problem

our echo sequence prediction problem needs data: specifically vectors of random sequences. let's use integers, and define our problem space as integers between 0 and 99.

we'll use the ```randint()``` function from the python 3 ```random``` [module](https://docs.python.org/3/library/random.html "python 3 random module docs") to generate random integers within the range we specify (in this case, 0 to 99). 

we can use the ```randint()``` function within a function of our own to generate sequences of random integers--this will be the data for our problem.

In [36]:
# randint() is inside the python random module

import random

In [37]:
# use randint() to generate a random integer between 0 and 99

rand_int = random.randint(0, 99)

rand_int

45

we need a _lot_ more than one of these. which means it's time to build a function to automate this for us:

In [38]:
def make_seq(seq_length, n_features):
    
    '''
    generate sequences of a given length
    and given number of features
    '''
    return [random.randint(0, n_features - 1) for _ in range(seq_length)]

__demo:__ let's make a sequence with 10 values and 50 features

In [39]:
make_seq(10, 50)

[19, 25, 12, 40, 11, 36, 25, 42, 6, 25]

### one hot encoding

before we can train the model, we have to encode the data into a format that an LSTM can use. the way we encode data matters; choices made here can significantly affect model performance.

to frame this data properly, let's revisit the original problem:

we're trying to predict a number. a _specific_ number.

if we wanted to _approximate_ the number, we could frame this as a __regression__ problem, and train our model to output a close (but not exact) approximation of the number.

but because we want the _exact_ integer (and _not_ an approximation, which is what a regression model outputs) we need to frame this problem as a __classification__ model.

__classification__ means handling categorical data, which machines can do handily using __one hot encoding__.

### automatic vs manual one hot encoding

```scikit-learn``` has a super neat ```OneHotEncoder()``` [transformer](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html "sklearn OneHotEncoder doc") that can automate one hot encoding, but because it fits the data, it can only encode the values that it sees represented. 

we need all possible values--from 0 to 99--represented. but because we're generating our integer sequences pseudo-randomly using ```np.random.randint()```, we can't guarantee that all values will be represented.

it's possible to feed in the categories to ```OneHotEncoder()``` manually. here, however, we're going to simply make our own transformer.

we'll convert the results to a ```numpy``` ```array``` in order to make them easier to decode later.

### decoding

later on we'll need a way to interpret the model's results. to do so we'll need to decode the one hot scheme.

we can easily do this using the ```numpy``` ```argmax()``` function.

```numpy.argmax()``` returns the indices for the maximum values along a vector. because each vector in the binary one hot encoding will be a lot of zeroes with a single high value--a ```1```--we can easily use ```argmax()``` to grab the index of the ```1``` value and return it. that's our output.

In [40]:
from numpy import array
from numpy import argmax

# encoder function

def one_hot_encoder(seq, n_features):
    
    '''
    creates a vector of binary values for each
    possible feature in the dataset.
    '''
    
    encoding = list()
    
    for val in seq:
        
        vector = [0 for _ in range(n_features)]
        vector[val] = 1
        encoding.append(vector)
        
    return array(encoding)  

# decoder function

def one_hot_decoder(seq_encoded):
    '''
    decodes results by returning the index of
    the point in the vector with the largest value,
    i.e. 1 
    '''
    
    return [argmax(vector) for vector in seq_encoded]

In [41]:
seq = make_seq(50, 100)

seq_encoded = one_hot_encoder(seq, 100)

seq_decoded = one_hot_decoder(seq_encoded)

print(seq, '\n')
print(seq_encoded, '\n')
print(seq_decoded)

[37, 0, 73, 47, 37, 53, 96, 51, 80, 54, 97, 91, 11, 27, 22, 18, 96, 50, 12, 7, 10, 18, 58, 1, 88, 16, 0, 52, 32, 46, 90, 39, 59, 96, 13, 50, 30, 30, 84, 7, 49, 88, 46, 90, 5, 9, 47, 31, 93, 78] 

[[0 0 0 ..., 0 0 0]
 [1 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]
 ..., 
 [0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]] 

[37, 0, 73, 47, 37, 53, 96, 51, 80, 54, 97, 91, 11, 27, 22, 18, 96, 50, 12, 7, 10, 18, 58, 1, 88, 16, 0, 52, 32, 46, 90, 39, 59, 96, 13, 50, 30, 30, 84, 7, 49, 88, 46, 90, 5, 9, 47, 31, 93, 78]


### reshape to 3d matrix

LSTMs require input in the form of a 3d matrix.

the three dimensions LSTMs need, in order, are: __samples, time steps, & features__.

the sequence we generated above, ```seq```, is 

* one __sample__,
* fifty __time steps__, 
* one hundred __features__.

for ```seq```, the specific sequence we just generated, it's easy to set the shape to three dimensions using the ```reshape()``` function:

In [42]:
X = seq_encoded.reshape(1, 50, 100)

print(X)
print(X.shape)

[[[0 0 0 ..., 0 0 0]
  [1 0 0 ..., 0 0 0]
  [0 0 0 ..., 0 0 0]
  ..., 
  [0 0 0 ..., 0 0 0]
  [0 0 0 ..., 0 0 0]
  [0 0 0 ..., 0 0 0]]]
(1, 50, 100)


a more generalizable version might look like this:

```X = seq_encoded.reshape(n_samples, length, n_features)```

### generating samples

following each of the steps above, in order, will generate 1 sample for our LSTM model.

it makes sense to automate these tasks:

In [43]:
def make_sample(length, n_features, output_index):
    '''
    creates a single sample that is LSTM-ready.
    '''
    #create sequence of pseudo-random integers
    seq = make_seq(length, n_features)
    
    # one hot encoding
    seq_encoded = one_hot_encoder(seq, n_features)
    
    # reshape to 3d matrix suitable for LSTM
    X = seq_encoded.reshape(1, length, n_features)
    
    # get the output
    y = seq_encoded[output_index].reshape(1, n_features)
    
    return X, y
    

let's test ```make_sample``` to make sure it works:

In [44]:
X, y = make_sample(50, 100, 17)

print(X, '\n')
print(X.shape, '\n')
print(y, '\n')
print(y.shape)

[[[0 0 0 ..., 0 0 0]
  [0 0 0 ..., 0 0 0]
  [0 0 0 ..., 0 0 0]
  ..., 
  [0 0 0 ..., 0 0 0]
  [0 0 0 ..., 0 0 0]
  [0 0 0 ..., 0 0 0]]] 

(1, 50, 100) 

[[0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]] 

(1, 100)


## building the model

we'll build the model in 3 steps:

1) __define__ and __compile__ the model

2) __fit__ the model

3) __evaluate__ the model

define the model using the ```keras``` API:

In [45]:
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

In [46]:
# set the feature number and length in advance for cleaner code

length = 10

n_features = 25

output_index = 7

### create & define the model

this simple model will consist of two layers: an __LSTM__ layer with 25 memory units, and a fully connected __dense__ layer with one neuron per feature.

In [47]:
# initialize a Sequential() model

lstm_model = Sequential()

# add LSTM layer
# 25 memory units

lstm_model.add(LSTM(25, input_shape=(length, n_features)))

# add dense layer
# fully connected = 1 neuron for each feature
# softmax for classification output

lstm_model.add(Dense(n_features, activation='softmax'))


### compile the model

this is where we set parameters like the model's _loss function_, _optimizer_, and the specific performance-related information we want the machine to output as it trains.

this example uses the [log loss function](http://wiki.fast.ai/index.php/Log_Loss "log loss function"), specified under the ```loss``` parameter as ```categorical_crossentropy```.

this model uses the __adam__ optimizer, and outputs accuracy measurements ```acc``` each epoch.

In [48]:
# compile the model

lstm_model.compile(loss='categorical_crossentropy', optimizer='adam', 
                   metrics=['acc'])

### fit the model

using the ```make_sample``` function we created above we could:

1) make a large number of samples

2) put them all together

3) feed them into the model

__however__, let's not do that here. the purpose of this notebook is to explore basic configuration of LSTMs using the ```keras``` api. 

for this model, we'll set one epoch to one sample, and clear the internal state in between epochs. this means our batch size = 1.

this means that the model will train and test on one sample at a time.

setting the ```verbose``` parameter equal to 2 will output training times, loss, and accuracy (either 1 or 0, for 0-100%) for each epoch. 

accuracy will be either 1 or 0 because we're training and testing on one sample at a time.

#### manually fit the model

we will train the model over ten thousand samples/epochs, with a batch size of 1 sample.

it's easy to create a loop that will run the model on a new sample each time, using the ```make_sample``` function from above:

In [49]:
# fit the model

for epoch in range(10000):
    
    X, y = make_sample(length, n_features, output_index)
    
    lstm_model.fit(X, y, epochs=1, verbose=0)

### evaluate the model

because this is such a simple model/example, we won't do any fancy model evaluation or tuning here. 

a fast & easy way to check our model's predictions on new data is to run ```predict()``` a number of times, and see what percentage are correct. 

#### the importance of evaluating on unseen data

since we are generating our own data, one batch at a time, the problem of testing on training data doesn't apply.

__however__, it's important to note that if we were using an intact dataset, we would *definitely* want to keep some of the data aside to test the model on after it's fit. 

testing the model on data that it's already seen is pointless--it doesn't give any insight into the model's ability to generalize to unseen data. 

for datasets that already exist (and aren't being randomly generated as we go), keeping out a portion to test on in the begining is crucial. it's that special, reserved testing data that we would use to test our model now.

because all this data is brand new, we aren't working with a __train test split__ this time.

#### build a function

as with anything i may want to do more than once, i'll go ahead and make a function.

to implement this we'll use a simple counter and some arithmetic:

In [50]:
def eval_model(model):
    '''
    evaluate model accuracy over
    100 independently generated samples
    '''
    
    for i in range(100):
        
        # counter
        correct_preds = 0
        
        # create sample
        X, y = make_sample(length, n_features, output_index)
        
        # generate predictions
        y_hat = model.predict(X)
        
        # evaluate results
        if one_hot_decoder(y_hat) == one_hot_decoder(y):
            
            # update counter
            correct_preds += 1
    
    # arithmetic to get percentage of correct predictions
    accuracy = correct_preds / 100 * 100.0
    
    return accuracy

let's test the function (and our model) out:

In [51]:
lstm_accuracy = eval_model(lstm_model)

print('LSTM model accuracy = %f' % lstm_accuracy)

LSTM model accuracy = 1.000000


### making predictions

our lstm model is ready to make predictions!

since all our data is generated new, this step (in this case) is almost identical to testing our model above. 

since this would be a more user-facing application, it might be nice to create a function that not only generates predictions on whatever new data it gets, but also outputs more verbose information to the user.

In [52]:
def get_predictions(model, X_, y_):
    
    y_hat = model.predict(X)
    
    # get original sequence
    seq = [one_hot_decoder(x) for x in X_]
    
    # correct
    correct = one_hot_decoder(y_)
    
    # model's prediction
    predicted = one_hot_decoder(y_hat)
    
    return seq, predicted, correct

#### get predictions on a fresh sample

time to test our model & helper functions we've created to see whether they can easily make useful predictions for a user:

In [53]:
X, y = make_sample(length, n_features, output_index)

get_predictions(lstm_model, X, y)

([[23, 20, 24, 20, 6, 20, 16, 4, 6, 12]], [4], [4])

it appears our ```lstm_model``` is working! using the ```get_predictions``` function we can run this more than once, and confirm that the model is working well:

In [54]:
# generate new data for each run
# this could optionally be part of the get_predictions() function

X, y = make_sample(length, n_features, output_index)

get_predictions(lstm_model, X, y)

([[8, 0, 19, 15, 19, 0, 22, 20, 17, 21]], [20], [20])

our model is performing optimally! we asked for the seventh number in a sequence, and running ```get_predictions``` repeatedly demonstrates our model is ready to go.

## more information

##### python 3 random module documentation:

https://docs.python.org/3/library/random.html

##### sklearn preprocessing documentation:

http://scikit-learn.org/stable/modules/classes.html#module-sklearn.preprocessing

##### a very cool blog post about how LSTMs work:

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

## thanks for reading!