# simple LSTM

so basic it runs on pumpkin spice

## echo sequence prediction problem

### generating data

our echo sequence prediction problem needs data: specifically, sequences of random integers. let's define our problem space as the integers from 0 to 99.

we'll use the ```randint()``` function from the python 3 ```random``` [module](https://docs.python.org/3/library/random.html "python 3 random module docs") to generate random integers within the range we specify (in this case, 0 to 99, with both endpoints included).
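one detail worth checking: per the python docs, ```randint(a, b)``` returns an integer N with a <= N <= b, so both endpoints are included (unlike ```range()```, which stops before its end value). a quick sketch that draws many samples and checks the observed range; the seed value is arbitrary and only there to make the check reproducible:

```python
import random

# a private, seeded generator so the check is reproducible (seed 0 is arbitrary)
rng = random.Random(0)

# draw many samples from randint(0, 99); both 0 and 99 are possible outputs
samples = [rng.randint(0, 99) for _ in range(10_000)]

# with 10,000 draws, both endpoints are all but certain to show up
print(min(samples), max(samples))
assert all(0 <= s <= 99 for s in samples)
```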

we can use the ```randint()``` function within a function of our own to generate sequences of random integers--this will be the data for our problem.

```python
# randint() is inside the python random module
import random
```

```python
# use randint() to generate a random integer between 0 and 99
rand_int = random.randint(0, 99)
rand_int
```

```
60
```

we need a _lot_ more than one of these, so it's time to build a function that automates this for us:

```python
def make_seq(seq_length, n_features):
    # seq_length random integers, each between 0 and n_features - 1 (inclusive)
    return [random.randint(0, n_features - 1) for _ in range(seq_length)]
```

__demo:__ let's make a sequence of 10 values with 50 features (i.e. each value is an integer from 0 to 49)

```python
make_seq(10, 50)
```

```
[14, 37, 3, 25, 22, 14, 42, 24, 47, 5]
```
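one sequence isn't a training set. here's a minimal sketch of building many sequences at once. it assumes a variant of ```make_seq()``` that accepts a generator, and ```make_dataset()``` is a hypothetical helper, not part of the original code. passing a seeded ```random.Random``` instance keeps the data reproducible without touching the global random state:

```python
import random


def make_seq(seq_length, n_features, rng=random):
    # integers in [0, n_features - 1]; randint includes both endpoints
    return [rng.randint(0, n_features - 1) for _ in range(seq_length)]


def make_dataset(n_samples, seq_length, n_features, seed=None):
    # hypothetical helper: a dedicated generator makes the dataset reproducible
    rng = random.Random(seed)
    return [make_seq(seq_length, n_features, rng) for _ in range(n_samples)]


data = make_dataset(n_samples=5, seq_length=10, n_features=50, seed=42)
for seq in data:
    print(seq)
```

calling ```make_dataset()``` again with the same seed returns the same sequences, which makes debugging a model much easier.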

### one hot encoding

before we can train the model, we have to encode the data into a format that an LSTM can use. the way we encode data matters; choices made here can significantly affect model performance.

to frame this data properly, let's revisit the original problem:

we're trying to predict a number. a _specific_ number.

if we wanted to _approximate_ the number, we could frame this as a __regression__ problem, and train our model to output a close (but not exact) approximation of the number.

but because we want the _exact_ integer (and _not_ an approximation, which is what a regression model outputs), we need to frame this as a __classification__ problem: each of the 100 possible integers is its own class.

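__classification__ means handling categorical data, which we can represent for a neural network using __one hot encoding__: each integer becomes a binary vector with one position per possible value, set to 1 at the integer's own position and 0 everywhere else.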
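a minimal sketch of that idea for a single value, plus the inverse: taking the index of the largest entry (argmax). the same argmax step is how a vector of class scores from a model maps back to one exact integer. the function names are illustrative, not from any library:

```python
def one_hot(value, n_features):
    # a list of n_features zeros with a single 1 at position `value`
    if not 0 <= value < n_features:
        raise ValueError(f"value {value} is outside 0..{n_features - 1}")
    vector = [0] * n_features
    vector[value] = 1
    return vector


def decode(vector):
    # argmax: the index of the largest entry is the encoded integer
    return max(range(len(vector)), key=lambda i: vector[i])


print(one_hot(3, 5))                         # [0, 0, 0, 1, 0]
print(decode(one_hot(3, 5)))                 # 3
print(decode([0.05, 0.1, 0.7, 0.1, 0.05]))   # 2
```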

### automatic vs manual one hot encoding

```scikit-learn``` has a super neat ```OneHotEncoder()``` [transformer](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html "sklearn OneHotEncoder doc") that can automate one hot encoding, but by default it learns its categories when it's fit, so it can only encode the values that appear in the data it was fit on.
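to see why learning categories from the data is a problem here, consider a tiny pure-python imitation of the idea. this is an illustration only, not how scikit-learn implements fitting, and ```learn_categories()``` is a hypothetical helper. a short sample rarely contains all 100 values, so the encoding width ends up depending on which values happened to appear:

```python
import random


def learn_categories(values):
    # hypothetical stand-in for "fit": keep only the values actually observed
    return sorted(set(values))


rng = random.Random(1)
train = [rng.randint(0, 99) for _ in range(10)]
categories = learn_categories(train)

# the encoding width depends on the sample, not on the 100-value problem space
print(len(categories), "categories learned out of 100 possible")

# any value outside `categories` has no column to put its 1 in
missing = [v for v in range(100) if v not in categories]
print(len(missing), "values cannot be encoded")
```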

we need all possible values--from 0 to 99--represented. it's possible to pass the full list of categories to ```OneHotEncoder()``` manually (through its ```categories``` parameter), but here we're going to simply make our own transformer.
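a minimal sketch of such a transformer, assuming the categories are fixed up front as 0 to ```n_features - 1``` so nothing depends on which values the data happens to contain. ```OneHotSequenceEncoder``` and its ```transform()```/```inverse_transform()``` methods are hypothetical names loosely modelled on scikit-learn's naming convention, not its actual API:

```python
class OneHotSequenceEncoder:
    """Hypothetical encoder with a fixed category set of 0..n_features - 1."""

    def __init__(self, n_features):
        self.n_features = n_features

    def transform(self, sequence):
        # one row per timestep, one column per possible integer
        encoded = []
        for value in sequence:
            if not 0 <= value < self.n_features:
                raise ValueError(f"value {value} is outside 0..{self.n_features - 1}")
            row = [0] * self.n_features
            row[value] = 1
            encoded.append(row)
        return encoded

    def inverse_transform(self, encoded):
        # argmax of each row recovers the original integer
        return [max(range(len(row)), key=row.__getitem__) for row in encoded]


encoder = OneHotSequenceEncoder(n_features=100)
seq = [14, 37, 3, 99, 0]
X = encoder.transform(seq)
print(len(X), len(X[0]))             # 5 100
print(encoder.inverse_transform(X))  # [14, 37, 3, 99, 0]
```

because the width is always ```n_features```, every sequence encodes to the same shape, which is what an LSTM input layer expects.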

## more information

##### python 3 random module documentation

https://docs.python.org/3/library/random.html