### Understanding Stateful LSTM Recurrent Neural Networks in Python with Keras
[REFERENCE]
https://machinelearningmastery.com/understanding-stateful-lstm-recurrent-neural-networks-python-keras/

### GOALS
> + How to develop a naive LSTM network for a sequence prediction problem.
> + How to carefully manage state through batches and features with an LSTM network.
> + How to manually manage state in an LSTM network for stateful prediction.

### PROBLEM - LEARN the ALPHABET
> Given a letter of the alphabet, predict the next letter of the alphabet.

In [1]:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.utils import np_utils

Using TensorFlow backend.


In [2]:
np.random.seed(127)
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

In [3]:
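def data_preprocessing(data, timesteps, features):
    """Build (input window -> next char) pairs from `data` and encode them for the LSTM."""
    # Indices start at 1, so to_categorical below yields len(data) + 1 columns (column 0 is unused).
    char_to_index = dict((c, i + 1) for i, c in enumerate(data))
    # Each char is wrapped in a list, which is why the evaluation prints [['A']] -> ['B'].
    index_to_char = dict((i + 1, [c]) for i, c in enumerate(data))
    # Window length; the reshape below only works when one of timesteps/features is 1.
    seq_length = np.max([timesteps, features])
    print(seq_length)
    dataX = []
    dataY = []
    # Use the `data` argument rather than the global `alphabet`.
    for i in range(0, len(data) - seq_length, 1):
        seq_in = data[i:i + seq_length]
        seq_out = data[i + seq_length]
        dataX.append([char_to_index[char] for char in seq_in])
        dataY.append(char_to_index[seq_out])
        print(seq_in, '->', seq_out)

    ## 1. reshape to [samples, time steps, features]
    X = np.reshape(dataX, (len(dataX), timesteps, features))

    ## 2. normalize the input integers to the range 0~1
    X = X / float(len(data))

    ## 3. think of this problem as a sequence classification task.
    y = np_utils.to_categorical(dataY)
    return X, y, dataX, index_to_char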
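The three preprocessing steps above can be tried without Keras. The following is a minimal sketch, assuming a window length of 3 and using `np.eye` as a stand-in for `np_utils.to_categorical`. It mirrors the one-hot layout, not the exact dtype. The names `windows`, `targets`, and `target_ids` are illustrative and do not appear in the notebook.

```python
import numpy as np

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
char_to_index = {c: i + 1 for i, c in enumerate(alphabet)}  # 1-based, as in the notebook

seq_length = 3
n_samples = len(alphabet) - seq_length

# Sliding windows of 3 letters and the letter that follows each window.
windows = [alphabet[i:i + seq_length] for i in range(n_samples)]
targets = [alphabet[i + seq_length] for i in range(n_samples)]

# Encode, reshape to [samples, time steps, features], and scale by the alphabet size.
encoded = [[char_to_index[c] for c in w] for w in windows]
X = np.reshape(encoded, (n_samples, seq_length, 1)) / float(len(alphabet))

# One-hot targets with plain NumPy; index 0 is never used, hence 27 columns.
target_ids = np.array([char_to_index[c] for c in targets])
y = np.eye(len(alphabet) + 1)[target_ids]

print(windows[0], "->", targets[0])  # ABC -> D
print(X.shape, y.shape)              # (23, 3, 1) (23, 27)
```

Because the indices start at 1, the target matrix has 27 columns for 26 letters, and column 0 never receives a label.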

### Naive LSTM for Learning one-char to one-char Mapping

In [4]:
def build_model(X,y):
    model = Sequential()
    model.add(LSTM(32, input_shape=(X.shape[1], X.shape[2])))
    model.add(Dense(y.shape[1], activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam', metrics=['accuracy'])
    model.fit(X, y, epochs=500, batch_size=1, verbose=2)
    return model
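As a quick sanity check on the output layer, the sketch below computes the cross-entropy of a perfectly uniform softmax. It assumes 27 output classes, which is what `y.shape[1]` should be for this data (indices 1 to 26 plus the unused index 0). An untrained network is only roughly uniform, so the match is approximate.

```python
import math

n_classes = 27  # assumed value of y.shape[1] for the alphabet data
uniform_loss = -math.log(1.0 / n_classes)
print(round(uniform_loss, 4))  # 3.2958
```

The first-epoch losses in the training logs (about 3.30) sit close to this value, which is consistent with a 27-way classifier starting from near-uniform predictions.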

In [5]:
## Evaluation

def evaluate(model,X, y, data_x,index_to_char, timesteps, features):
    # summarize performance of the model
    scores = model.evaluate(X, y, verbose=0)
    print("Model Accuracy: %.2f%%" % (scores[1]*100))

    # demonstrate some model predictions
    for pattern in data_x:
        # Apply the same reshape and normalization that were used for training
        x = np.reshape(pattern, (1, timesteps, features))
        x = x / float(len(alphabet))
        prediction = model.predict(x, verbose=0)
        # argmax over the softmax outputs gives the predicted 1-based char index
        index = np.argmax(prediction)
        result = index_to_char[index]
        seq_in = [index_to_char[value] for value in pattern]
        print(seq_in, "->", result)
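The decoding step can be illustrated in isolation. This is a small sketch with a hand-made probability vector standing in for `model.predict`; the values are hypothetical and chosen only so that index 2 wins. Unlike the notebook, the lookup table here stores plain characters rather than one-element lists.

```python
import numpy as np

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
index_to_char = {i + 1: c for i, c in enumerate(alphabet)}  # 1-based, plain chars

# Hypothetical softmax output for one sample: 27 probabilities, most mass on index 2.
prediction = np.full((1, 27), 0.01)
prediction[0, 2] = 0.74

index = int(np.argmax(prediction))
# Index 0 has no character, so .get() guards against a KeyError in that case.
result = index_to_char.get(index, "?")
print(index, result)  # 2 B
```

Since no letter maps to index 0, a direct `index_to_char[index]` lookup would raise a `KeyError` if the model ever predicted class 0.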
    


In [6]:
n_timesteps = 1
n_features = 1
X, y, data_x,index_to_char = data_preprocessing(alphabet, n_timesteps, n_features)
model = build_model(X,y)


1
A -> B
B -> C
C -> D
D -> E
E -> F
F -> G
G -> H
H -> I
I -> J
J -> K
K -> L
L -> M
M -> N
N -> O
O -> P
P -> Q
Q -> R
R -> S
S -> T
T -> U
U -> V
V -> W
W -> X
X -> Y
Y -> Z
Epoch 1/500
 - 1s - loss: 3.3029 - acc: 0.0000e+00
Epoch 2/500
 - 0s - loss: 3.2944 - acc: 0.0000e+00
Epoch 3/500
 - 0s - loss: 3.2907 - acc: 0.0000e+00
Epoch 4/500
 - 0s - loss: 3.2868 - acc: 0.0000e+00
Epoch 5/500
 - 0s - loss: 3.2830 - acc: 0.0400
Epoch 6/500
 - 0s - loss: 3.2796 - acc: 0.0400
Epoch 7/500
 - 0s - loss: 3.2755 - acc: 0.0000e+00
Epoch 8/500
 - 0s - loss: 3.2720 - acc: 0.0400
Epoch 9/500
 - 0s - loss: 3.2682 - acc: 0.0000e+00
Epoch 10/500
 - 0s - loss: 3.2645 - acc: 0.0400
Epoch 11/500
 - 0s - loss: 3.2601 - acc: 0.0000e+00
Epoch 12/500
 - 0s - loss: 3.2559 - acc: 0.0400
Epoch 13/500
 - 0s - loss: 3.2513 - acc: 0.0400
Epoch 14/500
 - 0s - loss: 3.2471 - acc: 0.0400
Epoch 15/500
 - 0s - loss: 3.2429 - acc: 0.0400
Epoch 16/500
 - 0s - loss: 3.2375 - acc: 0.0400
Epoch 17/500
 - 0s - loss: 3.2326 - 

Epoch 166/500
 - 0s - loss: 2.2575 - acc: 0.2000
Epoch 167/500
 - 0s - loss: 2.2538 - acc: 0.2800
Epoch 168/500
 - 0s - loss: 2.2525 - acc: 0.2400
Epoch 169/500
 - 0s - loss: 2.2485 - acc: 0.1600
Epoch 170/500
 - 0s - loss: 2.2449 - acc: 0.2000
Epoch 171/500
 - 0s - loss: 2.2426 - acc: 0.2000
Epoch 172/500
 - 0s - loss: 2.2410 - acc: 0.1600
Epoch 173/500
 - 0s - loss: 2.2381 - acc: 0.2000
Epoch 174/500
 - 0s - loss: 2.2344 - acc: 0.2400
Epoch 175/500
 - 0s - loss: 2.2328 - acc: 0.1200
Epoch 176/500
 - 0s - loss: 2.2283 - acc: 0.2000
Epoch 177/500
 - 0s - loss: 2.2249 - acc: 0.2400
Epoch 178/500
 - 0s - loss: 2.2244 - acc: 0.2000
Epoch 179/500
 - 0s - loss: 2.2229 - acc: 0.2400
Epoch 180/500
 - 0s - loss: 2.2175 - acc: 0.2000
Epoch 181/500
 - 0s - loss: 2.2143 - acc: 0.2000
Epoch 182/500
 - 0s - loss: 2.2112 - acc: 0.2400
Epoch 183/500
 - 0s - loss: 2.2097 - acc: 0.2400
Epoch 184/500
 - 0s - loss: 2.2069 - acc: 0.2400
Epoch 185/500
 - 0s - loss: 2.2039 - acc: 0.2400
Epoch 186/500
 - 0s 

 - 0s - loss: 1.9250 - acc: 0.5200
Epoch 334/500
 - 0s - loss: 1.9237 - acc: 0.6000
Epoch 335/500
 - 0s - loss: 1.9216 - acc: 0.5200
Epoch 336/500
 - 0s - loss: 1.9218 - acc: 0.5200
Epoch 337/500
 - 0s - loss: 1.9196 - acc: 0.4800
Epoch 338/500
 - 0s - loss: 1.9199 - acc: 0.5600
Epoch 339/500
 - 0s - loss: 1.9174 - acc: 0.5200
Epoch 340/500
 - 0s - loss: 1.9175 - acc: 0.5200
Epoch 341/500
 - 0s - loss: 1.9147 - acc: 0.6000
Epoch 342/500
 - 0s - loss: 1.9116 - acc: 0.6000
Epoch 343/500
 - 0s - loss: 1.9120 - acc: 0.6000
Epoch 344/500
 - 0s - loss: 1.9134 - acc: 0.5200
Epoch 345/500
 - 0s - loss: 1.9121 - acc: 0.5200
Epoch 346/500
 - 0s - loss: 1.9082 - acc: 0.5200
Epoch 347/500
 - 0s - loss: 1.9075 - acc: 0.5200
Epoch 348/500
 - 0s - loss: 1.9072 - acc: 0.5600
Epoch 349/500
 - 0s - loss: 1.9061 - acc: 0.6800
Epoch 350/500
 - 0s - loss: 1.9036 - acc: 0.6000
Epoch 351/500
 - 0s - loss: 1.9017 - acc: 0.5200
Epoch 352/500
 - 0s - loss: 1.9025 - acc: 0.5200
Epoch 353/500
 - 0s - loss: 1.9010

In [7]:
evaluate(model,X, y, data_x,index_to_char, n_timesteps, n_features)

Model Accuracy: 80.00%
[['A']] -> ['B']
[['B']] -> ['B']
[['C']] -> ['D']
[['D']] -> ['E']
[['E']] -> ['F']
[['F']] -> ['G']
[['G']] -> ['H']
[['H']] -> ['I']
[['I']] -> ['J']
[['J']] -> ['K']
[['K']] -> ['L']
[['L']] -> ['M']
[['M']] -> ['N']
[['N']] -> ['O']
[['O']] -> ['P']
[['P']] -> ['Q']
[['Q']] -> ['R']
[['R']] -> ['S']
[['S']] -> ['T']
[['T']] -> ['V']
[['U']] -> ['V']
[['V']] -> ['Y']
[['W']] -> ['Z']
[['X']] -> ['Z']
[['Y']] -> ['Z']


### Naive LSTM for a Three-Char Feature Window to One-Char Mapping
> Add more context to the input:
> + A window of three letters is passed as three features of a single time step.
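To make the feature-window layout concrete, here is a minimal NumPy sketch of the input tensor for this setting. It assumes the same 1-based encoding and scaling as `data_preprocessing`; the `ord`-based encoding is a shortcut used only in this example.

```python
import numpy as np

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# Encode each 3-letter window with 1-based indices (A=1, ..., Z=26).
encoded = [[ord(c) - ord("A") + 1 for c in alphabet[i:i + 3]]
           for i in range(len(alphabet) - 3)]

# timesteps=1, features=3: each sample is a single step carrying three values.
X = np.reshape(encoded, (len(encoded), 1, 3)) / float(len(alphabet))
print(X.shape)     # (23, 1, 3)
print(X[0].shape)  # (1, 3)
```

With this layout the LSTM processes only one time step per sample, so the three letters arrive together as one feature vector rather than as a sequence.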

In [8]:
n_timesteps = 1
n_features = 3
X, y, data_x,index_to_char = data_preprocessing(alphabet, n_timesteps, n_features)
model = build_model(X,y)

3
ABC -> D
BCD -> E
CDE -> F
DEF -> G
EFG -> H
FGH -> I
GHI -> J
HIJ -> K
IJK -> L
JKL -> M
KLM -> N
LMN -> O
MNO -> P
NOP -> Q
OPQ -> R
PQR -> S
QRS -> T
RST -> U
STU -> V
TUV -> W
UVW -> X
VWX -> Y
WXY -> Z
Epoch 1/500
 - 1s - loss: 3.3076 - acc: 0.0435
Epoch 2/500
 - 0s - loss: 3.2943 - acc: 0.0435
Epoch 3/500
 - 0s - loss: 3.2870 - acc: 0.0435
Epoch 4/500
 - 0s - loss: 3.2793 - acc: 0.0870
Epoch 5/500
 - 0s - loss: 3.2719 - acc: 0.0870
Epoch 6/500
 - 0s - loss: 3.2643 - acc: 0.0435
Epoch 7/500
 - 0s - loss: 3.2563 - acc: 0.0435
Epoch 8/500
 - 0s - loss: 3.2488 - acc: 0.0435
Epoch 9/500
 - 0s - loss: 3.2399 - acc: 0.0435
Epoch 10/500
 - 0s - loss: 3.2314 - acc: 0.0435
Epoch 11/500
 - 0s - loss: 3.2219 - acc: 0.0435
Epoch 12/500
 - 0s - loss: 3.2121 - acc: 0.0435
Epoch 13/500
 - 0s - loss: 3.2027 - acc: 0.0435
Epoch 14/500
 - 0s - loss: 3.1931 - acc: 0.0435
Epoch 15/500
 - 0s - loss: 3.1828 - acc: 0.0435
Epoch 16/500
 - 0s - loss: 3.1734 - acc: 0.0435
Epoch 17/500
 - 0s - loss: 3.163

Epoch 166/500
 - 0s - loss: 2.2679 - acc: 0.2174
Epoch 167/500
 - 0s - loss: 2.2644 - acc: 0.1739
Epoch 168/500
 - 0s - loss: 2.2613 - acc: 0.1739
Epoch 169/500
 - 0s - loss: 2.2594 - acc: 0.1739
Epoch 170/500
 - 0s - loss: 2.2540 - acc: 0.1739
Epoch 171/500
 - 0s - loss: 2.2502 - acc: 0.1739
Epoch 172/500
 - 0s - loss: 2.2481 - acc: 0.1739
Epoch 173/500
 - 0s - loss: 2.2431 - acc: 0.2174
Epoch 174/500
 - 0s - loss: 2.2388 - acc: 0.2174
Epoch 175/500
 - 0s - loss: 2.2389 - acc: 0.1739
Epoch 176/500
 - 0s - loss: 2.2336 - acc: 0.2174
Epoch 177/500
 - 0s - loss: 2.2286 - acc: 0.2174
Epoch 178/500
 - 0s - loss: 2.2279 - acc: 0.2174
Epoch 179/500
 - 0s - loss: 2.2233 - acc: 0.2174
Epoch 180/500
 - 0s - loss: 2.2208 - acc: 0.2174
Epoch 181/500
 - 0s - loss: 2.2169 - acc: 0.2174
Epoch 182/500
 - 0s - loss: 2.2122 - acc: 0.2174
Epoch 183/500
 - 0s - loss: 2.2123 - acc: 0.1739
Epoch 184/500
 - 0s - loss: 2.2078 - acc: 0.1739
Epoch 185/500
 - 0s - loss: 2.2050 - acc: 0.2174
Epoch 186/500
 - 0s 

 - 0s - loss: 1.8812 - acc: 0.6957
Epoch 334/500
 - 0s - loss: 1.8815 - acc: 0.6957
Epoch 335/500
 - 0s - loss: 1.8816 - acc: 0.5217
Epoch 336/500
 - 0s - loss: 1.8764 - acc: 0.6087
Epoch 337/500
 - 0s - loss: 1.8780 - acc: 0.5217
Epoch 338/500
 - 0s - loss: 1.8755 - acc: 0.5217
Epoch 339/500
 - 0s - loss: 1.8739 - acc: 0.6087
Epoch 340/500
 - 0s - loss: 1.8717 - acc: 0.6087
Epoch 341/500
 - 0s - loss: 1.8697 - acc: 0.6087
Epoch 342/500
 - 0s - loss: 1.8704 - acc: 0.6522
Epoch 343/500
 - 0s - loss: 1.8672 - acc: 0.5217
Epoch 344/500
 - 0s - loss: 1.8643 - acc: 0.6087
Epoch 345/500
 - 0s - loss: 1.8643 - acc: 0.5217
Epoch 346/500
 - 0s - loss: 1.8594 - acc: 0.6957
Epoch 347/500
 - 0s - loss: 1.8607 - acc: 0.6522
Epoch 348/500
 - 0s - loss: 1.8583 - acc: 0.6957
Epoch 349/500
 - 0s - loss: 1.8575 - acc: 0.6522
Epoch 350/500
 - 0s - loss: 1.8525 - acc: 0.6957
Epoch 351/500
 - 0s - loss: 1.8549 - acc: 0.5652
Epoch 352/500
 - 0s - loss: 1.8528 - acc: 0.6087
Epoch 353/500
 - 0s - loss: 1.8504

In [9]:
evaluate(model,X, y, data_x,index_to_char, n_timesteps, n_features)

Model Accuracy: 78.26%
[['A'], ['B'], ['C']] -> ['D']
[['B'], ['C'], ['D']] -> ['E']
[['C'], ['D'], ['E']] -> ['F']
[['D'], ['E'], ['F']] -> ['G']
[['E'], ['F'], ['G']] -> ['H']
[['F'], ['G'], ['H']] -> ['I']
[['G'], ['H'], ['I']] -> ['J']
[['H'], ['I'], ['J']] -> ['K']
[['I'], ['J'], ['K']] -> ['L']
[['J'], ['K'], ['L']] -> ['M']
[['K'], ['L'], ['M']] -> ['N']
[['L'], ['M'], ['N']] -> ['O']
[['M'], ['N'], ['O']] -> ['P']
[['N'], ['O'], ['P']] -> ['Q']
[['O'], ['P'], ['Q']] -> ['Q']
[['P'], ['Q'], ['R']] -> ['S']
[['Q'], ['R'], ['S']] -> ['T']
[['R'], ['S'], ['T']] -> ['U']
[['S'], ['T'], ['U']] -> ['W']
[['T'], ['U'], ['V']] -> ['Y']
[['U'], ['V'], ['W']] -> ['Z']
[['V'], ['W'], ['X']] -> ['Z']
[['W'], ['X'], ['Y']] -> ['Z']


### Naive LSTM for Three-Char Time Step Window to One-Char Mapping
> + Use multiple time steps rather than multiple features within one time step.
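The difference between the two framings is only in how the same window is arranged. The sketch below reshapes one encoded window ("ABC" as `[1, 2, 3]`) both ways; the variable names are illustrative.

```python
import numpy as np

window = [1, 2, 3]  # "ABC" with the notebook's 1-based indices

as_features = np.reshape(window, (1, 1, 3))   # 1 time step, 3 features
as_timesteps = np.reshape(window, (1, 3, 1))  # 3 time steps, 1 feature

print(as_features.tolist())   # [[[1, 2, 3]]]
print(as_timesteps.tolist())  # [[[1], [2], [3]]]

# Same values, different arrangement along the time axis.
print(np.array_equal(as_features.ravel(), as_timesteps.ravel()))  # True
```

In the time-step layout the LSTM sees the letters one after another and can carry its internal state across the three steps.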

In [10]:
n_timesteps = 3
n_features = 1
X, y, data_x,index_to_char = data_preprocessing(alphabet, n_timesteps, n_features)
model = build_model(X,y)

3
ABC -> D
BCD -> E
CDE -> F
DEF -> G
EFG -> H
FGH -> I
GHI -> J
HIJ -> K
IJK -> L
JKL -> M
KLM -> N
LMN -> O
MNO -> P
NOP -> Q
OPQ -> R
PQR -> S
QRS -> T
RST -> U
STU -> V
TUV -> W
UVW -> X
VWX -> Y
WXY -> Z
Epoch 1/500
 - 1s - loss: 3.3057 - acc: 0.0435
Epoch 2/500
 - 0s - loss: 3.2907 - acc: 0.0435
Epoch 3/500
 - 0s - loss: 3.2806 - acc: 0.0435
Epoch 4/500
 - 0s - loss: 3.2723 - acc: 0.0435
Epoch 5/500
 - 0s - loss: 3.2627 - acc: 0.0435
Epoch 6/500
 - 0s - loss: 3.2541 - acc: 0.0435
Epoch 7/500
 - 0s - loss: 3.2429 - acc: 0.0435
Epoch 8/500
 - 0s - loss: 3.2317 - acc: 0.0435
Epoch 9/500
 - 0s - loss: 3.2196 - acc: 0.0435
Epoch 10/500
 - 0s - loss: 3.2070 - acc: 0.0435
Epoch 11/500
 - 0s - loss: 3.1924 - acc: 0.0435
Epoch 12/500
 - 0s - loss: 3.1773 - acc: 0.0435
Epoch 13/500
 - 0s - loss: 3.1614 - acc: 0.0435
Epoch 14/500
 - 0s - loss: 3.1473 - acc: 0.0435
Epoch 15/500
 - 0s - loss: 3.1301 - acc: 0.0435
Epoch 16/500
 - 0s - loss: 3.1144 - acc: 0.0435
Epoch 17/500
 - 0s - loss: 3.102

Epoch 166/500
 - 0s - loss: 1.1756 - acc: 0.9130
Epoch 167/500
 - 0s - loss: 1.1758 - acc: 0.8696
Epoch 168/500
 - 0s - loss: 1.1789 - acc: 0.9130
Epoch 169/500
 - 0s - loss: 1.1852 - acc: 0.8261
Epoch 170/500
 - 0s - loss: 1.1759 - acc: 0.7391
Epoch 171/500
 - 0s - loss: 1.1583 - acc: 0.8696
Epoch 172/500
 - 0s - loss: 1.1526 - acc: 0.8696
Epoch 173/500
 - 0s - loss: 1.1409 - acc: 0.9130
Epoch 174/500
 - 0s - loss: 1.1447 - acc: 0.9130
Epoch 175/500
 - 0s - loss: 1.1399 - acc: 0.8261
Epoch 176/500
 - 0s - loss: 1.1385 - acc: 0.9130
Epoch 177/500
 - 0s - loss: 1.1255 - acc: 0.8261
Epoch 178/500
 - 0s - loss: 1.1226 - acc: 0.7826
Epoch 179/500
 - 0s - loss: 1.1169 - acc: 0.8696
Epoch 180/500
 - 0s - loss: 1.1078 - acc: 0.9130
Epoch 181/500
 - 0s - loss: 1.1025 - acc: 0.9130
Epoch 182/500
 - 0s - loss: 1.1038 - acc: 0.8696
Epoch 183/500
 - 0s - loss: 1.0977 - acc: 0.8696
Epoch 184/500
 - 0s - loss: 1.1028 - acc: 0.8696
Epoch 185/500
 - 0s - loss: 1.0900 - acc: 0.8696
Epoch 186/500
 - 0s 

Epoch 334/500
 - 0s - loss: 0.5516 - acc: 0.9565
Epoch 335/500
 - 0s - loss: 0.5548 - acc: 0.9565
Epoch 336/500
 - 0s - loss: 0.5453 - acc: 0.9565
Epoch 337/500
 - 0s - loss: 0.5455 - acc: 0.9130
Epoch 338/500
 - 0s - loss: 0.5456 - acc: 0.9565
Epoch 339/500
 - 0s - loss: 0.5386 - acc: 0.9130
Epoch 340/500
 - 0s - loss: 0.5353 - acc: 0.9130
Epoch 341/500
 - 0s - loss: 0.5373 - acc: 0.9130
Epoch 342/500
 - 0s - loss: 0.5321 - acc: 0.9130
Epoch 343/500
 - 0s - loss: 0.5267 - acc: 0.9130
Epoch 344/500
 - 0s - loss: 0.5307 - acc: 0.9565
Epoch 345/500
 - 0s - loss: 0.5306 - acc: 0.9565
Epoch 346/500
 - 0s - loss: 0.5326 - acc: 0.9565
Epoch 347/500
 - 0s - loss: 0.5264 - acc: 0.9565
Epoch 348/500
 - 0s - loss: 0.5177 - acc: 1.0000
Epoch 349/500
 - 0s - loss: 0.5123 - acc: 0.9565
Epoch 350/500
 - 0s - loss: 0.5186 - acc: 0.8696
Epoch 351/500
 - 0s - loss: 0.5148 - acc: 0.9130
Epoch 352/500
 - 0s - loss: 0.5118 - acc: 0.9565
Epoch 353/500
 - 0s - loss: 0.5084 - acc: 0.9565
Epoch 354/500
 - 0s 

In [11]:
evaluate(model,X, y, data_x,index_to_char, n_timesteps, n_features)

Model Accuracy: 95.65%
[['A'], ['B'], ['C']] -> ['D']
[['B'], ['C'], ['D']] -> ['E']
[['C'], ['D'], ['E']] -> ['F']
[['D'], ['E'], ['F']] -> ['G']
[['E'], ['F'], ['G']] -> ['H']
[['F'], ['G'], ['H']] -> ['I']
[['G'], ['H'], ['I']] -> ['J']
[['H'], ['I'], ['J']] -> ['K']
[['I'], ['J'], ['K']] -> ['L']
[['J'], ['K'], ['L']] -> ['M']
[['K'], ['L'], ['M']] -> ['N']
[['L'], ['M'], ['N']] -> ['O']
[['M'], ['N'], ['O']] -> ['P']
[['N'], ['O'], ['P']] -> ['Q']
[['O'], ['P'], ['Q']] -> ['R']
[['P'], ['Q'], ['R']] -> ['S']
[['Q'], ['R'], ['S']] -> ['T']
[['R'], ['S'], ['T']] -> ['U']
[['S'], ['T'], ['U']] -> ['V']
[['T'], ['U'], ['V']] -> ['W']
[['U'], ['V'], ['W']] -> ['X']
[['V'], ['W'], ['X']] -> ['Z']
[['W'], ['X'], ['Y']] -> ['Z']
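The reported accuracies can be cross-checked by counting correct rows in the three prediction listings above. The counts below were tallied by hand from those listings, and the run labels are informal names, not identifiers from the notebook.

```python
# (correct predictions, total samples) counted from the prediction listings
runs = {
    "1 step x 1 feature": (20, 25),
    "1 step x 3 features": (18, 23),
    "3 steps x 1 feature": (22, 23),
}

for name, (correct, total) in runs.items():
    print("%s: %.2f%%" % (name, 100.0 * correct / total))
# 1 step x 1 feature: 80.00%
# 1 step x 3 features: 78.26%
# 3 steps x 1 feature: 95.65%
```

In these runs, the extra letters did not raise accuracy when passed as features (78.26% against 80.00%), but they did when passed as time steps (95.65%). This is a single run per setting, so the gap may vary with the random seed.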


In [12]:
import pandas as pd
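# Scratch cell: view the same char -> next-char pairs as a DataFrame.
# Note: this rebinds X, which previously held the model input array.
X = pd.DataFrame.from_dict(index_to_char).T
Y = X.shift(-1)
data = pd.concat([X, Y], axis=1).dropna()
data.columns = ['X', 'Y']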

np.reshape([1,2,3,4],newshape=(2,2))

array([[1, 2],
       [3, 4]])