In the previous two examples we tried the following two configurations, both with time_step=1:
- (batch=25, time_step=1, feature=1)
- (batch=23, time_step=1, feature=3)

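The first example is closer to one-to-one pattern matching, while the second is three-to-one pattern matching that relies entirely on the features to form the pattern. Neither of them captures the sequential nature of the data.

Note: although X has a batch dimension of 25 or 23, we trained (fit) with batch_size=1, and Keras resets the network state each time it trains on a new batch. The fit call in this example also uses batch_size=1. A separate post will show how to adapt the first configuration with batch_size > 1 so that it also exhibits sequential behavior.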

In this example we strengthen the sequential effect:
- (batch=23, time_step=3, feature=1)

This example shows that, as long as the data follows a regular pattern, feature=1 is enough to get the correct mapping from the time steps.

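In other words, we first saw the problem with the (time_step=1, feature=1) and (time_step=1, feature=3) configurations, and now use (time_step=3, feature=1) to show that feeding the data as time steps is the right way to use an LSTM.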
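
To make the difference between the two layouts concrete, the sketch below reshapes the same 23 integer-encoded windows both ways. It uses only numpy; the names `windows`, `as_features`, and `as_time_steps` are illustrative and do not appear in the notebook.

```python
import numpy as np

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
seq_length = 3

# Integer-encode every run of three consecutive letters (A=0, ..., Z=25).
windows = [
    [alphabet.index(ch) for ch in alphabet[i:i + seq_length]]
    for i in range(len(alphabet) - seq_length)
]

# Earlier layout: one time step holding three features per sample.
as_features = np.reshape(windows, (len(windows), 1, seq_length))
# Layout used in this example: three time steps with one feature each.
as_time_steps = np.reshape(windows, (len(windows), seq_length, 1))

print(as_features.shape)          # (23, 1, 3)
print(as_time_steps.shape)        # (23, 3, 1)
print(as_features[0].tolist())    # [[0, 1, 2]]
print(as_time_steps[0].tolist())  # [[0], [1], [2]]
```

The values are identical in both arrays. Only the axis that the LSTM steps through changes: in the second layout it sees the three letters one after another.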

First, prepare the data:

In [1]:
# Naive LSTM to learn three-char time steps to one-char mapping
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
seq_length = 3
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
	seq_in = alphabet[i:i + seq_length]
	seq_out = alphabet[i + seq_length]
	dataX.append([char_to_int[char] for char in seq_in])
	dataY.append(char_to_int[seq_out])
	print(seq_in, '->', seq_out)

Using TensorFlow backend.


ABC -> D
BCD -> E
CDE -> F
DEF -> G
EFG -> H
FGH -> I
GHI -> J
HIJ -> K
IJK -> L
JKL -> M
KLM -> N
LMN -> O
MNO -> P
NOP -> Q
OPQ -> R
PQR -> S
QRS -> T
RST -> U
STU -> V
TUV -> W
UVW -> X
VWX -> Y
WXY -> Z


In [2]:
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))
print(X.shape)

(23, 3, 1)


In [3]:
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=1, verbose=2)

Epoch 1/500
0s - loss: 3.2701 - acc: 0.0000e+00
Epoch 2/500
0s - loss: 3.2547 - acc: 0.0000e+00
Epoch 3/500
0s - loss: 3.2465 - acc: 0.0000e+00
Epoch 4/500
0s - loss: 3.2393 - acc: 0.0435
Epoch 5/500
0s - loss: 3.2313 - acc: 0.0435
Epoch 6/500
0s - loss: 3.2230 - acc: 0.0435
Epoch 7/500
0s - loss: 3.2141 - acc: 0.0435
Epoch 8/500
0s - loss: 3.2051 - acc: 0.0435
Epoch 9/500
0s - loss: 3.1939 - acc: 0.0435
Epoch 10/500
0s - loss: 3.1821 - acc: 0.0435
Epoch 11/500
0s - loss: 3.1701 - acc: 0.0435
Epoch 12/500
0s - loss: 3.1553 - acc: 0.0435
Epoch 13/500
0s - loss: 3.1396 - acc: 0.0435
Epoch 14/500
0s - loss: 3.1237 - acc: 0.0435
Epoch 15/500
0s - loss: 3.1060 - acc: 0.0435
Epoch 16/500
0s - loss: 3.0883 - acc: 0.0000e+00
Epoch 17/500
0s - loss: 3.0721 - acc: 0.0000e+00
Epoch 18/500
0s - loss: 3.0582 - acc: 0.0435
Epoch 19/500
0s - loss: 3.0378 - acc: 0.0435
Epoch 20/500
0s - loss: 3.0207 - acc: 0.0870
Epoch 21/500
0s - loss: 3.0051 - acc: 0.0870
Epoch 22/500
0s - loss: 2.9827 - acc: 0.0870

<keras.callbacks.History at 0x7fd9e81bfa90>
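
The preprocessing in the cell above does two things: it scales the integer codes into [0, 1) and one-hot encodes the targets. A minimal numpy sketch of those two steps follows. `one_hot` is a hypothetical helper that should behave roughly like `np_utils.to_categorical` for this data; it is not the Keras implementation.

```python
import numpy as np

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# Targets for the 23 windows: D (3) through Z (25).
targets = list(range(3, len(alphabet)))

def one_hot(labels, num_classes):
    """Hypothetical helper: one row per label with a single 1.0 at the label's index."""
    encoded = np.zeros((len(labels), num_classes), dtype="float32")
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

y_sketch = one_hot(targets, len(alphabet))
print(y_sketch.shape)               # (23, 26)
print(int(np.argmax(y_sketch[0])))  # 3, i.e. 'D'

# Dividing by the alphabet size maps the codes 0..25 into the range [0, 1).
window = np.array([0, 1, 2]) / float(len(alphabet))
print(window.round(4).tolist())     # [0.0, 0.0385, 0.0769]
```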

In [4]:
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
for pattern in dataX:
	x = numpy.reshape(pattern, (1, len(pattern), 1))
	x = x / float(len(alphabet))
	prediction = model.predict(x, verbose=0)
	index = numpy.argmax(prediction)
	result = int_to_char[index]
	seq_in = [int_to_char[value] for value in pattern]
	print(seq_in, "->", result)

Model Accuracy: 100.00%
['A', 'B', 'C'] -> D
['B', 'C', 'D'] -> E
['C', 'D', 'E'] -> F
['D', 'E', 'F'] -> G
['E', 'F', 'G'] -> H
['F', 'G', 'H'] -> I
['G', 'H', 'I'] -> J
['H', 'I', 'J'] -> K
['I', 'J', 'K'] -> L
['J', 'K', 'L'] -> M
['K', 'L', 'M'] -> N
['L', 'M', 'N'] -> O
['M', 'N', 'O'] -> P
['N', 'O', 'P'] -> Q
['O', 'P', 'Q'] -> R
['P', 'Q', 'R'] -> S
['Q', 'R', 'S'] -> T
['R', 'S', 'T'] -> U
['S', 'T', 'U'] -> V
['T', 'U', 'V'] -> W
['U', 'V', 'W'] -> X
['V', 'W', 'X'] -> Y
['W', 'X', 'Y'] -> Z


We can see that the model learns the problem perfectly as evidenced by the model evaluation and the example predictions.

But it has learned a simpler problem. Specifically, it has learned to predict the next letter from a sequence of three letters in the alphabet. It can be shown any sequence of three consecutive letters from the alphabet and predict the next letter.
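
As a sketch of how a single three-letter window could be prepared for the trained network, the hypothetical helper `encode_window` below applies the same integer encoding, reshaping, and scaling as the notebook. The prediction call is left as a comment because it assumes the trained `model` from the cells above is in scope.

```python
import numpy as np

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_window(letters):
    """Hypothetical helper: turn a 3-letter string into a (1, 3, 1) model input."""
    codes = [alphabet.index(ch) for ch in letters]
    x = np.reshape(codes, (1, len(codes), 1))
    return x / float(len(alphabet))

x = encode_window("KLM")
print(x.shape)  # (1, 3, 1)

# With the trained `model` from the notebook in scope, the lookup would be:
# index = np.argmax(model.predict(x, verbose=0))
# print(alphabet[index])
```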

It cannot actually enumerate the alphabet. I expect that a large enough multilayer perceptron network might be able to learn the same mapping using the window method.

The LSTM networks are stateful. They should be able to learn the whole alphabet sequence, but by default the Keras implementation resets the network state after each training batch.
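
To illustrate what resetting the state between batches means, here is a toy single-unit recurrent cell in plain Python. It is not an LSTM and not how Keras implements state handling; `run_rnn` and the weights `w_in` and `w_state` are arbitrary values chosen for the demonstration.

```python
import math

def run_rnn(inputs, state=0.0, w_in=0.5, w_state=0.9):
    """Toy one-unit recurrent cell (not an LSTM): returns the state after the last input."""
    for value in inputs:
        state = math.tanh(w_in * value + w_state * state)
    return state

batch_1 = [0.1, 0.2, 0.3]
batch_2 = [0.4, 0.5, 0.6]

# State reset between batches: batch_2 is processed with no memory of batch_1.
reset = run_rnn(batch_2)

# State carried over: the final state from batch_1 seeds the pass over batch_2.
carried = run_rnn(batch_2, state=run_rnn(batch_1))

print(reset != carried)  # True
```

With the state carried over, the result for the second batch depends on what came before it. With a reset, each batch is treated as an independent sequence.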