In [2]:
from keras import Sequential
from keras.layers import Dense, LSTM, Dropout
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint
import numpy as np

Using TensorFlow backend.


In [3]:
fileName = 'alice_adventures_book.txt'

In [4]:
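# The file starts with a UTF-8 byte order mark, which shows up below as '\ufeff';
# opening it with encoding='utf-8-sig' instead would strip that character.
with open(fileName, encoding='utf-8') as fil:
    text = fil.read().lower()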

In [5]:
chars = sorted(set(text))

In [6]:
print('Total chars', len(text),  'Unique chars', len(chars))

Total chars 163817 Unique chars 61


In [7]:
chars_to_int = {c: i for i, c in enumerate(chars)}

In [8]:
chars_to_int

{'\n': 0,
 ' ': 1,
 '!': 2,
 '#': 3,
 '$': 4,
 '%': 5,
 '(': 6,
 ')': 7,
 '*': 8,
 ',': 9,
 '-': 10,
 '.': 11,
 '/': 12,
 '0': 13,
 '1': 14,
 '2': 15,
 '3': 16,
 '4': 17,
 '5': 18,
 '6': 19,
 '7': 20,
 '8': 21,
 '9': 22,
 ':': 23,
 ';': 24,
 '?': 25,
 '@': 26,
 '[': 27,
 ']': 28,
 '_': 29,
 'a': 30,
 'b': 31,
 'c': 32,
 'd': 33,
 'e': 34,
 'f': 35,
 'g': 36,
 'h': 37,
 'i': 38,
 'j': 39,
 'k': 40,
 'l': 41,
 'm': 42,
 'n': 43,
 'o': 44,
 'p': 45,
 'q': 46,
 'r': 47,
 's': 48,
 't': 49,
 'u': 50,
 'v': 51,
 'w': 52,
 'x': 53,
 'y': 54,
 'z': 55,
 '‘': 56,
 '’': 57,
 '“': 58,
 '”': 59,
 '\ufeff': 60}
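
To turn predicted indices back into text later, the inverse of this mapping is usually needed as well. Below is a minimal sketch on a toy string; the names `toy_chars_to_int`, `toy_int_to_chars`, `encode`, and `decode` are illustrative and do not appear in the notebook.

```python
# Toy text standing in for the book; the notebook uses the full lowercased file.
sample_text = "alice was beginning"

# Same construction as in the notebook: one index per unique character.
sample_chars = sorted(set(sample_text))
toy_chars_to_int = {c: i for i, c in enumerate(sample_chars)}

# Hypothetical inverse mapping, used to decode indices back into characters.
toy_int_to_chars = {i: c for c, i in toy_chars_to_int.items()}


def encode(s):
    """Map each character of s to its integer index."""
    return [toy_chars_to_int[c] for c in s]


def decode(indices):
    """Map a list of integer indices back to a string."""
    return "".join(toy_int_to_chars[i] for i in indices)


encoded = encode("alice")
print(encoded)
print(decode(encoded))  # round-trips to "alice"
```

Because every character gets exactly one index, encoding followed by decoding returns the original string.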

In [9]:
text_to_int = [chars_to_int[c] for c in text]

In [10]:
text_to_int[:10]

[60, 45, 47, 44, 39, 34, 32, 49, 1, 36]

In [11]:
# use the previous 100 characters to predict the next character
sequence_size = 100
x = []
y = []
for i in range(len(text_to_int)-sequence_size):
    x.append(text_to_int[i:i+sequence_size])
    y.append(text_to_int[i+sequence_size])
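
The loop above slides a fixed-size window over the encoded text, one position at a time. The same idea on a toy sequence makes the input/target pairing easier to see; `make_windows` is a hypothetical helper written for this illustration.

```python
def make_windows(seq, window):
    """Return (inputs, targets): each input holds `window` consecutive items,
    and its target is the item that immediately follows them."""
    inputs, targets = [], []
    for i in range(len(seq) - window):
        inputs.append(seq[i:i + window])
        targets.append(seq[i + window])
    return inputs, targets


toy = [10, 11, 12, 13, 14, 15]
xs, ys = make_windows(toy, window=3)
for inp, tgt in zip(xs, ys):
    print(inp, "->", tgt)
# [10, 11, 12] -> 13
# [11, 12, 13] -> 14
# [12, 13, 14] -> 15
```

A sequence of length `n` yields `n - window` pairs, so the 163817 characters of the book with a window of 100 give 163717 training samples.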

In [12]:
from keras.utils import np_utils

In [13]:
# one-hot encode the targets, as required by the categorical_crossentropy loss
Y = np_utils.to_categorical(y)
# reshape the inputs to (samples, time steps, features) for the LSTM layer
X = np.reshape(x, (len(x), sequence_size, 1))

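An LSTM layer expects input data of shape (#samples, #timeSteps, #features); here X has shape (163717, 100, 1). The `input_shape` argument passed to the layer omits the #samples dimension, so it is (#timeSteps, #features).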
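
One detail of the one-hot step is worth illustrating: when the number of classes is not given, `to_categorical` infers it from the largest label, as `max(y) + 1`. The pure-Python stand-in below (`one_hot` is a hypothetical helper, not a Keras function) mimics that default. It is a plausible explanation for why the final Dense layer in the model summary further down has 60 outputs although there are 61 unique characters, assuming the byte order mark (index 60) occurs only at the very start of the text and therefore never appears as a target.

```python
def one_hot(labels, num_classes=None):
    """Pure-Python stand-in for one-hot encoding integer labels.

    When num_classes is omitted it is inferred as max(labels) + 1
    (assumed here to mirror the default behaviour of to_categorical).
    """
    if num_classes is None:
        num_classes = max(labels) + 1
    return [[1 if k == label else 0 for k in range(num_classes)]
            for label in labels]


# A vocabulary of 4 symbols (0..3), but index 3 never appears as a target.
targets = [0, 2, 1, 2]
print(len(one_hot(targets)[0]))                 # 3 columns, inferred
print(len(one_hot(targets, num_classes=4)[0]))  # 4 columns, explicit
```

In the notebook, passing `num_classes=len(chars)` to `to_categorical` would make the width of `Y` independent of which characters happen to occur as targets.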



In Keras, `LSTM(n)` means "create an LSTM layer consisting of n LSTM units". The following picture shows what a layer and a unit (or neuron) are, and the rightmost image shows the internal structure of a single LSTM unit.

![image1.png](./image1.png)

The following picture shows how the whole LSTM layer operates.



As we know, an LSTM layer processes a sequence, i.e., $\mathbf{x}_1, \dots, \mathbf{x}_N$. At each step $t$ the layer (each neuron) takes the input $\mathbf{x}_t$, the output from the previous step $\mathbf{h}_{t-1}$, and a bias $b$, and outputs a vector $\mathbf{h}_t$. The coordinates of $\mathbf{h}_t$ are the outputs of the neurons/units, and hence the size of the vector $\mathbf{h}_t$ is equal to the number of units/neurons. This process continues until $\mathbf{x}_N$ has been processed.

![image2.png](./image2.png)
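
The step-by-step processing described above can be sketched in NumPy. This is an illustration under stated assumptions, not the Keras implementation: the weight layout (four gate blocks stored side by side) and the gate order are assumed, the weights are random, and `lstm_step` is a hypothetical helper. The sketch also carries the cell state $\mathbf{c}_t$, which an LSTM unit keeps internally in addition to its output $\mathbf{h}_t$.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    x_t: (inp,), h_prev and c_prev: (out,)
    W: (inp, 4*out), U: (out, 4*out), b: (4*out,)
    The four column blocks hold the input gate, forget gate,
    candidate cell state, and output gate (an assumed ordering).
    """
    out = h_prev.shape[0]
    z = x_t @ W + h_prev @ U + b        # all four pre-activations at once
    i = sigmoid(z[0 * out:1 * out])     # input gate
    f = sigmoid(z[1 * out:2 * out])     # forget gate
    g = np.tanh(z[2 * out:3 * out])     # candidate cell state
    o = sigmoid(z[3 * out:4 * out])     # output gate
    c_t = f * c_prev + i * g            # updated cell state
    h_t = o * np.tanh(c_t)              # output of the layer at step t
    return h_t, c_t


inp, out, steps = 1, 3, 5               # 1 feature, 3 units, 5 time steps
rng = np.random.default_rng(0)
W = rng.normal(size=(inp, 4 * out))
U = rng.normal(size=(out, 4 * out))
b = np.zeros(4 * out)

h = np.zeros(out)
c = np.zeros(out)
for x_t in rng.normal(size=(steps, inp)):   # x_1, ..., x_N
    h, c = lstm_step(x_t, h, c, W, U, b)

print(h.shape)                      # (3,): one coordinate per unit
print(W.size + U.size + b.size)     # 60 weights in total for this toy layer
```

The final `h` has one coordinate per unit, regardless of how many time steps were processed.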

Now let's compute the number of parameters of an LSTM layer and compare it with what Keras shows when we call model.summary().

Let $\mathit{inp}$ be the size of the vector $\mathbf{x}_t$ and $\mathit{out}$ be the size of the vector $\mathbf{h}_t$ (this is also the number of neurons/units). Each neuron/unit takes the input vector, the output from the previous step, and a bias, which makes $\mathit{inp} + \mathit{out} + 1$ parameters (weights). Since we have $\mathit{out}$ neurons, this gives $\mathit{out} \times (\mathit{inp} + \mathit{out} + 1)$ parameters. Finally, each unit has 4 such sets of weights (see the rightmost image, yellow boxes), so we have the following formula for the number of parameters:

$$4 \cdot \mathit{out} \cdot (\mathit{inp} + \mathit{out} + 1)$$
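
As a quick check, the formula can be evaluated for the layers of the model defined in the next cell. The helper names below are hypothetical, and the Dense layer count (weights plus biases) is an addition that the text above does not derive.

```python
def lstm_param_count(inp, out):
    """Trainable parameters of an LSTM layer: 4 * out * (inp + out + 1)."""
    return 4 * out * (inp + out + 1)


def dense_param_count(inp, out):
    """Trainable parameters of a Dense layer: one weight per input/output
    pair plus one bias per output."""
    return inp * out + out


lstm_1 = lstm_param_count(inp=1, out=350)      # 1 feature per time step
lstm_2 = lstm_param_count(inp=350, out=350)    # fed by lstm_1's 350 outputs
dense_1 = dense_param_count(inp=350, out=60)   # 60 output classes

print(lstm_1, lstm_2, dense_1, lstm_1 + lstm_2 + dense_1)
# 492800 981400 21060 1495260
```

These values agree with the Param # column and the total reported by model.summary() below.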

In [14]:
model = Sequential()
model.add(LSTM(350, input_shape=(X.shape[1], X.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(350))
model.add(Dense(Y.shape[1], activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy')

In [15]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 100, 350)          492800    
_________________________________________________________________
dropout_1 (Dropout)          (None, 100, 350)          0         
_________________________________________________________________
lstm_2 (LSTM)                (None, 350)               981400    
_________________________________________________________________
dense_1 (Dense)              (None, 60)                21060     
=================================================================
Total params: 1,495,260
Trainable params: 1,495,260
Non-trainable params: 0
_________________________________________________________________


In [19]:
# save the weights only when the training loss improves on the best value seen so far
checkpoint = ModelCheckpoint(filepath="weights-{epoch:02d}-{loss:.4f}.hdf5", monitor="loss", save_best_only=True, mode="min", verbose=1)
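
The `filepath` argument is a format template that gets filled in with the epoch number and the logged metrics each time a checkpoint is written. The sketch below only shows how such a template expands with plain `str.format`; the epoch and loss values are made up for illustration and are not results from this notebook.

```python
# The same template string that is passed as `filepath` above.
template = "weights-{epoch:02d}-{loss:.4f}.hdf5"

# Hypothetical values standing in for what is supplied at the end of an epoch.
print(template.format(epoch=3, loss=1.23456))
# weights-03-1.2346.hdf5
```

With `save_best_only=True` and `monitor="loss"`, a file of this form is written only for epochs whose training loss is lower than any seen before, so the file with the highest epoch number holds the best weights.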

In [None]:
model.fit(X,Y, epochs=20, batch_size=128, callbacks=[checkpoint])

*Reference*: https://machinelearningmastery.com/text-generation-lstm-recurrent-neural-networks-python-keras/