### Required modules

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
import reader

The reader module is my own user-defined module. It contains functions to read the data files, build the vocabulary, and convert words to ids and back.
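The reader module itself is not shown here. As a rough idea of what such helpers could look like, here is a minimal sketch. The function bodies, the `<unk>` fallback and the frequency-based id order are all assumptions for illustration, not the actual reader code, and the sketch takes a list of tokens where the real functions take a file name and directory.

```python
from collections import Counter

def build_vocab(words):
    """Map each distinct word to an integer id, most frequent word first."""
    counts = Counter(words)
    # Sort by descending frequency, then alphabetically, so ids are deterministic.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: idx for idx, word in enumerate(ordered)}

def word_to_id(words, vocab, unk='<unk>'):
    """Convert words to ids; words missing from the vocabulary map to the <unk> id."""
    return [vocab.get(w, vocab[unk]) for w in words]

def id_to_word(ids, vocab):
    """Convert ids back to words by inverting the word -> id mapping."""
    inverse = {idx: word for word, idx in vocab.items()}
    return [inverse[int(i)] for i in ids]

# Toy corpus (hypothetical); the real data comes from the PTB text files.
tokens = 'the cat sat on the mat <unk> <eos>'.split()
toy_vocab = build_vocab(tokens)
ids = word_to_id(['the', 'dog', 'sat'], toy_vocab)
print(ids, id_to_word(ids, toy_vocab))
```

In this sketch the unseen word "dog" comes back as `<unk>`, which is how out-of-vocabulary words are usually handled with PTB-style data.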

### Reading data and building vocab

In [2]:
train=reader.read('ptb.train.txt','../Datasets ML/simple-examples/data')
valid=reader.read('ptb.valid.txt','../Datasets ML/simple-examples/data')
test=reader.read('ptb.test.txt','../Datasets ML/simple-examples/data')
vocab=reader.build_vocab('ptb.train.txt','../Datasets ML/simple-examples/data')
vocab_size=len(vocab)

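The function getdata converts a list of words into input and target sequences with the given number of time steps. <br>
Both returned arrays have shape (num_sequences, time_steps, 1), and each target sequence is the input sequence shifted by one word.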

In [3]:
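def getdata(data,time_steps):
    # convert words to integer ids using the training vocabulary
    data=reader.word_to_id(data,vocab)
    size=len(data)
    # number of full sequences; one word is held back so every input has a next-word target
    num_seq=(size-1)//time_steps
    x=np.zeros((num_seq,time_steps),dtype=int)
    y=np.zeros((num_seq,time_steps),dtype=int)
    for i in range(0,num_seq):
        x[i,:]=data[i*time_steps:(i+1)*time_steps]
        # targets are the same window shifted one word to the right
        y[i,:]=data[i*time_steps+1:(i+1)*time_steps+1]

    x=x.reshape((-1,time_steps,1))
    y=y.reshape((-1,time_steps,1))
    return x,y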
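To see the one-word shift between inputs and targets, here is a small self-contained sketch on a toy id sequence. The helper name make_windows and the toy data are made up for illustration. It uses slicing instead of a loop, skips the final reshape to a trailing dimension of 1, and should otherwise produce the same windows as getdata.

```python
import numpy as np

def make_windows(ids, time_steps):
    """Split a 1-D id sequence into (input, target) windows shifted by one step."""
    ids = np.asarray(ids)
    num_seq = (len(ids) - 1) // time_steps
    # Inputs: consecutive non-overlapping windows of length time_steps.
    x = ids[:num_seq * time_steps].reshape(num_seq, time_steps)
    # Targets: the same windows, each moved one position to the right.
    y = ids[1:num_seq * time_steps + 1].reshape(num_seq, time_steps)
    return x, y

toy_ids = list(range(10))          # stand-in for a sequence of word ids
x_toy, y_toy = make_windows(toy_ids, 3)
print(x_toy)   # rows: [0 1 2], [3 4 5], [6 7 8]
print(y_toy)   # rows: [1 2 3], [4 5 6], [7 8 9]
```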

In [4]:
x_data_trn,y_data_trn=getdata(train,30)
x_data_val,y_data_val=getdata(valid,30)

### Building model

Now we will build a stacked LSTM model: two LSTM cells are combined with StackedRNNCells and wrapped in a Keras RNN layer. A Batch Normalization layer is added for more stable learning. Because the RNN returns a sequence, i.e. one output per time step, a Time Distributed layer applies the same Dense layer at every time step, and a softmax activation turns each step's output into a probability distribution over the vocabulary.

In [5]:
inp=keras.Input(shape=(30,))
embedding=keras.layers.Embedding(vocab_size,500,input_length=30)
x=embedding(inp)

l1=keras.layers.LSTMCell(500)
l2=keras.layers.LSTMCell(500)
rnn_stacked=keras.layers.StackedRNNCells([l1,l2])
rnn=keras.layers.RNN(rnn_stacked,return_sequences=True)
x=rnn(x)

norm=keras.layers.BatchNormalization()
x=norm(x)

dense=keras.layers.Dense(vocab_size)
td=keras.layers.TimeDistributed(dense)
x=td(x)

activation=keras.layers.Activation('softmax')
out=activation(x)

model=keras.Model(inp,out)
# targets are integer word ids, so the accuracy metric must be the sparse variant as well
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['sparse_categorical_accuracy'])
model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 30)]              0         
_________________________________________________________________
embedding (Embedding)        (None, 30, 500)           5000000   
_________________________________________________________________
rnn (RNN)                    (None, 30, 500)           4004000   
_________________________________________________________________
batch_normalization (BatchNo (None, 30, 500)           2000      
_________________________________________________________________
time_distributed (TimeDistri (None, 30, 10000)         5010000   
_________________________________________________________________
activation (Activation)      (None, 30, 10000)         0         
Total params: 14,016,000
Trainable params: 14,015,000
Non-trainable params: 1,000
_____________________________________________
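The parameter counts in the summary can be checked by hand. The sketch below recomputes them from the layer sizes, assuming the standard Keras parameterisation: an LSTM cell has four gates, each with input weights, recurrent weights and a bias, and Batch Normalization keeps four vectors per feature. The vocabulary size of 10000 is read off the summary.

```python
vocab_size, embed_dim, units = 10000, 500, 500

def lstm_cell_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias vector."""
    return 4 * ((input_dim + units) * units + units)

embedding_params = vocab_size * embed_dim
# Two stacked cells: the first reads the embeddings, the second reads the first cell's output.
rnn_params = lstm_cell_params(embed_dim, units) + lstm_cell_params(units, units)
# gamma and beta are trainable; moving mean and moving variance are not.
batch_norm_params = 4 * units
non_trainable = 2 * units
# Dense layer applied at every time step: weights plus bias.
dense_params = units * vocab_size + vocab_size

total = embedding_params + rnn_params + batch_norm_params + dense_params
print(embedding_params, rnn_params, batch_norm_params, dense_params)
print(total, total - non_trainable, non_trainable)
```

The totals agree with the summary: 14,016,000 parameters, of which 1,000 (the Batch Normalization moving statistics) are non-trainable.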

Now, we are all set to train our LSTM model.

#### Training

To speed up training we will use a GPU instead of the CPU, with a batch size of 20. Language models are slow to train, and this one needs at least 40 epochs.

In [6]:
with tf.device('/device:GPU:0'):
    model.fit(x_data_trn, y_data_trn, batch_size=20, epochs=40, validation_data=(x_data_val,y_data_val))

### Saving and loading model

Instead of saving the whole model as a single H5 file, we will save the model weights and the model configuration separately, as this takes much less space.

> Saving model weights and configuration

In [7]:
model.save_weights('./weights.h5',save_format='h5')

In [8]:
string=model.to_json()
# the with block closes the file on exit, so no explicit close() is needed
with open('./model_config.json','w') as file:
    file.write(string)

> Loading

In [9]:
with open('./model_config.json','r') as file:
    config=file.read()

In [10]:
loaded=keras.models.model_from_json(config)

In [11]:
loaded.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 30)]              0         
_________________________________________________________________
embedding (Embedding)        (None, 30, 500)           5000000   
_________________________________________________________________
rnn (RNN)                    (None, 30, 500)           4004000   
_________________________________________________________________
batch_normalization (BatchNo (None, 30, 500)           2000      
_________________________________________________________________
time_distributed (TimeDistri (None, 30, 10000)         5010000   
_________________________________________________________________
activation (Activation)      (None, 30, 10000)         0         
Total params: 14,016,000
Trainable params: 14,015,000
Non-trainable params: 1,000
_____________________________________________

In [12]:
loaded.load_weights('./weights.h5')

Voila!!!!

### Testing model

Now we come to the testing part: how does our model perform on an arbitrary input from the user?

In [13]:
import json
# vocab.json is assumed to have been written beforehand from the training vocabulary
with open('./vocab.json','r') as file:
    vocab=json.load(file)

# adjacent string literals are joined, which avoids embedding the indentation in the text
context=('Despite the fact that tea has been popular in the UK for hundreds of years, the question of when to add the milk '
         'is one which still provokes many an argument')
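The cell above reads vocab.json, but the step that writes it is not shown in this notebook. Assuming the vocabulary is a plain dict from word to id (an assumption, since the reader module is not shown), one way to write and restore it is sketched below. The toy vocabulary and the temporary path are placeholders.

```python
import json
import os
import tempfile

# Hypothetical toy vocabulary standing in for the real one.
toy_vocab = {'the': 0, '<unk>': 1, '<eos>': 2}

# Write to a temporary directory so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), 'vocab.json')
with open(path, 'w') as f:
    json.dump(toy_vocab, f)

with open(path, 'r') as f:
    restored = json.load(f)

print(restored == toy_vocab)
```

JSON object keys are always strings, so a word to id mapping survives the round trip unchanged, whereas an id to word mapping with integer keys would come back with string keys.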

Next we define a prediction function that generates a given number of words following the context.

In [14]:
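def predict(context,time_steps,num):
    # treat each full stop as an end-of-sentence token, then split into words
    words=context.replace('.',' <eos>').split()
    for i in range(0,num):
        # the model sees only the last time_steps words (the context must be at least that long)
        inp=words[-time_steps:]
        inp=reader.word_to_id(inp,vocab)
        inp=np.reshape(inp,(1,time_steps,1))
        out=model.predict(inp)
        # most likely word id at every time step
        out=np.argmax(out,axis=2)[0]
        results=reader.id_to_word(out,vocab)
        # the prediction at the last step is the next word; append it and repeat
        words.append(results[-1])
    return words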
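The loop in predict is a greedy, word-by-word generation: take the last time_steps words, pick the most likely word at the final step, append it, and slide the window. The sketch below replays that loop with a stand-in for the model so it runs without TensorFlow. The toy vocabulary and fake_predict are invented for illustration and have nothing to do with the trained network.

```python
import numpy as np

# Hypothetical three-word vocabulary and its inverse.
toy_vocab = {'a': 0, 'b': 1, 'c': 2}
inverse = {i: w for w, i in toy_vocab.items()}

def fake_predict(batch):
    """Stand-in for model.predict: returns an array of shape (1, time_steps, vocab_size)
    that puts all probability on the id following each input id (cyclically)."""
    ids = batch.reshape(-1)
    probs = np.zeros((1, len(ids), len(toy_vocab)))
    probs[0, np.arange(len(ids)), (ids + 1) % len(toy_vocab)] = 1.0
    return probs

def generate(words, time_steps, num):
    words = list(words)
    for _ in range(num):
        # Only the most recent time_steps words are fed to the model.
        window = words[-time_steps:]
        ids = np.array([toy_vocab[w] for w in window]).reshape(1, time_steps, 1)
        probs = fake_predict(ids)
        # Greedy choice: most likely id at each step; the last step is the next word.
        best = np.argmax(probs, axis=2)[0]
        words.append(inverse[int(best[-1])])
    return words

generated = generate(['a', 'b'], time_steps=2, num=3)
print(generated)
```

Because the choice is always the argmax, the same context always yields the same continuation; sampling from the predicted distribution instead would give more varied text.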

Checking results

In [15]:
results=predict(context,30,4)
print(' '.join(results))

Despite the fact that tea has been popular in the UK for hundreds of years, the question of when to add the milk is one which still provokes many an argument against baker boys it
