# Training Chatbot Model

**Design and build a simple chatbot with Keras, trained on the Cornell Movie-Dialogs Corpus**

#### Model structure
In short, the input sequence (the question asked of the chatbot) is passed into the encoder LSTM, which outputs its final hidden and cell states. These final states initialize the decoder LSTM, which also receives the output sequence (the reply to the question in the training data) as its input. The training target for the decoder is that same reply, shifted one time step to the left. That is, if the reply (the input to the decoder LSTM) is 'I am fine', the first time step takes 'I' as input and should output 'am', the second time step takes 'am' as input and should output 'fine', and so on.

![Alt Text](https://cdn-images-1.medium.com/max/1600/1*Ismhi-muID5ooWf3ZIQFFg.png)
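
The one-step shift described above can be illustrated in plain Python. This is a minimal sketch and not part of the notebook's pipeline: the helper name `teacher_forcing_pairs` is hypothetical, and the reply is assumed to be tokenized already.

```python
def teacher_forcing_pairs(reply_tokens):
    """Pair each decoder input token with the token it should predict."""
    # hypothetical helper, for illustration only
    return list(zip(reply_tokens[:-1], reply_tokens[1:]))


pairs = teacher_forcing_pairs(['I', 'am', 'fine'])
for step, (decoder_input, expected_output) in enumerate(pairs, start=1):
    print(f"step {step}: input={decoder_input!r} -> target={expected_output!r}")
# step 1: input='I' -> target='am'
# step 2: input='am' -> target='fine'
```

The last token of the reply has no successor, so it produces no pair. In the batch generator further down, that last word is still fed to the decoder, but its target row stays all zeros.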

In [1]:
import pickle
import numpy as np
from keras.models import Model
from keras.layers import LSTM
from keras.layers import Dense, Input, Embedding
from keras.preprocessing.sequence import pad_sequences
from keras.callbacks import ModelCheckpoint
from sklearn.model_selection import train_test_split

Using TensorFlow backend.


### Parameters

In [2]:
# batch size
BATCH_SIZE = 64

# Number of Epochs
NUM_EPOCHS = 25 #100

# GLOVE Embedding Size
GLOVE_EMBEDDING_SIZE = 100

# RNN Size
HIDDEN_UNITS = 256

# vocabulary size
MAX_VOCAB_SIZE = 14000 #1000

# Model file path
WEIGHT_FILE_PATH = 'models/keras-glove-weights.h5'

### Retrieve Parameters

In [3]:
def load_preprocess():
    """
    Load the Preprocessed Training data and return them
    """
    with open('models/preprocess.p', mode='rb') as in_file:
        return pickle.load(in_file)

In [4]:
# call loading function
((context),
(input_texts_word2em),
(target_texts),
(word2em),
(target_word2idx, target_idx2word)) = load_preprocess()

In [5]:
num_decoder_tokens = context['num_decoder_tokens']
encoder_max_seq_length = context['encoder_max_seq_length']
decoder_max_seq_length = context['decoder_max_seq_length']

### Generate Input batches

In [6]:
def generate_batch(input_word2em_data, output_text_data):
    '''
    Yield ([encoder_input, decoder_input], decoder_target) batches forever.

    encoder_input:  padded GloVe vectors of the question
    decoder_input:  GloVe vectors of the reply
    decoder_target: one-hot reply, shifted one time step to the left
    '''
    num_batches = len(input_word2em_data) // BATCH_SIZE  # leftover samples are dropped
    while True:
        for batchIdx in range(0, num_batches):
            start = batchIdx * BATCH_SIZE
            end = (batchIdx + 1) * BATCH_SIZE  # exclusive end of this batch
            # pad_sequences defaults to dtype='int32', which would truncate the float embeddings
            encoder_input_data_batch = pad_sequences(input_word2em_data[start:end], encoder_max_seq_length,
                                                     dtype='float32')
            decoder_target_data_batch = np.zeros(shape=(BATCH_SIZE, decoder_max_seq_length, num_decoder_tokens))
            decoder_input_data_batch = np.zeros(shape=(BATCH_SIZE, decoder_max_seq_length, GLOVE_EMBEDDING_SIZE))
            for lineIdx, target_words in enumerate(output_text_data[start:end]):
                for idx, w in enumerate(target_words):
                    w2idx = target_word2idx['unknown']  # default unknown
                    if w in target_word2idx:
                        w2idx = target_word2idx[w]
                    # words without a GloVe vector keep an all-zero input row
                    if w in word2em:
                        decoder_input_data_batch[lineIdx, idx, :] = word2em[w]
                    # the target at step idx - 1 is the word fed in at step idx
                    if idx > 0:
                        decoder_target_data_batch[lineIdx, idx - 1, w2idx] = 1
            yield [encoder_input_data_batch, decoder_input_data_batch], decoder_target_data_batch
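
`pad_sequences` pads and truncates at the front of each sequence by default (`padding='pre'`, `truncating='pre'`), so short questions end up right-aligned in the encoder input. The pure-Python sketch below mimics that behaviour on a toy example. `pad_pre` is a hypothetical stand-in written for illustration, not a Keras function, and the 2-dimensional vectors stand in for the 100-dimensional GloVe embeddings.

```python
def pad_pre(sequence, maxlen, fill):
    """Left-pad (or left-truncate) a list of vectors to exactly maxlen steps."""
    # hypothetical stand-in for the default pad_sequences behaviour; assumes maxlen >= 1
    trimmed = sequence[-maxlen:]          # keep the last maxlen steps
    missing = maxlen - len(trimmed)
    return [list(fill) for _ in range(missing)] + trimmed


question = [[0.1, 0.2], [0.3, 0.4]]       # two words, toy 2-d embeddings
padded = pad_pre(question, maxlen=4, fill=[0.0, 0.0])
print(padded)
# [[0.0, 0.0], [0.0, 0.0], [0.1, 0.2], [0.3, 0.4]]
```

The decoder arrays in `generate_batch` are filled from index 0 instead, so replies are padded at the end rather than at the front.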

### Build the Model

   - Neural Network Layers

In [None]:
# encoder
encoder_inputs = Input(shape=(None, GLOVE_EMBEDDING_SIZE), name='encoder_inputs')
encoder_lstm = LSTM(units=HIDDEN_UNITS, return_state=True, name='encoder_lstm')
encoder_outputs, encoder_state_h, encoder_state_c = encoder_lstm(encoder_inputs)
encoder_states = [encoder_state_h, encoder_state_c]

# decoder
decoder_inputs = Input(shape=(None, GLOVE_EMBEDDING_SIZE), name='decoder_inputs')
decoder_lstm = LSTM(units=HIDDEN_UNITS, return_state=True, return_sequences=True, name='decoder_lstm')
decoder_outputs, decoder_state_h, decoder_state_c = decoder_lstm(decoder_inputs,
                                                                 initial_state=encoder_states)
decoder_dense = Dense(units=num_decoder_tokens, activation='softmax', name='decoder_dense')
decoder_outputs = decoder_dense(decoder_outputs)

# pass inputs to model
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

# compile
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

In [None]:
# save model architecture as a JSON string
architecture_json = model.to_json()
with open('models/keras-glove-architecture.p', 'wb') as out_file:
    pickle.dump(architecture_json, out_file)


   - Train-test split

In [None]:
Xtrain, Xtest, Ytrain, Ytest = train_test_split(input_texts_word2em, target_texts, test_size=0.2)

print(len(Xtrain))
print(len(Xtest))

   - Train model

In [7]:
# get train and test batches
train_gen = generate_batch(Xtrain, Ytrain)
test_gen = generate_batch(Xtest, Ytest)

# number of batches per epoch
train_num_batches = len(Xtrain) // BATCH_SIZE
test_num_batches = len(Xtest) // BATCH_SIZE

checkpoint = ModelCheckpoint(filepath=WEIGHT_FILE_PATH, save_best_only=True)
model.fit_generator(generator=train_gen, steps_per_epoch=train_num_batches,
                    epochs=NUM_EPOCHS,
                    verbose=1, validation_data=test_gen, validation_steps=test_num_batches, callbacks=[checkpoint])

# note: this overwrites the best checkpoint saved above with the final-epoch weights
model.save_weights(WEIGHT_FILE_PATH)




86508
21628
Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25

Epoch 9/25
Epoch 10/25

Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25
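
The split sizes printed in the output above (86508 training and 21628 test pairs) determine how many batches each epoch runs. The arithmetic below reproduces `train_num_batches` and `test_num_batches` for `BATCH_SIZE = 64` and shows how many samples the integer division leaves out of every epoch. The variable names are chosen for this sketch only.

```python
BATCH_SIZE = 64
split_sizes = {'train': 86508, 'test': 21628}   # values printed by the notebook

for name, num_samples in split_sizes.items():
    # floor division gives the batch count, the remainder never reaches the model
    num_batches, leftover = divmod(num_samples, BATCH_SIZE)
    print(f"{name}: {num_batches} batches per epoch, {leftover} samples unused")
# train: 1351 batches per epoch, 44 samples unused
# test: 337 batches per epoch, 60 samples unused
```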


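There are a few challenges in using this model. The most troublesome one is that it does not handle variable-length sequences directly: every sequence in a batch has to be padded to a fixed length (`encoder_max_seq_length` for questions, `decoder_max_seq_length` for replies). The next one is the vocabulary size. The decoder has to run a softmax over the whole vocabulary for each word in the output, and the one-hot targets grow with the vocabulary as well. That slows down training, even if your hardware is capable of handling it.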
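
To give a rough sense of the vocabulary cost, the sketch below counts trainable parameters using the standard formulas for LSTM and Dense layers. It assumes `num_decoder_tokens` equals `MAX_VOCAB_SIZE` (14000); the real value is loaded from `context` and may be smaller. `decoder_max_seq_length = 20` is a purely hypothetical value used to size one target batch.

```python
GLOVE_EMBEDDING_SIZE = 100
HIDDEN_UNITS = 256
BATCH_SIZE = 64
num_decoder_tokens = 14000       # assumption: vocabulary filled up to MAX_VOCAB_SIZE
decoder_max_seq_length = 20      # hypothetical, the real value comes from `context`

# LSTM: 4 gates, each with input weights, recurrent weights and a bias
lstm_params = 4 * ((GLOVE_EMBEDDING_SIZE + HIDDEN_UNITS) * HIDDEN_UNITS + HIDDEN_UNITS)
# Dense softmax layer: one weight per (hidden unit, token) plus one bias per token
dense_params = (HIDDEN_UNITS + 1) * num_decoder_tokens
# One-hot target batch as built with np.zeros (float64, 8 bytes per entry)
target_batch_bytes = BATCH_SIZE * decoder_max_seq_length * num_decoder_tokens * 8

print(lstm_params)          # 365568
print(dense_params)         # 3598000
print(target_batch_bytes)   # 143360000
```

Under these assumptions the softmax layer holds roughly ten times as many parameters as either LSTM, and a single one-hot target batch takes about 143 MB.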