<div class="alert alert-block alert-info">
<b>Importing Necessary Libraries:</b> We import the libraries needed for the analysis. One of them is scikit-optimize (skopt), which we use to tune the hyperparameters of the Seq2Seq model.
</div>

In [20]:
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
from random import shuffle
import os
import re
from keras.models import Sequential,load_model
from sklearn.model_selection import train_test_split
from keras.layers import Bidirectional,Dense, CuDNNGRU, CuDNNLSTM, RepeatVector, TimeDistributed, BatchNormalization, Embedding
from keras.optimizers import *
from keras.callbacks import *
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical
from keras import backend as K  # explicit import for K.clear_session() used in fitness()

import skopt
from skopt import gp_minimize, forest_minimize
from skopt.space import Real, Categorical, Integer
from skopt.plots import plot_convergence
from skopt.plots import plot_objective, plot_evaluations
from skopt.utils import use_named_args

<div class="alert alert-block alert-info">
<b>Loading the data for analysis:</b> We load the data and extract the question-answer pairs from it.
</div>

In [2]:
df_professional = pd.read_csv('qna_chitchat_professional.tsv',sep='\t')

In [3]:
df_professional.head(4)

Unnamed: 0,Question,Answer,Source,Metadata
0,Are we the same age?,Age doesn't really apply to me.,qna_chitchat_professional,editorial:chitchat
1,Are you a baby?,Age doesn't really apply to me.,qna_chitchat_professional,editorial:chitchat
2,Are you a grown up?,Age doesn't really apply to me.,qna_chitchat_professional,editorial:chitchat
3,Are you a grownup?,Age doesn't really apply to me.,qna_chitchat_professional,editorial:chitchat


<div class="alert alert-block alert-info">
<b>Data Cleaning (Text Cleaning):</b> We preprocess the loaded text by removing punctuation and digits and converting all words to lowercase.
</div>

In [4]:
clean_text                  = []
for i in range(len(df_professional)):
    ques_text               = df_professional.loc[i]['Question']
    ans_text                = df_professional.loc[i]['Answer']
    
    ques_text_split         = ques_text.split()
    ans_text_split          = ans_text.split()
    
    # Strip every non-letter character (punctuation, digits) from each token.
    ques_text_punc_removed  = ' '.join([re.sub('[^A-Za-z]+', '', word) for word in ques_text_split])
    ans_text_punc_removed   = ' '.join([re.sub('[^A-Za-z]+', '', word) for word in ans_text_split])
    
    # Drop tokens left empty by the step above and lowercase the rest.
    ques_text_nums_removed  = ' '.join([word.lower() for word in ques_text_punc_removed.split() if word.isalpha()])
    ans_text_nums_removed   = ' '.join([word.lower() for word in ans_text_punc_removed.split() if word.isalpha()])
    
    clean_text.append([ques_text_nums_removed,ans_text_nums_removed])
clean_text                  = np.array(clean_text)
print('Punctuations removed and all the sentences converted to smaller case.')

Punctuations removed and all the sentences converted to smaller case.

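The cleaning loop above can be condensed into a single helper. The sketch below is a minimal version, assuming a hypothetical function name `clean_sentence` that is not part of the original notebook. It applies the same regular expression and the same `isalpha` filter to one sentence at a time.

```python
import re

def clean_sentence(text):
    """Strip non-letters from each token, drop empty tokens, and lowercase."""
    # Remove every character that is not an ASCII letter, token by token.
    stripped = [re.sub('[^A-Za-z]+', '', token) for token in text.split()]
    # Tokens made only of digits or punctuation are now empty; drop them.
    return ' '.join(token.lower() for token in stripped if token.isalpha())

print(clean_sentence("Are we the same age?"))   # are we the same age
print(clean_sentence("What's your name ?"))     # whats your name
```

Because the apostrophe is deleted rather than replaced by a space, "What's" becomes "whats". A query typed at prediction time would probably need the same treatment to match the training vocabulary, since the default Keras `Tokenizer` filters do not strip apostrophes.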

<div class="alert alert-block alert-info">
<b>Tokenization and Vocabulary creation:</b> We tokenize the text and build separate vocabularies for questions and answers, as the Seq2Seq architecture requires. We then compute T_x, the maximum length of a question (the input to the encoder), and T_y, the maximum length of an answer (the output of the decoder).
</div>

In [5]:
ques_tokenizer       = Tokenizer()
ques_tokenizer.fit_on_texts(clean_text[:,0])
question_vocab       = len(ques_tokenizer.word_index) + 1
question_length      = max([len(clean_text[:,0][i].split()) for i in range(len(clean_text[:,0]))])
print('The number of distinct words in the Questions is ',question_vocab)
print('The maximum length of sentence in Question is    ',question_length)

The number of distinct words in the Questions is  2028
The maximum length of sentence in Question is     15


In [6]:
ans_tokenizer        = Tokenizer()
ans_tokenizer.fit_on_texts(clean_text[:,1])
ans_vocab            = len(ans_tokenizer.word_index) + 1
ans_length           = max([len(clean_text[:,1][i].split()) for i in range(len(clean_text[:,1]))])
print('The number of distinct words in the Answers is ',ans_vocab)
print('The maximum length of sentence in Answers is   ',ans_length)

The number of distinct words in the Answers is  196
The maximum length of sentence in Answers is    12

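To make the `+ 1` in the vocabulary size concrete, here is a simplified pure-Python stand-in for what `Tokenizer.fit_on_texts` does. It is not the Keras implementation, and `build_word_index` is a hypothetical name. Words are ranked by frequency starting at 1, which leaves index 0 free for padding, so the sizes printed above include that extra slot.

```python
from collections import Counter

def build_word_index(sentences):
    """Map each word to an integer rank, most frequent word first, starting at 1."""
    counts = Counter(word for sentence in sentences for word in sentence.split())
    # sorted() is stable, so words with equal counts keep their first-seen order.
    ranked = sorted(counts, key=lambda word: -counts[word])
    return {word: rank for rank, word in enumerate(ranked, start=1)}

sentences = ["are you a baby", "are you a grown up", "are we the same age"]
word_index = build_word_index(sentences)
vocab_size = len(word_index) + 1      # +1 because index 0 is kept for padding
max_length = max(len(s.split()) for s in sentences)
print(vocab_size, max_length)         # 11 5
```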

<div class="alert alert-block alert-info">
<b>Encoding sequences:</b> We now encode the questions and answers as integer sequences and split them into training and test sets.
</div>

In [7]:
def encode_sequences(tokenizer, length, lines):
    X = tokenizer.texts_to_sequences(lines)
    X = pad_sequences(X, maxlen=length, padding='post')
    return X
def encode_output(sequences, vocab_size):
    y_list = []
    for sequence in sequences:
        encoded = to_categorical(sequence, num_classes=vocab_size)
        y_list.append(encoded)
    y = np.array(y_list)
    y = y.reshape(sequences.shape[0], sequences.shape[1], vocab_size)
    return y

In [8]:
X = encode_sequences(ques_tokenizer, question_length, clean_text[:, 0])
Y = encode_sequences(ans_tokenizer, ans_length, clean_text[:, 1])
Y = encode_output(Y, ans_vocab)

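The two encoding steps can be illustrated without Keras. The sketch below uses hypothetical helpers `pad_post` and `one_hot` that mimic `pad_sequences(..., padding='post')` with its default front truncation and `to_categorical`. After both steps `Y` has shape (samples, ans_length, ans_vocab), here (samples, 12, 196).

```python
def pad_post(sequence, length):
    """Keep the last `length` ids, then right-pad with zeros."""
    trimmed = sequence[-length:]
    return trimmed + [0] * (length - len(trimmed))

def one_hot(sequence, vocab_size):
    """Turn each integer id into an indicator row of vocab_size entries."""
    return [[1 if column == word_id else 0 for column in range(vocab_size)]
            for word_id in sequence]

padded = pad_post([4, 2, 7], length=5)
encoded = one_hot(padded, vocab_size=8)
print(padded)                           # [4, 2, 7, 0, 0]
print(len(encoded), len(encoded[0]))    # 5 8
```

Padding positions are encoded as class 0, so the decoder is also trained to emit index 0 once the answer has ended.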
In [9]:
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=42)

<div class="alert alert-block alert-info">
<b>Model Creation:</b> We write a function that builds the model from the hyperparameters that will be tuned.
</div>

In [10]:
def create_model(learning_rate,num_rnn_nodes,emb_size):
    model  = Sequential()
    model.add(Embedding(question_vocab, emb_size, input_length=question_length))
    model.add(BatchNormalization())
    model.add(Bidirectional(CuDNNLSTM(num_rnn_nodes)))
    model.add(BatchNormalization())
    model.add(RepeatVector(ans_length))
    model.add(Bidirectional(CuDNNLSTM(num_rnn_nodes, return_sequences=True)))
    model.add(BatchNormalization())
    model.add(TimeDistributed(Dense(ans_vocab, activation='softmax')))
    model.compile(optimizer=adam(lr=learning_rate), loss='categorical_crossentropy', metrics = ['accuracy'])
    return model


<div class="alert alert-block alert-info">
<b>Hyperparameter ranges and initialization:</b> We specify the range to search for each hyperparameter and the initial values from which the search starts.
</div>

In [11]:
learning_rate      = Real(low=1e-6, high=1e-2, prior='log-uniform',name='learning_rate')
num_rnn_nodes      = Integer(low=20, high=800, name='num_rnn_nodes')
emb_size           = Integer(low=10,high=100,name='emb_size')
#batch_size         = Integer(low=4,high=128,name='batch_size')
dimensions         = [learning_rate,num_rnn_nodes,emb_size]
default_parameters = [0.005, 600, 20]
best_accuracy      = 0.0
path_best_model    = 'chatbot_best_model.keras'

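The `log-uniform` prior on the learning rate means that candidates are spread evenly across orders of magnitude rather than across raw values. The sketch below illustrates the idea with the standard library only. `sample_log_uniform` is a hypothetical helper, not a skopt function, and the seed is arbitrary.

```python
import math
import random

def sample_log_uniform(low, high, rng):
    """Draw a value whose base-10 logarithm is uniform on [log10(low), log10(high)]."""
    exponent = rng.uniform(math.log10(low), math.log10(high))
    return 10 ** exponent

rng = random.Random(0)
samples = [sample_log_uniform(1e-6, 1e-2, rng) for _ in range(1000)]
# Fraction of draws below 1e-4, the midpoint of the range in log space.
below_midpoint = sum(s < 1e-4 for s in samples) / len(samples)
print(round(below_midpoint, 2))
```

Roughly half of the draws land below 1e-4. With a plain uniform prior on [1e-6, 1e-2], almost none would, and small learning rates would hardly ever be tried.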
<div class="alert alert-block alert-info">
<b>Fitting and hyperparameter tuning:</b> The function below is the objective for the hyperparameter search. It calls the model-creation function defined above, sets up the callbacks, trains the model, and saves it whenever its validation accuracy beats the best accuracy seen so far.
</div>

In [12]:
@use_named_args(dimensions=dimensions)
def fitness(learning_rate, num_rnn_nodes,emb_size):
    print('learning rate: {0:.1e}'.format(learning_rate))
    print('RNN Nodes:', num_rnn_nodes)
    print('Embedding size:', emb_size)
    
    model = create_model(learning_rate=learning_rate,
                         num_rnn_nodes=num_rnn_nodes,
                         emb_size=emb_size)

    callback_log = TensorBoard(
        histogram_freq=0,
        batch_size=32,
        write_graph=True,
        write_grads=False,
        write_images=False)
    es         = EarlyStopping(monitor='val_acc', patience=15, verbose=1, mode='auto', baseline=None, 
                          restore_best_weights=True)   
    history = model.fit(X_train,y_train,
                        epochs=30,
                        batch_size=32,
                        validation_data=(X_test,y_test),
                        callbacks=[callback_log]+[es])
    accuracy = max(history.history['val_acc'])
    print("Accuracy: ",accuracy)
    global best_accuracy
    if accuracy > best_accuracy:
        model.save(path_best_model)
        best_accuracy = accuracy
    del model
    K.clear_session()
    return -accuracy

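The effect of `patience=15` in a 30-epoch run can be sketched in a few lines. This is a simplified model of patience-based stopping under the assumption that any strict improvement resets the counter. The real Keras callback also handles `min_delta`, `baseline`, and weight restoration. The score curve is hypothetical.

```python
def early_stop_epoch(val_scores, patience):
    """Return the 1-based epoch at which training stops, or None if it runs to the end."""
    best = float('-inf')
    waited = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best:
            best, waited = score, 0      # improvement resets the counter
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return None

# Hypothetical curve: the best score arrives at epoch 2 and is never beaten.
scores = [0.50, 0.57] + [0.55] * 28
print(early_stop_epoch(scores, patience=15))   # 17
```

Under this model, a run that stops at epoch 17 would be consistent with its best validation accuracy having been reached at epoch 2.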
In [13]:
search_result = gp_minimize(func=fitness,
                            dimensions=dimensions,
                            acq_func='EI', # Expected Improvement.
                            n_calls=40,
                            x0=default_parameters)

learning rate: 5.0e-03
RNN Nodes: 600
Embedding size: 20
Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Use tf.cast instead.
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.8515412478909531
learning rate: 2.1e-03
RNN Nodes: 662
Embedding size: 37
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.8996297998095466
learning rate: 4.7e-03
RNN Nodes: 578
Embedding size: 98
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.7401405268653565
learning rate: 6.9e-03
RNN Nodes: 637
Embedding size: 64
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Restoring model weights from the end of the best epoch
Epoch 00017: early stopping
Accuracy:  0.5677319409516542
learning rate: 2.8e-04
RNN Nodes: 723
Embedding size: 25
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.9001586585831663
learning rate: 8.0e-06
RNN Nodes: 317
Embedding size: 32
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.6487987301939742
learning rate: 1.3e-03
RNN Nodes: 596
Embedding size: 89
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.9228996701491279
learning rate: 4.2e-06
RNN Nodes: 579
Embedding size: 15
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.6378059841131797
learning rate: 5.8e-04
RNN Nodes: 375
Embedding size: 62
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.917648837064897
learning rate: 1.5e-04
RNN Nodes: 221
Embedding size: 44
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.8902236325353897
learning rate: 4.7e-05
RNN Nodes: 112
Embedding size: 61
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.7418404342581333
learning rate: 1.4e-03
RNN Nodes: 424
Embedding size: 10
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.9031429431812393
learning rate: 6.2e-04
RNN Nodes: 565
Embedding size: 100
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.9196887280005059
learning rate: 9.3e-04
RNN Nodes: 757
Embedding size: 38
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.9099803549640306
learning rate: 1.0e-06
RNN Nodes: 192
Embedding size: 100
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.23783620458403612
learning rate: 7.1e-05
RNN Nodes: 20
Embedding size: 10
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.6353505590452677
learning rate: 1.7e-04
RNN Nodes: 773
Embedding size: 100
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.9156467210457526
learning rate: 2.4e-04
RNN Nodes: 800
Embedding size: 68
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.9183665775539436
learning rate: 2.8e-03
RNN Nodes: 20
Embedding size: 10
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.8728467822615277
learning rate: 3.4e-04
RNN Nodes: 20
Embedding size: 45
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.7714188571205381
learning rate: 5.6e-05
RNN Nodes: 800
Embedding size: 10
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.8261181612719435
learning rate: 6.5e-04
RNN Nodes: 800
Embedding size: 10
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.8856149882699615
learning rate: 4.7e-04
RNN Nodes: 800
Embedding size: 85
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.9207464473310213
learning rate: 5.4e-04
RNN Nodes: 800
Embedding size: 100
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.9149289824480591
learning rate: 1.7e-04
RNN Nodes: 496
Embedding size: 60
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.9088093091008453
learning rate: 2.2e-03
RNN Nodes: 800
Embedding size: 10
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Restoring model weights from the end of the best epoch
Epoch 00018: early stopping
Accuracy:  0.5737005117149214
learning rate: 3.6e-04
RNN Nodes: 520
Embedding size: 100
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoc

Accuracy:  0.9211619875696931
learning rate: 1.5e-03
RNN Nodes: 594
Embedding size: 10
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.8862571786558855
learning rate: 3.8e-04
RNN Nodes: 523
Embedding size: 100
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.9190465383170846
learning rate: 6.7e-03
RNN Nodes: 315
Embedding size: 10
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.8467437288270467
learning rate: 4.0e-04
RNN Nodes: 523
Embedding size: 10
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.894227861115204
learning rate: 4.1e-04
RNN Nodes: 524
Embedding size: 100
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.9191220926892617
learning rate: 7.2e-04
RNN Nodes: 691
Embedding size: 100
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Restoring model weights from the end of the best epoch
Epoch 00028: early stopping
Accuracy:  0.9190087631844653
learning rate: 1.5e-03
RNN Nodes: 552
Embedding size: 46
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.9097536998992601
learning rate: 3.5e-03
RNN Nodes: 504
Embedding size: 10
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.7036113647688332
learning rate: 3.5e-04
RNN Nodes: 552
Embedding size: 100
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.9209353260743218
learning rate: 3.6e-04
RNN Nodes: 552
Embedding size: 100
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.9233529738480679
learning rate: 9.5e-04
RNN Nodes: 544
Embedding size: 21
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.9089226372546751
learning rate: 3.5e-04
RNN Nodes: 561
Embedding size: 100
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.9165911162753377
learning rate: 2.5e-03
RNN Nodes: 20
Embedding size: 45
Train on 4477 samples, validate on 2206 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Accuracy:  0.8814974302934282


In [23]:
print('The maximum validation accuracy reached is :- ',-1*search_result.fun)

The maximum validation accuracy reached is :-  0.9233529738480679

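Since `gp_minimize` minimizes its objective, `fitness` returns the negated accuracy, and `-search_result.fun` recovers the best accuracy. The sketch below replays that bookkeeping on five of the forty trials, copied from the log above with learning rates as printed (rounded).

```python
# (learning rate as printed, RNN nodes, embedding size, best validation accuracy)
trials = [
    (5.0e-03, 600, 20, 0.8515412478909531),
    (1.3e-03, 596, 89, 0.9228996701491279),
    (1.0e-06, 192, 100, 0.23783620458403612),
    (3.6e-04, 520, 100, 0.9211619875696931),
    (3.6e-04, 552, 100, 0.9233529738480679),
]

# The optimizer sees the negated accuracy, so its minimum is the accuracy maximum.
objective_values = [-accuracy for *_, accuracy in trials]
best_index = min(range(len(trials)), key=lambda i: objective_values[i])
best_lr, best_nodes, best_emb, best_accuracy = trials[best_index]
print(best_nodes, best_emb, best_accuracy)   # 552 100 0.9233529738480679
```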

<div class="alert alert-block alert-info">
<b>Saving the prepared model:</b> We now load the best model found above and save it so that it can be used later.
</div>

In [15]:
model = load_model(path_best_model)

In [21]:
from keras.models import load_model
model.save('chatbot_batman.h5')  

In [16]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 15, 100)           202800    
_________________________________________________________________
batch_normalization_1 (Batch (None, 15, 100)           400       
_________________________________________________________________
bidirectional_1 (Bidirection (None, 1104)              2888064   
_________________________________________________________________
batch_normalization_2 (Batch (None, 1104)              4416      
_________________________________________________________________
repeat_vector_1 (RepeatVecto (None, 12, 1104)          0         
_________________________________________________________________
bidirectional_2 (Bidirection (None, 12, 1104)          7321728   
_________________________________________________________________
batch_normalization_3 (Batch (None, 12, 1104)          4416      
__________

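The parameter counts in the summary can be reproduced by hand from the best hyperparameters (552 recurrent nodes, embedding size 100) and the question vocabulary of 2028. The formulas below are an assumption about how these layers count parameters, checked only against the numbers printed above. The function names are hypothetical.

```python
def embedding_params(vocab_size, emb_size):
    return vocab_size * emb_size

def batch_norm_params(features):
    # gamma, beta, moving mean and moving variance: four vectors per feature axis.
    return 4 * features

def bidirectional_cudnn_lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights and two bias vectors
    # (the cuDNN layout keeps separate input and recurrent biases).
    one_direction = 4 * (units * (input_dim + units) + 2 * units)
    return 2 * one_direction

units, emb_size, question_vocab = 552, 100, 2028
print(embedding_params(question_vocab, emb_size))          # 202800
print(batch_norm_params(emb_size))                          # 400
print(bidirectional_cudnn_lstm_params(emb_size, units))     # 2888064
print(batch_norm_params(2 * units))                         # 4416
print(bidirectional_cudnn_lstm_params(2 * units, units))    # 7321728
```

The width 1104 in the summary is 2 x 552, because the bidirectional wrapper concatenates the forward and backward outputs.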
In [27]:
print('Learning rate   ',search_result.x[0])
print('Recurrent Nodes ',search_result.x[1])
print('Embedding Units ',search_result.x[2])

Learning rate    0.00035745406019704944
Recurrent Nodes  552
Embedding Units  100


<div class="alert alert-block alert-info">
<b>Chit-chatting with my Bat-Bot :) :</b> We now try out a few conversations with the bot we developed.
</div>

In [17]:
def word_int(integer, tokenizer):
    for word, index in tokenizer.word_index.items():
        if index == integer:
            return word
    return None

In [18]:
def predict_sequence(model, tokenizer, value):
    prediction = model.predict(value, verbose=0)[0]
    integers = [np.argmax(vector) for vector in prediction]
    target = []
    for i in integers:
        word = word_int(i, tokenizer)
        if word is None:
            break
        target.append(word)
    return ' '.join(target)

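The decoding step is a greedy argmax over each time step's softmax output. The sketch below shows the same idea on plain lists, with the index-to-word map built once instead of scanning the vocabulary for every word. `decode_greedy`, the three-word vocabulary, and the probabilities are all hypothetical.

```python
def decode_greedy(prediction, word_index):
    """Pick the most probable word id at each step and stop at the first unknown id."""
    index_word = {index: word for word, index in word_index.items()}  # built once
    words = []
    for step in prediction:
        best_id = max(range(len(step)), key=lambda i: step[i])  # argmax
        word = index_word.get(best_id)
        if word is None:      # id 0 is padding and has no word
            break
        words.append(word)
    return ' '.join(words)

# Hypothetical three-word vocabulary and a four-step softmax output.
word_index = {'im': 1, 'at': 2, 'service': 3}
prediction = [
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.6, 0.2],
    [0.0, 0.1, 0.1, 0.8],
    [0.9, 0.0, 0.1, 0.0],
]
print(decode_greedy(prediction, word_index))   # im at service
```

Stopping at the first padding id means that a padding prediction in the middle of an answer cuts the reply short, which may explain some of the clipped replies in the transcript below.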
In [19]:
while True:
    ques_input = input('Bat-Query :- Please enter your Query!  ')
    if ques_input == 'Thanks! Bye!':
        break
    question = [ques_input]
    encode_ques = encode_sequences(ques_tokenizer, question_length, question)
    translation = predict_sequence(model, ans_tokenizer, encode_ques)
    print('Bat-Bot:-' ,translation)

Bat-Query :- Please enter your Query!  Can I talk to your manager?
Bat-Bot:- im at your service
Bat-Query :- Please enter your Query!  What's your name ?
Bat-Bot:- i dont have family name
Bat-Query :- Please enter your Query!  You are so confusing me dear!
Bat-Bot:- i think i might have gotten lost there
Bat-Query :- Please enter your Query!  ya you have ..Certainly 
Bat-Bot:- excellent
Bat-Query :- Please enter your Query!  are you being serious with me .. i wanna talk to your manager
Bat-Bot:- im at your service
Bat-Query :- Please enter your Query!  You know I have programmed you and it took me one complete day for that !
Bat-Bot:- lets keep
Bat-Query :- Please enter your Query!  keep what ?
Bat-Bot:- i think have might have gotten
Bat-Query :- Please enter your Query!  ya you got me !
Bat-Bot:- i there
Bat-Query :- Please enter your Query!  you are my favorite !
Bat-Bot:- i dont need you you
Bat-Query :- Please enter your Query!  why?
Bat-Bot:- sorry about at
Bat-Query :- Please en