# Chat Bot
* **This model is based on the End-to-End Memory Network and is trained on the bAbI dataset published by Facebook Research.**
* Summary of the data:
    1. A story of 4-5 sentences is provided to the model.
    2. A question based on that story is given to the model.
    3. The answer is binary, i.e. yes or no.
* Summary of the model:
    1. The story is used as the memory.
    2. The story is embedded using an embedding matrix.
    3. The embedded story is matched against the question, which is embedded with a different embedding matrix. The **softmax** activation function is used to score the match.
    4. A second embedding matrix is also applied to the story; this second embedding is added to the match matrix. The result is called the response.
    5. The response is then concatenated with the embedded question to produce the input for the decoder (answer).
    6. This input is passed to an **LSTM** layer.
    7. The LSTM output is connected to a **Dense** layer.
    8. The Dense output is activated with the **sigmoid** function to produce a probability for each word in the vocabulary.
    9. The answer is the word with the highest probability.
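
The match, response, and concatenation steps listed above can be sketched in plain NumPy. This is only an illustration under stated assumptions: the sizes are toy values, the arrays `m`, `c`, and `q` are random stand-ins for the three embedding outputs, and the `softmax` helper is written here for the example rather than taken from the notebook.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
story_len, ques_len, dim = 5, 3, 4   # toy sizes (assumed), not the notebook's

m = rng.normal(size=(story_len, dim))       # story embedded for matching (memory)
c = rng.normal(size=(story_len, ques_len))  # story embedded a second time
q = rng.normal(size=(ques_len, dim))        # embedded question

# Score every story position against every question position.
match = softmax(m @ q.T, axis=-1)           # (story_len, ques_len)

# Add the second story embedding, then swap the axes.
response = (match + c).T                    # (ques_len, story_len)

# Join the response with the embedded question along the last axis.
decoder_input = np.concatenate([response, q], axis=-1)
print(decoder_input.shape)                  # (ques_len, story_len + dim)
```

Because the second story embedding is added element-wise to the match matrix, its width has to equal the question length, which is why the two embeddings of the story use different output sizes.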

In [1]:
import pickle
import numpy as np
import tensorflow as tf

In [2]:
with open('train_qa.txt', 'rb') as fp:
    train_data = pickle.load(fp)

In [3]:
with open('test_qa.txt', 'rb') as fp:
    test_data = pickle.load(fp)

In [4]:
vocab  = set()
all_data = train_data+test_data
len(all_data)

11000

In [5]:
for story,question, answer in all_data:
    vocab = vocab.union(set(story))
    vocab = vocab.union(set(question))

In [6]:
vocab.add('yes')
vocab.add('no')

In [7]:
vocab_len = len(vocab) + 1
story_max_len = max([len(data[0]) for data in all_data])
ques_max_len = max([len(data[1]) for data in all_data])
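
The extra `+ 1` in `vocab_len` leaves room for index 0, which the Keras `Tokenizer` never assigns to a word and which `pad_sequences` uses as the padding value. A minimal sketch with a hypothetical toy vocabulary (`toy_vocab` and `toy_index` are illustrative names, not part of the notebook):

```python
toy_vocab = {"mary", "went", "home", "yes", "no"}

# Word indices start at 1 so that 0 stays free for padding.
toy_index = {word: i for i, word in enumerate(sorted(toy_vocab), start=1)}

# An embedding table needs one row per index, including the padding row 0.
toy_vocab_len = len(toy_vocab) + 1

print(toy_index)
print(toy_vocab_len)
```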

In [8]:
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

In [9]:
tokenizer = Tokenizer(filters = [])

In [10]:
tokenizer.fit_on_texts(vocab)
word_index = tokenizer.word_index

In [11]:
def vectorized(data,word_index = word_index, max_story = story_max_len, max_ques = ques_max_len ):
    #for the stories(x)
    X = []
    
    #for the question(q)
    Xq = []
    
    #for the answer(a)
    
    A = []
    
    for story, question, answer in data:
        
        #assigning index for every word in story
        x = [word_index[word.lower()] for word in story]
        
        #assigning index for every word in question
        xq = [word_index[word.lower()] for word in question]
        
        # assign index for the answer
        
        a = np.zeros(len(word_index) + 1)
        a[word_index[answer]] =1
        
        X.append(x)
        
        Xq.append(xq)
        
        A.append(a)
        
        
    return (pad_sequences(X, maxlen= max_story), pad_sequences(Xq, maxlen= max_ques), np.array(A))
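
To make the output of `vectorized` concrete, here is a small pure-Python sketch of what happens to a single sample. It assumes the default `pad_sequences` behaviour (zeros added on the left, truncation from the left); `pad_left`, `toy_index`, and the sample itself are hypothetical and exist only for this illustration.

```python
def pad_left(seq, maxlen, value=0):
    # Keep the last `maxlen` items, then fill the front with `value`.
    seq = seq[-maxlen:]
    return [value] * (maxlen - len(seq)) + seq

toy_index = {"mary": 1, "went": 2, "home": 3, ".": 4,
             "is": 5, "?": 6, "yes": 7, "no": 8}

story = ["Mary", "went", "home", "."]
question = ["Is", "Mary", "home", "?"]
answer = "yes"

# Words become indices, then each sequence is padded to a fixed length.
x = pad_left([toy_index[w.lower()] for w in story], maxlen=6)
xq = pad_left([toy_index[w.lower()] for w in question], maxlen=5)

# The answer becomes a one-hot vector over the vocabulary (plus padding slot 0).
a = [0] * (len(toy_index) + 1)
a[toy_index[answer]] = 1

print(x)   # [0, 0, 1, 2, 3, 4]
print(xq)  # [0, 5, 1, 3, 6]
print(a)   # [0, 0, 0, 0, 0, 0, 0, 1, 0]
```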

In [12]:
train_story , train_question , train_answer = vectorized(train_data, word_index)

In [13]:
test_story, test_question, test_answer = vectorized(test_data)

In [14]:
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, LSTM, Permute, add, concatenate, dot, Activation, Input, Embedding, Dropout

In [15]:
input_seq  = Input((story_max_len,))
ques = Input((ques_max_len,))

Now we can build the encoders as defined in the paper "End-To-End Memory Networks".

Encoder M is used to store and encode the memories.

In [16]:
encoder_m = Sequential()
encoder_m.add(Embedding(input_dim= vocab_len, output_dim = 64))
encoder_m.add(Dropout(0.2))

Now we can build encoder C. Its output dimension equals the maximum question length, so that its output can later be added to the match matrix.

In [17]:
encoder_c = Sequential()
encoder_c.add(Embedding(input_dim = vocab_len, output_dim = ques_max_len))
encoder_c.add(Dropout(0.2))

Now we can create the question encoder, which takes the question as its input vector and returns the embedded vector. The output dimension is 64 in this case as well.

In [18]:
question_encoder = Sequential()
question_encoder.add(Embedding(input_dim = vocab_len, output_dim= 64,input_length = ques_max_len ))
question_encoder.add(Dropout(0.2))

In [19]:
# encode input sequence and questions (which are indices)
# to sequences of dense vectors
input_encoded_m = encoder_m(input_seq)
input_encoded_c = encoder_c(input_seq)
question_encoded = question_encoder(ques)

In [20]:
# shape: `(samples, story_maxlen, query_maxlen)`
match = dot([input_encoded_m, question_encoded], axes=(2, 2))
match = Activation('softmax')(match)

In [21]:
# add the match matrix with the second input vector sequence
response = add([match, input_encoded_c])  # (samples, story_maxlen, query_maxlen)
response = Permute((2, 1))(response)  # (samples, query_maxlen, story_maxlen)

In [22]:
# concatenate the response with the question vector sequence
answer = concatenate([response, question_encoded])

In [23]:
answer

<tf.Tensor 'concatenate/Identity:0' shape=(None, 6, 220) dtype=float32>
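
The last dimension of 220 follows from the shapes of the two tensors being concatenated. A short arithmetic sketch, assuming the input lengths 156 and 6 reported for the two input layers of this model and the embedding size 64 used by the question encoder:

```python
story_max_len = 156   # length of the story input
ques_max_len = 6      # length of the question input
embed_dim = 64        # output_dim of the question encoder

# After Permute((2, 1)) the response is (ques_max_len, story_max_len).
response_shape = (ques_max_len, story_max_len)
question_shape = (ques_max_len, embed_dim)

# concatenate joins along the last axis, so the widths add up.
concat_shape = (response_shape[0], response_shape[1] + question_shape[1])
print(concat_shape)   # (6, 220)
```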

In [24]:
answer = LSTM(32)(answer)  # (samples, 32)

In [25]:
# Regularization with Dropout
answer = Dropout(0.5)(answer)
answer = Dense(vocab_len)(answer)  # (samples, vocab_size)

In [26]:
answer = Activation('sigmoid')(answer)

# build the final model
model = Model([input_seq, ques], answer)
model.compile(optimizer='rmsprop', loss='binary_crossentropy',
              metrics=['accuracy'])

In [27]:
model.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 156)]        0                                            
__________________________________________________________________________________________________
input_2 (InputLayer)            [(None, 6)]          0                                            
__________________________________________________________________________________________________
sequential (Sequential)         multiple             2432        input_1[0][0]                    
__________________________________________________________________________________________________
sequential_2 (Sequential)       (None, 6, 64)        2432        input_2[0][0]                    
______________________________________________________________________________________________

In [28]:
history = model.fit([train_story, train_question], train_answer,batch_size=32,epochs=120,validation_data=([test_story, test_question], test_answer))

Train on 10000 samples, validate on 1000 samples
Epoch 1/120
...
Epoch 120/120


In [29]:
import matplotlib.pyplot as plt

plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])

[<matplotlib.lines.Line2D at 0x19db6cb4488>]

In [30]:
model.save('chatbot_akash.h5')

In [31]:
test_pred_res = model.predict([test_story, test_question])

In [32]:
# let's see a test story:

' '.join(test_data[5][0])


'Daniel went back to the kitchen . Mary grabbed the apple there .'

In [33]:
# let's see the question asked:

' '.join(test_data[5][1])

'Is Daniel in the office ?'

In [34]:
# let's see what the actual answer is

test_data[5][2]

'no'

In [35]:
val_max = np.argmax(test_pred_res[5])

for key, val in tokenizer.word_index.items():
    if val == val_max:
        k = key
        
print("Predicted answer is: ", k)
print("Probability of certainty was: ", test_pred_res[5][val_max])

Predicted answer is:  no
Probability of certainty was:  0.99993086
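
The loop over `word_index` can also be expressed as a reverse lookup. The sketch below uses a hypothetical toy index and a made-up score row in place of `tokenizer.word_index` and `test_pred_res[5]`; the `"<pad>"` fallback is an assumption added here to cover index 0, which no word maps to.

```python
toy_word_index = {"yes": 1, "no": 2, "mary": 3}

# Invert the mapping once so each lookup is a single dictionary access.
index_to_word = {idx: word for word, idx in toy_word_index.items()}

scores = [0.01, 0.02, 0.95, 0.03]   # hypothetical model output for one sample

# Position of the largest score, equivalent to np.argmax on this row.
best = max(range(len(scores)), key=scores.__getitem__)
predicted = index_to_word.get(best, "<pad>")

print("Predicted answer is:", predicted)
print("Score:", scores[best])
```

The Keras `Tokenizer` also exposes an `index_word` attribute after fitting, which holds the same inverted mapping.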

