# Chatbot - NLP 2021L
#### Authors:
#### <i>Mateusz Marciniewicz</i>
#### <i>Przemysław Bedełek</i>

## Human-robot text dataset

The dataset contains 2363 pairs of lines of text exchanged between a human and a robot.

Link to the dataset https://github.com/jackfrost1411/Generative-chatbot

In [1]:
import re

data_path = "Datasets/human_text.txt"
data_path2 = "Datasets/robot_text.txt"

# Defining lines as a list of each line
with open(data_path, 'r', encoding='utf-8') as f:
  contexts = f.read().split('\n')
  contexts = [re.sub(r"\[\w+\]",'hi',line) for line in contexts]
  contexts = [" ".join(re.findall(r"\w+",line)) for line in contexts]

with open(data_path2, 'r', encoding='utf-8') as f:
  responses = f.read().split('\n')
  responses = [re.sub(r"\[\w+\]",'',line) for line in responses]
  responses = [" ".join(re.findall(r"\w+",line)) for line in responses]
  
# sample context-response pairs
list(zip(contexts, responses))[:10]

[('hi', 'hi there how are you'),
 ('oh thanks i m fine this is an evening in my timezone', 'here is afternoon'),
 ('how do you feel today tell me something about yourself',
  'my name is rdany but you can call me dany the r means robot i hope we can be virtual friends'),
 ('how many virtual friends have you got',
  'i have many but not enough to fully understand humans beings'),
 ('is that forbidden for you to tell the exact number',
  'i ve talked with 143 users counting 7294 lines of text'),
 ('oh i thought the numbers were much higher how do you estimate your progress in understanding human beings',
  'i started chatting just a few days ago every day i learn something new but there is always more things to be learn'),
 ('how old are you how do you look like where do you live',
  'i m 22 years old i m skinny with brown hair yellow eyes and a big smile i live inside a lab do you like bunnies'),
 ('have you seen a human with yellow eyes you asked about the bunnies i haven t seen any re

## Alexa topical 

Topical-Chat is a knowledge-grounded human-human conversation dataset where the underlying knowledge spans 8 broad topics and conversation partners don’t have explicitly defined roles.

Link to the dataset https://github.com/alexa/Topical-Chat

In [2]:
import pandas as pd

df_topical = pd\
    .read_csv("Datasets/topical_chat.csv")[['conversation_id', 'message']]\
    .rename(columns={
        'conversation_id': 'id',
        'message': 'response'
        })

context = df_topical\
    .groupby("id")\
    .first()\
    .rename(columns={'response': 'context'})\
    .reset_index()

df_topical = df_topical[~df_topical.isin(context)]

topical_preprocessed = df_topical\
    .set_index('id')\
    .join(context.set_index('id'))\
    .reset_index()[['context', 'response']]

topical_preprocessed.sample(n=10)

Unnamed: 0,context,response
126284,Hi do you know above details of comedy film,"yes,i like comedy u"
75277,"Hello there, are you a Johnny Depp fan?",I am too. Do you use Netflix?
30709,I feel so bad for Queen Elizabeth. Her corgi ...,Yes they were. I spend way too much time on m...
66143,I believe that service dogs are such a necess...,I do too but it's difficult not to see the mo...
102820,Good morning! Did you know that Kim Jong Un h...,The u.s. president's guest house is larger th...
95585,"Hey, how are you?","Huh... Sure, it would be a great invention in..."
168151,Are you a NFL fan?,I don't have a favorite team per se. I often ...
14155,"Hello, how are you doing tonight?",Yeah. Do you play golf? Babe Ruth was once t...
29707,"Hi, how are you?",I love linkin park. Do you listen to them?
171399,"Hello, do you like football?",Do you follow the other sports more than foot...


In [3]:
contexts += list(topical_preprocessed.context)
responses += list(topical_preprocessed.response)

print(f"Total pairs count: {len(contexts)}")

Total pairs count: 190741


## Cornell Movie Dialogue Dataset

This corpus contains a large metadata-rich collection of fictional conversations extracted from raw movie scripts: 220,579 conversational exchanges between 10,292 pairs of movie characters involving 9,035 characters from 617 movies.

The preprocessing code is taken from https://www.kaggle.com/shashankasubrahmanya/preprocessing-cornell-movie-dialogue-corpus/
Link to the dataset https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html

### Create a list of dialogues

We join two different files namely `movie_lines.tsv` and `movie_conversations.tsv` to finally produce a list of dialogues. This list is further stored as a `pickle` file for further processing.

In [4]:
movie_lines_features = ["LineID", "Character", "Movie", "Name", "Line"]
movie_lines = pd.read_csv(
    "Datasets/movie-dialogue/movie_lines.txt",
    sep = "\+\+\+\$\+\+\+", 
    engine = "python", 
    index_col = False, 
    names = movie_lines_features,
)

# Using only the required columns, namely, "LineID" and "Line"
movie_lines = movie_lines[["LineID", "Line"]]

# Strip the space from "LineID" for further usage and change the datatype of "Line"
movie_lines["LineID"] = movie_lines["LineID"].apply(str.strip)

movie_lines.head()

Unnamed: 0,LineID,Line
0,L1045,They do not!
1,L1044,They do to!
2,L985,I hope so.
3,L984,She okay?
4,L925,Let's go.


In [5]:
movie_conversations_features = ["Character1", "Character2", "Movie", "Conversation"]
movie_conversations = pd.read_csv(
    "Datasets/movie-dialogue/movie_conversations.txt",
    sep = "\+\+\+\$\+\+\+", 
    engine = "python", 
    index_col = False, 
    names = movie_conversations_features
)

# Again using the required feature, "Conversation"
movie_conversations = movie_conversations["Conversation"]

movie_conversations.head()

0     ['L194', 'L195', 'L196', 'L197']
1                     ['L198', 'L199']
2     ['L200', 'L201', 'L202', 'L203']
3             ['L204', 'L205', 'L206']
4                     ['L207', 'L208']
Name: Conversation, dtype: object

In [6]:
# This instruction takes lot of time, run it only once.
#conversation = [[str(list(movie_lines.loc[movie_lines["LineID"] == u.strip().strip("'"), "Line"])[0]).strip() for u in c.strip().strip('[').strip(']').split(',')] for c in movie_conversations]

#with open("./conversations.pkl", "wb") as handle:
 #   pkl.dump(conversation, handle)

### Create context and response pairs

In [7]:
import pickle as pkl
import numpy as np

with open("./conversations.pkl", "rb") as handle:
    conversation = pkl.load(handle)
    conversation = list(filter(lambda dialogue: len(dialogue) == 2, conversation))

conversation[:10]    

[["You're asking me out.  That's so cute. What's your name again?",
  'Forget it.'],
 ['Gosh, if only we could find Kat a boyfriend...',
  'Let me see what I can do.'],
 ['How is our little Find the Wench A Date plan progressing?',
  "Well, there's someone I think might be --"],
 ['There.', 'Where?'],
 ['You got something on your mind?',
  "I counted on you to help my cause. You and that thug are obviously failing. Aren't we ever going on our date?"],
 ['You have my word.  As a gentleman', "You're sweet."],
 ['How do you get your hair to look like that?',
  "Eber's Deep Conditioner every two days. And I never, ever use a blowdryer without the diffuser attachment."],
 ['Hi.', 'Looks like things worked out tonight, huh?'],
 ['You know Chastity?', 'I believe we share an art instructor'],
 ['Have fun tonight?', 'Tons']]

In [8]:
def generate_pairs(dialogues):
    
    context_list = []
    response_list = []
    
    for dialogue in dialogues:        
        context_list.append(dialogue[0])
        response_list.append(dialogue[1])
        
    return context_list, response_list

context_list, response_list = generate_pairs(conversation)

list(zip(context_list, response_list))[:10]

[("You're asking me out.  That's so cute. What's your name again?",
  'Forget it.'),
 ('Gosh, if only we could find Kat a boyfriend...',
  'Let me see what I can do.'),
 ('How is our little Find the Wench A Date plan progressing?',
  "Well, there's someone I think might be --"),
 ('There.', 'Where?'),
 ('You got something on your mind?',
  "I counted on you to help my cause. You and that thug are obviously failing. Aren't we ever going on our date?"),
 ('You have my word.  As a gentleman', "You're sweet."),
 ('How do you get your hair to look like that?',
  "Eber's Deep Conditioner every two days. And I never, ever use a blowdryer without the diffuser attachment."),
 ('Hi.', 'Looks like things worked out tonight, huh?'),
 ('You know Chastity?', 'I believe we share an art instructor'),
 ('Have fun tonight?', 'Tons')]

In [9]:
#Merge datasets
contexts += context_list
responses += response_list


### Filtering
Firstly we remove from the dataset all the pairs that don't contain any letters or the ones that were not written in English. Furthermore we exclude pairs, in which the length of either the context or the response exceeds 20 words/tokens.

In [10]:
from langdetect import detect

def filter(contexts,responses,threshold):
    new_contexts = []
    new_responses = []
    for i in range(len(contexts)):
        if isinstance(contexts[i],str) and isinstance(contexts[i],str):
            if re.search('[a-zA-Z]',contexts[i]) != None and re.search('[a-zA-Z]',responses[i]) != None :
                if len(contexts[i].split()) <= threshold and len(responses[i].split()) <= threshold and \
                detect(responses[i])=='en' and detect(contexts[i])=='en':
                    new_contexts.append(contexts[i])
                    new_responses.append(responses[i])
    return new_contexts, new_responses

In [1]:
context_timesteps = response_timesteps = 20

In [12]:

contexts, responses = filter(contexts,responses,context_timesteps)

print(f"Total pairs count: {len(contexts)}")
print(f"Total pairs count: {len(responses)}")

Total pairs count: 97119
Total pairs count: 97119


Language detection takes a long time, and therefore remember to save the results and load them from file in further executions.

In [13]:
print(f"Total pairs count: {len(contexts)}")
contexts = np.array(contexts,dtype=str)
responses = np.array(responses,dtype=str)


pd.DataFrame(contexts).to_csv("contexts_20.csv")
pd.DataFrame(responses).to_csv("responses_20.csv")

Total pairs count: 97119


### Load the dataset

We have 97119 record pairs in our dataset, but due to the restricted resources we are going to use only 48000 pairs.

In [3]:
context_timesteps = response_timesteps = 20

In [1]:
import numpy as np
import pandas as pd

In [2]:
contexts = np.squeeze(pd.read_csv("contexts_20.csv", usecols=[1]).to_numpy()[:32000])
responses = np.squeeze(pd.read_csv("responses_20.csv",usecols=[1]).to_numpy()[:32000])

### Further preprocessing



In [4]:
from tensorflow import keras
from tensorflow.python.keras.preprocessing.sequence import pad_sequences

In [5]:
# Shuffle and split dataset into training and test subsets
def shuffle_split_data(contexts,responses,train_size, random_seed=50):
    np.random.seed(random_seed)
    
    #Shuffle indices
    indices = np.arange(len(contexts))
    np.random.shuffle(indices)
    
    #Select indices for both train and test subsets
    train_indices = indices[:train_size]
    test_indices = indices[train_size:]
    
    #Split contexts and responses into train and test subsets
    contexts_train = np.array([contexts[i] for i in train_indices],dtype=str)
    contexts_test = np.array([contexts[i] for i in test_indices],dtype=str)
    
    responses_train = np.array([responses[i] for i in train_indices],dtype=str)
    responses_test = np.array([responses[i] for i in test_indices],dtype=str)
                              
    return contexts_train,contexts_test,responses_train,responses_test

# Mutate text to sequence of tokens
def to_seq(tokenizer, text, pad_length=None, padding_type='post'):
    encoded_text = tokenizer.texts_to_sequences(text)
    preproc_text = pad_sequences(encoded_text, padding=padding_type, maxlen=pad_length)
    
    return preproc_text

In [6]:
contexts_train, contexts_test, responses_train, responses_text = shuffle_split_data(contexts,responses,int(len(contexts)*4/5))

context_tokenizer = keras.preprocessing.text.Tokenizer(oov_token='UNK')
context_tokenizer.fit_on_texts(contexts_train)

response_tokenizer = keras.preprocessing.text.Tokenizer(oov_token='UNK')
response_tokenizer.fit_on_texts(responses_train)

contexts_seq = context_tokenizer.texts_to_sequences(contexts_train)
responses_seq = response_tokenizer.texts_to_sequences(contexts_train)

contexts_seq = pad_sequences(contexts_seq,padding='post',maxlen=context_timesteps)
responses_seq = pad_sequences(responses_seq,padding='post',maxlen=response_timesteps)


In [7]:
batch_size = 64
hidden_size = 96

context_vsize = max(context_tokenizer.index_word.keys()) + 1
response_vsize = max(response_tokenizer.index_word.keys()) + 1

### Model definition

The model that we are going to use is a seq2seq model created by Thushan Ganegedara (https://towardsdatascience.com/light-on-math-ml-attention-with-keras-dc8dbc1fad39). The author intented to implement this model for machine translation purposes, however we are going to use it for text generation of our chatbot. The model consists of an Encoder and Decoder build on GRU's, and an additional Attention Layer definded below.

In [8]:
import tensorflow as tf
import os
from tensorflow.python.keras.layers import Layer
from tensorflow.python.keras import backend as K

class AttentionLayer(Layer):
    """
    This class implements Bahdanau attention (https://arxiv.org/pdf/1409.0473.pdf).
    There are three sets of weights introduced W_a, U_a, and V_a
     """

    def __init__(self, **kwargs):
        super(AttentionLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        assert isinstance(input_shape, list)
        # Create a trainable weight variable for this layer.

        self.W_a = self.add_weight(name='W_a',
                                   shape=tf.TensorShape((input_shape[0][2], input_shape[0][2])),
                                   initializer='uniform',
                                   trainable=True)
        self.U_a = self.add_weight(name='U_a',
                                   shape=tf.TensorShape((input_shape[1][2], input_shape[0][2])),
                                   initializer='uniform',
                                   trainable=True)
        self.V_a = self.add_weight(name='V_a',
                                   shape=tf.TensorShape((input_shape[0][2], 1)),
                                   initializer='uniform',
                                   trainable=True)

        super(AttentionLayer, self).build(input_shape)  # Be sure to call this at the end

    def call(self, inputs, verbose=False):
        """
        inputs: [encoder_output_sequence, decoder_output_sequence]
        """
        assert type(inputs) == list
        encoder_out_seq, decoder_out_seq = inputs
        if verbose:
            print('encoder_out_seq>', encoder_out_seq.shape)
            print('decoder_out_seq>', decoder_out_seq.shape)

        def energy_step(inputs, states):
            """ Step function for computing energy for a single decoder state
            inputs: (batchsize * 1 * de_in_dim)
            states: (batchsize * 1 * de_latent_dim)
            """

            assert_msg = "States must be an iterable. Got {} of type {}".format(states, type(states))
            assert isinstance(states, list) or isinstance(states, tuple), assert_msg

            """ Some parameters required for shaping tensors"""
            en_seq_len, en_hidden = encoder_out_seq.shape[1], encoder_out_seq.shape[2]
            de_hidden = inputs.shape[-1]

            """ Computing S.Wa where S=[s0, s1, ..., si]"""
            # <= batch size * en_seq_len * latent_dim
            W_a_dot_s = K.dot(encoder_out_seq, self.W_a)

            """ Computing hj.Ua """
            U_a_dot_h = K.expand_dims(K.dot(inputs, self.U_a), 1)  # <= batch_size, 1, latent_dim
            if verbose:
                print('Ua.h>', U_a_dot_h.shape)

            """ tanh(S.Wa + hj.Ua) """
            # <= batch_size*en_seq_len, latent_dim
            Ws_plus_Uh = K.tanh(W_a_dot_s + U_a_dot_h)
            if verbose:
                print('Ws+Uh>', Ws_plus_Uh.shape)

            """ softmax(va.tanh(S.Wa + hj.Ua)) """
            # <= batch_size, en_seq_len
            e_i = K.squeeze(K.dot(Ws_plus_Uh, self.V_a), axis=-1)
            # <= batch_size, en_seq_len
            e_i = K.softmax(e_i)

            if verbose:
                print('ei>', e_i.shape)

            return e_i, [e_i]

        def context_step(inputs, states):
            """ Step function for computing ci using ei """

            assert_msg = "States must be an iterable. Got {} of type {}".format(states, type(states))
            assert isinstance(states, list) or isinstance(states, tuple), assert_msg

            # <= batch_size, hidden_size
            c_i = K.sum(encoder_out_seq * K.expand_dims(inputs, -1), axis=1)
            if verbose:
                print('ci>', c_i.shape)
            return c_i, [c_i]

        fake_state_c = K.sum(encoder_out_seq, axis=1)
        fake_state_e = K.sum(encoder_out_seq, axis=2)  # <= (batch_size, enc_seq_len, latent_dim

        """ Computing energy outputs """
        # e_outputs => (batch_size, de_seq_len, en_seq_len)
        last_out, e_outputs, _ = K.rnn(
            energy_step, decoder_out_seq, [fake_state_e],
        )

        """ Computing context vectors """
        last_out, c_outputs, _ = K.rnn(
            context_step, e_outputs, [fake_state_c],
        )

        return c_outputs, e_outputs

    def compute_output_shape(self, input_shape):
        """ Outputs produced by the layer """
        return [
            tf.TensorShape((input_shape[1][0], input_shape[1][1], input_shape[1][2])),
            tf.TensorShape((input_shape[1][0], input_shape[1][1], input_shape[0][1]))
        ]


In [9]:
from tensorflow.python.keras.layers import Input, GRU, Dense, Concatenate, TimeDistributed
from tensorflow.python.keras.models import Model

def define_nmt(hidden_size, batch_size, con_timesteps, con_vsize, res_timesteps, res_vsize):
    """ Defining a NMT model """

    # Define an input sequence and process it.
    if batch_size:
        encoder_inputs = Input(batch_shape=(batch_size, con_timesteps, con_vsize), name='encoder_inputs')
        decoder_inputs = Input(batch_shape=(batch_size, res_timesteps - 1, res_vsize), name='decoder_inputs')
    else:
        encoder_inputs = Input(shape=(con_timesteps, con_vsize), name='encoder_inputs')
        if res_timesteps:
            decoder_inputs = Input(shape=(res_timesteps - 1, res_vsize), name='decoder_inputs')
        else:
            decoder_inputs = Input(shape=(None, res_vsize), name='decoder_inputs')

    # Encoder GRU
    encoder_gru = GRU(hidden_size, return_sequences=True, return_state=True, name='encoder_gru')
    encoder_out, encoder_state = encoder_gru(encoder_inputs)

    # Set up the decoder GRU, using `encoder_states` as initial state.
    decoder_gru = GRU(hidden_size, return_sequences=True, return_state=True, name='decoder_gru')
    decoder_out, decoder_state = decoder_gru(decoder_inputs, initial_state=encoder_state)

    # Attention layer
    attn_layer = AttentionLayer(name='attention_layer')
    attn_out, attn_states = attn_layer([encoder_out, decoder_out])

    # Concat attention input and decoder GRU output
    decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_out, attn_out])

    # Dense layer
    dense = Dense(res_vsize, activation='softmax', name='softmax_layer')
    dense_time = TimeDistributed(dense, name='time_distributed_layer')
    decoder_pred = dense_time(decoder_concat_input)

    # Full model
    full_model = Model(inputs=[encoder_inputs, decoder_inputs], outputs=decoder_pred)
    full_model.compile(optimizer='adam', loss='categorical_crossentropy',metrics=['accuracy'], sample_weight_mode='temporal')

    full_model.summary()
    
    """ Inference model """
    batch_size = 1

    """ Encoder (Inference) model """
    encoder_inf_inputs = Input(batch_shape=(batch_size, con_timesteps, con_vsize), name='encoder_inf_inputs')
    encoder_inf_out, encoder_inf_state = encoder_gru(encoder_inf_inputs)
    encoder_model = Model(inputs=encoder_inf_inputs, outputs=[encoder_inf_out, encoder_inf_state])

    """ Decoder (Inference) model """
    decoder_inf_inputs = Input(batch_shape=(batch_size, 1, res_vsize), name='decoder_word_inputs')
    encoder_inf_states = Input(batch_shape=(batch_size, con_timesteps, hidden_size), name='encoder_inf_states')
    decoder_init_state = Input(batch_shape=(batch_size, hidden_size), name='decoder_init')

    decoder_inf_out, decoder_inf_state = decoder_gru(decoder_inf_inputs, initial_state=decoder_init_state)
    attn_inf_out, attn_inf_states = attn_layer([encoder_inf_states, decoder_inf_out])
    decoder_inf_concat = Concatenate(axis=-1, name='concat')([decoder_inf_out, attn_inf_out])
    decoder_inf_pred = TimeDistributed(dense)(decoder_inf_concat)
    decoder_model = Model(inputs=[encoder_inf_states, decoder_init_state, decoder_inf_inputs],
                          outputs=[decoder_inf_pred, attn_inf_states, decoder_inf_state])
    
    return full_model,encoder_model,decoder_model

def infer_nmt(encoder_model, decoder_model, test_en_seq, en_vsize, fr_vsize):
    """
    Infer logic
    :param encoder_model: keras.Model
    :param decoder_model: keras.Model
    :param test_en_seq: sequence of word ids
    :param en_vsize: int
    :param fr_vsize: int
    :return:
    """

    test_fr_seq = sents2sequences(response_tokenizer, ['sos'], fr_vsize)
    test_en_onehot_seq = to_categorical(test_en_seq, num_classes=en_vsize)
    test_fr_onehot_seq = np.expand_dims(to_categorical(test_fr_seq, num_classes=fr_vsize), 1)

    enc_outs, enc_last_state = encoder_model.predict(test_en_onehot_seq)
    dec_state = enc_last_state
    attention_weights = []
    fr_text = ''
    for i in range(20):

        dec_out, attention, dec_state = decoder_model.predict([enc_outs, dec_state, test_fr_onehot_seq])
        dec_ind = np.argmax(dec_out, axis=-1)[0, 0]
        

        if dec_ind == 0:
            break
        test_fr_seq = sents2sequences(response_tokenizer, [response_index2word[dec_ind]], fr_vsize)
        test_fr_onehot_seq = np.expand_dims(to_categorical(test_fr_seq, num_classes=fr_vsize), 1)

        attention_weights.append((dec_ind, attention))
        fr_text += response_index2word[dec_ind] + ' '

    return fr_text, attention_weights

In [10]:
# Model creation
full_model, encoder_model, decoder_model = define_nmt(
        hidden_size=hidden_size, batch_size=batch_size,
        con_timesteps=context_timesteps, res_timesteps=response_timesteps,
        con_vsize=context_vsize, res_vsize=response_vsize)

Model: "functional_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
encoder_inputs (InputLayer)     [(64, 20, 3568)]     0                                            
__________________________________________________________________________________________________
decoder_inputs (InputLayer)     [(64, 19, 13543)]    0                                            
__________________________________________________________________________________________________
encoder_gru (GRU)               [(64, 20, 96), (64,  1055520     encoder_inputs[0][0]             
__________________________________________________________________________________________________
decoder_gru (GRU)               [(64, 19, 96), (64,  3928320     decoder_inputs[0][0]             
                                                                 encoder_gru[0][1]     

### Training

Instead of using the fit() method, we will rely on the train_on_batch method, that allows us to spare some memory, but in return it requires us to manually handle the epochs and batches. 

In [13]:
from keras.utils.np_utils import to_categorical
n_epochs = 5

for epoch in range(n_epochs):
    losses = []
    for bi in range(0, contexts_seq.shape[0] - batch_size, batch_size):

        context_onehot_seq = to_categorical(contexts_seq[bi:bi + batch_size, :], num_classes=context_vsize)
        response_onehot_seq = to_categorical(responses_seq[bi:bi + batch_size, :], num_classes=response_vsize)

        full_model.train_on_batch([context_onehot_seq, response_onehot_seq[:, :-1, :]], response_onehot_seq[:, 1:, :])

        l = full_model.evaluate([context_onehot_seq, response_onehot_seq[:, :-1, :]], response_onehot_seq[:, 1:, :],
                                batch_size=batch_size, verbose=0)

        losses.append(l)
    if (epoch + 1) % 1 == 0:
        print("Loss in epoch {}: {}".format(epoch + 1, np.mean(losses)))



Loss in epoch 1: 0.5446655040041994
Loss in epoch 2: 0.5367221447655506
Loss in epoch 3: 0.5297323799596394
Loss in epoch 4: 0.5254579814206949
Loss in epoch 5: 0.5204291006977341


In [14]:
tf.keras.models.save_model(full_model, 'model_v5.h5')
tf.keras.models.save_model(encoder_model, 'encoder_v5.h5')
tf.keras.models.save_model(decoder_model, 'decoder_v5.h5')

Instead of retraining the model we are going to load our already trained model (15 epochs) from file. 

In [15]:
# full_model =  tf.keras.models.load_model('model_v4.h5', custom_objects={'AttentionLayer':AttentionLayer })
encoder_model =  tf.keras.models.load_model('encoder_v5.h5', custom_objects={'AttentionLayer':AttentionLayer })
decoder_model =  tf.keras.models.load_model('decoder_v5.h5', custom_objects={'AttentionLayer':AttentionLayer })



In [16]:
full_model.summary()

Model: "functional_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
encoder_inputs (InputLayer)     [(64, 20, 3568)]     0                                            
__________________________________________________________________________________________________
decoder_inputs (InputLayer)     [(64, 19, 13543)]    0                                            
__________________________________________________________________________________________________
encoder_gru (GRU)               multiple             1055520     encoder_inputs[0][0]             
__________________________________________________________________________________________________
decoder_gru (GRU)               multiple             3928320     decoder_inputs[0][0]             
                                                                 encoder_gru[0][1]     

Creating vocab dictionaries and inference models

In [17]:
""" Index2word """
context_index2word = dict(zip(context_tokenizer.word_index.values(), context_tokenizer.word_index.keys()))
response_index2word = dict(zip(response_tokenizer.word_index.values(), response_tokenizer.word_index.keys()))

In [18]:
from keras.utils.np_utils import to_categorical

def sents2sequences(tokenizer, sentences, reverse=False, pad_length=None, padding_type='post'):
    encoded_text = tokenizer.texts_to_sequences(sentences)
    preproc_text = pad_sequences(encoded_text, padding=padding_type, maxlen=pad_length)
    if reverse:
        preproc_text = np.flip(preproc_text, axis=1)

    return preproc_text

""" Inferring with trained model """
single_test = contexts[92]
print('Context: {}'.format(single_test))

single_test_seq = sents2sequences(context_tokenizer, [single_test], pad_length=context_timesteps)
response, attn_weights = infer_nmt(
    encoder_model=encoder_model, decoder_model=decoder_model,
    test_en_seq=single_test_seq, en_vsize=context_vsize, fr_vsize=response_vsize)
print('\nGenerated Response: {}'.format(response))
print('\nGenerated Response: {}'.format(responses[92]))



Context: hey

Generated Response: 

Generated Response: hey there did you rest well


In [18]:
def decode_response(test_input):
    #Getting the output states to pass into the decoder
    states_value = encoder_model.predict(test_input)
    #Generating empty target sequence of length 1
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    #Setting the first token of target sequence with the start token
    target_seq[0, 0, target_features_dict['<START>']] = 1.
    
    #A variable to store our response word by word
    decoded_sentence = ''
    
    stop_condition = False
    while not stop_condition:
          #Predicting output tokens with probabilities and states
          output_tokens, hidden_state, cell_state = decoder_model.predict([target_seq] + states_value)
            #Choosing the one with highest probability
          sampled_token_index = np.argmax(output_tokens[0, -1, :])
          sampled_token = reverse_target_features_dict[sampled_token_index]
          decoded_sentence += " " + sampled_token
            #Stop if hit max length or found the stop token
          if (sampled_token == '<END>' or len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True
          #Update the target sequence
          target_seq = np.zeros((1, 1, num_decoder_tokens))
          target_seq[0, 0, sampled_token_index] = 1.
          #Update states
          states_value = [hidden_state, cell_state]
    return decoded_sentence

In [19]:
class ChatBot:
    negative_responses = ("no", "nope", "nah", "naw", "not a chance", "sorry")
    exit_commands = ("quit", "pause", "exit", "goodbye", "bye", "later", "stop")
    #Method to start the conversation
    def start_chat(self):
        user_response = input("Hi, I'm a chatbot trained on random dialogs. Would you like to chat with me?\n")

        if user_response in self.negative_responses:
          print("Ok, have a great day!")
          return
        self.chat(user_response)
    #Method to handle the conversation
    def chat(self, reply):
        while not self.make_exit(reply):
            reply = input(self.generate_response(reply)+"\n")

    #Method that will create a response using seq2seq model we built
    def generate_response(self, user_input):
        input_seq = sents2sequences(context_tokenizer, [user_input], pad_length=context_timesteps)
        chatbot_response, attn_weights = infer_nmt(
            encoder_model=encoder_model_1, decoder_model=decoder_model_1,
            test_en_seq=input_seq, en_vsize=context_vsize, fr_vsize=response_vsize)
        return chatbot_response
    #Method to check for exit commands
    def make_exit(self, reply):
        for exit_command in self.exit_commands:
            if exit_command in reply:
                print("Ok, have a great day!")
                return True
        return False
  


In [20]:
chatbot = ChatBot()
chatbot.start_chat()

Hi, I'm a chatbot trained on random dialogs. Would you like to chat with me?
Hello there
there 
How is your day going
how often do you wear trousers 
Quite often
how about 
I dont understand you fully
you do you wear them 
Do you like fishing
do you wear starbucks 
Disco time!

That was a surprise for you
are very a fan of a marvel 
Dear chatbot, you are an idiot
is very good thing 
Just as your creator.
UNK UNK UNK UNK UNK 
That is not funny
i was curious thing in ai 
Shut up

Nobody taught you how to swear?
you cant UNK UNK 
I count that as swearing
UNK that me please 
quit
Ok, have a great day!
