### Intro: 

Through this tutorial we will implement a Deep NLP ChatBot using Tensorflow. So without further a do let's get right into it.

We'll start by importing the libraries needed for this project.

In [1]:
import numpy as np 
import tensorflow as tf
import re #Helps with data preprocessing
import time
import datetime

In [2]:
#Please make sure your version of Tensorflow is 1.0.0
tf.__version__

'1.0.0'

The dataset used for the training of this ChatBot are taking from: 

https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html

This dataset is called Cornell Movie--Dialogs Corpus, and it contains conversations between actors from a large number of movies, so the type of our ChatBot would be a friend-like ChatBot (able to do casual conversations), for more field specific ChatBots we can use other kind of datasets. Anyway, for further informations about the data used you can look at the link above.

It's important to know that the dataset used is composed of 2 text files: "movie_lines.txt" and "movie_conversations.txt". The first contains the lines from different movies in an unorderly fashion, but these lines have IDs, these IDs are used in the second file to identify the lines that correspond to a certain conversation, so the second file works as a way to order the line fro first file. 


# I. Data preprocesing: 

Generally, this is the longest part of each project, in which we will make the data ready for input into the deep learning model. Luckily the er library is here to carry some load of this phase. Let's begin:


In [3]:
# Loading data: We will load both the lines and conversations

with open("C:/Users/YsfEss/Desktop/data/movie_lines.txt",encoding='utf-8',errors='ignore') as f1:
    lines=f1.read().split('\n') #304714 lines
with open("C:/Users/YsfEss/Desktop/data/movie_conversations.txt",encoding='utf-8',errors='ignore') as f2:
    convos=f2.read().split('\n')

In [4]:
# Now let's create a dictionary that maps each line with its ID.
id2line={}

for line in lines:
    spl=line.split(' +++$+++ ')
    if len(spl)==5:
        id2line[spl[0]]=spl[-1]

In [5]:
# We will now create a list of conversations. (IDs of lines in list)

convoli= []

for conv in convos[:-1]: #The last row of this list is empty
    spl=conv.split(' +++$+++ ')[-1][1:-1].replace("'","").replace(' ','')
    convoli.append(spl.split(','))

In [6]:
# From the list of convo ids we will try build two lists one for 'questions' and the other for 'answers'.

questions=[]
answers=[]

for conv in convoli:
    k=len(conv)
    for i in range(k-1):
        questions.append(id2line[conv[i]])
        answers.append(id2line[conv[i+1]])


In [7]:
# Now for text cleaning

def cleanText(text):
    # text to lower case
    text=text.lower()
    # Now to make it easier for the ChatBot to learn we gonna use re to replace expression like "i'm" with "i am"
    text=re.sub(r"i'm","i am",text)
    text=re.sub(r"she's","she is",text)
    text=re.sub(r"he's","he is",text)
    text=re.sub(r"it's","it is",text)
    text=re.sub(r"that's","that is",text)
    text=re.sub(r"what's","what is",text)
    text=re.sub(r"where's","where is",text)
    text=re.sub(r"\'ve"," have",text)
    text=re.sub(r"\'ll"," will",text)
    text=re.sub(r"\'d"," would",text)
    text=re.sub(r"\'re"," are",text)
    text=re.sub(r"won't","would not",text)
    text=re.sub(r"can't","can not",text)
    text=re.sub(r"wouldn't","would not",text)
    text=re.sub(r"couldn't","could not",text)
    text=re.sub(r"haven't","have not",text)
    text=re.sub(r"didn't","did not",text)
    text=re.sub(r"cannot","can not",text)
    text=re.sub(r"gonna","going to",text)
    text=re.sub(r"wanna","want to",text)
    text=re.sub(r"don't","do not",text)
    text=re.sub(r"[-()/\"#$%^&*()_+@=?<>:;,.!{}'|]","",text)
    #Do as you can in here the better the cleaning the better the result
    return(text)

clean_questions=[cleanText(line) for line in questions if len(cleanText(line))!=0]
clean_answers=[cleanText(line) for line in answers if len(cleanText(line))!=0]

In [8]:
# In order to optimize our ChatBot training we will try to remove infrequent words from both questions and answers lists.
# So the first step to do that is to generate a dictionnary that maps word to their cardinality within the dataset.

wordOccur={}
for question in clean_questions:
    l=question.split()
    for i in range (len(l)) :
        if l[i] in wordOccur.keys():
            wordOccur[l[i]]+=1
        else:
            wordOccur[l[i]]=1
for answer in clean_answers:
    l=answer.split()
    for i in range (len(l)) :
        if l[i] in wordOccur.keys():
            wordOccur[l[i]]+=1
        else:
            wordOccur[l[i]]=1
            
            
# The second step is to set a threshold for the number of occurence of words that will be used in the training of the model.
# Let's create a 2 dictionaries that maps each word from questions/answers to a unique identifier.

treshold=20 #This as of now a hyperparameter of the model, 20 seems reasonable we can either decrease it or increase it based on obtained results.

Qwords=[q.split()[i] for q in clean_questions for i in range(len(q.split()))] #Words in the questions.
Qwords=list(set(Qwords)) #Remove redundencies
Awords=[a.split()[i] for a in clean_answers for i in range(len(a.split()))]   #Words in the answers.
Awords=list(set(Awords))

questionwordsIDs={}

wordID=0
for word , count in wordOccur.items():
    if (count > 20 and word in Qwords):
        questionwordsIDs[word]=wordID
        wordID+=1
        
answerwordsIDs={}
        
wordID=0
for word , count in wordOccur.items():
    if (count > 20 and word in Awords):
        answerwordsIDs[word]=wordID
        wordID+=1


In [9]:
# We will now add tokens necessary for the SEQ2SEQ model to the dictionary with their unique IDs.

tokens=['<PAD>','<EOS>','<OUT>','<SOS>']
for token in tokens:
    questionwordsIDs[token]=len(questionwordsIDs)+1
for token in tokens:
    answerwordsIDs[token]=len(answerwordsIDs)+1

In [10]:
# In the implmentation of the SEQ2SEQ model we will need the inverse mapping ID--> word for the answer dictionary so let's do that.

answerIDs2words={wordID:word for word,wordID in answerwordsIDs.items()}

In [11]:
# Let's add at the end to clean_answers <EOS>.

for i in range (len(clean_answers)):
    clean_answers[i]+=' <EOS>'

# Now we will translate questions and answers into a set of integers which are their IDs as defined before.

codedQuestions=[]
i=0
for question in clean_questions:
    l=question.split()
    temp=[]
    if len(l)>0:
        for word in l:
            if (word not in questionwordsIDs.keys()):
                temp.append(questionwordsIDs['<OUT>'])
            else:
                temp.append(questionwordsIDs[word])
        if len(temp)==0:
            print(i)
        codedQuestions.append(temp)
        i+=1

codedAnswers=[]
for answer in clean_answers:
    l=answer.split()
    temp=[]
    if len(l)>0:
        for word in l:
            if (word not in answerwordsIDs.keys()):
                temp.append(answerwordsIDs['<OUT>'])
            else:
                temp.append(answerwordsIDs[word])
        codedAnswers.append(temp)

# So final step,  before getting into modeling and what we will need to do is sorting the questions and answers by length
# this helps (speed-up) with the learning process. 

SortclQues=[x for x in sorted(codedQuestions,key=len) if len(x)<=25]
SortclAns=[x for x in sorted(codedAnswers,key=len) if len(x)<=25]

# II. Building the SEQ2SEQ model:

Now we will start using Tensorflow to build the architecture of the model that ww will train in the next phase, so let's get into it.

It's important to note that in Tensorflow all variables are tensors, a tensor is a special data structure that is without being mathematically rigorous can be considered as a multidimensional vector, a matrix for example is a rank 2 tensor. These tensor based variables allow a fast computation for deep neural networks, so in order to use this tensor variables we must define them in a Tensorflow placeholder. So the first thing we will do is create placeholders for inputs and targets and also for some hyperparameters. Let's go!



In [12]:
def modelInputs():
    inputs=tf.placeholder(tf.int32,[None,None],name='input') # arguments: type, size(matrix: size of batch + sequence length), name
    targets=tf.placeholder(tf.int32,[None,None],name='target')
    lr=tf.placeholder(tf.float32,name='learning_rate')
    keep_prob=tf.placeholder(tf.float32,name='drop_out_rate') #A hyperparameter that designate the dropout rate, generally it's at 20% (This idea helps prevent overfitting)
    return(inputs,targets,lr,keep_prob)

As you know a RNN model is composed of two main parts, an encoder part that recieves the input sequence, and a decoder that generates dequentially the output. In Tensorflow the decode needs the targets in a particular form which is composed of two main phases. First we must provide taargets by batches (a batch size to specified) and also to ensure every target (answer) of the batch starts with a '< SOS >' tag. So that's the plan of attack for the next step. Let's start.

![image info](./EN-DE.png)



In [13]:
def preprocess_targets(targets,word2int,batch_size): #word2in is the dictionary that maps words to their ID.
    left_side=tf.fill([batch_size,1],word2int['<SOS>'])
    preProTar=tf.concat([left_side,targets],axis=1)
    return(preProTar)

In [14]:
# Now we will officially start the architecture of the model. So first the encoder:

def encoder_rnn(rnn_inputs, rnn_size, num_layers, keep_prob, sequence_length): #rnn_size is number of input tensors in the encoder/ list of length of sequences of the batch
    lstm = tf.contrib.rnn.BasicLSTMCell(rnn_size) #create the LSTM
    lstm_dropout = tf.contrib.rnn.DropoutWrapper(lstm, input_keep_prob = keep_prob) #Creating the the dropout
    # Till now u just created the architecture of one cell of the RNN(LSTM). Now to create the encoder cell.
    encoder_cell = tf.contrib.rnn.MultiRNNCell([lstm_dropout] * num_layers)
    encoder_output, encoder_state = tf.nn.bidirectional_dynamic_rnn(cell_fw = encoder_cell,
                                                                    cell_bw = encoder_cell,
                                                                    sequence_length = sequence_length,
                                                                    inputs = rnn_inputs,
                                                                    dtype = tf.float32)
    # Making the chatbot as good as we can by using a bidirectional RNN.
    return encoder_state

In [15]:
# Now we will implement the fucntion that does the decoding for the training set then returns the decoding
# outputs, we also implemented the attention concept. 

def decode_trainSet(encoder_state,decoder_cell,decoder_embedded_inputs,sequence_length,decoding_scope,output_function,keep_prob,batch_size): #Embeddings are representations of words in a unique vector of numbers, in our case thery are the inputs for the decoder
    attention_states=tf.zeros([batch_size,1,decoder_cell.output_size])
    attention_keys,attention_values,attention_score_function,attention_construct_function=tf.contrib.seq2seq.prepare_attention(attention_states,attention_option='bahdanau',num_units=decoder_cell.output_size)
    training_decoder_function=tf.contrib.seq2seq.attention_decoder_fn_train(encoder_state[0],
                                                                           attention_keys,
                                                                            attention_values,
                                                                            attention_score_function,
                                                                            attention_construct_function,
                                                                           name='att_dec_train')
    decoder_output,_,_=tf.contrib.seq2seq.dynamic_rnn_decoder(decoder_cell,training_decoder_function,decoder_embedded_inputs,sequence_length,decoding_scope)
    decoder_output_drop_out=tf.nn.dropout(decoder_output,keep_prob)
    return(output_function(decoder_output_drop_out))    

In [16]:
def decode_training_set(encoder_state, decoder_cell, decoder_embedded_input, sequence_length, decoding_scope, output_function, keep_prob, batch_size):
    attention_states = tf.zeros([batch_size, 1, decoder_cell.output_size])
    attention_keys, attention_values, attention_score_function, attention_construct_function = tf.contrib.seq2seq.prepare_attention(attention_states, attention_option = "bahdanau", num_units = decoder_cell.output_size)
    training_decoder_function = tf.contrib.seq2seq.attention_decoder_fn_train(encoder_state[0],
                                                                              attention_keys,
                                                                              attention_values,
                                                                              attention_score_function,
                                                                              attention_construct_function,
                                                                              name = "attn_dec_train")
    decoder_output, decoder_final_state, decoder_final_context_state = tf.contrib.seq2seq.dynamic_rnn_decoder(decoder_cell,
                                                                                                              training_decoder_function,
                                                                                                              decoder_embedded_input,
                                                                                                              sequence_length,
                                                                                                              scope = decoding_scope)
    decoder_output_dropout = tf.nn.dropout(decoder_output, keep_prob)
    return output_function(decoder_output_dropout)

In [17]:
# Now for the decoder intended for the test/validation sets. This is going to be very similar to the last part.

def decode_test_set(encoder_state, decoder_cell, decoder_embeddings_matrix, sos_id, eos_id, maximum_length, num_words, decoding_scope, output_function, keep_prob, batch_size):
    attention_states = tf.zeros([batch_size, 1, decoder_cell.output_size])
    attention_keys, attention_values, attention_score_function, attention_construct_function = tf.contrib.seq2seq.prepare_attention(attention_states, attention_option = "bahdanau", num_units = decoder_cell.output_size)
    test_decoder_function = tf.contrib.seq2seq.attention_decoder_fn_inference(output_function,
                                                                              encoder_state[0],
                                                                              attention_keys,
                                                                              attention_values,
                                                                              attention_score_function,
                                                                              attention_construct_function,
                                                                              decoder_embeddings_matrix,
                                                                              sos_id,
                                                                              eos_id,
                                                                              maximum_length,
                                                                              num_words,
                                                                              name = "attn_dec_inf")
    test_predictions,_,_ = tf.contrib.seq2seq.dynamic_rnn_decoder(decoder_cell,
                                                                test_decoder_function,
                                                                scope = decoding_scope)
    return test_predictions

In [18]:
#Now at last we create the decoder

def decoder_rnn(decoder_embedded_input,decoder_embeddings_matrix,encoder_state,num_words,sequence_length,rnn_size,num_layers,word2int,keep_prob,batch_size):
    
    with tf.variable_scope('decoding') as decoding_scope:
        lstm=tf.contrib.rnn.BasicLSTMCell(rnn_size) #the following 3 lines are same as decoder
        lstm_dropOut=tf.contrib.rnn.DropoutWrapper(lstm,input_keep_prob=keep_prob)
        decoder_cell=tf.contrib.rnn.MultiRNNCell([lstm_dropOut]*num_layers)
        weights=tf.truncated_normal_initializer(stddev=0.1)
        biases=tf.zeros_initializer()
        output_function=lambda x: tf.contrib.layers.fully_connected(x,
                                                                   num_words,
                                                                   None,
                                                                   scope=decoding_scope,
                                                                   weights_initializer=weights,
                                                                    biases_initializer=biases)
        training_predictions=decode_training_set(encoder_state,decoder_cell,decoder_embedded_input,sequence_length,decoding_scope,output_function,keep_prob,batch_size)
        
        decoding_scope.reuse_variables()
        test_predictions=decode_test_set(encoder_state,decoder_cell,decoder_embeddings_matrix,word2int['<SOS>'],word2int['<EOS>'],sequence_length-1,num_words,decoding_scope,output_function,keep_prob,batch_size)
    return(training_predictions,test_predictions)

In [19]:
# Building the SEQ2SEQ Model.

def seq2seq_model(inputs,targets,keep_prob,batch_size,sequence_length,answers_num_words,questions_num_words,encoder_embedding_size,decoder_embedding_size,rnn_size,num_layers,questionwordsIDs):
    encoder_embeded_input=tf.contrib.layers.embed_sequence(inputs,
                                                          answers_num_words+1,
                                                          encoder_embedding_size,
                                                          initializer=tf.random_uniform_initializer(1,0))
    encoder_state=encoder_rnn(encoder_embeded_input,rnn_size,num_layers,keep_prob,sequence_length)
    preprocTargets=preprocess_targets(targets,questionwordsIDs,batch_size)
    decoder_embeddings_matrix=tf.Variable(tf.random_uniform([questions_num_words+1,decoder_embedding_size],0,1))
    decoder_embedded_input=tf.nn.embedding_lookup(decoder_embeddings_matrix,preprocTargets)
    training_predictions,test_predictions=decoder_rnn(decoder_embedded_input,decoder_embeddings_matrix,encoder_state,questions_num_words,sequence_length,rnn_size,num_layers,questionwordsIDs,keep_prob,batch_size)
    return(training_predictions,test_predictions)

# III. Training the SEQ2SEQ model:

In [77]:
# We'll start by setting the hyperparameters thar will be used during the training. Obviously those are to be tweaked to make
#the chatbot be better.

epochs=100
batch_size=64 #It is adviiced to use a batch size that is a multiple of 2.
rnn_size=512
num_layers=3
encoding_embedding_size=512
decoding_embedding_size=512
learning_rate=0.01
learning_rate_decay=0.9
min_learning_rate=0.0001
keep_probability=0.5

Tensorflow uses Dataflow programing which is a pradigm that models a program as a oriented graph, for which nodes are the operations and the  edges represent input and output data, this helps with parallelism which is important in Deep learning computations. So to use it we should first create a dataflow graph then create a session to run parts of the graph.

So that's what we will do now defining the session for the training phase.

Note: The only difference with a regular Session is that an InteractiveSession installs itself as the default session on construction. The methods Tensor.eval() and Operation.run() will use that session to run ops.

In [78]:
tf.reset_default_graph() #reseting the tf default graph which we will use.
session= tf.InteractiveSession() # Creating the interactive session.

In [79]:
# Loading the model inputs:

inputs,targets,lr,keep_prob = modelInputs()

In [80]:
# Setting the sequence length

sequence_length=tf.placeholder_with_default(25,None,name='sequence_length')

In [81]:
# Setting the input shape

input_shape=tf.shape(inputs)


In [82]:
# Now for the exciting stuff, by getting the train/test predictions.

training_predictions,test_predictions = seq2seq_model(tf.reverse(inputs,[-1]),
                                                      targets,
                                                      keep_prob,
                                                      batch_size,
                                                      sequence_length,
                                                      len(questionwordsIDs),
                                                      len(answerwordsIDs),
                                                      encoding_embedding_size,
                                                      decoding_embedding_size,
                                                      rnn_size,
                                                      num_layers,
                                                      questionwordsIDs)

In [83]:
# Setting loss error, optimizer and gradient clipping (Forcing gradient to a min/max values if it breaches the bounds).

with tf.name_scope('Optimization'):
    
    loss_error = tf.contrib.seq2seq.sequence_loss(training_predictions,targets,tf.ones([input_shape[0],sequence_length]))
    optimizer=tf.train.AdamOptimizer(learning_rate)
    gradients=optimizer.compute_gradients(loss_error)
    clipped_gradients=[(tf.clip_by_value(grad_tensor,-5.,5.),grad_var) for grad_tensor,grad_var in gradients if grad_tensor is not None]
    oprimizer_gradient_clipping=optimizer.apply_gradients(clipped_gradients)
    

In [84]:
# Now we will apply padding. Which means completing a sentence with n words to reach m>n words using <PAD> tags. 
# this is important in the sense that questions and answers must have same length.\

def apply_padding(batch,word2int): #word2int dict maps a word to integer
    max_seq=max([len(sequence) for sequence in batch])
    return([seq+[word2int['<PAD>']]*(max_seq-len(seq)) for seq in batch])

In [85]:
# split data into batches of answer and questions

def split_into_batches(questions,answers,batch_size):
    Qnum_batch=len(questions)//batch_size
    
    for i in range (Qnum_batch):
        Qbatch=questions[i*batch_size:(i+1)*batch_size]
        Abatch=answers[i*batch_size:(i+1)*batch_size]
        if(len(Qbatch)==len(Abatch)):
            paddedQbatch=np.array(apply_padding(Qbatch,questionwordsIDs))
            paddedAbatch=np.array(apply_padding(Abatch,answerwordsIDs))
    yield  paddedQbatch,paddedAbatch
# Return sends a specified value back to its caller whereas Yield can produce a sequence of values. 
# We should use yield when we want to iterate over a sequence, but don’t want to store the entire sequence in memory.
    

In [86]:
# Splitting data (Q&A) into train/dev/test sets.

train_val_split=int(len(SortclQues)*0.15)

training_quest=SortclQues[train_val_split:]
training_answ=SortclAns[train_val_split:]

validation_quest=SortclQues[0:train_val_split]
validation_answ=SortclAns[0:train_val_split]

In [None]:
# Training;

check_train_loss=100
check_test_loss=((len(SortclQues)//batch_size//2))-1 # we want to check test loss at half way and end of an epoch so this is half number of batches.
total_training_error=0
list_validation_error=[]
early_stopping_check=0
early_stopping_stop=1000
checkpoint='chatbot_weights.ckpt'
session.run(tf.global_variables_initializer())

for epoch in range(1,epochs+1):
    for batch_index,(paddedQbatch,paddedAbatch) in enumerate(split_into_batches(training_quest,training_answ,batch_size)):
        start=time.time()
        _,batch_train_loss_error=session.run([oprimizer_gradient_clipping,loss_error],{inputs:paddedQbatch,
                                                                                       targets:paddedAbatch,
                                                                                       lr:learning_rate,
                                                                                       sequence_length:paddedQbatch.shape[1],
                                                                                       keep_prob:keep_probability})
        total_training_error+=batch_train_loss_error
        end=time.time()
        train_time=end-start
        
        if batch_index%check_train_loss==0:
            print("Epoch: ",epoch,', Batch: ',batch_index,', Training loss error: ',total_training_error/check_train_loss,', Batch Training time: ',train_time,' seconds.')
            total_training_error=0
        if batch_index%check_test_loss==0 and batch_index > 0:
            total_validation_error=0
            start=time.time()
            for batch_index_validation,(paddedQbatch,paddedAbatch) in enumerate(split_into_batches(validation_quest,validation_answ,batch_size)):
                batch_test_loss_error=session.run({inputs:paddedQbatch,
                                                    targets:paddedAbatch,
                                                    lr:learning_rate,
                                                    sequence_length:paddedQbatch.shape[1]})
                
                total_validation_error+=batch_test_loss_error
            end=time.time()
            test_time=end-start
            avg_validation_error=total_validation_error/len(validation_quest)
            print('Validation loss error: ',avg_validation_error,'Batch validation time:',test_time)
            #implementing learning rate decay.
            learning_rate+=learning_rate_decay
            if learning_rate < min_learning_rate:
                learning_rate=min_learning_rate
            # Now for early stopping:
            list_validation_error.append(avg_validation_error)
            if avg_validation_error < min(list_validation_error):
                print('I speak better now') # Meaning we improved validation loss error. It is smaller than before.
                early_stopping_check = 0
                saver= tf.train.saver()
                saver.save(session,checkpoint) #we defined checkpoint before
            else:
                print('Sorry I do not speak better, I need to practice more!')
                early_stopping_check+=1
                if (early_stopping_check == early_stopping_stop):
                    break
    if (early_stopping_check == early_stopping_stop):
        print("Sorry! I can't speak better anymore, this is my limit!")
        break
print('GAME OVER')

In [92]:
batch_index_check_training_loss = 100
batch_index_check_validation_loss = ((len(training_quest)) // batch_size // 2) - 1
total_training_loss_error = 0
list_validation_loss_error = []
early_stopping_check = 0
early_stopping_stop = 1000
checkpoint = "./chatbot_weights.ckpt" 
session.run(tf.global_variables_initializer())
for epoch in range(1, epochs + 1):
    for batch_index, (padded_questions_in_batch, padded_answers_in_batch) in enumerate(split_into_batches(training_quest, training_answ, batch_size)):
        starting_time = time.time()
        _, batch_training_loss_error = session.run([oprimizer_gradient_clipping, loss_error], {inputs: padded_questions_in_batch,
                                                                                               targets: padded_answers_in_batch,
                                                                                               lr: learning_rate,
                                                                                               sequence_length: padded_answers_in_batch.shape[1],
                                                                                               keep_prob: keep_probability})
        total_training_loss_error += batch_training_loss_error
        ending_time = time.time()
        batch_time = ending_time - starting_time
        if batch_index % batch_index_check_training_loss == 0:
            print('Epoch: {:>3}/{}, Batch: {:>4}/{}, Training Loss Error: {:>6.3f}, Training Time on 100 Batches: {:d} seconds'.format(epoch,
                                                                                                                                       epochs,
                                                                                                                                       batch_index,
                                                                                                                                       len(training_quest) // batch_size,
                                                                                                                                       total_training_loss_error / batch_index_check_training_loss,
                                                                                                                                       int(batch_time * batch_index_check_training_loss)))
            total_training_loss_error = 0
        if batch_index % batch_index_check_validation_loss == 0 and batch_index > 0:
            total_validation_loss_error = 0
            starting_time = time.time()
            for batch_index_validation, (padded_questions_in_batch, padded_answers_in_batch) in enumerate(split_into_batches(validation_questions, validation_answers, batch_size)):
                batch_validation_loss_error = session.run(loss_error, {inputs: padded_questions_in_batch,
                                                                       targets: padded_answers_in_batch,
                                                                       lr: learning_rate,
                                                                       sequence_length: padded_answers_in_batch.shape[1],
                                                                       keep_prob: 1})
                total_validation_loss_error += batch_validation_loss_error
            ending_time = time.time()
            batch_time = ending_time - starting_time
            average_validation_loss_error = total_validation_loss_error / (len(validation_questions) / batch_size)
            print('Validation Loss Error: {:>6.3f}, Batch Validation Time: {:d} seconds'.format(average_validation_loss_error, int(batch_time)))
            learning_rate *= learning_rate_decay
            if learning_rate < min_learning_rate:
                learning_rate = min_learning_rate
            list_validation_loss_error.append(average_validation_loss_error)
            if average_validation_loss_error <= min(list_validation_loss_error):
                print('I speak better now!!')
                early_stopping_check = 0
                saver = tf.train.Saver()
                saver.save(session, checkpoint)
            else:
                print("Sorry I do not speak better, I need to practice more.")
                early_stopping_check += 1
                if early_stopping_check == early_stopping_stop:
                    break
    if early_stopping_check == early_stopping_stop:
        print("My apologies, I cannot speak better anymore. This is the best I can do.")
        break
print("Game Over")

Epoch:   1/100, Batch:    0/2698, Training Loss Error:  0.090, Training Time on 100 Batches: 705 seconds
Epoch:   2/100, Batch:    0/2698, Training Loss Error:  0.179, Training Time on 100 Batches: 725 seconds
Epoch:   3/100, Batch:    0/2698, Training Loss Error:  0.150, Training Time on 100 Batches: 746 seconds
Epoch:   4/100, Batch:    0/2698, Training Loss Error:  0.117, Training Time on 100 Batches: 709 seconds
Epoch:   5/100, Batch:    0/2698, Training Loss Error:  0.114, Training Time on 100 Batches: 775 seconds
Epoch:   6/100, Batch:    0/2698, Training Loss Error:  0.150, Training Time on 100 Batches: 717 seconds
Epoch:   7/100, Batch:    0/2698, Training Loss Error:  0.115, Training Time on 100 Batches: 740 seconds
Epoch:   8/100, Batch:    0/2698, Training Loss Error:  0.115, Training Time on 100 Batches: 825 seconds
Epoch:   9/100, Batch:    0/2698, Training Loss Error:  0.105, Training Time on 100 Batches: 875 seconds
Epoch:  10/100, Batch:    0/2698, Training Loss Error: 

Epoch:  79/100, Batch:    0/2698, Training Loss Error:  0.048, Training Time on 100 Batches: 2185 seconds
Epoch:  80/100, Batch:    0/2698, Training Loss Error:  0.048, Training Time on 100 Batches: 2323 seconds
Epoch:  81/100, Batch:    0/2698, Training Loss Error:  0.048, Training Time on 100 Batches: 2309 seconds
Epoch:  82/100, Batch:    0/2698, Training Loss Error:  0.048, Training Time on 100 Batches: 2316 seconds
Epoch:  83/100, Batch:    0/2698, Training Loss Error:  0.048, Training Time on 100 Batches: 2314 seconds
Epoch:  84/100, Batch:    0/2698, Training Loss Error:  0.048, Training Time on 100 Batches: 2425 seconds
Epoch:  85/100, Batch:    0/2698, Training Loss Error:  0.048, Training Time on 100 Batches: 2227 seconds
Epoch:  86/100, Batch:    0/2698, Training Loss Error:  0.048, Training Time on 100 Batches: 2307 seconds
Epoch:  87/100, Batch:    0/2698, Training Loss Error:  0.048, Training Time on 100 Batches: 2396 seconds
Epoch:  88/100, Batch:    0/2698, Training Los