# ChatBot References

Professor Smyth's website discussing projects

https://www.ics.uci.edu/~smyth/courses/cs175/project_reading.html

Chatbot resources on website

https://pytorch.org/tutorials/beginner/chatbot_tutorial.html

https://web.stanford.edu/~jurafsky/slp3/26.pdf

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/12/CortanaLUDialog-FromSLTproceedings.pdf

Dataset for tutorial

https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html

# Start of Tutorial

### Import necessary libraries

In [1]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

import torch
from torch.jit import script, trace
import torch.nn as nn
from torch import optim
import torch.nn.functional as F
import csv
import random
import re
import os
import unicodedata
import codecs
from io import open
import itertools
import math


USE_CUDA = torch.cuda.is_available()
device = torch.device("cuda" if USE_CUDA else "cpu")

### Preprocess Data

First, we look at a few lines of `movie_lines.txt` to see how the data is structured:

In [2]:
corpus_name = "cornell movie-dialogs corpus"
corpus = os.path.join("data", corpus_name)

def printLines(file, n=10):
    with open(file, 'rb') as datafile:
        lines = datafile.readlines()
    for line in lines[:n]:
        print(line)

printLines(os.path.join(corpus, "movie_lines.txt"))

b'L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!\n'
b'L1044 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ They do to!\n'
b'L985 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ I hope so.\n'
b'L984 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ She okay?\n'
b"L925 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ Let's go.\n"
b'L924 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ Wow\n'
b"L872 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ Okay -- you're gonna need to learn how to lie.\n"
b'L871 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ No\n'
b'L870 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ I\'m kidding.  You know how sometimes you just become this "persona"?  And you don\'t know how to quit?\n'
b'L869 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ Like my fear of wearing pastels?\n'


Next, we scan through the data files and reformat them into (sentence, response) pairs, one pair per line with the two sentences separated by a tab. First we create helper functions that split each raw line into its `+++$+++`-separated fields and group the lines into conversations:

In [3]:
# Splits each line of the file into a dictionary of fields
def loadLines(fileName, fields):
    lines = {}
    with open(fileName, 'r', encoding='iso-8859-1') as f:
        for line in f:
            values = line.split(" +++$+++ ")
            # Extract fields
            lineObj = {}
            for i, field in enumerate(fields):
                lineObj[field] = values[i]
            lines[lineObj['lineID']] = lineObj
    return lines


# Groups fields of lines from `loadLines` into conversations based on *movie_conversations.txt*
def loadConversations(fileName, lines, fields):
    conversations = []
    with open(fileName, 'r', encoding='iso-8859-1') as f:
        for line in f:
            values = line.split(" +++$+++ ")
            # Extract fields
            convObj = {}
            for i, field in enumerate(fields):
                convObj[field] = values[i]
            # Convert string to list (convObj["utteranceIDs"] == "['L598485', 'L598486', ...]")
            utterance_id_pattern = re.compile('L[0-9]+')
            lineIds = utterance_id_pattern.findall(convObj["utteranceIDs"])
            # Reassemble lines
            convObj["lines"] = []
            for lineId in lineIds:
                convObj["lines"].append(lines[lineId])
            conversations.append(convObj)
    return conversations


# Extracts pairs of sentences from conversations
def extractSentencePairs(conversations):
    qa_pairs = []
    for conversation in conversations:
        # Iterate over all the lines of the conversation
        for i in range(len(conversation["lines"]) - 1):  # We ignore the last line (no answer for it)
            inputLine = conversation["lines"][i]["text"].strip()
            targetLine = conversation["lines"][i+1]["text"].strip()
            # Filter wrong samples (if one of the lists is empty)
            if inputLine and targetLine:
                qa_pairs.append([inputLine, targetLine])
    return qa_pairs
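To make the three helpers concrete, the sketch below runs the same steps on a tiny in-memory sample instead of the real files. The two `movie_lines.txt` rows are copied from the output above; the conversation row is a hypothetical example written in the `movie_conversations.txt` layout that `loadConversations()` expects. Variable names are chosen so they do not overwrite the notebook's `lines` or `pairs`.

```python
import re

SEP = " +++$+++ "
LINE_FIELDS = ["lineID", "characterID", "movieID", "character", "text"]
CONV_FIELDS = ["character1ID", "character2ID", "movieID", "utteranceIDs"]

# Two rows copied from the printLines() output of movie_lines.txt
raw_lines = [
    "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!\n",
    "L1044 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ They do to!\n",
]
# Hypothetical conversation row in the movie_conversations.txt layout
raw_conv = "u2 +++$+++ u0 +++$+++ m0 +++$+++ ['L1044', 'L1045']\n"

# Map lineID -> dict of fields, like loadLines()
line_dict = {}
for row in raw_lines:
    obj = dict(zip(LINE_FIELDS, row.split(SEP)))
    line_dict[obj["lineID"]] = obj

# Pull the ordered utterance IDs out of the list-like string, like loadConversations()
conv = dict(zip(CONV_FIELDS, raw_conv.split(SEP)))
ids = re.findall(r"L[0-9]+", conv["utteranceIDs"])
texts = [line_dict[i]["text"].strip() for i in ids]

# Each utterance paired with the next one, skipping empty lines, like extractSentencePairs()
toy_pairs = [[a, b] for a, b in zip(texts, texts[1:]) if a and b]
print(toy_pairs)  # [['They do to!', 'They do not!']]
```

Note that the conversation file, not the order of `movie_lines.txt`, determines which line answers which: the lines file lists `L1045` before `L1044`, but the pair comes out in conversation order.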

Using the functions above, we now parse the data into a form that is useful to us, write it to a new tab-separated file, and print a few lines to check the result:

In [4]:
# Define path to new file
datafile = os.path.join(corpus, "formatted_movie_lines.txt")

delimiter = '\t'
# Unescape the delimiter
delimiter = str(codecs.decode(delimiter, "unicode_escape"))

# Initialize lines dict, conversations list, and field ids
lines = {}
conversations = []
MOVIE_LINES_FIELDS = ["lineID", "characterID", "movieID", "character", "text"]
MOVIE_CONVERSATIONS_FIELDS = ["character1ID", "character2ID", "movieID", "utteranceIDs"]

# Load lines and process conversations
print("\nProcessing corpus...")
lines = loadLines(os.path.join(corpus, "movie_lines.txt"), MOVIE_LINES_FIELDS)
print("\nLoading conversations...")
conversations = loadConversations(os.path.join(corpus, "movie_conversations.txt"),
                                  lines, MOVIE_CONVERSATIONS_FIELDS)

# Write new csv file
print("\nWriting newly formatted file...")
with open(datafile, 'w', encoding='utf-8') as outputfile:
    writer = csv.writer(outputfile, delimiter=delimiter, lineterminator='\n')
    for pair in extractSentencePairs(conversations):
        writer.writerow(pair)

# Print a sample of lines
print("\nSample lines from file:")
printLines(datafile)


Processing corpus...

Loading conversations...

Writing newly formatted file...

Sample lines from file:
b"Can we make this quick?  Roxanne Korrine and Andrew Barrett are having an incredibly horrendous public break- up on the quad.  Again.\tWell, I thought we'd start with pronunciation, if that's okay with you.\r\n"
b"Well, I thought we'd start with pronunciation, if that's okay with you.\tNot the hacking and gagging and spitting part.  Please.\r\n"
b"Not the hacking and gagging and spitting part.  Please.\tOkay... then how 'bout we try out some French cuisine.  Saturday?  Night?\r\n"
b"You're asking me out.  That's so cute. What's your name again?\tForget it.\r\n"
b"No, no, it's my fault -- we didn't have a proper introduction ---\tCameron.\r\n"
b"Cameron.\tThe thing is, Cameron -- I'm at the mercy of a particularly hideous breed of loser.  My sister.  I can't date until she does.\r\n"
b"The thing is, Cameron -- I'm at the mercy of a particularly hideous breed of loser.  My sister. 

Next we build a vocabulary of the words we see. Each word is represented numerically by an index assigned in order of first appearance: the first new word gets index 3 (indexes 0 to 2 are reserved for the `PAD`, `SOS`, and `EOS` tokens), the next new word gets 4, and so on. See `addWord()` for specifics.

In [5]:
# Default word tokens
PAD_token = 0  # Used for padding short sentences
SOS_token = 1  # Start-of-sentence token
EOS_token = 2  # End-of-sentence token

class Voc:
    def __init__(self, name):
        self.name = name
        self.trimmed = False
        self.word2index = {}
        self.word2count = {}
        self.index2word = {PAD_token: "PAD", SOS_token: "SOS", EOS_token: "EOS"}
        self.num_words = 3  # Count SOS, EOS, PAD

    def addSentence(self, sentence):
        for word in sentence.split(' '):
            self.addWord(word)

    def addWord(self, word):
        if word not in self.word2index:
            self.word2index[word] = self.num_words
            self.word2count[word] = 1
            self.index2word[self.num_words] = word
            self.num_words += 1
        else:
            self.word2count[word] += 1

    # Remove words below a certain count threshold
    def trim(self, min_count):
        if self.trimmed:
            return
        self.trimmed = True

        keep_words = []

        for k, v in self.word2count.items():
            if v >= min_count:
                keep_words.append(k)

        print('keep_words {} / {} = {:.4f}'.format(
            len(keep_words), len(self.word2index), len(keep_words) / len(self.word2index)
        ))

        # Reinitialize dictionaries
        self.word2index = {}
        self.word2count = {}
        self.index2word = {PAD_token: "PAD", SOS_token: "SOS", EOS_token: "EOS"}
        self.num_words = 3 # Count default tokens

        for word in keep_words:
            self.addWord(word)

Now we finally get to preparing the (sentence, response) pairs for training. Preprocessing begins by converting the Unicode strings to plain ASCII (`unicodeToAscii()`). We also lowercase everything and remove all non-letter characters except basic punctuation (`normalizeString()`). Finally, to aid training, we keep only the pairs in which both sentences have fewer than `MAX_LENGTH` words, leaving room for the `EOS` token (`filterPair()`).

In [6]:
MAX_LENGTH = 10  # Maximum sentence length to consider

# Turn a Unicode string to plain ASCII, thanks to
# https://stackoverflow.com/a/518232/2809427
def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
    )

# Lowercase, trim, and remove non-letter characters
def normalizeString(s):
    s = unicodeToAscii(s.lower().strip())
    s = re.sub(r"([.!?])", r" \1", s)
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)
    s = re.sub(r"\s+", r" ", s).strip()
    return s

# Read query/response pairs and return a voc object
def readVocs(datafile, corpus_name):
    print("Reading lines...")
    # Read the file and split into lines
    lines = open(datafile, encoding='utf-8').\
        read().strip().split('\n')
    # Split every line into pairs and normalize
    pairs = [[normalizeString(s) for s in l.split('\t')] for l in lines]
    voc = Voc(corpus_name)
    return voc, pairs

# Returns True iff both sentences in a pair 'p' are under the MAX_LENGTH threshold
def filterPair(p):
    # Input sequences need to preserve the last word for EOS token
    return len(p[0].split(' ')) < MAX_LENGTH and len(p[1].split(' ')) < MAX_LENGTH

# Filter pairs using filterPair condition
def filterPairs(pairs):
    return [pair for pair in pairs if filterPair(pair)]

# Using the functions defined above, return a populated voc object and pairs list
def loadPrepareData(corpus, corpus_name, datafile, save_dir):
    print("Start preparing training data ...")
    voc, pairs = readVocs(datafile, corpus_name)
    print("Read {!s} sentence pairs".format(len(pairs)))
    pairs = filterPairs(pairs)
    print("Trimmed to {!s} sentence pairs".format(len(pairs)))
    print("Counting words...")
    for pair in pairs:
        voc.addSentence(pair[0])
        voc.addSentence(pair[1])
    print("Counted words:", voc.num_words)
    return voc, pairs


# Load/Assemble voc and pairs
save_dir = os.path.join("data", "save")
voc, pairs = loadPrepareData(corpus, corpus_name, datafile, save_dir)
# Print some pairs to validate
print("\npairs:")
for pair in pairs[:10]:
    print(pair)

Start preparing training data ...
Reading lines...
Read 221282 sentence pairs
Trimmed to 64271 sentence pairs
Counting words...
Counted words: 18008

pairs:
['there .', 'where ?']
['you have my word . as a gentleman', 'you re sweet .']
['hi .', 'looks like things worked out tonight huh ?']
['you know chastity ?', 'i believe we share an art instructor']
['have fun tonight ?', 'tons']
['well no . . .', 'then that s all you had to say .']
['then that s all you had to say .', 'but']
['but', 'you always been this selfish ?']
['do you listen to this crap ?', 'what crap ?']
['what good stuff ?', 'the real you .']
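To see what normalization and length filtering do to concrete strings, here is a standalone copy of the same logic with hypothetical snake_case names, so that running it in the notebook does not overwrite `normalizeString()` or `filterPair()`. The expected strings were worked out from the regular expressions above; the first one matches the `'you re sweet .'` pair in the output.

```python
import re
import unicodedata


def unicode_to_ascii(s):
    # NFD splits accented letters into base letter + combining mark (category 'Mn'); drop the marks
    return ''.join(c for c in unicodedata.normalize('NFD', s) if unicodedata.category(c) != 'Mn')


def normalize_string(s):
    s = unicode_to_ascii(s.lower().strip())
    s = re.sub(r"([.!?])", r" \1", s)       # put a space before . ! ?
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)   # anything else (commas, apostrophes, digits) -> space
    s = re.sub(r"\s+", r" ", s).strip()     # collapse repeated whitespace
    return s


def keep_pair(pair, max_length=10):
    # Both sides need fewer than max_length words, so EOS still fits
    return all(len(s.split(' ')) < max_length for s in pair)


print(normalize_string("You're sweet."))   # you re sweet .
print(normalize_string("Café, anyone?!"))  # cafe anyone ? !
print(keep_pair(["hi .", "where ?"]))      # True
```

Because the apostrophe is removed, contractions such as "you're" become two tokens ("you re"), which is why the pairs above contain words like `re` and `s`.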


Another way to speed up training is to remove rarely used words. We "trim" these words with `Voc.trim()`, and as a result we must also remove every (sentence, response) pair that contains a trimmed word.

In [7]:
MIN_COUNT = 10    # Minimum word count threshold for trimming

def trimRareWords(voc, pairs, MIN_COUNT):
    # Trim words used under the MIN_COUNT from the voc
    voc.trim(MIN_COUNT)
    # Filter out pairs with trimmed words
    keep_pairs = []
    for pair in pairs:
        input_sentence = pair[0]
        output_sentence = pair[1]
        keep_input = True
        keep_output = True
        # Check input sentence
        for word in input_sentence.split(' '):
            if word not in voc.word2index:
                keep_input = False
                break
        # Check output sentence
        for word in output_sentence.split(' '):
            if word not in voc.word2index:
                keep_output = False
                break

        # Only keep pairs that do not contain trimmed word(s) in their input or output sentence
        if keep_input and keep_output:
            keep_pairs.append(pair)

    print("Trimmed from {} pairs to {}, {:.4f} of total".format(len(pairs), len(keep_pairs), len(keep_pairs) / len(pairs)))
    return keep_pairs


# Trim voc and pairs
pairs = trimRareWords(voc, pairs, MIN_COUNT)

keep_words 2869 / 18005 = 0.1593
Trimmed from 64271 pairs to 38681, 0.6018 of total
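The pair filter inside `trimRareWords()` can also be written as a single comprehension. This is a standalone sketch with a hypothetical function name and a toy vocabulary; it only replaces the filtering loop, not the `voc.trim()` call that precedes it.

```python
def keep_known_pairs(pairs, word2index):
    """Keep only pairs whose input and response consist entirely of in-vocabulary words."""
    return [
        pair for pair in pairs
        if all(word in word2index for sentence in pair for word in sentence.split(' '))
    ]


toy_vocab = {'hi': 3, '.': 4, 'where': 5, '?': 6}
toy_pairs = [['hi .', 'where ?'], ['hi there .', 'where ?']]
print(keep_known_pairs(toy_pairs, toy_vocab))  # [['hi .', 'where ?']]
```

`all()` stops at the first unknown word, just like the `break` statements in the original loop.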


### Using our Data on our Models

The data still needs more processing. Our model computes on numbers, not strings, so we convert each sentence into a tensor of word indexes: every word is replaced by its index in `voc.word2index`, and `EOS_token` is appended at the end. These tensors are what we feed the model during training.

To speed up training we use mini-batches. A natural layout for a batch is a matrix of shape (batch_size, max_length), where max_length is the length of the longest sentence in the batch and each row holds one sentence as indexes. Each row ends with `EOS_token` and is then filled with `PAD_token` (0) up to max_length.

This layout almost works, but in it each row is a sentence and each column is a time step. We want the opposite: every row should be a time step and every column one sentence of the batch, so that indexing the tensor at step t returns the t-th word of every sentence at once. This also matches the (seq_len, batch) layout that PyTorch's RNN modules expect by default. For this reason we build the matrix described above, but transposed, with shape (max_length, batch_size).

We define some functions that help us achieve this:

In [8]:
def indexesFromSentence(voc, sentence):
    return [voc.word2index[word] for word in sentence.split(' ')] + [EOS_token]


def zeroPadding(l, fillvalue=PAD_token):
    return list(itertools.zip_longest(*l, fillvalue=fillvalue))

def binaryMatrix(l, value=PAD_token):
    m = []
    for i, seq in enumerate(l):
        m.append([])
        for token in seq:
            if token == PAD_token:
                m[i].append(0)
            else:
                m[i].append(1)
    return m

# Returns padded input sequence tensor and lengths
def inputVar(l, voc):
    indexes_batch = [indexesFromSentence(voc, sentence) for sentence in l]
    lengths = torch.tensor([len(indexes) for indexes in indexes_batch])
    padList = zeroPadding(indexes_batch)
    padVar = torch.LongTensor(padList)
    return padVar, lengths

# Returns padded target sequence tensor, padding mask, and max target length
def outputVar(l, voc):
    indexes_batch = [indexesFromSentence(voc, sentence) for sentence in l]
    max_target_len = max([len(indexes) for indexes in indexes_batch])
    padList = zeroPadding(indexes_batch)
    mask = binaryMatrix(padList)
    mask = torch.BoolTensor(mask)
    padVar = torch.LongTensor(padList)
    return padVar, mask, max_target_len

# Returns all items for a given batch of pairs
def batch2TrainData(voc, pair_batch):
    pair_batch.sort(key=lambda x: len(x[0].split(" ")), reverse=True)
    input_batch, output_batch = [], []
    for pair in pair_batch:
        input_batch.append(pair[0])
        output_batch.append(pair[1])
    inp, lengths = inputVar(input_batch, voc)
    output, mask, max_target_len = outputVar(output_batch, voc)
    return inp, lengths, output, mask, max_target_len


# Example for validation
small_batch_size = 5
batches = batch2TrainData(voc, [random.choice(pairs) for _ in range(small_batch_size)])
input_variable, lengths, target_variable, mask, max_target_len = batches

print("input_variable:", input_variable)
print("lengths:", lengths)
print("target_variable:", target_variable)
print("mask:", mask)
print("max_target_len:", max_target_len)

input_variable: tensor([[  65,    7,   45,   35,  798],
        [ 246,  138,    7,   36,    4],
        [ 132,   81,  332,  258,    2],
        [ 582,   51,   81,    4,    0],
        [ 258, 2104,    6,    2,    0],
        [1218,    4,    2,    0,    0],
        [2147,    2,    0,    0,    0],
        [   4,    0,    0,    0,    0],
        [   2,    0,    0,    0,    0]])
lengths: tensor([9, 7, 6, 5, 3])
target_variable: tensor([[  25,   51,   54,   35,  403],
        [ 130,   48,  710,   36, 1218],
        [ 289,    6,    4,   12, 1219],
        [1317,    2,    2, 2210,    4],
        [ 507,    0,    0,  181,    2],
        [   4,    0,    0,  253,    0],
        [   2,    0,    0,    4,    0],
        [   0,    0,    0,    2,    0]])
mask: tensor([[ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True],
        [ True, False, False,  True,  True],
        [ True, False

Looking at the output above, `input_variable` is the transposed, padded matrix described in the previous section: each column is one input sentence. `lengths` is a tensor holding the length of each input sentence (including `EOS_token`); the **encoder** uses it to pack the padded batch with `pack_padded_sequence()`. `target_variable` holds the responses our model is supposed to learn, in the same transposed, padded layout. `mask` marks the positions of `target_variable`: `True` where there is a real word and `False` where there is only padding. Its purpose is not obvious yet, but it becomes clear when we define the loss function.
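The transposition happens inside `zeroPadding()`: `itertools.zip_longest(*batch)` walks all sentences in parallel, so each tuple it yields is one time step. The toy sketch below uses two already-indexed sentences (with `EOS_token` = 2 appended) and hypothetical variable names so it does not overwrite the notebook's `mask`.

```python
import itertools

PAD = 0  # same value as PAD_token

# Two indexed sentences, longest first, each ending with EOS (2)
indexes_batch = [[5, 6, 2], [7, 2]]
toy_lengths = [len(s) for s in indexes_batch]

# Row t of the result holds the t-th token of every sentence: shape (max_length, batch_size)
toy_padded = list(itertools.zip_longest(*indexes_batch, fillvalue=PAD))
print(toy_padded)  # [(5, 7), (6, 2), (2, 0)]

# Same rule as binaryMatrix(): True for real tokens (including EOS), False for padding
toy_mask = [[tok != PAD for tok in row] for row in toy_padded]
print(toy_mask)    # [[True, True], [True, True], [True, False]]
```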

### Defining our Model

Our base model is a sequence-to-sequence (seq2seq) model built from two RNNs. The first RNN, the **encoder**, takes a variable-length input and converts it into a *fixed-length* "context" vector that is intended to capture the semantic meaning of the input. The second RNN, the **decoder**, takes the context vector provided by the encoder, along with an input word, to predict the next word of the response. Note that a (sentence, response) pair is not learned as one continuous sequence: the encoder reads the sentence, the decoder generates the response, and the two networks have separate weights. They are connected only through the encoder's final hidden state, which initializes the decoder, and the encoder's outputs, which the decoder's attention mechanism reads.

Some discussion on the encoder. The original PyTorch tutorial uses a bidirectional GRU (Gated Recurrent Unit); this notebook swaps in a bidirectional LSTM (`nn.LSTM`) but keeps the attribute name `self.gru`. Bidirectional means there are effectively two RNNs in the encoder: one reads the sequence forward while the other reads it backward. At each time step both produce an output, and each carries its hidden state (for an LSTM, a hidden state and a cell state) on to its next step. The encoder sums the forward and backward outputs at each time step and returns them as `outputs`, which the decoder's attention uses later. Summing means that each time step's output reflects both past context (from the forward RNN) and future context (from the backward RNN).

In [9]:
class EncoderRNN(nn.Module):
    def __init__(self, hidden_size, embedding, n_layers=1, dropout=0):
        super(EncoderRNN, self).__init__()
        self.n_layers = n_layers
        self.hidden_size = hidden_size
        self.embedding = embedding

        # Initialize the LSTM (kept under the name `gru` from the original GRU-based tutorial).
        #   input_size and hidden_size are both set to 'hidden_size' because our input
        #   is a word embedding with number of features == hidden_size
        self.gru = nn.LSTM(hidden_size, hidden_size, n_layers,
                           dropout=(0 if n_layers == 1 else dropout), bidirectional=True)

    def forward(self, input_seq, input_lengths, hidden=None):
        # Convert word indexes to embeddings
        embedded = self.embedding(input_seq)
        # Pack padded batch of sequences for RNN module (lengths sorted in descending order, on CPU)
        packed = nn.utils.rnn.pack_padded_sequence(embedded, input_lengths)
        # Forward pass through the LSTM; hidden is a tuple (h_n, c_n)
        outputs, hidden = self.gru(packed, hidden)
        # Unpack padding
        outputs, _ = nn.utils.rnn.pad_packed_sequence(outputs)
        # Sum forward and backward LSTM outputs
        outputs = outputs[:, :, :self.hidden_size] + outputs[:, :, self.hidden_size:]
        # Return output and final hidden state
        return outputs, hidden
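For a bidirectional RNN, PyTorch returns outputs of shape (seq_len, batch, 2 * hidden_size), with the forward direction in the first half of the last dimension and the backward direction in the second half. The pure-Python sketch below, using made-up numbers, shows what the slicing line `outputs[:, :, :H] + outputs[:, :, H:]` computes.

```python
# Toy encoder output: T = 2 time steps, batch of 1, hidden size H = 2.
# Each vector is [forward_1, forward_2, backward_1, backward_2].
H = 2
toy_outputs = [
    [[1.0, 2.0, 10.0, 20.0]],  # t = 0
    [[3.0, 4.0, 30.0, 40.0]],  # t = 1
]

# Add the forward half and the backward half element by element
summed = [[[f + b for f, b in zip(vec[:H], vec[H:])] for vec in step] for step in toy_outputs]
print(summed)  # [[[11.0, 22.0]], [[33.0, 44.0]]]
```

The final states in `hidden` are not summed: `h_n` and `c_n` each keep shape (n_layers * 2, batch, hidden_size), one slice per layer and direction.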

Some discussion on the decoder. The decoder starts from the context that the encoder produces; here, its initial hidden state is taken from the encoder's final hidden state. A plain seq2seq decoder uses only this, which can lose information, especially when the input sentences are very long. To counter this, the decoder also uses its current hidden state to decide what it should be "paying attention" to in the input. It scores its current output against the encoder outputs (the per-time-step outputs the encoder returns; this is where they get used) and normalizes the scores with a softmax. The results are called attention weights: the encoder outputs are multiplied by them and summed, which shrinks the less important parts of the encoder output and emphasizes the more important parts. This can be improved further by computing the weights over all of the encoder outputs instead of only the one from the current time step (Luong et al.'s "global attention"). This is the method implemented below, followed by the implementation of the decoder that uses it.

In [10]:
# Luong attention layer
class Attn(nn.Module):
    def __init__(self, method, hidden_size):
        super(Attn, self).__init__()
        self.method = method
        if self.method not in ['dot', 'general', 'concat']:
            raise ValueError(self.method, "is not an appropriate attention method.")
        self.hidden_size = hidden_size
        if self.method == 'general':
            self.attn = nn.Linear(self.hidden_size, hidden_size)
        elif self.method == 'concat':
            self.attn = nn.Linear(self.hidden_size * 2, hidden_size)
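            # Learnable scoring vector for 'concat'; initialized randomly
            # (torch.FloatTensor(hidden_size) alone would hold uninitialized memory)
            self.v = nn.Parameter(torch.rand(hidden_size))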

    def dot_score(self, hidden, encoder_output):
        return torch.sum(hidden * encoder_output, dim=2)

    def general_score(self, hidden, encoder_output):
        energy = self.attn(encoder_output)
        return torch.sum(hidden * energy, dim=2)

    def concat_score(self, hidden, encoder_output):
        energy = self.attn(torch.cat((hidden.expand(encoder_output.size(0), -1, -1), encoder_output), 2)).tanh()
        return torch.sum(self.v * energy, dim=2)

    def forward(self, hidden, encoder_outputs):
        # Calculate the attention weights (energies) based on the given method
        if self.method == 'general':
            attn_energies = self.general_score(hidden, encoder_outputs)
        elif self.method == 'concat':
            attn_energies = self.concat_score(hidden, encoder_outputs)
        elif self.method == 'dot':
            attn_energies = self.dot_score(hidden, encoder_outputs)

        # Transpose max_length and batch_size dimensions
        attn_energies = attn_energies.t()

        # Return the softmax normalized probability scores (with added dimension)
        return F.softmax(attn_energies, dim=1).unsqueeze(1)
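The 'dot' method boils down to three steps: score each encoder output against the decoder state, softmax the scores, and take the weighted sum. Here is a pure-Python sketch for a single sentence (no batch dimension), with toy vectors and hypothetical function names; in the decoder, the weighted sum is done by `attn_weights.bmm(...)`.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(decoder_state, encoder_outputs):
    """Luong 'dot' attention for one sentence: score, normalize, weighted sum."""
    scores = [sum(h * e for h, e in zip(decoder_state, enc)) for enc in encoder_outputs]
    weights = softmax(scores)
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_outputs))
               for i in range(len(decoder_state))]
    return weights, context


attn_w, ctx = dot_attention([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]])
print([round(w, 4) for w in attn_w])  # [0.8808, 0.1192]
print([round(c, 4) for c in ctx])     # [1.7616, 0.2384]
```

The first encoder output points in the same direction as the decoder state, so it gets most of the weight and dominates the context vector.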

In [11]:
class LuongAttnDecoderRNN(nn.Module):
    def __init__(self, attn_model, embedding, hidden_size, output_size, n_layers=1, dropout=0.1):
        super(LuongAttnDecoderRNN, self).__init__()

        # Keep for reference
        self.attn_model = attn_model
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.n_layers = n_layers
        self.dropout = dropout

        # Define layers
        self.embedding = embedding
        self.embedding_dropout = nn.Dropout(dropout)
        self.gru = nn.LSTM(hidden_size, hidden_size, n_layers, dropout=(0 if n_layers == 1 else dropout))
        self.concat = nn.Linear(hidden_size * 2, hidden_size)
        self.out = nn.Linear(hidden_size, output_size)

        self.attn = Attn(attn_model, hidden_size)

    def forward(self, input_step, last_hidden, encoder_outputs):
        # Note: we run this one step (word) at a time
        # Get embedding of current input word
        embedded = self.embedding(input_step)
        embedded = self.embedding_dropout(embedded)
        # Forward through the unidirectional LSTM (attribute named `gru` as in the original tutorial)
        rnn_output, hidden = self.gru(embedded, last_hidden)
        # Calculate attention weights from the current LSTM output
        attn_weights = self.attn(rnn_output, encoder_outputs)
        # Multiply attention weights to encoder outputs to get new "weighted sum" context vector
        context = attn_weights.bmm(encoder_outputs.transpose(0, 1))
        # Concatenate weighted context vector and LSTM output using Luong eq. 5
        rnn_output = rnn_output.squeeze(0)
        context = context.squeeze(1)
        concat_input = torch.cat((rnn_output, context), 1)
        concat_output = torch.tanh(self.concat(concat_input))
        # Predict next word using Luong eq. 6
        output = self.out(concat_output)
        output = F.softmax(output, dim=1)
        # Return output and final hidden state
        return output, hidden
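For reference, here are the equations the `Attn` layer and the decoder's forward step compute, written to match the code: $h_t$ is the decoder output `rnn_output`, $\bar{h}_s$ is the encoder output at source position $s$, and the `nn.Linear` layers contribute the bias terms $b$. Luong et al. write the concatenation in eq. 5 as $[c_t; h_t]$; the code concatenates in the order $[h_t; c_t]$, which only permutes the columns of $W_c$.

$$
\operatorname{score}(h_t, \bar{h}_s) =
\begin{cases}
h_t^\top \bar{h}_s & \text{dot} \\
h_t^\top \left(W_a \bar{h}_s + b_a\right) & \text{general} \\
v^\top \tanh\left(W_a [h_t; \bar{h}_s] + b_a\right) & \text{concat}
\end{cases}
$$

$$
a_t(s) = \frac{\exp\left(\operatorname{score}(h_t, \bar{h}_s)\right)}{\sum_{s'} \exp\left(\operatorname{score}(h_t, \bar{h}_{s'})\right)},
\qquad
c_t = \sum_s a_t(s)\, \bar{h}_s
$$

$$
\tilde{h}_t = \tanh\left(W_c [h_t; c_t] + b_c\right),
\qquad
p(y_t \mid y_{<t}, x) = \operatorname{softmax}\left(W_s \tilde{h}_t + b_s\right)
$$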

### Defining the Training Procedure

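Recall that the `mask` produced alongside `target_variable` earlier had no obvious purpose. It turns out to be needed for computing the loss. `maskNLLLoss()` below computes the negative log-likelihood of each target word at one time step, averaged only over the positions where `mask` is `True`, so the padding in `target_variable` does not contribute to the loss. It also returns the number of real (non-padding) words it averaged over.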

In [12]:
def maskNLLLoss(inp, target, mask):
    nTotal = mask.sum()
    crossEntropy = -torch.log(torch.gather(inp, 1, target.view(-1, 1)).squeeze(1))
    loss = crossEntropy.masked_select(mask).mean()
    loss = loss.to(device)
    return loss, nTotal.item()
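A pure-Python sketch of the same computation for one time step may help: `torch.gather` picks each sentence's probability for its target word, `masked_select` drops the padded positions, and `mean()` averages the rest. The probabilities and targets below are made up, and the function name is hypothetical.

```python
import math


def masked_nll(probs, targets, mask):
    """Average -log p(target) over positions where mask is True; also return the count."""
    losses = [-math.log(p[t]) for p, t, m in zip(probs, targets, mask) if m]
    return sum(losses) / len(losses), len(losses)


# One decoder time step for a batch of 3; the third sentence is already padding here
probs = [[0.5, 0.5], [0.25, 0.75], [0.9, 0.1]]
targets = [0, 1, 0]
step_mask = [True, True, False]

toy_loss, n_total = masked_nll(probs, targets, step_mask)
print(round(toy_loss, 4), n_total)  # 0.4904 2
```

The third row would have contributed only $-\log 0.9$, but because it is padding it is ignored entirely rather than counted as a correct prediction.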

For training the model, two tricks are used to help convergence. The first is "teacher forcing": instead of feeding the decoder's own prediction back in as its next input, we feed it the actual target word. In `train()` below, this decision is made once per batch, with probability `teacher_forcing_ratio` (a global defined outside the function). Relying on teacher forcing too much can leave the decoder unable to generate good sequences on its own at inference time, while not using it at all makes convergence slower. The second trick is gradient clipping. Where the loss surface is very steep (a "cliff"), the gradient can become huge and a single update can overshoot badly, so whenever the total norm of the gradients exceeds a threshold `clip`, they are rescaled down to that norm. The training function below performs a single training iteration on one batch.

In [13]:
def train(input_variable, lengths, target_variable, mask, max_target_len, encoder, decoder, embedding,
          encoder_optimizer, decoder_optimizer, batch_size, clip, max_length=MAX_LENGTH):

    # Zero gradients
    encoder_optimizer.zero_grad()
    decoder_optimizer.zero_grad()

    # Set device options
    input_variable = input_variable.to(device)
    target_variable = target_variable.to(device)
    mask = mask.to(device)
    # Lengths for rnn packing should always be on the cpu
    lengths = lengths.to("cpu")

    # Initialize variables
    loss = 0
    print_losses = []
    n_totals = 0

    # Forward pass through encoder
    encoder_outputs, encoder_hidden = encoder(input_variable, lengths)

    # Create initial decoder input (start with SOS tokens for each sentence)
    decoder_input = torch.LongTensor([[SOS_token for _ in range(batch_size)]])
    decoder_input = decoder_input.to(device)

    # Set initial decoder hidden state to the encoder's final hidden state.
    # The LSTM returns a tuple: final hidden states (h) and final cell states (c)
    encoder_h_hidden, encoder_c_hidden = encoder_hidden
    decoder_h_hidden = encoder_h_hidden[:decoder.n_layers]
    decoder_c_hidden = encoder_c_hidden[:decoder.n_layers]
    # Recombine the final hidden states as hidden tuple
    decoder_hidden = (decoder_h_hidden, decoder_c_hidden)

    # Determine if we are using teacher forcing this iteration
    use_teacher_forcing = True if random.random() < teacher_forcing_ratio else False

    # Forward batch of sequences through decoder one time step at a time
    if use_teacher_forcing:
        for t in range(max_target_len):
            decoder_output, decoder_hidden = decoder(
                decoder_input, decoder_hidden, encoder_outputs
            )
            # Teacher forcing: next input is current target
            decoder_input = target_variable[t].view(1, -1)
            # Calculate and accumulate loss
            mask_loss, nTotal = maskNLLLoss(decoder_output, target_variable[t], mask[t])
            loss += mask_loss
            print_losses.append(mask_loss.item() * nTotal)
            n_totals += nTotal
    else:
        for t in range(max_target_len):
            decoder_output, decoder_hidden = decoder(
                decoder_input, decoder_hidden, encoder_outputs
            )
            # No teacher forcing: next input is decoder's own current output
            _, topi = decoder_output.topk(1)
            decoder_input = torch.LongTensor([[topi[i][0] for i in range(batch_size)]])
            decoder_input = decoder_input.to(device)
            # Calculate and accumulate loss
            mask_loss, nTotal = maskNLLLoss(decoder_output, target_variable[t], mask[t])
            loss += mask_loss
            print_losses.append(mask_loss.item() * nTotal)
            n_totals += nTotal

    # Perform backpropagation
    loss.backward()

    # Clip gradients: gradients are modified in place
    _ = nn.utils.clip_grad_norm_(encoder.parameters(), clip)
    _ = nn.utils.clip_grad_norm_(decoder.parameters(), clip)

    # Adjust model weights
    encoder_optimizer.step()
    decoder_optimizer.step()

    return sum(print_losses) / n_totals
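`nn.utils.clip_grad_norm_()` treats all gradients of the given parameters as one long vector, computes its L2 norm, and scales every gradient down only when that norm exceeds the limit. The sketch below reproduces that rule on a flat list of numbers; the function name and the `1e-6` guard against division by zero are chosen to mirror PyTorch's behavior, but this is an illustration, not the library code.

```python
import math


def clip_by_total_norm(grads, max_norm):
    """Rescale gradients so that their combined L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = max_norm / (total_norm + 1e-6)
    if scale < 1.0:  # only ever shrink, never enlarge
        grads = [g * scale for g in grads]
    return grads, total_norm


clipped, norm_before = clip_by_total_norm([30.0, 40.0], max_norm=5.0)
print(norm_before)                     # 50.0
print([round(g, 4) for g in clipped])  # [3.0, 4.0]
```

Clipping keeps the gradient's direction and only shortens it, so the update still points downhill, just with a bounded step.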


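This function runs training for `n_iteration` iterations, calling the single-iteration `train()` function above on a new batch each time and printing the average loss every `print_every` iterations. Every `save_every` iterations it also saves a checkpoint: a dictionary holding the iteration number, the encoder, decoder, optimizer, and embedding state, the loss, and the vocabulary, written with `torch.save()` to a file with a `.tar` extension (a PyTorch naming convention for checkpoints; the file is not an actual tar archive). This lets us continue training later, or use the training done so far to make predictions.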

In [14]:
def trainIters(model_name, voc, pairs, encoder, decoder, encoder_optimizer, decoder_optimizer, embedding, 
               encoder_n_layers, decoder_n_layers, save_dir, n_iteration, batch_size, print_every, save_every, 
               clip, corpus_name, loadFilename):

    # Load batches for each iteration
    training_batches = [batch2TrainData(voc, [random.choice(pairs) for _ in range(batch_size)])
                      for _ in range(n_iteration)]

    # Initializations
    print('Initializing ...')
    start_iteration = 1
    print_loss = 0
    if loadFilename:
        # `checkpoint` must be a global dict loaded from loadFilename outside this function
        start_iteration = checkpoint['iteration'] + 1

    # Training loop
    print("Training...")
    for iteration in range(start_iteration, n_iteration + 1):
        training_batch = training_batches[iteration - 1]
        # Extract fields from batch
        input_variable, lengths, target_variable, mask, max_target_len = training_batch

        # Run a training iteration with batch
        loss = train(input_variable, lengths, target_variable, mask, max_target_len, encoder,
                     decoder, embedding, encoder_optimizer, decoder_optimizer, batch_size, clip)
        print_loss += loss

        # Print progress
        if iteration % print_every == 0:
            print_loss_avg = print_loss / print_every
            print("Iteration: {}; Percent complete: {:.1f}%; Average loss: {:.4f}".format(
                iteration, iteration / n_iteration * 100, print_loss_avg
            ))
            print_loss = 0

        # Save checkpoint
        if (iteration % save_every == 0):
            directory = os.path.join(save_dir, model_name, corpus_name, '{}-{}_{}'.format(
                encoder_n_layers, decoder_n_layers, hidden_size
            ))
            if not os.path.exists(directory):
                os.makedirs(directory)
            torch.save({
                'iteration': iteration,
                'en': encoder.state_dict(),
                'de': decoder.state_dict(),
                'en_opt': encoder_optimizer.state_dict(),
                'de_opt': decoder_optimizer.state_dict(),
                'loss': loss,
                'voc_dict': voc.__dict__,
                'embedding': embedding.state_dict()
            }, os.path.join(directory, '{}_{}.tar'.format(iteration, 'checkpoint')))

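A minimal sketch of the checkpoint bookkeeping in `trainIters`, in plain Python so it runs without a model: which iterations trigger a save, where the file lands, and where training resumes. The `save_dir` value `'save'` is a hypothetical placeholder (the real value is set elsewhere in the notebook); the other values match the configuration used in the "Running the Model" section.

```python
import os

def checkpoint_iterations(n_iteration, save_every, start_iteration=1):
    # Iterations at which trainIters saves (iteration % save_every == 0)
    return [i for i in range(start_iteration, n_iteration + 1) if i % save_every == 0]

def checkpoint_path(save_dir, model_name, corpus_name,
                    encoder_n_layers, decoder_n_layers, hidden_size, iteration):
    # Same directory and file naming scheme as the torch.save call in trainIters
    directory = os.path.join(save_dir, model_name, corpus_name,
                             '{}-{}_{}'.format(encoder_n_layers, decoder_n_layers, hidden_size))
    return os.path.join(directory, '{}_{}.tar'.format(iteration, 'checkpoint'))

def resume_iteration(checkpoint):
    # trainIters restarts one past the iteration stored in the checkpoint dict
    return checkpoint['iteration'] + 1

save_dir = 'save'  # hypothetical placeholder for the notebook's save directory
print(checkpoint_iterations(n_iteration=5000, save_every=2000))  # [2000, 4000]
print(checkpoint_path(save_dir, 'cb_model', 'cornell movie-dialogs corpus', 2, 2, 500, 4000))
print(resume_iteration({'iteration': 4000}))  # 4001
print(checkpoint_iterations(5000, 2000, start_iteration=4001))  # []
```

Note that `n_iteration` is the total target, not the number of additional iterations: resuming from the 4000-iteration checkpoint with `n_iteration = 5000` runs only iterations 4001 to 5000. With these settings that final stretch is never saved, because 5000 is not a multiple of 2000; choosing a `save_every` that divides `n_iteration` avoids losing it.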

### Defining Evaluation

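`GreedySearchDecoder` is the decoder we use at inference time, when we chat with the trained model rather than train it. During training, teacher forcing feeds the decoder the correct target word at each step; at inference time there is no target, so the decoder's own prediction has to be fed back in instead. Greedy search does this in the simplest way: at each step it picks the word with the highest softmax score from the decoder output and uses it as the next input. This gives the locally best word at each step, which is not guaranteed to be the best response overall.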

In [15]:
class GreedySearchDecoder(nn.Module):
    def __init__(self, encoder, decoder):
        super(GreedySearchDecoder, self).__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, input_seq, input_length, max_length):
        # Forward input through encoder model
        encoder_outputs, encoder_hidden = self.encoder(input_seq, input_length)
        # Prepare the encoder's final hidden state to be the decoder's first hidden state.
        # The encoder is an LSTM, so its hidden state is an (h, c) tuple.
        encoder_h_hidden, encoder_c_hidden = encoder_hidden
        decoder_h_hidden = encoder_h_hidden[:self.decoder.n_layers]
        decoder_c_hidden = encoder_c_hidden[:self.decoder.n_layers]
        # Recombine the final hidden states as a hidden tuple
        decoder_hidden = (decoder_h_hidden, decoder_c_hidden)
        # Initialize decoder input with SOS_token
        decoder_input = torch.ones(1, 1, device=device, dtype=torch.long) * SOS_token
        # Initialize tensors to append decoded words to
        all_tokens = torch.zeros([0], device=device, dtype=torch.long)
        all_scores = torch.zeros([0], device=device)
        # Iteratively decode one word token at a time
        for _ in range(max_length):
            # Forward pass through decoder
            decoder_output, decoder_hidden = self.decoder(decoder_input, decoder_hidden, encoder_outputs)
            # Obtain most likely word token and its softmax score
            decoder_scores, decoder_input = torch.max(decoder_output, dim=1)
            # Record token and score
            all_tokens = torch.cat((all_tokens, decoder_input), dim=0)
            all_scores = torch.cat((all_scores, decoder_scores), dim=0)
            # Prepare current token to be next decoder input (add a dimension)
            decoder_input = torch.unsqueeze(decoder_input, 0)
        # Return collections of word tokens and scores
        return all_tokens, all_scores
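
To see what greedy decoding does without a trained model, here is a minimal plain-Python sketch of the same loop as `GreedySearchDecoder.forward`: at each step take the highest-scoring token, record it with its score, and feed it back as the next input. The toy vocabulary and the `TOY_TABLE` of next-token probabilities are hypothetical stand-ins for the real decoder, which also carries a hidden state and attends over the encoder outputs.

```python
# Hypothetical toy "decoder": maps the previous token to a probability
# distribution over a 5-token vocabulary (no hidden state, no attention).
SOS, EOS = 1, 2
TOY_TABLE = {
    SOS: [0.0, 0.0, 0.1, 0.6, 0.3],   # after SOS, token 3 is most likely
    3:   [0.0, 0.0, 0.2, 0.1, 0.7],   # after 3, token 4 is most likely
    4:   [0.0, 0.0, 0.9, 0.05, 0.05], # after 4, EOS is most likely
    EOS: [0.0, 0.0, 1.0, 0.0, 0.0],   # after EOS, EOS again
}

def toy_decoder_step(prev_token):
    return TOY_TABLE[prev_token]

def greedy_decode(step_fn, max_length):
    tokens, scores = [], []
    decoder_input = SOS  # decoding starts from the SOS token
    for _ in range(max_length):
        probs = step_fn(decoder_input)
        # argmax: index of the highest-probability token (like torch.max(..., dim=1))
        best = max(range(len(probs)), key=lambda i: probs[i])
        tokens.append(best)
        scores.append(probs[best])
        decoder_input = best  # feed the prediction back in as the next input
    return tokens, scores

tokens, scores = greedy_decode(toy_decoder_step, max_length=5)
print(tokens)  # [3, 4, 2, 2, 2]
print(scores)  # [0.6, 0.7, 0.9, 1.0, 1.0]
```

Like `GreedySearchDecoder`, the loop always runs for `max_length` steps and keeps decoding after it has produced `EOS` (in this toy table, more `EOS` tokens); this is why `evaluateInput` filters `EOS` and `PAD` out of the reply.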

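The `evaluateInput` function takes a sentence typed by the user, normalizes it so it can be fed to the model, and prints the bot's response. `evaluate` does the actual computation (converting words to indexes, running the searcher, and converting indexes back to words), while `evaluateInput` handles the user interaction. Typing `q` or `quit` exits the loop, and a sentence containing a word outside the vocabulary is reported as an unknown-word error.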

In [16]:
def evaluate(encoder, decoder, searcher, voc, sentence, max_length=MAX_LENGTH):
    ### Format input sentence as a batch
    # words -> indexes
    indexes_batch = [indexesFromSentence(voc, sentence)]
    # Create lengths tensor
    lengths = torch.tensor([len(indexes) for indexes in indexes_batch])
    # Transpose dimensions of batch to match models' expectations
    input_batch = torch.LongTensor(indexes_batch).transpose(0, 1)
    # Use appropriate device
    input_batch = input_batch.to(device)
    # Keep `lengths` on the CPU: pack_padded_sequence expects a CPU lengths tensor in recent PyTorch versions
    # Decode sentence with searcher
    tokens, scores = searcher(input_batch, lengths, max_length)
    # indexes -> words
    decoded_words = [voc.index2word[token.item()] for token in tokens]
    return decoded_words


def evaluateInput(encoder, decoder, searcher, voc):
    input_sentence = ''
    while True:
        try:
            # Get input sentence
            input_sentence = input('> ')
            # Check if it is quit case
            if input_sentence == 'q' or input_sentence == 'quit': break
            # Normalize sentence
            input_sentence = normalizeString(input_sentence)
            # Evaluate sentence
            output_words = evaluate(encoder, decoder, searcher, voc, input_sentence)
            # Format and print response sentence
            output_words[:] = [x for x in output_words if not (x == 'EOS' or x == 'PAD')]
            print('Bot:', ' '.join(output_words), len(output_words))

        except KeyError:
            print("Error: Encountered unknown word.")

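A plain-Python sketch of the data handling around the model in `evaluate` and `evaluateInput`: turning a sentence into a batch of one index sequence, computing its length, transposing it to the `(seq_len, batch)` layout the models expect, cleaning the reply, and the `KeyError` an unknown word triggers. The miniature vocabulary is hypothetical, and `indexes_from_sentence` assumes that the notebook's `indexesFromSentence` (defined earlier, not shown here) appends `EOS_token` as in the PyTorch chatbot tutorial.

```python
# Hypothetical miniature vocabulary standing in for voc.word2index
PAD_token, SOS_token, EOS_token = 0, 1, 2
word2index = {"how": 3, "are": 4, "you": 5, "?": 6}

def indexes_from_sentence(sentence):
    # words -> indexes, terminated with EOS; an unknown word raises KeyError
    return [word2index[word] for word in sentence.split(" ")] + [EOS_token]

def to_time_major(indexes_batch):
    # (batch, seq_len) -> (seq_len, batch), like .transpose(0, 1) in evaluate
    return [list(step) for step in zip(*indexes_batch)]

def clean_reply(words):
    # Same filter as evaluateInput: drop EOS and PAD tokens
    return [w for w in words if w not in ("EOS", "PAD")]

indexes_batch = [indexes_from_sentence("how are you ?")]
lengths = [len(indexes) for indexes in indexes_batch]
input_batch = to_time_major(indexes_batch)
print(lengths)      # [5]
print(input_batch)  # [[3], [4], [5], [6], [2]]
print(clean_reply(["i", "am", "fine", "EOS", "EOS"]))  # ['i', 'am', 'fine']

try:
    indexes_from_sentence("how are you doing ?")
except KeyError as err:
    # evaluateInput catches this and prints its unknown-word message
    print("Error: Encountered unknown word.", err)
```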

### Running the Model

This cell defines and builds our model from the parameters below. Some lines are commented out because they are alternative settings: `attn_model` can be set to one of three attention scoring methods (`'dot'`, `'general'`, or `'concat'`). We can also load a model we trained previously by setting `loadFilename` to the path of a saved checkpoint instead of `None`.

In [17]:
# Configure models
model_name = 'cb_model'
attn_model = 'dot'
# attn_model = 'general'
# attn_model = 'concat'
hidden_size = 500
encoder_n_layers = 2
decoder_n_layers = 2
dropout = 0.1
batch_size = 64

# Set checkpoint to load from; set to None if starting from scratch
loadFilename = None
checkpoint_iter = 4000
#loadFilename = os.path.join(save_dir, model_name, corpus_name,
#                            '{}-{}_{}'.format(encoder_n_layers, decoder_n_layers, hidden_size),
#                            '{}_checkpoint.tar'.format(checkpoint_iter))


# Load model if a loadFilename is provided
if loadFilename:
    # If loading on same machine the model was trained on
    checkpoint = torch.load(loadFilename)
    # If loading a model trained on GPU to CPU
    #checkpoint = torch.load(loadFilename, map_location=torch.device('cpu'))
    encoder_sd = checkpoint['en']
    decoder_sd = checkpoint['de']
    encoder_optimizer_sd = checkpoint['en_opt']
    decoder_optimizer_sd = checkpoint['de_opt']
    embedding_sd = checkpoint['embedding']
    voc.__dict__ = checkpoint['voc_dict']


print('Building encoder and decoder ...')
# Initialize word embeddings
embedding = nn.Embedding(voc.num_words, hidden_size)
if loadFilename:
    embedding.load_state_dict(embedding_sd)
# Initialize encoder & decoder models
encoder = EncoderRNN(hidden_size, embedding, encoder_n_layers, dropout)
decoder = LuongAttnDecoderRNN(attn_model, embedding, hidden_size, voc.num_words, decoder_n_layers, dropout)
if loadFilename:
    encoder.load_state_dict(encoder_sd)
    decoder.load_state_dict(decoder_sd)
# Use appropriate device
encoder = encoder.to(device)
decoder = decoder.to(device)
print('Models built and ready to go!')

Building encoder and decoder ...
Models built and ready to go!

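The three `attn_model` options name the Luong et al. attention scoring functions: `dot` scores an encoder output h_s against the current decoder state h_t as h_t · h_s, `general` as h_t · (W h_s), and `concat` as v · tanh(W [h_t; h_s]); a softmax over the scores gives the attention weights. The plain-Python sketch below assumes the notebook's attention module follows the PyTorch chatbot tutorial's `Attn` class; `W` and `v` are learned parameters there (with a bias term, omitted here), whereas the values below are hypothetical toy numbers.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, x):
    # Multiply matrix W (a list of rows) by vector x
    return [dot(row, x) for row in W]

def score(method, h_t, h_s, W=None, v=None):
    if method == "dot":
        return dot(h_t, h_s)
    if method == "general":
        return dot(h_t, matvec(W, h_s))
    if method == "concat":
        # h_t + h_s concatenates the two lists, i.e. [h_t; h_s]
        energy = [math.tanh(e) for e in matvec(W, h_t + h_s)]
        return dot(v, energy)
    raise ValueError(method + " is not an appropriate attention method.")

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

h_t = [1.0, 0.0]                                     # current decoder hidden state
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
W_general = [[1.0, 0.0], [0.0, 1.0]]                 # identity, so "general" equals "dot"
W_concat = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
v = [1.0, 1.0]

for method, W in (("dot", None), ("general", W_general), ("concat", W_concat)):
    scores = [score(method, h_t, h_s, W, v) for h_s in encoder_outputs]
    print(method, [round(w, 3) for w in softmax(scores)])
```

`dot` has no parameters, `general` adds one matrix, and `concat` adds a matrix and a vector, so the choice trades simplicity against how flexibly the model can compare decoder and encoder states.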

Now we train the model with `trainIters`. We first set the training parameters, initialize the optimizers, and put the model in training mode. Training can take a very long time; fortunately, we can load a previously saved checkpoint instead, so this cell does not need to be run every time.
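
Two of the knobs set below deserve a closer look. Assuming the `train` function defined earlier follows the PyTorch chatbot tutorial, it decides per batch whether to use teacher forcing with `random.random() < teacher_forcing_ratio`, and it clips gradients with `torch.nn.utils.clip_grad_norm_` using `clip` as the maximum norm. The plain-Python sketch below mimics both; `clip_by_global_norm` is a hypothetical re-implementation of the clipping rule (rescale all gradients together when their combined L2 norm exceeds the limit), not the PyTorch function itself.

```python
import math
import random

def use_teacher_forcing(teacher_forcing_ratio, rng=random):
    # random() returns a float in [0.0, 1.0), so a ratio of 1.0 always chooses teacher forcing
    return rng.random() < teacher_forcing_ratio

def clip_by_global_norm(grads, max_norm):
    # grads: one list of floats per parameter (hypothetical stand-in for each .grad)
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1.0:
        # Scale every gradient by the same factor, preserving the update direction
        grads = [[g * clip_coef for g in grad] for grad in grads]
    return grads, total_norm

clip = 50.0
small, norm_small = clip_by_global_norm([[3.0, 4.0]], clip)
large, norm_large = clip_by_global_norm([[60.0, 80.0]], clip)
print(norm_small, small)  # 5.0 [[3.0, 4.0]]  (below the limit, unchanged)
print(norm_large, large)  # 100.0, then gradients rescaled to a norm of about 50
print(all(use_teacher_forcing(1.0) for _ in range(1000)))  # True
```

With `teacher_forcing_ratio = 1.0`, every training batch uses teacher forcing, so the decoder is never fed its own predictions during training, even though `GreedySearchDecoder` always feeds them back at chat time. Note also that the decoder optimizer runs at `learning_rate * decoder_learning_ratio`, i.e. 0.0005 with the values below.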

In [18]:
# Safety precaution so that we don't accidentally start training a model we didn't want to train
TRAIN_THE_MODEL = True

In [19]:
if TRAIN_THE_MODEL:
    
    # Configure training/optimization
    clip = 50.0
    teacher_forcing_ratio = 1.0
    learning_rate = 0.0001
    decoder_learning_ratio = 5.0
    n_iteration = 5000
    print_every = 1
    save_every = 2000

    # Ensure dropout layers are in train mode
    encoder.train()
    decoder.train()

    # Initialize optimizers
    print('Building optimizers ...')
    encoder_optimizer = optim.Adam(encoder.parameters(), lr=learning_rate)
    decoder_optimizer = optim.Adam(decoder.parameters(), lr=learning_rate * decoder_learning_ratio)
    if loadFilename:
        encoder_optimizer.load_state_dict(encoder_optimizer_sd)
        decoder_optimizer.load_state_dict(decoder_optimizer_sd)

    # Move any loaded optimizer state to the same device as the model
    # (the state is empty when starting from scratch, so these loops then do nothing)
    for state in encoder_optimizer.state.values():
        for k, v in state.items():
            if isinstance(v, torch.Tensor):
                state[k] = v.to(device)

    for state in decoder_optimizer.state.values():
        for k, v in state.items():
            if isinstance(v, torch.Tensor):
                state[k] = v.to(device)

    # Run training iterations
    print("Starting Training!")
    trainIters(model_name, voc, pairs, encoder, decoder, encoder_optimizer, decoder_optimizer,
               embedding, encoder_n_layers, decoder_n_layers, save_dir, n_iteration, batch_size,
               print_every, save_every, clip, corpus_name, loadFilename)
    
    TRAIN_THE_MODEL = False

else:
    
    print("Set 'TRAIN_THE_MODEL' to True in the code block above and rerun this code block")
    print(f"Value of 'TRAIN_THE_MODEL' currently: {TRAIN_THE_MODEL}")


Building optimizers ...
Starting Training!
Initializing ...
Training...
Iteration: 1; Percent complete: 0.0%; Average loss: 7.9593
Iteration: 2; Percent complete: 0.0%; Average loss: 7.9163
Iteration: 3; Percent complete: 0.1%; Average loss: 7.8661
Iteration: 4; Percent complete: 0.1%; Average loss: 7.7844
Iteration: 5; Percent complete: 0.1%; Average loss: 7.6542
Iteration: 6; Percent complete: 0.1%; Average loss: 7.3791
Iteration: 7; Percent complete: 0.1%; Average loss: 7.0492
Iteration: 8; Percent complete: 0.2%; Average loss: 6.4180
Iteration: 9; Percent complete: 0.2%; Average loss: 6.1644
Iteration: 10; Percent complete: 0.2%; Average loss: 6.1926
Iteration: 11; Percent complete: 0.2%; Average loss: 5.9613
Iteration: 12; Percent complete: 0.2%; Average loss: 5.7327
Iteration: 13; Percent complete: 0.3%; Average loss: 5.5210
Iteration: 14; Percent complete: 0.3%; Average loss: 5.1009
Iteration: 15; Percent complete: 0.3%; Average loss: 5.2959
...

Iteration: 137; Percent complete: 2.7%; Average loss: 4.0546
Iteration: 138; Percent complete: 2.8%; Average loss: 4.0738
Iteration: 139; Percent complete: 2.8%; Average loss: 4.2626
Iteration: 140; Percent complete: 2.8%; Average loss: 4.0095
Iteration: 141; Percent complete: 2.8%; Average loss: 4.0648
Iteration: 142; Percent complete: 2.8%; Average loss: 4.1093
Iteration: 143; Percent complete: 2.9%; Average loss: 4.2627
Iteration: 144; Percent complete: 2.9%; Average loss: 4.2479
Iteration: 145; Percent complete: 2.9%; Average loss: 3.8643
Iteration: 146; Percent complete: 2.9%; Average loss: 4.1709
Iteration: 147; Percent complete: 2.9%; Average loss: 4.0260
Iteration: 148; Percent complete: 3.0%; Average loss: 4.1082
Iteration: 149; Percent complete: 3.0%; Average loss: 4.1793
Iteration: 150; Percent complete: 3.0%; Average loss: 4.1311
Iteration: 151; Percent complete: 3.0%; Average loss: 4.1266
Iteration: 152; Percent complete: 3.0%; Average loss: 4.0362
...

Iteration: 273; Percent complete: 5.5%; Average loss: 3.6444
Iteration: 274; Percent complete: 5.5%; Average loss: 4.0186
Iteration: 275; Percent complete: 5.5%; Average loss: 3.9116
Iteration: 276; Percent complete: 5.5%; Average loss: 3.8258
Iteration: 277; Percent complete: 5.5%; Average loss: 4.0779
Iteration: 278; Percent complete: 5.6%; Average loss: 3.6664
Iteration: 279; Percent complete: 5.6%; Average loss: 3.7911
Iteration: 280; Percent complete: 5.6%; Average loss: 3.9515
Iteration: 281; Percent complete: 5.6%; Average loss: 3.5568
Iteration: 282; Percent complete: 5.6%; Average loss: 3.8039
Iteration: 283; Percent complete: 5.7%; Average loss: 3.9449
Iteration: 284; Percent complete: 5.7%; Average loss: 3.5479
Iteration: 285; Percent complete: 5.7%; Average loss: 3.8812
Iteration: 286; Percent complete: 5.7%; Average loss: 3.9106
Iteration: 287; Percent complete: 5.7%; Average loss: 3.8936
Iteration: 288; Percent complete: 5.8%; Average loss: 3.6509
...

Iteration: 408; Percent complete: 8.2%; Average loss: 3.3500
Iteration: 409; Percent complete: 8.2%; Average loss: 3.5003
Iteration: 410; Percent complete: 8.2%; Average loss: 3.7070
Iteration: 411; Percent complete: 8.2%; Average loss: 3.6201
Iteration: 412; Percent complete: 8.2%; Average loss: 3.4974
Iteration: 413; Percent complete: 8.3%; Average loss: 3.6167
Iteration: 414; Percent complete: 8.3%; Average loss: 3.6288
Iteration: 415; Percent complete: 8.3%; Average loss: 3.4178
Iteration: 416; Percent complete: 8.3%; Average loss: 3.4333
Iteration: 417; Percent complete: 8.3%; Average loss: 3.8205
Iteration: 418; Percent complete: 8.4%; Average loss: 3.6484
Iteration: 419; Percent complete: 8.4%; Average loss: 3.4927
Iteration: 420; Percent complete: 8.4%; Average loss: 3.4783
Iteration: 421; Percent complete: 8.4%; Average loss: 3.8243
Iteration: 422; Percent complete: 8.4%; Average loss: 3.7140
Iteration: 423; Percent complete: 8.5%; Average loss: 3.6752
...

Iteration: 541; Percent complete: 10.8%; Average loss: 3.3046
Iteration: 542; Percent complete: 10.8%; Average loss: 3.2824
Iteration: 543; Percent complete: 10.9%; Average loss: 3.5113
Iteration: 544; Percent complete: 10.9%; Average loss: 3.4473
Iteration: 545; Percent complete: 10.9%; Average loss: 3.3963
Iteration: 546; Percent complete: 10.9%; Average loss: 3.5689
Iteration: 547; Percent complete: 10.9%; Average loss: 3.5047
Iteration: 548; Percent complete: 11.0%; Average loss: 3.4491
Iteration: 549; Percent complete: 11.0%; Average loss: 3.4484
Iteration: 550; Percent complete: 11.0%; Average loss: 3.4699
Iteration: 551; Percent complete: 11.0%; Average loss: 3.4890
Iteration: 552; Percent complete: 11.0%; Average loss: 3.6175
Iteration: 553; Percent complete: 11.1%; Average loss: 3.4402
Iteration: 554; Percent complete: 11.1%; Average loss: 3.6675
Iteration: 555; Percent complete: 11.1%; Average loss: 3.2820
Iteration: 556; Percent complete: 11.1%; Average loss: 3.4926
...

Iteration: 674; Percent complete: 13.5%; Average loss: 3.3318
Iteration: 675; Percent complete: 13.5%; Average loss: 3.3817
Iteration: 676; Percent complete: 13.5%; Average loss: 3.5271
Iteration: 677; Percent complete: 13.5%; Average loss: 3.4480
Iteration: 678; Percent complete: 13.6%; Average loss: 3.6226
Iteration: 679; Percent complete: 13.6%; Average loss: 3.4038
Iteration: 680; Percent complete: 13.6%; Average loss: 3.4371
Iteration: 681; Percent complete: 13.6%; Average loss: 3.3671
Iteration: 682; Percent complete: 13.6%; Average loss: 3.3567
Iteration: 683; Percent complete: 13.7%; Average loss: 3.5549
Iteration: 684; Percent complete: 13.7%; Average loss: 3.4314
Iteration: 685; Percent complete: 13.7%; Average loss: 3.4826
Iteration: 686; Percent complete: 13.7%; Average loss: 3.4244
Iteration: 687; Percent complete: 13.7%; Average loss: 3.5411
Iteration: 688; Percent complete: 13.8%; Average loss: 3.2513
Iteration: 689; Percent complete: 13.8%; Average loss: 3.3512
...

Iteration: 807; Percent complete: 16.1%; Average loss: 3.4973
Iteration: 808; Percent complete: 16.2%; Average loss: 3.4568
Iteration: 809; Percent complete: 16.2%; Average loss: 3.4920
Iteration: 810; Percent complete: 16.2%; Average loss: 3.4444
Iteration: 811; Percent complete: 16.2%; Average loss: 3.0338
Iteration: 812; Percent complete: 16.2%; Average loss: 3.3290
Iteration: 813; Percent complete: 16.3%; Average loss: 3.3955
Iteration: 814; Percent complete: 16.3%; Average loss: 3.3141
Iteration: 815; Percent complete: 16.3%; Average loss: 3.1997
Iteration: 816; Percent complete: 16.3%; Average loss: 3.2680
Iteration: 817; Percent complete: 16.3%; Average loss: 3.3698
Iteration: 818; Percent complete: 16.4%; Average loss: 3.2751
Iteration: 819; Percent complete: 16.4%; Average loss: 3.3554
Iteration: 820; Percent complete: 16.4%; Average loss: 3.1978
Iteration: 821; Percent complete: 16.4%; Average loss: 3.3333
Iteration: 822; Percent complete: 16.4%; Average loss: 3.5190
...

Iteration: 941; Percent complete: 18.8%; Average loss: 3.4188
Iteration: 942; Percent complete: 18.8%; Average loss: 3.3069
Iteration: 943; Percent complete: 18.9%; Average loss: 3.3232
Iteration: 944; Percent complete: 18.9%; Average loss: 3.2653
Iteration: 945; Percent complete: 18.9%; Average loss: 3.3742
Iteration: 946; Percent complete: 18.9%; Average loss: 3.1244
Iteration: 947; Percent complete: 18.9%; Average loss: 3.1648
Iteration: 948; Percent complete: 19.0%; Average loss: 3.0755
Iteration: 949; Percent complete: 19.0%; Average loss: 3.1918
Iteration: 950; Percent complete: 19.0%; Average loss: 3.4290
Iteration: 951; Percent complete: 19.0%; Average loss: 3.2581
Iteration: 952; Percent complete: 19.0%; Average loss: 3.4296
Iteration: 953; Percent complete: 19.1%; Average loss: 3.1056
Iteration: 954; Percent complete: 19.1%; Average loss: 3.2203
Iteration: 955; Percent complete: 19.1%; Average loss: 3.4827
Iteration: 956; Percent complete: 19.1%; Average loss: 3.3495
...

Iteration: 1073; Percent complete: 21.5%; Average loss: 3.3017
Iteration: 1074; Percent complete: 21.5%; Average loss: 3.3503
Iteration: 1075; Percent complete: 21.5%; Average loss: 3.1148
Iteration: 1076; Percent complete: 21.5%; Average loss: 3.1607
Iteration: 1077; Percent complete: 21.5%; Average loss: 3.3203
Iteration: 1078; Percent complete: 21.6%; Average loss: 3.2128
Iteration: 1079; Percent complete: 21.6%; Average loss: 3.4219
Iteration: 1080; Percent complete: 21.6%; Average loss: 3.1439
Iteration: 1081; Percent complete: 21.6%; Average loss: 3.1274
Iteration: 1082; Percent complete: 21.6%; Average loss: 3.3447
Iteration: 1083; Percent complete: 21.7%; Average loss: 3.3455
Iteration: 1084; Percent complete: 21.7%; Average loss: 3.0052
Iteration: 1085; Percent complete: 21.7%; Average loss: 3.3594
Iteration: 1086; Percent complete: 21.7%; Average loss: 3.0725
Iteration: 1087; Percent complete: 21.7%; Average loss: 3.1402
...

Iteration: 1206; Percent complete: 24.1%; Average loss: 3.2597
Iteration: 1207; Percent complete: 24.1%; Average loss: 3.4205
Iteration: 1208; Percent complete: 24.2%; Average loss: 3.1578
Iteration: 1209; Percent complete: 24.2%; Average loss: 3.0280
Iteration: 1210; Percent complete: 24.2%; Average loss: 3.0351
Iteration: 1211; Percent complete: 24.2%; Average loss: 3.0238
Iteration: 1212; Percent complete: 24.2%; Average loss: 3.2267
Iteration: 1213; Percent complete: 24.3%; Average loss: 2.8200
Iteration: 1214; Percent complete: 24.3%; Average loss: 2.8642
Iteration: 1215; Percent complete: 24.3%; Average loss: 3.0005
Iteration: 1216; Percent complete: 24.3%; Average loss: 3.2224
Iteration: 1217; Percent complete: 24.3%; Average loss: 3.2559
Iteration: 1218; Percent complete: 24.4%; Average loss: 3.2985
Iteration: 1219; Percent complete: 24.4%; Average loss: 3.3132
Iteration: 1220; Percent complete: 24.4%; Average loss: 3.0778
...

Iteration: 1339; Percent complete: 26.8%; Average loss: 3.2322
Iteration: 1340; Percent complete: 26.8%; Average loss: 3.2624
Iteration: 1341; Percent complete: 26.8%; Average loss: 3.2604
Iteration: 1342; Percent complete: 26.8%; Average loss: 2.9682
Iteration: 1343; Percent complete: 26.9%; Average loss: 3.0639
Iteration: 1344; Percent complete: 26.9%; Average loss: 3.2385
Iteration: 1345; Percent complete: 26.9%; Average loss: 2.9467
Iteration: 1346; Percent complete: 26.9%; Average loss: 3.1456
Iteration: 1347; Percent complete: 26.9%; Average loss: 3.0390
Iteration: 1348; Percent complete: 27.0%; Average loss: 2.7740
Iteration: 1349; Percent complete: 27.0%; Average loss: 3.1261
Iteration: 1350; Percent complete: 27.0%; Average loss: 3.3410
Iteration: 1351; Percent complete: 27.0%; Average loss: 3.2476
Iteration: 1352; Percent complete: 27.0%; Average loss: 3.3477
Iteration: 1353; Percent complete: 27.1%; Average loss: 3.1191
...

Iteration: 1472; Percent complete: 29.4%; Average loss: 3.2972
Iteration: 1473; Percent complete: 29.5%; Average loss: 3.1677
Iteration: 1474; Percent complete: 29.5%; Average loss: 3.0387
Iteration: 1475; Percent complete: 29.5%; Average loss: 2.9974
Iteration: 1476; Percent complete: 29.5%; Average loss: 3.2374
Iteration: 1477; Percent complete: 29.5%; Average loss: 3.0763
Iteration: 1478; Percent complete: 29.6%; Average loss: 3.0703
Iteration: 1479; Percent complete: 29.6%; Average loss: 3.1010
Iteration: 1480; Percent complete: 29.6%; Average loss: 2.9833
Iteration: 1481; Percent complete: 29.6%; Average loss: 3.1506
Iteration: 1482; Percent complete: 29.6%; Average loss: 3.2570
Iteration: 1483; Percent complete: 29.7%; Average loss: 3.0503
Iteration: 1484; Percent complete: 29.7%; Average loss: 2.9894
Iteration: 1485; Percent complete: 29.7%; Average loss: 3.1239
Iteration: 1486; Percent complete: 29.7%; Average loss: 2.8622
...

Iteration: 1603; Percent complete: 32.1%; Average loss: 2.9020
Iteration: 1604; Percent complete: 32.1%; Average loss: 3.2179
Iteration: 1605; Percent complete: 32.1%; Average loss: 3.0460
Iteration: 1606; Percent complete: 32.1%; Average loss: 3.0291
Iteration: 1607; Percent complete: 32.1%; Average loss: 3.1721
Iteration: 1608; Percent complete: 32.2%; Average loss: 3.0566
Iteration: 1609; Percent complete: 32.2%; Average loss: 2.9097
Iteration: 1610; Percent complete: 32.2%; Average loss: 3.0788
Iteration: 1611; Percent complete: 32.2%; Average loss: 3.1649
Iteration: 1612; Percent complete: 32.2%; Average loss: 2.8251
Iteration: 1613; Percent complete: 32.3%; Average loss: 2.9317
Iteration: 1614; Percent complete: 32.3%; Average loss: 2.8920
Iteration: 1615; Percent complete: 32.3%; Average loss: 2.9791
Iteration: 1616; Percent complete: 32.3%; Average loss: 3.2452
Iteration: 1617; Percent complete: 32.3%; Average loss: 2.8108
...

Iteration: 1734; Percent complete: 34.7%; Average loss: 2.9625
Iteration: 1735; Percent complete: 34.7%; Average loss: 3.1435
Iteration: 1736; Percent complete: 34.7%; Average loss: 2.6712
Iteration: 1737; Percent complete: 34.7%; Average loss: 3.0196
Iteration: 1738; Percent complete: 34.8%; Average loss: 3.0304
Iteration: 1739; Percent complete: 34.8%; Average loss: 3.0313
Iteration: 1740; Percent complete: 34.8%; Average loss: 3.1217
Iteration: 1741; Percent complete: 34.8%; Average loss: 3.1880
Iteration: 1742; Percent complete: 34.8%; Average loss: 2.9959
Iteration: 1743; Percent complete: 34.9%; Average loss: 2.9409
Iteration: 1744; Percent complete: 34.9%; Average loss: 2.9178
Iteration: 1745; Percent complete: 34.9%; Average loss: 2.6743
Iteration: 1746; Percent complete: 34.9%; Average loss: 2.9670
Iteration: 1747; Percent complete: 34.9%; Average loss: 3.1451
Iteration: 1748; Percent complete: 35.0%; Average loss: 2.9403
...

Iteration: 1867; Percent complete: 37.3%; Average loss: 2.9397
Iteration: 1868; Percent complete: 37.4%; Average loss: 3.1245
Iteration: 1869; Percent complete: 37.4%; Average loss: 2.9305
Iteration: 1870; Percent complete: 37.4%; Average loss: 2.8337
Iteration: 1871; Percent complete: 37.4%; Average loss: 2.9611
Iteration: 1872; Percent complete: 37.4%; Average loss: 2.9240
Iteration: 1873; Percent complete: 37.5%; Average loss: 2.8908
Iteration: 1874; Percent complete: 37.5%; Average loss: 2.9480
Iteration: 1875; Percent complete: 37.5%; Average loss: 3.0909
Iteration: 1876; Percent complete: 37.5%; Average loss: 2.6216
Iteration: 1877; Percent complete: 37.5%; Average loss: 2.9018
Iteration: 1878; Percent complete: 37.6%; Average loss: 3.1919
Iteration: 1879; Percent complete: 37.6%; Average loss: 3.2359
Iteration: 1880; Percent complete: 37.6%; Average loss: 2.8504
Iteration: 1881; Percent complete: 37.6%; Average loss: 3.0447
...

Iteration: 1998; Percent complete: 40.0%; Average loss: 3.0890
Iteration: 1999; Percent complete: 40.0%; Average loss: 2.7906
Iteration: 2000; Percent complete: 40.0%; Average loss: 2.8742
Iteration: 2001; Percent complete: 40.0%; Average loss: 2.9050
Iteration: 2002; Percent complete: 40.0%; Average loss: 2.8779
Iteration: 2003; Percent complete: 40.1%; Average loss: 2.8430
Iteration: 2004; Percent complete: 40.1%; Average loss: 2.8954
Iteration: 2005; Percent complete: 40.1%; Average loss: 3.1238
Iteration: 2006; Percent complete: 40.1%; Average loss: 3.1128
Iteration: 2007; Percent complete: 40.1%; Average loss: 2.6986
Iteration: 2008; Percent complete: 40.2%; Average loss: 2.9152
Iteration: 2009; Percent complete: 40.2%; Average loss: 2.8812
Iteration: 2010; Percent complete: 40.2%; Average loss: 2.8305
Iteration: 2011; Percent complete: 40.2%; Average loss: 2.8615
Iteration: 2012; Percent complete: 40.2%; Average loss: 3.0732
...

Iteration: 2130; Percent complete: 42.6%; Average loss: 2.7744
Iteration: 2131; Percent complete: 42.6%; Average loss: 3.2777
Iteration: 2132; Percent complete: 42.6%; Average loss: 2.8065
Iteration: 2133; Percent complete: 42.7%; Average loss: 2.6520
Iteration: 2134; Percent complete: 42.7%; Average loss: 2.8711
Iteration: 2135; Percent complete: 42.7%; Average loss: 2.8899
Iteration: 2136; Percent complete: 42.7%; Average loss: 2.8597
Iteration: 2137; Percent complete: 42.7%; Average loss: 2.8868
Iteration: 2138; Percent complete: 42.8%; Average loss: 2.8708
Iteration: 2139; Percent complete: 42.8%; Average loss: 2.8715
Iteration: 2140; Percent complete: 42.8%; Average loss: 2.8438
Iteration: 2141; Percent complete: 42.8%; Average loss: 2.8332
Iteration: 2142; Percent complete: 42.8%; Average loss: 2.9749
Iteration: 2143; Percent complete: 42.9%; Average loss: 3.0678
Iteration: 2144; Percent complete: 42.9%; Average loss: 3.1096
...

Iteration: 2261; Percent complete: 45.2%; Average loss: 2.7179
Iteration: 2262; Percent complete: 45.2%; Average loss: 2.6865
Iteration: 2263; Percent complete: 45.3%; Average loss: 2.6562
Iteration: 2264; Percent complete: 45.3%; Average loss: 2.7668
Iteration: 2265; Percent complete: 45.3%; Average loss: 2.7129
Iteration: 2266; Percent complete: 45.3%; Average loss: 2.8324
Iteration: 2267; Percent complete: 45.3%; Average loss: 3.0167
Iteration: 2268; Percent complete: 45.4%; Average loss: 3.0096
Iteration: 2269; Percent complete: 45.4%; Average loss: 2.5561
Iteration: 2270; Percent complete: 45.4%; Average loss: 2.6835
Iteration: 2271; Percent complete: 45.4%; Average loss: 3.1427
Iteration: 2272; Percent complete: 45.4%; Average loss: 2.7986
Iteration: 2273; Percent complete: 45.5%; Average loss: 2.9186
Iteration: 2274; Percent complete: 45.5%; Average loss: 3.0663
Iteration: 2275; Percent complete: 45.5%; Average loss: 2.7462
...

Iteration: 2394; Percent complete: 47.9%; Average loss: 2.8643
Iteration: 2395; Percent complete: 47.9%; Average loss: 2.7439
Iteration: 2396; Percent complete: 47.9%; Average loss: 2.5542
Iteration: 2397; Percent complete: 47.9%; Average loss: 2.9876
Iteration: 2398; Percent complete: 48.0%; Average loss: 2.8257
Iteration: 2399; Percent complete: 48.0%; Average loss: 2.9476
Iteration: 2400; Percent complete: 48.0%; Average loss: 2.7532
Iteration: 2401; Percent complete: 48.0%; Average loss: 3.1027
Iteration: 2402; Percent complete: 48.0%; Average loss: 2.9711
Iteration: 2403; Percent complete: 48.1%; Average loss: 2.5498
Iteration: 2404; Percent complete: 48.1%; Average loss: 2.7806
Iteration: 2405; Percent complete: 48.1%; Average loss: 2.8392
Iteration: 2406; Percent complete: 48.1%; Average loss: 2.6944
Iteration: 2407; Percent complete: 48.1%; Average loss: 2.7152
Iteration: 2408; Percent complete: 48.2%; Average loss: 2.7419
...

Iteration: 2525; Percent complete: 50.5%; Average loss: 2.7262
Iteration: 2526; Percent complete: 50.5%; Average loss: 2.6802
Iteration: 2527; Percent complete: 50.5%; Average loss: 2.7193
Iteration: 2528; Percent complete: 50.6%; Average loss: 2.6300
Iteration: 2529; Percent complete: 50.6%; Average loss: 2.7909
Iteration: 2530; Percent complete: 50.6%; Average loss: 2.7510
Iteration: 2531; Percent complete: 50.6%; Average loss: 2.8625
Iteration: 2532; Percent complete: 50.6%; Average loss: 2.8379
Iteration: 2533; Percent complete: 50.7%; Average loss: 2.5447
Iteration: 2534; Percent complete: 50.7%; Average loss: 2.6977
Iteration: 2535; Percent complete: 50.7%; Average loss: 2.9552
Iteration: 2536; Percent complete: 50.7%; Average loss: 2.6661
Iteration: 2537; Percent complete: 50.7%; Average loss: 2.7417
Iteration: 2538; Percent complete: 50.8%; Average loss: 2.7229
Iteration: 2539; Percent complete: 50.8%; Average loss: 2.7143
...

Iteration: 2656; Percent complete: 53.1%; Average loss: 2.8912
Iteration: 2657; Percent complete: 53.1%; Average loss: 2.9352
Iteration: 2658; Percent complete: 53.2%; Average loss: 2.7332
Iteration: 2659; Percent complete: 53.2%; Average loss: 2.7960
Iteration: 2660; Percent complete: 53.2%; Average loss: 2.6278
Iteration: 2661; Percent complete: 53.2%; Average loss: 2.8293
Iteration: 2662; Percent complete: 53.2%; Average loss: 2.9540
Iteration: 2663; Percent complete: 53.3%; Average loss: 2.8952
Iteration: 2664; Percent complete: 53.3%; Average loss: 2.6417
Iteration: 2665; Percent complete: 53.3%; Average loss: 2.6208
Iteration: 2666; Percent complete: 53.3%; Average loss: 2.6427
Iteration: 2667; Percent complete: 53.3%; Average loss: 2.9195
Iteration: 2668; Percent complete: 53.4%; Average loss: 2.6476
Iteration: 2669; Percent complete: 53.4%; Average loss: 2.6617
Iteration: 2670; Percent complete: 53.4%; Average loss: 2.5980
...

Iteration: 2788; Percent complete: 55.8%; Average loss: 2.7770
Iteration: 2789; Percent complete: 55.8%; Average loss: 2.7523
Iteration: 2790; Percent complete: 55.8%; Average loss: 2.7031
Iteration: 2791; Percent complete: 55.8%; Average loss: 2.7811
Iteration: 2792; Percent complete: 55.8%; Average loss: 2.6698
Iteration: 2793; Percent complete: 55.9%; Average loss: 2.6288
Iteration: 2794; Percent complete: 55.9%; Average loss: 2.4820
Iteration: 2795; Percent complete: 55.9%; Average loss: 2.6402
Iteration: 2796; Percent complete: 55.9%; Average loss: 2.6685
Iteration: 2797; Percent complete: 55.9%; Average loss: 2.5680
Iteration: 2798; Percent complete: 56.0%; Average loss: 2.6454
Iteration: 2799; Percent complete: 56.0%; Average loss: 2.7257
Iteration: 2800; Percent complete: 56.0%; Average loss: 2.6394
Iteration: 2801; Percent complete: 56.0%; Average loss: 2.9878
Iteration: 2802; Percent complete: 56.0%; Average loss: 2.4373
...

Iteration: 2920; Percent complete: 58.4%; Average loss: 2.7586
Iteration: 2921; Percent complete: 58.4%; Average loss: 2.7053
Iteration: 2922; Percent complete: 58.4%; Average loss: 2.7576
Iteration: 2923; Percent complete: 58.5%; Average loss: 2.7756
Iteration: 2924; Percent complete: 58.5%; Average loss: 2.6991
Iteration: 2925; Percent complete: 58.5%; Average loss: 2.8105
Iteration: 2926; Percent complete: 58.5%; Average loss: 2.5271
Iteration: 2927; Percent complete: 58.5%; Average loss: 2.8418
Iteration: 2928; Percent complete: 58.6%; Average loss: 2.5189
Iteration: 2929; Percent complete: 58.6%; Average loss: 2.8256
Iteration: 2930; Percent complete: 58.6%; Average loss: 2.5653
Iteration: 2931; Percent complete: 58.6%; Average loss: 2.6109
Iteration: 2932; Percent complete: 58.6%; Average loss: 2.5659
Iteration: 2933; Percent complete: 58.7%; Average loss: 2.5584
Iteration: 2934; Percent complete: 58.7%; Average loss: 2.7201
...

Iteration: 3051; Percent complete: 61.0%; Average loss: 2.5433
Iteration: 3052; Percent complete: 61.0%; Average loss: 2.2703
Iteration: 3053; Percent complete: 61.1%; Average loss: 2.6872
Iteration: 3054; Percent complete: 61.1%; Average loss: 2.6027
Iteration: 3055; Percent complete: 61.1%; Average loss: 2.5404
Iteration: 3056; Percent complete: 61.1%; Average loss: 2.8068
Iteration: 3057; Percent complete: 61.1%; Average loss: 2.4158
Iteration: 3058; Percent complete: 61.2%; Average loss: 2.7808
Iteration: 3059; Percent complete: 61.2%; Average loss: 2.5118
Iteration: 3060; Percent complete: 61.2%; Average loss: 2.6446
Iteration: 3061; Percent complete: 61.2%; Average loss: 2.4382
Iteration: 3062; Percent complete: 61.2%; Average loss: 2.6038
Iteration: 3063; Percent complete: 61.3%; Average loss: 2.6224
Iteration: 3064; Percent complete: 61.3%; Average loss: 2.4442
Iteration: 3065; Percent complete: 61.3%; Average loss: 2.6576
Iteration: 3066; Percent complete: 61.3%; Average loss:

Iteration: 3183; Percent complete: 63.7%; Average loss: 2.5804
Iteration: 3184; Percent complete: 63.7%; Average loss: 2.6512
Iteration: 3185; Percent complete: 63.7%; Average loss: 2.7630
Iteration: 3186; Percent complete: 63.7%; Average loss: 2.5716
Iteration: 3187; Percent complete: 63.7%; Average loss: 2.6421
Iteration: 3188; Percent complete: 63.8%; Average loss: 2.4693
Iteration: 3189; Percent complete: 63.8%; Average loss: 2.6570
Iteration: 3190; Percent complete: 63.8%; Average loss: 2.6503
Iteration: 3191; Percent complete: 63.8%; Average loss: 2.3478
Iteration: 3192; Percent complete: 63.8%; Average loss: 2.6266
Iteration: 3193; Percent complete: 63.9%; Average loss: 2.5360
Iteration: 3194; Percent complete: 63.9%; Average loss: 2.4765
Iteration: 3195; Percent complete: 63.9%; Average loss: 2.5262
Iteration: 3196; Percent complete: 63.9%; Average loss: 2.3177
Iteration: 3197; Percent complete: 63.9%; Average loss: 2.5593
Iteration: 3198; Percent complete: 64.0%; Average loss:

Iteration: 3315; Percent complete: 66.3%; Average loss: 2.3760
Iteration: 3316; Percent complete: 66.3%; Average loss: 2.7370
Iteration: 3317; Percent complete: 66.3%; Average loss: 2.6924
Iteration: 3318; Percent complete: 66.4%; Average loss: 2.6426
Iteration: 3319; Percent complete: 66.4%; Average loss: 2.5792
Iteration: 3320; Percent complete: 66.4%; Average loss: 2.4334
Iteration: 3321; Percent complete: 66.4%; Average loss: 2.2510
Iteration: 3322; Percent complete: 66.4%; Average loss: 2.5555
Iteration: 3323; Percent complete: 66.5%; Average loss: 2.5732
Iteration: 3324; Percent complete: 66.5%; Average loss: 2.3977
Iteration: 3325; Percent complete: 66.5%; Average loss: 2.4571
Iteration: 3326; Percent complete: 66.5%; Average loss: 2.4417
Iteration: 3327; Percent complete: 66.5%; Average loss: 2.4911
Iteration: 3328; Percent complete: 66.6%; Average loss: 2.4999
Iteration: 3329; Percent complete: 66.6%; Average loss: 2.2735
Iteration: 3330; Percent complete: 66.6%; Average loss:

Iteration: 3447; Percent complete: 68.9%; Average loss: 2.2748
Iteration: 3448; Percent complete: 69.0%; Average loss: 2.3517
Iteration: 3449; Percent complete: 69.0%; Average loss: 2.5598
Iteration: 3450; Percent complete: 69.0%; Average loss: 2.2788
Iteration: 3451; Percent complete: 69.0%; Average loss: 2.3811
Iteration: 3452; Percent complete: 69.0%; Average loss: 2.4347
Iteration: 3453; Percent complete: 69.1%; Average loss: 2.3901
Iteration: 3454; Percent complete: 69.1%; Average loss: 2.3927
Iteration: 3455; Percent complete: 69.1%; Average loss: 2.3549
Iteration: 3456; Percent complete: 69.1%; Average loss: 2.6585
Iteration: 3457; Percent complete: 69.1%; Average loss: 2.3367
Iteration: 3458; Percent complete: 69.2%; Average loss: 2.4026
Iteration: 3459; Percent complete: 69.2%; Average loss: 2.4993
Iteration: 3460; Percent complete: 69.2%; Average loss: 2.7432
Iteration: 3461; Percent complete: 69.2%; Average loss: 2.4122
Iteration: 3462; Percent complete: 69.2%; Average loss:

Iteration: 3578; Percent complete: 71.6%; Average loss: 2.1584
Iteration: 3579; Percent complete: 71.6%; Average loss: 2.2109
Iteration: 3580; Percent complete: 71.6%; Average loss: 2.3358
Iteration: 3581; Percent complete: 71.6%; Average loss: 2.4064
Iteration: 3582; Percent complete: 71.6%; Average loss: 2.4487
Iteration: 3583; Percent complete: 71.7%; Average loss: 2.4416
Iteration: 3584; Percent complete: 71.7%; Average loss: 2.7296
Iteration: 3585; Percent complete: 71.7%; Average loss: 2.1832
Iteration: 3586; Percent complete: 71.7%; Average loss: 2.3975
Iteration: 3587; Percent complete: 71.7%; Average loss: 2.4777
Iteration: 3588; Percent complete: 71.8%; Average loss: 2.1488
Iteration: 3589; Percent complete: 71.8%; Average loss: 2.4692
Iteration: 3590; Percent complete: 71.8%; Average loss: 2.3958
Iteration: 3591; Percent complete: 71.8%; Average loss: 2.2761
Iteration: 3592; Percent complete: 71.8%; Average loss: 2.2718
Iteration: 3593; Percent complete: 71.9%; Average loss:

Iteration: 3710; Percent complete: 74.2%; Average loss: 2.4537
Iteration: 3711; Percent complete: 74.2%; Average loss: 2.5486
Iteration: 3712; Percent complete: 74.2%; Average loss: 2.6213
Iteration: 3713; Percent complete: 74.3%; Average loss: 2.1501
Iteration: 3714; Percent complete: 74.3%; Average loss: 2.5694
Iteration: 3715; Percent complete: 74.3%; Average loss: 2.4567
Iteration: 3716; Percent complete: 74.3%; Average loss: 2.3506
Iteration: 3717; Percent complete: 74.3%; Average loss: 2.2491
Iteration: 3718; Percent complete: 74.4%; Average loss: 2.4422
Iteration: 3719; Percent complete: 74.4%; Average loss: 2.6309
Iteration: 3720; Percent complete: 74.4%; Average loss: 2.3316
Iteration: 3721; Percent complete: 74.4%; Average loss: 2.2493
Iteration: 3722; Percent complete: 74.4%; Average loss: 2.3730
Iteration: 3723; Percent complete: 74.5%; Average loss: 2.2579
Iteration: 3724; Percent complete: 74.5%; Average loss: 2.4403
Iteration: 3725; Percent complete: 74.5%; Average loss:

Iteration: 3843; Percent complete: 76.9%; Average loss: 2.2189
Iteration: 3844; Percent complete: 76.9%; Average loss: 2.3433
Iteration: 3845; Percent complete: 76.9%; Average loss: 2.2896
Iteration: 3846; Percent complete: 76.9%; Average loss: 2.2936
Iteration: 3847; Percent complete: 76.9%; Average loss: 2.2408
Iteration: 3848; Percent complete: 77.0%; Average loss: 2.1660
Iteration: 3849; Percent complete: 77.0%; Average loss: 2.3058
Iteration: 3850; Percent complete: 77.0%; Average loss: 2.4322
Iteration: 3851; Percent complete: 77.0%; Average loss: 2.2864
Iteration: 3852; Percent complete: 77.0%; Average loss: 2.0840
Iteration: 3853; Percent complete: 77.1%; Average loss: 2.3076
Iteration: 3854; Percent complete: 77.1%; Average loss: 2.2306
Iteration: 3855; Percent complete: 77.1%; Average loss: 2.3159
Iteration: 3856; Percent complete: 77.1%; Average loss: 2.2211
Iteration: 3857; Percent complete: 77.1%; Average loss: 2.3363
Iteration: 3858; Percent complete: 77.2%; Average loss:

Iteration: 3975; Percent complete: 79.5%; Average loss: 2.2864
Iteration: 3976; Percent complete: 79.5%; Average loss: 2.1976
Iteration: 3977; Percent complete: 79.5%; Average loss: 2.3987
Iteration: 3978; Percent complete: 79.6%; Average loss: 2.2182
Iteration: 3979; Percent complete: 79.6%; Average loss: 2.1422
Iteration: 3980; Percent complete: 79.6%; Average loss: 2.4047
Iteration: 3981; Percent complete: 79.6%; Average loss: 2.2039
Iteration: 3982; Percent complete: 79.6%; Average loss: 2.3658
Iteration: 3983; Percent complete: 79.7%; Average loss: 2.4202
Iteration: 3984; Percent complete: 79.7%; Average loss: 2.2079
Iteration: 3985; Percent complete: 79.7%; Average loss: 2.2952
Iteration: 3986; Percent complete: 79.7%; Average loss: 2.2565
Iteration: 3987; Percent complete: 79.7%; Average loss: 2.0186
Iteration: 3988; Percent complete: 79.8%; Average loss: 2.1679
Iteration: 3989; Percent complete: 79.8%; Average loss: 2.3388
Iteration: 3990; Percent complete: 79.8%; Average loss:

Iteration: 4107; Percent complete: 82.1%; Average loss: 2.0636
Iteration: 4108; Percent complete: 82.2%; Average loss: 2.2243
Iteration: 4109; Percent complete: 82.2%; Average loss: 2.2883
Iteration: 4110; Percent complete: 82.2%; Average loss: 2.2142
Iteration: 4111; Percent complete: 82.2%; Average loss: 2.2087
Iteration: 4112; Percent complete: 82.2%; Average loss: 2.2571
Iteration: 4113; Percent complete: 82.3%; Average loss: 2.2441
Iteration: 4114; Percent complete: 82.3%; Average loss: 1.9638
Iteration: 4115; Percent complete: 82.3%; Average loss: 2.3796
Iteration: 4116; Percent complete: 82.3%; Average loss: 2.2688
Iteration: 4117; Percent complete: 82.3%; Average loss: 2.1996
Iteration: 4118; Percent complete: 82.4%; Average loss: 2.1198
Iteration: 4119; Percent complete: 82.4%; Average loss: 2.0684
Iteration: 4120; Percent complete: 82.4%; Average loss: 2.3430
Iteration: 4121; Percent complete: 82.4%; Average loss: 2.0254
Iteration: 4122; Percent complete: 82.4%; Average loss:

Iteration: 4238; Percent complete: 84.8%; Average loss: 2.4722
Iteration: 4239; Percent complete: 84.8%; Average loss: 1.9090
Iteration: 4240; Percent complete: 84.8%; Average loss: 2.1417
Iteration: 4241; Percent complete: 84.8%; Average loss: 2.1751
Iteration: 4242; Percent complete: 84.8%; Average loss: 2.3054
Iteration: 4243; Percent complete: 84.9%; Average loss: 1.9388
Iteration: 4244; Percent complete: 84.9%; Average loss: 2.2481
Iteration: 4245; Percent complete: 84.9%; Average loss: 2.1329
Iteration: 4246; Percent complete: 84.9%; Average loss: 2.1753
Iteration: 4247; Percent complete: 84.9%; Average loss: 2.1168
Iteration: 4248; Percent complete: 85.0%; Average loss: 2.2916
Iteration: 4249; Percent complete: 85.0%; Average loss: 2.0997
Iteration: 4250; Percent complete: 85.0%; Average loss: 2.2542
Iteration: 4251; Percent complete: 85.0%; Average loss: 2.1129
Iteration: 4252; Percent complete: 85.0%; Average loss: 1.9821
Iteration: 4253; Percent complete: 85.1%; Average loss:

Iteration: 4368; Percent complete: 87.4%; Average loss: 2.0523
Iteration: 4369; Percent complete: 87.4%; Average loss: 2.0760
Iteration: 4370; Percent complete: 87.4%; Average loss: 2.1682
Iteration: 4371; Percent complete: 87.4%; Average loss: 2.0732
Iteration: 4372; Percent complete: 87.4%; Average loss: 2.0640
Iteration: 4373; Percent complete: 87.5%; Average loss: 2.1069
Iteration: 4374; Percent complete: 87.5%; Average loss: 2.1501
Iteration: 4375; Percent complete: 87.5%; Average loss: 2.0437
Iteration: 4376; Percent complete: 87.5%; Average loss: 2.1802
Iteration: 4377; Percent complete: 87.5%; Average loss: 2.1072
Iteration: 4378; Percent complete: 87.6%; Average loss: 1.8927
Iteration: 4379; Percent complete: 87.6%; Average loss: 2.0044
Iteration: 4380; Percent complete: 87.6%; Average loss: 2.4334
Iteration: 4381; Percent complete: 87.6%; Average loss: 2.0987
Iteration: 4382; Percent complete: 87.6%; Average loss: 2.3430
Iteration: 4383; Percent complete: 87.7%; Average loss:

Iteration: 4499; Percent complete: 90.0%; Average loss: 2.1584
Iteration: 4500; Percent complete: 90.0%; Average loss: 1.8663
Iteration: 4501; Percent complete: 90.0%; Average loss: 2.0400
Iteration: 4502; Percent complete: 90.0%; Average loss: 2.1359
Iteration: 4503; Percent complete: 90.1%; Average loss: 1.9118
Iteration: 4504; Percent complete: 90.1%; Average loss: 1.9485
Iteration: 4505; Percent complete: 90.1%; Average loss: 2.1368
Iteration: 4506; Percent complete: 90.1%; Average loss: 2.1534
Iteration: 4507; Percent complete: 90.1%; Average loss: 1.7915
Iteration: 4508; Percent complete: 90.2%; Average loss: 2.1070
Iteration: 4509; Percent complete: 90.2%; Average loss: 2.1812
Iteration: 4510; Percent complete: 90.2%; Average loss: 2.0344
Iteration: 4511; Percent complete: 90.2%; Average loss: 2.0102
Iteration: 4512; Percent complete: 90.2%; Average loss: 2.1149
Iteration: 4513; Percent complete: 90.3%; Average loss: 1.8497
Iteration: 4514; Percent complete: 90.3%; Average loss:

Iteration: 4632; Percent complete: 92.6%; Average loss: 1.9796
Iteration: 4633; Percent complete: 92.7%; Average loss: 2.1731
Iteration: 4634; Percent complete: 92.7%; Average loss: 2.0520
Iteration: 4635; Percent complete: 92.7%; Average loss: 1.9134
Iteration: 4636; Percent complete: 92.7%; Average loss: 1.9936
Iteration: 4637; Percent complete: 92.7%; Average loss: 2.0030
Iteration: 4638; Percent complete: 92.8%; Average loss: 2.1929
Iteration: 4639; Percent complete: 92.8%; Average loss: 1.9522
Iteration: 4640; Percent complete: 92.8%; Average loss: 2.2269
Iteration: 4641; Percent complete: 92.8%; Average loss: 1.8345
Iteration: 4642; Percent complete: 92.8%; Average loss: 2.0609
Iteration: 4643; Percent complete: 92.9%; Average loss: 2.0609
Iteration: 4644; Percent complete: 92.9%; Average loss: 1.8726
Iteration: 4645; Percent complete: 92.9%; Average loss: 2.0373
Iteration: 4646; Percent complete: 92.9%; Average loss: 1.9909
Iteration: 4647; Percent complete: 92.9%; Average loss:

Iteration: 4764; Percent complete: 95.3%; Average loss: 1.9864
Iteration: 4765; Percent complete: 95.3%; Average loss: 1.9690
Iteration: 4766; Percent complete: 95.3%; Average loss: 1.9039
Iteration: 4767; Percent complete: 95.3%; Average loss: 1.8893
Iteration: 4768; Percent complete: 95.4%; Average loss: 1.9856
Iteration: 4769; Percent complete: 95.4%; Average loss: 2.0903
Iteration: 4770; Percent complete: 95.4%; Average loss: 1.9832
Iteration: 4771; Percent complete: 95.4%; Average loss: 1.8085
Iteration: 4772; Percent complete: 95.4%; Average loss: 2.0202
Iteration: 4773; Percent complete: 95.5%; Average loss: 2.2801
Iteration: 4774; Percent complete: 95.5%; Average loss: 1.9309
Iteration: 4775; Percent complete: 95.5%; Average loss: 2.0911
Iteration: 4776; Percent complete: 95.5%; Average loss: 1.8911
Iteration: 4777; Percent complete: 95.5%; Average loss: 1.7102
Iteration: 4778; Percent complete: 95.6%; Average loss: 1.8941
Iteration: 4779; Percent complete: 95.6%; Average loss:

Iteration: 4895; Percent complete: 97.9%; Average loss: 1.9503
Iteration: 4896; Percent complete: 97.9%; Average loss: 2.0707
Iteration: 4897; Percent complete: 97.9%; Average loss: 2.0721
Iteration: 4898; Percent complete: 98.0%; Average loss: 1.9337
Iteration: 4899; Percent complete: 98.0%; Average loss: 2.0071
Iteration: 4900; Percent complete: 98.0%; Average loss: 1.6646
Iteration: 4901; Percent complete: 98.0%; Average loss: 1.8533
Iteration: 4902; Percent complete: 98.0%; Average loss: 1.8753
Iteration: 4903; Percent complete: 98.1%; Average loss: 1.9057
Iteration: 4904; Percent complete: 98.1%; Average loss: 1.8447
Iteration: 4905; Percent complete: 98.1%; Average loss: 2.0366
Iteration: 4906; Percent complete: 98.1%; Average loss: 1.8140
Iteration: 4907; Percent complete: 98.1%; Average loss: 2.0602
Iteration: 4908; Percent complete: 98.2%; Average loss: 1.9090
Iteration: 4909; Percent complete: 98.2%; Average loss: 2.0353
Iteration: 4910; Percent complete: 98.2%; Average loss:
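The per-iteration loss above is noisy. Even so, it falls from roughly 2.4 to 3.0 around iteration 2800 to roughly 1.7 to 2.1 around iteration 4900. Below is a minimal sketch for seeing that trend more clearly, assuming the training output has been copied into a string. It parses each complete line and smooths the losses with a trailing moving average. The truncated lines that end in `Average loss:` with no value are skipped. `parse_log` and `moving_average` are hypothetical helpers, not part of the notebook.

```python
import re
from statistics import mean

# Matches one complete log line; truncated lines with no loss value do not match.
LOG_LINE = re.compile(
    r"Iteration: (\d+); Percent complete: [\d.]+%; Average loss: ([\d.]+)"
)


def parse_log(text):
    """Return (iteration, loss) pairs for every complete log line in `text`."""
    pairs = []
    for line in text.splitlines():
        match = LOG_LINE.search(line)
        if match:
            pairs.append((int(match.group(1)), float(match.group(2))))
    return pairs


def moving_average(values, window):
    """Trailing mean over the last `window` values (shorter windows at the start)."""
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        out.append(mean(values[start:i + 1]))
    return out


sample = """\
Iteration: 2801; Percent complete: 56.0%; Average loss: 2.9878
Iteration: 2802; Percent complete: 56.0%; Average loss: 2.4373
Iteration: 2803; Percent complete: 56.1%; Average loss:
Iteration: 4900; Percent complete: 98.0%; Average loss: 1.6646
"""
pairs = parse_log(sample)  # the truncated iteration-2803 line is skipped
losses = [loss for _, loss in pairs]
print(moving_average(losses, window=2))
```

On the full log, a window of around 100 iterations would probably give a much smoother curve than the raw values.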

Run the model :)

In [20]:
# Set dropout layers to eval mode
encoder.eval()
decoder.eval()

# Initialize search module
searcher = GreedySearchDecoder(encoder, decoder)

# Begin chatting (type 'quit' to end the session)
evaluateInput(encoder, decoder, searcher, voc)

> Hello
Bot: hello . . . . 5
> How is your day
Bot: i don t know what i m saying . 9
> How are you
Bot: i m fine . . 5
> What is the weather like today?
Bot: no sir . . . . 6
> Are you hungry?
Bot: no . . . . . 6
> Can we get something to eat?
Bot: sure . . . . . 6
> Where should we eat?
Bot: i don t know . . . 7
> I love you
Bot: i love you . . . 6
> Really?
Bot: yeah . . . . . 6
> Dam
Error: Encountered unknown word.
> Damn
Bot: what ? ? . . 5
> I dont believe you
Error: Encountered unknown word.
> I do not believe you
Bot: i don t know . . 6
> quit

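Two inputs above ("Dam" and "I dont believe you") abort with `Error: Encountered unknown word.` because a word is missing from the vocabulary. One possible workaround is to drop out-of-vocabulary words instead of rejecting the whole sentence. The sketch below assumes the vocabulary exposes a `word2index` dict and an end-of-sentence index, as the PyTorch chatbot tutorial's `Voc` class does. `indexes_skipping_unknown` and the toy vocabulary are hypothetical.

```python
def indexes_skipping_unknown(word2index, sentence, eos_index):
    """Map a normalized sentence to vocabulary indices, skipping unknown words.

    Hypothetical helper: returns (indices, skipped) instead of raising
    KeyError, so the chat loop can warn about dropped words and still reply.
    """
    indices, skipped = [], []
    # split() with no argument also ignores repeated or trailing spaces.
    for word in sentence.split():
        if word in word2index:
            indices.append(word2index[word])
        else:
            skipped.append(word)
    indices.append(eos_index)  # every input sequence ends with EOS
    return indices, skipped


# Toy vocabulary for illustration only; the real one would come from `voc`.
toy_word2index = {"i": 3, "do": 4, "not": 5, "believe": 6, "you": 7}
print(indexes_skipping_unknown(toy_word2index, "i dont believe you", eos_index=2))
# prints ([3, 6, 7, 2], ['dont'])
```

This is a trade-off rather than a fix. Dropping "dont" turns the input into "i believe you", which is the opposite meaning. If every word is unknown, only the end token remains, and the caller should probably fall back to the original error message.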

### Notes on the Model

Overall, the model is OK. It was not as intelligent as I expected, but I suppose that is a good thing. Playing with the bot, I noticed that its replies are very predictable. In fact, it feels less like it is responding to you and more like it has memorized what to say for a given input string. This is expected with `GreedySearchDecoder`: with dropout disabled by `eval()`, it always picks the most likely next word, so the same input always produces the same reply. The first improvement would be to add some randomness to decoding, so that typing "hello" does not keep generating the same output. Beyond that, we can tune hyperparameters such as the learning rate and the depth (number of layers) of the model. Eventually, to improve the overall feel of the chatbot, we could implement an entirely different model. Another idea is to represent each word with a more meaningful number than the index at which it first appears in the vocabulary (ask the professor whether this would even matter, or whether it would just be a change of domain).
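One common way to add randomness is temperature sampling. Instead of always taking the most likely word, the decoder draws the next word from its predicted distribution, sharpened or flattened by a temperature. The sketch below is a standalone, pure-Python illustration. `sample_next_token` is a hypothetical helper, and it assumes the decoder yields a list of probabilities (the PyTorch tutorial's attention decoder applies a softmax to its output). Temperature 0 reproduces greedy decoding. Inside the notebook, the same draw on an output tensor could be done with `torch.multinomial`.

```python
import random


def sample_next_token(probs, temperature=1.0, rng=random):
    """Pick the next word index from a probability distribution.

    temperature == 0 reproduces greedy decoding (always the argmax);
    temperatures above 1 flatten the distribution and add variety,
    temperatures between 0 and 1 sharpen it toward the greedy choice.
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        return max(range(len(probs)), key=lambda i: probs[i])
    # p ** (1/T), renormalized, equals softmax(log(p) / T). Dividing by the
    # max first keeps the largest weight at 1.0 so small T cannot underflow
    # every weight to zero.
    top = max(probs)
    weights = [(p / top) ** (1.0 / temperature) for p in probs]
    # random.choices normalizes the weights itself.
    return rng.choices(range(len(probs)), weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so the demo is repeatable
probs = [0.1, 0.7, 0.2]  # toy next-word distribution
print(sample_next_token(probs, temperature=0))  # prints 1 (greedy choice)
print([sample_next_token(probs, temperature=1.0, rng=rng) for _ in range(10)])
```

In a sampling variant of the search module, this would replace the argmax step, so asking "hello" twice could produce different replies. A temperature a little below 1 is a common starting point, because very high temperatures tend to produce incoherent text.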