# LSTM Bot

## Project Overview

In this project, you will build a chatbot that can converse with you at the command line. The chatbot will use a Sequence to Sequence text generation architecture with an LSTM as it's memory unit. You will also learn to use pretrained word embeddings to improve the performance of the model. At the conclusion of the project, you will be able to show your chatbot to potential employers.

Additionally, you have the option to use pretrained word embeddings in your model. We have loaded Brown Embeddings from Gensim in the starter code below. You can compare the performance of your model with pre-trained embeddings against a model without the embeddings.



---



A sequence to sequence model (Seq2Seq) has two components:
- An Encoder consisting of an embedding layer and LSTM unit.
- A Decoder consisting of an embedding layer, LSTM unit, and linear output unit.

The Seq2Seq model works by accepting an input into the Encoder, passing the hidden state from the Encoder to the Decoder, which the Decoder uses to output a series of token predictions.

## Dependencies

- Pytorch
- Numpy
- Pandas
- NLTK
- Gzip
- Gensim


Please choose a dataset from the Torchtext website. We recommend looking at the Squad dataset first. Here is a link to the website where you can view your options:

- https://pytorch.org/text/stable/datasets.html





## Import Libaries

In [2]:
# download liabries needed
# need to restart the kernal after running this cell
!pip install torch==1.12.0 torchdata==0.4.0 torchtext==0.13.0

Defaulting to user installation because normal site-packages is not writeable
Collecting torch==1.12.0
  Downloading torch-1.12.0-cp37-cp37m-manylinux1_x86_64.whl (776.3 MB)
[K     |████████████████████████████████| 776.3 MB 12 kB/s s eta 0:00:01
[?25hCollecting torchdata==0.4.0
  Downloading torchdata-0.4.0-cp37-cp37m-manylinux2014_x86_64.whl (4.4 MB)
[K     |████████████████████████████████| 4.4 MB 32.0 MB/s eta 0:00:01
[?25hCollecting torchtext==0.13.0
  Downloading torchtext-0.13.0-cp37-cp37m-manylinux1_x86_64.whl (1.9 MB)
[K     |████████████████████████████████| 1.9 MB 32.3 MB/s eta 0:00:01
Collecting portalocker>=2.0.0
  Downloading portalocker-2.7.0-py2.py3-none-any.whl (15 kB)
[31mERROR: torchvision 0.10.0 has requirement torch==1.9.0, but you'll have torch 1.12.0 which is incompatible.[0m
Installing collected packages: torch, portalocker, torchdata, torchtext
Successfully installed portalocker-2.7.0 torch-1.12.0 torchdata-0.4.0 torchtext-0.13.0


In [1]:
# import libraries
import gensim
import nltk
from nltk.stem import *
from nltk.tokenize import RegexpTokenizer
import numpy as np
import pandas as pd
import gzip
from nltk.corpus import brown

import torchtext

import string
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold

import torch
import torch.nn as nn
import torch.optim as optim

import random
import math

In [2]:
# set the random seeds
SEED=42

## Step 1: Build Vocabulary & create the Word Embeddings
reference: 
- https://pytorch.org/tutorials/beginner/chatbot_tutorial.html#load-and-trim-data
- https://github.com/iJoud/Seq2Seq-Chatbot/blob/main/(Starter%20Code)%20Chatbot%20With%20LSTM%20and%20Pretrained%20Embeddings.ipynb

In [3]:
# download the data
nltk.download('brown')
nltk.download('punkt')

# Output, save, and load brown embeddings

model = gensim.models.Word2Vec(brown.sents())
model.save('brown.embedding')

w2v = gensim.models.Word2Vec.load('brown.embedding')


[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data]   Unzipping corpora/brown.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


In [4]:
# define a torch.device. This is used to tell torchText to put the tensors on the GPU or not
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [5]:
# function to load the data
def loadDF(path):
    
    '''
    You will use this function to load the dataset into a Pandas Dataframe for processing.
    Number of lines per split:
        train: 87599
        dev: 10570
    '''
    # load data
    train_iter, valid_iter = torchtext.datasets.SQuAD1(root=path, split=('train', 'dev'))
    
    # returns: DataPipe that yields data points from SQuaAD1 dataset which consist of context, question, 
    # list of answers and corresponding index in context
    # convert dataPipe to dictionary 
    # make simple pairs of questions and answers
    
#     train_dict = [{'src': question, 'trg': answers[0]} for _, question, answers, _ in train_iter]
#     valid_dict = [{'src': question, 'trg': answers[0]} for _, question, answers, _ in valid_iter]

    train_dict = {'src': [], 'trg': []}
    valid_dict = {'src': [], 'trg': []}
    
    for _, question, answers, _ in train_iter:
        train_dict['src'].append(question)
        train_dict['trg'].append(answers[0])

    for _, question, answers, _ in valid_iter:
        valid_dict['src'].append(question)
        valid_dict['trg'].append(answers[0])
        
    # convert Dictionaries to Pandas DataFrame
    train_df = pd.DataFrame(train_dict)    
    valid_df = pd.DataFrame(valid_dict)
    
    df = train_df.append(valid_df)
    
    return df

def prepare_text(sentence):
    
    '''
    Our text needs to be cleaned with a tokenizer. This function will perform that task.
    https://www.nltk.org/api/nltk.tokenize.html
    '''
    # example input: What is the Grotto at Notre Dame?
    # remove punctuation: 'what is the grotto at notre dame'
    # stemmer: 'what is the grotto at notr dame'
    # RegexpTokenizer: ['what', 'is', 'the', 'grotto', 'at', 'notr', 'dame']

    # clean text
    sentence = ''.join([s.lower() for s in sentence if s not in string.punctuation])
    
    stemmer = snowball.SnowballStemmer('english')
    sentence = ' '.join(stemmer.stem(w) for w in sentence.split())
    
    # tokenize text
    tokens = RegexpTokenizer(r'\w+').tokenize(sentence)
    
    return tokens


def train_test_split(SRC, TRG):
    
    '''
    Input: SRC, our list of questions from the dataset
            TRG, our list of responses from the dataset
    Output: Training and test datasets for SRC & TRG

    '''
    # split the data to train and test set
    SRC_train_dataset = SRC.sample(frac=0.8, random_state=SEED)
    SRC_test_dataset = SRC.drop(SRC_train_dataset.index)

    TRG_train_dataset = TRG.sample(frac=0.8, random_state=SEED)
    TRG_test_dataset = TRG.drop(TRG_train_dataset.index)
    
    return SRC_train_dataset, SRC_test_dataset, TRG_train_dataset, TRG_test_dataset


In [6]:
# test loadDF function
data_df = loadDF('data')

# to make implementation test quicker, grab a subset of whole dataset
data = data_df.iloc[:8000, :]

In [55]:
# check data
data.head(15)

Unnamed: 0,src,trg
0,"[to, whom, did, the, virgin, mari, alleg, appe...","[saint, bernadett, soubir]"
1,"[what, is, in, front, of, the, notr, dame, mai...","[a, copper, statu, of, christ]"
2,"[the, basilica, of, the, sacr, heart, at, notr...","[the, main, build]"
3,"[what, is, the, grotto, at, notr, dame]","[a, marian, place, of, prayer, and, reflect]"
4,"[what, sit, on, top, of, the, main, build, at,...","[a, golden, statu, of, the, virgin, mari]"
5,"[when, did, the, scholast, magazin, of, notr, ...","[septemb, 1876]"
6,"[how, often, is, notr, dame, the, juggler, pub...",[twice]
7,"[what, is, the, daili, student, paper, at, not...","[the, observ]"
8,"[how, mani, student, news, paper, are, found, ...",[three]
9,"[in, what, year, did, the, student, paper, com...",[1987]


In [8]:
# prepare the data
data.loc[:, 'src'] = data['src'].apply(prepare_text)
data.loc[:, 'trg'] = data['trg'].apply(prepare_text)
data.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[selected_item_labels] = value


Unnamed: 0,src,trg
0,"[to, whom, did, the, virgin, mari, alleg, appe...","[saint, bernadett, soubir]"
1,"[what, is, in, front, of, the, notr, dame, mai...","[a, copper, statu, of, christ]"
2,"[the, basilica, of, the, sacr, heart, at, notr...","[the, main, build]"
3,"[what, is, the, grotto, at, notr, dame]","[a, marian, place, of, prayer, and, reflect]"
4,"[what, sit, on, top, of, the, main, build, at,...","[a, golden, statu, of, the, virgin, mari]"


In [9]:
# get min, max length of dataset
max_src = 0 
max_trg = 0

for idx, r in data.iterrows():
    max_src = len(r['src']) if len(r['src']) > max_src else max_src
    max_trg = len(r['trg']) if len(r['trg']) > max_trg else max_trg

print(f'max_src: {max_src}, max_trg: {max_trg}')

max_src: 29, max_trg: 43


In [10]:
# define the Vovabulary Object
# ref: https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html

# Default word tokens
SOS_token = 0  # Start-of-sentence token
EOS_token = 1  # End-of-sentence token

class Vocab:
    def __init__(self, name):
        self.name = name
        self.trimmed = False
        self.word2index = {"SOS": SOS_token, "EOS": EOS_token}
        self.word2count = {}
        self.index2word = {SOS_token: "SOS", EOS_token: "EOS"}
        self.n_words = 2  # Count SOS, EOS
        
    def addWord(self, word):
        if word not in self.word2index:
            self.word2index[word] = self.n_words
            self.word2count[word] = 1
            self.index2word[self.n_words] = word
            self.n_words += 1
        else:
            self.word2count[word] += 1


In [11]:
# build vocabularies for questions "source" and answers "target"
vocab_src = Vocab(name='src')
vocab_trg = Vocab(name='trg')

for idx, r in data.iterrows():
    for w in r['src']:
        vocab_src.addWord(w)
    for w in r['trg']:
        vocab_trg.addWord(w)


In [12]:
print(f"tokens in vocab_src: {vocab_src.n_words}")
print(f"tokens in vocab_trg: {vocab_trg.n_words}")

tokens in vocab_src: 5980
tokens in vocab_trg: 6151


In [13]:
# split the train, test and valid dataset
SRC = data[['src']]
TRG = data[['trg']]
SRC_train_dataset, SRC_test_dataset, TRG_train_dataset, TRG_test_dataset = train_test_split(SRC, TRG)


In [14]:
print(f'SRC_train_dataset size: {SRC_train_dataset.shape}')
print(f'SRC_test_dataset size: {SRC_test_dataset.shape}')

print(f'TRG_train_dataset size: {TRG_train_dataset.shape}')
print(f'TRG_test_dataset size: {TRG_test_dataset.shape}')


SRC_train_dataset size: (6400, 1)
SRC_test_dataset size: (1600, 1)
TRG_train_dataset size: (6400, 1)
TRG_test_dataset size: (1600, 1)


In [15]:
# example of SRC_train_dataset
SRC_train_dataset.head()

Unnamed: 0,src
2215,"[which, dynasti, rule, all, of, china]"
2582,"[what, other, appl, product, was, unveil, on, ..."
1662,"[to, whom, did, chopin, reveal, in, letter, wh..."
3027,"[which, recur, jame, bond, charact, appear, in..."
4343,"[what, is, brooklyn, public, librari, system, ..."


In [16]:
# example of TRG_train_dataset
TRG_train_dataset.head()

Unnamed: 0,trg
2215,"[the, yuan, dynasti]"
2582,"[seventh, generat, ipod, nano]"
1662,"[tytus, woyciechowski]"
3027,"[m, q, and, eve, moneypenni]"
4343,"[brooklyn, public, librari]"


In [17]:
def createPairs(srs, trg):
    # convert df to list of pairs
    src_list = srs['src'].apply(lambda x: " ".join(x)).to_list()
    trg_list = trg['trg'].apply(lambda x: " ".join(x)).to_list()
    return [list(i) for i in zip(src_list, trg_list)]

In [18]:
# create pairs for traning and test data
train_pairs = createPairs(SRC_train_dataset, TRG_train_dataset)
test_pairs = createPairs(SRC_test_dataset, TRG_test_dataset)

In [19]:
# quick check
train_pairs[:5]

[['which dynasti rule all of china', 'the yuan dynasti'],
 ['what other appl product was unveil on septemb 12 2012',
  'seventh generat ipod nano'],
 ['to whom did chopin reveal in letter which part of his work were about the sing student he was infatu with',
  'tytus woyciechowski'],
 ['which recur jame bond charact appear in spectr', 'm q and eve moneypenni'],
 ['what is brooklyn public librari system call', 'brooklyn public librari']]

In [20]:
# ref: https://github.com/iJoud/Seq2Seq-Chatbot/blob/main/src/Data.py
def toTensor(vocab, sentence):
    # convert list of words "sentence" to a torch tensor of indices
#     indices = [vocab.word2index['SOS']]
    indices = [vocab.word2index[word] for word in sentence.split(' ')]
    indices.append(vocab.word2index['EOS'])
    return torch.Tensor(indices).long().to(device).view(-1, 1)

In [21]:
# convert train and test data to tensor
train_src = [toTensor(vocab_src, pair[0]) for pair in train_pairs]
train_trg = [toTensor(vocab_trg, pair[1]) for pair in train_pairs]
test_src = [toTensor(vocab_src, pair[0]) for pair in test_pairs]
test_trg = [toTensor(vocab_trg, pair[1]) for pair in test_pairs]

In [22]:
# quick check
train_src

[tensor([[  27],
         [2634],
         [2165],
         [ 190],
         [  17],
         [ 939],
         [   1]], device='cuda:0'),
 tensor([[  14],
         [ 180],
         [2912],
         [1159],
         [  75],
         [2994],
         [  31],
         [1247],
         [2299],
         [ 257],
         [   1]], device='cuda:0'),
 tensor([[   2],
         [   3],
         [   4],
         [2092],
         [ 897],
         [  10],
         [ 471],
         [  27],
         [ 259],
         [  17],
         [ 225],
         [ 226],
         [  81],
         [ 594],
         [   5],
         [ 733],
         [  42],
         [ 561],
         [  75],
         [2203],
         [  96],
         [   1]], device='cuda:0'),
 tensor([[  27],
         [ 697],
         [ 689],
         [3325],
         [ 830],
         [   9],
         [  10],
         [3320],
         [   1]], device='cuda:0'),
 tensor([[  14],
         [  15],
         [1426],
         [  52],
         [ 141],
      

## Step 2: Create the Encoder
The Encoder's job is to create a representation of the input sequence. Then, it captures the representation in the hidden state of the LSTM. And finally, it passes the hidden state to the second half of Seq2Seq.

The layers of The Encoder are:
- The Embedding Layer
- The LSTM
- Dropout Layer (optional)

The parameters of The Encoder are:

- The input size
- The hidden size
- The embedding size

ref:
https://github.com/bentrevett/pytorch-seq2seq/blob/master/1%20-%20Sequence%20to%20Sequence%20Learning%20with%20Neural%20Networks.ipynb
https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html

In [23]:
# Create the Encoder
class Encoder(nn.Module):
    
    def __init__(self, input_size, hidden_size, embedding_size, n_layers, dropout):
        super(Encoder, self).__init__()
        
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.embedding_size = embedding_size
        self.n_layers = n_layers
        
        # self.embedding provides a vector representation of the inputs to our model
        self.embedding = nn.Embedding(self.input_size, self.embedding_size)
        
        # self.lstm, accepts the vectorized input and passes a hidden state
        self.lstm = nn.LSTM(self.embedding_size, self.hidden_size, n_layers, dropout = dropout)
        
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, i):
        
        '''
        Inputs: i, the src vector
        Outputs: o, the encoder outputs
                h, the hidden state
                c, the cell state
        '''
        #Inputs = [input_size, batch size]
        embedded = self.dropout(self.embedding(i))
        
        #embedded = [input_size, batch size, embedding_size]
        o, (h, c) = self.lstm(embedded)
        
        #outputs = [input_size, batch size, hidden_size * n directions]
        #hidden = [n_layers * n directions, batch size, hidden_size]
        #cell = [n_layers * n directions, batch size, hidden_size]
        
        #outputs are always from the top hidden layer
        
        return o, h, c


## Step 3: Create the Decoder
The Decoder's job is to output a prediction based on the hidden state of The Encoder. It combines this with information from the N-1 prediction to create an output.

The layers of The Decoder are:

- The Embedding Layer
- The LSTM
- The Linear Output Layer

The parameters of The Encoder are:

- The output size
- The hidden size
- The embedding size

ref: https://github.com/bentrevett/pytorch-seq2seq/blob/master/1%20-%20Sequence%20to%20Sequence%20Learning%20with%20Neural%20Networks.ipynb

In [24]:
# Create the Decoder
class Decoder(nn.Module):
      
    def __init__(self, hidden_size, output_size, embedding_size, n_layers, dropout):
        super(Decoder, self).__init__()
        
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.embedding_size = embedding_size
        self.n_layers = n_layers
        
        # self.embedding provides a vector representation of the target to our model
        self.embedding = nn.Embedding(self.output_size, self.embedding_size)
        
        # self.lstm, accepts the embeddings and outputs a hidden stat
        self.lstm = nn.LSTM(self.embedding_size, self.hidden_size, n_layers, dropout=dropout)

        # self.ouput, predicts on the hidden state via a linear output layer     
        self.out = nn.Linear(self.hidden_size, self.output_size)
        self.softmax = nn.LogSoftmax(dim= 1)
        
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, i, h, c):
        
        '''
        Inputs: i, the target vector
        Outputs: o, the prediction
                h, the hidden state
        '''
        # as we are only decoding one token at a time, the input tokens will always have a sequence length of 1
#         i = i.unsqueeze(0)
        x = self.dropout(self.embedding(i))
        x = x.view(1, 1, -1)
        x, (h, c) = self.lstm(x, (h, c))
        x = self.softmax(self.out(x[0]))
        
#         output, (h, c) = self.lstm(embedded, (h, c))

        #output = [seq len, batch size, hid dim * n directions]
        #hidden = [n layers * n directions, batch size, hid dim]
        #cell = [n layers * n directions, batch size, hid dim]
        
        #seq len and n directions will always be 1 in the decoder, therefore:
        #output = [1, batch size, hid dim]
        #hidden = [n layers, batch size, hid dim]
        #cell = [n layers, batch size, hid dim]
        
#         o = self.out(output.squeeze(0))
#         o = self.softmax(self.output(x[0]))
        
        return x, h


## Step 4: Combine into a Seq2Seq Architecture
This will handle:

- receiving the input/source sentence
- using the encoder to produce the context vectors
- using the decoder to produce the predicted output/target sentence

During each iteration of the loop, we:
- pass the input, previous hidden and previous cell states into the decoder
- receive a prediction, next hidden state and next cell state from the decoder
- place our prediction, output in our tensor of predictions(outputs)
- decide if we are going to "teacher force" or not
    - if we do, the next input is the ground-truth next token in the sequence
    - if we don't, the next input is the predicted next token in the sequence, which we get by doing an argmax over the output tensor
- Once we've made all of our predictions, we return our tensor full of predictions(outputs)

In [25]:
# Combine them into a Seq2Seq Architecture
class Seq2Seq(nn.Module):
    
    def __init__(self, encoder_input_size, encoder_hidden_size, encoder_embedding_size, encoder_n_layers, encoder_dropout,\
                 decoder_hidden_size, decoder_output_size, decoder_embedding_size, decoder_n_layers, decoder_dropout):

        super(Seq2Seq, self).__init__()
     
        self.encoder = Encoder(encoder_input_size, encoder_hidden_size, encoder_embedding_size, \
                               encoder_n_layers, encoder_dropout)
        self.decoder = Decoder(decoder_hidden_size, decoder_output_size, decoder_embedding_size, \
                               decoder_n_layers, decoder_dropout)
    
    def forward(self, src, trg, src_len, trg_len, teacher_forcing_ratio = 0.5):      
        # teacher_forcing_ratio is probability to use teacher forcing
        # e.g. if teacher_forcing_ratio is 0.75 we use ground-truth inputs 75% of the time
            
        # create an outputs tensor that will store all of our predictions
#         batch_size = trg.shape[1]
#         trg_len = trg.shape[0]
#         trg_vocab_size = self.decoder.output_size
        
        # tensor to store decoder outputs
#         output = torch.zeros(trg_len, batch_size, trg_vocab_size).to(device)
        output = {'decoder_output':[]}
        
        # feed the input/source sentence, src, into the encoder and receive out final hidden and cell states
        # last hidden state of the encoder is used as the initial hidden state of the decoder
        encoder_output, h, c = self.encoder(src)
        
        # first input to the decoder is the <sos> tokens, ss our trg tensor already has the <sos> token appended
        # We know how long our target sentences should be (trg_len), so we loop that many times
        # The last token input into the decoder is the one before the <eos> token 
        # - the <eos> token is never input into the decoder
        
#         decoder_input = trg[0,:]
        decoder_input = torch.Tensor([[SOS_token]]).long().to(device) # 0 = SOS_token
        
        for t in range(1, trg_len):
            
            #insert input token embedding, previous hidden and previous cell states
            #receive output tensor (predictions) and new hidden and cell states            
            decoder_output, h = self.decoder(decoder_input, h, c)
            
            #place predictions in a tensor holding predictions for each token
#             output[t] = decoder_output
            output['decoder_output'].append(decoder_output)
            
            #decide if we are going to use teacher forcing or not
            teacher_force = random.random() < teacher_forcing_ratio

            #get the highest predicted token from our predictions
            top1 = decoder_output.argmax(1)

            #if teacher forcing, use actual next token as next input
            #if not, use predicted token
            if self.training: # Model not in eval mode
                decoder_input = trg[t] if teacher_force else top1
            else:
                _, top_index = decoder_output.data.topk(1)
                decoder_input = top_index.squeeze().detach() 
                
        return output


## Step 5: Train & evaluate model

ref: https://learn.udacity.com/nanodegrees/nd101/parts/cd1822/lessons/23e85aa3-ecde-4dc6-ae90-dc7144383206/concepts/e1424b2b-4627-4953-b342-78b4c444478a

In [50]:
# nitialize the model
# the input and output dimensions are defined by the size of the vocabulary
# the embedding dimesions and dropout for the encoder and decoder can be different, 
# but the size of the hidden/cell states must be the same.
# then define the encoder, decoder and then Seq2Seq model, which we place on the device.

encoder_input_size = vocab_src.n_words
decoder_output_size = vocab_trg.n_words
encoder_embedding_size = 256
decoder_embedding_size = 256
learning_rate = 0.01
batch_size = 64
hidden_size = 512
n_layers = 2
encoder_dropout = 0.3
decoder_dropout = 0.3
epochs = 50

model_path = 'seq2seq.pt'

model = Seq2Seq(encoder_input_size, hidden_size, encoder_embedding_size, n_layers, encoder_dropout,\
                 hidden_size, decoder_output_size, decoder_embedding_size, n_layers, decoder_dropout)

In [51]:
# define optimizer, which we use to update our parameters in the training loop. 
# optimizer = optim.Adam(model.parameters(), lr=learning_rate)
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

# define loss function
# by passing the index of the <pad> token as the ignore_index argument we ignore the loss whenever the target token 
# is a padding token
# criterion = nn.CrossEntropyLoss(ignore_index = SOS_token)
criterion = nn.NLLLoss()


In [52]:
def train(source_data, target_data, model, epochs, batch_size, print_every, optimizer, criterion):
    
    model.to(device)
    
    total_training_loss = 0
    total_valid_loss = 0
    loss = 0
#     best_valid_loss = 0
    
    # use cross validation
    kf = KFold(n_splits=epochs, shuffle=True)

    for e, (train_id, valid_id) in enumerate(kf.split(source_data), 1):
        model.train()
        for i in range(0, len(train_id)):

            src = source_data[i]
            trg = target_data[i]

            output = model(src, trg, src.size(0), trg.size(0))

            current_loss = 0
            for (s, t) in zip(output["decoder_output"], trg): 
                current_loss += criterion(s, t)

            loss += current_loss
            total_training_loss += (current_loss.item() / trg.size(0)) # add the iteration loss

            if i % batch_size == 0 or i == (len(train_id)-1):
                loss.backward()
                optimizer.step()
                optimizer.zero_grad()
                loss = 0

        # validation set 
        model.eval()
        for i in range(0, len(valid_id)):
            src = source_data[i]
            trg = target_data[i]

            output = model(src, trg, src.size(0), trg.size(0))

            current_loss = 0
            for (s, t) in zip(output["decoder_output"], trg): 
                current_loss += criterion(s, t)

            total_valid_loss += (current_loss.item() / trg.size(0)) # add the iteration loss
        
#         if total_valid_loss < best_valid_loss:
#             best_valid_loss = total_valid_loss
#             torch.save(model.state_dict(), model_path)

        if e % print_every == 0:
            training_loss_average = total_training_loss / (len(train_id)*print_every)
            validation_loss_average = total_valid_loss / (len(valid_id)*print_every)
            print(f"{e}/{epochs} Epoch  -  Training Loss = {training_loss_average:.4f}  -  Validation Loss = {validation_loss_average:.4f}")
            total_training_loss = 0
            total_valid_loss = 0 

In [53]:
# train the model
train(source_data = train_src,
      target_data = train_trg,
      model = model,
      print_every = 5,
      epochs = epochs,
      batch_size = batch_size,
      optimizer = optimizer, 
      criterion = criterion)

5/50 Epoch  -  Training Loss = 5.0446  -  Validation Loss = 5.1252
10/50 Epoch  -  Training Loss = 4.3634  -  Validation Loss = 4.5126
15/50 Epoch  -  Training Loss = 3.0252  -  Validation Loss = 3.0896
20/50 Epoch  -  Training Loss = 1.3847  -  Validation Loss = 1.2616
25/50 Epoch  -  Training Loss = 0.5130  -  Validation Loss = 0.3478
30/50 Epoch  -  Training Loss = 0.2533  -  Validation Loss = 0.1525
35/50 Epoch  -  Training Loss = 0.1558  -  Validation Loss = 0.0818
40/50 Epoch  -  Training Loss = 0.1039  -  Validation Loss = 0.0458
45/50 Epoch  -  Training Loss = 0.0734  -  Validation Loss = 0.0264
50/50 Epoch  -  Training Loss = 0.0500  -  Validation Loss = 0.0195


In [56]:
# Test loss

# save the model
torch.save(model, model_path)

# load the model
# model.load_state_dict(torch.load(model_path))
model = torch.load(model_path, map_location=torch.device('cuda'))

model.eval()

# test_loss = evaluate(model, SRC_test, TRG_test, criterion)
total_valid_loss = 0
    
for i in range(0, len(test_src)):
    src = test_src[i]
    trg = test_trg[i]

    output = model(src, trg, src.size(0), trg.size(0))

    current_loss = 0
    for (s, t) in zip(output["decoder_output"], trg): 
        current_loss += criterion(s, t)

    total_valid_loss += (current_loss.item() / trg.size(0)) # add the iteration loss

test_loss_average = total_valid_loss / (len(test_src))
print(f"Test Loss = {test_loss_average:.4f}")


Test Loss = 6.5190


## Step 6: Interact with the Chatbot

In [54]:
# simple interactive interfaces
print("Type 'stop' to exit chat")
ANSWER_LENGTH = 12

while True:
    src = input(">")
    
    # If STOP in input, stop script
    if "stop" == src.strip():
        break
    
    # get the answer
    try:
        # clean the input
        src = toTensor(vocab_src, " ".join(prepare_text(src)))
    except:
        print("Error: Word not in the vocabulary.")
        break
    
    answer_words = []
    
    output = model(src, None, src.size(0), max_trg)

    for tensor in output['decoder_output']:

        _, top_token = tensor.data.topk(1)
        if top_token.item() == 1:
            break
        else:
            word = vocab_trg.index2word[top_token.item()]
            answer_words.append(word)

    # write out an answer for user        
    print("<", ' '.join(answer_words), "\n")
    

Type 'stop' to exit chat
>What is the Grotto at Notre Dame?
< old observ center of stanford bangladesh bangladesh and and bangladesh bangladesh mission of of bangladesh bangladesh of bangladesh bangladesh and and holi holi and and to to bangladesh bangladesh and and and holi bangladesh of of bangladesh bangladesh and and mission bangladesh 

>What is in front of the Notre Dame Main Building?
< a copper statu of christ christ christ of christ christ christ of of christ christ christ of christ christ christ of of christ christ copper of christ christ christ of of christ christ copper of christ christ christ christ christ christ christ 

>What sits on top of the Main Building at Notre Dame? 
< theodor m hesburgh tunnel and and and and and and and live in in in in in and and and and and live in in in in in bangladesh and and bangladesh and and and and and live in in in in 

>The Basilica of the Sacred heart at Notre Dame is beside to which structure? 
< the main build build build main main