# LSTM Bot

## Project Overview

In this project, you will build a chatbot that can converse with you at the command line. The chatbot uses a sequence-to-sequence (Seq2Seq) text generation architecture with an LSTM as its memory unit. You will also learn how to use pretrained word embeddings, which can improve the model's performance. At the conclusion of the project, you will be able to show your chatbot to potential employers.

Using the pretrained word embeddings is optional. The starter code below loads a Gensim Word2Vec model trained on the Brown corpus (`brown.embedding`), so you can compare a model that uses these pretrained embeddings against one that learns its embeddings from scratch.



---



A sequence to sequence model (Seq2Seq) has two components:
- An Encoder consisting of an embedding layer and LSTM unit.
- A Decoder consisting of an embedding layer, LSTM unit, and linear output unit.

The Seq2Seq model feeds the input sequence through the Encoder and passes the Encoder's final state (in this notebook, both the hidden state and the cell state) to the Decoder. The Decoder starts from that state and predicts the output tokens one at a time.
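The state hand-off between the two components can be sketched with plain `nn.Embedding`, `nn.LSTM`, and `nn.Linear` layers. This is a minimal batched sketch with made-up vocabulary sizes and random token ids (all names here are illustrative, not the notebook's classes, which process one token at a time); it only shows that the decoder LSTM is initialized with the encoder's final `(hidden, cell)` pair.

```python
import torch
import torch.nn as nn

# Hypothetical toy sizes, chosen only for illustration
SRC_VOCAB, TRG_VOCAB, EMB, HID = 50, 40, 16, 32

enc_embed = nn.Embedding(SRC_VOCAB, EMB)
enc_lstm = nn.LSTM(EMB, HID, batch_first=True)
dec_embed = nn.Embedding(TRG_VOCAB, EMB)
dec_lstm = nn.LSTM(EMB, HID, batch_first=True)
dec_out = nn.Linear(HID, TRG_VOCAB)

src = torch.randint(0, SRC_VOCAB, (1, 7))     # one question of 7 token ids
trg_in = torch.randint(0, TRG_VOCAB, (1, 4))  # 4 decoder input token ids

# Encoder: read the whole question, keep only the final (hidden, cell) state
_, (h, c) = enc_lstm(enc_embed(src))

# Decoder: start from the encoder's final state and score every target position
dec_states, _ = dec_lstm(dec_embed(trg_in), (h, c))
logits = dec_out(dec_states)  # shape: (batch, target_len, TRG_VOCAB)
```

Note that `h` and `c` keep the `(num_layers, batch, hidden)` layout even with `batch_first=True`; only the sequence tensors are batch-first.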

## Dependencies

- PyTorch
- TorchText
- NumPy
- pandas
- NLTK
- Gzip
- Gensim
- scikit-learn


Please choose a dataset from the TorchText website. We recommend looking at the SQuAD dataset first. Here is a link to the website where you can view your options:

- https://pytorch.org/text/stable/datasets.html





In [1]:
import gensim
import nltk
import numpy as np
import pandas as pd
# import gzip
import torch
from nltk.corpus import brown

In [2]:
# Number of SQuAD 2.0 training examples to keep (before dropping unanswerable questions)

DATA_SIZE = 5000

# Model and training hyperparameters

hidden_size = 100
learning_rate = 0.01
epochs = 65

In [3]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print(device)

cuda


In [4]:
w2v = gensim.models.Word2Vec.load('brown.embedding')

In [5]:
from torchtext.datasets import SQuAD2
from torch.utils.data.datapipes.iter import IterableWrapper
from nltk.tokenize import wordpunct_tokenize

# from nltk.tokenize import sent_tokenize

def loadDF(path):
    '''
    Load the SQuAD 2.0 training split (downloaded to or cached under `path`)
    into a pandas DataFrame with a 'Question' column and an 'Answers' column.
    '''
    data_pipe = SQuAD2(path, split='train')
    
    # Wrap the DataPipe so it can be iterated
    iterable_data = IterableWrapper(data_pipe)
    
    # Collect questions and answers into parallel lists; each sample is
    # (context, question, answers, answer_start)
    data_dict = {
        'Question': [],
        'Answers': []
    }
    
    for _, question, answers, _ in iterable_data:
        data_dict['Question'].append(question)
        data_dict['Answers'].append(answers)
        
    
    # Convert dictionary to pandas dataframe
    data_df = pd.DataFrame(data_dict)
    
    return data_df


def prepare_text(sentence):
    
    '''
    Lowercase the text and split it into word and punctuation tokens with
    NLTK's wordpunct_tokenize.
    https://www.nltk.org/api/nltk.tokenize.html
    '''
    if isinstance(sentence, list):

        # SQuAD answers are a list of strings; only the first answer is used
        tokens = wordpunct_tokenize(sentence[0].lower())
    else:
        tokens = wordpunct_tokenize(sentence.lower())
    
    return tokens
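The tokens in the tables that follow (for example `destiny, ', s` or `r, &, b`) come from how `wordpunct_tokenize` splits text. To our understanding, NLTK builds it as a regular-expression tokenizer with the pattern `\w+|[^\w\s]+`; the sketch below reproduces that behavior with Python's `re` module under that assumption. `wordpunct_like` is a hypothetical helper, not part of NLTK.

```python
import re

# Assumption: the same pattern NLTK's WordPunctTokenizer is built on
WORDPUNCT = re.compile(r"\w+|[^\w\s]+")

def wordpunct_like(text):
    """Lowercase, then split into runs of word characters and runs of punctuation."""
    return WORDPUNCT.findall(text.lower())

print(wordpunct_like("Destiny's Child"))  # ['destiny', "'", 's', 'child']
print(wordpunct_like("Houston, Texas"))   # ['houston', ',', 'texas']
print(wordpunct_like("R&B group"))        # ['r', '&', 'b', 'group']
```

Because Python's `\w` matches Unicode letters, `beyoncé` stays a single token, which is why it gets a different index from `beyonce`.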


## Step 1: Build your Vocabulary & create the Word Embeddings

* The most important part of this step is to create your Vocabulary object using a corpus of data drawn from TorchText.

(Extra Credit)

* Use Gensim to extract the word embeddings from one of its corpora.
* Use NLTK and Gensim to create a function to clean your text and look up the index of a word's embeddings.
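A vocabulary usually reserves indices for special tokens. The notebook adds only `<EOS>`. Its decoder starts from index 0, which in the sorted answer vocabulary is an ordinary token, and `Vocab.indexWord` raises `KeyError` for unseen words. The sketch below shows one common alternative under those assumptions: it reserves hypothetical `<PAD>`, `<SOS>`, `<EOS>`, and `<UNK>` entries at the front and maps unknown words to `<UNK>`. The names `build_vocab` and `encode` are illustrative and are not used elsewhere in this notebook.

```python
from itertools import chain

# Hypothetical special tokens; the notebook itself only adds <EOS>
PAD, SOS, EOS, UNK = "<PAD>", "<SOS>", "<EOS>", "<UNK>"

def build_vocab(tokenized_sentences):
    """Map each token to an index, reserving the first indices for special tokens."""
    words = sorted(set(chain.from_iterable(tokenized_sentences)))
    index2word = [PAD, SOS, EOS, UNK] + words
    word2index = {w: i for i, w in enumerate(index2word)}
    return word2index, index2word

def encode(tokens, word2index, add_eos=False):
    """Convert tokens to indices, sending unseen words to <UNK> instead of raising KeyError."""
    ids = [word2index.get(t, word2index[UNK]) for t in tokens]
    if add_eos:
        ids.append(word2index[EOS])
    return ids

word2index, index2word = build_vocab([["when", "did", "beyonce", "start", "?"],
                                      ["in", "the", "late", "1990s"]])
print(encode(["when", "did", "jay", "z", "start", "?"], word2index, add_eos=True))
# prints [12, 7, 3, 3, 10, 5, 2]
```

With a dedicated `<SOS>` index, the decoder's first input can no longer be confused with a real answer word.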

In [6]:
data_df = loadDF('data')

data_df = data_df.iloc[:DATA_SIZE, :]

# Remove questions that do not have answers (stored here as [''])
empty_list = ['']
data_df = data_df[data_df.Answers.apply(lambda x: x != empty_list)]

In [7]:
data_df['Question'] = data_df['Question'].apply(prepare_text)
data_df['Answers'] = data_df['Answers'].apply(prepare_text)

In [8]:
data_df.head(20)

|  | Question | Answers |
|---|---|---|
| 0 | [when, did, beyonce, start, becoming, popular, ?] | [in, the, late, 1990s] |
| 1 | [what, areas, did, beyonce, compete, in, when,... | [singing, and, dancing] |
| 2 | [when, did, beyonce, leave, destiny, ', s, chi... | [2003] |
| 3 | [in, what, city, and, state, did, beyonce, gro... | [houston, ,, texas] |
| 4 | [in, which, decade, did, beyonce, become, famo... | [late, 1990s] |
| 5 | [in, what, r, &, b, group, was, she, the, lead... | [destiny, ', s, child] |
| 6 | [what, album, made, her, a, worldwide, known, ... | [dangerously, in, love] |
| 7 | [who, managed, the, destiny, ', s, child, grou... | [mathew, knowles] |
| 8 | [when, did, beyoncé, rise, to, fame, ?] | [late, 1990s] |
| 9 | [what, role, did, beyoncé, have, in, destiny, ... | [lead, singer] |


In [9]:
# Concatenate the token lists of each column into one long list
Q_vocab_list = sum(data_df['Question'], [])
A_vocab_list = sum(data_df['Answers'], [])
# Keep only the unique words
Q_vocab_list = list(set(Q_vocab_list))
A_vocab_list = list(set(A_vocab_list))
# Add the <EOS> token to the answer vocabulary
A_vocab_list.append('<EOS>')

Q_vocab_list.sort()
A_vocab_list.sort()

In [10]:
def prepare_word_embeddings(vocab_list, w2v_index_to_key_list, w2v_wv_vectors):
    key_list = []
    new_embeddings = []
    
    # Fallback embedding for words missing from the Brown Word2Vec vocabulary:
    # the vector at an arbitrary fixed row (5225), shared by every missing word
    rand_embedding = w2v_wv_vectors[5225]

    embedding_dim = w2v_wv_vectors.shape[1]

    # If a word is found in the Brown Word2Vec vocabulary, use its embedding;
    # otherwise mark it with -9999 and use the shared fallback embedding
    for word in vocab_list:
        try:
            key_list.append(w2v_index_to_key_list.index(word))
        except ValueError:  # list.index raises ValueError for a missing word
            key_list.append(-9999)

    for key in key_list:
        if key != -9999:
            new_embeddings.append(w2v_wv_vectors[key])
        else:
            new_embeddings.append(rand_embedding)

    # convert a list of np.arrays to a stack of np.arrays
    new_embeddings = np.vstack(new_embeddings)
    
    return new_embeddings

Q_embeddings = prepare_word_embeddings(Q_vocab_list, w2v.wv.index_to_key, w2v.wv.vectors)
A_embeddings = prepare_word_embeddings(A_vocab_list, w2v.wv.index_to_key, w2v.wv.vectors)
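In `prepare_word_embeddings`, every out-of-vocabulary word receives the same fallback vector. Because `nn.Embedding.from_pretrained` freezes its weights by default, words absent from the Brown corpus (which contains texts from 1961, so names such as `beyonce` cannot appear in it) become indistinguishable to the model. The sketch below is one alternative under stated assumptions, not the notebook's method: it looks words up in a dict (Gensim 4 exposes one as `w2v.wv.key_to_index`) and gives each missing word its own random vector. `build_embedding_matrix` is a hypothetical helper, and the toy `vectors`/`key_to_index` stand in for the real Word2Vec data.

```python
import numpy as np

def build_embedding_matrix(vocab_list, key_to_index, vectors, seed=0):
    """Stack one embedding row per vocabulary word.

    key_to_index maps word -> row of `vectors`. Words missing from it get their
    own small random vector instead of a single shared fallback vector.
    """
    rng = np.random.default_rng(seed)
    dim = vectors.shape[1]
    scale = vectors.std()  # keep random rows on a similar scale to the real ones
    rows = []
    for word in vocab_list:
        idx = key_to_index.get(word)  # O(1) dict lookup instead of list.index
        if idx is not None:
            rows.append(vectors[idx])
        else:
            rows.append(rng.normal(0.0, scale, dim).astype(vectors.dtype))
    return np.vstack(rows)

# Toy stand-ins for w2v.wv.key_to_index and w2v.wv.vectors
vectors = np.arange(12, dtype=np.float32).reshape(3, 4)
key_to_index = {"in": 0, "the": 1, "late": 2}
emb = build_embedding_matrix(["late", "in", "beyonce", "1990s"], key_to_index, vectors)
```

If you use random rows like these, consider also passing `freeze=False` to `from_pretrained` so the model can adapt them during training.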

In [11]:
class Vocab:
    def __init__(self, model, vocab_list):
        self.model = model
        
        self.words = vocab_list
        # contains dictionary with key, value (index, word)
        self.index2word = dict(enumerate(self.words))
        # contains dictionary with key, value (word, index)
        self.word2index = {word: index for index, word in self.index2word.items()}

    def indexWord(self, word):
        return self.word2index[word]

In [12]:
Q_vocab = Vocab(w2v, Q_vocab_list)
A_vocab = Vocab(w2v, A_vocab_list)

# Define the EOS TOKEN index as global variable
EOS_INDEX = A_vocab_list.index('<EOS>')

def encode_Q_sentence(words):
    output = [Q_vocab.indexWord(word) for word in words]
    return output

def encode_A_sentence(words):
    output = [A_vocab.indexWord(word) for word in words]
    output.append(EOS_INDEX)
    return output

data_df['Question_encoded'] = data_df['Question'].apply(encode_Q_sentence)
data_df['Answers_encoded'] = data_df['Answers'].apply(encode_A_sentence)

In [13]:
data_df.head(50)

|  | Question | Answers | Question_encoded | Answers_encoded |
|---|---|---|---|---|
| 0 | [when, did, beyonce, start, becoming, popular, ?] | [in, the, late, 1990s] | [5296, 1498, 699, 4631, 659, 3690, 222] | [1973, 3597, 2216, 223, 417] |
| 1 | [what, areas, did, beyonce, compete, in, when,... | [singing, and, dancing] | [5293, 500, 1498, 699, 1130, 2510, 5296, 4408,... | [3356, 531, 1151, 417] |
| 2 | [when, did, beyonce, leave, destiny, ', s, chi... | [2003] | [5296, 1498, 699, 2853, 1469, 5, 4246, 967, 43... | [236, 417] |
| 3 | [in, what, city, and, state, did, beyonce, gro... | [houston, ,, texas] | [2510, 5293, 1015, 438, 4637, 1498, 699, 2253,... | [1919, 11, 3591, 417] |
| 4 | [in, which, decade, did, beyonce, become, famo... | [late, 1990s] | [2510, 5299, 1391, 1498, 699, 657, 1901, 222] | [2216, 223, 417] |
| 5 | [in, what, r, &, b, group, was, she, the, lead... | [destiny, ', s, child] | [2510, 5293, 3917, 4, 601, 2250, 5253, 4408, 4... | [1210, 4, 3202, 927, 417] |
| 6 | [what, album, made, her, a, worldwide, known, ... | [dangerously, in, love] | [5293, 381, 2981, 2352, 225, 5359, 2771, 518, ... | [1153, 1973, 2299, 417] |
| 7 | [who, managed, the, destiny, ', s, child, grou... | [mathew, knowles] | [5303, 3013, 4887, 1469, 5, 4246, 967, 2250, 222] | [2383, 2172, 417] |
| 8 | [when, did, beyoncé, rise, to, fame, ?] | [late, 1990s] | [5296, 1498, 700, 4179, 4954, 1896, 222] | [2216, 223, 417] |
| 9 | [what, role, did, beyoncé, have, in, destiny, ... | [lead, singer] | [5293, 4201, 1498, 700, 2312, 2510, 1469, 5, 4... | [2223, 3355, 417] |


In [14]:
data_df.tail(50)

|  | Question | Answers | Question_encoded | Answers_encoded |
|---|---|---|---|---|
| 4950 | [what, was, the, name, of, west, ', s, fashion... | [dw, kanye, west] | [5293, 5253, 4887, 3241, 3374, 5291, 5, 4246, ... | [1314, 2128, 3880, 417] |
| 4951 | [the, fashion, line, shown, in, paris, receive... | [mixed, -, to, -, negative] | [4887, 1910, 2904, 4440, 2510, 3508, 3984, 529... | [2485, 12, 3640, 12, 2581, 417] |
| 4952 | [on, what, day, did, west, release, his, secon... | [march, 6, ,, 2012] | [3396, 5293, 1372, 1498, 5291, 4063, 2381, 433... | [2363, 350, 11, 246, 417] |
| 4953 | [what, brand, struck, a, deal, with, kanye, an... | [adidas] | [5293, 773, 4696, 225, 1382, 5335, 2738, 438, ... | [449, 417] |
| 4954 | [how, many, ", seasons, ", of, clothing, did, ... | [3] | [2441, 3029, 0, 4332, 0, 3374, 1047, 1498, 273... | [280, 417] |
| 4955 | [what, were, the, shoes, designed, by, kanye, ... | [adidas, yeezy, boosts] | [5293, 5290, 4887, 4425, 1463, 829, 2738, 438,... | [449, 3971, 751, 417] |
| 4956 | [how, many, pairs, of, shoes, were, sold, in, ... | [9000] | [2441, 3029, 3490, 3374, 4425, 5290, 4526, 251... | [402, 417] |
| 4957 | [what, shoe, was, announced, on, twitter, by, ... | [adidas, yeezy, boosts] | [5293, 4424, 5253, 451, 3396, 5069, 829, 2738,... | [449, 3971, 751, 417] |
| 4958 | [in, what, year, did, kanye, premier, his, sea... | [2015] | [2510, 5293, 5385, 1498, 2738, 3748, 2381, 433... | [249, 417] |
| 4959 | [what, album, release, coincided, with, kanye,... | [the, life, of, pablo] | [5293, 381, 4063, 1059, 5335, 2738, 5, 4246, 5... | [3597, 2253, 2673, 2741, 417] |


### Sanity Check on Question and Answer Embeddings

In [15]:
# The two printed vectors should be identical: both vocabularies look up the
# same word "in" in the same Brown Word2Vec model

print(Q_embeddings[Q_vocab.word2index['in']])
print(A_embeddings[A_vocab.word2index['in']])

[ 0.08612397  0.4683421   1.1202817   0.89838433 -0.02509592 -0.59979767
  1.5051852   0.72202337 -0.48813638  0.24514244  0.4102122   0.0113057
  0.5960555  -0.45577106  0.14036338  0.21082228 -1.4051661  -0.15541624
 -1.0806677  -0.8043154   0.07190805 -0.16877684  0.3798714  -0.37297195
 -2.3234746  -0.04577581  0.57446414 -0.18885757 -1.1524972   0.47721928
  0.17444377 -0.31108376  0.2087786   0.70430166  0.42223585  0.41096598
  0.70646995  0.24673785 -0.84970194  0.5304875   0.08908914 -0.58689964
 -0.4281494   0.71896255  1.9707432  -0.2528021  -0.08564414 -0.0645953
  0.9101291  -1.0712231   0.22804962  0.50006104  0.33930728 -0.13008718
 -1.3227125   0.20961478  0.39179873  0.84373707  0.30207354 -0.3978397
  0.662881    0.6367943  -1.0932344  -1.1294899   0.26111436  0.40094912
 -0.16343357  1.5931644  -0.26111376 -0.3025634  -0.04134805 -0.36030748
 -0.04972831  0.71029353 -0.52045906  0.856472    0.07366011  0.3339381
  0.12269638 -0.8888106  -0.79770845  0.5684257  -0.501

### Collect the encoded Questions and Answers as lists of indices

In [16]:
SRC = data_df['Question_encoded'].tolist()

TRG = data_df['Answers_encoded'].tolist()
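This notebook trains on one question/answer pair at a time, so the index lists never need to be the same length. If you wanted to batch them, a common approach is to pad each batch to its longest sequence with `torch.nn.utils.rnn.pad_sequence` and have the loss skip the padding. The sketch below assumes a hypothetical reserved `PAD_INDEX`; the notebook's vocabularies define no such token, and index 0 is an ordinary word there.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

PAD_INDEX = 0  # hypothetical reserved padding id; the notebook's vocabularies have none

encoded = [[5, 9, 2], [7, 1], [4, 4, 4, 8]]  # toy encoded questions of different lengths
seqs = [torch.tensor(s) for s in encoded]
lengths = torch.tensor([len(s) for s in seqs])  # true lengths, e.g. for pack_padded_sequence

# Right-pad every sequence to the longest one: shape (batch, max_len)
batch = pad_sequence(seqs, batch_first=True, padding_value=PAD_INDEX)

# Padded target positions are excluded from the loss
criterion = torch.nn.NLLLoss(ignore_index=PAD_INDEX)
```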

## Step 2: Create the Encoder

* A Seq2Seq architecture consists of an encoder and a decoder unit. You will use PyTorch to build a full Seq2Seq model.

* The first step of the architecture is to create an encoder with an LSTM unit.

(Extra Credit)

* Load your pretrained embeddings into the encoder's embedding layer, which feeds the LSTM unit.
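When loading pretrained vectors, two details matter: `nn.Embedding.from_pretrained` uses `freeze=True` by default, so the vectors are not updated during training, and the LSTM's input size must equal the embedding width. The sketch below is a hypothetical `PretrainedEncoder` (batched, unlike the notebook's token-by-token `Encoder`), with a random matrix standing in for the real `Q_embeddings`.

```python
import torch
import torch.nn as nn

class PretrainedEncoder(nn.Module):
    """Hypothetical encoder: pretrained embeddings feed an LSTM whose input size
    follows the embedding width rather than hidden_size."""
    def __init__(self, pretrained, hidden_size, freeze=True):
        super().__init__()
        # freeze=True (PyTorch's default) keeps the vectors fixed during training
        self.embedding = nn.Embedding.from_pretrained(pretrained, freeze=freeze)
        self.lstm = nn.LSTM(self.embedding.embedding_dim, hidden_size, batch_first=True)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> outputs: (batch, seq_len, hidden_size)
        outputs, (hidden, cell) = self.lstm(self.embedding(token_ids))
        return outputs, hidden, cell

pretrained = torch.randn(10, 8)  # stand-in for a (vocab_size, embedding_dim) matrix
frozen = PretrainedEncoder(pretrained, hidden_size=16)
tuned = PretrainedEncoder(pretrained, hidden_size=16, freeze=False)
out, h, c = frozen(torch.tensor([[1, 4, 2]]))
```

Whether to fine-tune (`freeze=False`) is a design choice: frozen vectors preserve what was learned from the Brown corpus, while fine-tuning lets the model adapt words that the corpus covers poorly.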

In [17]:
import torch.nn as nn

In [18]:
Q_embeddings = torch.Tensor(Q_embeddings)
A_embeddings = torch.Tensor(A_embeddings)

_, Q_embedding_dim = Q_embeddings.shape
_, A_embedding_dim = A_embeddings.shape

In [19]:
class Encoder(nn.Module):
    def __init__(self, input_size, hidden_size, pretrained_embeddings=None):
        super(Encoder, self).__init__()
        
        self.input_size = input_size
        self.hidden_size = hidden_size
        
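        # self.embedding provides a vector representation of the inputs to our model
        if pretrained_embeddings is not None:
            # from_pretrained uses freeze=True by default, so these vectors are not updated during training
            self.embedding = nn.Embedding.from_pretrained(pretrained_embeddings)
        else:
            self.embedding = nn.Embedding(input_size, hidden_size)
            
        # The LSTM input size must match the embedding width (equal to hidden_size in this notebook)
        self.lstm = nn.LSTM(self.embedding.embedding_dim, self.hidden_size)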

    def forward(self, x, hidden, cell_state):
        x = self.embedding(x)
        x = x.view(1, 1, -1)
        x, (hidden, cell_state) = self.lstm(x, (hidden, cell_state))
        return x, hidden, cell_state

## Step 3: Create the Decoder

* The second step of the architecture is to create a decoder using a second LSTM unit.
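The decoder ends with `F.log_softmax`, and the training loop pairs it with `nn.NLLLoss`. That combination computes the same value as `nn.CrossEntropyLoss` applied to the raw scores, which is the commented-out alternative in the training cell. The sketch below checks this on random toy logits (sizes are arbitrary).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 12)  # 5 decoding steps, 12-word toy vocabulary
targets = torch.tensor([3, 0, 7, 7, 11])

# What the notebook does: log_softmax in the decoder, NLLLoss in the training loop
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)
# Equivalent: raw logits straight into CrossEntropyLoss
ce = nn.CrossEntropyLoss()(logits, targets)

print(torch.allclose(nll, ce))  # True
```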

In [20]:
import torch.nn.functional as F

class Decoder(nn.Module):
    def __init__(self, hidden_size, output_size, pretrained_embeddings=None):
        super(Decoder, self).__init__()
        
        self.hidden_size = hidden_size
        self.output_size = output_size
        
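        # self.embedding is a vector representation of the target to our model
        if pretrained_embeddings is not None:
            # from_pretrained uses freeze=True by default, so these vectors are not updated during training
            self.embedding = nn.Embedding.from_pretrained(pretrained_embeddings)
        else:
            self.embedding = nn.Embedding(output_size, hidden_size)
            
        # The LSTM input size must match the embedding width (equal to hidden_size in this notebook)
        self.lstm = nn.LSTM(self.embedding.embedding_dim, self.hidden_size)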
        self.fc = nn.Linear(self.hidden_size, self.output_size)

    def forward(self, x, hidden, cell_state):
        x = self.embedding(x)
        x = x.view(1, 1, -1)
        x, (hidden, cell_state) = self.lstm(x, (hidden, cell_state))
        x = self.fc(x[0])
        x = F.log_softmax(x, dim=1)
        
        return x, hidden, cell_state

## Step 4: Combine them into a Seq2Seq Architecture

* To finalize your model, you will combine the encoder and decoder units into a working model.

* The Seq2Seq model must be able to instantiate the encoder and decoder. It then accepts the inputs for these units and, in its `forward` method, manages their interaction to produce an output.
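During training, the decoder's next input can be either the ground-truth token (teacher forcing) or the model's own prediction. In `Seq2Seq.forward` below, the ground truth is fed only when `random.random() > teacher_force`. Because `random.random()` returns values in [0, 1), the default `teacher_force=1` means the decoder never sees the ground truth during training. A common convention treats the ratio as the *probability of feeding the ground truth*, which the hypothetical helper `next_decoder_input` sketches here.

```python
import random

def next_decoder_input(ground_truth_token, predicted_token, teacher_forcing_ratio, rng=random):
    """Teacher forcing: with probability teacher_forcing_ratio, feed the ground-truth
    token to the next decoder step; otherwise feed the model's own prediction."""
    if rng.random() < teacher_forcing_ratio:
        return ground_truth_token
    return predicted_token

# With a ratio of 0.5, roughly half of the steps are teacher-forced
rng = random.Random(0)
picks = [next_decoder_input("gt", "pred", 0.5, rng) for _ in range(10_000)]
print(picks.count("gt") / len(picks))
```

A ratio of 1.0 always teacher-forces and 0.0 never does; many implementations start near 1.0 and lower it over training.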

In [34]:
import random

class Seq2Seq(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, Q_embeddings=None, A_embeddings=None):
        super(Seq2Seq, self).__init__()
        
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        
        self.encoder = Encoder(self.input_size, self.hidden_size, Q_embeddings)
        self.decoder = Decoder(self.hidden_size, self.output_size, A_embeddings)
        
    def forward(self, src, trg, src_len, trg_len, teacher_force=1, top_k = 1):
        
        output = {
            'decoder_output':[]
        }
        
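        # Initial hidden and cell states for the encoder: (num_layers=1, batch=1, hidden_size)
        encoder_hidden = torch.zeros([1, 1, self.hidden_size]).to(device)
        cell_state = torch.zeros([1, 1, self.hidden_size]).to(device)  
        
        # Feed the source tokens to the encoder one at a time
        for i in range(src_len):
            encoder_output, encoder_hidden, cell_state = self.encoder(src[i], encoder_hidden, cell_state)

        # Index 0 serves as the start input; note that it is an ordinary vocabulary
        # entry (the alphabetically first token), not a dedicated <SOS> token
        decoder_input = torch.Tensor([[0]]).long().to(device)
        # The decoder starts from the encoder's final hidden and cell states
        decoder_hidden = encoder_hidden
        
        for i in range(trg_len):
            decoder_output, decoder_hidden, cell_state = self.decoder(decoder_input, decoder_hidden, cell_state)
            output['decoder_output'].append(decoder_output)
            
            if self.training:
                # The ground-truth token trg[i] is fed only when random.random() > teacher_force.
                # With the default teacher_force=1 this never happens, so the decoder
                # always consumes its own greedy (argmax) prediction.
                decoder_input = trg[i] if random.random() > teacher_force else decoder_output.argmax(1)
            else:
                # Inference: feed back the top-ranked token (assumes top_k=1)
                _, top_index = decoder_output.data.topk(top_k)
                decoder_input = top_index.squeeze().detach()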
                
        return output

## Step 5: Train & evaluate your model

* Finally, you will train and evaluate your model using a PyTorch training loop.
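The training loop below draws a different KFold split for each epoch, so an example used for validation in one epoch is trained on in the others. A common alternative is to hold out one fixed validation set and evaluate it under `torch.no_grad()`. The helpers `split_indices` and `evaluate` are hypothetical, and a small linear classifier stands in for the Seq2Seq model so the sketch runs on its own.

```python
import torch
import torch.nn as nn

def split_indices(n, val_fraction=0.1, seed=1):
    """Shuffle 0..n-1 once and hold out a fixed validation slice."""
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(n, generator=g).tolist()
    n_val = int(n * val_fraction)
    return perm[n_val:], perm[:n_val]

def evaluate(model, pairs, criterion):
    """Mean per-example loss over (input, target) pairs, without tracking gradients."""
    model.eval()
    total = 0.0
    with torch.no_grad():
        for x, y in pairs:
            total += criterion(model(x), y).item()
    return total / len(pairs)

# Toy usage with a hypothetical linear classifier standing in for the Seq2Seq model
torch.manual_seed(0)
data = [(torch.randn(1, 4), torch.tensor([i % 3])) for i in range(50)]
train_idx, val_idx = split_indices(len(data))
model = nn.Linear(4, 3)
val_loss = evaluate(model, [data[i] for i in val_idx], nn.CrossEntropyLoss())
```

Because the split is computed once, the validation loss tracks performance on examples the model never trains on.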

### Training the model

In [23]:
from sklearn.model_selection import KFold
    
def get_train_val_indices(data, total_epochs, random_seed=1):
    
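    # Create the KFold object with the random seed: one fold per epoch
    kf = KFold(n_splits=total_epochs, shuffle=True, random_state=random_seed)
    # Get all train and validation indices at once. Each epoch holds out a different
    # fold, so an example validated in one epoch is trained on in all the others;
    # the validation loss is therefore not measured on truly unseen data.
    all_train_indices, all_val_indices = zip(*kf.split(data))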
    
    return all_train_indices, all_val_indices

In [24]:
import torch.nn as nn

def train(source_data, target_data, output_size, model, epochs, print_every, learning_rate, device, top_k=1):

#     optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
#     criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
    criterion = nn.NLLLoss()
    model.to(device)
    
    total_train_loss = 0
    total_val_loss = 0
    loss = 0
    

    # Use KFold to obtain the train and validation indices for every epoch
    train_indices, val_indices = get_train_val_indices(source_data, epochs)
    
   
    for epoch in range(epochs):
        # train set
        model.train()
        
        current_train_epoch_loss = 0
        current_val_epoch_loss = 0
        
        train_size = len(train_indices[epoch])
        for i in range(train_size):
            train_index = train_indices[epoch][i]
            
            src = source_data[train_index]
            trg = target_data[train_index]
            
            output = model(
                src = src, 
                trg = trg, 
                src_len = src.size(0), 
                trg_len = trg.size(0), 
                top_k = top_k)
            
            output = torch.stack(output["decoder_output"]).squeeze()
            loss = criterion(output, trg)

            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            
            current_train_epoch_loss += loss.item()  # Accumulate the per-example loss
        
        current_train_epoch_loss /= train_size  # Average loss per training example
        total_train_loss += current_train_epoch_loss
    
        # validation set: no gradients are needed, so skip building autograd graphs
        model.eval()
        
        val_size = len(val_indices[epoch])
        with torch.no_grad():
            for j in range(val_size):
                val_index = val_indices[epoch][j]
                
                src = source_data[val_index]
                trg = target_data[val_index]

                output = model(
                    src = src, 
                    trg = trg, 
                    src_len = src.size(0), 
                    trg_len = trg.size(0), 
                    top_k = top_k)
                
                output = torch.stack(output["decoder_output"]).squeeze()
                
                loss = criterion(output, trg)
                
                current_val_epoch_loss += loss.item()  # Accumulate the per-example loss
        
        current_val_epoch_loss /= val_size  # Average loss per validation example
        total_val_loss += current_val_epoch_loss
        
        if (epoch + 1) % print_every == 0:
            train_loss_average = total_train_loss / (print_every)
            val_loss_average = total_val_loss / (print_every)
            print("{}/{} Epoch  -  Training Loss = {:.4f}  -  Validation Loss = {:.4f}".format((epoch + 1), epochs, train_loss_average, val_loss_average))
            total_train_loss = 0
            total_val_loss = 0
        
    del train_indices, val_indices

In [25]:
input_size = len(Q_vocab_list)
output_size = len(A_vocab_list)

In [26]:
SRC_tensor = [torch.tensor(src_items).to(device) for src_items in SRC]
TRG_tensor = [torch.tensor(trg_items).to(device) for trg_items in TRG]

In [27]:
# without pretrained embeddings
seq2seq = Seq2Seq(input_size, hidden_size, output_size)

In [28]:
train(source_data = SRC_tensor,
      target_data = TRG_tensor,
      output_size = output_size,
      model = seq2seq,
      epochs = epochs,
      print_every = 1,
      learning_rate=learning_rate,
      device = device,
      top_k = 1)

1/65 Epoch  -  Training Loss = 5.8328  -  Validation Loss = 5.6909
2/65 Epoch  -  Training Loss = 5.4950  -  Validation Loss = 5.4832
3/65 Epoch  -  Training Loss = 5.3877  -  Validation Loss = 5.3397
4/65 Epoch  -  Training Loss = 5.3194  -  Validation Loss = 5.2804
5/65 Epoch  -  Training Loss = 5.2691  -  Validation Loss = 5.3494
6/65 Epoch  -  Training Loss = 5.2280  -  Validation Loss = 5.3965
7/65 Epoch  -  Training Loss = 5.1998  -  Validation Loss = 5.2261
8/65 Epoch  -  Training Loss = 5.1717  -  Validation Loss = 5.2898
9/65 Epoch  -  Training Loss = 5.1507  -  Validation Loss = 5.1738
10/65 Epoch  -  Training Loss = 5.1322  -  Validation Loss = 5.0322
11/65 Epoch  -  Training Loss = 5.1067  -  Validation Loss = 5.4198
12/65 Epoch  -  Training Loss = 5.0905  -  Validation Loss = 5.2833
13/65 Epoch  -  Training Loss = 5.0761  -  Validation Loss = 5.1627
14/65 Epoch  -  Training Loss = 5.0591  -  Validation Loss = 5.1888
15/65 Epoch  -  Training Loss = 5.0436  -  Validation Los

In [29]:
model_path = 'seq2seq_v1.pt'

torch.save(seq2seq, model_path)

In [30]:
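# map_location=device also works on CPU-only machines.
# Note: PyTorch 2.6+ defaults to weights_only=True, which rejects fully pickled models
# like this one; on those versions pass weights_only=False.
seq2seq = torch.load(model_path, map_location=device)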
seq2seq.eval()

Seq2Seq(
  (encoder): Encoder(
    (embedding): Embedding(5437, 100)
    (lstm): LSTM(100, 100)
  )
  (decoder): Decoder(
    (embedding): Embedding(4020, 100)
    (lstm): LSTM(100, 100)
    (fc): Linear(in_features=100, out_features=4020, bias=True)
  )
)

## Step 6: Interact with the Chatbot

* Demonstrate your chatbot by converting the outputs of the model to text and displaying its responses at the command line.
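The `sample` function below turns each decoder output (a row of log-probabilities) into a word by top-k sampling: keep the `k` most probable tokens, renormalize their probabilities, and draw one. The sketch below isolates that step as a hypothetical helper, `sample_top_k`, so it can be checked on its own; with `k=1` it reduces to greedy decoding.

```python
import torch

def sample_top_k(log_probs, k=5, generator=None):
    """Sample one token id from the k most probable entries of a 1-D log-probability vector."""
    top_log_probs, top_ids = log_probs.topk(k)
    probs = top_log_probs.exp()
    probs = probs / probs.sum()  # renormalize over the k candidates
    choice = torch.multinomial(probs, 1, generator=generator)
    return top_ids[choice].item()

log_probs = torch.log_softmax(torch.tensor([2.0, 1.0, 0.1, -1.0, 3.0]), dim=0)
print(sample_top_k(log_probs, k=1))  # 4, the single most probable id
g = torch.Generator().manual_seed(0)
print(sample_top_k(log_probs, k=2, generator=g))  # either 4 or 0
```

Larger `k` gives more varied answers at the cost of occasionally picking unlikely words, which can produce outputs like repeated or off-topic tokens.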

## Version 1 - without pretrained embeddings

In [31]:
def sample(sentence, trg, vocab, model, top_k=5):
    
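    try:
        sentence = prepare_text(sentence)
        src = encode_Q_sentence(sentence)
    except KeyError:
        # encode_Q_sentence raises KeyError for any word outside the question vocabulary
        print("Warning: Word does not exist in vocabulary!")
        return
    
    answer_words = []
    
    src = torch.tensor(src).to(device)
    # trg (the list of training answer tensors) only caps the answer length:
    # decode at most as many steps as the longest training answer
    max_len = max(len(t) for t in trg)

    model.eval()
    with torch.no_grad():
        output = model(src, trg, src.size(0), max_len)

    for o in output['decoder_output']:
        # The model feeds back its greedy (top-1) choice at each step; sampling here
        # only changes which word is printed. Pick one of the top_k words at random,
        # weighted by its renormalized probability.
        top_v, top_i = o.data.topk(top_k)
        top_v = torch.exp(top_v)
        sampled_top = torch.multinomial(top_v/top_v.sum(), 1, replacement=True)
        top_i = top_i[0][sampled_top.item()]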
        
        if top_i.item() == EOS_INDEX:
            # when EOS is reached
            break
        else:
            word = vocab.index2word[top_i.item()]
            answer_words.append(word)
            
    print("<", ' '.join(answer_words), "\n")

In [32]:
print("Type 'exit' to finish the chat.\n", "-"*30, '\n')
while (True):
    src = input("> ")
    if src.strip() == "exit":
        break
    sample(src, TRG_tensor, A_vocab, seq2seq)

Type 'exit' to finish the chat.
 ------------------------------ 

> which decade did beyonce become famous?
< late 1990s 

> when did beyonce become popular?
< 1990s 1990s 

> which artist did beyonce marry?
< jay z 

> when did beyonce take a hiatus?
< 21 

> what magazine rate beyonce as the most popular?
< heat 

> what race was beyonce's father?
< the mother carter 

> in what year kanye premier?
< 2013 

> what brand did kanye struck a deal with?
< adidas - 

> the fashion line in paris shown what review?
< the and , of 

> exit


## Version 2 - with pretrained embeddings

In [35]:
# with pretrained embeddings
seq2seq = Seq2Seq(input_size, hidden_size, output_size, Q_embeddings, A_embeddings)

In [36]:
train(source_data = SRC_tensor,
      target_data = TRG_tensor,
      output_size = output_size,
      model = seq2seq,
      epochs = epochs,
      print_every = 1,
      learning_rate=learning_rate,
      device = device,
      top_k = 1)

1/65 Epoch  -  Training Loss = 6.0508  -  Validation Loss = 5.7471
2/65 Epoch  -  Training Loss = 5.5484  -  Validation Loss = 5.5019
3/65 Epoch  -  Training Loss = 5.4314  -  Validation Loss = 5.3690
4/65 Epoch  -  Training Loss = 5.3635  -  Validation Loss = 5.3351
5/65 Epoch  -  Training Loss = 5.3163  -  Validation Loss = 5.4205
6/65 Epoch  -  Training Loss = 5.2785  -  Validation Loss = 5.4462
7/65 Epoch  -  Training Loss = 5.2516  -  Validation Loss = 5.2689
8/65 Epoch  -  Training Loss = 5.2238  -  Validation Loss = 5.3442
9/65 Epoch  -  Training Loss = 5.2025  -  Validation Loss = 5.2245
10/65 Epoch  -  Training Loss = 5.1842  -  Validation Loss = 5.0959
11/65 Epoch  -  Training Loss = 5.1584  -  Validation Loss = 5.4630
12/65 Epoch  -  Training Loss = 5.1409  -  Validation Loss = 5.3569
13/65 Epoch  -  Training Loss = 5.1254  -  Validation Loss = 5.1821
14/65 Epoch  -  Training Loss = 5.1070  -  Validation Loss = 5.2380
15/65 Epoch  -  Training Loss = 5.0913  -  Validation Los

In [37]:
model_path = 'seq2seq_v2.pt'

torch.save(seq2seq, model_path)

In [38]:
print("Type 'exit' to finish the chat.\n", "-"*30, '\n')
while (True):
    src = input("> ")
    if src.strip() == "exit":
        break
    sample(src, TRG_tensor, A_vocab, seq2seq)

Type 'exit' to finish the chat.
 ------------------------------ 

> which decade did beyonce become famous?
< graduation - 

> when did beyonce become popular?
< september 2015 , 2011 

> which artist did beyonce marry?
< jay z 

> when did beyonce take a hiatus?
< january 2005 

> what magazine rate beyonce as the most popular?
< yeezus - of 

> what race was beyonce's father?
< paris 

> in what year kanye premier?
< 1825 

> what brand did kanye struck a deal with?
< adidas 

> the fashion line in paris shown what review?
< coachella 

> exit
