# LSTM Bot

## Project Overview

In this project, you will build a chatbot that converses with you at the command line. The chatbot uses a sequence-to-sequence (Seq2Seq) text generation architecture with an LSTM as its memory unit. You will also learn to use pretrained word embeddings to improve the model's performance. At the conclusion of the project, you will be able to show your chatbot to potential employers.

Using the pretrained word embeddings is optional. The starter code below loads Word2Vec embeddings trained on the Brown corpus with Gensim, so you can compare a model that uses pretrained embeddings against one that does not.



---



A sequence-to-sequence (Seq2Seq) model has two components:
- An Encoder consisting of an embedding layer and an LSTM unit.
- A Decoder consisting of an embedding layer, an LSTM unit, and a linear output unit.

The Seq2Seq model works by feeding an input sequence into the Encoder and passing the Encoder's final hidden state to the Decoder, which uses that state to output a series of token predictions.
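The data flow described above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the project's `src.Models` implementation: the class names `TinyEncoder` and `TinyDecoder`, the dimensions, and the use of index 1 for `<sos>` are all chosen for the example.

```python
import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Embedding + LSTM; returns only the final (hidden, cell) state."""

    def __init__(self, vocab_size, emb_dim, hid_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim)

    def forward(self, src):
        # src: [src_len, batch] of token indices
        embedded = self.embedding(src)
        _, (hidden, cell) = self.rnn(embedded)
        return hidden, cell


class TinyDecoder(nn.Module):
    """Embedding + LSTM + linear layer; predicts one token per call."""

    def __init__(self, vocab_size, emb_dim, hid_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim)
        self.fc_out = nn.Linear(hid_dim, vocab_size)

    def forward(self, token, hidden, cell):
        # token: [batch]; add a length-1 time dimension for the LSTM
        embedded = self.embedding(token.unsqueeze(0))
        output, (hidden, cell) = self.rnn(embedded, (hidden, cell))
        logits = self.fc_out(output.squeeze(0))  # [batch, vocab_size]
        return logits, hidden, cell


torch.manual_seed(0)
encoder = TinyEncoder(vocab_size=20, emb_dim=8, hid_dim=16)
decoder = TinyDecoder(vocab_size=30, emb_dim=8, hid_dim=16)

src = torch.randint(0, 20, (5, 2))       # 5 time steps, batch of 2
hidden, cell = encoder(src)              # encoder state handed to the decoder

token = torch.ones(2, dtype=torch.long)  # assumption: index 1 is <sos>
predictions = []
for _ in range(4):                       # decode 4 steps, feeding back the argmax
    logits, hidden, cell = decoder(token, hidden, cell)
    token = logits.argmax(dim=1)
    predictions.append(token)

decoded = torch.stack(predictions)       # [4, batch]
print(decoded.shape)
```

The only information the decoder receives about the question is the encoder's `(hidden, cell)` pair, which is why the encoder and decoder must share the same hidden size and number of layers.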

## Dependencies

- PyTorch
- NumPy
- pandas
- NLTK
- gzip
- Gensim
- scikit-learn


Please choose a dataset from the Torchtext website. We recommend looking at the SQuAD dataset first. The available options are listed here:

- https://pytorch.org/text/stable/datasets.html





In [1]:
from src.Data import loadDF, tokenizer, getPairs, add_symbols, create_word_embedding, add_symbols2
from src.Models import Seq2Seq, Encoder, Decoder
from src.Vocab import Vocab
from src.Train import train
from src.Evaluate import evaluate
from src.Chat import chat
from src.ValEarlyStop import ValidationLossEarlyStopping

import torch
from sklearn.model_selection import KFold
from torch.utils.data import DataLoader
import torch.nn as nn
import torch.optim as optim

import numpy as np
import random, math, time

import gensim
import nltk
from nltk.corpus import brown


%load_ext autoreload
%autoreload 2




In [2]:

SEED = 1234

random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
# torch.cuda.manual_seed(SEED)
# torch.backends.cudnn.deterministic = True

<torch._C.Generator at 0x7f94793452f0>

In [3]:

# nltk.download('brown')
# nltk.download('punkt')

print([" ".join(sent) for sent in brown.sents()[0:3]])

# Train, save, and load the Brown embeddings

# vector_size defaults to 100 (the parameter was named `size` before Gensim 4.0).
# model = gensim.models.Word2Vec(brown.sents(), vector_size=100)
# model.save('brown.embedding')

# Alternative: the pretrained GoogleNews vectors need no training, but the file is very large.
# model = gensim.models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
w2v = gensim.models.Word2Vec.load('brown.embedding')


["The Fulton County Grand Jury said Friday an investigation of Atlanta's recent primary election produced `` no evidence '' that any irregularities took place .", "The jury further said in term-end presentments that the City Executive Committee , which had over-all charge of the election , `` deserves the praise and thanks of the City of Atlanta '' for the manner in which the election was conducted .", "The September-October term jury had been charged by Fulton Superior Court Judge Durwood Pye to investigate reports of possible `` irregularities '' in the hard-fought primary which was won by Mayor-nominate Ivan Allen Jr. ."]


In [4]:
data_df = loadDF('data')
# Keep only the first 5,000 Q&A pairs to avoid CUDA out-of-memory errors on the full dataset.
# .copy() makes the slice an independent DataFrame so that later column assignments are safe.
data_df = data_df.iloc[:5000, :].copy()



In [5]:
data_df.describe()

| | Question | Answer |
|---|---|---|
| count | 5000 | 5000 |
| unique | 4983 | 3642 |
| top | Who was Alexander Scriabin's teacher? | Manhattan |
| freq | 2 | 21 |


In [6]:
data_df.head()

| | Question | Answer |
|---|---|---|
| 0 | To whom did the Virgin Mary allegedly appear i... | Saint Bernadette Soubirous |
| 1 | What is in front of the Notre Dame Main Building? | a copper statue of Christ |
| 2 | The Basilica of the Sacred heart at Notre Dame... | the Main Building |
| 3 | What is the Grotto at Notre Dame? | a Marian place of prayer and reflection |
| 4 | What sits on top of the Main Building at Notre... | a golden statue of the Virgin Mary |


In [7]:
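# tokenizer returns a (cleaned_text, tokens) pair; zip(*...) splits the pairs into two columns.
# Unpacking through the .str accessor is deprecated in pandas, so it is avoided here.
data_df['Question'], data_df['Qtoken'] = zip(*data_df['Question'].apply(tokenizer))
data_df['Answer'], data_df['Atoken'] = zip(*data_df['Answer'].apply(tokenizer))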



In [8]:
data_df.head()

| | Question | Answer | Qtoken | Atoken |
|---|---|---|---|---|
| 0 | to whom did the virgin mari alleg appear in 18... | saint bernadett soubir | [to, whom, did, the, virgin, mari, alleg, appe... | [saint, bernadett, soubir] |
| 1 | what is in front of the notr dame main build | a copper statu of christ | [what, is, in, front, of, the, notr, dame, mai... | [a, copper, statu, of, christ] |
| 2 | the basilica of the sacr heart at notr dame is... | the main build | [the, basilica, of, the, sacr, heart, at, notr... | [the, main, build] |
| 3 | what is the grotto at notr dame | a marian place of prayer and reflect | [what, is, the, grotto, at, notr, dame] | [a, marian, place, of, prayer, and, reflect] |
| 4 | what sit on top of the main build at notr dame | a golden statu of the virgin mari | [what, sit, on, top, of, the, main, build, at,... | [a, golden, statu, of, the, virgin, mari] |


In [9]:
pairs_sequence = getPairs(data_df)
first_five_items = pairs_sequence[:5]
# import itertools
# first_five_items = list(itertools.islice(pairs_sequence, 5))
print(len(pairs_sequence))
first_five_items

5000


[['to whom did the virgin mari alleg appear in 1858 in lourd franc',
  'saint bernadett soubir'],
 ['what is in front of the notr dame main build', 'a copper statu of christ'],
 ['the basilica of the sacr heart at notr dame is besid to which structur',
  'the main build'],
 ['what is the grotto at notr dame', 'a marian place of prayer and reflect'],
 ['what sit on top of the main build at notr dame',
  'a golden statu of the virgin mari']]

In [10]:
# max_src, max_trg = getMaxLen(pairs_sequence)
# max_src, max_trg

In [11]:
data_vocab = Vocab(data_df)
print("total of unique questions and answers in dataset: ", len(data_vocab.text))
# A_vocab = Vocab()

data_vocab.build_word_vocab()

print({k: data_vocab.index2word[k] for k in list(data_vocab.index2word)[:10]})
print({k: data_vocab.word2index[k] for k in list(data_vocab.word2index)[:10]})
print(data_vocab.word_vocab[:10])
print(data_vocab['the'])
print(data_vocab['oov'])

# # build vocabularies for questions "source" and answers "target"
# for pair in pairs_sequence:
#     Q_vocab.add_words(pair[0])
#     A_vocab.add_words(pair[1])

total of unique questions and answers in dataset:  8505
raw-vocab: 6321
vocab-length: 6324
word2idx-length: 6324
{0: '<pad>', 1: '<sos>', 2: '<eos>', 3: 'the', 4: 'what', 5: 'of', 6: 'in', 7: 'did', 8: 'was', 9: 'to'}
{'<pad>': 0, '<sos>': 1, '<eos>': 2, 'the': 3, 'what': 4, 'of': 5, 'in': 6, 'did': 7, 'was': 8, 'to': 9}
['<pad>', '<sos>', '<eos>', 'the', 'what', 'of', 'in', 'did', 'was', 'to']
3
0
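The output above shows the three special tokens at indices 0 to 2, followed by words that appear to be ordered by frequency, with an unknown word mapping to index 0. A minimal sketch of a vocabulary with the same behaviour is given below. `TinyVocab` is a hypothetical stand-in, and the frequency ordering is an assumption inferred from the printed output, not taken from `src.Vocab`.

```python
from collections import Counter

SPECIALS = ["<pad>", "<sos>", "<eos>"]


class TinyVocab:
    """Hypothetical stand-in for the project's Vocab class."""

    def __init__(self, sentences):
        counts = Counter(word for s in sentences for word in s.split())
        # Special tokens first, then words from most to least frequent.
        self.index2word = SPECIALS + [w for w, _ in counts.most_common()]
        self.word2index = {w: i for i, w in enumerate(self.index2word)}

    def __len__(self):
        return len(self.index2word)

    def __getitem__(self, word):
        # Unknown words fall back to index 0 (<pad>), as in the output above.
        return self.word2index.get(word, 0)


vocab = TinyVocab(["what is the grotto", "the main build", "the statu"])
print(len(vocab))      # 3 specials + 7 distinct words
print(vocab["the"])    # most frequent word, so first index after the specials
print(vocab["oov"])    # unknown word
```

Mapping unknown words to `<pad>` means a loss that ignores padding also ignores them; a dedicated `<unk>` token is a common alternative.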


In [12]:
src_data_vocab = Vocab(data_df, source=True)
print("total of unique questions in dataset: ", len(src_data_vocab.text))
# A_vocab = Vocab()

src_data_vocab.build_word_vocab()

trg_data_vocab = Vocab(data_df, source=False)
print("total of unique answers in dataset: ", len(trg_data_vocab.text))
# A_vocab = Vocab()

trg_data_vocab.build_word_vocab()

# # build vocabularies for questions "source" and answers "target"
# for pair in pairs_sequence:
#     Q_vocab.add_words(pair[0])
#     A_vocab.add_words(pair[1])

total of unique questions in dataset:  4980
raw-vocab: 4504
vocab-length: 4507
word2idx-length: 4507
total of unique answers in dataset:  3525
raw-vocab: 4081
vocab-length: 4084
word2idx-length: 4084


In [13]:
print(len(pairs_sequence))

5000


In [14]:
# from torchdata.datapipes.iter import IterableWrapper
# tmp = IterableWrapper(pairs_sequence).sharding_filter()
# a, b = next(iter(tmp))
# print(a)
# print(b)


In [15]:
# source_data = [toTensor(data_vocab, pair[0]) for pair in pairs_sequence]
# target_data = [toTensor(data_vocab, pair[1]) for pair in pairs_sequence]

In [16]:
# print(source_data[10].shape)
# print(source_data[0].view(-1).shape)
# print(source_data[0])

In [17]:
weights_matrix, words_found = create_word_embedding(w2v.wv, word_vocab=src_data_vocab.word_vocab)

In [18]:
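# The vectors come from the Brown Word2Vec model loaded above, not from GloVe.
# Note: the lookup in In [17] used src_data_vocab, while the total printed here is len(data_vocab).
print("Total words found in Brown Word2Vec vocab: {0} from {1}".format(words_found, len(data_vocab)))

Total words found in Brown Word2Vec vocab: 0 from 6324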
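For orientation, one common way to build such a weights matrix is sketched below. It is an assumption about what a helper like `create_word_embedding` might do, not its actual code: `build_weights` and the toy vectors are hypothetical, and plain lists stand in for NumPy arrays.

```python
import random


def build_weights(pretrained, word_vocab, dim, seed=0):
    """Return (matrix, words_found) for the words in word_vocab.

    pretrained: mapping word -> list of floats (an assumption; gensim's
    KeyedVectors supports the same `in` and `[]` operations).
    """
    rng = random.Random(seed)
    matrix, found = [], 0
    for word in word_vocab:
        if word in pretrained:
            matrix.append(list(pretrained[word]))
            found += 1
        else:
            # No pretrained vector: fall back to small random values.
            matrix.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
    return matrix, found


toy_vectors = {"the": [0.1, 0.2], "jury": [0.3, 0.4]}
matrix, found = build_weights(toy_vectors, ["<pad>", "the", "juri"], dim=2)
print(found)
```

In this toy example the stemmed token `juri` misses the embedding key `jury`. A mismatch of this kind between stemmed, lowercased tokens and the surface forms in the embedding vocabulary is one possible cause of a low hit count, but the source does not confirm that it explains the count of 0 reported above.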


In [19]:
np.save('seq2seqEmb_vt.npy', weights_matrix)

The next cells are practice code for understanding batch generation.

In [20]:
print(data_vocab['whom'])

183


In [21]:
# print(len(source_data))

In [22]:
# def get_batches(tmp, batch_size, seq_length):

#     n_batches = int(tmp.shape[0]/(batch_size*seq_length))
#     print(n_batches)
#     tmp = tmp[:n_batches*batch_size*seq_length]
#     tmp = tmp.reshape(batch_size, -1)
#     print(tmp.shape)
#     print(tmp)
#     print((tmp[0:2, :-1]).shape)
#     ## now, we have to Iterate over the batches using a window of size seq_length
#     for n in range(0, tmp.shape[1], seq_length):
#         # The features
#         x = tmp[:, n:n+seq_length]
#         # The targets, shifted by one
#         y = np.zeros_like(x)
#         try:
#             y[:, :-1], y[:, -1] = x[:, 1:], tmp[:, n+seq_length]
#         except IndexError:
#             y[:, :-1], y[:, -1] = x[:, 1:], tmp[:, 0]
#         yield x, y

# seq_length = 4
# batch_size = 10
# tmp = np.array([55, 20, 48, 54, 76, 36, 12,  4, 81,  7,  7,  7, 57, 48, 54, 54, 65,
#         4, 66, 48, 69, 78,  9, 78, 36, 19,  4, 48, 12, 36,  4, 48,  9,  9,
#         4, 48,  9, 78, 17, 36, 47,  4, 36, 27, 36, 12, 65,  4, 44, 29, 20,
#        48, 54, 54, 65,  4, 66, 48, 69, 78,  9, 65,  4, 78, 19,  4, 44, 29,
#        20, 48, 54, 54, 65,  4, 78, 29,  4, 78, 76, 19,  4, 18, 46, 29,  7,
#        46, 48, 65,  0,  7,  7, 52, 27, 36, 12, 65, 76, 20, 78, 29, 213, 4])
# i = 0
# for d in (get_batches(tmp, batch_size, seq_length)):
#     i +=1
# print(f"The number of batches (iterations): {i}")
# # next(get_batches(tmp, batch_size, seq_length))

In [23]:
# from sklearn.model_selection import KFold

# kf = KFold(n_splits=10, shuffle=True)
# tmp = source_data[:20]
# for e, (train_index, test_index) in enumerate(kf.split(tmp), 1):
#     print(f"Iteration: {e}")
#     print(f"{train_index}->{len(train_index)}")
#     print(f"{test_index}->{len(test_index)}")

    
#     # break
    

In [24]:
from torch.nn.utils.rnn import pad_sequence
# import logging
# logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', handlers=[logging.StreamHandler()])


def generate_batch(batch):
    """Collate (question, answer) string pairs into padded index tensors."""
    src_batch = []
    trg_batch = []
    src_len = []
    for src, trg in batch:
        # Split each sentence into tokens
        _, src_tokens = tokenizer(src)
        _, trg_tokens = tokenizer(trg)
        # Convert tokens to indices, then to a tensor, and add the boundary symbols (<sos>/<eos>)
        src_tensor = add_symbols(torch.tensor(src_data_vocab(src_tokens)).long(), src_data_vocab)
        trg_tensor = add_symbols2(torch.tensor(trg_data_vocab(trg_tokens)).long(), trg_data_vocab)
        src_batch.append(src_tensor)
        trg_batch.append(trg_tensor)
        # Track the length of each source sentence; unused by this model, but useful for later ones
        src_len.append(len(src_tensor))
    src_len = torch.tensor(src_len, dtype=torch.int)
    # pad_sequence returns tensors of shape [max_len, batch_size]
    src_batch = pad_sequence(src_batch, padding_value=src_data_vocab['<pad>'])
    trg_batch = pad_sequence(trg_batch, padding_value=trg_data_vocab['<pad>'])
    # src_len is kept in batch order so that it stays aligned with the columns of src_batch
    return src_batch, src_len, trg_batch

In [25]:
len(pairs_sequence)

5000

In [26]:
train_dataloader = DataLoader(pairs_sequence, batch_size=5, collate_fn=generate_batch)
sr, srlen, tg = (next(iter(train_dataloader)))
print(len(srlen))
# print(len(list(train_dataloader))) # it is grouped to 5 items per batch

5


In [27]:
sr

tensor([[   1,    1,    1,    1,    1],
        [   9,    4,    3,    4,    4],
        [ 155,   11, 1555,   11, 1288],
        [   7,    6,    5,    3,   22],
        [   3, 1287,    3, 1945,  338],
        [2666,    5, 1556,   26,    5],
        [1942,    3, 1085,   35,    3],
        [1943,   35,   26,   34,  242],
        [ 234,   34,   35,    2,   88],
        [   6,  242,   34,    0,   26],
        [2667,   88,   11,    0,   35],
        [   6,    2,  409,    0,   34],
        [1944,    0,    9,    0,    2],
        [1084,    0,   14,    0,    0],
        [   2,    0,  549,    0,    0],
        [   0,    0,    2,    0,    0]])

In [28]:
tg

tensor([[ 483,    6,    3,    6,    6],
        [1436,  738,  740, 1439, 1441],
        [1437,  739,  156,  741,  739],
        [   2,    5,    2,    5,    5],
        [   0, 1438,    0,  742,    3],
        [   0,    2,    0,    4,  743],
        [   0,    0,    0, 1440,  256],
        [   0,    0,    0,    2,    2]])
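Each column of `sr` and `tg` above is one sentence and each row is one time step, because `pad_sequence` stacks sequences time-major by default (`batch_first=False`). The pure-Python sketch below reproduces that layout on toy data; `pad_time_major` is a hypothetical helper written for this illustration.

```python
def pad_time_major(sequences, pad_value=0):
    """Pad variable-length lists and return rows indexed by time step.

    Mirrors the default layout of torch.nn.utils.rnn.pad_sequence
    (batch_first=False): result[t][b] is token t of sequence b.
    """
    max_len = max(len(seq) for seq in sequences)
    padded = [seq + [pad_value] * (max_len - len(seq)) for seq in sequences]
    # Transpose from [batch][time] to [time][batch].
    return [list(step) for step in zip(*padded)]


batch = [[1, 9, 155, 2], [1, 4, 2], [1, 2]]
for row in pad_time_major(batch):
    print(row)
```

Shorter sentences are filled with the padding index after their `<eos>` token (index 2), which is the pattern of trailing zeros visible in the tensors above.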

In [29]:
def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time/60)
    elapsed_secs = int(elapsed_time - (elapsed_mins*60))
    
    return elapsed_mins, elapsed_secs

In [36]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
INPUT_DIM = len(src_data_vocab)
OUTPUT_DIM = len(trg_data_vocab)
ENC_EMB_DIM = 128
DEC_EMB_DIM = 128
HID_DIM = 128
N_LAYERS = 1
ENC_DROPOUT = 0.2
DEC_DROPOUT = 0.2
RNN_DROPOUT = 0.0

enc = Encoder(INPUT_DIM, ENC_EMB_DIM, HID_DIM, N_LAYERS, ENC_DROPOUT, RNN_DROPOUT, weights_matrix)
dec = Decoder(OUTPUT_DIM, DEC_EMB_DIM, HID_DIM, N_LAYERS, DEC_DROPOUT, RNN_DROPOUT)

model = Seq2Seq(enc, dec, device).to(device)

In [37]:
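def init_weights(m):
    for name, param in m.named_parameters():
        if not name.startswith('encoder.embedding'):  # Exclude encoder embedding parameters
            nn.init.uniform_(param.data, -0.08, 0.08)
            # nn.init.zeros_(param.data)

# Call init_weights once on the whole model. model.apply(init_weights) would also run it on
# every submodule, where parameter names lack the 'encoder.' prefix, so the pretrained
# encoder embedding would be re-initialized as well.
init_weights(model)
model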

Seq2Seq(
  (encoder): Encoder(
    (embedding): Embedding(4507, 128)
    (rnn): LSTM(128, 128)
    (dropout): Dropout(p=0.2, inplace=False)
  )
  (decoder): Decoder(
    (embedding): Embedding(4084, 128)
    (rnn): LSTM(128, 128)
    (fc_out): Linear(in_features=128, out_features=4084, bias=True)
    (dropout): Dropout(p=0.2, inplace=False)
  )
)

In [38]:
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f'The model has {count_parameters(model):,} trainable parameters')

The model has 1,890,676 trainable parameters
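The reported total can be reproduced by hand from the layer shapes in the model summary above. The sketch below assumes PyTorch's parameter layout for a single-layer LSTM (four gates, each with input weights, recurrent weights, and two bias vectors); the helper names are illustrative.

```python
def lstm_params(input_dim, hidden_dim):
    # Four gates, each with input weights, recurrent weights and two biases
    # (PyTorch keeps separate bias_ih and bias_hh vectors).
    return 4 * (input_dim * hidden_dim + hidden_dim * hidden_dim + 2 * hidden_dim)


def embedding_params(vocab_size, emb_dim):
    return vocab_size * emb_dim


def linear_params(in_features, out_features):
    return in_features * out_features + out_features


total = (
    embedding_params(4507, 128)    # encoder embedding
    + lstm_params(128, 128)        # encoder LSTM
    + embedding_params(4084, 128)  # decoder embedding
    + lstm_params(128, 128)        # decoder LSTM
    + linear_params(128, 4084)     # decoder output layer
)
print(f"{total:,}")  # 1,890,676
```

More than half of the parameters sit in the two embedding tables and the output layer, so the vocabulary sizes dominate the model size.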


In [45]:
optimizer = optim.Adam(model.parameters(), lr=0.001)
# optimizer = optim.SGD(model.parameters(), lr=0.1)
# optimizer = optim.Adamax(model.parameters())

# ignore the loss whenever the target token is a padding token.
TRG_PAD_IDX = trg_data_vocab['<pad>']
criterion = nn.CrossEntropyLoss(ignore_index=TRG_PAD_IDX)
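To make the effect of `ignore_index` concrete, here is a small pure-Python sketch of a mean cross-entropy that skips padded positions. It is an illustration of the idea only, not PyTorch's implementation; `masked_cross_entropy` and the toy scores are hypothetical.

```python
import math


def masked_cross_entropy(logits, targets, ignore_index):
    """Mean negative log-likelihood over targets that are not ignore_index.

    logits: list of per-position score lists; targets: list of class indices.
    """
    losses = []
    for scores, target in zip(logits, targets):
        if target == ignore_index:
            continue  # padding positions contribute nothing to the loss
        # log-sum-exp with the maximum subtracted for numerical stability
        peak = max(scores)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        losses.append(log_norm - scores[target])
    return sum(losses) / len(losses) if losses else float("nan")


PAD = 0
logits = [[0.0, 2.0, 0.0], [0.0, 0.0, 2.0], [0.0, 0.0, 5.0]]
targets = [1, 2, PAD]  # the last position is padding

with_padding = masked_cross_entropy(logits, targets, ignore_index=PAD)
without_padding = masked_cross_entropy(logits[:2], targets[:2], ignore_index=PAD)
print(with_padding == without_padding)
```

Because padded positions are skipped, the loss does not depend on how much padding a batch contains. The training loop below reports `math.exp(loss)` of this kind of average as the perplexity.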


In [46]:
# Training history:
# - Trained several times with the Adam optimizer, a batch size of 12, and 10 folds.
# - Then changed the batch size to 128, the optimizer to SGD, and the number of folds to 20.
# - Raised the SGD learning rate from 0.01 to 0.1, stopped before training finished, and saved the model.
# - Reloaded the saved model and trained it for 40 epochs.
# - Then changed the optimizer to Adamax with its default learning rate.

# model.load_state_dict(torch.load('seq2seq_adam_stemmer_brown_embedding.pt'))

In [47]:
N_EPOCHS = 40
CLIP = 1
BATCH_SIZE = 64

# Optionally train on a subset to check quickly how the model is doing.
# Dividing by 1 keeps the full dataset; use // 2 to train on half of it.
half_length = len(pairs_sequence) // 1
cut_list = pairs_sequence[:half_length]

# Initialize K-Fold cross-validation
kf = KFold(n_splits=20, shuffle=True)

# Lists to store performance metrics for each fold
fold_metrics = []


# Loop through each fold
for fold_x, (train_indices, val_indices) in enumerate(kf.split(cut_list), 1):
    train_data = torch.utils.data.Subset(cut_list, train_indices)
    val_data = torch.utils.data.Subset(cut_list, val_indices)

    train_dataloader = DataLoader(train_data, batch_size=BATCH_SIZE, collate_fn=generate_batch)
    val_dataloader = DataLoader(val_data, batch_size=BATCH_SIZE, collate_fn=generate_batch)
    
    early_stop = ValidationLossEarlyStopping(patience=15, min_delta=0.0001)
    best_val_loss = float('inf')
    # Training/Validation loop
    for epoch in range(N_EPOCHS):

        start_time = time.time()

        train_loss, answer_token = train(model, train_dataloader, optimizer, criterion, CLIP, trg_data_vocab)
        val_loss = evaluate(model, val_dataloader, criterion)

        end_time = time.time()

        epoch_mins, epoch_secs = epoch_time(start_time, end_time)

        if val_loss < best_val_loss:
            best_val_loss = val_loss
            torch.save(model.state_dict(), f'seq2seq_fold_{fold_x:02}.pt')
        print(f'Epoch: {epoch+1:02} | Time: {epoch_mins}m {epoch_secs}s')
        print(f'\tTrain Loss: {train_loss:.3f} | Train PPL: {math.exp(train_loss):7.3f}')
        print(f'\t Val. Loss: {val_loss:.3f} |  Val. PPL: {math.exp(val_loss):7.3f}')
        
        if early_stop(val_loss):
            print(f"Repeated slow change in validation loss for {early_stop.patience} times.")
            print(f"Early stopping at epoch {epoch+1:02} ...")
            # print(f"The last output of the decoder: {answer_token}")
            break
    # Record the validation loss of the last epoch that ran for this fold
    fold_metrics.append(val_loss)
    # Only the first fold is trained here; remove this break to run all folds
    break
    # create new model for the next fold
    # model = Seq2Seq(enc, dec, device).to(device)
    # model.apply(init_weights)

Training: 100%|██████████| 75/75 [00:04<00:00, 16.95it/s]
Evaluation: 100%|██████████| 4/4 [00:00<00:00, 25.27it/s]


Epoch: 01 | Time: 0m 4s
	Train Loss: 1.106 | Train PPL:   3.022
	 Val. Loss: 1.059 |  Val. PPL:   2.882


Training: 100%|██████████| 75/75 [00:04<00:00, 17.31it/s]
Evaluation: 100%|██████████| 4/4 [00:00<00:00, 30.14it/s]


Epoch: 02 | Time: 0m 4s
	Train Loss: 1.045 | Train PPL:   2.843
	 Val. Loss: 1.107 |  Val. PPL:   3.025


Training: 100%|██████████| 75/75 [00:04<00:00, 17.11it/s]
Evaluation: 100%|██████████| 4/4 [00:00<00:00, 25.97it/s]


Epoch: 03 | Time: 0m 4s
	Train Loss: 1.007 | Train PPL:   2.738
	 Val. Loss: 1.154 |  Val. PPL:   3.171


Training: 100%|██████████| 75/75 [00:04<00:00, 16.70it/s]
Evaluation: 100%|██████████| 4/4 [00:00<00:00, 22.96it/s]


Epoch: 04 | Time: 0m 4s
	Train Loss: 0.984 | Train PPL:   2.674
	 Val. Loss: 1.197 |  Val. PPL:   3.310


Training: 100%|██████████| 75/75 [00:04<00:00, 16.98it/s]
Evaluation: 100%|██████████| 4/4 [00:00<00:00, 23.99it/s]


Epoch: 05 | Time: 0m 4s
	Train Loss: 0.965 | Train PPL:   2.624
	 Val. Loss: 1.233 |  Val. PPL:   3.432


Training: 100%|██████████| 75/75 [00:04<00:00, 16.62it/s]
Evaluation: 100%|██████████| 4/4 [00:00<00:00, 23.20it/s]


Epoch: 06 | Time: 0m 4s
	Train Loss: 0.948 | Train PPL:   2.580
	 Val. Loss: 1.272 |  Val. PPL:   3.567


Training: 100%|██████████| 75/75 [00:04<00:00, 16.78it/s]
Evaluation: 100%|██████████| 4/4 [00:00<00:00, 27.97it/s]


Epoch: 07 | Time: 0m 4s
	Train Loss: 0.936 | Train PPL:   2.549
	 Val. Loss: 1.306 |  Val. PPL:   3.690


Training: 100%|██████████| 75/75 [00:04<00:00, 16.86it/s]
Evaluation: 100%|██████████| 4/4 [00:00<00:00, 24.31it/s]


Epoch: 08 | Time: 0m 4s
	Train Loss: 0.916 | Train PPL:   2.499
	 Val. Loss: 1.336 |  Val. PPL:   3.802


Training: 100%|██████████| 75/75 [00:04<00:00, 17.08it/s]
Evaluation: 100%|██████████| 4/4 [00:00<00:00, 37.45it/s]


Epoch: 09 | Time: 0m 4s
	Train Loss: 0.909 | Train PPL:   2.483
	 Val. Loss: 1.363 |  Val. PPL:   3.909


Training: 100%|██████████| 75/75 [00:04<00:00, 16.58it/s]
Evaluation: 100%|██████████| 4/4 [00:00<00:00, 24.67it/s]


Epoch: 10 | Time: 0m 4s
	Train Loss: 0.894 | Train PPL:   2.444
	 Val. Loss: 1.391 |  Val. PPL:   4.019


Training: 100%|██████████| 75/75 [00:04<00:00, 17.00it/s]
Evaluation: 100%|██████████| 4/4 [00:00<00:00, 24.93it/s]


Epoch: 11 | Time: 0m 4s
	Train Loss: 0.891 | Train PPL:   2.438
	 Val. Loss: 1.418 |  Val. PPL:   4.130


Training: 100%|██████████| 75/75 [00:04<00:00, 17.07it/s]
Evaluation: 100%|██████████| 4/4 [00:00<00:00, 30.12it/s]


Epoch: 12 | Time: 0m 4s
	Train Loss: 0.883 | Train PPL:   2.417
	 Val. Loss: 1.440 |  Val. PPL:   4.223


Training: 100%|██████████| 75/75 [00:04<00:00, 17.00it/s]
Evaluation: 100%|██████████| 4/4 [00:00<00:00, 22.05it/s]


Epoch: 13 | Time: 0m 4s
	Train Loss: 0.871 | Train PPL:   2.390
	 Val. Loss: 1.466 |  Val. PPL:   4.333


Training: 100%|██████████| 75/75 [00:04<00:00, 16.52it/s]
Evaluation: 100%|██████████| 4/4 [00:00<00:00, 24.12it/s]


Epoch: 14 | Time: 0m 4s
	Train Loss: 0.865 | Train PPL:   2.375
	 Val. Loss: 1.488 |  Val. PPL:   4.429


Training: 100%|██████████| 75/75 [00:04<00:00, 16.68it/s]
Evaluation: 100%|██████████| 4/4 [00:00<00:00, 32.18it/s]


Epoch: 15 | Time: 0m 4s
	Train Loss: 0.858 | Train PPL:   2.358
	 Val. Loss: 1.514 |  Val. PPL:   4.544


Training: 100%|██████████| 75/75 [00:04<00:00, 17.45it/s]
Evaluation: 100%|██████████| 4/4 [00:00<00:00, 30.40it/s]

Epoch: 16 | Time: 0m 4s
	Train Loss: 0.853 | Train PPL:   2.347
	 Val. Loss: 1.535 |  Val. PPL:   4.643
Repeated slow change in validation loss for 15 times.
Early stopping at epoch 16 ...
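The stop at epoch 16 is consistent with a standard patience counter: the validation loss is lowest at epoch 1 and then fails to improve for 15 consecutive epochs. The sketch below replays the logged validation losses through a minimal patience-based stopper. `EarlyStopper` is a hypothetical class that assumes this common logic; it is not the code of `ValidationLossEarlyStopping`.

```python
class EarlyStopper:
    """Hypothetical patience-based early stopping on validation loss."""

    def __init__(self, patience, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def __call__(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss      # meaningful improvement: reset counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1      # no improvement this epoch
        return self.bad_epochs >= self.patience


# Validation losses copied from the log above.
val_losses = [1.059, 1.107, 1.154, 1.197, 1.233, 1.272, 1.306, 1.336,
              1.363, 1.391, 1.418, 1.440, 1.466, 1.488, 1.514, 1.535]

stopper = EarlyStopper(patience=15, min_delta=0.0001)
stop_epoch = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper(loss):
        stop_epoch = epoch
        break
print(stop_epoch)  # 16
```

Since the training loss keeps falling while the validation loss rises from epoch 2 onward, the model is overfitting in this run. The best-validation checkpoint `seq2seq_fold_01.pt` therefore holds the epoch-1 weights.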





In [48]:
torch.save(model.state_dict(), 'seq2seq_adam_stemmer_brown_embedding.pt')

In [49]:
# Compute the mean and standard deviation of the validation loss across folds
# (fold_metrics stores validation losses, not accuracies)
mean_val_loss = sum(fold_metrics) / len(fold_metrics)
std_val_loss = (sum((x - mean_val_loss) ** 2 for x in fold_metrics) / len(fold_metrics)) ** 0.5

print(f"Mean validation loss: {mean_val_loss:.2f}")
print(f"Standard Deviation: {std_val_loss:.2f}")
print(f"All folds: {fold_metrics}")

Mean validation loss: 1.54
Standard Deviation: 0.00
All folds: [1.535272091627121]
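The numbers in the training log and in the summary above can be cross-checked with a small pure-Python sketch of shuffled k-fold splitting. `k_fold_indices` is a hypothetical helper that mimics the behaviour of a shuffled `KFold`; it is not scikit-learn's implementation.

```python
import math
import random
import statistics


def k_fold_indices(n_items, n_splits, seed=0):
    """Yield (train_indices, val_indices) for shuffled k-fold splits."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Distribute items as evenly as possible; early folds get the remainder.
    fold_sizes = [n_items // n_splits + (1 if i < n_items % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


splits = list(k_fold_indices(n_items=5000, n_splits=20))
train_idx, val_idx = splits[0]
print(len(train_idx), len(val_idx))            # 4750 250

# Batches per epoch with a batch size of 64 (the last batch is partial).
train_batches = math.ceil(len(train_idx) / 64)
val_batches = math.ceil(len(val_idx) / 64)
print(train_batches, val_batches)              # 75 4

fold_metrics = [1.535272091627121]             # the single fold recorded above
print(statistics.mean(fold_metrics), statistics.pstdev(fold_metrics))
```

The 75 training batches and 4 validation batches match the progress bars in the log. With only one fold recorded, the population standard deviation is necessarily 0, so it says nothing yet about variability between folds.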


In [62]:
enc = Encoder(INPUT_DIM, ENC_EMB_DIM, HID_DIM, N_LAYERS, ENC_DROPOUT, RNN_DROPOUT, weights_matrix)
dec = Decoder(OUTPUT_DIM, DEC_EMB_DIM, HID_DIM, N_LAYERS, DEC_DROPOUT, RNN_DROPOUT)

model = Seq2Seq(enc, dec, device).to(device)
model.load_state_dict(torch.load('seq2seq_fold_01.pt'))

<All keys matched successfully>

In [None]:
print("Type 'exit' to finish the chat.\n", "-"*30, '\n')
while True:
    src = input("> ")
    if src.strip() == "exit":
        break
    chat(src, trg_data_vocab, model, 10)
    
# to whom did the virgin mary allegedly appear in 1858 in lourdes france
# what is the grotto at notre-dame?

Type 'exit' to finish the chat.
 ------------------------------ 



>  what is the grotto at notre-dame?


[1, 3328, 12, 4, 2]
< wen for and 



>  to whom did the virgin mary allegedly appear in 1858 in lourdes france


[1, 2]
<  


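The printed index lists such as `[1, 3328, 12, 4, 2]` begin with 1 (`<sos>`) and end with 2 (`<eos>`), which is what a token-by-token decoding loop produces. The sketch below shows greedy decoding with a toy step function. Whether `chat` decodes greedily, and whether its last argument is a maximum length, are assumptions; `greedy_decode` and `toy_step` are hypothetical names.

```python
SOS, EOS = 1, 2


def greedy_decode(step_fn, state, max_len):
    """Generate token ids until <eos> or max_len tokens have been produced.

    step_fn(token, state) -> (scores, new_state) is a stand-in for one
    decoder step; scores is a list with one score per vocabulary entry.
    """
    tokens = [SOS]
    token = SOS
    for _ in range(max_len):
        scores, state = step_fn(token, state)
        token = max(range(len(scores)), key=scores.__getitem__)  # argmax
        tokens.append(token)
        if token == EOS:
            break
    return tokens


def toy_step(token, state):
    # Toy "model": emits ids 3 and 4, then <eos>, driven by a step counter.
    script = [3, 4, EOS]
    scores = [0.0] * 5
    scores[script[min(state, len(script) - 1)]] = 1.0
    return scores, state + 1


print(greedy_decode(toy_step, state=0, max_len=10))  # [1, 3, 4, 2]
```

Under this reading, the second reply `[1, 2]` means the model predicted `<eos>` as its very first token, so the printed answer is empty.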