# LSTM Bot

## Project Overview

In this project, you will build a chatbot that can converse with you at the command line. The chatbot uses a sequence-to-sequence (Seq2Seq) text generation architecture with an LSTM as its memory unit. You will also learn to use pretrained word embeddings to improve the model's performance. At the conclusion of the project, you will be able to show your chatbot to potential employers.

Using pretrained word embeddings in your model is optional. The starter code below loads Word2Vec embeddings trained with Gensim on the Brown corpus. You can compare the performance of your model with pretrained embeddings against a model without them.



---



A sequence to sequence model (Seq2Seq) has two components:
- An Encoder consisting of an embedding layer and LSTM unit.
- A Decoder consisting of an embedding layer, LSTM unit, and linear output unit.

The Seq2Seq model feeds an input sequence into the Encoder and passes the Encoder's final hidden state to the Decoder. The Decoder then uses that state to output a series of token predictions.
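
The data flow described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the project's `src.Models` code: the class names `TinyEncoder` and `TinyDecoder`, the helper `greedy_decode`, and all sizes are assumptions chosen for the example, and dropout is left out.

```python
import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Embedding + LSTM; returns the final hidden and cell states."""

    def __init__(self, vocab_size, emb_dim, hid_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim)

    def forward(self, src):                       # src: [src_len, batch]
        _, (hidden, cell) = self.rnn(self.embedding(src))
        return hidden, cell                       # each: [1, batch, hid_dim]


class TinyDecoder(nn.Module):
    """Embedding + LSTM + linear layer; predicts one token per call."""

    def __init__(self, vocab_size, emb_dim, hid_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim)
        self.fc_out = nn.Linear(hid_dim, vocab_size)

    def forward(self, token, hidden, cell):       # token: [batch]
        emb = self.embedding(token.unsqueeze(0))  # [1, batch, emb_dim]
        output, (hidden, cell) = self.rnn(emb, (hidden, cell))
        return self.fc_out(output.squeeze(0)), hidden, cell  # logits: [batch, vocab]


def greedy_decode(encoder, decoder, src, sos_idx, max_len):
    """Encode src once, then feed each predicted token back into the decoder."""
    hidden, cell = encoder(src)
    token = torch.full((src.shape[1],), sos_idx, dtype=torch.long)
    steps = []
    for _ in range(max_len):
        logits, hidden, cell = decoder(token, hidden, cell)
        token = logits.argmax(dim=1)              # most likely next token per example
        steps.append(token)
    return torch.stack(steps)                     # [max_len, batch]


torch.manual_seed(0)
enc_demo = TinyEncoder(vocab_size=20, emb_dim=8, hid_dim=16)
dec_demo = TinyDecoder(vocab_size=15, emb_dim=8, hid_dim=16)
src_demo = torch.randint(0, 20, (6, 3))           # 6 time steps, batch of 3
with torch.no_grad():
    predictions = greedy_decode(enc_demo, dec_demo, src_demo, sos_idx=1, max_len=4)
print(predictions.shape)
```

The only information the decoder receives about the question is the encoder's final `(hidden, cell)` pair, so both LSTMs must use the same hidden size.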

## Dependencies

- PyTorch
- NumPy
- pandas
- NLTK
- gzip
- Gensim
- scikit-learn (used for `KFold`)


Please choose a dataset from the torchtext website. We recommend looking at the SQuAD dataset first. The available options are listed here:

- https://pytorch.org/text/stable/datasets.html





In [37]:
%pwd
%cd question/RNN/ChatbotSeq2Seq

[Errno 2] No such file or directory: 'question/RNN/ChatbotSeq2Seq'
/home/question/RNN/ChatbotSeq2Seq


In [38]:
from src.Data import loadDF, tokenizer, getPairs, add_symbols, create_word_embedding, add_symbols2
from src.Models import Seq2Seq, Encoder, Decoder
from src.Vocab import Vocab
from src.Train import train
from src.Evaluate import evaluate
from src.Chat import chat
from src.ValEarlyStop import ValidationLossEarlyStopping

import torch
from sklearn.model_selection import KFold
from torch.utils.data import DataLoader
import torch.nn as nn
import torch.optim as optim

import numpy as np
import random, math, time

import gensim
import nltk
from nltk.corpus import brown


%load_ext autoreload
%autoreload 2




The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [39]:

SEED = 1234

random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
# torch.cuda.manual_seed(SEED)
# torch.backends.cudnn.deterministic = True

<torch._C.Generator at 0x7f0529c96bf0>

In [40]:

# nltk.download('brown')
# nltk.download('punkt')

print([" ".join(sent) for sent in brown.sents()[0:3]])

# Train, save, and load the Brown embeddings

# The default value of vector_size is 100 (Gensim 4+; earlier versions call this parameter `size`).
# model = gensim.models.Word2Vec(brown.sents(), vector_size=100)
# model.save('brown.embedding')

# The pretrained GoogleNews vectors need no training, but the file is very large.
# model = gensim.models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
w2v = gensim.models.Word2Vec.load('brown.embedding')


["The Fulton County Grand Jury said Friday an investigation of Atlanta's recent primary election produced `` no evidence '' that any irregularities took place .", "The jury further said in term-end presentments that the City Executive Committee , which had over-all charge of the election , `` deserves the praise and thanks of the City of Atlanta '' for the manner in which the election was conducted .", "The September-October term jury had been charged by Fulton Superior Court Judge Durwood Pye to investigate reports of possible `` irregularities '' in the hard-fought primary which was won by Mayor-nominate Ivan Allen Jr. ."]


In [41]:
data_df = loadDF('data')
# I will take only the first 5,000 Q&A to avoid CUDA out of memory error due to the large dataset
data_df = data_df.iloc[:5000, :]



In [42]:
data_df.describe()

Unnamed: 0,Question,Answer
count,5000,5000
unique,4983,3642
top,Who was Alexander Scriabin's teacher?,Manhattan
freq,2,21


In [43]:
data_df.head()

Unnamed: 0,Question,Answer
0,To whom did the Virgin Mary allegedly appear i...,Saint Bernadette Soubirous
1,What is in front of the Notre Dame Main Building?,a copper statue of Christ
2,The Basilica of the Sacred heart at Notre Dame...,the Main Building
3,What is the Grotto at Notre Dame?,a Marian place of prayer and reflection
4,What sits on top of the Main Building at Notre...,a golden statue of the Virgin Mary


In [44]:
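# tokenizer returns a (normalized_text, tokens) pair; zip(*...) splits the pairs into two columns.
# (Unpacking through the `.str` accessor is deprecated in newer pandas releases.)
data_df['Question'], data_df['Qtoken'] = map(list, zip(*data_df['Question'].apply(tokenizer)))
data_df['Answer'], data_df['Atoken'] = map(list, zip(*data_df['Answer'].apply(tokenizer)))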

In [None]:
data_df.head()

Unnamed: 0,Question,Answer,Qtoken,Atoken
0,to whom did the virgin mari alleg appear in 18...,saint bernadett soubir,"[to, whom, did, the, virgin, mari, alleg, appe...","[saint, bernadett, soubir]"
1,what is in front of the notr dame main build,a copper statu of christ,"[what, is, in, front, of, the, notr, dame, mai...","[a, copper, statu, of, christ]"
2,the basilica of the sacr heart at notr dame is...,the main build,"[the, basilica, of, the, sacr, heart, at, notr...","[the, main, build]"
3,what is the grotto at notr dame,a marian place of prayer and reflect,"[what, is, the, grotto, at, notr, dame]","[a, marian, place, of, prayer, and, reflect]"
4,what sit on top of the main build at notr dame,a golden statu of the virgin mari,"[what, sit, on, top, of, the, main, build, at,...","[a, golden, statu, of, the, virgin, mari]"


In [None]:
pairs_sequence = getPairs(data_df)
first_five_items = pairs_sequence[:5]
# import itertools
# first_five_items = list(itertools.islice(pairs_sequence, 5))
print(len(pairs_sequence))
first_five_items

5000


[['to whom did the virgin mari alleg appear in 1858 in lourd franc',
  'saint bernadett soubir'],
 ['what is in front of the notr dame main build', 'a copper statu of christ'],
 ['the basilica of the sacr heart at notr dame is besid to which structur',
  'the main build'],
 ['what is the grotto at notr dame', 'a marian place of prayer and reflect'],
 ['what sit on top of the main build at notr dame',
  'a golden statu of the virgin mari']]

In [None]:
# max_src, max_trg = getMaxLen(pairs_sequence)
# max_src, max_trg

In [None]:
data_vocab = Vocab(data_df)
print("total of unique questions and answers in dataset: ", len(data_vocab.text))
# A_vocab = Vocab()

data_vocab.build_word_vocab()

print({k: data_vocab.index2word[k] for k in list(data_vocab.index2word)[:10]})
print({k: data_vocab.word2index[k] for k in list(data_vocab.word2index)[:10]})
print(data_vocab.word_vocab[:10])
print(data_vocab['the'])
print(data_vocab['oov'])

# # build vocabularies for questions "source" and answers "target"
# for pair in pairs_sequence:
#     Q_vocab.add_words(pair[0])
#     A_vocab.add_words(pair[1])

total of unique questions and answers in dataset:  8505
raw-vocab: 6321
vocab-length: 6324
word2idx-length: 6324
{0: '<pad>', 1: '<sos>', 2: '<eos>', 3: 'the', 4: 'what', 5: 'of', 6: 'in', 7: 'did', 8: 'was', 9: 'to'}
{'<pad>': 0, '<sos>': 1, '<eos>': 2, 'the': 3, 'what': 4, 'of': 5, 'in': 6, 'did': 7, 'was': 8, 'to': 9}
['<pad>', '<sos>', '<eos>', 'the', 'what', 'of', 'in', 'did', 'was', 'to']
3
0
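
The printed mappings suggest how the vocabulary is laid out: three special tokens first, then words in descending frequency. A minimal sketch of such a structure follows. `SimpleVocab` is a hypothetical stand-in, not the project's `src.Vocab`, and it assumes that the `0` printed for `data_vocab['oov']` means unknown words fall back to index 0.

```python
from collections import Counter

SPECIALS = ["<pad>", "<sos>", "<eos>"]


class SimpleVocab:
    """Hypothetical stand-in for src.Vocab: frequency-ordered word/index maps."""

    def __init__(self, sentences):
        counts = Counter(word for sent in sentences for word in sent.split())
        # Special tokens take indices 0-2; the remaining words follow by descending frequency.
        self.word_vocab = SPECIALS + [word for word, _ in counts.most_common()]
        self.word2index = {word: idx for idx, word in enumerate(self.word_vocab)}
        self.index2word = {idx: word for word, idx in self.word2index.items()}

    def __len__(self):
        return len(self.word_vocab)

    def __getitem__(self, word):
        # Assumption: unknown words fall back to the <pad> index (0).
        return self.word2index.get(word, self.word2index["<pad>"])

    def __call__(self, tokens):
        return [self[token] for token in tokens]


vocab = SimpleVocab(["what is the grotto", "the main build", "the statu of christ"])
print(vocab["the"], vocab["oov"], len(vocab))
print(vocab(["the", "grotto", "unseen"]))
```

Mapping unknown words to the padding index keeps the lookup simple, but it also means such words are ignored by any loss that skips padding. A dedicated `<unk>` token is a common alternative.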


In [None]:
src_data_vocab = Vocab(data_df, source=True)
print("total of unique questions in dataset: ", len(src_data_vocab.text))
# A_vocab = Vocab()

src_data_vocab.build_word_vocab()

trg_data_vocab = Vocab(data_df, source=False)
print("total of unique answers in dataset: ", len(trg_data_vocab.text))
# A_vocab = Vocab()

trg_data_vocab.build_word_vocab()

# # build vocabularies for questions "source" and answers "target"
# for pair in pairs_sequence:
#     Q_vocab.add_words(pair[0])
#     A_vocab.add_words(pair[1])

total of unique questions in dataset:  4980
raw-vocab: 4504
vocab-length: 4507
word2idx-length: 4507
total of unique answers in dataset:  3525
raw-vocab: 4081
vocab-length: 4084
word2idx-length: 4084


In [None]:
print(len(pairs_sequence))

5000


In [None]:
# from torchdata.datapipes.iter import IterableWrapper
# tmp = IterableWrapper(pairs_sequence).sharding_filter()
# a, b = next(iter(tmp))
# print(a)
# print(b)


In [None]:
# source_data = [toTensor(data_vocab, pair[0]) for pair in pairs_sequence]
# target_data = [toTensor(data_vocab, pair[1]) for pair in pairs_sequence]

In [None]:
# print(source_data[10].shape)
# print(source_data[0].view(-1).shape)
# print(source_data[0])

In [None]:
weights_matrix, words_found = create_word_embedding(w2v.wv, word_vocab=src_data_vocab.word_vocab)

In [None]:
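# weights_matrix was built from src_data_vocab, so report the count against its size
print("Total words found in Brown Word2Vec vocab: {0} from {1}".format(words_found, len(src_data_vocab)))

Total words found in Brown Word2Vec vocab: 1653 from 4507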
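
The implementation of `create_word_embedding` is not shown in this notebook. As a rough sketch of what such a function typically does, the hypothetical `build_weights_matrix` below copies the pretrained vector for every vocabulary word that has one and draws small random values for the rest. The random scale, the toy vectors, and the use of plain lists instead of arrays are all assumptions made for illustration.

```python
import random


def build_weights_matrix(word_vocab, pretrained, emb_dim, seed=0):
    """Return (matrix, words_found) with one row per vocabulary word.

    Words present in `pretrained` copy their vector; the others get random values.
    """
    rng = random.Random(seed)
    matrix, words_found = [], 0
    for word in word_vocab:
        if word in pretrained:
            matrix.append(list(pretrained[word]))
            words_found += 1
        else:
            matrix.append([rng.gauss(0.0, 0.6) for _ in range(emb_dim)])
    return matrix, words_found


# Toy 3-dimensional "pretrained" vectors standing in for w2v.wv.
pretrained = {"the": [0.1, 0.2, 0.3], "what": [0.4, 0.5, 0.6]}
toy_vocab = ["<pad>", "<sos>", "<eos>", "the", "what", "notr"]
toy_matrix, toy_found = build_weights_matrix(toy_vocab, pretrained, emb_dim=3)
print(toy_found, len(toy_matrix), len(toy_matrix[0]))
```

Because the questions are stemmed (for example `notr`, `statu`), many tokens have no match in embeddings trained on unstemmed text, which may explain why only part of the vocabulary is found.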


In [None]:
np.save('seq2seqEmb_vt.npy', weights_matrix)

Practice: understanding batch generation

In [None]:
print(data_vocab['whom'])

183


In [None]:
# print(len(source_data))

In [None]:
# def get_batches(tmp, batch_size, seq_length):

#     n_batches = int(tmp.shape[0]/(batch_size*seq_length))
#     print(n_batches)
#     tmp = tmp[:n_batches*batch_size*seq_length]
#     tmp = tmp.reshape(batch_size, -1)
#     print(tmp.shape)
#     print(tmp)
#     print((tmp[0:2, :-1]).shape)
#     ## now, we have to Iterate over the batches using a window of size seq_length
#     for n in range(0, tmp.shape[1], seq_length):
#         # The features
#         x = tmp[:, n:n+seq_length]
#         # The targets, shifted by one
#         y = np.zeros_like(x)
#         try:
#             y[:, :-1], y[:, -1] = x[:, 1:], tmp[:, n+seq_length]
#         except IndexError:
#             y[:, :-1], y[:, -1] = x[:, 1:], tmp[:, 0]
#         yield x, y

# seq_length = 4
# batch_size = 10
# tmp = np.array([55, 20, 48, 54, 76, 36, 12,  4, 81,  7,  7,  7, 57, 48, 54, 54, 65,
#         4, 66, 48, 69, 78,  9, 78, 36, 19,  4, 48, 12, 36,  4, 48,  9,  9,
#         4, 48,  9, 78, 17, 36, 47,  4, 36, 27, 36, 12, 65,  4, 44, 29, 20,
#        48, 54, 54, 65,  4, 66, 48, 69, 78,  9, 65,  4, 78, 19,  4, 44, 29,
#        20, 48, 54, 54, 65,  4, 78, 29,  4, 78, 76, 19,  4, 18, 46, 29,  7,
#        46, 48, 65,  0,  7,  7, 52, 27, 36, 12, 65, 76, 20, 78, 29, 213, 4])
# i = 0
# for d in (get_batches(tmp, batch_size, seq_length)):
#     i +=1
# print(f"The number of batches (iterations): {i}")
# # next(get_batches(tmp, batch_size, seq_length))

In [None]:
# from sklearn.model_selection import KFold

# kf = KFold(n_splits=10, shuffle=True)
# tmp = source_data[:20]
# for e, (train_index, test_index) in enumerate(kf.split(tmp), 1):
#     print(f"Iteration: {e}")
#     print(f"{train_index}->{len(train_index)}")
#     print(f"{test_index}->{len(test_index)}")

    
#     # break
    

In [None]:
from torch.nn.utils.rnn import pad_sequence
# import logging
# logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', handlers=[logging.StreamHandler()])


def generate_batch(batch):
    src_batch = []
    trg_batch = []
    src_len = []
    i = 0
    # print(type(batch))
    for src, trg in batch:
        i += 1
        #split sentence into tokens
        _, src_tokens = tokenizer(src)
        # logging.warning(f'iteration {i}:\n {src}'); # why prints 3 times while batch is 1?
        _, trg_tokens = tokenizer(trg)
        #convert tokens to index and to tensor and add <sos> and <eos> to each sentence
        src_tensor = add_symbols(torch.tensor(src_data_vocab(src_tokens)).long(), src_data_vocab)
        trg_tensor = add_symbols2(torch.tensor(trg_data_vocab(trg_tokens)).long(), trg_data_vocab)
        src_batch.append(src_tensor)
        #track length of each source sentence, not useful in this model. Will be useful in further models
        src_len.append(len(src_tensor))
        trg_batch.append(trg_tensor)
        # logging.warning(f'iteration {i}:\n {(src_tensor)}');
    src_len = torch.tensor(src_len, dtype = torch.int)
    src_batch = pad_sequence(src_batch, padding_value=src_data_vocab['<pad>'])
    trg_batch = pad_sequence(trg_batch, padding_value=trg_data_vocab['<pad>'])
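    # Keep src_len in batch order so each length lines up with its column in src_batch.
    # (Sorting the lengths alone, without reordering the batch, would break that alignment.)
    # src_len is not used by this model.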
    # logging.warning(f'lsrc_batch:{len(src_batch)}')
    return src_batch, src_len, trg_batch

In [None]:
len(pairs_sequence)

5000

In [None]:
train_dataloader = DataLoader(pairs_sequence, batch_size=5, collate_fn=generate_batch)
sr, srlen, tg = (next(iter(train_dataloader)))
print(len(srlen))
# print(len(list(train_dataloader))) # it is grouped to 5 items per batch

5


In [None]:
sr

tensor([[   1,    1,    1,    1,    1],
        [   9,    4,    3,    4,    4],
        [ 155,   11, 1555,   11, 1288],
        [   7,    6,    5,    3,   22],
        [   3, 1287,    3, 1945,  338],
        [2666,    5, 1556,   26,    5],
        [1942,    3, 1085,   35,    3],
        [1943,   35,   26,   34,  242],
        [ 234,   34,   35,    2,   88],
        [   6,  242,   34,    0,   26],
        [2667,   88,   11,    0,   35],
        [   6,    2,  409,    0,   34],
        [1944,    0,    9,    0,    2],
        [1084,    0,   14,    0,    0],
        [   2,    0,  549,    0,    0],
        [   0,    0,    2,    0,    0]])

In [None]:
tg

tensor([[ 483,    6,    3,    6,    6],
        [1436,  738,  740, 1439, 1441],
        [1437,  739,  156,  741,  739],
        [   2,    5,    2,    5,    5],
        [   0, 1438,    0,  742,    3],
        [   0,    2,    0,    4,  743],
        [   0,    0,    0, 1440,  256],
        [   0,    0,    0,    2,    2]])
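
`pad_sequence` with its default `batch_first=False` returns a time-major tensor, which is why `sr` and `tg` above have one column per example and zeros (`<pad>`) at the bottom of the shorter columns. A plain-Python sketch of the same padding and transposition, using a hypothetical helper `pad_time_major` and toy index sequences:

```python
def pad_time_major(sequences, padding_value=0):
    """Pad variable-length index lists and return rows laid out as [max_len][batch]."""
    max_len = max(len(seq) for seq in sequences)
    padded = [seq + [padding_value] * (max_len - len(seq)) for seq in sequences]
    # Transpose so that row t holds the t-th token of every sequence in the batch.
    return [list(step) for step in zip(*padded)]


# Three toy sequences wrapped in <sos>=1 and <eos>=2.
toy_batch = [[1, 9, 155, 2], [1, 4, 2], [1, 3, 5, 7, 2]]
padded_rows = pad_time_major(toy_batch)
for row in padded_rows:
    print(row)
```

Reading down a column recovers one padded sentence, and reading across a row gives the tokens that the LSTM processes together at one time step.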

In [None]:
def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time/60)
    elapsed_secs = int(elapsed_time - (elapsed_mins*60))
    
    return elapsed_mins, elapsed_secs

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
INPUT_DIM = len(src_data_vocab)
OUTPUT_DIM = len(trg_data_vocab)
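# Values matching the model summary and parameter count printed below;
# the embedding size must also match the 100-dimensional Brown vectors.
ENC_EMB_DIM = 100
DEC_EMB_DIM = 100
HID_DIM = 32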
N_LAYERS = 1
ENC_DROPOUT = 0.2
DEC_DROPOUT = 0.2
RNN_DROPOUT = 0

enc = Encoder(INPUT_DIM, ENC_EMB_DIM, HID_DIM, N_LAYERS, ENC_DROPOUT, RNN_DROPOUT, weights_matrix)
dec = Decoder(OUTPUT_DIM, DEC_EMB_DIM, HID_DIM, N_LAYERS, DEC_DROPOUT, RNN_DROPOUT)

model = Seq2Seq(enc, dec, device).to(device)

In [None]:
def init_weights(m):
    for name, param in m.named_parameters():
        if not name.startswith('encoder.embedding'):  # Exclude encoder embedding parameters
            nn.init.uniform_(param.data, -0.08, 0.08)
            # nn.init.zeros_(param.data)

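# Call init_weights once on the whole model. model.apply(init_weights) would also visit each
# submodule, where parameter names lose the 'encoder.' prefix, so the pretrained
# embedding would be overwritten.
init_weights(model)
model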

Seq2Seq(
  (encoder): Encoder(
    (embedding): Embedding(4507, 100)
    (rnn): LSTM(100, 32)
    (dropout): Dropout(p=0.2, inplace=False)
  )
  (decoder): Decoder(
    (embedding): Embedding(4084, 100)
    (rnn): LSTM(100, 32)
    (fc_out): Linear(in_features=32, out_features=4084, bias=True)
    (dropout): Dropout(p=0.2, inplace=False)
  )
)

In [None]:
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f'The model has {count_parameters(model):,} trainable parameters')

The model has 1,028,176 trainable parameters
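
The reported total can be checked by hand. A minimal sketch, assuming the layer sizes shown in the model summary above (embedding dimension 100, hidden size 32) and single-layer, unidirectional LSTMs with PyTorch's two bias vectors per gate. The helper names are hypothetical.

```python
def lstm_params(input_size, hidden_size):
    # Four gates, each with input weights, recurrent weights, and two bias vectors.
    return 4 * hidden_size * (input_size + hidden_size + 2)


def embedding_params(vocab_size, emb_dim):
    return vocab_size * emb_dim


def linear_params(in_features, out_features):
    return in_features * out_features + out_features


encoder_total = embedding_params(4507, 100) + lstm_params(100, 32)
decoder_total = (
    embedding_params(4084, 100) + lstm_params(100, 32) + linear_params(32, 4084)
)
print(f"{encoder_total + decoder_total:,}")
```

The sum agrees with the count printed above. Since `count_parameters` only includes parameters with `requires_grad=True`, the match also indicates that the encoder embedding is being trained rather than frozen.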


In [None]:
# optimizer = optim.Adam(model.parameters())
# optimizer = optim.SGD(model.parameters(), lr=0.1)
optimizer = optim.Adamax(model.parameters())

# ignore the loss whenever the target token is a padding token.
TRG_PAD_IDX = trg_data_vocab['<pad>']
criterion = nn.CrossEntropyLoss(ignore_index=TRG_PAD_IDX)
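
With `ignore_index` set and the default mean reduction, `nn.CrossEntropyLoss` averages the loss over non-padding targets only. The plain-Python sketch below illustrates that behavior on toy values. `masked_cross_entropy` is a hypothetical helper written for this example, not the PyTorch implementation. Exponentiating the mean loss gives the perplexity that the training loop reports.

```python
import math


def masked_cross_entropy(logits, targets, ignore_index):
    """Mean negative log-likelihood over positions whose target is not ignore_index.

    Assumes at least one target differs from ignore_index.
    """
    total, count = 0.0, 0
    for scores, target in zip(logits, targets):
        if target == ignore_index:
            continue  # padded position: contributes nothing to the loss
        log_norm = math.log(sum(math.exp(score) for score in scores))
        total += log_norm - scores[target]
        count += 1
    return total / count


PAD = 0
toy_logits = [[0.0, 2.0, 0.0], [0.0, 0.0, 2.0], [5.0, 0.0, 0.0]]  # 3 positions, 3-word vocab
toy_targets = [1, 2, PAD]                                          # last position is padding
toy_loss = masked_cross_entropy(toy_logits, toy_targets, ignore_index=PAD)
print(f"loss={toy_loss:.3f}  perplexity={math.exp(toy_loss):.3f}")
```

Without the mask, the model could lower its loss simply by predicting `<pad>` at the end of short answers, which says nothing about answer quality.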


In [None]:
# After several training runs with the Adam optimizer, a batch size of 12, and 10 folds,
# I changed the batch size to 128,
# changed the optimizer to SGD,
# and changed the number of folds to 20.

# Finally, I raised the SGD learning rate from 0.01 to 0.1 but did not train to the end; I only saved the model.
# After reloading the model, I trained it for 40 epochs,
# then changed the optimizer to Adamax with the default learning rate.

# model.load_state_dict(torch.load('seq2seq_adam_stemmer_brown_embedding.pt'))
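
The training cell that follows constructs `ValidationLossEarlyStopping(patience=3, min_delta=0.001)`, but the check that uses it is commented out, and the class itself (in `src.ValEarlyStop`) is not shown. The sketch below is one common way such a helper works, under the assumption that an epoch counts as an improvement only when the validation loss drops by more than `min_delta`. `EarlyStopper` is a hypothetical name.

```python
class EarlyStopper:
    """Hypothetical helper: signal a stop after `patience` epochs without improvement."""

    def __init__(self, patience=3, min_delta=0.001):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def __call__(self, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss  # meaningful improvement: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Validation losses of the first six epochs logged for the first fold in this notebook.
val_losses = [4.868, 4.560, 4.629, 4.914, 5.117, 5.112]
stopper = EarlyStopper(patience=3, min_delta=0.001)
stop_epoch = None
for epoch_idx, loss_value in enumerate(val_losses, start=1):
    if stopper(loss_value):
        stop_epoch = epoch_idx
        print(f"stop at epoch {stop_epoch}, best val loss {stopper.best_loss:.3f}")
        break
```

On these logged values the validation loss bottoms out at epoch 2 while the training loss keeps falling, so a check like this would end the fold long before epoch 40.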

In [None]:
N_EPOCHS = 40
CLIP = 1
BATCH_SIZE = 1

# Optionally train on a subset to get a quick read on how the model is doing.
# A divisor of 1 keeps the full dataset; use 2 for half of it, and so on.
subset_length = len(pairs_sequence) // 1
cut_list = pairs_sequence[:subset_length]

# Initialize K-Fold cross-validation
kf = KFold(n_splits=20, shuffle=True)

# Lists to store performance metrics for each fold
fold_metrics = []


# Loop through each fold
for fold_x, (train_indices, val_indices) in enumerate(kf.split(cut_list), 1):
    train_data = torch.utils.data.Subset(cut_list, train_indices)
    val_data = torch.utils.data.Subset(cut_list, val_indices)

    train_dataloader = DataLoader(train_data, batch_size=BATCH_SIZE, collate_fn=generate_batch)
    val_dataloader = DataLoader(val_data, batch_size=BATCH_SIZE, collate_fn=generate_batch)
    
    early_stop = ValidationLossEarlyStopping(patience=3, min_delta=0.001)
    best_val_loss = float('inf')
    # Training/Validation loop
    for epoch in range(N_EPOCHS):

        start_time = time.time()

        train_loss, answer_token = train(model, train_dataloader, optimizer, criterion, CLIP, trg_data_vocab)
        val_loss = evaluate(model, val_dataloader, criterion)

        end_time = time.time()

        epoch_mins, epoch_secs = epoch_time(start_time, end_time)

        if val_loss < best_val_loss:
            best_val_loss = val_loss
            torch.save(model.state_dict(), f'seq2seq_fold_{fold_x:02}.pt')
        print(f'Epoch: {epoch+1:02} | Time: {epoch_mins}m {epoch_secs}s')
        print(f'\tTrain Loss: {train_loss:.3f} | Train PPL: {math.exp(train_loss):7.3f}')
        print(f'\t Val. Loss: {val_loss:.3f} |  Val. PPL: {math.exp(val_loss):7.3f}')
        
        # if(early_stop(val_loss)):
        #     print(f"Repeated slow change in validation loss for {early_stop.patience} times.")
        #     print(f"Early stopping at epoch {epoch+1:02} ...")
        #     # print(f"The last output of the decoder: {answer_token}")
        #     break # let 
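    fold_metrics.append(val_loss)
    # Build a fresh encoder, decoder, and optimizer for the next fold. Re-wrapping the same
    # `enc` and `dec` objects would share their parameters, and the old optimizer would keep
    # the state accumulated during this fold.
    enc = Encoder(INPUT_DIM, ENC_EMB_DIM, HID_DIM, N_LAYERS, ENC_DROPOUT, RNN_DROPOUT, weights_matrix)
    dec = Decoder(OUTPUT_DIM, DEC_EMB_DIM, HID_DIM, N_LAYERS, DEC_DROPOUT, RNN_DROPOUT)
    model = Seq2Seq(enc, dec, device).to(device)
    init_weights(model)
    optimizer = optim.Adamax(model.parameters())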

Training: 100%|██████████| 4750/4750 [00:21<00:00, 221.79it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 950.41it/s]


Epoch: 01 | Time: 0m 21s
	Train Loss: 4.259 | Train PPL:  70.714
	 Val. Loss: 4.868 |  Val. PPL: 130.030


Training: 100%|██████████| 4750/4750 [00:21<00:00, 224.82it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 920.98it/s]


Epoch: 02 | Time: 0m 21s
	Train Loss: 3.555 | Train PPL:  34.999
	 Val. Loss: 4.560 |  Val. PPL:  95.551


Training: 100%|██████████| 4750/4750 [00:21<00:00, 225.19it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 872.89it/s]


Epoch: 03 | Time: 0m 21s
	Train Loss: 3.277 | Train PPL:  26.503
	 Val. Loss: 4.629 |  Val. PPL: 102.391


Training: 100%|██████████| 4750/4750 [00:20<00:00, 226.72it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 947.41it/s]


Epoch: 04 | Time: 0m 21s
	Train Loss: 3.154 | Train PPL:  23.438
	 Val. Loss: 4.914 |  Val. PPL: 136.160


Training: 100%|██████████| 4750/4750 [00:21<00:00, 225.29it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 842.59it/s]


Epoch: 05 | Time: 0m 21s
	Train Loss: 3.018 | Train PPL:  20.453
	 Val. Loss: 5.117 |  Val. PPL: 166.869


Training: 100%|██████████| 4750/4750 [00:20<00:00, 226.33it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 968.20it/s]


Epoch: 06 | Time: 0m 21s
	Train Loss: 2.927 | Train PPL:  18.676
	 Val. Loss: 5.112 |  Val. PPL: 166.015


Training: 100%|██████████| 4750/4750 [00:21<00:00, 225.46it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 958.73it/s]


Epoch: 07 | Time: 0m 21s
	Train Loss: 2.840 | Train PPL:  17.114
	 Val. Loss: 5.163 |  Val. PPL: 174.662


Training: 100%|██████████| 4750/4750 [00:21<00:00, 225.51it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 943.13it/s]


Epoch: 08 | Time: 0m 21s
	Train Loss: 2.763 | Train PPL:  15.846
	 Val. Loss: 5.107 |  Val. PPL: 165.199


Training: 100%|██████████| 4750/4750 [00:20<00:00, 227.18it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 949.64it/s]


Epoch: 09 | Time: 0m 21s
	Train Loss: 2.706 | Train PPL:  14.970
	 Val. Loss: 5.244 |  Val. PPL: 189.345


Training: 100%|██████████| 4750/4750 [00:20<00:00, 226.82it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 962.18it/s]


Epoch: 10 | Time: 0m 21s
	Train Loss: 2.657 | Train PPL:  14.254
	 Val. Loss: 5.233 |  Val. PPL: 187.432


Training: 100%|██████████| 4750/4750 [00:21<00:00, 225.25it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 896.30it/s]


Epoch: 11 | Time: 0m 21s
	Train Loss: 2.611 | Train PPL:  13.609
	 Val. Loss: 5.083 |  Val. PPL: 161.184


Training: 100%|██████████| 4750/4750 [00:21<00:00, 223.59it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 943.16it/s]


Epoch: 12 | Time: 0m 21s
	Train Loss: 2.567 | Train PPL:  13.030
	 Val. Loss: 5.218 |  Val. PPL: 184.638


Training: 100%|██████████| 4750/4750 [00:21<00:00, 224.67it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 948.21it/s]


Epoch: 13 | Time: 0m 21s
	Train Loss: 2.542 | Train PPL:  12.701
	 Val. Loss: 5.056 |  Val. PPL: 157.002


Training: 100%|██████████| 4750/4750 [00:21<00:00, 226.14it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 966.28it/s]


Epoch: 14 | Time: 0m 21s
	Train Loss: 2.499 | Train PPL:  12.166
	 Val. Loss: 5.201 |  Val. PPL: 181.483


Training: 100%|██████████| 4750/4750 [00:21<00:00, 225.81it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 955.04it/s]


Epoch: 15 | Time: 0m 21s
	Train Loss: 2.481 | Train PPL:  11.955
	 Val. Loss: 5.136 |  Val. PPL: 170.027


Training: 100%|██████████| 4750/4750 [00:21<00:00, 225.04it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 957.89it/s]


Epoch: 16 | Time: 0m 21s
	Train Loss: 2.448 | Train PPL:  11.569
	 Val. Loss: 5.234 |  Val. PPL: 187.626


Training: 100%|██████████| 4750/4750 [00:21<00:00, 224.28it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 959.65it/s]


Epoch: 17 | Time: 0m 21s
	Train Loss: 2.431 | Train PPL:  11.375
	 Val. Loss: 5.064 |  Val. PPL: 158.211


Training: 100%|██████████| 4750/4750 [00:20<00:00, 226.30it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 961.54it/s]


Epoch: 18 | Time: 0m 21s
	Train Loss: 2.397 | Train PPL:  10.989
	 Val. Loss: 5.124 |  Val. PPL: 167.998


Training: 100%|██████████| 4750/4750 [00:21<00:00, 223.70it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 940.70it/s]


Epoch: 19 | Time: 0m 21s
	Train Loss: 2.385 | Train PPL:  10.859
	 Val. Loss: 4.967 |  Val. PPL: 143.586


Training: 100%|██████████| 4750/4750 [00:21<00:00, 224.23it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 969.07it/s]


Epoch: 20 | Time: 0m 21s
	Train Loss: 2.350 | Train PPL:  10.484
	 Val. Loss: 5.066 |  Val. PPL: 158.612


Training: 100%|██████████| 4750/4750 [00:20<00:00, 228.03it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 935.21it/s]


Epoch: 21 | Time: 0m 21s
	Train Loss: 2.330 | Train PPL:  10.278
	 Val. Loss: 5.090 |  Val. PPL: 162.437


Training: 100%|██████████| 4750/4750 [00:21<00:00, 224.30it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 900.69it/s]


Epoch: 22 | Time: 0m 21s
	Train Loss: 2.304 | Train PPL:  10.014
	 Val. Loss: 5.053 |  Val. PPL: 156.464


Training: 100%|██████████| 4750/4750 [00:20<00:00, 227.82it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 952.23it/s]


Epoch: 23 | Time: 0m 21s
	Train Loss: 2.287 | Train PPL:   9.848
	 Val. Loss: 5.186 |  Val. PPL: 178.766


Training: 100%|██████████| 4750/4750 [00:20<00:00, 227.97it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 947.36it/s]


Epoch: 24 | Time: 0m 21s
	Train Loss: 2.261 | Train PPL:   9.592
	 Val. Loss: 4.954 |  Val. PPL: 141.697


Training: 100%|██████████| 4750/4750 [00:20<00:00, 226.91it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 948.60it/s]


Epoch: 25 | Time: 0m 21s
	Train Loss: 2.247 | Train PPL:   9.458
	 Val. Loss: 4.906 |  Val. PPL: 135.041


Training: 100%|██████████| 4750/4750 [00:21<00:00, 225.35it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 941.70it/s]


Epoch: 26 | Time: 0m 21s
	Train Loss: 2.228 | Train PPL:   9.286
	 Val. Loss: 5.016 |  Val. PPL: 150.796


Training: 100%|██████████| 4750/4750 [00:20<00:00, 226.37it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 944.36it/s]


Epoch: 27 | Time: 0m 21s
	Train Loss: 2.205 | Train PPL:   9.071
	 Val. Loss: 5.137 |  Val. PPL: 170.206


Training: 100%|██████████| 4750/4750 [00:21<00:00, 224.51it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 957.31it/s]


Epoch: 28 | Time: 0m 21s
	Train Loss: 2.192 | Train PPL:   8.949
	 Val. Loss: 5.107 |  Val. PPL: 165.132


Training: 100%|██████████| 4750/4750 [00:21<00:00, 224.26it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 944.30it/s]


Epoch: 29 | Time: 0m 21s
	Train Loss: 2.176 | Train PPL:   8.810
	 Val. Loss: 5.042 |  Val. PPL: 154.839


Training: 100%|██████████| 4750/4750 [00:21<00:00, 225.40it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 948.70it/s]


Epoch: 30 | Time: 0m 21s
	Train Loss: 2.165 | Train PPL:   8.717
	 Val. Loss: 5.090 |  Val. PPL: 162.448


Training: 100%|██████████| 4750/4750 [00:21<00:00, 224.21it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 953.56it/s]


Epoch: 31 | Time: 0m 21s
	Train Loss: 2.144 | Train PPL:   8.535
	 Val. Loss: 5.059 |  Val. PPL: 157.374


Training: 100%|██████████| 4750/4750 [00:21<00:00, 225.08it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 939.23it/s]


Epoch: 32 | Time: 0m 21s
	Train Loss: 2.131 | Train PPL:   8.423
	 Val. Loss: 5.059 |  Val. PPL: 157.380


Training: 100%|██████████| 4750/4750 [00:21<00:00, 223.53it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 967.76it/s]


Epoch: 33 | Time: 0m 21s
	Train Loss: 2.111 | Train PPL:   8.259
	 Val. Loss: 4.997 |  Val. PPL: 148.027


Training: 100%|██████████| 4750/4750 [00:20<00:00, 226.96it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 951.02it/s]


Epoch: 34 | Time: 0m 21s
	Train Loss: 2.092 | Train PPL:   8.101
	 Val. Loss: 5.262 |  Val. PPL: 192.960


Training: 100%|██████████| 4750/4750 [00:21<00:00, 226.07it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 951.98it/s]


Epoch: 35 | Time: 0m 21s
	Train Loss: 2.080 | Train PPL:   8.008
	 Val. Loss: 5.129 |  Val. PPL: 168.873


Training: 100%|██████████| 4750/4750 [00:21<00:00, 225.36it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 946.91it/s]


Epoch: 36 | Time: 0m 21s
	Train Loss: 2.073 | Train PPL:   7.946
	 Val. Loss: 5.071 |  Val. PPL: 159.301


Training: 100%|██████████| 4750/4750 [00:20<00:00, 227.15it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 958.06it/s]


Epoch: 37 | Time: 0m 21s
	Train Loss: 2.051 | Train PPL:   7.776
	 Val. Loss: 5.048 |  Val. PPL: 155.782


Training: 100%|██████████| 4750/4750 [00:21<00:00, 224.43it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 936.24it/s]


Epoch: 38 | Time: 0m 21s
	Train Loss: 2.044 | Train PPL:   7.721
	 Val. Loss: 5.141 |  Val. PPL: 170.917


Training: 100%|██████████| 4750/4750 [00:21<00:00, 224.43it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 944.81it/s]


Epoch: 39 | Time: 0m 21s
	Train Loss: 2.030 | Train PPL:   7.612
	 Val. Loss: 5.139 |  Val. PPL: 170.529


Training: 100%|██████████| 4750/4750 [00:20<00:00, 228.18it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 949.39it/s]


Epoch: 40 | Time: 0m 21s
	Train Loss: 2.010 | Train PPL:   7.463
	 Val. Loss: 5.057 |  Val. PPL: 157.082


Training: 100%|██████████| 4750/4750 [00:21<00:00, 224.00it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 727.68it/s]


Epoch: 01 | Time: 0m 21s
	Train Loss: 3.875 | Train PPL:  48.172
	 Val. Loss: 3.927 |  Val. PPL:  50.733


Training: 100%|██████████| 4750/4750 [00:20<00:00, 227.18it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 958.21it/s]


Epoch: 02 | Time: 0m 21s
	Train Loss: 3.331 | Train PPL:  27.964
	 Val. Loss: 4.071 |  Val. PPL:  58.642


Training: 100%|██████████| 4750/4750 [00:21<00:00, 225.12it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 959.31it/s]


Epoch: 03 | Time: 0m 21s
	Train Loss: 3.201 | Train PPL:  24.553
	 Val. Loss: 4.243 |  Val. PPL:  69.638


Training: 100%|██████████| 4750/4750 [00:21<00:00, 224.85it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 951.41it/s]


Epoch: 04 | Time: 0m 21s
	Train Loss: 3.081 | Train PPL:  21.776
	 Val. Loss: 4.322 |  Val. PPL:  75.341


Training: 100%|██████████| 4750/4750 [00:21<00:00, 225.04it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 950.48it/s]


Epoch: 05 | Time: 0m 21s
	Train Loss: 2.984 | Train PPL:  19.759
	 Val. Loss: 4.358 |  Val. PPL:  78.125


Training: 100%|██████████| 4750/4750 [00:20<00:00, 227.21it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 859.80it/s]


Epoch: 06 | Time: 0m 21s
	Train Loss: 2.906 | Train PPL:  18.286
	 Val. Loss: 4.438 |  Val. PPL:  84.578


Training: 100%|██████████| 4750/4750 [00:21<00:00, 222.86it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 943.84it/s]


Epoch: 07 | Time: 0m 21s
	Train Loss: 2.849 | Train PPL:  17.277
	 Val. Loss: 4.565 |  Val. PPL:  96.043


Training: 100%|██████████| 4750/4750 [00:21<00:00, 225.04it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 952.90it/s]


Epoch: 08 | Time: 0m 21s
	Train Loss: 2.801 | Train PPL:  16.459
	 Val. Loss: 4.592 |  Val. PPL:  98.648


Training: 100%|██████████| 4750/4750 [00:21<00:00, 225.21it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 944.67it/s]


Epoch: 09 | Time: 0m 21s
	Train Loss: 2.755 | Train PPL:  15.719
	 Val. Loss: 4.622 |  Val. PPL: 101.655


Training: 100%|██████████| 4750/4750 [00:20<00:00, 227.42it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 943.06it/s]


Epoch: 10 | Time: 0m 21s
	Train Loss: 2.714 | Train PPL:  15.085
	 Val. Loss: 4.655 |  Val. PPL: 105.145


Training: 100%|██████████| 4750/4750 [00:20<00:00, 226.98it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 942.63it/s]


Epoch: 11 | Time: 0m 21s
	Train Loss: 2.679 | Train PPL:  14.566
	 Val. Loss: 4.769 |  Val. PPL: 117.830


Training: 100%|██████████| 4750/4750 [00:21<00:00, 224.33it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 934.19it/s]


Epoch: 12 | Time: 0m 21s
	Train Loss: 2.633 | Train PPL:  13.915
	 Val. Loss: 4.695 |  Val. PPL: 109.394


Training: 100%|██████████| 4750/4750 [00:21<00:00, 224.60it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 963.28it/s]


Epoch: 13 | Time: 0m 21s
	Train Loss: 2.611 | Train PPL:  13.615
	 Val. Loss: 4.793 |  Val. PPL: 120.678


Training: 100%|██████████| 4750/4750 [00:20<00:00, 227.19it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 934.30it/s]


Epoch: 14 | Time: 0m 21s
	Train Loss: 2.573 | Train PPL:  13.103
	 Val. Loss: 4.811 |  Val. PPL: 122.836


Training: 100%|██████████| 4750/4750 [00:21<00:00, 225.35it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 937.01it/s]


Epoch: 15 | Time: 0m 21s
	Train Loss: 2.546 | Train PPL:  12.756
	 Val. Loss: 4.904 |  Val. PPL: 134.763


Training: 100%|██████████| 4750/4750 [00:21<00:00, 224.86it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 938.75it/s]


Epoch: 16 | Time: 0m 21s
	Train Loss: 2.521 | Train PPL:  12.445
	 Val. Loss: 4.903 |  Val. PPL: 134.633


Training: 100%|██████████| 4750/4750 [00:21<00:00, 225.59it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 953.65it/s]


Epoch: 17 | Time: 0m 21s
	Train Loss: 2.485 | Train PPL:  12.002
	 Val. Loss: 4.941 |  Val. PPL: 139.979


Training: 100%|██████████| 4750/4750 [00:20<00:00, 226.59it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 963.12it/s]


Epoch: 18 | Time: 0m 21s
	Train Loss: 2.457 | Train PPL:  11.673
	 Val. Loss: 4.988 |  Val. PPL: 146.690


Training: 100%|██████████| 4750/4750 [00:21<00:00, 225.46it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 954.52it/s]


Epoch: 19 | Time: 0m 21s
	Train Loss: 2.432 | Train PPL:  11.386
	 Val. Loss: 4.998 |  Val. PPL: 148.111


Training: 100%|██████████| 4750/4750 [00:20<00:00, 227.30it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 618.63it/s]


Epoch: 20 | Time: 0m 21s
	Train Loss: 2.407 | Train PPL:  11.098
	 Val. Loss: 4.907 |  Val. PPL: 135.200


Training: 100%|██████████| 4750/4750 [00:20<00:00, 227.18it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 940.06it/s]


Epoch: 21 | Time: 0m 21s
	Train Loss: 2.387 | Train PPL:  10.882
	 Val. Loss: 4.971 |  Val. PPL: 144.102


Training: 100%|██████████| 4750/4750 [00:21<00:00, 223.55it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 947.78it/s]


Epoch: 22 | Time: 0m 21s
	Train Loss: 2.362 | Train PPL:  10.617
	 Val. Loss: 4.904 |  Val. PPL: 134.804


Training: 100%|██████████| 4750/4750 [00:21<00:00, 224.86it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 951.66it/s]


Epoch: 23 | Time: 0m 21s
	Train Loss: 2.342 | Train PPL:  10.403
	 Val. Loss: 5.002 |  Val. PPL: 148.688


Training: 100%|██████████| 4750/4750 [00:21<00:00, 225.29it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 946.00it/s]


Epoch: 24 | Time: 0m 21s
	Train Loss: 2.324 | Train PPL:  10.215
	 Val. Loss: 4.911 |  Val. PPL: 135.833


Training: 100%|██████████| 4750/4750 [00:21<00:00, 223.25it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 948.53it/s]


Epoch: 25 | Time: 0m 21s
	Train Loss: 2.300 | Train PPL:   9.974
	 Val. Loss: 4.982 |  Val. PPL: 145.721


Training: 100%|██████████| 4750/4750 [00:20<00:00, 228.88it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 964.30it/s]


Epoch: 26 | Time: 0m 21s
	Train Loss: 2.272 | Train PPL:   9.697
	 Val. Loss: 4.876 |  Val. PPL: 131.138


Training: 100%|██████████| 4750/4750 [00:21<00:00, 226.02it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 947.82it/s]


Epoch: 27 | Time: 0m 21s
	Train Loss: 2.264 | Train PPL:   9.624
	 Val. Loss: 5.003 |  Val. PPL: 148.875


Training: 100%|██████████| 4750/4750 [00:21<00:00, 224.46it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 948.77it/s]


Epoch: 28 | Time: 0m 21s
	Train Loss: 2.245 | Train PPL:   9.443
	 Val. Loss: 5.013 |  Val. PPL: 150.411


Training: 100%|██████████| 4750/4750 [00:21<00:00, 224.43it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 949.87it/s]


Epoch: 29 | Time: 0m 21s
	Train Loss: 2.232 | Train PPL:   9.316
	 Val. Loss: 5.042 |  Val. PPL: 154.726


Training: 100%|██████████| 4750/4750 [00:21<00:00, 225.08it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 942.28it/s]


Epoch: 30 | Time: 0m 21s
	Train Loss: 2.209 | Train PPL:   9.109
	 Val. Loss: 5.175 |  Val. PPL: 176.849


Training: 100%|██████████| 4750/4750 [00:21<00:00, 224.98it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 960.54it/s]


Epoch: 31 | Time: 0m 21s
	Train Loss: 2.198 | Train PPL:   9.010
	 Val. Loss: 5.296 |  Val. PPL: 199.559


Training: 100%|██████████| 4750/4750 [00:21<00:00, 225.50it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 957.39it/s]


Epoch: 32 | Time: 0m 21s
	Train Loss: 2.172 | Train PPL:   8.776
	 Val. Loss: 5.215 |  Val. PPL: 184.096


Training: 100%|██████████| 4750/4750 [00:21<00:00, 224.99it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 943.09it/s]


Epoch: 33 | Time: 0m 21s
	Train Loss: 2.155 | Train PPL:   8.627
	 Val. Loss: 5.229 |  Val. PPL: 186.561


Training: 100%|██████████| 4750/4750 [00:20<00:00, 228.10it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 958.89it/s]


Epoch: 34 | Time: 0m 21s
	Train Loss: 2.138 | Train PPL:   8.480
	 Val. Loss: 5.326 |  Val. PPL: 205.566


Training: 100%|██████████| 4750/4750 [00:21<00:00, 222.10it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 950.52it/s]


Epoch: 35 | Time: 0m 21s
	Train Loss: 2.127 | Train PPL:   8.393
	 Val. Loss: 5.258 |  Val. PPL: 192.109


Training: 100%|██████████| 4750/4750 [00:21<00:00, 216.79it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 947.35it/s]


Epoch: 36 | Time: 0m 22s
	Train Loss: 2.111 | Train PPL:   8.254
	 Val. Loss: 5.250 |  Val. PPL: 190.528


Training:  39%|███▉      | 1841/4750 [00:08<00:13, 212.08it/s]


KeyboardInterrupt: 
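
In the log above, training loss falls steadily while validation loss is lowest at the start of this excerpt (4.695 at epoch 12) and climbs to about 5.25 by epoch 36, which suggests overfitting; the run was eventually interrupted by hand. A minimal sketch of patience-based early stopping follows. The `EarlyStopper` class is hypothetical and written only for illustration; it is not the `ValidationLossEarlyStopping` helper imported at the top of the notebook, whose interface is not shown here.

```python
class EarlyStopper:
    """Hypothetical helper: signal a stop after `patience` epochs without improvement."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # minimum decrease that counts as an improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Validation losses copied from epochs 12-17 of the log above.
val_losses = {12: 4.695, 13: 4.793, 14: 4.811, 15: 4.904, 16: 4.903, 17: 4.941}

stopper = EarlyStopper(patience=3)
stopped_at = None
for epoch, loss in val_losses.items():
    if stopper.step(loss):
        stopped_at = epoch
        break

print(f"Stop at epoch {stopped_at}, best val loss {stopper.best:.3f}")
```

In a real loop, the model weights would normally be saved each time `best` improves, so the checkpoint with the lowest validation loss is the one kept.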

In [None]:
# Save only the learned parameters (state_dict), not the whole model object.
torch.save(model.state_dict(), 'seq2seq_adam_stemmer_brown_embedding.pt')

In [None]:
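# Compute the mean and population standard deviation of the per-fold metric.
# Note: the values in fold_metrics exceed 1, so they are probably per-fold losses;
# "accuracy" in the names and labels below is likely a misnomer.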
mean_accuracy = sum(fold_metrics) / len(fold_metrics)
std_accuracy = (sum((x - mean_accuracy) ** 2 for x in fold_metrics) / len(fold_metrics)) ** 0.5

print(f"Mean Accuracy: {mean_accuracy:.2f}")
print(f"Standard Deviation: {std_accuracy:.2f}")
print(f"All folds: {fold_metrics}")

Mean Accuracy: 4.32
Standard Deviation: 0.77
All folds: [1.0730895975941963, 4.346844787485898, 4.2826411010492595, 4.712956026250031, 4.267454868079163, 4.527238293468952, 4.428322538740002, 4.939457928681746, 4.818652681002393, 4.486097786943429, 4.676842145197093, 4.4699497792646286, 4.16081991638802, 4.307067255090922, 4.1885694899829105, 4.546688701670617, 4.5680717931110415, 4.505694039381109, 4.283751132718288, 4.7336242926204575]
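
The first fold's value (about 1.07) sits far from the other nineteen, which pulls the mean down and inflates the standard deviation. As a sketch, the same summary can be computed with Python's `statistics` module, alongside the median, which is less sensitive to a single outlier. The list below is copied from the output above; treating the first fold as an outlier is an interpretation, not something the notebook states.

```python
import statistics

# Per-fold values copied from the cell output above.
fold_metrics = [
    1.0730895975941963, 4.346844787485898, 4.2826411010492595, 4.712956026250031,
    4.267454868079163, 4.527238293468952, 4.428322538740002, 4.939457928681746,
    4.818652681002393, 4.486097786943429, 4.676842145197093, 4.4699497792646286,
    4.16081991638802, 4.307067255090922, 4.1885694899829105, 4.546688701670617,
    4.5680717931110415, 4.505694039381109, 4.283751132718288, 4.7336242926204575,
]

mean = statistics.fmean(fold_metrics)
pop_std = statistics.pstdev(fold_metrics)   # divides by n, as the cell above does
sample_std = statistics.stdev(fold_metrics)  # divides by n - 1
median = statistics.median(fold_metrics)

print(f"Mean: {mean:.2f}")
print(f"Population std: {pop_std:.2f}")
print(f"Sample std: {sample_std:.2f}")
print(f"Median: {median:.2f}")
```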


In [None]:
enc = Encoder(INPUT_DIM, ENC_EMB_DIM, HID_DIM, N_LAYERS, ENC_DROPOUT, RNN_DROPOUT, weights_matrix)
dec = Decoder(OUTPUT_DIM, DEC_EMB_DIM, HID_DIM, N_LAYERS, DEC_DROPOUT, RNN_DROPOUT)

model = Seq2Seq(enc, dec, device).to(device)
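# map_location lets the checkpoint load even if it was saved on a different device.
model.load_state_dict(torch.load('seq2seq_fold_01.pt', map_location=device))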

<All keys matched successfully>

In [None]:
print("Type 'exit' to finish the chat.\n", "-"*30, '\n')
# Keep answering questions until the user types 'exit'.
while True:
    src = input("> ")
    if src.strip() == "exit":
        break
    chat(src, trg_data_vocab, model, 10)

# Example question: to whom did the virgin mary allegedly appear in 1858 in lourdes france

Type 'exit' to finish the chat.
 ------------------------------ 



>  what is the grotto at notre-dame?


[1, 4, 128, 2]
< and colleg 



>  exit
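
The transcript above prints the predicted token ids `[1, 4, 128, 2]` followed by the reply `and colleg`. A minimal sketch of how such ids might be turned back into text is given below. Everything in it is an assumption made for illustration: the special-token names, the guess that ids 1 and 2 mark the start and end of a sequence, and the two vocabulary entries, which are inferred only from the printed reply. The project's real `Vocab` class and `chat` function may work differently.

```python
# Assumed special-token ids, inferred from the printed sequence [1, 4, 128, 2].
SOS_ID, EOS_ID = 1, 2

# Hypothetical slice of an index-to-token table; the real vocabulary is much larger.
index2word = {0: "<PAD>", 1: "<SOS>", 2: "<EOS>", 4: "and", 128: "colleg"}


def decode(ids, index2word):
    """Map token ids to a string, skipping the start marker and stopping at the end marker."""
    words = []
    for token_id in ids:
        if token_id == EOS_ID:
            break
        if token_id == SOS_ID:
            continue
        words.append(index2word.get(token_id, "<UNK>"))
    return " ".join(words)


reply = decode([1, 4, 128, 2], index2word)
print("<", reply)
```

The truncated form `colleg` is consistent with the stemmer named in the checkpoint filename above, since a stemmed vocabulary produces stemmed output tokens.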
