# LSTM Bot

## Project Overview

In this project, you will build a chatbot that can converse with you at the command line. The chatbot will use a Sequence to Sequence text generation architecture with an LSTM as it's memory unit. You will also learn to use pretrained word embeddings to improve the performance of the model. At the conclusion of the project, you will be able to show your chatbot to potential employers.

Additionally, you have the option to use pretrained word embeddings in your model. We have loaded Brown Embeddings from Gensim in the starter code below. You can compare the performance of your model with pre-trained embeddings against a model without the embeddings.



---



A sequence to sequence model (Seq2Seq) has two components:
- An Encoder consisting of an embedding layer and LSTM unit.
- A Decoder consisting of an embedding layer, LSTM unit, and linear output unit.

The Seq2Seq model works by accepting an input into the Encoder, passing the hidden state from the Encoder to the Decoder, which the Decoder uses to output a series of token predictions.

## Dependencies

- Pytorch
- Numpy
- Pandas
- NLTK
- Gzip
- Gensim


Please choose a dataset from the Torchtext website. We recommend looking at the Squad dataset first. Here is a link to the website where you can view your options:

- https://pytorch.org/text/stable/datasets.html





In [1]:
from src.Data import loadDF, tokenizer, getPairs, add_symbols, create_word_embedding, add_symbols2
from src.Models import Seq2Seq, Encoder, Decoder
from src.Vocab import Vocab
from src.Train import train
from src.Evaluate import evaluate
from src.Chat import chat
from src.ValEarlyStop import ValidationLossEarlyStopping

import torch
from sklearn.model_selection import KFold
from torch.utils.data import DataLoader
import torch.nn as nn
import torch.optim as optim

import numpy as np
import random, math, time

import gensim
import nltk
from nltk.corpus import brown


%load_ext autoreload
%autoreload 2




In [2]:

SEED = 1234

random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
# torch.cuda.manual_seed(SEED)
# torch.backends.cudnn.deterministic = True

<torch._C.Generator at 0x7fd7130802d0>

In [3]:

# nltk.download('brown')
# nltk.download('punkt')

print([" ".join(sent) for sent in brown.sents()[0:3]])

# Output, save, and load brown embeddings

# The default value of vector_size is 100.
# model = gensim.models.Word2Vec(brown.sents(), size=100)
# model.save('brown.embedding')

# i can use googlenews vector which doesn't need training but it is very very large
# model = gensim.models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
w2v = gensim.models.Word2Vec.load('brown.embedding')


["The Fulton County Grand Jury said Friday an investigation of Atlanta's recent primary election produced `` no evidence '' that any irregularities took place .", "The jury further said in term-end presentments that the City Executive Committee , which had over-all charge of the election , `` deserves the praise and thanks of the City of Atlanta '' for the manner in which the election was conducted .", "The September-October term jury had been charged by Fulton Superior Court Judge Durwood Pye to investigate reports of possible `` irregularities '' in the hard-fought primary which was won by Mayor-nominate Ivan Allen Jr. ."]


In [4]:
data_df = loadDF('data')
# I will take only the first 5,000 Q&A to avoid CUDA out of memory error due to the large dataset
data_df = data_df.iloc[:5000, :]

  return train_df.append(validation_df)


In [5]:
data_df.describe()

Unnamed: 0,Question,Answer
count,5000,5000
unique,4983,3642
top,Who was Alexander Scriabin's teacher?,Manhattan
freq,2,21


In [6]:
data_df.head()

Unnamed: 0,Question,Answer
0,To whom did the Virgin Mary allegedly appear i...,Saint Bernadette Soubirous
1,What is in front of the Notre Dame Main Building?,a copper statue of Christ
2,The Basilica of the Sacred heart at Notre Dame...,the Main Building
3,What is the Grotto at Notre Dame?,a Marian place of prayer and reflection
4,What sits on top of the Main Building at Notre...,a golden statue of the Virgin Mary


In [7]:
data_df['Question'], data_df['Qtoken'] = data_df['Question'].apply(tokenizer).str
data_df['Answer'], data_df['Atoken'] = data_df['Answer'].apply(tokenizer).str

  data_df['Question'], data_df['Qtoken'] = data_df['Question'].apply(tokenizer).str
  data_df['Answer'], data_df['Atoken'] = data_df['Answer'].apply(tokenizer).str


In [8]:
data_df.head()

Unnamed: 0,Question,Answer,Qtoken,Atoken
0,to whom did the virgin mari alleg appear in 18...,saint bernadett soubir,"[to, whom, did, the, virgin, mari, alleg, appe...","[saint, bernadett, soubir]"
1,what is in front of the notr dame main build,a copper statu of christ,"[what, is, in, front, of, the, notr, dame, mai...","[a, copper, statu, of, christ]"
2,the basilica of the sacr heart at notr dame is...,the main build,"[the, basilica, of, the, sacr, heart, at, notr...","[the, main, build]"
3,what is the grotto at notr dame,a marian place of prayer and reflect,"[what, is, the, grotto, at, notr, dame]","[a, marian, place, of, prayer, and, reflect]"
4,what sit on top of the main build at notr dame,a golden statu of the virgin mari,"[what, sit, on, top, of, the, main, build, at,...","[a, golden, statu, of, the, virgin, mari]"


In [9]:
pairs_sequence = getPairs(data_df)
first_five_items = pairs_sequence[:5]
# import itertools
# first_five_items = list(itertools.islice(pairs_sequence, 5))
print(len(pairs_sequence))
first_five_items

5000


[['to whom did the virgin mari alleg appear in 1858 in lourd franc',
  'saint bernadett soubir'],
 ['what is in front of the notr dame main build', 'a copper statu of christ'],
 ['the basilica of the sacr heart at notr dame is besid to which structur',
  'the main build'],
 ['what is the grotto at notr dame', 'a marian place of prayer and reflect'],
 ['what sit on top of the main build at notr dame',
  'a golden statu of the virgin mari']]

In [10]:
# max_src, max_trg = getMaxLen(pairs_sequence)
# max_src, max_trg

In [11]:
data_vocab = Vocab(data_df)
print("total of unique questions and answers in dataset: ", len(data_vocab.text))
# A_vocab = Vocab()

data_vocab.build_word_vocab()

print({k: data_vocab.index2word[k] for k in list(data_vocab.index2word)[:10]})
print({k: data_vocab.word2index[k] for k in list(data_vocab.word2index)[:10]})
print(data_vocab.word_vocab[:10])
print(data_vocab['the'])
print(data_vocab['oov'])

# # build vocabularies for questions "source" and answers "target"
# for pair in pairs_sequence:
#     Q_vocab.add_words(pair[0])
#     A_vocab.add_words(pair[1])

total of unique questions and answers in dataset:  8505
raw-vocab: 6321
vocab-length: 6324
word2idx-length: 6324
{0: '<pad>', 1: '<sos>', 2: '<eos>', 3: 'the', 4: 'what', 5: 'of', 6: 'in', 7: 'did', 8: 'was', 9: 'to'}
{'<pad>': 0, '<sos>': 1, '<eos>': 2, 'the': 3, 'what': 4, 'of': 5, 'in': 6, 'did': 7, 'was': 8, 'to': 9}
['<pad>', '<sos>', '<eos>', 'the', 'what', 'of', 'in', 'did', 'was', 'to']
3
0


In [12]:
src_data_vocab = Vocab(data_df, source=True)
print("total of unique questions in dataset: ", len(src_data_vocab.text))
# A_vocab = Vocab()

src_data_vocab.build_word_vocab()

trg_data_vocab = Vocab(data_df, source=False)
print("total of unique answers in dataset: ", len(trg_data_vocab.text))
# A_vocab = Vocab()

trg_data_vocab.build_word_vocab()

# # build vocabularies for questions "source" and answers "target"
# for pair in pairs_sequence:
#     Q_vocab.add_words(pair[0])
#     A_vocab.add_words(pair[1])

total of unique questions in dataset:  4980
raw-vocab: 4504
vocab-length: 4507
word2idx-length: 4507
total of unique answers in dataset:  3525
raw-vocab: 4081
vocab-length: 4084
word2idx-length: 4084


In [13]:
print(len(pairs_sequence))

5000


In [14]:
# from torchdata.datapipes.iter import IterableWrapper
# tmp = IterableWrapper(pairs_sequence).sharding_filter()
# a, b = next(iter(tmp))
# print(a)
# print(b)


In [15]:
# source_data = [toTensor(data_vocab, pair[0]) for pair in pairs_sequence]
# target_data = [toTensor(data_vocab, pair[1]) for pair in pairs_sequence]

In [16]:
# print(source_data[10].shape)
# print(source_data[0].view(-1).shape)
# print(source_data[0])

In [17]:
weights_matrix, words_found = create_word_embedding(w2v.wv, word_vocab=src_data_vocab.word_vocab)

In [18]:
print("Total words found in glove vocab: {0} from {1}".format(words_found, len(data_vocab)))

Total words found in glove vocab: 1653 from 6324


In [19]:
np.save('seq2seqEmb_vt.npy', weights_matrix)

practice to understand batch generation

In [20]:
print(data_vocab['whom'])

183


In [21]:
# print(len(source_data))

In [22]:
# def get_batches(tmp, batch_size, seq_length):

#     n_batches = int(tmp.shape[0]/(batch_size*seq_length))
#     print(n_batches)
#     tmp = tmp[:n_batches*batch_size*seq_length]
#     tmp = tmp.reshape(batch_size, -1)
#     print(tmp.shape)
#     print(tmp)
#     print((tmp[0:2, :-1]).shape)
#     ## now, we have to Iterate over the batches using a window of size seq_length
#     for n in range(0, tmp.shape[1], seq_length):
#         # The features
#         x = tmp[:, n:n+seq_length]
#         # The targets, shifted by one
#         y = np.zeros_like(x)
#         try:
#             y[:, :-1], y[:, -1] = x[:, 1:], tmp[:, n+seq_length]
#         except IndexError:
#             y[:, :-1], y[:, -1] = x[:, 1:], tmp[:, 0]
#         yield x, y

# seq_length = 4
# batch_size = 10
# tmp = np.array([55, 20, 48, 54, 76, 36, 12,  4, 81,  7,  7,  7, 57, 48, 54, 54, 65,
#         4, 66, 48, 69, 78,  9, 78, 36, 19,  4, 48, 12, 36,  4, 48,  9,  9,
#         4, 48,  9, 78, 17, 36, 47,  4, 36, 27, 36, 12, 65,  4, 44, 29, 20,
#        48, 54, 54, 65,  4, 66, 48, 69, 78,  9, 65,  4, 78, 19,  4, 44, 29,
#        20, 48, 54, 54, 65,  4, 78, 29,  4, 78, 76, 19,  4, 18, 46, 29,  7,
#        46, 48, 65,  0,  7,  7, 52, 27, 36, 12, 65, 76, 20, 78, 29, 213, 4])
# i = 0
# for d in (get_batches(tmp, batch_size, seq_length)):
#     i +=1
# print(f"The number of batches (iterations): {i}")
# # next(get_batches(tmp, batch_size, seq_length))

In [23]:
# from sklearn.model_selection import KFold

# kf = KFold(n_splits=10, shuffle=True)
# tmp = source_data[:20]
# for e, (train_index, test_index) in enumerate(kf.split(tmp), 1):
#     print(f"Iteration: {e}")
#     print(f"{train_index}->{len(train_index)}")
#     print(f"{test_index}->{len(test_index)}")

    
#     # break
    

In [24]:
from torch.nn.utils.rnn import pad_sequence
# import logging
# logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', handlers=[logging.StreamHandler()])


def generate_batch(batch):
    src_batch = []
    trg_batch = []
    src_len = []
    i = 0
    # print(type(batch))
    for src, trg in batch:
        i += 1
        #split sentence into tokens
        _, src_tokens = tokenizer(src)
        # logging.warning(f'iteration {i}:\n {src}'); # why prints 3 times while batch is 1?
        _, trg_tokens = tokenizer(trg)
        #convert tokens to index and to tensor and add <sos> and <eos> to each sentence
        src_tensor = add_symbols(torch.tensor(src_data_vocab(src_tokens)).long(), src_data_vocab)
        trg_tensor = add_symbols2(torch.tensor(trg_data_vocab(trg_tokens)).long(), trg_data_vocab)
        src_batch.append(src_tensor)
        #track length of each source sentence, not useful in this model. Will be useful in further models
        src_len.append(len(src_tensor))
        trg_batch.append(trg_tensor)
        # logging.warning(f'iteration {i}:\n {(src_tensor)}');
    src_len = torch.tensor(src_len, dtype = torch.int)
    src_batch = pad_sequence(src_batch, padding_value=src_data_vocab['<pad>'])
    trg_batch = pad_sequence(trg_batch, padding_value=trg_data_vocab['<pad>'])
    src_len, idx = torch.sort(src_len,descending=True)
    #src_len is not useful in this model
    # logging.warning(f'lsrc_batch:{len(src_batch)}')
    return src_batch, src_len, trg_batch

In [25]:
len(pairs_sequence)

5000

In [26]:
train_dataloader = DataLoader(pairs_sequence, batch_size=5, collate_fn=generate_batch)
sr, srlen, tg = (next(iter(train_dataloader)))
print(len(srlen))
# print(len(list(train_dataloader))) # it is grouped to 5 items per batch

5


In [27]:
sr

tensor([[   1,    1,    1,    1,    1],
        [   9,    4,    3,    4,    4],
        [ 155,   11, 1555,   11, 1288],
        [   7,    6,    5,    3,   22],
        [   3, 1287,    3, 1945,  338],
        [2666,    5, 1556,   26,    5],
        [1942,    3, 1085,   35,    3],
        [1943,   35,   26,   34,  242],
        [ 234,   34,   35,    2,   88],
        [   6,  242,   34,    0,   26],
        [2667,   88,   11,    0,   35],
        [   6,    2,  409,    0,   34],
        [1944,    0,    9,    0,    2],
        [1084,    0,   14,    0,    0],
        [   2,    0,  549,    0,    0],
        [   0,    0,    2,    0,    0]])

In [28]:
tg

tensor([[ 483,    6,    3,    6,    6],
        [1436,  738,  740, 1439, 1441],
        [1437,  739,  156,  741,  739],
        [   2,    5,    2,    5,    5],
        [   0, 1438,    0,  742,    3],
        [   0,    2,    0,    4,  743],
        [   0,    0,    0, 1440,  256],
        [   0,    0,    0,    2,    2]])

In [29]:
def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time/60)
    elapsed_secs = int(elapsed_time - (elapsed_mins*60))
    
    return elapsed_mins, elapsed_secs

In [47]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
INPUT_DIM = len(src_data_vocab)
OUTPUT_DIM = len(trg_data_vocab)
ENC_EMB_DIM = 100 # 256//(2*2)
DEC_EMB_DIM = 100 # 256//(2*2)
HID_DIM = 512//(4*4)
N_LAYERS = 1
ENC_DROPOUT = 0.5
DEC_DROPOUT = 0.5
RNN_DROPOUT = 0

enc = Encoder(INPUT_DIM, ENC_EMB_DIM, HID_DIM, N_LAYERS, ENC_DROPOUT, RNN_DROPOUT, weights_matrix)
dec = Decoder(OUTPUT_DIM, DEC_EMB_DIM, HID_DIM, N_LAYERS, DEC_DROPOUT, RNN_DROPOUT)

model = Seq2Seq(enc, dec, device).to(device)

In [48]:
def init_weights(m):
    for name, param in m.named_parameters():
        if not name.startswith('encoder.embedding'):  # Exclude encoder embedding parameters
            nn.init.uniform_(param.data, -0.08, 0.08)
            # nn.init.zeros_(param.data)

model.apply(init_weights)

Seq2Seq(
  (encoder): Encoder(
    (embedding): Embedding(4507, 100)
    (rnn): LSTM(100, 32)
    (dropout): Dropout(p=0.5, inplace=False)
  )
  (decoder): Decoder(
    (embedding): Embedding(4084, 100)
    (rnn): LSTM(100, 32)
    (fc_out): Linear(in_features=32, out_features=4084, bias=True)
    (dropout): Dropout(p=0.5, inplace=False)
  )
)

In [49]:
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f'The model has {count_parameters(model):,} trainable parameters')

The model has 1,028,176 trainable parameters


In [53]:
# optimizer = optim.Adam(model.parameters())
# optimizer = optim.SGD(model.parameters(), lr=0.1)
optimizer = optim.Adamax(model.parameters())

# ignore the loss whenever the target token is a padding token.
TRG_PAD_IDX = trg_data_vocab['<pad>']
criterion = nn.CrossEntropyLoss(ignore_index=TRG_PAD_IDX)


In [51]:
# After multiple training using the adam optimizer and batch of 12 and 10 folds
# I have changed the batch size to 128
# changed the optimizer to sgd
# changed the fold to 20

# finally i have changed the lr in sgd from 0.01 to 0.1 but did not train to the end. i just saved the model
# after reloading the model, trained it for 40 epochs.
# then changed rhe optimizer to adamax with default learning rate

model.load_state_dict(torch.load('seq2seq_adam_stemmer_brown_embedding.pt'))

<All keys matched successfully>

In [57]:
N_EPOCHS = 40
CLIP = 1
BATCH_SIZE = 1

# speedup the training by reducing the size to grasp how the model is doing
half_length = len(pairs_sequence) // 1
cut_list = pairs_sequence[:half_length]

# Initialize K-Fold cross-validation
kf = KFold(n_splits=20, shuffle=True)

# Lists to store performance metrics for each fold
fold_metrics = []


# Loop through each fold
for fold_x, (train_indices, val_indices) in enumerate(kf.split(cut_list), 1):
    train_data = torch.utils.data.Subset(cut_list, train_indices)
    val_data = torch.utils.data.Subset(cut_list, val_indices)

    train_dataloader = DataLoader(train_data, batch_size=BATCH_SIZE, collate_fn=generate_batch)
    val_dataloader = DataLoader(val_data, batch_size=BATCH_SIZE, collate_fn=generate_batch)
    
    early_stop = ValidationLossEarlyStopping(patience=3, min_delta=0.001)
    best_val_loss = float('inf')
    # Training/Validation loop
    for epoch in range(N_EPOCHS):

        start_time = time.time()

        train_loss, answer_token = train(model, train_dataloader, optimizer, criterion, CLIP, trg_data_vocab)
        val_loss = evaluate(model, val_dataloader, criterion)

        end_time = time.time()

        epoch_mins, epoch_secs = epoch_time(start_time, end_time)

        if val_loss < best_val_loss:
            best_val_loss = val_loss
            torch.save(model.state_dict(), f'seq2seq_fold_{fold_x:02}.pt')
        print(f'Epoch: {epoch+1:02} | Time: {epoch_mins}m {epoch_secs}s')
        print(f'\tTrain Loss: {train_loss:.3f} | Train PPL: {math.exp(train_loss):7.3f}')
        print(f'\t Val. Loss: {val_loss:.3f} |  Val. PPL: {math.exp(val_loss):7.3f}')
        
        if(early_stop(val_loss)):
            print(f"Repeated slow change in validation loss for {early_stop.patience} times.")
            print(f"Early stopping at epoch {epoch+1:02} ...")
            # print(f"The last output of the decoder: {answer_token}")
            break # let 
    fold_metrics.append(val_loss)
    # break
    # create new model for the next fold
    model = Seq2Seq(enc, dec, device).to(device)
    model.apply(init_weights)

Training: 100%|██████████| 4750/4750 [00:32<00:00, 148.33it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 529.61it/s]


Epoch: 01 | Time: 0m 32s
	Train Loss: 1.099 | Train PPL:   3.001
	 Val. Loss: 0.847 |  Val. PPL:   2.332


Training: 100%|██████████| 4750/4750 [00:34<00:00, 138.97it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 441.47it/s]


Epoch: 02 | Time: 0m 34s
	Train Loss: 1.110 | Train PPL:   3.033
	 Val. Loss: 0.909 |  Val. PPL:   2.482


Training: 100%|██████████| 4750/4750 [00:33<00:00, 143.44it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 509.09it/s]


Epoch: 03 | Time: 0m 33s
	Train Loss: 1.118 | Train PPL:   3.060
	 Val. Loss: 0.988 |  Val. PPL:   2.686


Training: 100%|██████████| 4750/4750 [00:33<00:00, 142.84it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 574.71it/s]


Epoch: 04 | Time: 0m 33s
	Train Loss: 1.115 | Train PPL:   3.050
	 Val. Loss: 1.073 |  Val. PPL:   2.924
Repeated slow change in validation loss for 3 times.
Early stopping at epoch 04 ...


Training: 100%|██████████| 4750/4750 [00:31<00:00, 149.67it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 633.38it/s]


Epoch: 01 | Time: 0m 32s
	Train Loss: 3.864 | Train PPL:  47.634
	 Val. Loss: 3.764 |  Val. PPL:  43.134


Training: 100%|██████████| 4750/4750 [00:32<00:00, 145.90it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 544.17it/s]


Epoch: 02 | Time: 0m 33s
	Train Loss: 3.342 | Train PPL:  28.266
	 Val. Loss: 4.009 |  Val. PPL:  55.089


Training: 100%|██████████| 4750/4750 [00:30<00:00, 153.32it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 619.24it/s]


Epoch: 03 | Time: 0m 31s
	Train Loss: 3.234 | Train PPL:  25.384
	 Val. Loss: 4.258 |  Val. PPL:  70.662


Training: 100%|██████████| 4750/4750 [00:34<00:00, 137.14it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 571.20it/s]


Epoch: 04 | Time: 0m 35s
	Train Loss: 3.126 | Train PPL:  22.790
	 Val. Loss: 4.347 |  Val. PPL:  77.234
Repeated slow change in validation loss for 3 times.
Early stopping at epoch 04 ...


Training: 100%|██████████| 4750/4750 [00:32<00:00, 144.25it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 662.38it/s]


Epoch: 01 | Time: 0m 33s
	Train Loss: 3.866 | Train PPL:  47.761
	 Val. Loss: 3.881 |  Val. PPL:  48.451


Training: 100%|██████████| 4750/4750 [00:31<00:00, 151.37it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 744.92it/s]


Epoch: 02 | Time: 0m 31s
	Train Loss: 3.338 | Train PPL:  28.159
	 Val. Loss: 4.091 |  Val. PPL:  59.793


Training: 100%|██████████| 4750/4750 [00:32<00:00, 146.91it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 701.94it/s]


Epoch: 03 | Time: 0m 32s
	Train Loss: 3.205 | Train PPL:  24.663
	 Val. Loss: 4.204 |  Val. PPL:  66.969


Training: 100%|██████████| 4750/4750 [00:31<00:00, 150.76it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 630.32it/s]


Epoch: 04 | Time: 0m 31s
	Train Loss: 3.104 | Train PPL:  22.291
	 Val. Loss: 4.283 |  Val. PPL:  72.431
Repeated slow change in validation loss for 3 times.
Early stopping at epoch 04 ...


Training: 100%|██████████| 4750/4750 [00:30<00:00, 157.65it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 688.47it/s]


Epoch: 01 | Time: 0m 30s
	Train Loss: 3.886 | Train PPL:  48.708
	 Val. Loss: 3.689 |  Val. PPL:  40.004


Training: 100%|██████████| 4750/4750 [00:34<00:00, 135.89it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 561.48it/s]


Epoch: 02 | Time: 0m 35s
	Train Loss: 3.345 | Train PPL:  28.357
	 Val. Loss: 4.129 |  Val. PPL:  62.146


Training: 100%|██████████| 4750/4750 [00:31<00:00, 150.44it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 706.61it/s]


Epoch: 03 | Time: 0m 31s
	Train Loss: 3.218 | Train PPL:  24.984
	 Val. Loss: 4.442 |  Val. PPL:  84.961


Training: 100%|██████████| 4750/4750 [00:31<00:00, 151.64it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 564.81it/s]


Epoch: 04 | Time: 0m 31s
	Train Loss: 3.125 | Train PPL:  22.755
	 Val. Loss: 4.713 |  Val. PPL: 111.381
Repeated slow change in validation loss for 3 times.
Early stopping at epoch 04 ...


Training: 100%|██████████| 4750/4750 [00:34<00:00, 139.60it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 519.56it/s]


Epoch: 01 | Time: 0m 34s
	Train Loss: 3.880 | Train PPL:  48.434
	 Val. Loss: 3.800 |  Val. PPL:  44.693


Training: 100%|██████████| 4750/4750 [00:35<00:00, 135.48it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 527.36it/s]


Epoch: 02 | Time: 0m 35s
	Train Loss: 3.361 | Train PPL:  28.829
	 Val. Loss: 3.971 |  Val. PPL:  53.025


Training: 100%|██████████| 4750/4750 [00:29<00:00, 159.66it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 565.90it/s]


Epoch: 03 | Time: 0m 30s
	Train Loss: 3.230 | Train PPL:  25.287
	 Val. Loss: 4.140 |  Val. PPL:  62.779


Training: 100%|██████████| 4750/4750 [00:32<00:00, 148.33it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 497.37it/s]


Epoch: 04 | Time: 0m 32s
	Train Loss: 3.134 | Train PPL:  22.955
	 Val. Loss: 4.267 |  Val. PPL:  71.340
Repeated slow change in validation loss for 3 times.
Early stopping at epoch 04 ...


Training: 100%|██████████| 4750/4750 [00:28<00:00, 164.74it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 746.55it/s]


Epoch: 01 | Time: 0m 29s
	Train Loss: 3.889 | Train PPL:  48.884
	 Val. Loss: 3.949 |  Val. PPL:  51.894


Training: 100%|██████████| 4750/4750 [00:32<00:00, 146.73it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 718.10it/s]


Epoch: 02 | Time: 0m 32s
	Train Loss: 3.367 | Train PPL:  28.986
	 Val. Loss: 4.140 |  Val. PPL:  62.790


Training: 100%|██████████| 4750/4750 [00:32<00:00, 147.20it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 754.05it/s]


Epoch: 03 | Time: 0m 32s
	Train Loss: 3.230 | Train PPL:  25.268
	 Val. Loss: 4.347 |  Val. PPL:  77.283


Training: 100%|██████████| 4750/4750 [00:32<00:00, 144.62it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 743.56it/s]


Epoch: 04 | Time: 0m 33s
	Train Loss: 3.130 | Train PPL:  22.869
	 Val. Loss: 4.527 |  Val. PPL:  92.503
Repeated slow change in validation loss for 3 times.
Early stopping at epoch 04 ...


Training: 100%|██████████| 4750/4750 [00:32<00:00, 146.88it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 584.96it/s]


Epoch: 01 | Time: 0m 32s
	Train Loss: 3.881 | Train PPL:  48.461
	 Val. Loss: 4.041 |  Val. PPL:  56.887


Training: 100%|██████████| 4750/4750 [00:28<00:00, 165.51it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 597.24it/s]


Epoch: 02 | Time: 0m 29s
	Train Loss: 3.343 | Train PPL:  28.314
	 Val. Loss: 4.173 |  Val. PPL:  64.931


Training: 100%|██████████| 4750/4750 [00:31<00:00, 152.56it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 578.06it/s]


Epoch: 03 | Time: 0m 31s
	Train Loss: 3.218 | Train PPL:  24.985
	 Val. Loss: 4.264 |  Val. PPL:  71.074


Training: 100%|██████████| 4750/4750 [00:32<00:00, 145.34it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 569.33it/s]


Epoch: 04 | Time: 0m 33s
	Train Loss: 3.124 | Train PPL:  22.726
	 Val. Loss: 4.428 |  Val. PPL:  83.791
Repeated slow change in validation loss for 3 times.
Early stopping at epoch 04 ...


Training: 100%|██████████| 4750/4750 [00:32<00:00, 146.46it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 437.09it/s]


Epoch: 01 | Time: 0m 33s
	Train Loss: 3.861 | Train PPL:  47.525
	 Val. Loss: 4.198 |  Val. PPL:  66.578


Training: 100%|██████████| 4750/4750 [00:31<00:00, 152.52it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 444.26it/s]


Epoch: 02 | Time: 0m 31s
	Train Loss: 3.316 | Train PPL:  27.560
	 Val. Loss: 4.563 |  Val. PPL:  95.914


Training: 100%|██████████| 4750/4750 [00:31<00:00, 149.44it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 496.13it/s]


Epoch: 03 | Time: 0m 32s
	Train Loss: 3.196 | Train PPL:  24.425
	 Val. Loss: 4.805 |  Val. PPL: 122.090


Training: 100%|██████████| 4750/4750 [00:32<00:00, 145.97it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 450.57it/s]


Epoch: 04 | Time: 0m 33s
	Train Loss: 3.107 | Train PPL:  22.349
	 Val. Loss: 4.939 |  Val. PPL: 139.695
Repeated slow change in validation loss for 3 times.
Early stopping at epoch 04 ...


Training: 100%|██████████| 4750/4750 [00:31<00:00, 150.79it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 657.01it/s]


Epoch: 01 | Time: 0m 31s
	Train Loss: 3.889 | Train PPL:  48.845
	 Val. Loss: 4.072 |  Val. PPL:  58.680


Training: 100%|██████████| 4750/4750 [00:30<00:00, 154.32it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 732.31it/s]


Epoch: 02 | Time: 0m 31s
	Train Loss: 3.345 | Train PPL:  28.368
	 Val. Loss: 4.344 |  Val. PPL:  77.032


Training: 100%|██████████| 4750/4750 [00:32<00:00, 145.07it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 769.59it/s]


Epoch: 03 | Time: 0m 33s
	Train Loss: 3.228 | Train PPL:  25.233
	 Val. Loss: 4.626 |  Val. PPL: 102.071


Training: 100%|██████████| 4750/4750 [00:32<00:00, 144.09it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 671.96it/s]


Epoch: 04 | Time: 0m 33s
	Train Loss: 3.143 | Train PPL:  23.179
	 Val. Loss: 4.819 |  Val. PPL: 123.798
Repeated slow change in validation loss for 3 times.
Early stopping at epoch 04 ...


Training: 100%|██████████| 4750/4750 [00:32<00:00, 144.73it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 510.53it/s]


Epoch: 01 | Time: 0m 33s
	Train Loss: 3.877 | Train PPL:  48.289
	 Val. Loss: 3.824 |  Val. PPL:  45.782


Training: 100%|██████████| 4750/4750 [00:33<00:00, 142.55it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 502.21it/s]


Epoch: 02 | Time: 0m 33s
	Train Loss: 3.364 | Train PPL:  28.911
	 Val. Loss: 4.070 |  Val. PPL:  58.540


Training: 100%|██████████| 4750/4750 [00:31<00:00, 151.20it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 459.51it/s]


Epoch: 03 | Time: 0m 31s
	Train Loss: 3.239 | Train PPL:  25.503
	 Val. Loss: 4.294 |  Val. PPL:  73.293


Training: 100%|██████████| 4750/4750 [00:34<00:00, 135.95it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 522.18it/s]


Epoch: 04 | Time: 0m 35s
	Train Loss: 3.143 | Train PPL:  23.176
	 Val. Loss: 4.486 |  Val. PPL:  88.774
Repeated slow change in validation loss for 3 times.
Early stopping at epoch 04 ...


Training: 100%|██████████| 4750/4750 [00:36<00:00, 130.35it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 692.13it/s]


Epoch: 01 | Time: 0m 36s
	Train Loss: 3.845 | Train PPL:  46.777
	 Val. Loss: 4.038 |  Val. PPL:  56.719


Training: 100%|██████████| 4750/4750 [00:34<00:00, 136.75it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 723.21it/s]


Epoch: 02 | Time: 0m 35s
	Train Loss: 3.349 | Train PPL:  28.471
	 Val. Loss: 4.414 |  Val. PPL:  82.567


Training: 100%|██████████| 4750/4750 [00:34<00:00, 135.72it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 630.90it/s]


Epoch: 03 | Time: 0m 35s
	Train Loss: 3.219 | Train PPL:  25.003
	 Val. Loss: 4.600 |  Val. PPL:  99.464


Training: 100%|██████████| 4750/4750 [00:36<00:00, 130.58it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 746.38it/s]


Epoch: 04 | Time: 0m 36s
	Train Loss: 3.134 | Train PPL:  22.967
	 Val. Loss: 4.677 |  Val. PPL: 107.430
Repeated slow change in validation loss for 3 times.
Early stopping at epoch 04 ...


Training: 100%|██████████| 4750/4750 [00:34<00:00, 136.84it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 507.13it/s]


Epoch: 01 | Time: 0m 35s
	Train Loss: 3.873 | Train PPL:  48.089
	 Val. Loss: 3.942 |  Val. PPL:  51.524


Training: 100%|██████████| 4750/4750 [00:35<00:00, 133.88it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 523.95it/s]


Epoch: 02 | Time: 0m 35s
	Train Loss: 3.338 | Train PPL:  28.175
	 Val. Loss: 4.100 |  Val. PPL:  60.338


Training: 100%|██████████| 4750/4750 [00:35<00:00, 134.52it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 521.56it/s]


Epoch: 03 | Time: 0m 35s
	Train Loss: 3.227 | Train PPL:  25.215
	 Val. Loss: 4.348 |  Val. PPL:  77.340


Training: 100%|██████████| 4750/4750 [00:36<00:00, 130.49it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 429.84it/s]


Epoch: 04 | Time: 0m 36s
	Train Loss: 3.124 | Train PPL:  22.731
	 Val. Loss: 4.470 |  Val. PPL:  87.352
Repeated slow change in validation loss for 3 times.
Early stopping at epoch 04 ...


Training: 100%|██████████| 4750/4750 [00:35<00:00, 135.48it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 653.93it/s]


Epoch: 01 | Time: 0m 35s
	Train Loss: 3.887 | Train PPL:  48.749
	 Val. Loss: 3.694 |  Val. PPL:  40.194


Training: 100%|██████████| 4750/4750 [00:35<00:00, 132.96it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 726.80it/s]


Epoch: 02 | Time: 0m 36s
	Train Loss: 3.338 | Train PPL:  28.173
	 Val. Loss: 3.896 |  Val. PPL:  49.220


Training: 100%|██████████| 4750/4750 [00:35<00:00, 133.63it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 596.70it/s]


Epoch: 03 | Time: 0m 35s
	Train Loss: 3.206 | Train PPL:  24.680
	 Val. Loss: 4.024 |  Val. PPL:  55.928


Training: 100%|██████████| 4750/4750 [00:35<00:00, 134.98it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 726.26it/s]


Epoch: 04 | Time: 0m 35s
	Train Loss: 3.126 | Train PPL:  22.782
	 Val. Loss: 4.161 |  Val. PPL:  64.124
Repeated slow change in validation loss for 3 times.
Early stopping at epoch 04 ...


Training: 100%|██████████| 4750/4750 [00:35<00:00, 132.14it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 644.47it/s]


Epoch: 01 | Time: 0m 36s
	Train Loss: 3.849 | Train PPL:  46.947
	 Val. Loss: 3.858 |  Val. PPL:  47.389


Training: 100%|██████████| 4750/4750 [00:35<00:00, 133.90it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 676.08it/s]


Epoch: 02 | Time: 0m 35s
	Train Loss: 3.338 | Train PPL:  28.161
	 Val. Loss: 4.125 |  Val. PPL:  61.850


Training: 100%|██████████| 4750/4750 [00:36<00:00, 128.98it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 690.72it/s]


Epoch: 03 | Time: 0m 37s
	Train Loss: 3.231 | Train PPL:  25.310
	 Val. Loss: 4.218 |  Val. PPL:  67.906


Training: 100%|██████████| 4750/4750 [00:34<00:00, 135.97it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 596.75it/s]


Epoch: 04 | Time: 0m 35s
	Train Loss: 3.136 | Train PPL:  23.003
	 Val. Loss: 4.307 |  Val. PPL:  74.222
Repeated slow change in validation loss for 3 times.
Early stopping at epoch 04 ...


Training: 100%|██████████| 4750/4750 [00:34<00:00, 136.99it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 761.44it/s]


Epoch: 01 | Time: 0m 35s
	Train Loss: 3.886 | Train PPL:  48.719
	 Val. Loss: 3.599 |  Val. PPL:  36.554


Training: 100%|██████████| 4750/4750 [00:35<00:00, 134.04it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 787.58it/s]


Epoch: 02 | Time: 0m 35s
	Train Loss: 3.324 | Train PPL:  27.766
	 Val. Loss: 3.871 |  Val. PPL:  47.994


Training: 100%|██████████| 4750/4750 [00:34<00:00, 137.56it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 742.84it/s]


Epoch: 03 | Time: 0m 34s
	Train Loss: 3.214 | Train PPL:  24.876
	 Val. Loss: 4.059 |  Val. PPL:  57.906


Training: 100%|██████████| 4750/4750 [00:35<00:00, 135.56it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 698.90it/s]


Epoch: 04 | Time: 0m 35s
	Train Loss: 3.130 | Train PPL:  22.867
	 Val. Loss: 4.189 |  Val. PPL:  65.928
Repeated slow change in validation loss for 3 times.
Early stopping at epoch 04 ...


Training: 100%|██████████| 4750/4750 [00:35<00:00, 134.10it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 774.59it/s]


Epoch: 01 | Time: 0m 35s
	Train Loss: 3.880 | Train PPL:  48.435
	 Val. Loss: 3.983 |  Val. PPL:  53.674


Training: 100%|██████████| 4750/4750 [00:35<00:00, 133.37it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 756.16it/s]


Epoch: 02 | Time: 0m 35s
	Train Loss: 3.331 | Train PPL:  27.973
	 Val. Loss: 4.226 |  Val. PPL:  68.457


Training: 100%|██████████| 4750/4750 [00:33<00:00, 139.85it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 722.75it/s]


Epoch: 03 | Time: 0m 34s
	Train Loss: 3.214 | Train PPL:  24.875
	 Val. Loss: 4.475 |  Val. PPL:  87.793


Training: 100%|██████████| 4750/4750 [00:35<00:00, 135.15it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 770.70it/s]


Epoch: 04 | Time: 0m 35s
	Train Loss: 3.127 | Train PPL:  22.800
	 Val. Loss: 4.547 |  Val. PPL:  94.320
Repeated slow change in validation loss for 3 times.
Early stopping at epoch 04 ...


Training: 100%|██████████| 4750/4750 [00:35<00:00, 134.40it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 734.10it/s]


Epoch: 01 | Time: 0m 35s
	Train Loss: 3.895 | Train PPL:  49.142
	 Val. Loss: 3.933 |  Val. PPL:  51.036


Training: 100%|██████████| 4750/4750 [00:35<00:00, 134.28it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 655.16it/s]


Epoch: 02 | Time: 0m 35s
	Train Loss: 3.370 | Train PPL:  29.071
	 Val. Loss: 4.191 |  Val. PPL:  66.102


Training: 100%|██████████| 4750/4750 [00:34<00:00, 138.91it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 755.69it/s]


Epoch: 03 | Time: 0m 34s
	Train Loss: 3.245 | Train PPL:  25.671
	 Val. Loss: 4.404 |  Val. PPL:  81.762


Training: 100%|██████████| 4750/4750 [00:31<00:00, 149.08it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 788.81it/s]


Epoch: 04 | Time: 0m 32s
	Train Loss: 3.157 | Train PPL:  23.498
	 Val. Loss: 4.568 |  Val. PPL:  96.358
Repeated slow change in validation loss for 3 times.
Early stopping at epoch 04 ...


Training: 100%|██████████| 4750/4750 [00:29<00:00, 160.30it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 727.15it/s]


Epoch: 01 | Time: 0m 30s
	Train Loss: 3.885 | Train PPL:  48.651
	 Val. Loss: 3.889 |  Val. PPL:  48.863


Training: 100%|██████████| 4750/4750 [00:31<00:00, 152.93it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 554.18it/s]


Epoch: 02 | Time: 0m 31s
	Train Loss: 3.322 | Train PPL:  27.706
	 Val. Loss: 4.200 |  Val. PPL:  66.656


Training: 100%|██████████| 4750/4750 [00:30<00:00, 157.88it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 643.35it/s]


Epoch: 03 | Time: 0m 30s
	Train Loss: 3.195 | Train PPL:  24.408
	 Val. Loss: 4.384 |  Val. PPL:  80.148


Training: 100%|██████████| 4750/4750 [00:32<00:00, 145.65it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 652.19it/s]


Epoch: 04 | Time: 0m 33s
	Train Loss: 3.105 | Train PPL:  22.307
	 Val. Loss: 4.506 |  Val. PPL:  90.531
Repeated slow change in validation loss for 3 times.
Early stopping at epoch 04 ...


Training: 100%|██████████| 4750/4750 [00:30<00:00, 158.32it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 605.27it/s]


Epoch: 01 | Time: 0m 30s
	Train Loss: 3.870 | Train PPL:  47.959
	 Val. Loss: 3.641 |  Val. PPL:  38.146


Training: 100%|██████████| 4750/4750 [00:34<00:00, 137.50it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 543.70it/s]


Epoch: 02 | Time: 0m 35s
	Train Loss: 3.310 | Train PPL:  27.378
	 Val. Loss: 3.880 |  Val. PPL:  48.447


Training: 100%|██████████| 4750/4750 [00:34<00:00, 138.72it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 568.87it/s]


Epoch: 03 | Time: 0m 34s
	Train Loss: 3.190 | Train PPL:  24.297
	 Val. Loss: 4.109 |  Val. PPL:  60.890


Training: 100%|██████████| 4750/4750 [00:33<00:00, 143.84it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 587.76it/s]


Epoch: 04 | Time: 0m 33s
	Train Loss: 3.109 | Train PPL:  22.397
	 Val. Loss: 4.284 |  Val. PPL:  72.512
Repeated slow change in validation loss for 3 times.
Early stopping at epoch 04 ...


Training: 100%|██████████| 4750/4750 [00:33<00:00, 142.01it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 721.14it/s]


Epoch: 01 | Time: 0m 33s
	Train Loss: 3.883 | Train PPL:  48.583
	 Val. Loss: 3.952 |  Val. PPL:  52.047


Training: 100%|██████████| 4750/4750 [00:32<00:00, 146.88it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 738.30it/s]


Epoch: 02 | Time: 0m 32s
	Train Loss: 3.377 | Train PPL:  29.297
	 Val. Loss: 4.148 |  Val. PPL:  63.329


Training: 100%|██████████| 4750/4750 [00:31<00:00, 149.63it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 702.82it/s]


Epoch: 03 | Time: 0m 32s
	Train Loss: 3.242 | Train PPL:  25.579
	 Val. Loss: 4.455 |  Val. PPL:  86.045


Training: 100%|██████████| 4750/4750 [00:31<00:00, 149.70it/s]
Evaluation: 100%|██████████| 250/250 [00:00<00:00, 688.10it/s]

Epoch: 04 | Time: 0m 32s
	Train Loss: 3.134 | Train PPL:  22.967
	 Val. Loss: 4.734 |  Val. PPL: 113.707
Repeated slow change in validation loss for 3 times.
Early stopping at epoch 04 ...





In [45]:
torch.save(model.state_dict(), f'seq2seq_adam_stemmer_brown_embedding.pt')

In [58]:
# Compute the mean and standard deviation of the accuracy across folds
mean_accuracy = sum(fold_metrics) / len(fold_metrics)
std_accuracy = (sum((x - mean_accuracy) ** 2 for x in fold_metrics) / len(fold_metrics)) ** 0.5

print(f"Mean Accuracy: {mean_accuracy:.2f}")
print(f"Standard Deviation: {std_accuracy:.2f}")
print(f"All folds: {fold_metrics}")

Mean Accuracy: 4.32
Standard Deviation: 0.77
All folds: [1.0730895975941963, 4.346844787485898, 4.2826411010492595, 4.712956026250031, 4.267454868079163, 4.527238293468952, 4.428322538740002, 4.939457928681746, 4.818652681002393, 4.486097786943429, 4.676842145197093, 4.4699497792646286, 4.16081991638802, 4.307067255090922, 4.1885694899829105, 4.546688701670617, 4.5680717931110415, 4.505694039381109, 4.283751132718288, 4.7336242926204575]


In [62]:
enc = Encoder(INPUT_DIM, ENC_EMB_DIM, HID_DIM, N_LAYERS, ENC_DROPOUT, RNN_DROPOUT, weights_matrix)
dec = Decoder(OUTPUT_DIM, DEC_EMB_DIM, HID_DIM, N_LAYERS, DEC_DROPOUT, RNN_DROPOUT)

model = Seq2Seq(enc, dec, device).to(device)
model.load_state_dict(torch.load('seq2seq_fold_01.pt'))

<All keys matched successfully>

In [64]:
print("Type 'exit' to finish the chat.\n", "-"*30, '\n')
while (True):
    src = input("> ")
    if src.strip() == "exit":
        break
    chat(src, trg_data_vocab, model, 10)
    
# to whom did the virgin mary allegedly appear in 1858 in lourdes france

Type 'exit' to finish the chat.
 ------------------------------ 



>  what is the grotto at notre-dame?


[1, 4, 128, 2]
< and colleg 



>  exit
