# LSTM Language Models

You guys probably very excited about ChatGPT.  In today class, we will be implementing a very simple language model, which is basically what ChatGPT is, but with a simple LSTM.  You will be surprised that it is not so difficult at all.

Paper that we base on is *Regularizing and Optimizing LSTM Language Models*, https://arxiv.org/abs/1708.02182

In [17]:
!pip uninstall -y torchtext torch

[0m

In [2]:
!pip install torch==2.2.0 torchtext==0.17.0

Collecting torch==2.2.0
  Downloading torch-2.2.0-cp311-cp311-manylinux1_x86_64.whl.metadata (25 kB)
Collecting torchtext==0.17.0
  Downloading torchtext-0.17.0-cp311-cp311-manylinux1_x86_64.whl.metadata (7.6 kB)
Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch==2.2.0)
  Downloading nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-nccl-cu12==2.19.3 (from torch==2.2.0)
  Downloading nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl.metadata (1.8 kB)
Collecting triton==2.2.0 (from torch==2.2.0)
  Downloading triton-2.2.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.4 kB)
Collecting torchdata==0.7.1 (from torchtext==0.17.0)
  Downloading torchdata-0.7.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (13 kB)
Downloading torch-2.2.0-cp311-cp311-manylinux1_x86_64.whl (755.5 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m755.5/755.5 MB[0m [31m2.4 MB/s[0m eta [36m0:00:00[0

In [1]:
!pip show torch
!pip show torchtext

Name: torch
Version: 2.2.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /usr/local/lib/python3.11/dist-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-nccl-cu12, nvidia-nvtx-cu12, sympy, triton, typing-extensions
Required-by: accelerate, fastai, peft, sentence-transformers, timm, torchaudio, torchdata, torchtext, torchvision
Name: torchtext
Version: 0.17.0
Summary: Text utilities, models, transforms, and datasets for PyTorch.
Home-page: https://github.com/pytorch/text
Author: PyTorch Text Team
Author-email: packages@pytorch.org
License: BSD
Location: /usr/local/lib/python3.11/dist-packages
Requires: numpy, requests, torch, torchda

In [3]:
!pip install datasets

Collecting datasets
  Downloading datasets-3.2.0-py3-none-any.whl.metadata (20 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess<0.70.17 (from datasets)
  Downloading multiprocess-0.70.16-py311-none-any.whl.metadata (7.2 kB)
Collecting fsspec<=2024.9.0,>=2023.1.0 (from fsspec[http]<=2024.9.0,>=2023.1.0->datasets)
  Downloading fsspec-2024.9.0-py3-none-any.whl.metadata (11 kB)
Downloading datasets-3.2.0-py3-none-any.whl (480 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m480.6/480.6 kB[0m [31m32.4 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading dill-0.3.8-py3-none-any.whl (116 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m116.3/116.3 kB[0m [31m11.3 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading fsspec-2024.9.0-py3-none-any.whl 

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim

import torchtext, datasets, math
from tqdm import tqdm

In [2]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

cuda


In [3]:
SEED = 1234
torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

## 1. Load data - Star Wars Dataset

We will be using wikitext which contains a large corpus of text, perfect for language modeling task.  This time, we will use the `datasets` library from HuggingFace to load.

In [5]:
from google.colab import userdata

userdata.get('myamjechal-hf')

dataset = datasets.load_dataset('myamjechal/star-wars-dataset')

In [6]:
print(dataset)

DatasetDict({
    train: Dataset({
        features: ['text'],
        num_rows: 29529
    })
    validation: Dataset({
        features: ['text'],
        num_rows: 3282
    })
    test: Dataset({
        features: ['text'],
        num_rows: 3646
    })
})


In [7]:
print(dataset['train'].shape)

(29529, 1)


## 2. Preprocessing

### Tokenizing

Simply tokenize the given text to tokens.

In [8]:
tokenizer = torchtext.data.utils.get_tokenizer('basic_english')

tokenize_data = lambda example, tokenizer: {'tokens': tokenizer(example['text'])}

tokenized_dataset = dataset.map(tokenize_data, remove_columns=['text'], fn_kwargs={'tokenizer': tokenizer})

Map:   0%|          | 0/29529 [00:00<?, ? examples/s]

Map:   0%|          | 0/3282 [00:00<?, ? examples/s]

Map:   0%|          | 0/3646 [00:00<?, ? examples/s]

In [9]:
print(tokenized_dataset['train'][223]['tokens'])

['yoda']


### Numericalizing

We will tell torchtext to add any word that has occurred at least three times in the dataset to the vocabulary because otherwise it would be too big.  Also we shall make sure to add `unk` and `eos`.

In [10]:
vocab = torchtext.vocab.build_vocab_from_iterator(tokenized_dataset['train']['tokens'], min_freq=3)
vocab.insert_token('<unk>', 0)
vocab.insert_token('<eos>', 1)
vocab.set_default_index(vocab['<unk>'])

In [11]:
print(len(vocab))

5835


In [12]:
print(vocab.get_itos()[:10])

['<unk>', '<eos>', '.', 'the', ',', 'and', 'a', 'to', 'of', 'is']


## 3. Prepare the batch loader

### Prepare data

Given "Chaky loves eating at AIT", and "I really love deep learning", and given batch size = 3, we will get three batches of data "Chaky loves eating at", "AIT `<eos>` I really", "love deep learning `<eos>`".  

In [13]:
def get_data(dataset, vocab, batch_size):
    data = []
    for example in dataset:
        if example['tokens']:
            tokens = example['tokens'].append('<eos>')
            tokens = [vocab[token] for token in example['tokens']]
            data.extend(tokens)
    data = torch.LongTensor(data)
    num_batches = data.shape[0] // batch_size
    data = data[:num_batches * batch_size]
    data = data.view(batch_size, num_batches) #view vs. reshape (whether data is contiguous)
    return data #[batch size, seq len]

In [14]:
batch_size = 128
train_data = get_data(tokenized_dataset['train'], vocab, batch_size)
valid_data = get_data(tokenized_dataset['validation'], vocab, batch_size)
test_data  = get_data(tokenized_dataset['test'],  vocab, batch_size)

In [15]:
train_data.shape

torch.Size([128, 2451])

## 4. Modeling

<img src="https://github.com/MyaMjechal/nlp-a2-language-model/blob/main/figures/LM.png?raw=1" width=600>

In [16]:
class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size, emb_dim, hid_dim, num_layers, dropout_rate):
        super().__init__()
        self.num_layers = num_layers
        self.hid_dim    = hid_dim
        self.emb_dim    = emb_dim

        self.embedding  = nn.Embedding(vocab_size, emb_dim)
        self.lstm       = nn.LSTM(emb_dim, hid_dim, num_layers=num_layers, dropout=dropout_rate, batch_first=True)
        self.dropout    = nn.Dropout(dropout_rate)
        self.fc         = nn.Linear(hid_dim, vocab_size)

        self.init_weights()

    def init_weights(self):
        init_range_emb = 0.1
        init_range_other = 1/math.sqrt(self.hid_dim)
        self.embedding.weight.data.uniform_(-init_range_emb, init_range_other)
        self.fc.weight.data.uniform_(-init_range_other, init_range_other)
        self.fc.bias.data.zero_()
        for i in range(self.num_layers):
            self.lstm.all_weights[i][0] = torch.FloatTensor(self.emb_dim,
                self.hid_dim).uniform_(-init_range_other, init_range_other) #We
            self.lstm.all_weights[i][1] = torch.FloatTensor(self.hid_dim,
                self.hid_dim).uniform_(-init_range_other, init_range_other) #Wh

    def init_hidden(self, batch_size, device):
        hidden = torch.zeros(self.num_layers, batch_size, self.hid_dim).to(device)
        cell   = torch.zeros(self.num_layers, batch_size, self.hid_dim).to(device)
        return hidden, cell

    def detach_hidden(self, hidden):
        hidden, cell = hidden
        hidden = hidden.detach() #not to be used for gradient computation
        cell   = cell.detach()
        return hidden, cell

    def forward(self, src, hidden):
        #src: [batch_size, seq len]
        embedding = self.dropout(self.embedding(src)) #harry potter is
        #embedding: [batch-size, seq len, emb dim]
        output, hidden = self.lstm(embedding, hidden)
        #ouput: [batch size, seq len, hid dim]
        #hidden: [num_layers * direction, seq len, hid_dim]
        output = self.dropout(output)
        prediction =self.fc(output)
        #prediction: [batch_size, seq_len, vocab_size]
        return prediction, hidden

## 5. Training

Follows very basic procedure.  One note is that some of the sequences that will be fed to the model may involve parts from different sequences in the original dataset or be a subset of one (depending on the decoding length). For this reason we will reset the hidden state every epoch, this is like assuming that the next batch of sequences is probably always a follow up on the previous in the original dataset.

In [17]:
vocab_size = len(vocab)
emb_dim = 1024                # 400 in the paper
hid_dim = 1024                # 1150 in the paper
num_layers = 2                # 3 in the paper
dropout_rate = 0.65
lr = 1e-3

In [18]:
model      = LSTMLanguageModel(vocab_size, emb_dim, hid_dim, num_layers, dropout_rate).to(device)
optimizer  = optim.Adam(model.parameters(), lr=lr)
criterion  = nn.CrossEntropyLoss()
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f'The model has {num_params:,} trainable parameters')

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.


0it [00:00, ?it/s]

The model has 28,749,515 trainable parameters


In [19]:
def get_batch(data, seq_len, idx):
    #data #[batch size, bunch of tokens]
    src    = data[:, idx:idx+seq_len]
    target = data[:, idx+1:idx+seq_len+1]  #target simply is ahead of src by 1
    return src, target

In [20]:
def train(model, data, optimizer, criterion, batch_size, seq_len, clip, device):

    epoch_loss = 0
    model.train()
    # drop all batches that are not a multiple of seq_len
    # data #[batch size, seq len]
    num_batches = data.shape[-1]
    data = data[:, :num_batches - (num_batches -1) % seq_len]  #we need to -1 because we start at 0
    num_batches = data.shape[-1]

    #reset the hidden every epoch
    hidden = model.init_hidden(batch_size, device)

    for idx in tqdm(range(0, num_batches - 1, seq_len), desc='Training: ',leave=False):
        optimizer.zero_grad()

        #hidden does not need to be in the computational graph for efficiency
        hidden = model.detach_hidden(hidden)

        src, target = get_batch(data, seq_len, idx) #src, target: [batch size, seq len]
        src, target = src.to(device), target.to(device)
        batch_size = src.shape[0]
        prediction, hidden = model(src, hidden)

        #need to reshape because criterion expects pred to be 2d and target to be 1d
        prediction = prediction.reshape(batch_size * seq_len, -1)  #prediction: [batch size * seq len, vocab size]
        target = target.reshape(-1)
        loss = criterion(prediction, target)

        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
        optimizer.step()
        epoch_loss += loss.item() * seq_len
    return epoch_loss / num_batches

In [21]:
def evaluate(model, data, criterion, batch_size, seq_len, device):

    epoch_loss = 0
    model.eval()
    num_batches = data.shape[-1]
    data = data[:, :num_batches - (num_batches -1) % seq_len]
    num_batches = data.shape[-1]

    hidden = model.init_hidden(batch_size, device)

    with torch.no_grad():
        for idx in range(0, num_batches - 1, seq_len):
            hidden = model.detach_hidden(hidden)
            src, target = get_batch(data, seq_len, idx)
            src, target = src.to(device), target.to(device)
            batch_size= src.shape[0]

            prediction, hidden = model(src, hidden)
            prediction = prediction.reshape(batch_size * seq_len, -1)
            target = target.reshape(-1)

            loss = criterion(prediction, target)
            epoch_loss += loss.item() * seq_len
    return epoch_loss / num_batches

Here we will be using a `ReduceLROnPlateau` learning scheduler which decreases the learning rate by a factor, if the loss don't improve by a certain epoch.

In [22]:
n_epochs = 500
seq_len  = 50 #<----decoding length
clip    = 0.25

lr_scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=0)

best_valid_loss = float('inf')

for epoch in range(n_epochs):
    train_loss = train(model, train_data, optimizer, criterion,
                batch_size, seq_len, clip, device)
    valid_loss = evaluate(model, valid_data, criterion, batch_size,
                seq_len, device)

    lr_scheduler.step(valid_loss)

    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'best-val-lstm_lm.pt')

    print(f'\tTrain Perplexity: {math.exp(train_loss):.3f}')
    print(f'\tValid Perplexity: {math.exp(valid_loss):.3f}')



	Train Perplexity: 505.565
	Valid Perplexity: 318.548




	Train Perplexity: 331.754
	Valid Perplexity: 235.359




	Train Perplexity: 237.145
	Valid Perplexity: 178.652




	Train Perplexity: 187.548
	Valid Perplexity: 146.012




	Train Perplexity: 153.295
	Valid Perplexity: 123.332




	Train Perplexity: 131.621
	Valid Perplexity: 110.165




	Train Perplexity: 116.358
	Valid Perplexity: 102.018




	Train Perplexity: 105.605
	Valid Perplexity: 96.323




	Train Perplexity: 97.313
	Valid Perplexity: 91.551




	Train Perplexity: 90.559
	Valid Perplexity: 87.921




	Train Perplexity: 85.025
	Valid Perplexity: 85.097




	Train Perplexity: 80.182
	Valid Perplexity: 82.940




	Train Perplexity: 76.124
	Valid Perplexity: 80.848




	Train Perplexity: 72.614
	Valid Perplexity: 79.187




	Train Perplexity: 69.357
	Valid Perplexity: 77.611




	Train Perplexity: 66.428
	Valid Perplexity: 77.034




	Train Perplexity: 63.673
	Valid Perplexity: 75.870




	Train Perplexity: 61.290
	Valid Perplexity: 75.027




	Train Perplexity: 58.989
	Valid Perplexity: 74.537




	Train Perplexity: 56.876
	Valid Perplexity: 73.572




	Train Perplexity: 54.938
	Valid Perplexity: 73.565




	Train Perplexity: 52.116
	Valid Perplexity: 72.406




	Train Perplexity: 50.628
	Valid Perplexity: 72.262




	Train Perplexity: 49.578
	Valid Perplexity: 72.109




	Train Perplexity: 48.526
	Valid Perplexity: 72.052




	Train Perplexity: 47.657
	Valid Perplexity: 71.893




	Train Perplexity: 46.726
	Valid Perplexity: 71.782




	Train Perplexity: 45.919
	Valid Perplexity: 71.744




	Train Perplexity: 45.010
	Valid Perplexity: 71.714




	Train Perplexity: 43.850
	Valid Perplexity: 71.981




	Train Perplexity: 43.082
	Valid Perplexity: 71.964




	Train Perplexity: 42.539
	Valid Perplexity: 71.958




	Train Perplexity: 42.466
	Valid Perplexity: 71.984




	Train Perplexity: 42.285
	Valid Perplexity: 71.919




	Train Perplexity: 42.240
	Valid Perplexity: 71.913




	Train Perplexity: 42.202
	Valid Perplexity: 71.918




	Train Perplexity: 42.229
	Valid Perplexity: 71.924




	Train Perplexity: 42.216
	Valid Perplexity: 71.924




	Train Perplexity: 42.155
	Valid Perplexity: 71.924




	Train Perplexity: 42.155
	Valid Perplexity: 71.923




	Train Perplexity: 42.149
	Valid Perplexity: 71.922




	Train Perplexity: 42.197
	Valid Perplexity: 71.922




	Train Perplexity: 42.162
	Valid Perplexity: 71.922




	Train Perplexity: 42.163
	Valid Perplexity: 71.922




	Train Perplexity: 42.151
	Valid Perplexity: 71.922




	Train Perplexity: 42.169
	Valid Perplexity: 71.922




	Train Perplexity: 42.137
	Valid Perplexity: 71.922




	Train Perplexity: 42.161
	Valid Perplexity: 71.922




	Train Perplexity: 42.209
	Valid Perplexity: 71.922




	Train Perplexity: 42.229
	Valid Perplexity: 71.922




	Train Perplexity: 42.196
	Valid Perplexity: 71.922




	Train Perplexity: 42.211
	Valid Perplexity: 71.922




	Train Perplexity: 42.113
	Valid Perplexity: 71.922




	Train Perplexity: 42.187
	Valid Perplexity: 71.922




	Train Perplexity: 42.111
	Valid Perplexity: 71.922




	Train Perplexity: 42.075
	Valid Perplexity: 71.922




	Train Perplexity: 42.162
	Valid Perplexity: 71.922




	Train Perplexity: 42.172
	Valid Perplexity: 71.922




	Train Perplexity: 42.249
	Valid Perplexity: 71.922




	Train Perplexity: 42.233
	Valid Perplexity: 71.922




	Train Perplexity: 42.180
	Valid Perplexity: 71.922




	Train Perplexity: 42.213
	Valid Perplexity: 71.922




	Train Perplexity: 42.145
	Valid Perplexity: 71.922




	Train Perplexity: 42.230
	Valid Perplexity: 71.922




	Train Perplexity: 42.190
	Valid Perplexity: 71.922




	Train Perplexity: 42.085
	Valid Perplexity: 71.922




	Train Perplexity: 42.242
	Valid Perplexity: 71.922




	Train Perplexity: 42.201
	Valid Perplexity: 71.922




	Train Perplexity: 42.210
	Valid Perplexity: 71.922




	Train Perplexity: 42.136
	Valid Perplexity: 71.922




	Train Perplexity: 42.148
	Valid Perplexity: 71.922




	Train Perplexity: 42.190
	Valid Perplexity: 71.922




	Train Perplexity: 42.224
	Valid Perplexity: 71.922




	Train Perplexity: 42.239
	Valid Perplexity: 71.922




	Train Perplexity: 42.118
	Valid Perplexity: 71.922




	Train Perplexity: 42.086
	Valid Perplexity: 71.922




	Train Perplexity: 42.131
	Valid Perplexity: 71.922




	Train Perplexity: 42.194
	Valid Perplexity: 71.922




	Train Perplexity: 42.167
	Valid Perplexity: 71.922




	Train Perplexity: 42.167
	Valid Perplexity: 71.922




	Train Perplexity: 42.095
	Valid Perplexity: 71.922




	Train Perplexity: 42.128
	Valid Perplexity: 71.922




	Train Perplexity: 42.186
	Valid Perplexity: 71.922




	Train Perplexity: 42.168
	Valid Perplexity: 71.922




	Train Perplexity: 42.187
	Valid Perplexity: 71.922




	Train Perplexity: 42.119
	Valid Perplexity: 71.922




	Train Perplexity: 42.200
	Valid Perplexity: 71.922




	Train Perplexity: 42.156
	Valid Perplexity: 71.922




	Train Perplexity: 42.158
	Valid Perplexity: 71.922




	Train Perplexity: 42.149
	Valid Perplexity: 71.922




	Train Perplexity: 42.203
	Valid Perplexity: 71.922




	Train Perplexity: 42.130
	Valid Perplexity: 71.922




	Train Perplexity: 42.105
	Valid Perplexity: 71.922




	Train Perplexity: 42.116
	Valid Perplexity: 71.922




	Train Perplexity: 42.193
	Valid Perplexity: 71.922




	Train Perplexity: 42.306
	Valid Perplexity: 71.922




	Train Perplexity: 42.195
	Valid Perplexity: 71.922




	Train Perplexity: 42.123
	Valid Perplexity: 71.922




	Train Perplexity: 42.250
	Valid Perplexity: 71.922




	Train Perplexity: 42.163
	Valid Perplexity: 71.922




	Train Perplexity: 42.212
	Valid Perplexity: 71.922




	Train Perplexity: 42.169
	Valid Perplexity: 71.922




	Train Perplexity: 42.114
	Valid Perplexity: 71.922




	Train Perplexity: 42.183
	Valid Perplexity: 71.922




	Train Perplexity: 42.174
	Valid Perplexity: 71.922




	Train Perplexity: 42.106
	Valid Perplexity: 71.922




	Train Perplexity: 42.189
	Valid Perplexity: 71.922




	Train Perplexity: 42.177
	Valid Perplexity: 71.922




	Train Perplexity: 42.170
	Valid Perplexity: 71.922




	Train Perplexity: 42.169
	Valid Perplexity: 71.922




	Train Perplexity: 42.165
	Valid Perplexity: 71.922




	Train Perplexity: 42.129
	Valid Perplexity: 71.922




	Train Perplexity: 42.124
	Valid Perplexity: 71.922




	Train Perplexity: 42.174
	Valid Perplexity: 71.922




	Train Perplexity: 42.164
	Valid Perplexity: 71.922




	Train Perplexity: 42.115
	Valid Perplexity: 71.922




	Train Perplexity: 42.181
	Valid Perplexity: 71.922




	Train Perplexity: 42.097
	Valid Perplexity: 71.922




	Train Perplexity: 42.141
	Valid Perplexity: 71.922




	Train Perplexity: 42.123
	Valid Perplexity: 71.922




	Train Perplexity: 42.153
	Valid Perplexity: 71.922




	Train Perplexity: 42.206
	Valid Perplexity: 71.922




	Train Perplexity: 42.150
	Valid Perplexity: 71.922




	Train Perplexity: 42.112
	Valid Perplexity: 71.922




	Train Perplexity: 42.142
	Valid Perplexity: 71.922




	Train Perplexity: 42.067
	Valid Perplexity: 71.922




	Train Perplexity: 42.171
	Valid Perplexity: 71.922




	Train Perplexity: 42.133
	Valid Perplexity: 71.922




	Train Perplexity: 42.187
	Valid Perplexity: 71.922




	Train Perplexity: 42.166
	Valid Perplexity: 71.922




	Train Perplexity: 42.201
	Valid Perplexity: 71.922




	Train Perplexity: 42.139
	Valid Perplexity: 71.922




	Train Perplexity: 42.201
	Valid Perplexity: 71.922




	Train Perplexity: 42.175
	Valid Perplexity: 71.922




	Train Perplexity: 42.210
	Valid Perplexity: 71.922




	Train Perplexity: 42.163
	Valid Perplexity: 71.922




	Train Perplexity: 42.055
	Valid Perplexity: 71.922




	Train Perplexity: 42.171
	Valid Perplexity: 71.922




	Train Perplexity: 42.127
	Valid Perplexity: 71.922




	Train Perplexity: 42.131
	Valid Perplexity: 71.922




	Train Perplexity: 42.183
	Valid Perplexity: 71.922




	Train Perplexity: 42.206
	Valid Perplexity: 71.922




	Train Perplexity: 42.166
	Valid Perplexity: 71.922




	Train Perplexity: 42.119
	Valid Perplexity: 71.922




	Train Perplexity: 42.260
	Valid Perplexity: 71.922




	Train Perplexity: 42.120
	Valid Perplexity: 71.922




	Train Perplexity: 42.291
	Valid Perplexity: 71.922




	Train Perplexity: 42.208
	Valid Perplexity: 71.922




	Train Perplexity: 42.109
	Valid Perplexity: 71.922




	Train Perplexity: 42.132
	Valid Perplexity: 71.922




	Train Perplexity: 42.147
	Valid Perplexity: 71.922




	Train Perplexity: 42.215
	Valid Perplexity: 71.922




	Train Perplexity: 42.199
	Valid Perplexity: 71.922




	Train Perplexity: 42.216
	Valid Perplexity: 71.922




	Train Perplexity: 42.109
	Valid Perplexity: 71.922




	Train Perplexity: 42.175
	Valid Perplexity: 71.922




	Train Perplexity: 42.127
	Valid Perplexity: 71.922




	Train Perplexity: 42.195
	Valid Perplexity: 71.922




	Train Perplexity: 42.195
	Valid Perplexity: 71.922




	Train Perplexity: 42.165
	Valid Perplexity: 71.922




	Train Perplexity: 42.170
	Valid Perplexity: 71.922




	Train Perplexity: 42.236
	Valid Perplexity: 71.922




	Train Perplexity: 42.142
	Valid Perplexity: 71.922




	Train Perplexity: 42.191
	Valid Perplexity: 71.922




	Train Perplexity: 42.054
	Valid Perplexity: 71.922




	Train Perplexity: 42.216
	Valid Perplexity: 71.922




	Train Perplexity: 42.148
	Valid Perplexity: 71.922




	Train Perplexity: 42.154
	Valid Perplexity: 71.922




	Train Perplexity: 42.116
	Valid Perplexity: 71.922




	Train Perplexity: 42.118
	Valid Perplexity: 71.922




	Train Perplexity: 42.126
	Valid Perplexity: 71.922




	Train Perplexity: 42.258
	Valid Perplexity: 71.922




	Train Perplexity: 42.063
	Valid Perplexity: 71.922




	Train Perplexity: 42.121
	Valid Perplexity: 71.922




	Train Perplexity: 42.205
	Valid Perplexity: 71.922




	Train Perplexity: 42.193
	Valid Perplexity: 71.922




	Train Perplexity: 42.236
	Valid Perplexity: 71.922




	Train Perplexity: 42.092
	Valid Perplexity: 71.922




	Train Perplexity: 42.242
	Valid Perplexity: 71.922




	Train Perplexity: 42.151
	Valid Perplexity: 71.922




	Train Perplexity: 42.148
	Valid Perplexity: 71.922




	Train Perplexity: 42.159
	Valid Perplexity: 71.922




	Train Perplexity: 42.130
	Valid Perplexity: 71.922




	Train Perplexity: 42.182
	Valid Perplexity: 71.923




	Train Perplexity: 42.083
	Valid Perplexity: 71.923




	Train Perplexity: 42.190
	Valid Perplexity: 71.923




	Train Perplexity: 42.181
	Valid Perplexity: 71.922




	Train Perplexity: 42.098
	Valid Perplexity: 71.923




	Train Perplexity: 42.188
	Valid Perplexity: 71.922




	Train Perplexity: 42.096
	Valid Perplexity: 71.923




	Train Perplexity: 42.102
	Valid Perplexity: 71.923




	Train Perplexity: 42.181
	Valid Perplexity: 71.923




	Train Perplexity: 42.154
	Valid Perplexity: 71.923




	Train Perplexity: 42.074
	Valid Perplexity: 71.923




	Train Perplexity: 42.155
	Valid Perplexity: 71.923




	Train Perplexity: 42.264
	Valid Perplexity: 71.923




	Train Perplexity: 42.149
	Valid Perplexity: 71.923




	Train Perplexity: 42.150
	Valid Perplexity: 71.923




	Train Perplexity: 42.180
	Valid Perplexity: 71.923




	Train Perplexity: 42.090
	Valid Perplexity: 71.923




	Train Perplexity: 42.188
	Valid Perplexity: 71.923




	Train Perplexity: 42.088
	Valid Perplexity: 71.923




	Train Perplexity: 42.207
	Valid Perplexity: 71.923




	Train Perplexity: 42.084
	Valid Perplexity: 71.923




	Train Perplexity: 42.112
	Valid Perplexity: 71.923




	Train Perplexity: 42.132
	Valid Perplexity: 71.923




	Train Perplexity: 42.194
	Valid Perplexity: 71.923




	Train Perplexity: 42.141
	Valid Perplexity: 71.923




	Train Perplexity: 42.262
	Valid Perplexity: 71.923




	Train Perplexity: 42.164
	Valid Perplexity: 71.923




	Train Perplexity: 42.192
	Valid Perplexity: 71.923




	Train Perplexity: 42.092
	Valid Perplexity: 71.923




	Train Perplexity: 42.092
	Valid Perplexity: 71.923




	Train Perplexity: 42.110
	Valid Perplexity: 71.923




	Train Perplexity: 42.175
	Valid Perplexity: 71.923




	Train Perplexity: 42.111
	Valid Perplexity: 71.923




	Train Perplexity: 42.073
	Valid Perplexity: 71.923




	Train Perplexity: 42.133
	Valid Perplexity: 71.923




	Train Perplexity: 42.227
	Valid Perplexity: 71.923




	Train Perplexity: 42.245
	Valid Perplexity: 71.923




	Train Perplexity: 42.154
	Valid Perplexity: 71.923




	Train Perplexity: 42.124
	Valid Perplexity: 71.923




	Train Perplexity: 42.209
	Valid Perplexity: 71.923




	Train Perplexity: 42.114
	Valid Perplexity: 71.923




	Train Perplexity: 42.144
	Valid Perplexity: 71.923




	Train Perplexity: 42.131
	Valid Perplexity: 71.923




	Train Perplexity: 42.136
	Valid Perplexity: 71.923




	Train Perplexity: 42.152
	Valid Perplexity: 71.923




	Train Perplexity: 42.120
	Valid Perplexity: 71.923




	Train Perplexity: 42.141
	Valid Perplexity: 71.923




	Train Perplexity: 42.204
	Valid Perplexity: 71.923




	Train Perplexity: 42.206
	Valid Perplexity: 71.923




	Train Perplexity: 42.178
	Valid Perplexity: 71.923




	Train Perplexity: 42.203
	Valid Perplexity: 71.923




	Train Perplexity: 42.163
	Valid Perplexity: 71.923




	Train Perplexity: 42.162
	Valid Perplexity: 71.923




	Train Perplexity: 42.181
	Valid Perplexity: 71.923




	Train Perplexity: 42.066
	Valid Perplexity: 71.923




	Train Perplexity: 42.169
	Valid Perplexity: 71.923




	Train Perplexity: 42.100
	Valid Perplexity: 71.923




	Train Perplexity: 42.154
	Valid Perplexity: 71.923




	Train Perplexity: 42.186
	Valid Perplexity: 71.923




	Train Perplexity: 42.189
	Valid Perplexity: 71.923




	Train Perplexity: 42.174
	Valid Perplexity: 71.923




	Train Perplexity: 42.192
	Valid Perplexity: 71.923




	Train Perplexity: 42.126
	Valid Perplexity: 71.923




	Train Perplexity: 42.149
	Valid Perplexity: 71.923




	Train Perplexity: 42.102
	Valid Perplexity: 71.923




	Train Perplexity: 42.185
	Valid Perplexity: 71.923




	Train Perplexity: 42.172
	Valid Perplexity: 71.923




	Train Perplexity: 42.139
	Valid Perplexity: 71.923




	Train Perplexity: 42.134
	Valid Perplexity: 71.923




	Train Perplexity: 42.097
	Valid Perplexity: 71.923




	Train Perplexity: 42.185
	Valid Perplexity: 71.923




	Train Perplexity: 42.097
	Valid Perplexity: 71.923




	Train Perplexity: 42.100
	Valid Perplexity: 71.923




	Train Perplexity: 42.151
	Valid Perplexity: 71.922




	Train Perplexity: 42.158
	Valid Perplexity: 71.922




	Train Perplexity: 42.199
	Valid Perplexity: 71.922




	Train Perplexity: 42.199
	Valid Perplexity: 71.923




	Train Perplexity: 42.181
	Valid Perplexity: 71.923




	Train Perplexity: 42.170
	Valid Perplexity: 71.923




	Train Perplexity: 42.120
	Valid Perplexity: 71.923




	Train Perplexity: 42.068
	Valid Perplexity: 71.923




	Train Perplexity: 42.101
	Valid Perplexity: 71.923




	Train Perplexity: 42.175
	Valid Perplexity: 71.923




	Train Perplexity: 42.127
	Valid Perplexity: 71.923




	Train Perplexity: 42.114
	Valid Perplexity: 71.923




	Train Perplexity: 42.104
	Valid Perplexity: 71.923




	Train Perplexity: 42.127
	Valid Perplexity: 71.923




	Train Perplexity: 42.087
	Valid Perplexity: 71.923




	Train Perplexity: 42.205
	Valid Perplexity: 71.923




	Train Perplexity: 42.174
	Valid Perplexity: 71.923




	Train Perplexity: 42.194
	Valid Perplexity: 71.923




	Train Perplexity: 42.180
	Valid Perplexity: 71.923




	Train Perplexity: 42.122
	Valid Perplexity: 71.923




	Train Perplexity: 42.096
	Valid Perplexity: 71.923




	Train Perplexity: 42.195
	Valid Perplexity: 71.923




	Train Perplexity: 42.199
	Valid Perplexity: 71.923




	Train Perplexity: 42.127
	Valid Perplexity: 71.923




	Train Perplexity: 42.131
	Valid Perplexity: 71.923




	Train Perplexity: 42.111
	Valid Perplexity: 71.923




	Train Perplexity: 42.131
	Valid Perplexity: 71.923




	Train Perplexity: 42.149
	Valid Perplexity: 71.923




	Train Perplexity: 42.114
	Valid Perplexity: 71.923




	Train Perplexity: 42.169
	Valid Perplexity: 71.923




	Train Perplexity: 42.176
	Valid Perplexity: 71.923




	Train Perplexity: 42.064
	Valid Perplexity: 71.923




	Train Perplexity: 42.203
	Valid Perplexity: 71.923




	Train Perplexity: 42.129
	Valid Perplexity: 71.923




	Train Perplexity: 42.222
	Valid Perplexity: 71.923




	Train Perplexity: 41.997
	Valid Perplexity: 71.923




	Train Perplexity: 42.204
	Valid Perplexity: 71.923




	Train Perplexity: 42.112
	Valid Perplexity: 71.923




	Train Perplexity: 42.168
	Valid Perplexity: 71.923




	Train Perplexity: 42.281
	Valid Perplexity: 71.923




	Train Perplexity: 42.216
	Valid Perplexity: 71.923




	Train Perplexity: 42.180
	Valid Perplexity: 71.923




	Train Perplexity: 42.196
	Valid Perplexity: 71.923




	Train Perplexity: 42.224
	Valid Perplexity: 71.923




	Train Perplexity: 42.187
	Valid Perplexity: 71.923




	Train Perplexity: 42.008
	Valid Perplexity: 71.923




	Train Perplexity: 42.081
	Valid Perplexity: 71.923




	Train Perplexity: 42.094
	Valid Perplexity: 71.923




	Train Perplexity: 42.215
	Valid Perplexity: 71.923




	Train Perplexity: 42.114
	Valid Perplexity: 71.923




	Train Perplexity: 42.268
	Valid Perplexity: 71.923




	Train Perplexity: 42.133
	Valid Perplexity: 71.923




	Train Perplexity: 42.147
	Valid Perplexity: 71.923




	Train Perplexity: 42.191
	Valid Perplexity: 71.923




	Train Perplexity: 42.209
	Valid Perplexity: 71.923




	Train Perplexity: 42.136
	Valid Perplexity: 71.923




	Train Perplexity: 42.090
	Valid Perplexity: 71.923




	Train Perplexity: 42.134
	Valid Perplexity: 71.923




	Train Perplexity: 42.239
	Valid Perplexity: 71.923




	Train Perplexity: 42.119
	Valid Perplexity: 71.923




	Train Perplexity: 42.179
	Valid Perplexity: 71.923




	Train Perplexity: 42.118
	Valid Perplexity: 71.923




	Train Perplexity: 42.240
	Valid Perplexity: 71.923




	Train Perplexity: 42.150
	Valid Perplexity: 71.923




	Train Perplexity: 42.073
	Valid Perplexity: 71.923




	Train Perplexity: 42.101
	Valid Perplexity: 71.923




	Train Perplexity: 42.128
	Valid Perplexity: 71.923




	Train Perplexity: 42.152
	Valid Perplexity: 71.923




	Train Perplexity: 42.153
	Valid Perplexity: 71.923




	Train Perplexity: 42.128
	Valid Perplexity: 71.923




	Train Perplexity: 42.003
	Valid Perplexity: 71.923




	Train Perplexity: 42.125
	Valid Perplexity: 71.924




	Train Perplexity: 42.192
	Valid Perplexity: 71.924




	Train Perplexity: 42.074
	Valid Perplexity: 71.924




	Train Perplexity: 42.121
	Valid Perplexity: 71.924




	Train Perplexity: 42.073
	Valid Perplexity: 71.924




	Train Perplexity: 42.147
	Valid Perplexity: 71.924




	Train Perplexity: 42.217
	Valid Perplexity: 71.923




	Train Perplexity: 42.166
	Valid Perplexity: 71.923




	Train Perplexity: 42.172
	Valid Perplexity: 71.923




	Train Perplexity: 42.156
	Valid Perplexity: 71.923




	Train Perplexity: 42.175
	Valid Perplexity: 71.923




	Train Perplexity: 42.163
	Valid Perplexity: 71.924




	Train Perplexity: 42.108
	Valid Perplexity: 71.924




	Train Perplexity: 42.201
	Valid Perplexity: 71.923




	Train Perplexity: 42.088
	Valid Perplexity: 71.924




	Train Perplexity: 42.059
	Valid Perplexity: 71.923




	Train Perplexity: 42.111
	Valid Perplexity: 71.923




	Train Perplexity: 42.115
	Valid Perplexity: 71.923




	Train Perplexity: 42.120
	Valid Perplexity: 71.923




	Train Perplexity: 42.167
	Valid Perplexity: 71.923




	Train Perplexity: 42.224
	Valid Perplexity: 71.923




	Train Perplexity: 42.232
	Valid Perplexity: 71.923




	Train Perplexity: 42.164
	Valid Perplexity: 71.923




	Train Perplexity: 42.155
	Valid Perplexity: 71.923




	Train Perplexity: 42.200
	Valid Perplexity: 71.923




	Train Perplexity: 42.163
	Valid Perplexity: 71.923




	Train Perplexity: 42.095
	Valid Perplexity: 71.923




	Train Perplexity: 42.230
	Valid Perplexity: 71.923




	Train Perplexity: 42.226
	Valid Perplexity: 71.923




	Train Perplexity: 42.159
	Valid Perplexity: 71.923




	Train Perplexity: 42.170
	Valid Perplexity: 71.923




	Train Perplexity: 42.142
	Valid Perplexity: 71.924




	Train Perplexity: 42.173
	Valid Perplexity: 71.924




	Train Perplexity: 42.234
	Valid Perplexity: 71.924




	Train Perplexity: 42.114
	Valid Perplexity: 71.924




	Train Perplexity: 42.105
	Valid Perplexity: 71.924




	Train Perplexity: 42.193
	Valid Perplexity: 71.924




	Train Perplexity: 42.126
	Valid Perplexity: 71.924




	Train Perplexity: 42.125
	Valid Perplexity: 71.924




	Train Perplexity: 42.184
	Valid Perplexity: 71.924




	Train Perplexity: 41.995
	Valid Perplexity: 71.924




	Train Perplexity: 42.140
	Valid Perplexity: 71.924




	Train Perplexity: 42.115
	Valid Perplexity: 71.924




	Train Perplexity: 42.131
	Valid Perplexity: 71.924




	Train Perplexity: 42.214
	Valid Perplexity: 71.924




	Train Perplexity: 42.219
	Valid Perplexity: 71.924




	Train Perplexity: 42.156
	Valid Perplexity: 71.924




	Train Perplexity: 42.169
	Valid Perplexity: 71.924




	Train Perplexity: 42.129
	Valid Perplexity: 71.924




	Train Perplexity: 42.107
	Valid Perplexity: 71.924




	Train Perplexity: 42.190
	Valid Perplexity: 71.924




	Train Perplexity: 42.140
	Valid Perplexity: 71.924




	Train Perplexity: 42.185
	Valid Perplexity: 71.924




	Train Perplexity: 42.151
	Valid Perplexity: 71.924




	Train Perplexity: 42.028
	Valid Perplexity: 71.924




	Train Perplexity: 42.230
	Valid Perplexity: 71.924




	Train Perplexity: 42.212
	Valid Perplexity: 71.924




	Train Perplexity: 42.208
	Valid Perplexity: 71.924




	Train Perplexity: 42.174
	Valid Perplexity: 71.924




	Train Perplexity: 42.068
	Valid Perplexity: 71.924




	Train Perplexity: 42.191
	Valid Perplexity: 71.924




	Train Perplexity: 42.132
	Valid Perplexity: 71.924




	Train Perplexity: 42.138
	Valid Perplexity: 71.924




	Train Perplexity: 42.139
	Valid Perplexity: 71.924




	Train Perplexity: 42.197
	Valid Perplexity: 71.924




	Train Perplexity: 42.091
	Valid Perplexity: 71.924




	Train Perplexity: 42.141
	Valid Perplexity: 71.924




	Train Perplexity: 42.081
	Valid Perplexity: 71.924




	Train Perplexity: 42.204
	Valid Perplexity: 71.924




	Train Perplexity: 42.149
	Valid Perplexity: 71.924




	Train Perplexity: 42.200
	Valid Perplexity: 71.924




	Train Perplexity: 42.091
	Valid Perplexity: 71.924




	Train Perplexity: 42.193
	Valid Perplexity: 71.924




	Train Perplexity: 42.211
	Valid Perplexity: 71.924




	Train Perplexity: 42.125
	Valid Perplexity: 71.924




	Train Perplexity: 42.151
	Valid Perplexity: 71.924




	Train Perplexity: 42.139
	Valid Perplexity: 71.924




	Train Perplexity: 42.121
	Valid Perplexity: 71.924




	Train Perplexity: 42.255
	Valid Perplexity: 71.924




	Train Perplexity: 42.170
	Valid Perplexity: 71.924




	Train Perplexity: 42.207
	Valid Perplexity: 71.924




	Train Perplexity: 42.087
	Valid Perplexity: 71.924




	Train Perplexity: 42.133
	Valid Perplexity: 71.924




	Train Perplexity: 42.152
	Valid Perplexity: 71.924




	Train Perplexity: 42.122
	Valid Perplexity: 71.924




	Train Perplexity: 42.192
	Valid Perplexity: 71.924




	Train Perplexity: 42.060
	Valid Perplexity: 71.924




	Train Perplexity: 42.130
	Valid Perplexity: 71.924




	Train Perplexity: 42.202
	Valid Perplexity: 71.924




	Train Perplexity: 42.191
	Valid Perplexity: 71.924




	Train Perplexity: 42.142
	Valid Perplexity: 71.924




	Train Perplexity: 42.159
	Valid Perplexity: 71.924




	Train Perplexity: 42.168
	Valid Perplexity: 71.924




	Train Perplexity: 42.125
	Valid Perplexity: 71.924




	Train Perplexity: 42.231
	Valid Perplexity: 71.924




	Train Perplexity: 42.210
	Valid Perplexity: 71.924




	Train Perplexity: 42.060
	Valid Perplexity: 71.924




	Train Perplexity: 42.155
	Valid Perplexity: 71.924




	Train Perplexity: 42.131
	Valid Perplexity: 71.924




	Train Perplexity: 42.114
	Valid Perplexity: 71.924




	Train Perplexity: 42.148
	Valid Perplexity: 71.924




	Train Perplexity: 42.146
	Valid Perplexity: 71.924




	Train Perplexity: 42.039
	Valid Perplexity: 71.924




	Train Perplexity: 42.081
	Valid Perplexity: 71.924




	Train Perplexity: 42.195
	Valid Perplexity: 71.924




	Train Perplexity: 42.236
	Valid Perplexity: 71.924




	Train Perplexity: 42.117
	Valid Perplexity: 71.924




	Train Perplexity: 42.050
	Valid Perplexity: 71.924




	Train Perplexity: 42.078
	Valid Perplexity: 71.924




	Train Perplexity: 42.175
	Valid Perplexity: 71.924




	Train Perplexity: 42.267
	Valid Perplexity: 71.924




	Train Perplexity: 42.063
	Valid Perplexity: 71.924




	Train Perplexity: 42.217
	Valid Perplexity: 71.924




	Train Perplexity: 42.196
	Valid Perplexity: 71.924




	Train Perplexity: 42.182
	Valid Perplexity: 71.924




	Train Perplexity: 42.017
	Valid Perplexity: 71.924




	Train Perplexity: 42.202
	Valid Perplexity: 71.924




	Train Perplexity: 42.143
	Valid Perplexity: 71.924




	Train Perplexity: 42.115
	Valid Perplexity: 71.924




	Train Perplexity: 42.180
	Valid Perplexity: 71.924




	Train Perplexity: 42.147
	Valid Perplexity: 71.924




	Train Perplexity: 42.069
	Valid Perplexity: 71.924




	Train Perplexity: 42.178
	Valid Perplexity: 71.924




	Train Perplexity: 42.204
	Valid Perplexity: 71.924




	Train Perplexity: 42.109
	Valid Perplexity: 71.924




	Train Perplexity: 42.091
	Valid Perplexity: 71.924




	Train Perplexity: 42.055
	Valid Perplexity: 71.924




	Train Perplexity: 42.113
	Valid Perplexity: 71.924




	Train Perplexity: 42.180
	Valid Perplexity: 71.924




	Train Perplexity: 42.179
	Valid Perplexity: 71.924




	Train Perplexity: 42.076
	Valid Perplexity: 71.924




	Train Perplexity: 42.217
	Valid Perplexity: 71.924




	Train Perplexity: 42.245
	Valid Perplexity: 71.924




	Train Perplexity: 42.129
	Valid Perplexity: 71.924




	Train Perplexity: 42.161
	Valid Perplexity: 71.924




	Train Perplexity: 42.170
	Valid Perplexity: 71.924




	Train Perplexity: 42.115
	Valid Perplexity: 71.924




	Train Perplexity: 42.159
	Valid Perplexity: 71.924




	Train Perplexity: 42.253
	Valid Perplexity: 71.924




	Train Perplexity: 42.109
	Valid Perplexity: 71.924




	Train Perplexity: 42.201
	Valid Perplexity: 71.924




	Train Perplexity: 42.167
	Valid Perplexity: 71.924




	Train Perplexity: 42.242
	Valid Perplexity: 71.924




	Train Perplexity: 42.152
	Valid Perplexity: 71.924




	Train Perplexity: 42.153
	Valid Perplexity: 71.924




	Train Perplexity: 42.181
	Valid Perplexity: 71.924




	Train Perplexity: 42.151
	Valid Perplexity: 71.924




	Train Perplexity: 42.283
	Valid Perplexity: 71.924




	Train Perplexity: 42.191
	Valid Perplexity: 71.924




	Train Perplexity: 42.132
	Valid Perplexity: 71.924




	Train Perplexity: 42.175
	Valid Perplexity: 71.924




	Train Perplexity: 42.107
	Valid Perplexity: 71.924




	Train Perplexity: 42.118
	Valid Perplexity: 71.924




	Train Perplexity: 42.084
	Valid Perplexity: 71.924




	Train Perplexity: 42.125
	Valid Perplexity: 71.924




	Train Perplexity: 42.119
	Valid Perplexity: 71.924




	Train Perplexity: 42.121
	Valid Perplexity: 71.924




	Train Perplexity: 42.270
	Valid Perplexity: 71.924




	Train Perplexity: 42.114
	Valid Perplexity: 71.924




	Train Perplexity: 42.136
	Valid Perplexity: 71.924




	Train Perplexity: 42.103
	Valid Perplexity: 71.924




	Train Perplexity: 42.199
	Valid Perplexity: 71.924




	Train Perplexity: 42.125
	Valid Perplexity: 71.924




	Train Perplexity: 42.156
	Valid Perplexity: 71.924




	Train Perplexity: 42.147
	Valid Perplexity: 71.924




	Train Perplexity: 42.212
	Valid Perplexity: 71.924




	Train Perplexity: 42.130
	Valid Perplexity: 71.924




	Train Perplexity: 42.168
	Valid Perplexity: 71.924




	Train Perplexity: 42.172
	Valid Perplexity: 71.924




	Train Perplexity: 42.206
	Valid Perplexity: 71.924




	Train Perplexity: 42.184
	Valid Perplexity: 71.924




	Train Perplexity: 42.149
	Valid Perplexity: 71.925




	Train Perplexity: 42.125
	Valid Perplexity: 71.924


## 6. Testing

In [23]:
model.load_state_dict(torch.load('best-val-lstm_lm.pt',  map_location=device))
test_loss = evaluate(model, test_data, criterion, batch_size, seq_len, device)
print(f'Test Perplexity: {math.exp(test_loss):.3f}')

Test Perplexity: 67.968


## 7. Real-world inference

Here we take the prompt, tokenize, encode and feed it into the model to get the predictions.  We then apply softmax while specifying that we want the output due to the last word in the sequence which represents the prediction for the next word.  We divide the logits by a temperature value to alter the model’s confidence by adjusting the softmax probability distribution.

Once we have the Softmax distribution, we randomly sample it to make our prediction on the next word. If we get <unk> then we give that another try.  Once we get <eos> we stop predicting.
    
We decode the prediction back to strings last lines.

In [24]:
def generate(prompt, max_seq_len, temperature, model, tokenizer, vocab, device, seed=None):
    if seed is not None:
        torch.manual_seed(seed)
    model.eval()
    tokens = tokenizer(prompt)
    indices = [vocab[t] for t in tokens]
    batch_size = 1
    hidden = model.init_hidden(batch_size, device)
    with torch.no_grad():
        for i in range(max_seq_len):
            src = torch.LongTensor([indices]).to(device)
            prediction, hidden = model(src, hidden)

            #prediction: [batch size, seq len, vocab size]
            #prediction[:, -1]: [batch size, vocab size] #probability of last vocab

            probs = torch.softmax(prediction[:, -1] / temperature, dim=-1)
            prediction = torch.multinomial(probs, num_samples=1).item()

            while prediction == vocab['<unk>']: #if it is unk, we sample again
                prediction = torch.multinomial(probs, num_samples=1).item()

            if prediction == vocab['<eos>']:    #if it is eos, we stop
                break

            indices.append(prediction) #autoregressive, thus output becomes input

    itos = vocab.get_itos()
    tokens = [itos[i] for i in indices]
    return tokens

In [27]:
prompt = 'Yoda is '
max_seq_len = 30
seed = 0

#smaller the temperature, more diverse tokens but comes
#with a tradeoff of less-make-sense sentence
temperatures = [0.5, 0.7, 0.75, 0.8, 1.0]
for temperature in temperatures:
    generation = generate(prompt, max_seq_len, temperature, model, tokenizer,
                          vocab, device, seed)
    print(str(temperature)+'\n'+' '.join(generation)+'\n')

0.5
yoda is the best

0.7
yoda is the best

0.75
yoda is the best

0.8
yoda is the best

1.0
yoda is sitting down

