Let's start by loading one of the models I trained previously.

In [None]:
MODEL_NAME = 'fix.loss.300.noclip'

In [None]:
from train import LSTMLanguageModel, load_losses, plot_losses
from generate import print_pred, generate_sentences
import json
import torch
import data, train

In [None]:
model = LSTMLanguageModel.load(f'output/{MODEL_NAME}.pth')

In [None]:
losses = load_losses(f'output/{MODEL_NAME}_losses.txt')

In [None]:
plot_losses(losses, True)

Now let's feed a sentence into this model that ends in an end-of-sentence symbol. What probability does it give for the start-of-sentence symbol?

In [None]:
test_sentence = [data.START_OF_VERSE_TOKEN] + 'this is some test sentence'.split() + [data.END_OF_VERSE_TOKEN]

In [None]:
test_sentence_ids = [model.word_index[word] for word in test_sentence]

In [None]:
test_batch_ids = [test_sentence_ids]
test_batch_tensor = torch.tensor(test_batch_ids)
test_batch_lens = torch.tensor([len(test_sentence_ids)])
word_scores = model(test_batch_tensor, test_batch_lens)

In [None]:
sent_word_scores = word_scores[0]

In [None]:
next_word_scores = sent_word_scores[-1]

In [None]:
next_word_probs = torch.nn.functional.softmax(next_word_scores, dim=0)

In [None]:
index_word = train.invert_dict(model.word_index)

In [None]:
next_word_prob = {index_word[i]: prob.item() for i, prob in enumerate(next_word_probs)}

In [None]:
next_word_prob['and'], next_word_prob[data.START_OF_VERSE_TOKEN]

This is not what I expected. I expected P(SOS | EOS) = 1, but the model assigns much higher probabilities to other words than to SOS.
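
To see which words actually outrank SOS, it helps to sort the word-to-probability dictionary instead of looking up single entries. Below is a minimal sketch in plain Python; `top_k_words` is a hypothetical helper that is not part of `train` or `generate`, and the toy dictionary, including the `'<sos>'` placeholder, is made up for illustration.

```python
def top_k_words(word_prob, k=5):
    """Return the k (word, probability) pairs with the highest probability."""
    return sorted(word_prob.items(), key=lambda item: item[1], reverse=True)[:k]


# Toy distribution standing in for next_word_prob (made-up numbers;
# '<sos>' is a placeholder for data.START_OF_VERSE_TOKEN).
toy_next_word_prob = {'and': 0.30, 'the': 0.25, '<sos>': 0.05, 'of': 0.40}

print(top_k_words(toy_next_word_prob, k=2))
```

In the notebook this would be called as `top_k_words(next_word_prob, 10)`.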

Another question: if I feed w1 w2 w3, is the prediction in a given slot the same as if I feed only the prefix up to that slot?

In [None]:
short_sentence = [data.START_OF_VERSE_TOKEN, 'this']
long_sentence = f'{data.START_OF_VERSE_TOKEN} this is a sentence {data.END_OF_VERSE_TOKEN}'.split()
short_seq, long_seq = [torch.tensor([[model.word_index[word] for word in sent]]) \
                       for sent in (short_sentence, long_sentence)]
short_seq_len, long_seq_len = [torch.tensor([len(sent)]) for sent in (short_sentence, long_sentence)]
short_pred = model(short_seq, short_seq_len)
long_pred = model(long_seq, long_seq_len)

In [None]:
# Scores at slot 1 should match within a relative tolerance. torch.allclose
# compares against |reference|, so negative scores are handled correctly.
assert torch.allclose(long_pred[0][1], short_pred[0][1], rtol=1e-4)

In [None]:
assert(len(short_pred[0][1]) == len(long_pred[0][1]))

So yes: as long as the preceding tokens are identical, the scores in a slot (and hence the probabilities) are identical. This means we can feed a long sequence once and read off the probabilities for every slot.
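
The reason, assuming the LSTM is unidirectional, is that the state at each slot is computed only from the inputs up to that slot. Here is a toy illustration with a scalar recurrence in plain Python. It is not the LSTM itself, and `run_recurrence` and its `weight` are made up for the example.

```python
import math


def run_recurrence(inputs, weight=0.5):
    """Toy one-directional recurrence: each output depends only on earlier inputs."""
    state = 0.0
    outputs = []
    for x in inputs:
        # The new state mixes the previous state with the current input only.
        state = math.tanh(weight * state + x)
        outputs.append(state)
    return outputs


short_out = run_recurrence([0.1, 0.7])
long_out = run_recurrence([0.1, 0.7, -0.3, 0.9])

# The first two slots agree exactly, because later inputs cannot affect them.
assert long_out[:2] == short_out
```

A bidirectional model would not have this property, since every slot would also see the tokens that follow it.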

Now let's compute the perplexity of the model.

In [None]:
model.perplexity_loss_function = torch.nn.CrossEntropyLoss(
    ignore_index=model.word_index[data.PAD_TOKEN],
    reduction='sum'
)

In [None]:
model.get_perplexity(['this is a sentence'.split()], False)
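
For reference, perplexity is usually defined as the exponential of the average negative log-likelihood per token. With `reduction='sum'`, the loss is the total over all non-padding tokens, so it has to be divided by the token count before exponentiating. The sketch below assumes `get_perplexity` follows this usual definition (I have not shown its body here); `perplexity_from_summed_loss` is a hypothetical helper.

```python
import math


def perplexity_from_summed_loss(total_nll, num_tokens):
    """Perplexity = exp(average negative log-likelihood per token)."""
    if num_tokens <= 0:
        raise ValueError('num_tokens must be positive')
    return math.exp(total_nll / num_tokens)


# Toy check: a model that gives probability 1/4 to each of 10 tokens
# has a summed loss of 10 * -log(1/4) and a perplexity close to 4.
uniform_nll = 10 * -math.log(0.25)
print(perplexity_from_summed_loss(uniform_nll, 10))
```

The toy check reflects the usual reading of perplexity: the model is as uncertain as if it were choosing uniformly among that many words.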

But what we really want is to compute the perplexity on the validation dataset after every epoch. This will let us monitor how it evolves during training.
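
Once there is one validation perplexity per epoch, two small checks make the curve easier to read: which epoch was best, and whether the model has stopped improving. This is only a sketch in plain Python; `best_epoch`, `stopped_improving`, the `patience` parameter, and the numbers in `history` are all made up for illustration and are not part of `train`.

```python
def best_epoch(val_perplexities):
    """Return (epoch_index, perplexity) for the lowest validation perplexity."""
    if not val_perplexities:
        raise ValueError('no epochs recorded')
    epoch = min(range(len(val_perplexities)), key=val_perplexities.__getitem__)
    return epoch, val_perplexities[epoch]


def stopped_improving(val_perplexities, patience=2):
    """True if none of the last `patience` epochs beat the earlier best."""
    if len(val_perplexities) <= patience:
        return False
    earlier_best = min(val_perplexities[:-patience])
    return min(val_perplexities[-patience:]) >= earlier_best


# Made-up validation perplexities, one per epoch.
history = [310.0, 180.5, 150.2, 151.0, 155.7]

print(best_epoch(history))
print(stopped_improving(history, patience=2))
```

A rising validation perplexity while the training loss keeps falling would be the usual sign of overfitting.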