# Testing a WordVec-RNN text generator

In this notebook we will load and test our WordVec-RNN text generator. This notebook has similarities to the `word-vector-rnn-text-gen-testing.ipynb` notebook, but this time there is no training loop: we just load a trained model and generate from it.

First, let's do some imports:

In [None]:
import torch
import random

import numpy as np
import torch.nn as nn
import torch.nn.functional as F
import torchtext.vocab as vocab

from torchtext.data.utils import get_tokenizer
from torch.distributions import Categorical

In PyTorch, there are different implementations for storing and processing data on different kinds of computer hardware. Every computer can train and run neural networks on the Central Processing Unit (CPU), which we specify with `'cpu'`.

If you have an NVIDIA Graphics Processing Unit (GPU), and you have installed CUDA and a matching version of PyTorch correctly, then you can use the device `'cuda'`, which will make training your neural networks **much faster**. Most of you won't have powerful NVIDIA GPUs in your laptops, however. Don't worry if you don't: the notebooks we are using in this class are designed to work on laptop CPUs.

If you have an Apple silicon Mac (for example with an M1 or M2 processor), then you can use the device `'mps'`, which runs on Apple's accelerated Metal Performance Shaders (MPS) for potentially faster training (though sometimes running on the CPU can be faster).

In [None]:
device = 'cpu'

#### Load word vectors

Here we will load in our word vectors. By default we load only the top 30k words from GloVe, for speed and efficiency. You can come back to this later and try loading in some other word vectors.

A torchtext `Vectors` object has two lookups in it: `stoi` (string to index) and `itos` (index to string). These are equivalent to the `word_to_ix` and `ix_to_word` dictionaries we made in the previous notebook. For consistency we assign them to variables with the same names, to make this notebook and the Week 7 `word-rnn-training.ipynb` notebook easier to compare.

In [None]:
# Uncomment the next line and use instead of the following line to use the full GloVe dictionary
# word_vectors = vocab.GloVe(name="6B",dim=100) 
word_vectors = vocab.Vectors(name = '../data/glove.6B.100d.top30k.txt')
tokenizer = get_tokenizer("basic_english")
wordvec_embeddings = nn.Embedding.from_pretrained(word_vectors.vectors)
embedding_dim = wordvec_embeddings.weight.shape[1]

# Get dictionaries from word_vectors class and 
# rename to be consistent with previous notebooks
word_to_ix = word_vectors.stoi
ix_to_word = word_vectors.itos

#### Set hyperparameters

This is where we specify our *hyperparameters*. We have fewer hyperparameters this time as we don't need any of the training parameters. The `hidden_size` and `num_layers` parameters need to be the same as the values set when the model was trained in the other notebook.

The `temperature` parameter controls how random or conservative our predicted words will be. With a low temperature (below 1) we will more often than not pick the most likely word. With a higher temperature (above 1) our generated sequences will be more random.
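
One common way to apply temperature is to divide the scores by it before the softmax. The sketch below uses plain Python and made-up scores for three candidate words; the helper `softmax_with_temperature` and the example values are illustrative assumptions, not part of this notebook's model code.

```python
import math

def softmax_with_temperature(scores, temperature=1.0):
    """Convert raw scores to probabilities, dividing by temperature first."""
    scaled = [s / temperature for s in scores]
    # Subtract the maximum for numerical stability before exponentiating
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate words (higher = more likely)
scores = [2.0, 1.0, 0.5]

cold = softmax_with_temperature(scores, temperature=0.5)  # sharper distribution
warm = softmax_with_temperature(scores, temperature=1.0)
hot = softmax_with_temperature(scores, temperature=2.0)   # flatter distribution

for name, probs in [("0.5", cold), ("1.0", warm), ("2.0", hot)]:
    print(f"temperature={name}: {[round(p, 3) for p in probs]}")
```

The most likely word gets a larger share of the probability at low temperature and a smaller share at high temperature, which is why low temperatures give more conservative text.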

In [None]:
hidden_size = 512   # size of hidden state
num_layers = 3      # number of layers in LSTM layer stack
gen_seq_len = 100   # number of words to generate
temperature = 1     # how random do we want our predictions to be
load_path = "wordvec_rnn_model.pt"

#### Defining the network 

Here we define our network. This code must be the same as the code used in the training notebook where we saved the model, otherwise the saved weights will not match the layers defined here.

In [None]:
class RNN(nn.Module):
    def __init__(self, input_size, output_size, hidden_size, num_layers):
        super(RNN, self).__init__()
        self.rnn = nn.LSTM(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)
        self.decoder = nn.Linear(hidden_size, output_size)
        self.tanh = nn.Tanh()
    
    def forward(self, input, hidden_state):
        output, hidden_state = self.rnn(input, hidden_state)
        output = self.decoder(output)
        output = self.tanh(output)
        return output, (hidden_state[0].detach(), hidden_state[1].detach())

#### Setting up the network and loading weights

Here we will create an instance of our network, `rnn`. Because we are only generating, we do not need a loss function or an optimiser this time. Instead, we load the checkpoint saved by the training notebook and copy its weights into the network. The word vectors we feed into the network are PyTorch `tensor`s. This is the data type that we have to use with PyTorch so that our neural networks can read and process the data correctly. [PyTorch tensors](https://pytorch.org/docs/stable/tensors.html) have been designed to work in almost exactly the same way as [numpy arrays](https://numpy.org/doc/stable/reference/generated/numpy.array.html).

In [None]:
# Load the checkpoint, mapping any saved tensors onto our chosen device
checkpoint = torch.load(load_path, map_location=device)

# Calculate vocab size
vocab_size = len(word_to_ix)

# Instantiate RNN
rnn = RNN(embedding_dim, embedding_dim, hidden_size, num_layers).to(device)

# Load model weights from checkpoint file 
rnn.load_state_dict(checkpoint['state_dict'])

#### Generate a random sequence

In [None]:
with torch.no_grad():
    hidden_state = None

    # Pick a random starting word
    random_word = random.choice(list(ix_to_word))

    # Look up its word vector (this is already a PyTorch tensor)
    input = word_vectors[random_word]

    # Add a sequence dimension so the shape is (1, embedding_dim)
    # For more info on this function see: https://stackoverflow.com/questions/57237352/what-does-unsqueeze-do-in-pytorch
    input = input.unsqueeze(0).to(device)
    print(input.shape)

    # Iterate over our sequence length
    for i in range(gen_seq_len):
        # Forward pass
        output, hidden_state = rnn(input, hidden_state)

        # Compute distances from the predicted vector to all word vectors
        dists = torch.norm(word_vectors.vectors - output[0].cpu(), dim=1)
        # Use softmax to convert distances to probabilities (closer words
        # are more likely); temperature scales how random the choice is
        probs = F.softmax((1 - dists) / temperature, dim=0)
        # Convert probabilities to a probability distribution
        prob_dist = Categorical(probs)
        # Sample a word index from the probability distribution
        word_index = prob_dist.sample().item()

        # Get the next word and print
        next_word = ix_to_word[word_index]
        print(next_word, end=' ')

        # The word vector for the next word is the next input
        input = word_vectors[next_word].unsqueeze(0).to(device)
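
The distance-to-probability step in the cell above can be illustrated on a tiny example. This is a sketch in plain Python with a made-up three-word, two-dimensional vocabulary (`toy_vectors`, `predicted` and `distance_probs` are hypothetical names). It mirrors the idea of turning Euclidean distances into sampling probabilities, not the notebook's actual tensors.

```python
import math
import random

# Hypothetical 2-D "word vectors" for a three-word vocabulary
toy_vectors = {
    "boat": [1.0, 0.0],
    "river": [0.8, 0.3],
    "banana": [-1.0, 2.0],
}

def distance_probs(predicted, vectors):
    """Softmax over negative Euclidean distances: closer words score higher."""
    words = list(vectors)
    dists = [math.dist(predicted, vectors[w]) for w in words]
    exps = [math.exp(-d) for d in dists]
    total = sum(exps)
    return {w: e / total for w, e in zip(words, exps)}

# Pretend this vector is what the model predicted for the next word
predicted = [0.9, 0.1]
probs = distance_probs(predicted, toy_vectors)

# Sample one word in proportion to its probability
sampled = random.choices(list(probs), weights=list(probs.values()), k=1)[0]

print({w: round(p, 3) for w, p in probs.items()})
print(sampled)
```

Because the words are sampled rather than picked by nearest distance, a far-away word such as `banana` can still be chosen occasionally, just with low probability.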

#### Map string to indexes

Let's write a function so we can manually create our own starting sequence. We take a string, split it into tokens, and use our `word_to_ix` dictionary to check that each token has a word vector. It is important to remember that **only words in the loaded word-vector vocabulary** can be **mapped to vectors for the model**. Try printing `word_to_ix` to see all the available words. Any words not in the vocabulary will unfortunately be skipped:

In [None]:
def map_str_to_wordvec(input_str):
    wordvec_seq = []
    tokens = tokenizer(input_str) 
    for word in tokens:
        ix = word_to_ix.get(word, None)
        if ix is not None:
            wordvec_seq.append(word_vectors[word])
        else:
            print(f'The word {word} is not in the dictionary')
    # Convert list of tensors to one tensor
    return torch.stack(wordvec_seq).to(device)
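
To see the skipping behaviour in isolation, here is a small sketch with a made-up vocabulary. The `toy_tokenize` helper is only a rough stand-in for the `basic_english` tokenizer (an assumption, not torchtext's implementation), and `toy_word_to_ix` and `map_str_to_indices` are hypothetical names.

```python
import re

# Hypothetical vocabulary: note that "kayak" is missing
toy_word_to_ix = {"row": 0, ",": 1, "your": 2, "boat": 3}

def toy_tokenize(text):
    # Lowercase, then split into runs of letters/digits and single punctuation marks
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", text.lower())

def map_str_to_indices(input_str, word_to_ix):
    """Return the indices of known words and a list of the words that were skipped."""
    kept, skipped = [], []
    for word in toy_tokenize(input_str):
        if word in word_to_ix:
            kept.append(word_to_ix[word])
        else:
            skipped.append(word)
    return kept, skipped

kept, skipped = map_str_to_indices("Row, row, row your kayak", toy_word_to_ix)
print(kept)     # indices of the words found in the vocabulary
print(skipped)  # words that had no entry and were dropped
```

Returning the skipped words alongside the kept ones makes it easy to check how much of a prompt actually reached the model.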

#### Define new starting string

Now let's convert our starting string into a tensor of word vectors:

In [None]:
input_str = 'Row, row, row, your'
wordvec_seq = map_str_to_wordvec(input_str)
print(f'Our sequence of word_vecs is: {wordvec_seq.shape}')

#### Generate from our own starting sequence

Now let's have a go at generating from our own sequence. We need two loops here. The first passes each word vector into the model to update the **hidden state**; here we are not doing anything with the model's predictions, just *conditioning* the model on our sequence. Once the model is conditioned on the sequence, we can start to make new generations from it in the second loop.

How do these predictions compare to the random generations? What happens when you put in a starting sequence that is very different to the original data? Try [changing the temperature parameter](#set-hyperparameters) to see how that affects the results.
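
The two-loop structure described above can be sketched without a neural network. Below, `toy_step` is a hypothetical stand-in for the model: it takes an input and a hidden state and returns an output and a new hidden state. The point is only that the hidden state built up in the conditioning loop is carried into the generation loop.

```python
def toy_step(x, hidden):
    """Hypothetical recurrent step: blend the input into a running state."""
    if hidden is None:
        hidden = 0.0
    hidden = 0.5 * hidden + x
    output = hidden  # in this toy, the "prediction" is just the new state
    return output, hidden

prompt = [1.0, 2.0, 3.0]

# Loop 1: conditioning. Intermediate outputs are ignored; only the
# hidden state (and the final output) matter afterwards.
hidden = None
for x in prompt:
    output, hidden = toy_step(x, hidden)

# Loop 2: generation. Each output is recorded, then fed back in as the next input.
generated = []
for _ in range(3):
    generated.append(output)
    output, hidden = toy_step(output, hidden)

print(generated)
```

Resetting `hidden` to `None` between the two loops would throw away the conditioning, so the generated values would no longer depend on the prompt history.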

In [None]:
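with torch.no_grad():
    hidden_state = None

    print(input_str, end=' ')

    # Condition the model on the starting sequence, one word vector at a time
    for i in range(wordvec_seq.shape[0]):
        # Select the i-th word vector and reshape it to (1, embedding_dim)
        input = wordvec_seq[i, :].unsqueeze(0)

        # Update the hidden state; only the final prediction is used below
        output, hidden_state = rnn(input, hidden_state)

    # Iterate over our sequence length
    for i in range(gen_seq_len):
        # Compute distances from the predicted vector to all word vectors
        dists = torch.norm(word_vectors.vectors - output[0].cpu(), dim=1)

        # Convert distances to probabilities, scaled by temperature,
        # then construct a categorical distribution and sample a word
        probs = F.softmax((1 - dists) / temperature, dim=0)
        index = Categorical(probs).sample().item()

        # Print the sampled word
        next_word = ix_to_word[index]
        print(next_word, end=' ')

        # The sampled word's vector is the next input
        input = word_vectors[next_word].unsqueeze(0).to(device)

        # Forward pass to predict the following word
        output, hidden_state = rnn(input, hidden_state)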

Try experimenting with the `temperature` parameter. How does that impact the generated results?

This code does not mask for words in the original dataset, so words that were not in the training data can be generated by this model. Can you see a difference in the vocabulary used compared to the Word RNN model from Week 7?