# Recurrent Neural Networks

Recurrent Neural Networks (RNNs) are a family of models designed to process sequential data (e.g. video, text). In this tutorial (adapted from [here](https://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html)), we will see how to **predict the language of a name** using an RNN that reads the name one character at a time.

Specifically, we will train the network on a list of surnames from 18 languages of origin, and predict which language a name is from based on the spelling:

```
$ python predict.py Hinton
(0.63) Scottish
(0.22) English
(0.02) Irish

$ python predict.py Schmidhuber
(0.83) German
(0.08) Czech
(0.07) Dutch
```

## Downloading and preparing the data

Let's start by downloading the zipped data provided in the original tutorial and extracting it in our environment.

In [None]:
!wget https://download.pytorch.org/tutorial/data.zip
!unzip data.zip

In the `data/names` folder we can find 18 text files named "[Language].txt". Each file contains a list of names, one name per line. In the following, we take care of data preprocessing by:

* extracting the category (language) names from the file names and counting the categories
* converting each name from Unicode to ASCII encoding
* defining a dictionary that maps each language (key) to the list of its names (value)

In [None]:
import glob
import unicodedata
import string

all_filenames = glob.glob('data/names/*.txt')
all_letters = string.ascii_letters + " .,;'"
n_letters = len(all_letters)

# turn a Unicode string to plain ASCII, thanks to http://stackoverflow.com/a/518232/2809427
def unicode_to_ascii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
        and c in all_letters
    )

print(unicode_to_ascii('Ślusàrski'))

# build the category_lines dictionary
# keys are the languages, and values are list of names for that language
category_lines = {}
all_categories = []

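# read a file and split into lines
def readLines(filename):
    # the files contain accented characters, so read them explicitly as UTF-8
    with open(filename, encoding='utf-8') as f:
        lines = f.read().strip().split('\n')
    return [unicode_to_ascii(line) for line in lines]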

for filename in all_filenames:
    # extract the name of the language
    category = filename.split('/')[-1].split('.')[0]
    # read the names of that language
    lines = readLines(filename)
    # append to the list and add to the dictionary
    all_categories.append(category)
    category_lines[category] = lines

n_categories = len(all_categories)
print('n_categories =', n_categories)
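
As a minimal sketch of what `unicode_to_ascii` relies on, the snippet below decomposes the example name with NFD normalization and prints each resulting character with its Unicode category. The variable names (`allowed`, `decomposed`, `ascii_name`) are illustrative and not part of the tutorial code.

```python
import string
import unicodedata

allowed = string.ascii_letters + " .,;'"  # same alphabet as all_letters

name = "Ślusàrski"
decomposed = unicodedata.normalize("NFD", name)

# NFD splits each accented letter into a base letter followed by a
# combining mark (Unicode category "Mn"), which the filter then drops
for ch in decomposed:
    print(repr(ch), unicodedata.category(ch), unicodedata.name(ch))

ascii_name = "".join(
    ch for ch in decomposed
    if unicodedata.category(ch) != "Mn" and ch in allowed
)
print(ascii_name)
```

The decomposed string is two characters longer than the original, one combining mark for each of the two accented letters.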

## Encoding words into Tensors

A crucial issue in this task is how to define the input of the network. Since the network operates on numbers rather than plain text, we must **convert text to a numerical representation**. To this end, we represent each letter as a **one-hot vector** of size `(1, n_letters)`. A one-hot vector is filled with 0s, except for a 1 at the index of the current letter, e.g. `"b" = <0 1 0 0 0 ...>`.

To build a word, we stack these character representations into a 3D tensor of shape `(line_length, 1, n_letters)`.

That extra 1 dimension is due to the fact that PyTorch assumes the input to be divided in batches: we're just using a batch size of 1 here.

In [None]:
import torch
import torch.nn as nn
  
# just for demonstration, encode a character into a (1, n_letters) tensor
def letter_to_tensor(letter):
    tensor = torch.zeros(1, n_letters)
    letter_index = all_letters.find(letter)
    tensor[0, letter_index] = 1
    return tensor


# encode a line into a (line_length, n_letters) one-hot tensor,
# (or (line_length, 1, n_letters) if the batch dimension is added)
# a line is a sequence of characters
def line_to_tensor(line, add_batch_dimension=True):
    tensor = torch.zeros(len(line), n_letters)
    for line_index, letter in enumerate(line):
        letter_index = all_letters.find(letter)
        tensor[line_index, letter_index] = 1

    if add_batch_dimension:
        tensor = tensor.unsqueeze(1)

    return tensor
  
  
# create a batch of samples given a list of lines
# that is, a list of character sequences
def create_batch(lines):
    tensors = []
    for current_line in lines:
        # current_line_tensor is (line_length, n_letters)
        current_line_tensor = line_to_tensor(current_line, add_batch_dimension=False)
        tensors.append(current_line_tensor)
    
    # since each line_tensor may have a different line_length, we pad each
    # line to the length of the longest sequence
    padded_tensor = torch.nn.utils.rnn.pad_sequence(tensors, batch_first=False, padding_value=0)
    # padded_tensor is (max_line_length, batch_size, n_letters)
    return padded_tensor
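
The shapes produced by `create_batch` can be illustrated with a small self-contained sketch. The helper `one_hot_line` and the two example names are assumptions made for illustration; the helper mirrors `line_to_tensor` without the batch dimension.

```python
import string
import torch
from torch.nn.utils.rnn import pad_sequence

alphabet = string.ascii_letters + " .,;'"  # same alphabet as all_letters

def one_hot_line(line):
    # (line_length, n_letters) one-hot matrix for a single name
    tensor = torch.zeros(len(line), len(alphabet))
    for position, letter in enumerate(line):
        tensor[position, alphabet.find(letter)] = 1
    return tensor

example_names = ["Li", "Hinton"]
example_batch = pad_sequence(
    [one_hot_line(n) for n in example_names], batch_first=False, padding_value=0
)

# (max_line_length, batch_size, n_letters)
print(example_batch.shape)
# per time step of the first name: 1 for a real character, 0 for padding
print(example_batch[:, 0].sum(dim=1))
```

The shorter name is padded with all-zero rows, so a padded time step contains no active letter.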

## Building the Network

We want to define a simple recurrent neural network. The network should have a recurrent layer followed by a fully connected layer that maps the features of the recurrent unit to the output space (i.e. the number of categories).

To run this network, we need to provide an input (in our case, the tensor for the current sequence or batch of sequences) and a previous hidden state (which we initialize to zeros). We get back the logits (i.e. the network activations before the softmax) for each language.
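
To make the role of the hidden state concrete, here is a hedged sketch of a single recurrent step. For a plain RNN with the default `tanh` nonlinearity, PyTorch's `nn.RNNCell` computes `h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)`; the snippet checks this against a manual computation. The sizes and the random previous state are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size, batch_size = 57, 8, 1  # illustrative sizes

cell = nn.RNNCell(input_size, hidden_size)  # tanh nonlinearity by default

x_t = torch.zeros(batch_size, input_size)
x_t[0, 1] = 1                                  # one-hot vector for a single letter
h_prev = torch.randn(batch_size, hidden_size)  # stand-in for a previous hidden state

with torch.no_grad():
    h_cell = cell(x_t, h_prev)
    # the same update written out by hand
    h_manual = torch.tanh(
        x_t @ cell.weight_ih.T + cell.bias_ih
        + h_prev @ cell.weight_hh.T + cell.bias_hh
    )

print(torch.allclose(h_cell, h_manual, atol=1e-6))
```

The new hidden state depends on both the current letter and the previous state, which is how information about earlier characters reaches the end of the name.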


In [None]:
# create a simple recurrent network      
class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleRNN, self).__init__()
        
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        
        self.i2h = nn.RNN(input_size, hidden_size)
        self.i2o = nn.Linear(hidden_size, output_size)
    
    # Forward the whole sequence at once
    def forward(self, input, hidden=None):
        if hidden is None:
            hidden = self.init_hidden(input.shape[1])
          
        output, _ = self.i2h(input, hidden)
        # only the features extracted at the end of the sequence are used
        # to produce the output
        output = self.i2o(output[-1])
        
        return output

    # Instantiate the initial hidden state (dim: 1 x batch_size x hidden_size);
    # the `shape` argument is the batch size
    def init_hidden(self, shape=1):
        return torch.zeros(1, shape, self.hidden_size)
      
      
# create a simple LSTM network
class SimpleLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleLSTM, self).__init__()
        
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        
        self.i2h = nn.LSTM(input_size, hidden_size)
        self.i2o = nn.Linear(hidden_size, output_size)
        
    def forward(self, input, hidden=None, cell=None):
        if hidden is None:
            hidden = self.init_hidden(input.shape[1])

        if cell is None:
            cell = self.init_cell(input.shape[1])

        output, (_, _) = self.i2h(input, (hidden, cell))
        # only the features extracted at the end of the sequence are used
        # to produce the output
        output = self.i2o(output[-1])
        
        return output

    def init_hidden(self,shape=1):
        return torch.zeros(1, shape, self.hidden_size)
      
    def init_cell(self,shape=1):
        return torch.zeros(1, shape, self.hidden_size)
      
      
# implement a simple RNN using cells
class SimpleRNNwithCell(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleRNNwithCell, self).__init__()
        
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        
        self.i2h = nn.RNNCell(input_size, hidden_size)
        self.i2o = nn.Linear(hidden_size, output_size)
    
    def forward(self, input, hidden=None):
        
        if hidden is None:
            hidden = self.init_hidden(input.shape[1])

        # manually feed each sequence element to the RNN cell
        for i in range(input.shape[0]):
            hidden = self.i2h(input[i], hidden)

        output = self.i2o(hidden)
          
        return output

    def init_hidden(self,shape=1):
        return torch.zeros(shape, self.hidden_size)
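
One caveat worth illustrating: with zero-padded batches, `output[-1]` for a shorter name is taken after the recurrent layer has also processed the padding steps. A possible alternative, not used in this tutorial, is to pack the batch with `pack_padded_sequence` so that the returned final hidden state corresponds to each sequence's last real element. The sketch below uses random features and made-up sizes; `rnn_layer`, `sequences` and `lengths` are hypothetical names.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

torch.manual_seed(0)
n_features, hidden_size = 5, 4  # illustrative sizes
rnn_layer = nn.RNN(n_features, hidden_size)

# two random "names" of different lengths, each (line_length, n_features)
sequences = [torch.randn(2, n_features), torch.randn(6, n_features)]
lengths = torch.tensor([len(s) for s in sequences])
padded = pad_sequence(sequences)  # (max_line_length, batch_size, n_features)

with torch.no_grad():
    # padded forward pass: the last output is taken after the padding steps
    padded_output, _ = rnn_layer(padded)
    last_padded = padded_output[-1]

    # packed forward pass: h_n holds the state at each sequence's true last step
    packed = pack_padded_sequence(padded, lengths, enforce_sorted=False)
    _, h_n = rnn_layer(packed)
    last_packed = h_n[0]

    # reference: run the short sequence on its own, without padding
    reference, _ = rnn_layer(sequences[0].unsqueeze(1))

print(torch.allclose(last_packed[0], reference[-1, 0], atol=1e-5))
print(torch.allclose(last_padded[0], reference[-1, 0], atol=1e-5))
```

The first check compares the packed result with the unpadded reference and should print `True`; the second shows whether the padded result for the short sequence still matches it.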


## Preparing for training

Before training, we need a few helper functions. The first one interprets the output of the network, which contains the logits for each category. We can use `Tensor.topk` to get the index of the largest value:

In [None]:
def category_from_output(output):
    # returns top_k_values, top_k_indices
    top_values, top_idx = output.topk(1)
    category_idx = top_idx[0][0].item()  # index of the top value for the first batch element, as a Python int
    return all_categories[category_idx], category_idx
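
As a small illustration of `Tensor.topk`, the sketch below applies it to made-up logits for a batch of two elements. The three-entry `labels` list and the logit values are hypothetical.

```python
import torch

labels = ["Czech", "German", "Scottish"]   # hypothetical label list
logits = torch.tensor([[0.3, 2.1, -0.5],   # batch element 0
                       [1.7, 0.2, 0.9]])   # batch element 1

# topk works along the last dimension; both results are (batch_size, 1)
top_values, top_idx = logits.topk(1)

predicted = [labels[i] for i in top_idx.squeeze(1).tolist()]
print(top_idx.squeeze(1).tolist(), predicted)
```

Note that `category_from_output` only inspects the first batch element, whereas this sketch reads the prediction for every element.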

We want a quick way to get a training example (a name and its language):

In [None]:
import random

def random_training_pair(bs=1):
    lines = []
    categories = []

    # each batch element is a random line from a random language
    for b in range(bs):
      category = random.choice(all_categories)
      line = random.choice(category_lines[category])
      
      lines.append(line)
      categories.append(category)
      
    # build the ground truth labels
    categories_tensor = torch.LongTensor([all_categories.index(c) for c in categories])
    # use our previous helper function to build the batch
    # from a list of sequences of characters
    lines_tensor = create_batch(lines)
    
    return categories_tensor, lines_tensor

## Training the network

Now all it takes to train this network is to show it a bunch of examples, have it make guesses, and tell it when it's wrong.

Since the output of the network consists of logits and the task is classification, we can use a standard cross-entropy loss.

In [None]:
criterion = nn.CrossEntropyLoss()
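
For intuition, cross-entropy on logits is the mean negative log-probability that the softmax assigns to the correct class. The sketch below checks this on made-up logits and targets; all values and the names `toy_logits`, `toy_targets` are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

toy_logits = torch.tensor([[2.0, 0.5, -1.0],
                           [0.1, 0.2, 3.0]])  # (batch_size, n_categories)
toy_targets = torch.tensor([0, 2])            # index of the correct category

toy_loss = nn.CrossEntropyLoss()(toy_logits, toy_targets)

# same quantity by hand: mean negative log-probability of the correct class
log_probs = F.log_softmax(toy_logits, dim=1)
manual_loss = -log_probs[torch.arange(len(toy_targets)), toy_targets].mean()

print(toy_loss.item(), manual_loss.item())
```

Because the loss applies the softmax internally, the network can return raw logits during training.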

Now we define a standard training step in which we:

*   reset the gradients accumulated by the optimizer
*   forward the input through the network
*   compute the loss
*   perform backpropagation
*   take a step with the optimizer

In [None]:
def train(rnn, optimizer, categories_tensor, lines_tensor):

    optimizer.zero_grad()    
    output = rnn(lines_tensor)

    loss = criterion(output, categories_tensor)
    loss.backward()

    optimizer.step()

    return output, loss.item()
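
The order of operations in `train` can be tried in isolation on a toy problem. In this sketch an `nn.Linear` layer stands in for the recurrent network, and the batch, targets and learning rate are arbitrary assumptions; repeating the step on a fixed batch should drive the loss down.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
toy_model = nn.Linear(5, 3)  # stand-in for the recurrent network
toy_criterion = nn.CrossEntropyLoss()
toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.05)

toy_inputs = torch.randn(4, 5)            # fixed toy batch
toy_labels = torch.tensor([0, 1, 2, 1])   # made-up ground truth

losses = []
for _ in range(20):
    toy_optimizer.zero_grad()                                 # clear old gradients
    step_loss = toy_criterion(toy_model(toy_inputs), toy_labels)  # forward pass and loss
    step_loss.backward()                                      # backpropagation
    toy_optimizer.step()                                      # parameter update
    losses.append(step_loss.item())

print(losses[0], losses[-1])
```

Calling `zero_grad()` at the start of each step matters because PyTorch accumulates gradients across `backward()` calls by default.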

Now we just have to:
* instantiate the network
* instantiate the optimizer
* run the training step for a given number of iterations

In [None]:
# initialize the network:
n_hidden = 128
rnn = SimpleRNN(n_letters, n_hidden, n_categories)

# initialize the optimizer
learning_rate = 0.005 # example value: a different learning rate could work better
optimizer = torch.optim.SGD(rnn.parameters(), lr=learning_rate)

# initialize the training loop
batch_size = 2
n_iterations = 100000
print_every = 5000

# keep track of the loss
current_loss = 0

for iteration in range(1, n_iterations + 1):
    # get a random training input and target
    category_tensor, line_tensor = random_training_pair(bs=batch_size)
    
    # perform the training step
    output, loss = train(rnn, optimizer, category_tensor, line_tensor)
    
    # accumulate loss for printing
    current_loss += loss
    
    # print iteration number, progress percentage and average loss
    if iteration % print_every == 0:
        print('%d %d%% %.4f ' % (iteration, iteration / n_iterations * 100, current_loss / print_every))
        current_loss = 0


## Trying it out

Finally, following the original tutorial [in the Practical PyTorch repo](https://github.com/spro/practical-pytorch/tree/master/char-rnn-classification), we define a prediction function and test it on some user-defined inputs.

In [None]:
normalizer = torch.nn.Softmax(dim=-1)

def predict(input_line, n_predictions=3):
    print('\n> %s' % input_line)
    # no gradients are needed at inference time
    with torch.no_grad():
        output = rnn(line_to_tensor(input_line))
        output = normalizer(output)
    # get top N categories
    top_values, top_index = output.topk(n_predictions, 1, True)
    predictions = []

    for i in range(n_predictions):
        value = top_values[0][i].item()  # 0 indexes the first batch element
        category_index = top_index[0][i].item()
        print('(%.2f) %s' % (value, all_categories[category_index]))
        predictions.append([value, all_categories[category_index]])

    return predictions

predict('Dovesky')
predict('Jackson')
predict('Satoshi')
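
The normalization and ranking performed in `predict` can be sketched on made-up logits: softmax turns the logits into probabilities that sum to 1 without changing their order, and `topk` picks the most likely categories. The `labels` list and the values are hypothetical.

```python
import torch

labels = ["Czech", "German", "Scottish", "English"]  # hypothetical label list
logits = torch.tensor([[1.0, 3.0, 0.5, 2.0]])        # made-up logits, batch size 1

probs = torch.softmax(logits, dim=-1)  # non-negative, sums to 1 per row
top_values, top_index = probs.topk(3, dim=1)

for value, index in zip(top_values[0].tolist(), top_index[0].tolist()):
    print('(%.2f) %s' % (value, labels[index]))
```

Since softmax preserves the ordering of the logits, the top categories are the same whether `topk` is applied before or after normalization; only the printed scores change.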
