# RNN II: Classification

A typical RNN setup looks like this:

<img src="imgs/Typical_RNN.png" />

A few notes on this setup:
* The loss function depends on the task; this tutorial uses categorical cross-entropy.
* The model is always trained with a gradient-based optimizer.
* The embeddings are trainable, like the rest of the model's parameters.

In this lecture we focus on one task, sequence classification with an RNN: we feed in a sequence of embeddings, and the goal is to classify the sequence as a whole:

<img src="imgs/RNN_classification.png" />

## Name Classification Dataset

* Input: A sequence of letters (Human Names).
* Output: The Language of the Name (18 Unique languages).
 * Nader -> Arabic
 * Huie -> Chinese
 * Zhogin -> Russian
 
<img src="imgs/Names_to_language.png" />

We will use character-level embeddings. We feed the model the index of each input character, obtained with `ord()` (which equals the ASCII code for plain Latin letters), and that index maps to its corresponding embedding vector:

<img src="imgs/RNN_Model_Architecture.png" />

To get the ASCII index of a character we use the following function:

In [136]:
def name_to_ascii(name):
    '''
    Converts a string to the list of its character codes.
    Input: Name (String).
    Output: List of ord() values (ASCII codes for plain Latin letters), Number of characters.
    '''
    ASCII_characters = [ord(c) for c in name]
    return ASCII_characters, len(ASCII_characters)
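
As a quick check, here is a minimal usage sketch. The function is repeated so that the snippet runs on its own, and the local names `codes` and `length` are illustrative only.

```python
def name_to_ascii(name):
    """Return the code of each character and the number of characters."""
    codes = [ord(c) for c in name]
    return codes, len(codes)

codes, length = name_to_ascii("Nader")
print(codes, length)  # [78, 97, 100, 101, 114] 5
```

Each integer is later used as a row index into the embedding table, which is why the table must have more rows than the largest code we expect to see.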

# Model

* We have 3 layers:
 * The Embedding Layer
 * The RNN GRU Layer
 * The Dense Output Layer
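
To see how data flows through these three layers, the sketch below traces tensor shapes with small, made-up sizes (they are not the sizes used in this tutorial). It is an illustration under those assumptions, not the model itself. Unlike the class defined below, it feeds only the last layer's hidden state, `hidden[-1]`, to the dense layer, so the result has shape `(batch, n_classes)`.

```python
import torch
from torch import nn

# Illustrative sizes, chosen only to keep the shapes easy to read.
vocab_size, hidden_size, n_classes = 128, 8, 18
seq_len, batch_size = 5, 3

embedding = nn.Embedding(vocab_size, hidden_size)
gru = nn.GRU(input_size=hidden_size, hidden_size=hidden_size)
fc = nn.Linear(hidden_size, n_classes)

# nn.GRU expects sequence-first input by default: (seq_len, batch).
indices = torch.randint(0, vocab_size, (seq_len, batch_size))

embedded = embedding(indices)    # (seq_len, batch, hidden_size)
output, hidden = gru(embedded)   # hidden: (num_layers, batch, hidden_size)
logits = fc(hidden[-1])          # (batch, n_classes)

print(embedded.shape, hidden.shape, logits.shape)
```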

In [137]:
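import numpy as np
import torch
# import ipdb  # Optional: only needed for the commented-out breakpoint in forward().
from torch import nn
from torch.autograd import Variable  # Deprecated since PyTorch 0.4: Variable(t) simply returns a tensor.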

In [138]:
class RNNameClassifier(nn.Module):
    '''
    An RNN that takes a person's name and predicts its language.
    Input: LongTensor of character indices, shape (batch, sequence).
    Output: Unnormalized scores (logits) over the corpus languages (18);
            nn.CrossEntropyLoss applies the softmax internally.
    '''
    def __init__(self, input_size, hidden_size, output_size, n_layers=1):
        super(RNNameClassifier, self).__init__()
        
        self.hidden_size = hidden_size
        self.n_layers = n_layers
        
        # hidden_size = size of character embedding vectors = size of the hidden state.
        self.embedding = nn.Embedding(num_embeddings=input_size, embedding_dim=hidden_size)
        self.gru = nn.GRU(input_size=hidden_size, hidden_size=hidden_size, num_layers=n_layers)
        self.fc = nn.Linear(in_features=hidden_size, out_features=output_size)
    
    def forward(self, x):
        # Note: We run this all at once over the whole input sequence.
        
        # debugging starts here.
        #ipdb.set_trace()
        
        # x has shape B x S (batch, sequence), so size(0) is the batch size.
        batch_size = x.size(0)

        # nn.GRU expects sequence-first input by default, so transpose B x S -> S x B.
        # The embedding then maps S x B -> S x B x H (H = embedding size = hidden size).
        x = x.t()
        embedded = self.embedding(x)

        # Create the initial (all-zero) hidden state.
        hidden = self._init_hidden(batch_size)
        output, hidden = self.gru(embedded, hidden)

        # Feed the final hidden state to the fc layer; the per-step `output` is not needed.
        # hidden has shape (n_layers, B, H), so the result has shape (n_layers, B, output_size).
        fc_output = self.fc(hidden)
        
        return fc_output

    def _init_hidden(self, batch_size):
        # Each example in the batch needs its own initial hidden state: (n_layers, B, H).
        hidden = torch.zeros(self.n_layers, batch_size, self.hidden_size)
        return Variable(hidden)

Let's call the model:

In [139]:
VOCAB_SIZE = 65535  # Embedding table size: covers ord() values 0..65534, far more than the 128 ASCII codes.
EMBEDDING_OUTPUT_SIZE = 100
N_CLASSES = 18

In [140]:
classifer = RNNameClassifier(input_size=VOCAB_SIZE, hidden_size=EMBEDDING_OUTPUT_SIZE, output_size=N_CLASSES)

In [141]:
arr, _ = name_to_ascii('akram')

In [142]:
inp = Variable(torch.LongTensor([arr])); inp.shape

torch.Size([1, 5])

In [143]:
out = classifer(inp)

### Padding
To feed several names as one batch, we pad the shorter sequences with zeros so that every row of the tensor has the same length.

In [144]:
def pad_sequences(vectorized_seqs, seq_lengths):
    seq_tensor = torch.zeros((len(vectorized_seqs), seq_lengths.max())).long()
    for idx, (seq, seq_len) in enumerate(zip(vectorized_seqs, seq_lengths)):
        seq_tensor[idx, :seq_len] = torch.LongTensor(seq)
    return seq_tensor

In [145]:
# Convert a list of names into a zero-padded LongTensor of character codes.
def make_variables(names):
    sequence_and_length = [name_to_ascii(name) for name in names]
    vectorized_seqs = [sl[0] for sl in sequence_and_length]
    seq_lengths = torch.LongTensor([sl[1] for sl in sequence_and_length])
    return pad_sequences(vectorized_seqs, seq_lengths)
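
For comparison, PyTorch ships a helper, `torch.nn.utils.rnn.pad_sequence`, that does the same zero-padding. The sketch below is an alternative to the hand-written `pad_sequences` above, not a replacement the tutorial itself uses. It assumes each name has already been turned into a 1-D `LongTensor`.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

names = ["adylov", "solan", "hard", "san"]
seqs = [torch.tensor([ord(c) for c in name], dtype=torch.long) for name in names]

# batch_first=True gives (batch, max_len), the same layout pad_sequences returns.
padded = pad_sequence(seqs, batch_first=True, padding_value=0)

print(padded.shape)  # torch.Size([4, 6])
print(padded[3])     # codes of 's', 'a', 'n' followed by three zeros
```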

Let's test feeding a batch of names to the classifier:

In [146]:
names = ['adylov', 'solan', 'hard', 'san']

In [147]:
classifer = RNNameClassifier(input_size=VOCAB_SIZE, hidden_size=EMBEDDING_OUTPUT_SIZE, output_size=N_CLASSES)

In [148]:
inputs = make_variables(names)

In [149]:
out = classifer(inputs)

In [150]:
out.shape

torch.Size([1, 4, 18])

### For training

In [151]:
optimizer = torch.optim.Adam(classifer.parameters(), lr=0.001)

In [152]:
criterion = nn.CrossEntropyLoss()

In [153]:
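# Sketch of one training step; `output` must have shape (batch, 18) and
# `target` must be a LongTensor of class indices with shape (batch,).
#optimizer.zero_grad()
#loss = criterion(output, target)
#loss.backward()
#optimizer.step()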
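
To make the step above concrete, here is a minimal runnable sketch of one optimization step. `TinyNameClassifier` is a hypothetical, compact stand-in for `RNNameClassifier`, and the inputs and targets are made up; there is no real dataset behind them. The stand-in returns `fc(hidden[-1])`, so the logits already have the `(batch, classes)` shape that `nn.CrossEntropyLoss` expects.

```python
import torch
from torch import nn

torch.manual_seed(0)

class TinyNameClassifier(nn.Module):
    """Hypothetical compact stand-in for RNNameClassifier."""
    def __init__(self, vocab_size, hidden_size, n_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)
        self.fc = nn.Linear(hidden_size, n_classes)

    def forward(self, x):                  # x: (batch, seq_len)
        embedded = self.embedding(x.t())   # (seq_len, batch, hidden)
        _, hidden = self.gru(embedded)     # (1, batch, hidden)
        return self.fc(hidden[-1])         # (batch, n_classes)

model = TinyNameClassifier(vocab_size=128, hidden_size=16, n_classes=18)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# Made-up batch: two zero-padded names and arbitrary class indices.
inputs = torch.tensor([[110, 97, 100, 101, 114], [115, 97, 110, 0, 0]])
targets = torch.tensor([0, 3])

optimizer.zero_grad()              # clear gradients left over from a previous step
logits = model(inputs)             # (2, 18)
loss = criterion(logits, targets)  # expects (batch, classes) and (batch,)
loss.backward()                    # compute gradients
optimizer.step()                   # update the parameters
print(loss.item())
```

With the classifier defined earlier, whose output has shape `(n_layers, batch, 18)`, the leading dimension would have to be dropped first, for example with `out[-1]`.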

### Practical Advice

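You'll want to pack padded batches with `torch.nn.utils.rnn.pack_padded_sequence`. It returns a `PackedSequence` object that recurrent layers such as `nn.GRU` accept directly, so the padding steps are skipped.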

<img src="imgs/packed_RNNs.png" />
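
A minimal sketch of packing, with small made-up layer sizes. The point is that the final hidden state of each name is taken after its last real character, not after the trailing zeros. `enforce_sorted=False` is used so the batch does not have to be sorted by length beforehand.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence

names = ["adylov", "solan", "hard", "san"]
lengths = torch.tensor([len(n) for n in names])  # lengths must live on the CPU

# Zero-pad the character codes to a (batch, max_len) tensor.
padded = torch.zeros(len(names), int(lengths.max()), dtype=torch.long)
for i, name in enumerate(names):
    padded[i, :len(name)] = torch.tensor([ord(c) for c in name])

embedding = nn.Embedding(128, 8)
gru = nn.GRU(8, 8, batch_first=True)

embedded = embedding(padded)  # (4, 6, 8)
# enforce_sorted=False lets PyTorch sort by length internally and restore the order.
packed = pack_padded_sequence(embedded, lengths, batch_first=True, enforce_sorted=False)
_, hidden = gru(packed)       # hidden[-1][i] is the state after the last real character of names[i]
print(hidden.shape)           # torch.Size([1, 4, 8])
```

If the per-step outputs are needed as well, `torch.nn.utils.rnn.pad_packed_sequence` converts the packed output back into a padded tensor.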

Another way to make our operations more efficient is to run them on a GPU, which is straightforward in PyTorch:

<img src="imgs/PyTorch_GPUs.png" />
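
A minimal sketch of the usual pattern, assuming nothing beyond core PyTorch: pick a device once, then move both the model and the data to it. The layer sizes are illustrative, and the snippet falls back to the CPU when no GPU is available.

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.GRU(input_size=8, hidden_size=8).to(device)  # moves the parameters
batch = torch.randn(5, 3, 8, device=device)             # (seq_len, batch, features)

output, hidden = model(batch)
print(output.device, hidden.shape)
```

Any tensor created inside the model must be on the same device as well. In `RNNameClassifier`, the zeros created in `_init_hidden` would need to be moved to that device too.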

---

# Example
## Embedding Example

In [101]:
# An embedding module holding 65535 vectors of size 100.
embedding_example = nn.Embedding(num_embeddings=65535, embedding_dim=100)

In [102]:
example_input = Variable(torch.LongTensor([[1,2,4,5,6]])); example_input.shape

torch.Size([1, 5])

In [103]:
arr, _ = name_to_ascii('akram')

In [104]:
inp = Variable(torch.LongTensor([arr])); inp.shape

torch.Size([1, 5])

In [107]:
embedding_example(inp).shape

torch.Size([1, 5, 100])

In [108]:
example_input.shape

torch.Size([1, 5])
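
The setup notes at the top say that the embeddings are trainable. The sketch below illustrates this with a deliberately tiny table (10 rows of size 3, chosen only for readability): a lookup is plain row selection from the weight matrix, and after a backward pass only the rows that were looked up receive a gradient.

```python
import torch
from torch import nn

embedding = nn.Embedding(num_embeddings=10, embedding_dim=3)
print(embedding.weight.requires_grad)  # True

indices = torch.tensor([[1, 2, 4]])
vectors = embedding(indices)           # (1, 3, 3)

# A lookup is just row selection from the weight matrix.
print(torch.equal(vectors[0, 0], embedding.weight[1]))  # True

vectors.sum().backward()
# Only rows 1, 2 and 4 were looked up, so only they receive a gradient.
touched = embedding.weight.grad.abs().sum(dim=1).nonzero().flatten().tolist()
print(touched)  # [1, 2, 4]
```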

---

# Exercises

<img src="imgs/RNN_exo.png" />