## [NLP FROM SCRATCH: GENERATING NAMES WITH A CHARACTER-LEVEL RNN](https://pytorch.org/tutorials/intermediate/char_rnn_generation_tutorial.html#nlp-from-scratch-generating-names-with-a-character-level-rnn)

#### We are still hand-crafting a small RNN with a few linear layers. The big difference is that, instead of predicting a category after reading in all the letters of a name, we input a category and output one letter at a time. Recurrently predicting characters to form language (this could also be done with words or other higher-order constructs) is often referred to as a “language model”.

In [1]:
from __future__ import unicode_literals, print_function, division
from io import open

In [2]:
import os
from glob import glob

In [3]:
import string
import unicodedata

In [4]:
home = os.environ['HOME']
data_dir = f"{home}/torch/data/"
names_dir = data_dir + "names/"

In [5]:
all_letters = string.ascii_letters +" .,;'-"

In [6]:
n_letters = len(all_letters) + 1  # plus one slot for the end-of-sequence (EOS) marker

In [7]:
def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
        and c in all_letters
    )

In [8]:
def readLines(filename):
    lines = open(filename, encoding='utf-8').read().strip().split('\n')
    return [unicodeToAscii(line) for line in lines]

In [9]:
category_lines = {}
all_categories = []

In [10]:
def findFiles(path): return glob(path)

In [11]:
for filename in findFiles(names_dir + '*.txt'):
    category = os.path.splitext(os.path.basename(filename))[0]
    all_categories.append(category)
    lines = readLines(filename)
    category_lines[category] = lines

In [12]:
n_categories = len(all_categories)

_____

To represent a single letter, we use a “one-hot vector” of size <1 x n_letters>. A one-hot vector is filled with 0s except for a 1 at the index of the current letter, e.g. "b" = <0 1 0 0 0 ...>.

To make a word we join a bunch of those into a 3D tensor <line_length x 1 x n_letters>. The extra dimension of size 1 is the batch dimension: PyTorch assumes everything is in batches, and here the batch size is just 1.

In [13]:
def letterToIndex(letter):
    return all_letters.find(letter)

In [14]:
import torch

In [15]:
# Just for demonstration, turn a letter into a <1 x n_letters> Tensor
def letterToTensor(letter):
    tensor = torch.zeros(1, n_letters)
    tensor[0][letterToIndex(letter)] = 1
    return tensor

In [16]:
def lineToTensor(line):
    tensor = torch.zeros(len(line), 1, n_letters)
    for li, letter in enumerate(line):
        tensor[li][0][letterToIndex(letter)] = 1
    return tensor
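As a quick sanity check of the encoding, here is a minimal, self-contained sketch that rebuilds the same alphabet and inspects the tensor for one name. The function name `line_to_tensor` and the sample name "Kim" are only for illustration; the logic mirrors `lineToTensor` above.

```python
import string
import torch

# Same alphabet as above; the +1 reserves a final index for the EOS marker.
all_letters = string.ascii_letters + " .,;'-"
n_letters = len(all_letters) + 1

def line_to_tensor(line):
    """One-hot encode a string as a <len(line) x 1 x n_letters> tensor."""
    tensor = torch.zeros(len(line), 1, n_letters)
    for li, letter in enumerate(line):
        tensor[li][0][all_letters.find(letter)] = 1
    return tensor

demo = line_to_tensor("Kim")
print(demo.shape)             # torch.Size([3, 1, 59])
print(demo.sum(dim=2))        # every row holds exactly one 1
print(demo[0].argmax(dim=1))  # position of 'K' in all_letters
```

One thing to keep in mind: `str.find` returns -1 for a character that is not in `all_letters`, and index -1 silently lands on the last slot (the one reserved for EOS). That is harmless here because `unicodeToAscii` drops such characters first.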

#### This network extends the last tutorial’s RNN with an extra argument for the category tensor, which is concatenated along with the others. The category tensor is a one-hot vector just like the letter input.

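We will interpret the output as the probability of the next letter; more precisely, because the last layer is a LogSoftmax, the output holds log-probabilities over all n_letters symbols. When sampling, the most likely output letter is used as the next input letter.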

I added a second linear layer o2o (after combining hidden and output) to give it more muscle to work with. There’s also a dropout layer, which randomly zeros parts of its input with a given probability (here 0.1) and is usually used to fuzz inputs to prevent overfitting. Here we’re using it towards the end of the network to purposely add some chaos and increase sampling variety.

![rnn](https://i.imgur.com/jzVrf7f.png)
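To make the dropout behaviour described above concrete, the following small sketch applies `nn.Dropout(p=0.1)` to a tensor of ones in training mode and in evaluation mode. The tensor size and the seed are arbitrary choices for the demonstration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.1)
x = torch.ones(1, 1000)

# Training mode: about 10% of the entries are zeroed, and the survivors
# are scaled by 1 / (1 - p) so the expected value stays the same.
dropout.train()
y_train = dropout(x)
zero_fraction = (y_train == 0).float().mean().item()
print(zero_fraction)   # roughly 0.1
print(y_train.max())   # 1 / 0.9, i.e. about 1.1111

# Evaluation mode: dropout is the identity.
dropout.eval()
y_eval = dropout(x)
print(torch.equal(y_eval, x))
```

Since the dropout layer is used here to add variety when sampling, note that calling `eval()` on the model before sampling would switch this source of randomness off.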

In [17]:
import torch
import torch.nn as nn

In [18]:
class RNN(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super(RNN, self).__init__()
        self.hidden_size = hidden_dim

        self.i2h = nn.Linear(n_categories + in_dim + hidden_dim, hidden_dim)
        self.i2o = nn.Linear(n_categories + in_dim + hidden_dim, out_dim)
        self.o2o = nn.Linear(hidden_dim + out_dim, out_dim)
        self.dropout = nn.Dropout(p=0.1)
        self.softmax = nn.LogSoftmax(dim=1)
 
    def forward(self, category, input, hidden):
        #print(category.size(), input.size(), hidden.size())
        input_combo = torch.cat((category, input, hidden), dim=1)
        hidden = self.i2h(input_combo)
        out = self.i2o(input_combo)
        out_combo = torch.cat((hidden, out), 1)
        output = self.o2o(out_combo)
        output = self.dropout(output)
        output = self.softmax(output)
        return output, hidden

    def init_Hidden(self):
        return torch.zeros(1, self.hidden_size)

In [19]:
import random

In [20]:
def randomChoice(l):
    return l[random.randint(0, len(l) - 1)]

In [21]:
def randomTrainingPair():
    category = randomChoice(all_categories)
    line = randomChoice(category_lines[category])
    return category, line

![](https://i.imgur.com/JH58tXY.png)

##### The category tensor is a one-hot tensor of size <1 x n_categories>. When training, we feed it to the network at every timestep. This is a design choice; it could instead have been included as part of the initial hidden state or handled by some other strategy.

In [22]:
def category2Tensor(category):
    li = all_categories.index(category)
    tensor = torch.zeros(1, n_categories)
    tensor[0][li] = 1
    return tensor

In [23]:
def name2Tensor(name):
    tensor = torch.zeros(len(name), 1, n_letters)
    for li in range(len(name)):
        letter = name[li]
        tensor[li][0][all_letters.find(letter)] = 1
    return tensor

In [24]:
def target2Tensor(name):
    letter_indices = [all_letters.find(name[li]) for li in range(1, len(name))]
    letter_indices.append(n_letters-1)
    return torch.LongTensor(letter_indices)
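The input and target tensors are the same name shifted by one position, with EOS as the final target. A small sketch that prints the pairing for one name may help; `EOS_INDEX`, `target_indices` and the sample name "Kim" are names introduced here for illustration only.

```python
import string
import torch

all_letters = string.ascii_letters + " .,;'-"
n_letters = len(all_letters) + 1
EOS_INDEX = n_letters - 1  # hypothetical name for the end-of-sequence slot

def target_indices(name):
    """Indices of the second to last letters of the name, followed by EOS."""
    indices = [all_letters.find(ch) for ch in name[1:]]
    indices.append(EOS_INDEX)
    return torch.LongTensor(indices)

name = "Kim"
targets = target_indices(name)
for step, ch in enumerate(name):
    t = targets[step].item()
    label = "<EOS>" if t == EOS_INDEX else all_letters[t]
    print(f"step {step}: input {ch!r} -> target {label!r}")
# step 0: input 'K' -> target 'i'
# step 1: input 'i' -> target 'm'
# step 2: input 'm' -> target '<EOS>'
```

So at every timestep the network sees the current letter and is asked to predict the next one, and after the last letter it is asked to predict that the name has ended.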


##### In contrast to classification, where only the **last output** is used, we are making a prediction at every step, so we are calculating loss at every step. The magic of **autograd** allows you to simply sum these losses at each step and call backward at the end.
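To see why summing the per-step losses and calling backward once is legitimate, here is a toy sketch that compares it against calling backward at every step. It uses a single `nn.Linear` layer as a stand-in (an assumption for brevity, not the RNN from this notebook), so the steps do not share any hidden state.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(4, 3)               # toy stand-in, not the tutorial's RNN
log_softmax = nn.LogSoftmax(dim=1)
criterion = nn.NLLLoss()

inputs = torch.randn(5, 1, 4)         # 5 "time steps", batch size 1
targets = torch.randint(0, 3, (5, 1))

# Option A: sum the per-step losses, then call backward once.
layer.zero_grad()
total = 0
for t in range(inputs.size(0)):
    total = total + criterion(log_softmax(layer(inputs[t])), targets[t])
total.backward()
grad_summed = layer.weight.grad.clone()

# Option B: call backward at each step; gradients accumulate in .grad.
layer.zero_grad()
for t in range(inputs.size(0)):
    criterion(log_softmax(layer(inputs[t])), targets[t]).backward()
grad_stepwise = layer.weight.grad.clone()

grads_match = torch.allclose(grad_summed, grad_stepwise, atol=1e-6)
print(grads_match)
```

Both routes give the same gradient because differentiation is linear. With a recurrent model the steps are linked through the hidden state, so a per-step backward would need `retain_graph=True`; summing first and calling backward once avoids that.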

In [25]:
def randomTrainingExample():
    category, line = randomTrainingPair()
    category_tensor = category2Tensor(category)
    input_line_tensor = name2Tensor(line)
    target_line_tensor = target2Tensor(line)
    return category_tensor, input_line_tensor, target_line_tensor

In [26]:
criterion = nn.NLLLoss()
learning_rate = 0.0005

In [27]:
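def train(model, ceten, nameten, tgten):
    # Add a trailing dimension without mutating the caller's tensor,
    # so each tgten[i] has shape <1> to match the <1 x n_letters> output.
    tgten = tgten.unsqueeze(-1)
    hidden = model.init_Hidden()

    model.zero_grad()

    loss = 0

    # Accumulate the loss over every letter position in the name.
    for i in range(nameten.size(0)):
        output, hidden = model(ceten, nameten[i], hidden)
        loss += criterion(output, tgten[i])

    loss.backward()

    # Plain SGD step: p <- p - learning_rate * grad.
    for p in model.parameters():
        p.data.add_(p.grad.data, alpha=-learning_rate)

    # Return the last output and the loss averaged per letter.
    return output, loss.item() / nameten.size(0)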
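The manual parameter loop in `train` is a plain stochastic gradient descent step. As a hedged sketch, the snippet below checks on a toy `nn.Linear` model (a stand-in, not the RNN defined in this notebook) that `torch.optim.SGD` with the same learning rate produces the same update; swapping it in is optional.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
learning_rate = 0.0005
model_a = nn.Linear(4, 3)             # toy stand-in, not the tutorial's RNN
model_b = copy.deepcopy(model_a)

x = torch.randn(2, 4)
y = torch.tensor([0, 2])
criterion = nn.CrossEntropyLoss()

# Manual update: p <- p - learning_rate * grad.
model_a.zero_grad()
criterion(model_a(x), y).backward()
with torch.no_grad():
    for p in model_a.parameters():
        p.add_(p.grad, alpha=-learning_rate)

# The same step through torch.optim.SGD.
optimizer = torch.optim.SGD(model_b.parameters(), lr=learning_rate)
optimizer.zero_grad()
criterion(model_b(x), y).backward()
optimizer.step()

same = all(
    torch.allclose(pa, pb)
    for pa, pb in zip(model_a.parameters(), model_b.parameters())
)
print(same)
```

An optimizer object becomes more convenient once momentum, weight decay or a different update rule is wanted, since only the constructor line changes.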

In [28]:
rnn = RNN(n_letters, 128, n_letters)

In [30]:
train(rnn, *randomTrainingExample())

(tensor([[-4.1182, -4.1079, -4.0349, -4.1554, -4.1419, -4.0835, -3.9953, -4.2191,
          -4.0584, -4.1958, -4.0835, -4.0741, -3.9834, -4.1584, -4.0237, -4.1273,
          -3.9729, -4.0884, -4.1220, -3.9988, -3.9970, -4.1905, -4.1123, -4.0876,
          -4.1013, -3.9579, -4.0189, -3.9687, -4.1140, -4.0835, -4.0804, -4.1529,
          -4.0666, -4.0935, -4.0835, -4.0835, -4.0207, -4.0438, -4.0048, -4.0884,
          -3.9251, -4.0835, -4.0835, -4.0125, -4.0586, -4.1548, -3.9968, -4.0859,
          -4.1921, -4.1203, -4.1326, -4.1341, -4.0835, -4.1693, -4.0835, -4.1220,
          -4.0772, -4.0132, -4.0739]], grad_fn=<LogSoftmaxBackward0>),
 4.110482788085937)
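The average loss of about 4.11 printed above is what one would expect from an untrained network. A model that spreads its probability evenly over all `n_letters` symbols scores a negative log-likelihood of ln(n_letters) per letter, which this short calculation reproduces:

```python
import math
import string

# Same alphabet size as in the notebook: 52 letters + 6 punctuation marks + EOS.
n_letters = len(string.ascii_letters + " .,;'-") + 1
uniform_loss = math.log(n_letters)
print(n_letters, round(uniform_loss, 4))   # 59 4.0775
```

A per-letter loss that drops well below this value during training indicates that the network has learned something beyond guessing uniformly.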

### [Training RNN](https://pytorch.org/tutorials/intermediate/char_rnn_generation_tutorial.html#training-the-network)