Original articles

[Generating Names with a Character-Level RNN](https://pytorch.org/tutorials/intermediate/char_rnn_generation_tutorial.html)

[Classifying Names with a Character-Level RNN](https://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html)


RNN-related articles:

[The Unreasonable Effectiveness of Recurrent Neural Networks](https://karpathy.github.io/2015/05/21/rnn-effectiveness/) shows many real-life examples

[Understanding LSTM Networks](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)


Sample generated names, one per starting letter (e.g. `RUS` gives names starting with R, U and S):

```
>  Russian RUS
Rovakov
Uantov
Shavakov

>  German GER
Gerren
Ereng
Rosher

> Spanish SPA
Salla
Parer
Allan

> Chinese CHI
Chan
Hang
Iun
```

File structure (inside Drive: `/NLP/name_data.zip`):

    ./names/English.txt

      Abbas
      Abbey
      Abbott
      Abdi
      Abel
      Abraham

```python
import unicodedata
import string

all_letters = string.ascii_letters + " .,;'-"
n_letters = len(all_letters) + 1  # plus the EOS marker
print(n_letters, all_letters)

# Turn a Unicode string to plain ASCII, thanks to https://stackoverflow.com/a/518232/2809427
def unicodeToAscii(line):
    return ''.join(
        c for c in unicodedata.normalize('NFD', line)
        if unicodedata.category(c) != 'Mn'
        and c in all_letters
    )

# The data is represented as category_lines = {} and all_categories = []
# category_lines['English'][:6] = ['Abbas', 'Abbey', 'Abbott', 'Abdi', 'Abel', 'Abraham']
# all_categories = ['Dutch', 'Japanese', 'Irish', ...]  # 18 categories in total, i.e. n_categories
```

Output:

```
59 abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ .,;'-
```
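The comments above only describe the shape of `category_lines` and `all_categories`. A minimal sketch of how they could be filled, assuming the zip has been extracted to a folder containing one `<Language>.txt` file per category with one name per line (as in the tutorial linked at the top); `load_names` and its `data_dir` argument are hypothetical names:

```python
import glob
import os
import string
import unicodedata

all_letters = string.ascii_letters + " .,;'-"

def unicodeToAscii(line):
    # Decompose accented characters (NFD), drop combining marks ('Mn'),
    # and keep only characters from all_letters
    return ''.join(
        c for c in unicodedata.normalize('NFD', line)
        if unicodedata.category(c) != 'Mn' and c in all_letters
    )

def load_names(data_dir):
    """Hypothetical helper: build category_lines and all_categories from <data_dir>/*.txt."""
    category_lines = {}
    all_categories = []
    # sorted() is a choice made here so category indices are reproducible between runs
    for filename in sorted(glob.glob(os.path.join(data_dir, '*.txt'))):
        category = os.path.splitext(os.path.basename(filename))[0]  # e.g. 'English'
        all_categories.append(category)
        with open(filename, encoding='utf-8') as f:
            lines = f.read().strip().split('\n')
        category_lines[category] = [unicodeToAscii(line) for line in lines]
    return category_lines, all_categories
```

Usage, assuming the extracted folder is `./names`: `category_lines, all_categories = load_names('names')`, then `n_categories = len(all_categories)`.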


Creating the Network
====================

![By Using Linear layers](https://i.imgur.com/jzVrf7f.png)

For each timestep (that is, for each letter in a training word) the
inputs of the network will be `(category, current letter, hidden state)`
and the outputs will be `(next letter, next hidden state)`. So for each
training set, we'll need the category, a set of input letters, and a
set of output/target letters.

Since we are predicting the next letter from the current letter for each
timestep, the letter pairs are groups of consecutive letters from the
line, e.g. for `"ABCD<EOS>"` we would create ("A", "B"), ("B",
"C"), ("C", "D"), ("D", "EOS").
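The pairing rule can be checked in plain Python. A minimal sketch; `letter_pairs` is a hypothetical helper, and the string `'<EOS>'` stands in for the EOS marker, which the network itself represents as index `n_letters - 1`:

```python
EOS = '<EOS>'  # placeholder string; the network uses index n_letters - 1 instead

def letter_pairs(line):
    """Return (current letter, next letter) pairs, ending with (last letter, EOS)."""
    targets = list(line[1:]) + [EOS]
    return list(zip(line, targets))

print(letter_pairs("ABCD"))
# [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', '<EOS>')]
```

The inputs are every letter of the line and the targets are the same line shifted left by one, so both sequences have `len(line)` entries.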

![](https://i.imgur.com/JH58tXY.png)

The category tensor is a [one-hot
tensor](https://en.wikipedia.org/wiki/One-hot) of size
`<1 x n_categories>`. When training we feed it to the network at every
timestep. This is a design choice; it could instead have been included as
part of the initial hidden state or handled with some other strategy.



# Creating One-Hot Encodings

categoryTensor(category): **(1, n_categories)**

    tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.]])
    torch.Size([1, 18])

inputTensor(line): **(len(line), 1, n_letters)**, i.e. one one-hot vector per letter, from the first to the last letter, without `<EOS>`; `line` is a single name (one line of a category file)

    Abdi --> torch.Size([4, 1, 59])

targetTensor(line): **(len(line))**, i.e. the letter indices from the second letter through `<EOS>`

    Abdi ---> torch.Size([4])


**Creating the one-hot vector representation**

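1. For `categoryTensor` and `inputTensor`, start from a tensor of zeros of the required size.
2. `categoryTensor`: set `tensor[0][li] = 1`, where `li = all_categories.index(category)` is the category's index.
3. `inputTensor`: for each position `li` in the line, set `tensor[li][0][all_letters.find(line[li])] = 1`.
4. `targetTensor`: build `letter_indexes = [all_letters.find(line[li]) for li in range(1, len(line))]` (second letter onwards), then `append(n_letters - 1)` for `<EOS>`.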
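The four steps above, written out as functions. This is a sketch in the spirit of the linked tutorial; `all_categories` is a hypothetical two-entry list so the block runs on its own (the real dataset has 18 categories):

```python
import string
import torch

all_letters = string.ascii_letters + " .,;'-"
n_letters = len(all_letters) + 1          # 59; the last index is reserved for EOS
all_categories = ['English', 'French']    # hypothetical; the real list has 18 entries
n_categories = len(all_categories)

def categoryTensor(category):
    # (1, n_categories) with a single 1 at the category's index
    li = all_categories.index(category)
    tensor = torch.zeros(1, n_categories)
    tensor[0][li] = 1
    return tensor

def inputTensor(line):
    # (len(line), 1, n_letters): one one-hot row per letter, no EOS
    tensor = torch.zeros(len(line), 1, n_letters)
    for li in range(len(line)):
        tensor[li][0][all_letters.find(line[li])] = 1
    return tensor

def targetTensor(line):
    # (len(line),): indices of the 2nd .. last letter, followed by EOS
    letter_indexes = [all_letters.find(line[li]) for li in range(1, len(line))]
    letter_indexes.append(n_letters - 1)  # EOS
    return torch.LongTensor(letter_indexes)

print(inputTensor("Abdi").shape)          # torch.Size([4, 1, 59])
print(targetTensor("Abdi").tolist())      # [1, 3, 8, 58]
```

Note that `targetTensor` holds class indices rather than one-hot vectors, which is the target format `NLLLoss` expects.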

    RNN(
      (i2h): Linear(in_features=205, out_features=128, bias=True)
      (i2o): Linear(in_features=205, out_features=59, bias=True)
      (o2o): Linear(in_features=187, out_features=59, bias=True)
      (dropout): Dropout(p=0.1, inplace=False)
      (softmax): LogSoftmax(dim=1)
    )
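A sketch of a module whose printout matches the one above, modelled on the linked PyTorch tutorial. `n_categories` and `n_letters` are hard-coded to the values used in these notes, so treat them as assumptions if your data differs:

```python
import torch
import torch.nn as nn

n_categories = 18   # values used in these notes
n_letters = 59

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        # (category, current letter, previous hidden) -> new hidden / first output
        self.i2h = nn.Linear(n_categories + input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(n_categories + input_size + hidden_size, output_size)
        # (new hidden, first output) -> final output
        self.o2o = nn.Linear(hidden_size + output_size, output_size)
        self.dropout = nn.Dropout(0.1)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, category, input, hidden):
        input_combined = torch.cat((category, input, hidden), 1)
        hidden = self.i2h(input_combined)
        output = self.i2o(input_combined)
        output_combined = torch.cat((hidden, output), 1)
        output = self.o2o(output_combined)
        output = self.dropout(output)
        output = self.softmax(output)   # log-probabilities of the next letter
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)

rnn = RNN(n_letters, 128, n_letters)
print(rnn)
```

Because the category is fed at every step, `n_categories` appears in the `in_features` of both `i2h` and `i2o`.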

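`hidden = rnn.initHidden()` returns zeros of shape

    torch.Size([1, hidden_size])

hidden_size = 128

n_letters = input_size = 59

n_categories = 18

n_categories + input_size + hidden_size = 18 + 59 + 128 = 205 (`in_features` of `i2h` and `i2o`)

hidden_size + output_size = 128 + 59 = 187 (`in_features` of `o2o`; since output_size = n_letters = input_size, this also equals input_size + hidden_size)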

### Forward Pass, Backward Pass and Optimizer Step over Several Epochs

**Initialize**: `rnn = RNN(n_letters, 128, n_letters)`, then `hidden = rnn.initHidden()`.

**Forward**: at each step a single letter of the word is fed to the network and its loss is added to a running total; once the whole word "Abdi" has been processed, the total loss for the word is known.

**Backward & Optimizer**: the backward pass (`loss.backward()`) computes the gradients, and the parameters are then updated with a plain gradient-descent step, `p.data.add_(p.grad.data, alpha=-learning_rate)`.
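The manual update is ordinary stochastic gradient descent. A small sketch (a toy `nn.Linear` model and random data, both assumptions for illustration) checking that it yields the same parameters as `torch.optim.SGD` with its defaults (no momentum, no weight decay):

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
learning_rate = 0.0005
model_a = nn.Linear(4, 3)
model_b = copy.deepcopy(model_a)          # identical starting weights
x = torch.randn(2, 4)
target = torch.tensor([0, 2])
criterion = nn.NLLLoss()

# (a) manual update, as in these notes
model_a.zero_grad()
criterion(torch.log_softmax(model_a(x), dim=1), target).backward()
for p in model_a.parameters():
    p.data.add_(p.grad.data, alpha=-learning_rate)

# (b) the same step through torch.optim.SGD
optimizer = torch.optim.SGD(model_b.parameters(), lr=learning_rate)
optimizer.zero_grad()
criterion(torch.log_softmax(model_b(x), dim=1), target).backward()
optimizer.step()

same = all(torch.allclose(pa, pb) for pa, pb in zip(model_a.parameters(), model_b.parameters()))
print(same)  # True
```

Using `torch.optim.SGD` (or another optimizer) instead of the manual loop makes it easy to switch to momentum or Adam later.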



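Over 2 epochs, the per-letter loss is computed 2 × len(word) times in total (one forward step per letter per epoch), while `loss.backward()` is called once per pass over the word.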

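Before the loop, reshape `target_line_tensor` from `torch.Size([4])` to `torch.Size([4, 1])` (for example with `target_line_tensor.unsqueeze_(-1)`), so that each `target_line_tensor[i]` has shape `[1]`, matching the batch size of 1 in the output `torch.Size([1, n_letters])`.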
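A sketch of the shapes involved, using the `targetTensor("Abdi")` indices `[1, 3, 8, 58]` (b, d, i, EOS with the 59-character alphabet above) and a random log-probability vector standing in for the network's output:

```python
import torch
import torch.nn as nn

n_letters = 59
criterion = nn.NLLLoss()

target_line_tensor = torch.LongTensor([1, 3, 8, 58])  # targetTensor("Abdi")
print(target_line_tensor.shape)                        # torch.Size([4])
target_line_tensor.unsqueeze_(-1)                      # in place: [4] -> [4, 1]
print(target_line_tensor.shape)                        # torch.Size([4, 1])

# One step's output: log-probabilities of shape [1, n_letters] (a batch of 1)
output = torch.log_softmax(torch.randn(1, n_letters), dim=1)
# target_line_tensor[0] has shape [1]: one class index for the batch of 1
loss = criterion(output, target_line_tensor[0])
print(loss.shape)                                      # torch.Size([]), a scalar
```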

**Epoch**: one complete pass through the entire dataset (here, just the word "Abdi"). Completing one epoch means processing each character of the word once.

**Iteration**: one iteration processes a single character of the word, so processing the whole word "Abdi" takes 4 iterations (one per character).

Example: with 2 epochs and 4 letters (`i` = 0 to 3), training runs 2 × 4 = 8 iterations in total.

Below are the tensor shapes in the forward pass for each letter, here for "Abdi":

    input_combined(category, input, hidden) :  torch.Size([1, n_categories + input_size + hidden_size])
    hidden(i2h): torch.Size([1, hidden_size])
    output(i2o) : torch.Size([1, n_letters])
    output_combined(hidden, output) : torch.Size([1, n_letters+hidden_size])
    output(o2o) : torch.Size([1, n_letters])
    dropout, then LogSoftmax
    after the 4th (last) letter of the name: output, hidden torch.Size([1, 59]) torch.Size([1, 128])
    output & loss at each iteration  torch.Size([1, 59]) 4.06482778276716

If you set the number of epochs to 1, the training loop processes the word "Abdi" once: 4 iterations (one per letter) in 1 epoch.

If you set the number of epochs to 2, the training loop processes the word "Abdi" twice: 8 iterations in total (4 iterations × 2 epochs).
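The relation between epochs, iterations and backward passes can be traced without any tensors. A sketch with a hypothetical helper `trace_training_steps` that records each (epoch, letter index, input letter, target) step:

```python
def trace_training_steps(word, n_epochs):
    """List the (epoch, i, input letter, target) steps and count backward passes."""
    targets = list(word[1:]) + ['<EOS>']
    steps, backward_calls = [], 0
    for epoch in range(n_epochs):
        for i, letter in enumerate(word):   # forward: one iteration per letter
            steps.append((epoch, i, letter, targets[i]))
        backward_calls += 1                 # loss.backward() once per pass over the word
    return steps, backward_calls

steps, backward_calls = trace_training_steps("Abdi", 2)
print(len(steps), backward_calls)  # 8 2
print(steps[:4])
# [(0, 0, 'A', 'b'), (0, 1, 'b', 'd'), (0, 2, 'd', 'i'), (0, 3, 'i', '<EOS>')]
```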

The training step for a single word ("Abdi"); each pass through the `for` loop is one iteration:


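    # Assumes rnn, learning_rate and criterion = nn.NLLLoss() are already defined
    def train(category_tensor, input_line_tensor, target_line_tensor):
        target_line_tensor.unsqueeze_(-1)   # [len] -> [len, 1], as described above
        hidden = rnn.initHidden()
        rnn.zero_grad()                     # clear gradients from the previous word
        loss = 0
        for i in range(input_line_tensor.size(0)):   # one iteration per letter
            output, hidden = rnn(category_tensor, input_line_tensor[i], hidden)
            l = criterion(output, target_line_tensor[i])
            loss += l

        loss.backward()                     # one backward pass for the whole word

        for p in rnn.parameters():          # manual gradient-descent update
            p.data.add_(p.grad.data, alpha=-learning_rate)

        avg_loss = loss.item() / input_line_tensor.size(0)
        return output, avg_loss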

