## Generative networks

Recurrent Neural Networks (RNNs) and their gated variants, such as Long Short-Term Memory (LSTM) cells and Gated Recurrent Units (GRUs), provide a mechanism for language modeling: they can learn word ordering and predict the next word in a sequence. This allows us to use RNNs for **generative tasks**, such as ordinary text generation, machine translation, and even image captioning.

In an RNN architecture, each RNN unit produces the next hidden state as its output. However, we can also add another output to each recurrent unit, which allows us to output a **sequence** (equal in length to the original sequence). Moreover, we can use RNN units that do not accept an input at each step, but just take some initial state vector and then produce a sequence of outputs.

This allows for different neural architectures that are shown in the picture below:

<figure><img src="https://hostux.social/system/media_attachments/files/110/768/729/925/794/877/original/fa0c21dbd618dfde.jpg" alt="" width="1000"><figcaption><p>Source: The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy</p></figcaption></figure>

* **One-to-one** is a traditional neural network with one input and one output.
* **One-to-many** is a generative architecture that accepts one input value and generates a sequence of output values. For example, to train an `image captioning` network that produces a textual description of a picture, we can take a picture as input, pass it through a CNN to obtain the hidden state, and then have a recurrent chain generate the caption word by word.
* **Many-to-one** corresponds to the RNN architectures we described in `Capture patterns with recurrent neural networks`, such as text classification.
* **Many-to-many** or **sequence-to-sequence** corresponds to tasks such as `machine translation`, where a first RNN collects all information from the input sequence into the hidden state, and another RNN chain unrolls this state into the output sequence.
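
As a rough illustration of how these architectures differ in what they read from a recurrent layer, the sketch below runs a small LSTM over a random batch and picks out either every per-step output (many-to-many with equal lengths) or only the final hidden state (many-to-one). The sizes and variable names are arbitrary assumptions chosen for the example, not values used later in this notebook.

```python
import torch

torch.manual_seed(0)

# Illustrative sizes only (assumptions, not values from this notebook)
batch, seq_len, in_dim, hidden = 2, 5, 8, 16
x = torch.randn(batch, seq_len, in_dim)

rnn = torch.nn.LSTM(in_dim, hidden, batch_first=True)
outputs, (h_n, c_n) = rnn(x)

# Many-to-many (equal length): one output vector per input step
per_step = outputs            # shape: (batch, seq_len, hidden)

# Many-to-one: keep only the final hidden state, e.g. for classification
final = h_n[-1]               # shape: (batch, hidden)

print(per_step.shape, final.shape)
```

For a single-layer, unidirectional LSTM the last per-step output and the final hidden state are the same tensor values, so the two readouts differ only in how much of the sequence they keep.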

Here we will focus on simple generative models that help us generate text. For simplicity, let's build a **character-level network**, which generates text letter by letter. During training, we need to take some text corpus and split it into letter sequences.

In [None]:
# https://github.com/pytorch/data/issues/1093
pip install portalocker

In [None]:
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118

## Building character vocabulary

To build a character-level generative network, we need to split text into individual characters instead of words. This can be done by defining a different tokenizer:

In [None]:
import torch
import torchtext
import numpy as np
import collections

In [None]:
# Loading dataset
train_dataset, test_dataset = torchtext.datasets.AG_NEWS(root='./data')
train_dataset, test_dataset = list(train_dataset), list(test_dataset)
classes = ['World','Sports','Business','Sci/Tech']

tokenizer = torchtext.data.utils.get_tokenizer('basic_english')


In [None]:
def char_tokenizer(words):
    return list(words)

counter = collections.Counter()
for (label, line) in train_dataset:
    counter.update(char_tokenizer(line))
vocab = torchtext.vocab.vocab(counter)

vocab_size = len(vocab)
print(vocab_size)
print(vocab.get_stoi()['a'])
print(vocab.get_itos()[13])

Let's see an example of how we can encode text from our dataset:

In [None]:
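def encode(x, voc=None, tokenizer=tokenizer):
    # Fall back to the global character vocabulary when none is given
    v = vocab if voc is None else voc
    stoi = v.get_stoi()  # build the token-to-index mapping once
    return [stoi[s] for s in tokenizer(x)]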


def enc(x):
    return torch.LongTensor(encode(x, voc=vocab, tokenizer=char_tokenizer))

print(train_dataset[0][1])
print(enc(train_dataset[0][1]))
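
To make the character-to-index mapping concrete without downloading the dataset, here is a minimal sketch that uses only the standard library on a toy string. The names `itos`, `stoi`, `encode_chars` and `decode_chars` are hypothetical helpers for this illustration; they imitate the idea behind the vocabulary above but are not part of `torchtext`.

```python
import collections

text = "hello world"  # toy corpus (assumption), not taken from AG_NEWS

# Count characters; Counter keeps them in first-seen order
counter = collections.Counter(text)
itos = list(counter)                          # index -> character
stoi = {ch: i for i, ch in enumerate(itos)}   # character -> index

def encode_chars(s):
    """Map each character of s to its integer index."""
    return [stoi[ch] for ch in s]

def decode_chars(ids):
    """Map a list of indices back to a string."""
    return "".join(itos[i] for i in ids)

ids = encode_chars("hello")
print(ids)                # [0, 1, 2, 2, 3]
print(decode_chars(ids))  # hello
```

Encoding followed by decoding returns the original string, which is a quick sanity check for any vocabulary built this way.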

## Training a generative RNN

We will train the RNN to generate text as follows. On each step, we take a sequence of characters of length `nchars` and ask the network to generate the next output character for each input character:

<figure><img src="https://hostux.social/system/media_attachments/files/110/768/908/937/200/134/original/d8b268cc82ca6080.png" alt="" width="1000"><figcaption><p>Source: MicrosoftLearning</p></figcaption></figure>

Depending on the actual scenario, we may also want to include some special characters, such as the `end-of-sequence` token `<eos>`. In our case, we just want to train the network for endless text generation, so we fix the size of each sequence at `nchars` tokens. Consequently, each training example consists of `nchars` inputs and `nchars` outputs (the input sequence shifted one symbol to the left). A minibatch consists of several such sequences.

To generate minibatches, we take each news text of length `l` and generate all possible input-output combinations from it (there will be `l-nchars` such combinations). They form one minibatch, so the size of the minibatch differs at each training step.
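
The sliding-window idea can be sketched on a toy string with plain Python. The helper `make_pairs` is a hypothetical name used only for this illustration; it works on raw strings rather than encoded tensors, and uses a window of 4 characters instead of 100 so that the result is easy to read.

```python
def make_pairs(s, nchars):
    """Return all (input, target) windows of length nchars from s.

    Each target is the input window shifted one character to the left,
    so there are len(s) - nchars pairs in total.
    """
    pairs = []
    for i in range(len(s) - nchars):
        pairs.append((s[i:i + nchars], s[i + 1:i + nchars + 1]))
    return pairs

pairs = make_pairs("hello world", nchars=4)
for inp, tgt in pairs:
    print(repr(inp), "->", repr(tgt))
```

For the 11-character string above this yields 7 pairs, starting with `'hell' -> 'ello'` and ending with `'worl' -> 'orld'`.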

In [None]:
# Check the platform: Kaggle (CUDA), Apple Silicon (MPS), or fall back to CPU
import os, platform

torch_device="cpu"

if 'kaggle' in os.environ.get('KAGGLE_URL_BASE','localhost'):
    torch_device = 'cuda'
else:
    torch_device = 'mps' if platform.system() == 'Darwin' else 'cpu'

torch_device

In [None]:
nchars = 100

def get_batch(s, nchars=nchars):
    n = len(s) - nchars  # number of sliding windows in the text
    ins = torch.zeros(n, nchars, dtype=torch.long, device=torch_device)
    # Targets have the same shape as inputs: one next character per position
    outs = torch.zeros(n, nchars, dtype=torch.long, device=torch_device)

    for i in range(n):
        ins[i] = enc(s[i:i+nchars])
        outs[i] = enc(s[i+1:i+nchars+1])  # input window shifted by one character
    return ins, outs

get_batch(train_dataset[0][1])

Now, let's define the generator network. It can be based on any recurrent cell that we discussed in the previous notebooks (simple, LSTM or GRU). In our example we will use LSTM.

Because the network takes characters as inputs and the vocabulary is fairly small, we do not need an embedding layer; one-hot-encoded input can go directly to the LSTM cell. However, because we pass character indices as input, we need to one-hot-encode them before passing them to the LSTM. This is done by calling the `one_hot` function during the `forward` pass. The output encoder is a linear layer that converts the hidden state into a vector of vocabulary size, with one score per character.

>Note: One-hot encoding represents each character as a binary vector in which only the index corresponding to the character is set to 1 and all other indices are set to 0. This encoding allows the LSTM to process characters as input and learn patterns from them.
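
A minimal sketch of what `torch.nn.functional.one_hot` produces, assuming a tiny vocabulary of 5 characters (the indices and the vocabulary size are made up for the example):

```python
import torch

# Three character indices from a hypothetical 5-character vocabulary
ids = torch.tensor([2, 0, 3])

# one_hot returns integer vectors; the LSTM expects floats, hence the cast
one_hot = torch.nn.functional.one_hot(ids, num_classes=5).to(torch.float32)
print(one_hot)
```

Each row contains a single 1 at the position of the corresponding index, so taking `argmax` over the last dimension recovers the original indices.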

In [None]:
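class LSTMGenerator(torch.nn.Module):
    def __init__(self, vocab_size, hidden_dim):
        super().__init__()
        self.vocab_size = vocab_size
        # One-hot vectors of size vocab_size go straight into the LSTM
        self.rnn = torch.nn.LSTM(vocab_size, hidden_dim, batch_first=True)
        # Project each hidden state to one score per character
        self.fc = torch.nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, s=None):
        # x: (batch, seq_len) character indices; s: optional (h, c) LSTM state
        x = torch.nn.functional.one_hot(x, num_classes=self.vocab_size).to(torch.float32)
        x, s = self.rnn(x, s)
        return self.fc(x), s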