# Generative Networks

Recurrent Neural Networks (RNNs) and their gated variants, such as Long Short-Term Memory (LSTM) cells and Gated Recurrent Units (GRUs), provide a mechanism for language modeling: they can learn word ordering and predict the next word in a sequence. This allows us to use RNNs for **generative tasks**, such as ordinary text generation, machine translation, and even image captioning.

In the RNN architecture we discussed in the previous unit, each RNN unit produced the next hidden state as its output. However, we can also add another output to each recurrent unit, which allows us to output a **sequence** (equal in length to the input sequence). Moreover, we can use RNN units that do not accept an input at each step, but instead take some initial state vector and then produce a sequence of outputs.

This allows for different neural architectures that are shown in the picture below:

![RNN patterns](../images/unreasonable-effectiveness-of-rnn.jpg)
*Image from the blog post [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) by [Andrej Karpathy](http://karpathy.github.io/)*

* **One-to-one** is a traditional neural network with one input and one output.
* **One-to-many** is a generative architecture that accepts one input value and generates a sequence of output values. For example, to train an **image captioning** network that produces a textual description of a picture, we can take a picture as input, pass it through a CNN to obtain a hidden state, and then have a recurrent chain generate the caption word by word.
* **Many-to-one** corresponds to the RNN architectures we described in the previous unit, such as text classification.
* **Many-to-many**, or **sequence-to-sequence**, corresponds to tasks such as **machine translation**, where a first RNN collects all information from the input sequence into the hidden state, and another RNN chain unrolls this state into the output sequence.
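
To make the difference between the many-to-one and many-to-many patterns concrete, here is a minimal sketch using `torch.nn.RNN`. The dimensions and the names `rnn` and `head` are illustrative assumptions, not part of the model built later in this unit. The only difference between the two patterns is whether the output layer is applied to the last hidden state or to every hidden state.

```python
import torch

torch.manual_seed(0)

# Illustrative sizes (assumptions made for this sketch only)
batch, seq_len, in_dim, hidden, out_dim = 2, 5, 8, 16, 8

rnn = torch.nn.RNN(in_dim, hidden, batch_first=True)
head = torch.nn.Linear(hidden, out_dim)

x = torch.randn(batch, seq_len, in_dim)

# states holds the hidden state at every step: (batch, seq_len, hidden)
# last holds only the final hidden state: (num_layers, batch, hidden)
states, last = rnn(x)

# Many-to-one: a single prediction from the final hidden state
many_to_one = head(last[-1])

# Many-to-many: one prediction per input step
many_to_many = head(states)

print(many_to_one.shape, many_to_many.shape)
```

The generator built later in this unit follows the second pattern: it produces a prediction for the next character at every position of the input.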

In this unit, we will focus on simple generative models that help us generate text. For simplicity, let's build a **character-level network**, which takes individual characters as input.
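
As a rough illustration of what "character-level" means, the sketch below builds a tiny character vocabulary in plain Python and encodes a string as a list of integer indices. The names `stoi` and `itos` mirror the attribute names of the torchtext vocabulary used in this unit, but this is a simplified stand-in: a real torchtext vocabulary also contains special tokens and orders its entries differently.

```python
text = "hello world"

# Build a vocabulary from the distinct characters (sorted for reproducibility)
itos = sorted(set(text))                      # index -> character
stoi = {ch: i for i, ch in enumerate(itos)}   # character -> index

# Encode the text as a sequence of indices, then decode it back
encoded = [stoi[ch] for ch in text]
decoded = "".join(itos[i] for i in encoded)

print(len(itos), encoded, decoded)
```

A character vocabulary is much smaller than a word vocabulary, which keeps the one-hot input vectors short.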

In [96]:
import torch
import torchtext
import numpy as np
from torchnlp import *  # local helper module for this course; `device` is assumed to be defined there
load_dataset()  # make sure the dataset is downloaded

120000lines [00:04, 27589.41lines/s]
120000lines [00:08, 14627.95lines/s]
7600lines [00:00, 14564.78lines/s]


(<torchtext.datasets.text_classification.TextClassificationDataset at 0x7f21e8ef7990>,
 <torchtext.datasets.text_classification.TextClassificationDataset at 0x7f21df1fd190>,
 ['World', 'Sports', 'Business', 'Sci/Tech'],
 95812)

In [133]:
def char_tokenizer(words):
    # Split a string into a list of individual characters
    return list(words)

# Field and TabularDataset belong to the legacy torchtext API (moved to torchtext.legacy in 0.9)
TEXT = torchtext.data.Field(sequential=True, tokenize=char_tokenizer)
LABEL = torchtext.data.Field(sequential=False, use_vocab=False)

train_dataset = torchtext.data.TabularDataset('./data/ag_news_csv/train.csv',
        format='csv',
        fields=[('Label', LABEL), ('Head', TEXT), ('Text', TEXT)])
# Build the character vocabulary from the training texts
TEXT.build_vocab(train_dataset)

In [154]:
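vocab_size = len(TEXT.vocab)
print(f"Vocabulary size = {vocab_size}")

nchars = 100  # length of each training window, in characters

def encode_text(s):
    # Map each character to its vocabulary index
    return torch.LongTensor([TEXT.vocab.stoi[t] for t in s])

def get_batches(s, batch_size=16, nchars=nchars):
    # Build every sliding window over s; each target is its input shifted one character ahead.
    # Note: batch_size is currently unused, so all windows of one text form a single batch.
    ins = torch.zeros(len(s)-nchars, nchars, dtype=torch.long)
    outs = torch.zeros(len(s)-nchars, nchars, dtype=torch.long)
    for i in range(len(s)-nchars):
        ins[i] = encode_text(s[i:i+nchars])
        outs[i] = encode_text(s[i+1:i+nchars+1])
    return ins, outs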

Vocabulary size = 84
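
The windowing done by `get_batches` is easier to see on a short string. The sketch below uses a hypothetical helper, `make_windows`, that applies the same slicing to plain Python strings, without tensors or the vocabulary. Each target window is the input window shifted one character to the right, so at every position the network is trained to predict the next character.

```python
def make_windows(s, nchars):
    """Return (input, target) pairs; each target is its input shifted by one character."""
    return [(s[i:i + nchars], s[i + 1:i + nchars + 1]) for i in range(len(s) - nchars)]

pairs = make_windows("hello world", nchars=8)
for src, tgt in pairs:
    print(repr(src), "->", repr(tgt))
```

A text of length `n` yields `n - nchars` windows. The same slicing works on a list of characters, which is the form of `x.Text` in this notebook.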


In [155]:
for i,x in zip(range(1),train_dataset.examples):
    print(x.Text)

['R', 'e', 'u', 't', 'e', 'r', 's', ' ', '-', ' ', 'S', 'h', 'o', 'r', 't', '-', 's', 'e', 'l', 'l', 'e', 'r', 's', ',', ' ', 'W', 'a', 'l', 'l', ' ', 'S', 't', 'r', 'e', 'e', 't', "'", 's', ' ', 'd', 'w', 'i', 'n', 'd', 'l', 'i', 'n', 'g', '\\', 'b', 'a', 'n', 'd', ' ', 'o', 'f', ' ', 'u', 'l', 't', 'r', 'a', '-', 'c', 'y', 'n', 'i', 'c', 's', ',', ' ', 'a', 'r', 'e', ' ', 's', 'e', 'e', 'i', 'n', 'g', ' ', 'g', 'r', 'e', 'e', 'n', ' ', 'a', 'g', 'a', 'i', 'n', '.']


In [178]:
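class LSTMGenerator(torch.nn.Module):
    def __init__(self, vocab_size, hidden_dim):
        super().__init__()
        self.vocab_size = vocab_size
        self.hidden_dim = hidden_dim
        # Input is a one-hot character vector, so the LSTM input size equals the vocabulary size
        self.rnn = torch.nn.LSTM(vocab_size, hidden_dim, batch_first=True)
        # Maps each hidden state to logits over the next character
        self.fc = torch.nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, s=None):
        # x: (batch, seq_len) character indices; s: optional (h, c) LSTM state
        x = torch.nn.functional.one_hot(x, self.vocab_size).to(torch.float32)
        x, s = self.rnn(x, s)
        return self.fc(x), s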

In [200]:
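def generate(net, size=100, start='today '):
    chars = list(start)
    # Feed the whole prompt first to obtain the initial state and logits
    out, s = net(encode_text(chars).view(1, -1).to(device))
    for i in range(size):
        # Greedy decoding: always pick the most probable next character
        nc = torch.argmax(out[0][-1])
        chars.append(TEXT.vocab.itos[nc])
        # Feed the chosen character back in, carrying the LSTM state forward
        out, s = net(nc.view(1, -1), s)
    return ''.join(chars)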

In [202]:
net = LSTMGenerator(vocab_size, 64).to(device)

optimizer = torch.optim.Adam(net.parameters(), 0.01)
loss_fn = torch.nn.CrossEntropyLoss()
net.train()
for i, x in enumerate(train_dataset.examples):
    # Skip texts too short to yield at least 10 training windows
    if len(x.Text) - nchars < 10:
        continue
    text_in, text_out = get_batches(x.Text)
    optimizer.zero_grad()
    text_in, text_out = text_in.to(device), text_out.to(device)
    out, s = net(text_in)
    # Flatten (batch, seq_len, vocab) logits and (batch, seq_len) targets for cross-entropy
    loss = loss_fn(out.view(-1, vocab_size), text_out.flatten())
    loss.backward()
    optimizer.step()
    if i % 1000 == 0:
        print(f"Current loss = {loss.item()}")
        print(generate(net))

Current loss = 1.9617218971252441
today a the the the the the the the the the the the the the the the the the the the the the the the the th
Current loss = 1.5846630334854126
today and the company and the company and the company and the company and the company and the company and 
Current loss = 2.368056058883667
today and the proporting the proporting the proporting the proporting the proporting the proporting the pr
Current loss = 1.7323575019836426
today and the contruduction to the to stock to the to stock to the to stock to the to stock to the to stoc
Current loss = 1.6278003454208374
today and the U.S. compaling the compaling the compaling the compaling the compaling the compaling the com
Current loss = 1.9082142114639282
today and the to the to the to the to the to the to the to the to the to the to the to the to the to the t
Current loss = 1.6412254571914673
today and the company and the company and the company and the company and the company and the company and 
Current loss =

Current loss = 1.8670830726623535
today to the second to the second to the second to the second to the second to the second to the second to
Current loss = 2.014064073562622
today and the state to the state to the state to the state to the state to the state to the state to the s
Current loss = 1.6249018907546997
today and the services and the services and the services and the services and the services and the service
Current loss = 1.8014146089553833
today and the security of the security of the security of the security of the security of the security of 
Current loss = 1.6114312410354614
today and the second to the second to the second to the second to the second to the second to the second t
Current loss = 1.5414443016052246
today and the state to the state to the state to the state to the state to the state to the state to the s
Current loss = 1.742472767829895
today and the second the second the second the second the second the second the second the second the seco
Current loss = 