<a href="https://colab.research.google.com/github/Ankur-singh/personal_projects/blob/master/Notebooks/char_rnn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Char-level RNN

In this notebook, I'll construct a character-level RNN with PyTorch. The network will train character by character on some text, then generate new text character by character. As an example, I will train on Anna Karenina. This model will be able to generate new text based on the text from the book!

Below is the general architecture of the character-wise RNN.

![](http://karpathy.github.io/assets/rnn/charseq.jpeg)
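
The idea in the figure, that every input character is paired with the character that follows it, can be sketched in plain Python. The string `hello` is only an illustrative example and is not taken from the dataset.

```python
# Each input character's target is simply the next character in the text.
text_example = "hello"          # illustrative string, not from the dataset
inputs = text_example[:-1]      # "hell"
targets = text_example[1:]      # "ello"

for inp, tgt in zip(inputs, targets):
    print(f"input {inp!r} -> target {tgt!r}")
```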

In [0]:
import numpy as np
import torch
from torch import nn, optim
from torch.nn import functional as F

from tqdm import trange
from tqdm import tqdm

In [0]:
path = 'anna.txt' # placeholder; this will raise an error unless you set it to the path of your own text file

### Defining Dataset & DataLoader

In [0]:
def load_data(path):
  with open(path) as f:
    text = f.read()
  return ''.join([i if ord(i) < 128 else ' ' for i in text]) # the preprocessing will depend on the use case 


def get_vocab(text):
  chars = list(set(text)) 
  int2char = {i:c for i, c in enumerate(chars)}
  char2int = {c:i for i, c in enumerate(chars)}
  return chars, int2char, char2int


def get_batches(enc_text, batch_size, seq_len):
  n_batches = enc_text.shape[0] // (batch_size * seq_len)
  enc_text = enc_text[:n_batches * batch_size * seq_len]
  enc_text = enc_text.reshape(batch_size, -1)
  
  for i in range(0, enc_text.shape[1] - seq_len + 1, seq_len):
    x = enc_text[:, i : i + seq_len]
    y = np.zeros(x.shape)
    
    try:
      y[:, : -1], y[:, -1] = x[:, 1:], enc_text[:, i + seq_len]  
    except IndexError:
      y[:, : -1], y[:, -1] = x[:, 1:], enc_text[:, 0]
    
    yield x,y
    

def one_hot_encode(arr, n_labels):
    
    # Initialize the encoded array
    one_hot = np.zeros((arr.size, n_labels), dtype=np.float32)
    
    # Fill the appropriate elements with ones
    one_hot[np.arange(one_hot.shape[0]), arr.flatten()] = 1.
    
    # Finally reshape it to get back to the original array
    one_hot = one_hot.reshape((*arr.shape, n_labels))
    
    return one_hot
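
To make the one-hot step concrete, here is a small standalone sketch that repeats the same indexing trick on a toy array. The array values and the vocabulary size of 4 are made up for illustration.

```python
import numpy as np

# Toy batch of integer-encoded characters: 2 sequences of 3 steps,
# drawn from a hypothetical vocabulary of 4 characters.
toy = np.array([[0, 2, 1],
                [3, 3, 0]])
toy_labels = 4

# Same idea as one_hot_encode: one row per character, a single 1 per row.
flat = np.zeros((toy.size, toy_labels), dtype=np.float32)
flat[np.arange(toy.size), toy.flatten()] = 1.0
toy_one_hot = flat.reshape(*toy.shape, toy_labels)

print(toy_one_hot.shape)   # (2, 3, 4)
print(toy_one_hot[0, 1])   # [0. 0. 1. 0.]
```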

In [0]:
text = load_data(path)
chars, int2char, char2int = get_vocab(text)

In [5]:
encoded_text = np.array([char2int[c] for c in text])
encoded_text.shape

(1985223,)

In [6]:
chars[:10], list(int2char.items())[:10], list(char2int.items())[:10]

(['H', 'o', '4', 'g', 'y', '_', 'B', '5', 'l', 'k'],
 [(0, 'H'),
  (1, 'o'),
  (2, '4'),
  (3, 'g'),
  (4, 'y'),
  (5, '_'),
  (6, 'B'),
  (7, '5'),
  (8, 'l'),
  (9, 'k')],
 [('H', 0),
  ('o', 1),
  ('4', 2),
  ('g', 3),
  ('y', 4),
  ('_', 5),
  ('B', 6),
  ('5', 7),
  ('l', 8),
  ('k', 9)])
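
One caveat with `list(set(text))`: Python randomizes string hashes per process, so the character order, and therefore the integer codes, can differ between sessions. If the mapping has to match a model saved earlier, sorting the characters is one simple way to make it reproducible. A minimal sketch, where `get_vocab_sorted` is a hypothetical variant of `get_vocab`:

```python
def get_vocab_sorted(text):
    # Sorting makes each character's index independent of set ordering.
    chars = sorted(set(text))
    int2char = dict(enumerate(chars))
    char2int = {c: i for i, c in int2char.items()}
    return chars, int2char, char2int

demo_chars, demo_int2char, demo_char2int = get_vocab_sorted("hello world")
print(demo_chars)
```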

In [7]:
print(text[:10])
[char2int[c] for c in text[:10]]

Chapter 1



[25, 59, 80, 70, 69, 31, 68, 71, 34, 54]

In [0]:
batches = get_batches(encoded_text, 10, 50)

In [27]:
x,y = next(batches)
x.shape, y.shape

((10, 50), (10, 50))
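
The shapes alone do not show how `x` and `y` relate. The toy sketch below reproduces the layout used by `get_batches` on a made-up "text" of consecutive integers, so it is easy to see that each target row is the input row shifted one step ahead. The `toy_` names are illustrative only.

```python
import numpy as np

# Toy "encoded text": the integers 0..23 stand in for character codes.
toy_text = np.arange(24)
toy_batch, toy_seq = 2, 4

# Same layout as get_batches: trim to a whole number of batches,
# then give each row one contiguous slice of the text.
toy_n = toy_text.shape[0] // (toy_batch * toy_seq)
rows = toy_text[: toy_n * toy_batch * toy_seq].reshape(toy_batch, -1)

toy_x = rows[:, :toy_seq]            # first window of each row
toy_y = rows[:, 1 : toy_seq + 1]     # same window shifted one step ahead

print(toy_x)
print(toy_y)
```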

### Defining the Network Architecture

In [0]:
class Network(nn.Module):
  def __init__(self, n_input, n_hidden, n_layers, drop_prob):
    super(Network, self).__init__()
    self.drop_prob = drop_prob
    self.n_hidden = n_hidden
    self.n_layers = n_layers
    self.n_input = n_input
    
    ### Layers ###
    self.rnn1 = nn.LSTM(self.n_input, self.n_hidden, num_layers=self.n_layers, batch_first=True, dropout=self.drop_prob)
    self.dropout = nn.Dropout(self.drop_prob)
    self.fc = nn.Linear(self.n_hidden, self.n_input)
    
  def forward(self, x, hidden):
    x, h = self.rnn1(x, hidden)
    x = self.dropout(x)
    x = x.contiguous().view(-1, self.n_hidden)
    x = self.fc(x)
    x = F.log_softmax(x, dim=1)
    
    return x, h
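
Because `forward` ends in `log_softmax`, the network returns log-probabilities rather than raw scores. Two properties follow, sketched here in NumPy with made-up scores: applying softmax to the output recovers the original probabilities, and applying log-softmax a second time changes nothing.

```python
import numpy as np

def log_softmax(z):
    # Numerically stable log-softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def softmax(z):
    return np.exp(log_softmax(z))

scores = np.array([[2.0, 0.5, -1.0]])   # made-up raw scores for 3 characters
log_p = log_softmax(scores)

print(np.allclose(softmax(log_p), softmax(scores)))   # True
print(np.allclose(log_softmax(log_p), log_p))         # True
```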
    

### Training the Model

In [0]:
# taken from -> https://github.com/udacity/deep-learning-v2-pytorch/blob/master/recurrent-neural-networks/char-rnn/Character_Level_RNN_Solution.ipynb

def train(net, data, epochs=10, batch_size=10, seq_length=50, lr=0.001, clip=5, val_frac=0.1, print_every=10):
    ''' Training a network 
    
        Arguments
        ---------
        
        net: CharRNN network
        data: text data to train the network
        epochs: Number of epochs to train
        batch_size: Number of mini-sequences per mini-batch, aka batch size
        seq_length: Number of character steps per mini-batch
        lr: learning rate
        clip: gradient clipping
        val_frac: Fraction of data to hold out for validation
        print_every: Number of steps for printing training and validation loss
    
    '''
    net.train()
    
    opt = torch.optim.Adam(net.parameters(), lr=lr)
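    # The network already returns log-probabilities (log_softmax in forward),
    # so NLLLoss is the matching criterion; CrossEntropyLoss would apply log_softmax again.
    criterion = nn.NLLLoss()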
    
    # create training and validation data
    val_idx = int(len(data)*(1-val_frac))
    data, val_data = data[:val_idx], data[val_idx:]
    
    net.cuda()
    
    counter = 0
    n_chars = len(chars)
    for e in range(epochs):
        # initialize hidden state
        h = None #net.init_hidden(batch_size)
        
        for x, y in get_batches(data, batch_size, seq_length):
            counter += 1
            
            # One-hot encode our data and make them Torch tensors
            x = one_hot_encode(x, n_chars)
            inputs, targets = torch.from_numpy(x), torch.from_numpy(y)
            
            inputs, targets = inputs.cuda(), targets.cuda()

            # Creating new variables for the hidden state, otherwise
            # we'd backprop through the entire training history
            if h is not None:
              h = tuple([each.data for each in h])

            # zero accumulated gradients
            net.zero_grad()
            
            # get the output from the model
            output, h = net(inputs, h)
            
            # calculate the loss and perform backprop
            loss = criterion(output, targets.view(batch_size*seq_length).long())
            loss.backward()
            # `clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs.
            nn.utils.clip_grad_norm_(net.parameters(), clip)
            opt.step()
            
            # loss stats
            if counter % print_every == 0:
                # Get validation loss
                val_h = None #net.init_hidden(batch_size)
                val_losses = []
                net.eval()
                for x, y in get_batches(val_data, batch_size, seq_length):
                    # One-hot encode our data and make them Torch tensors
                    x = one_hot_encode(x, n_chars)
                    x, y = torch.from_numpy(x), torch.from_numpy(y)
                    
                    # Creating new variables for the hidden state, otherwise
                    # we'd backprop through the entire training history
                    if val_h is not None:
                      val_h = tuple([each.data for each in val_h])
                    
                    inputs, targets = x, y
                    inputs, targets = inputs.cuda(), targets.cuda()

                    output, val_h = net(inputs, val_h)
                    val_loss = criterion(output, targets.view(batch_size*seq_length).long())
                
                    val_losses.append(val_loss.item())
                
                net.train() # reset to train mode after iterating through validation data
                
                print("Epoch: {}/{}...".format(e+1, epochs),
                      "Step: {}...".format(counter),
                      "Loss: {:.4f}...".format(loss.item()),
                      "Val Loss: {:.4f}".format(np.mean(val_losses)))

In [13]:
# define and print the net
n_hidden=512
n_layers=2
n_chars = len(chars)

net = Network(n_chars, n_hidden, n_layers, 0.5)
print(net)

Network(
  (rnn1): LSTM(83, 512, num_layers=2, batch_first=True, dropout=0.5)
  (dropout): Dropout(p=0.5)
  (fc): Linear(in_features=512, out_features=83, bias=True)
)
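
A note on the `clip_grad_norm_` call inside `train`: it rescales all gradients together when their combined L2 norm exceeds `clip`. The NumPy sketch below illustrates that idea on made-up gradient arrays. `clip_by_global_norm` is a hypothetical helper written for this illustration, not PyTorch's implementation.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    # Combined L2 norm over every gradient array.
    total_norm = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
    # Scale down only when the norm is too large; otherwise leave gradients as-is.
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads], total_norm

toy_grads = [np.array([3.0, 4.0]), np.array([12.0])]   # made-up gradients, norm 13
clipped, norm_before = clip_by_global_norm(toy_grads, max_norm=5.0)
norm_after = np.sqrt(sum(float((g ** 2).sum()) for g in clipped))
print(norm_before, norm_after)
```

The direction of the update is preserved; only its length is capped.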


In [18]:
batch_size = 128
seq_length = 100
n_epochs = 20 # start smaller if you are just testing initial behavior

# train the model
train(net, encoded_text, epochs=n_epochs, batch_size=batch_size, seq_length=seq_length, lr=0.001, print_every=50)

Epoch: 1/20... Step: 10... Loss: 2.2381... Val Loss: 2.2050
Epoch: 1/20... Step: 20... Loss: 2.1659... Val Loss: 2.1476
Epoch: 1/20... Step: 30... Loss: 2.1566... Val Loss: 2.1198
Epoch: 1/20... Step: 40... Loss: 2.1065... Val Loss: 2.0998
Epoch: 1/20... Step: 50... Loss: 2.1219... Val Loss: 2.0820
Epoch: 1/20... Step: 60... Loss: 2.0586... Val Loss: 2.0606
Epoch: 1/20... Step: 70... Loss: 2.0660... Val Loss: 2.0467
Epoch: 1/20... Step: 80... Loss: 2.0324... Val Loss: 2.0273
Epoch: 1/20... Step: 90... Loss: 2.0397... Val Loss: 2.0099
Epoch: 1/20... Step: 100... Loss: 2.0015... Val Loss: 1.9964
Epoch: 1/20... Step: 110... Loss: 1.9789... Val Loss: 1.9809
Epoch: 1/20... Step: 120... Loss: 1.9359... Val Loss: 1.9612
Epoch: 1/20... Step: 130... Loss: 1.9780... Val Loss: 1.9422
Epoch: 2/20... Step: 140... Loss: 1.9688... Val Loss: 1.9290
Epoch: 2/20... Step: 150... Loss: 1.9449... Val Loss: 1.9092
Epoch: 2/20... Step: 160... Loss: 1.9438... Val Loss: 1.8944
Epoch: 2/20... Step: 170... Loss:
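
To put these numbers in context: the loss is an average negative log-likelihood in nats, so `exp(loss)` is the per-character perplexity, and a model guessing uniformly over the 83 characters in this vocabulary would score `ln(83)`. A quick check using the first logged training loss:

```python
import math

n_vocab = 83                      # vocabulary size reported by print(net)
uniform_loss = math.log(n_vocab)  # loss of a uniform guess over all characters
logged_loss = 2.2381              # first training loss in the log above

print(f"uniform baseline: {uniform_loss:.3f} nats")
print(f"perplexity at loss {logged_loss}: {math.exp(logged_loss):.2f}")
```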

### Inference

In [0]:
def predict(net, char, h=None, top_k=None):
        ''' Given a character, predict the next character.
            Returns the predicted character and the hidden state.
        '''
        
        # tensor inputs
        x = np.array([[char2int[char]]])
        x = one_hot_encode(x, n_chars)
        inputs = torch.from_numpy(x)
        
        inputs = inputs.cuda()
        
        # detach hidden state from history
        if h is not None:
          h = tuple([each.data for each in h])
        # get the output of the model
        out, h = net(inputs, h)

        # get the character probabilities
        p = F.softmax(out, dim=1).data
        p = p.cpu() # move to cpu
        
        # get top characters
        if top_k is None:
            top_ch = np.arange(n_chars)
        else:
            p, top_ch = p.topk(top_k)
            top_ch = top_ch.numpy().squeeze()
        
        # select the likely next character with some element of randomness
        p = p.numpy().squeeze()
        char = np.random.choice(top_ch, p=p/p.sum())
        
        # return the encoded value of the predicted char and the hidden state
        return int2char[char], h
      
      
def sample(net, size, prime='The', top_k=None):
    net.cuda()
    net.eval() # eval mode
    
    # First off, run through the prime characters
    chars = [ch for ch in prime]
    h = None
    for ch in prime:
        char, h = predict(net, ch, h, top_k=top_k)

    chars.append(char)
    
    # Now pass in the previous character and get a new one
    for ii in range(size):
        char, h = predict(net, chars[-1], h, top_k=top_k)
        chars.append(char)

    return ''.join(chars)
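
The `top_k` branch of `predict` can be illustrated in isolation. A minimal NumPy sketch with made-up probabilities: keep the `k` most likely characters, renormalize their probabilities, and sample among them. `sample_top_k` is a hypothetical helper for illustration only.

```python
import numpy as np

def sample_top_k(probs, k, rng):
    # Indices of the k largest probabilities.
    top_idx = np.argsort(probs)[-k:]
    # Renormalize so the kept probabilities sum to 1.
    top_p = probs[top_idx] / probs[top_idx].sum()
    return int(rng.choice(top_idx, p=top_p))

rng = np.random.default_rng(0)
toy_probs = np.array([0.05, 0.40, 0.10, 0.30, 0.15])   # made-up distribution
draws = [sample_top_k(toy_probs, k=2, rng=rng) for _ in range(20)]
print(sorted(set(draws)))   # only indices 1 and 3 can appear
```

A smaller `k` makes the generated text more conservative, while `top_k=None` samples from the full distribution.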


In [24]:
print(sample(net, 1000, prime='There', top_k=None))

Theres to
be exceptionally an anbormor red eyes, capticild with nose, cannot be still not
trouble--it has to stop, but it was not to ask Kitty. She added their
times, and all the kissed his scatelice, and transling lovem at him.

"But I am goaring about, or live you hos cloagh. _Come douese sI dut's
a few still so," said Metrov. "I harder thas eyer fised the wedd--not
fifting and instants Constious of the shights, like some use," she said, with
the excate offoring bight's thought.

"She seess for the money at home, and Vronsky, or the ordirate walk. When I beg you
to like his wants, and my son after oging you to be-reached."

"Buc shouse dos. I feel of my taken, at the complaints! That's even
supposent a girl and all in a fool?" the feeling of tee-inciquance had
said to his brother.

As he saw all that he was in childres or continually interesting it with her
presence."

"No, how not another could not get speak of one in the last tenge in
herself to give you the doctor, then, to tell m

In [34]:
print(sample(net, 1000, prime='Life is beautiful', top_k=None))

Life is beautiful passion, done
simply impossible, ease, live on this evening. Before that however view and
dinner peace, of straight reloving. It was utterly kissed him; so she was
not doing. The family gentleman and Dolly was so that the day beung back.

"Here us?" he understood the brother.

"Why, well, I'd realute some."

The strengthe oll broken extraced his new hired-foredread--he stopped him.

"I imagine by the clerl, and _house they signs," Konstantin Levin bowed to her.

"Where are you saying?" said Alexey Alexandrovitch "tried before Anna's
wife's letter; of that was no idea to a humal sund not cleared by those
district. He said to equret, humiliated, sympathy with the shage of whether Vronsky had back met the
tiny."

"I may be terry part in...
I can't go
on," he said, wto merry and rans.

"You paviling you distressed to God, she was!" said feet said from everything,
gosting out, tears. He felt that the year the vesitable new light talked of
its armss.

On rupking effort; and

In [36]:
print(sample(net, 1000, prime='Friendship is like', top_k=None))

Friendship is like
at answer.

"The thing charm, is here to Certainly and I see you in the way all the
one," he said people, and went to the prinw, and understanding her
hand, and the daughter's fore addired his same, and her head now silent from
when he reached the red conscientions with all his brother natured, and talking at hom,
to tell him her in his mind was to sway. He rose of might
right in speak in the flode thrieg and deal of more. The sprent
that held without meeting, on which, as soon in an ablo of it, but
he heared his bent over his brother, with a trouble on her face.

"What! Measure, but I can disnote to detice that?" he said to himself.

"Let's go."

"I have so much in evening. Alexey Alexandrovitch's flipsors on the
bidler?"

He said to Alexey Alexandrovitch and his brothers was evident his gave
to her turning round, and selical usseas enproaded now on the lome to ckean
on the tolr into those years, the interest of the mersiffere
to her as if he should nice. And he wou

### Saving the model

In [37]:
torch.save({'epoch': n_epochs,
            'model': Network(n_chars, n_hidden, n_layers, 0.5),
            'model_state_dict': net.state_dict()
           }, 'char_rnn_2mo.pth.tar')



In [38]:
checkpt = torch.load('char_rnn_2mo.pth.tar')
checkpt

{'epoch': 20, 'model': Network(
   (rnn1): LSTM(83, 512, num_layers=2, batch_first=True, dropout=0.5)
   (dropout): Dropout(p=0.5)
   (fc): Linear(in_features=512, out_features=83, bias=True)
 ), 'model_state_dict': OrderedDict([('rnn1.weight_ih_l0',
               tensor([[-0.0127,  0.0293,  0.1126,  ...,  0.1241, -0.4026, -0.0992],
                       [-0.0396, -0.1046,  0.0485,  ...,  0.3855,  0.2706, -0.1565],
                       [-0.2313,  0.4806, -0.0224,  ..., -0.0028,  0.2673, -0.1755],
                       ...,
                       [-0.0806, -0.4378, -0.1189,  ...,  0.2344, -0.4833,  0.2831],
                       [-0.1604,  0.1895, -0.0270,  ...,  0.0176,  0.0566,  0.7817],
                       [ 0.1647,  0.0309, -0.0200,  ...,  0.0901,  0.4195, -0.3058]],
                      device='cuda:0')),
              ('rnn1.weight_hh_l0',
               tensor([[ 0.0855, -0.2022,  0.1557,  ..., -0.2245, -0.0040,  0.2665],
                       [ 0.0800,  0.0292,  0.141

In [0]:
n_epochs = checkpt['epoch']
model = checkpt['model']

In [0]:
model.load_state_dict(checkpt['model_state_dict'])

optimizer = optim.Adam(model.parameters(), lr = 0.01)
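
A common alternative to pickling a whole `Network` instance is to store only the constructor arguments and the `state_dict`, then rebuild the model from them. The sketch below uses a small stand-in module (`TinyNet`, hypothetical) and an in-memory buffer so that it runs on its own; the same pattern would apply to `Network` and a file path.

```python
import io
import torch
from torch import nn

class TinyNet(nn.Module):
    # Stand-in for Network, only to keep the sketch self-contained.
    def __init__(self, n_input, n_hidden):
        super().__init__()
        self.fc = nn.Linear(n_input, n_hidden)

hparams = {'n_input': 4, 'n_hidden': 3}
tiny = TinyNet(**hparams)

# Save constructor arguments and weights, not the module object itself.
buffer = io.BytesIO()
torch.save({'hparams': hparams, 'model_state_dict': tiny.state_dict()}, buffer)

# Rebuild the model from the saved arguments, then load the weights.
buffer.seek(0)
ckpt = torch.load(buffer, map_location='cpu')
restored = TinyNet(**ckpt['hparams'])
restored.load_state_dict(ckpt['model_state_dict'])

print(torch.equal(tiny.fc.weight, restored.fc.weight))   # True
```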

In [41]:
print(sample(model, 1000, prime='Friendship is like', top_k=None)) # sample from the model restored from the checkpoint

Friendship is like stuly as realize. He
began to gre with them from her love with Petersburg, and they was loading
has the compressing bold and rather unurt of her tone.

The lessons let the yead away round her in the dost
conversation with it, did not suppose when Dolly wanted to love him,
and the strimms of another first, strung Them had had been sportally.n

"Well, then it may be barred all to be wearyed?" said Sviazhsky's woman; "that you're
at the same gied in the table capital things, and as, Vronsky had, uncapbered his
resurve more of public quist feeling. I will srop sweet sungen
the one another little--everything it told. And Afferond like
patient, for God. Who said: I have disapprienced I am to turn to my plans. For the
princess.

"Well,. Can man you comaring it. I can't being it!"

Stepan Arkadyevitch was said to her.

And gown up he was talking to the samication, was shally and rojeeved, in
his children) and shappaned to see the paces too the way that
he went on, and steppi

### Summary

- Correct spelling: Most of the generated words are spelled correctly, even though the sentences themselves make little sense. The model learned this in just 5-7 minutes of training. This is amazing.
- Opening and closing quotes (" "): The model has also learned that quotation marks come in pairs.
- Punctuation: The model uses a lot of punctuation, and most of it looks appropriate to me.

### Further Reading

- [Karpathy's blog on RNN](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)