# Character-level LSTM in PyTorch

**In this notebook, an LSTM (Long Short-Term Memory) model is trained on the text of 'Anna Karenina' (`data/anna.txt`). The trained model predicts the next character from the characters that precede it, so it can generate words, sentences, and even paragraphs on its own.**

**Notebook created by: [Aditya Manchanda](https://github.com/Aditya-1500)**

In [1]:
#Importing necessary libraries

import numpy as np
import torch
from torch import nn
import torch.nn.functional as F

In [2]:
#Loading the data

with open('data/anna.txt') as f:
    text = f.read()
print(text[:100])

Chapter 1


Happy families are all alike; every unhappy family is unhappy in its own
way.

Everythin


In [3]:
text[:100]

'Chapter 1\n\n\nHappy families are all alike; every unhappy family is unhappy in its own\nway.\n\nEverythin'

### Tokenization

In [4]:
#Encoding the text - map each character to an integer and vice versa

chars = tuple(set(text))
int2char = dict(enumerate(chars))
char2int = {ch:ii for ii,ch in int2char.items()}

encoded = np.array([char2int[ch] for ch in text])

In [5]:
encoded[:100]

array([ 8, 39, 20, 48, 71, 80, 70, 68, 45, 60, 60, 60, 28, 20, 48, 48, 52,
       68, 44, 20, 59, 76, 72, 76, 80, 69, 68, 20, 70, 80, 68, 20, 72, 72,
       68, 20, 72, 76, 27, 80, 15, 68, 80, 55, 80, 70, 52, 68, 30, 18, 39,
       20, 48, 48, 52, 68, 44, 20, 59, 76, 72, 52, 68, 76, 69, 68, 30, 18,
       39, 20, 48, 48, 52, 68, 76, 18, 68, 76, 71, 69, 68, 73,  6, 18, 60,
        6, 20, 52, 65, 60, 60, 61, 55, 80, 70, 52, 71, 39, 76, 18])
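
A quick way to check an encoding like this one is to decode it again and compare with the original text. The sketch below is a self-contained illustration, not a cell from the original notebook: `sample_text` and the `sample_*` names are hypothetical stand-ins so it runs without `data/anna.txt`. It also uses `sorted(set(...))`, because the iteration order of a set of strings can differ between Python sessions, which means the integer codes shown above may not be reproduced on another run.

```python
import numpy as np

# Hypothetical stand-in for the novel's text (assumption, not the real data file)
sample_text = "Happy families are all alike"

# sorted() makes the character-to-integer mapping reproducible across runs
sample_chars = tuple(sorted(set(sample_text)))
sample_int2char = dict(enumerate(sample_chars))
sample_char2int = {ch: ii for ii, ch in sample_int2char.items()}

# Encode, then decode again to confirm the two dictionaries are inverses
sample_encoded = np.array([sample_char2int[ch] for ch in sample_text])
decoded = ''.join(sample_int2char[int(ii)] for ii in sample_encoded)

print(decoded == sample_text)
```

Because the mapping depends on how `chars` was built, a saved model is only usable together with the exact token tuple it was trained with.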

### Pre-processing the data

In [6]:
#One-hot encoding
def one_hot_encode(arr,n_labels):
    
    #Initialize the encoded array
    one_hot = np.zeros((arr.size,n_labels),dtype=np.float32)
    
    #Fill the appropriate elements with ones
    one_hot[np.arange(one_hot.shape[0]),arr.flatten()] = 1
    
    #Reshape it to get to final one-hot encoded array
    one_hot = one_hot.reshape((*arr.shape,n_labels))
    
    return one_hot

In [7]:
#Testing the one-hot encoding function
test_seq = np.array([1,3,5,7])
test_one_hot = one_hot_encode(test_seq,8)
print(test_one_hot)

[[0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1.]]
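
The test above uses a 1-D array, while training feeds 2-D batches of shape `(batch_size, seq_length)`. As a cross-check, the same result can be obtained by indexing the rows of an identity matrix. This is an alternative sketch, not the notebook's implementation; `one_hot_eye` is a hypothetical helper name.

```python
import numpy as np

def one_hot_eye(arr, n_labels):
    # Row i of the identity matrix is the one-hot vector for label i,
    # so fancy-indexing with arr yields shape (*arr.shape, n_labels)
    return np.eye(n_labels, dtype=np.float32)[arr]

# A tiny 2 x 2 "batch" of integer-encoded characters
batch = np.array([[1, 3], [5, 7]])
encoded_batch = one_hot_eye(batch, 8)

print(encoded_batch.shape)  # (2, 2, 8)
```

The trailing dimension is the vocabulary size, which is what the LSTM expects as `input_size`.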


### Making training mini-batches

In [8]:
def get_batches(arr, batch_size, seq_length):
    '''Create a generator that returns batches of size
       batch_size x seq_length from arr.
       
       Arguments
       ---------
       arr: Array you want to make batches from
       batch_size: Batch size, the number of sequences per batch
       seq_length: Number of encoded chars in a sequence
    '''
    
    #Get the number of batches we can make
    n_batches = arr.size // (batch_size*seq_length)
    
    #Total number of characters to keep from the array
    arr = arr[:batch_size*seq_length*n_batches]
    
    #Reshape into batch_size rows
    arr = arr.reshape(batch_size,-1)
    
    #Iterate over the batches using a window of size seq_length
    for n in range(0,arr.shape[1],seq_length):
        #The features
        x = arr[:,n:n+seq_length]
        
        #The targets
        y = np.zeros_like(x)
        try:
            y[:,:-1], y[:,-1] = x[:,1:], arr[:,n+seq_length]
        except IndexError:
            y[:,:-1], y[:,-1] = x[:,1:], arr[:,0]
        
        yield x,y

**Testing the implementation of the above function**

In [9]:
batches = get_batches(encoded,8,50)
x, y = next(batches)

In [10]:
#Printing the first 10 items of each sequence in the batch
print('x\n',x[:10,:10])
print('y\n',y[:10,:10])

x
 [[ 8 39 20 48 71 80 70 68 45 60]
 [69 73 18 68 71 39 20 71 68 20]
 [80 18 51 68 73 70 68 20 68 44]
 [69 68 71 39 80 68 14 39 76 80]
 [68 69 20  6 68 39 80 70 68 71]
 [14 30 69 69 76 73 18 68 20 18]
 [68 58 18 18 20 68 39 20 51 68]
 [57  0 72 73 18 69 27 52 65 68]]
y
 [[39 20 48 71 80 70 68 45 60 60]
 [73 18 68 71 39 20 71 68 20 71]
 [18 51 68 73 70 68 20 68 44 73]
 [68 71 39 80 68 14 39 76 80 44]
 [69 20  6 68 39 80 70 68 71 80]
 [30 69 69 76 73 18 68 20 18 51]
 [58 18 18 20 68 39 20 51 68 69]
 [ 0 72 73 18 69 27 52 65 68 81]]
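
In the output above, each row of `y` is the matching row of `x` shifted one character to the left. The sketch below reproduces the same slicing on a toy array where the shift is easy to see. The array `toy` and the small sizes are illustrative assumptions, and only the first batch is built.

```python
import numpy as np

toy = np.arange(40)                 # stand-in for the encoded text
batch_size, seq_length = 2, 5

# Same arithmetic as get_batches: keep only full batches, one row per sequence
n_batches = toy.size // (batch_size * seq_length)
rows = toy[:batch_size * seq_length * n_batches].reshape(batch_size, -1)

# First window of seq_length characters and its targets
x = rows[:, 0:seq_length]
y = np.zeros_like(x)
y[:, :-1], y[:, -1] = x[:, 1:], rows[:, seq_length]

print(x)
print(y)
```

Because `toy` counts upward, every target equals its input plus one, which confirms that the model is asked to predict the next character at each position.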


## Defining the model

In [11]:
#Checking if GPU is available
train_on_gpu = torch.cuda.is_available()
if train_on_gpu:
    print("Training on GPU...")
else:
    print("GPU not available..Training on CPU...")

Training on GPU...


In [12]:
class CharRNN(nn.Module):
    
    def __init__(self, tokens, n_hidden=256, n_layers=2, drop_prob=0.5, lr=0.001):
        super().__init__()
        self.drop_prob = drop_prob
        self.n_layers = n_layers
        self.n_hidden = n_hidden
        self.lr = lr
        
        #Creating character dictionaries
        self.chars = tokens
        self.int2char = dict(enumerate(self.chars))
        self.char2int = {ch:ii for ii,ch in self.int2char.items()}
        
        #Defining the layers of the model
        self.lstm = nn.LSTM(input_size=len(self.chars), hidden_size=self.n_hidden, num_layers=self.n_layers,
                            batch_first=True, dropout=self.drop_prob)
        self.dropout = nn.Dropout(p=self.drop_prob)
        self.fc = nn.Linear(self.n_hidden,len(self.chars))
    
    def forward(self, x, hidden):
        '''
        Forward pass through the network.
        The inputs are `x` and the tuple `hidden`, which holds the hidden state and cell state.
        '''
        
        #Get the outputs and new hidden state from the LSTM
        r_output, hidden = self.lstm(x,hidden)
        
        #Pass the output through dropout layer
        out = self.dropout(r_output)
        
        #Stack up LSTM outputs using view
        out = out.contiguous().view(-1,self.n_hidden)
        
        #Finally pass the output through the fully-connected layer
        out = self.fc(out)
        
        return out,hidden
    
    def init_hidden(self,batch_size):
        ''' Initializes the hidden state '''
        #Create two new tensors with sizes n_layers x batch_size x n_hidden 
        #initialized to zero, for hidden state and cell state for LSTM
        weight = next(self.parameters()).data
        
        if train_on_gpu:
            hidden = (weight.new(self.n_layers,batch_size,self.n_hidden).zero_().cuda(),
                      weight.new(self.n_layers,batch_size,self.n_hidden).zero_().cuda())
        else:
            hidden = (weight.new(self.n_layers,batch_size,self.n_hidden).zero_(),
                      weight.new(self.n_layers,batch_size,self.n_hidden).zero_())
            
        return hidden
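
To see why `forward` flattens the LSTM output with `view(-1, self.n_hidden)`, it helps to trace tensor shapes through the same layers. The sketch below is a standalone illustration with small made-up sizes (only the vocabulary size of 83 matches this notebook); it builds the layers directly rather than using `CharRNN`.

```python
import torch
from torch import nn

# Small illustrative sizes; only n_chars=83 comes from the notebook
batch_size, seq_length, n_chars, n_hidden, n_layers = 4, 10, 83, 16, 2

lstm = nn.LSTM(input_size=n_chars, hidden_size=n_hidden, num_layers=n_layers,
               batch_first=True, dropout=0.5)
fc = nn.Linear(n_hidden, n_chars)

x = torch.zeros(batch_size, seq_length, n_chars)   # one-hot inputs would go here
h0 = torch.zeros(n_layers, batch_size, n_hidden)   # initial hidden state
c0 = torch.zeros(n_layers, batch_size, n_hidden)   # initial cell state

r_output, (hn, cn) = lstm(x, (h0, c0))
flat = r_output.contiguous().view(-1, n_hidden)    # one row per character position
logits = fc(flat)

print(r_output.shape)  # torch.Size([4, 10, 16])
print(hn.shape)        # torch.Size([2, 4, 16])
print(logits.shape)    # torch.Size([40, 83])
```

The final shape has one row of character scores per position in the batch, which is why the targets are flattened to `batch_size*seq_length` before the loss is computed.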

## Defining the training function

In [13]:
def train(net, data, epochs=10, batch_size=10, seq_length=50, lr=0.001, clip=5, val_frac=0.1, print_every=10):
    '''
    Training a network
    Arguments
    ----------
    net: CharRNN Network
    data: text data to train our network
    epochs: Number of epochs to train
    batch_size: Number of mini-sequences per mini-batch, aka batch size
    seq_length: Number of character steps per mini-batch
    lr: learning rate
    clip: Maximum gradient norm used for gradient clipping
    val_frac: Fraction of data to hold out for validation
    print_every: Number of steps between printouts of training and validation loss
    '''
    
    net.train()
    
    optimizer = torch.optim.Adam(net.parameters(),lr=lr)
    criterion = nn.CrossEntropyLoss()
    
    #Creating training and validation data
    val_idx = int(len(data)*(1-val_frac))
    data, val_data = data[:val_idx], data[val_idx:]
    
    if train_on_gpu:
        net.cuda()
    
    counter = 0
    n_chars = len(net.chars)
    for e in range(epochs):
        #Initialize the hidden state
        h = net.init_hidden(batch_size)
        
        for x,y in get_batches(data,batch_size,seq_length):
            counter += 1
            
            #One-hot encoding our data to feed into network
            x = one_hot_encode(x,n_chars)
            inputs, targets = torch.from_numpy(x), torch.from_numpy(y)
            
            if train_on_gpu:
                inputs, targets = inputs.cuda(), targets.cuda()
            
            #Detach the hidden state from its history so that backpropagation
            #does not run through all previous batches
            h = tuple([each.data for each in h])
            
            net.zero_grad()
            
            #Getting output from the model
            output, h = net(inputs,h)
            
            #Calculating the loss and performing backpropagation
            loss = criterion(output, targets.view(batch_size*seq_length).long())
            loss.backward()
            # `clip_grad_norm_` rescales the gradients when their total norm exceeds `clip`,
            # which helps prevent the exploding gradient problem in RNNs / LSTMs.
            nn.utils.clip_grad_norm_(net.parameters(), clip)
            optimizer.step()
            
            #Loss stats
            if counter%print_every == 0:
                #Get validation loss
                val_h = net.init_hidden(batch_size)
                val_losses = []
                net.eval()
                
                for x,y in get_batches(val_data, batch_size, seq_length):
                    x = one_hot_encode(x,n_chars)
                    inputs, targets = torch.from_numpy(x), torch.from_numpy(y)
                    
                    val_h = tuple([each.data for each in val_h])
                    
                    if train_on_gpu:
                        inputs, targets = inputs.cuda(), targets.cuda()
                    
                    output, val_h = net(inputs,val_h)
                    val_loss = criterion(output, targets.view(batch_size*seq_length).long())
                    val_losses.append(val_loss.item())
                
                net.train()
                
                print("Epoch: {}/{}...".format(e+1,epochs),
                      "Step: {}...".format(counter),
                      "Loss: {:.4f}...".format(loss.item()),
                      "Val Loss: {:.4f}".format(np.mean(val_losses)))

### Instantiating the model

In [14]:
#Setting the model hyperparameters
n_hidden = 512
n_layers = 2

net = CharRNN(chars,n_hidden,n_layers)
print(net)

CharRNN(
  (lstm): LSTM(83, 512, num_layers=2, batch_first=True, dropout=0.5)
  (dropout): Dropout(p=0.5, inplace=False)
  (fc): Linear(in_features=512, out_features=83, bias=True)
)


In [15]:
#Setting the training hyperparameters
batch_size = 128
seq_length = 100
n_epochs = 20

train(net, encoded, epochs=n_epochs, batch_size=batch_size, seq_length=seq_length)

Epoch: 1/20... Step: 10... Loss: 3.2736... Val Loss: 3.2161
Epoch: 1/20... Step: 20... Loss: 3.1613... Val Loss: 3.1342
Epoch: 1/20... Step: 30... Loss: 3.1481... Val Loss: 3.1220
Epoch: 1/20... Step: 40... Loss: 3.1200... Val Loss: 3.1209
Epoch: 1/20... Step: 50... Loss: 3.1443... Val Loss: 3.1178
Epoch: 1/20... Step: 60... Loss: 3.1245... Val Loss: 3.1166
Epoch: 1/20... Step: 70... Loss: 3.1092... Val Loss: 3.1154
Epoch: 1/20... Step: 80... Loss: 3.1281... Val Loss: 3.1124
Epoch: 1/20... Step: 90... Loss: 3.1224... Val Loss: 3.1060
Epoch: 1/20... Step: 100... Loss: 3.1005... Val Loss: 3.0904
Epoch: 1/20... Step: 110... Loss: 3.0752... Val Loss: 3.0559
Epoch: 1/20... Step: 120... Loss: 3.0003... Val Loss: 2.9904
Epoch: 1/20... Step: 130... Loss: 3.0108... Val Loss: 2.9222
Epoch: 2/20... Step: 140... Loss: 2.8714... Val Loss: 2.8181
Epoch: 2/20... Step: 150... Loss: 2.7687... Val Loss: 2.7933
Epoch: 2/20... Step: 160... Loss: 2.6919... Val Loss: 2.6423
Epoch: 2/20... Step: 170... Loss:

Epoch: 10/20... Step: 1350... Loss: 1.4213... Val Loss: 1.4496
Epoch: 10/20... Step: 1360... Loss: 1.4195... Val Loss: 1.4409
Epoch: 10/20... Step: 1370... Loss: 1.4090... Val Loss: 1.4399
Epoch: 10/20... Step: 1380... Loss: 1.4538... Val Loss: 1.4395
Epoch: 10/20... Step: 1390... Loss: 1.4568... Val Loss: 1.4342
Epoch: 11/20... Step: 1400... Loss: 1.4608... Val Loss: 1.4335
Epoch: 11/20... Step: 1410... Loss: 1.4626... Val Loss: 1.4343
Epoch: 11/20... Step: 1420... Loss: 1.4588... Val Loss: 1.4300
Epoch: 11/20... Step: 1430... Loss: 1.4142... Val Loss: 1.4282
Epoch: 11/20... Step: 1440... Loss: 1.4451... Val Loss: 1.4291
Epoch: 11/20... Step: 1450... Loss: 1.3768... Val Loss: 1.4239
Epoch: 11/20... Step: 1460... Loss: 1.4080... Val Loss: 1.4219
Epoch: 11/20... Step: 1470... Loss: 1.4004... Val Loss: 1.4219
Epoch: 11/20... Step: 1480... Loss: 1.4173... Val Loss: 1.4200
Epoch: 11/20... Step: 1490... Loss: 1.4099... Val Loss: 1.4198
Epoch: 11/20... Step: 1500... Loss: 1.3912... Val Loss:

Epoch: 20/20... Step: 2660... Loss: 1.2564... Val Loss: 1.3095
Epoch: 20/20... Step: 2670... Loss: 1.2630... Val Loss: 1.3056
Epoch: 20/20... Step: 2680... Loss: 1.2465... Val Loss: 1.3055
Epoch: 20/20... Step: 2690... Loss: 1.2493... Val Loss: 1.3060
Epoch: 20/20... Step: 2700... Loss: 1.2494... Val Loss: 1.2992
Epoch: 20/20... Step: 2710... Loss: 1.2243... Val Loss: 1.3050
Epoch: 20/20... Step: 2720... Loss: 1.2210... Val Loss: 1.3046
Epoch: 20/20... Step: 2730... Loss: 1.2173... Val Loss: 1.3026
Epoch: 20/20... Step: 2740... Loss: 1.2145... Val Loss: 1.3058
Epoch: 20/20... Step: 2750... Loss: 1.2242... Val Loss: 1.3055
Epoch: 20/20... Step: 2760... Loss: 1.2232... Val Loss: 1.2985
Epoch: 20/20... Step: 2770... Loss: 1.2532... Val Loss: 1.3024
Epoch: 20/20... Step: 2780... Loss: 1.2760... Val Loss: 1.3004
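
`nn.CrossEntropyLoss` reports the average negative log-likelihood per character, measured in nats, so the numbers in the log can be put in context with a little arithmetic. The sketch below is an added illustration, not part of the original run. It compares the final validation loss with the loss of a model that guesses uniformly over the 83 characters, and converts the loss to a per-character perplexity.

```python
import math

n_chars = 83             # vocabulary size, from the printed model summary
final_val_loss = 1.3004  # last validation loss in the log above

uniform_loss = math.log(n_chars)       # loss of a uniform guess over all characters
perplexity = math.exp(final_val_loss)  # effective number of equally likely choices

print(f"uniform baseline loss: {uniform_loss:.2f}")  # 4.42
print(f"validation perplexity: {perplexity:.2f}")    # 3.67
```

Read this way, the trained model is roughly as uncertain as if it were choosing among fewer than four equally likely characters at each step, compared with 83 for a uniform guess.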


### Saving checkpoint

In [16]:
model_name = 'char_rnn_1.net'

checkpoint = {'n_hidden':net.n_hidden,
              'n_layers':net.n_layers,
              'state_dict':net.state_dict(),
              'tokens':net.chars}

with open(model_name,'wb') as f:
    torch.save(checkpoint,f)

## Making Predictions

In [17]:
def predict(net, char, h=None, top_k=None):
    '''
    Given a character, predict the next character
    Returns the predicted character and hidden state.
    '''
    
    #Tensor inputs
    x = np.array([[net.char2int[char]]])
    x = one_hot_encode(x,len(net.chars))
    inputs = torch.from_numpy(x)
    
    if train_on_gpu:
        inputs = inputs.cuda()
        
    #Start from a zero hidden state if none was given, then detach it from history
    if h is None:
        h = net.init_hidden(1)
    h = tuple([each.data for each in h])
    
    #Get the output from the model
    out, h = net(inputs,h)
    
    #Get the character probabilities
    p = F.softmax(out, dim=1).data
    if train_on_gpu:
        p = p.cpu()
    
    #Get top characters
    if top_k is None:
        top_ch = np.arange(len(net.chars))
    else:
        p, top_ch = p.topk(top_k)
        top_ch = top_ch.numpy().squeeze()
        
    #Sample the next character from the renormalized probabilities of the candidates
    p = p.numpy().squeeze()
    char = np.random.choice(top_ch,p=p/p.sum())
    
    return net.int2char[char], h
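
The top-k step in `predict` can be seen in isolation on a toy distribution: keep the `k` most probable characters, renormalize their probabilities, and sample from what remains. The sketch below uses NumPy only. The probabilities are made up and the seeded generator is an assumption that makes the example repeatable (the notebook itself uses `p.topk` and `np.random.choice`).

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

probs = np.array([0.05, 0.40, 0.10, 0.30, 0.15])  # made-up softmax output
top_k = 3

# Indices of the k largest probabilities, most likely first
top_idx = np.argsort(probs)[::-1][:top_k]

# Renormalize so the kept probabilities sum to 1
top_p = probs[top_idx] / probs[top_idx].sum()

# Sample many times; only the kept indices can ever be drawn
draws = rng.choice(top_idx, size=1000, p=top_p)
print(top_idx, sorted(set(draws.tolist())))
```

A smaller `top_k` tends to make the generated text more conservative, while `top_k=None` samples from the full distribution and allows rarer characters through.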

### Priming and generating text

In [18]:
def sample(net, size, prime="The", top_k=None):
    if train_on_gpu:
        net.cuda()
    else:
        net.cpu()
    
    net.eval()
    
    #Firstly run through the prime characters
    chars = [ch for ch in prime]
    h = net.init_hidden(1)
    for ch in prime:
        char, h = predict(net, ch, h, top_k=top_k)
        
    chars.append(char)
    
    for ii in range(size):
        char,h = predict(net,chars[-1],h,top_k=top_k)
        chars.append(char)
    
    return ''.join(chars)

In [19]:
print(sample(net,1000,"Character",top_k=5))

Character in
the druving of the belicitures, and the
position of this soul.

"Well, I don't believe that he, I'm not
struck her to her."

"It mean this from her, and I care to do you there." "I can't believe that the misery and take the sort,"
he added.

"Well, when you won't see it to the sound of maticusion," answered the praceicing at one and sighing,
and he found she was
so carried out
that the paters stopped the princess to take
a long while any acquaintances.

"I don't know what how it seems, and I can't come..."

"Oh, I what are you getting up on that in other, there's nothing or even this studant
of a conversation, that he didn't know, I shall, have you tell
her and treating that something."

"Oh, you can't believe the same won'e for you are than in her tenstened and my hostess," the most
figure was at a mander, and as though seemed to the conscetuation. The princess were
concentrated for his work, he went to
below his father, with the proforming.

"I can't go on anything. You 

### Loading a checkpoint

In [20]:
#Here we load the pre-trained model 'char_rnn_1.net', which was trained for 20 epochs
with open('char_rnn_1.net','rb') as f:
    checkpoint = torch.load(f)

loaded_model = CharRNN(checkpoint['tokens'],checkpoint['n_hidden'],checkpoint['n_layers'])
loaded_model.load_state_dict(checkpoint['state_dict'])

<All keys matched successfully>
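
The checkpoint above was saved from a model trained on a GPU. If it is later opened on a machine without CUDA, `torch.load` accepts a `map_location` argument that remaps the stored tensors to another device. The sketch below shows the idea on a tiny stand-in checkpoint written to an in-memory buffer. The layer, the token tuple, and the buffer are illustrative assumptions, not the notebook's actual file.

```python
import io

import torch
from torch import nn

# Tiny stand-in checkpoint with the same keys as the one saved in this notebook
layer = nn.Linear(8, 3)
toy_checkpoint = {'n_hidden': 8,
                  'n_layers': 1,
                  'state_dict': layer.state_dict(),
                  'tokens': ('a', 'b', 'c')}

# Save to an in-memory buffer instead of a file on disk
buffer = io.BytesIO()
torch.save(toy_checkpoint, buffer)
buffer.seek(0)

# map_location places every stored tensor on the CPU, wherever it was saved from
restored = torch.load(buffer, map_location=torch.device('cpu'))
print(restored['tokens'], restored['n_hidden'])
```

With a real file, the same argument would be passed as `torch.load(f, map_location=...)`, and the restored `state_dict` can then be loaded into a freshly constructed model as shown above.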

### Some examples of the model at work

In [21]:
print(sample(loaded_model,3000,top_k=5,prime="Aditya "))

Aditya with
a bed world, and a clear to say
in the belt and tracking on his bed work, where he was, the stall of the peasants was to stray the position with the
creature than her head, and as
he did not
see her son so as to go and take the death, and
while the sisters in
the children telled it all to the most property, the course were not storting any struggle.

"I can't step my arm, always."

"Yes, it's an one when, you do you
then, which has
been taken the steps, and then you will go
away, but I
have not the
motions, and this."

"I am all that too. Would you complete my sensoler," said Stepan Arkadyevitch. "What has the mistakes, but the
man as how if there?"

"I should not be defined! And that I went to him," said Alexey Alexandrovitch as he had not been
thought of that time
that. The countess with all the
close with the condlachion, and
said alought.

"I'll believe you. And his way we have a gentrement and the studing of her."

Levin smiled so is to think of the point the comparish

In [22]:
print(sample(loaded_model,5000,top_k=3,prime="Something"))

Something, and the signification, and some the princess, the same thing was sitting out on
a lone of his wife, he had seen them, as they had to do was some of the stard of the paress of his face at the same. The could never be supersisted at his brother, and so
they are no more at their carriage, and
her husband was a calm there were the partity, and to be discovered. And seemed to him what to see him. The profoss of the conscience was no disappearance to the same sort of sort of his brother. And the point were a stern and this feeling of
heart. They had nothing to him to see the meadow and happy, and something would have the children, and he was so all that she was
not still, but there was no often as something than he
could not care interested
and have but her face. He was a standing to her and
said, the profoss, and had brought her the streems of the part of his study."

"Where is some of a complex attitude on," he added, straight on the painter and sat the same
strenct, to be delig

In [23]:
#Sample using loaded model
print(sample(loaded_model,2000,top_k=5,prime="DL"))

DLly and the forest. This studic and her steps, they would have been successful interests to be
there, there are saling the personary of the coachman. But
the cart and calm than he had
said to his heart all all seeing
him.

"It'll go on, I did not know," he added, smiling.

"You mean and must go," she said, and that he was so much
to his wife's stucial to
the proversions of her, he was not than
examinisatiss at the position of the possibility of the compering of it, but
she was trauthing a strange feeling of contemptuous and such an every table and still the fact was not by now. And when she could not hind this sheeps.

"Ah!'s as you don't come at a moment and her to me."

"Oh, no, what is she? As though I
should hear her? The peasants there was so in a subject."

"Yes," thought Serguy Ivanovitch and his honest he did not left him
and who had been
their some
smile.

"I should not say a silence."

"Yes, in the ceuration, to be somewhere," he said, letting in,
and a little crushed hands 

# END OF NOTEBOOK