# Our first RNN

The goal of this lesson is to introduce the ideas behind Recurrent Neural Networks.

A very good starting point: [Karpathy's blog](http://karpathy.github.io/2015/05/21/rnn-effectiveness)

and the associated [code](https://github.com/karpathy/char-rnn) 

Fortunately, the code is not in PyTorch, so you can now 'translate' it!

In [1]:
import numpy as np

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

# torch_utils is a separate helper module (not part of PyTorch itself)
import torch_utils
from torch_utils import ScaledEmbedding, gpu

We're going to download the collected works of Nietzsche to use as our data for this class.

In [2]:
data_folder = '/home/lelarge/courses/data/nietzsche/'
data_nie = data_folder+'nietzsche.txt'

In [3]:
#%mkdir -p $data_folder
#!wget -O $data_nie 'https://s3.amazonaws.com/text-datasets/nietzsche.txt'

In [4]:
text = open(data_nie).read()
print('corpus length:', len(text))

corpus length: 600893


In [5]:
chars = sorted(list(set(text)))
vocab_size = len(chars)+1
print('total chars:', vocab_size)

total chars: 85


In [6]:
chars.insert(0, "\0")

In [7]:
''.join(chars[1:-6])

'\n !"\'(),-.0123456789:;=?ABCDEFGHIJKLMNOPQRSTUVWXYZ[]_abcdefghijklmnopqrstuvwxy'

Map from chars to indices and back again

In [8]:
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

`idx` will be the data we use from now on.

In [9]:
idx = [char_indices[c] for c in text]

In [10]:
idx[:10]

[40, 42, 29, 30, 25, 27, 29, 1, 1, 1]

In [11]:
''.join(indices_char[i] for i in idx[:60])

'PREFACE\n\n\nSUPPOSING that Truth is a woman--what then? Is the'
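
The encoding step above can be sketched on a toy string. This is a minimal illustration, assuming a made-up corpus (`toy_text`) in place of the Nietzsche file; the `toy_*` names are hypothetical and only mirror `chars`, `char_indices`, `indices_char` and `idx`.

```python
# Toy corpus standing in for the Nietzsche text (an assumption for illustration).
toy_text = "hello world"

# Reserve index 0 for the padding character "\0", as done above.
toy_chars = ["\0"] + sorted(set(toy_text))

toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_indices_char = {i: c for i, c in enumerate(toy_chars)}

# Encode the text as a list of integers, then decode it back.
toy_idx = [toy_char_indices[c] for c in toy_text]
decoded = "".join(toy_indices_char[i] for i in toy_idx)

print(toy_idx)
print(decoded)
```

Decoding the indices gives back the original string, which is a quick sanity check that the two dictionaries are inverses of each other.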

# 3 Char

We want to predict the 4th character from the first 3.

In [12]:
cs=3
c1_dat = [idx[i] for i in range(0, len(idx)-1-cs, cs)]
c2_dat = [idx[i+1] for i in range(0, len(idx)-1-cs, cs)]
c3_dat = [idx[i+2] for i in range(0, len(idx)-1-cs, cs)]
c4_dat = [idx[i+3] for i in range(0, len(idx)-1-cs, cs)]

Inputs

In [13]:
x1 = np.stack(c1_dat[:-2])
x2 = np.stack(c2_dat[:-2])
x3 = np.stack(c3_dat[:-2])

Outputs

In [14]:
y = np.stack(c4_dat[:-2])

In [15]:
x1[:4], x2[:4], x3[:4]

(array([40, 30, 29,  1]), array([42, 25,  1, 43]), array([29, 27,  1, 45]))

In [16]:
y[:4]

array([30, 29,  1, 40])

In [17]:
x1.shape, y.shape

((200295,), (200295,))
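
The way the inputs and targets are cut out of `idx` can be checked on a short list. The sketch below is a simplified variant, not the exact cells above: `make_windows` is a hypothetical helper, and it does not drop the last two windows as `[:-2]` does.

```python
def make_windows(seq, cs):
    """Split seq into non-overlapping windows of cs inputs plus one target.

    Window k uses seq[k*cs : k*cs + cs] as inputs and seq[k*cs + cs] as the
    target, so the target of one window is the first input of the next.
    """
    inputs, targets = [], []
    for start in range(0, len(seq) - cs, cs):
        inputs.append(seq[start:start + cs])
        targets.append(seq[start + cs])
    return inputs, targets

toy_seq = [40, 42, 29, 30, 25, 27, 29, 1, 1, 1]  # the first 10 indices of idx shown earlier
inputs, targets = make_windows(toy_seq, cs=3)
print(inputs)   # [[40, 42, 29], [30, 25, 27], [29, 1, 1]]
print(targets)  # [30, 29, 1]
```

The first column of `inputs` matches `x1[:3]`, the second `x2[:3]`, the third `x3[:3]`, and `targets` matches `y[:3]`.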

## Char-RNN

I am using the architecture described in [Lesson 6 of the fast.ai course](http://wiki.fast.ai/index.php/Lesson_6_Notes#Recurrent_Neural_Network_.28RNN.29:)

In [18]:
class RNN(nn.Module):
    def __init__(self, embedding_dim=42, vocab_size = 1, hidden_dim =256):
        super(RNN, self).__init__()
        
        self._embedding_dim = embedding_dim
        self._vocab_size = vocab_size
        self._hidden_dim = hidden_dim
        
        self.embeddings = ScaledEmbedding(vocab_size, embedding_dim)
        self.i2h = nn.Linear(embedding_dim, hidden_dim)
        self.h2h = nn.Linear(hidden_dim, hidden_dim)
        self.h2o = nn.Linear(hidden_dim, vocab_size)
            
    def forward(self, c1, c2, c3):
        c1_embedding = self.embeddings(c1)
        c2_embedding = self.embeddings(c2)
        c3_embedding = self.embeddings(c3)
        c1_r = F.relu(self.i2h(c1_embedding))
        c2_r = F.relu(self.i2h(c2_embedding))
        c3_r = F.relu(self.i2h(c3_embedding))
        h2 = F.tanh(self.h2h(c1_r))
        h3 = F.tanh(self.h2h(h2+c2_r))
        output = self.h2o(h3+c3_r)
        return output

In [19]:
rnn = RNN(vocab_size=vocab_size)
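
To make the recurrence explicit, here is a NumPy sketch of the same computation written as a loop over the characters. It rests on several assumptions: the sizes are tiny and arbitrary, the weights are random, and the biases of the linear layers are left out. The names `E`, `W_i2h`, `W_h2h` and `W_h2o` are illustrative stand-ins for `embeddings`, `i2h`, `h2h` and `h2o`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small sizes, chosen only for illustration.
vocab_size, embedding_dim, hidden_dim = 10, 4, 6

# Randomly initialised parameters standing in for the layers of the model.
E = rng.normal(size=(vocab_size, embedding_dim))        # embeddings
W_i2h = rng.normal(size=(embedding_dim, hidden_dim))    # i2h (bias omitted)
W_h2h = rng.normal(size=(hidden_dim, hidden_dim))       # h2h (bias omitted)
W_h2o = rng.normal(size=(hidden_dim, vocab_size))       # h2o (bias omitted)

def relu(x):
    return np.maximum(x, 0.0)

def forward(char_ids):
    """Unrolled forward pass for a sequence of character indices."""
    h = np.zeros(hidden_dim)
    for t, c in enumerate(char_ids):
        x = relu(E[c] @ W_i2h)             # input-to-hidden for character t
        if t < len(char_ids) - 1:
            h = np.tanh((h + x) @ W_h2h)   # hidden-to-hidden update
        else:
            h = h + x                      # last character: no further update
    return h @ W_h2o                       # one score per character of the vocabulary

scores = forward([3, 1, 4])
print(scores.shape)  # (10,)
```

Written this way, the same three weight matrices are reused at every step, which is what lets the model handle a sequence of any length.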

First try

In [20]:
in1 = Variable(gpu(torch.from_numpy(np.array([idx[0]]).astype(np.int64))))
in2 = Variable(gpu(torch.from_numpy(np.array([idx[1]]).astype(np.int64))))
in3 = Variable(gpu(torch.from_numpy(np.array([idx[2]]).astype(np.int64))))

In [21]:
out = rnn(in1,in2,in3)

In [22]:
out

Variable containing:

Columns 0 to 9 
-0.0245 -0.0220 -0.1027 -0.1586 -0.0403  0.0261  0.0605 -0.0767 -0.0182  0.0008

Columns 10 to 19 
 0.0616  0.0658  0.0591 -0.0109 -0.0945 -0.0509 -0.1279 -0.1257  0.0811 -0.0466

Columns 20 to 29 
 0.0288  0.0015 -0.0207  0.0747 -0.0856  0.0547 -0.0007  0.0803  0.0031  0.0207

Columns 30 to 39 
 0.0143 -0.0338 -0.0827 -0.1020  0.0759  0.0703 -0.0587 -0.1110 -0.0033 -0.0895

Columns 40 to 49 
 0.0252  0.1196  0.0655 -0.0822 -0.0140 -0.0154 -0.0512 -0.0382  0.1218  0.0110

Columns 50 to 59 
-0.0996 -0.0840  0.0190  0.0546 -0.1337  0.0081 -0.0951 -0.1081  0.0785  0.0750

Columns 60 to 69 
-0.0137  0.0650  0.0895  0.0697 -0.1337  0.0203  0.0800 -0.0611  0.0033 -0.0344

Columns 70 to 79 
-0.0189  0.0474 -0.0107  0.0112 -0.0555  0.0359  0.0826  0.0071  0.0435 -0.0477

Columns 80 to 84 
 0.0649 -0.0492  0.0240 -0.0188  0.0309
[torch.FloatTensor of size 1x85]

Now with a batch of size 2

In [23]:
in1 = Variable(gpu(torch.from_numpy(np.array([x1[:2]]).astype(np.int64))).squeeze())
in2 = Variable(gpu(torch.from_numpy(np.array([x2[:2]]).astype(np.int64))).squeeze())
in3 = Variable(gpu(torch.from_numpy(np.array([x3[:2]]).astype(np.int64))).squeeze())

In [24]:
in1

Variable containing:
 40
 30
[torch.LongTensor of size 2]

In [25]:
out = rnn(in1,in2,in3)

In [26]:
out

Variable containing:

Columns 0 to 9 
-0.0245 -0.0220 -0.1027 -0.1586 -0.0403  0.0261  0.0605 -0.0767 -0.0182  0.0008
-0.0194 -0.0330 -0.1015 -0.1556 -0.0453  0.0255  0.0437 -0.0738 -0.0246 -0.0196

Columns 10 to 19 
 0.0616  0.0658  0.0591 -0.0109 -0.0945 -0.0509 -0.1279 -0.1257  0.0811 -0.0466
 0.0476  0.0624  0.0609 -0.0044 -0.0862 -0.0388 -0.1554 -0.1366  0.0714 -0.0449

Columns 20 to 29 
 0.0288  0.0015 -0.0207  0.0747 -0.0856  0.0547 -0.0007  0.0803  0.0031  0.0207
 0.0176 -0.0007 -0.0208  0.0679 -0.0802  0.0497 -0.0164  0.0831 -0.0060  0.0317

Columns 30 to 39 
 0.0143 -0.0338 -0.0827 -0.1020  0.0759  0.0703 -0.0587 -0.1110 -0.0033 -0.0895
 0.0122 -0.0442 -0.0711 -0.0978  0.0792  0.0780 -0.0497 -0.1086  0.0113 -0.0946

Columns 40 to 49 
 0.0252  0.1196  0.0655 -0.0822 -0.0140 -0.0154 -0.0512 -0.0382  0.1218  0.0110
 0.0166  0.1086  0.0617 -0.0817 -0.0218 -0.0177 -0.0480 -0.0479  0.1060  0.0066

Columns 50 to 59 
-0.0996 -0.0840  0.0190  0.0546 -0.1337  0.0081 -0.0951 -0.1081  0.

In [27]:
rnn_loss = nn.CrossEntropyLoss()
lr = 0.000001
rnn_optimizer = torch.optim.Adam(rnn.parameters(),lr = lr)
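
`nn.CrossEntropyLoss` takes the raw scores returned by the model, applies a log-softmax, and returns the negative log-probability of the target character. The sketch below reproduces this in plain Python for a single example; the helpers `softmax` and `cross_entropy` are illustrative names, not PyTorch functions. When all scores are equal, which is nearly the case for an untrained model, the loss is log(85), about 4.44.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, target):
    """Negative log-probability assigned to the target index."""
    return -math.log(softmax(scores)[target])

vocab_size = 85
uniform_scores = [0.0] * vocab_size  # stand-in for the near-zero scores of an untrained model

probs = softmax(uniform_scores)
loss = cross_entropy(uniform_scores, target=30)
print(round(sum(probs), 6))
print(round(loss, 4), round(math.log(vocab_size), 4))
```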

In [28]:
def char2Var(ch):
    return Variable(gpu(torch.from_numpy(np.array([ch]).astype(np.int64))).squeeze())

In [29]:
def data_gen(c1,c2,c3,c4,batch_size=64,shuffle=True):
    if shuffle:
        index = np.random.permutation(c1.shape[0])
        c1 = c1[index]
        c2 = c2[index]
        c3 = c3[index]
        c4 = c4[index]
    for idx in range(0,c1.shape[0],batch_size):
        yield(c1[idx:idx+batch_size],c2[idx:idx+batch_size], c3[idx:idx+batch_size], c4[idx:idx+batch_size])

In [30]:
def train_model(c1,c2,c3,c4,model=rnn,epochs=1,train=True):
    if train:
        model.train()
    else:
        model.eval()
        
    for epoch in range(epochs):
        batches = data_gen(c1,c2,c3,c4)
        running_loss = 0.0
        for ch1,ch2,ch3,ch4 in batches:
            in1 = char2Var(ch1)
            in2 = char2Var(ch2)
            in3 = char2Var(ch3)
            ou4 = char2Var(ch4)
            
            # use the model passed as argument rather than the global rnn
            out = model(in1,in2,in3)
            loss = rnn_loss(out,ou4)
            rnn_optimizer.zero_grad()
            loss.backward()
            rnn_optimizer.step()
            
            running_loss += loss.data[0]
            
        # running_loss is a sum of per-batch mean losses, so this value is
        # roughly the mean loss per character divided by the batch size
        epoch_loss = running_loss / c1.shape[0]
        print('Loss: {:.4f}'.format(epoch_loss))
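
`nn.CrossEntropyLoss` averages over the batch by default, so `running_loss` in `train_model` is a sum of per-batch means. Dividing that sum by the number of samples gives a value roughly `batch_size` times smaller than the mean loss per character. The sketch below contrasts the two normalisations on made-up numbers; `epoch_average` is a hypothetical helper, not part of the notebook.

```python
def epoch_average(batch_losses, batch_sizes):
    """Sample-weighted mean of per-batch mean losses."""
    total = sum(l * n for l, n in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)

# Made-up per-batch mean losses for 3 batches (the last batch is smaller).
batch_losses = [4.4, 4.2, 4.0]
batch_sizes = [64, 64, 32]

per_sample = epoch_average(batch_losses, batch_sizes)
as_in_train_model = sum(batch_losses) / sum(batch_sizes)

print(round(per_sample, 4))
print(round(as_in_train_model, 4))
```

The first number is directly comparable to log(85), the loss of a uniform guess over the 85 characters; the second is not.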

In [31]:
%%time
train_model(x1[:4], x2[:4], x3[:4], y[:4])

Loss: 1.1063
CPU times: user 36 ms, sys: 0 ns, total: 36 ms
Wall time: 21.6 ms


In [32]:
%%time
train_model(x1, x2, x3, y)

Loss: 0.0681
CPU times: user 1min 17s, sys: 1.84 s, total: 1min 19s
Wall time: 27.6 s


In [33]:
lr = 0.01
rnn_optimizer = torch.optim.Adam(rnn.parameters(),lr = lr)

In [34]:
%%time
train_model(x1, x2, x3, y)

Loss: 0.0395
CPU times: user 1min 24s, sys: 2.02 s, total: 1min 26s
Wall time: 32.4 s


In [35]:
def get_next(inp):
    idxs = [char_indices[c] for c in inp]
    arrs = [char2Var(i)[np.newaxis] for i in idxs]
    out = rnn(arrs[0], arrs[1], arrs[2])
    i = np.argmax(out.data.numpy())
    return chars[i]

In [36]:
get_next(' th')

'e'

In [37]:
get_next(' an')

'd'
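
A next-character predictor can be turned into a text generator by feeding each prediction back as input. The sketch below shows the loop only: `toy_predict` is a hypothetical lookup table standing in for `get_next`, so that the example runs without a trained model.

```python
def generate(seed, n_chars, predict_next, context=3):
    """Extend seed by n_chars characters, feeding back each prediction."""
    text = seed
    for _ in range(n_chars):
        text += predict_next(text[-context:])
    return text

# Hypothetical stand-in for get_next: a lookup table instead of a trained model.
toy_table = {" th": "e", "the": " ", "he ": "t", "e t": "h"}

def toy_predict(last_three):
    return toy_table.get(last_three, " ")

print(generate(" th", 6, toy_predict))
```

With the trained model, one would pass `get_next` as `predict_next`, assuming the seed contains at least 3 characters that all appear in `char_indices`.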

## RNN with pytorch

We will now use PyTorch's recurrent layers. The documentation is [here](http://pytorch.org/docs/master/nn.html#rnn)

To understand it, we will use a very simple example taken from PyTorchZeroToAll [tutorials](https://github.com/hunkim/PyTorchZeroToAll/blob/master/12_1_rnn_basics.py)

In [38]:
import torch
import torch.nn as nn
from torch.autograd import Variable

# One hot encoding for each char in 'hello'
h = [1, 0, 0, 0]
e = [0, 1, 0, 0]
l = [0, 0, 1, 0]
o = [0, 0, 0, 1]

# One cell RNN input_dim (4) -> output_dim (2). sequence: 5
cell = nn.RNN(input_size=4, hidden_size=2, batch_first=True)

# (num_layers * num_directions, batch, hidden_size), even when batch_first=True
hidden = (Variable(torch.randn(1, 1, 2)))

# Propagate input through RNN
# Input: (batch, seq_len, input_size) when batch_first=True
inputs = Variable(torch.Tensor([h, e, l, l, o]))
print('inputs size', inputs.size())
for one in inputs:
    one = one.view(1, 1, -1)
    # Input: (batch, seq_len, input_size) when batch_first=True
    out, hidden = cell(one, hidden)
    print("one input size", one.size(), "out size", out.size(), 'hidden size', hidden.size())

inputs size torch.Size([5, 4])
one input size torch.Size([1, 1, 4]) out size torch.Size([1, 1, 2]) hidden size torch.Size([1, 1, 2])
one input size torch.Size([1, 1, 4]) out size torch.Size([1, 1, 2]) hidden size torch.Size([1, 1, 2])
one input size torch.Size([1, 1, 4]) out size torch.Size([1, 1, 2]) hidden size torch.Size([1, 1, 2])
one input size torch.Size([1, 1, 4]) out size torch.Size([1, 1, 2]) hidden size torch.Size([1, 1, 2])
one input size torch.Size([1, 1, 4]) out size torch.Size([1, 1, 2]) hidden size torch.Size([1, 1, 2])


In [39]:
hidden = (Variable(torch.randn(1, 1, 2)))
# We can process the whole sequence at once
# Propagate input through RNN
# Input: (batch, seq_len, input_size) when batch_first=True
inputs = inputs.view(1, 5, -1)
out, hidden = cell(inputs, hidden)
print("sequence input size", inputs.size(), "out size", out.size(), 'hidden size', hidden.size())

sequence input size torch.Size([1, 5, 4]) out size torch.Size([1, 5, 2]) hidden size torch.Size([1, 1, 2])


In [40]:
# The hidden state has shape (num_layers * num_directions, batch, hidden_size), here with batch = 3
hidden = (Variable(torch.randn(1, 3, 2)))

# One cell RNN input_dim (4) -> output_dim (2). sequence: 5, batch 3
# A batch of 3 sequences: 'hello', 'eolll', 'lleel'
inputs = Variable(torch.Tensor([[h, e, l, l, o],
                                [e, o, l, l, l],
                                [l, l, e, e, l]]))

# Propagate input through RNN
# Input: (batch, seq_len, input_size) when batch_first=True
# B x S x I
out, hidden = cell(inputs, hidden)
print("batch input size", inputs.size(), "out size", out.size(), 'hidden size', hidden.size())

batch input size torch.Size([3, 5, 4]) out size torch.Size([3, 5, 2]) hidden size torch.Size([1, 3, 2])


This is not a bug: `batch_first=True` only changes the layout of the input and output tensors. The hidden state always has size (num_layers * num_directions, batch, hidden_size).
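
The shapes can be checked directly. This is a small sketch assuming a recent PyTorch version (0.4 or later), where plain tensors can be passed without wrapping them in `Variable`; the name `rnn_layer` is chosen here for illustration.

```python
import torch
import torch.nn as nn

batch, seq_len, input_size, hidden_size = 3, 5, 4, 2

rnn_layer = nn.RNN(input_size=input_size, hidden_size=hidden_size, batch_first=True)

# Input is batch-first: (batch, seq_len, input_size).
x = torch.randn(batch, seq_len, input_size)
# The initial hidden state is NOT batch-first:
# (num_layers * num_directions, batch, hidden_size).
h0 = torch.zeros(1, batch, hidden_size)

out, hn = rnn_layer(x, h0)
print(out.shape)  # torch.Size([3, 5, 2])
print(hn.shape)   # torch.Size([1, 3, 2])
```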

We are now ready to build our new RNN with an arbitrary number of characters in input.

In [41]:
class new_RNN(nn.Module):
    def __init__(self, embedding_dim=42, vocab_size = 1, hidden_dim =256):
        super(new_RNN, self).__init__()
        
        self._embedding_dim = embedding_dim
        self._vocab_size = vocab_size
        self._hidden_dim = hidden_dim
               
        self.embeddings = ScaledEmbedding(vocab_size, embedding_dim)
        self.cell = nn.RNN(input_size=embedding_dim, hidden_size=hidden_dim, batch_first=True)
        self.h2o = nn.Linear(hidden_dim, vocab_size)
        
    def forward(self, c):
        # c: (batch, seq_len) tensor of character indices
        # initial hidden state: (num_layers * num_directions, batch, hidden_size)
        hidden = Variable(torch.zeros(1,c.size(0),self._hidden_dim))
        for i in range(c.size(1)):
            # feed one character at a time: (batch, 1, embedding_dim)
            outp,hidden = self.cell(self.embeddings(c[:,i]).unsqueeze(1),hidden)
        # outp holds the output for the last character: (batch, 1, hidden_dim)
        output = self.h2o(outp)
        return output

In [42]:
new_rnn = new_RNN(vocab_size=vocab_size)
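
Feeding the characters one at a time, as `forward` does, gives the same result as passing the whole sequence to `nn.RNN` in a single call. The sketch below checks this on random data, assuming a recent PyTorch version (0.4 or later) and arbitrary small sizes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
cell = nn.RNN(input_size=4, hidden_size=3, batch_first=True)
x = torch.randn(2, 8, 4)          # (batch, seq_len, input_size)
h0 = torch.zeros(1, 2, 3)         # (num_layers, batch, hidden_size)

# Step by step, one time step per call.
hidden = h0
for t in range(x.size(1)):
    step_out, hidden = cell(x[:, t].unsqueeze(1), hidden)

# Whole sequence in a single call.
full_out, full_hidden = cell(x, h0)

# The last step of the loop matches the last time step of the full output.
print(torch.allclose(step_out[:, 0], full_out[:, -1], atol=1e-6))
print(torch.allclose(hidden, full_hidden, atol=1e-6))
```

The single call avoids the Python loop, so it is usually the faster of the two.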

In [43]:
inp = char2Var(idx[:8]).view(1,-1)

In [44]:
inp

Variable containing:
   40    42    29    30    25    27    29     1
[torch.LongTensor of size 1x8]

In [45]:
inp[:,0]

Variable containing:
 40
[torch.LongTensor of size 1]

In [46]:
emb = ScaledEmbedding(vocab_size, 30)

In [47]:
emb(inp[:,0]).unsqueeze(1).size()

torch.Size([1, 1, 30])

In [48]:
out = new_rnn(inp)

In [49]:
out.size()

torch.Size([1, 1, 85])

In [50]:
cs=8

In [51]:
c_in_dat = [[idx[i+n] for i in range(0, len(idx)-1-cs, cs)]
            for n in range(cs)]

In [52]:
np.shape(c_in_dat)

(8, 75111)

In [53]:
c_out_dat = [idx[i+cs] for i in range(0, len(idx)-1-cs, cs)]

In [54]:
xs = [np.stack(c[:-2]) for c in c_in_dat]

In [55]:
len(xs), xs[0].shape

(8, (75109,))

In [56]:
y = np.stack(c_out_dat[:-2])

In [57]:
np.shape(y)

(75109,)

In [58]:
[xs[n][:5] for n in range(cs)]

[array([40,  1, 33,  2, 72]),
 array([42,  1, 38, 44,  2]),
 array([29, 43, 31, 71, 54]),
 array([30, 45,  2, 74,  2]),
 array([25, 40, 73, 73, 76]),
 array([27, 40, 61, 61, 68]),
 array([29, 39, 54,  2, 66]),
 array([ 1, 43, 73, 62, 54])]

In [59]:
y[:5]

array([ 1, 33,  2, 72, 67])

In [60]:
idx[:16]

[40, 42, 29, 30, 25, 27, 29, 1, 1, 1, 43, 45, 40, 40, 39, 43]
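
The same windows can be built with a NumPy reshape instead of list comprehensions. This is a sketch of an alternative, not the code used above: `windows_from_indices` is a hypothetical helper, it returns the inputs already laid out as one row per example, and it does not drop the last two windows.

```python
import numpy as np

def windows_from_indices(indices, cs):
    arr = np.asarray(indices)
    n = (len(arr) - 1) // cs               # number of complete windows that have a target
    X = arr[:n * cs].reshape(n, cs)        # row k holds indices k*cs .. k*cs+cs-1
    y = arr[cs:n * cs + 1:cs]              # target of row k is the index at (k+1)*cs
    return X, y

# First 17 indices of idx, as shown in the outputs above.
toy_idx = [40, 42, 29, 30, 25, 27, 29, 1, 1, 1, 43, 45, 40, 40, 39, 43, 33]
X, y_toy = windows_from_indices(toy_idx, cs=8)
print(X)
print(y_toy)
```

The two rows of `X` are the first two 8-character inputs, and `y_toy` matches `y[:2]`.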

In [61]:
inp = char2Var([xs[n][:2] for n in range(cs)]).permute(1,0)

In [62]:
inp

Variable containing:
   40    42    29    30    25    27    29     1
    1     1    43    45    40    40    39    43
[torch.LongTensor of size 2x8]

In [63]:
out = new_rnn(inp)

In [64]:
def new_data_gen(ch,y,batch_size=64,shuffle=True):
    if shuffle:
        index = np.random.permutation(ch.shape[0])
        ch = ch[index,:]
        y = y[index]
    for idx in range(0,ch.shape[0],batch_size):
        yield(ch[idx:idx+batch_size,:], y[idx:idx+batch_size])

In [80]:
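def new_train_model(ch,y,model=new_rnn,epochs=1,train=True):
    if train:
        model.train()
    else:
        model.eval()
        
    for epoch in range(epochs):
        batches = new_data_gen(ch,y)
        running_loss = 0.0
        # NOTE: the loop variables reuse the names ch and y, so after the loop
        # ch refers to the last batch, not to the full dataset
        for ch,y in batches:
            inp = char2Var(ch)
            o = char2Var(y)
            
            # use the model passed as argument rather than the global new_rnn
            out = model(inp)
            loss = new_rnn_loss(out.squeeze(),o)
            new_rnn_optimizer.zero_grad()
            loss.backward()
            new_rnn_optimizer.step()
            
            running_loss += loss.data[0]
            
        # ch.shape[0] is the size of the last batch here, so the printed value is
        # the sum of the per-batch losses divided by that size (not a per-sample mean)
        epoch_loss = running_loss / ch.shape[0]
        print('Loss: {:.4f}'.format(epoch_loss))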

In [81]:
new_rnn_loss = nn.CrossEntropyLoss()
lr = 0.000001
new_rnn_optimizer = torch.optim.Adam(new_rnn.parameters(),lr = lr)

In [82]:
%%time
new_train_model(np.transpose([xs[n][:2] for n in range(cs)]),y[:2])

Loss: 2.2359
CPU times: user 24 ms, sys: 8 ms, total: 32 ms
Wall time: 15 ms


In [83]:
%%time
new_train_model(np.transpose([xs[n][:] for n in range(cs)]),y)

Loss: 140.1240
CPU times: user 1min 21s, sys: 1.75 s, total: 1min 23s
Wall time: 28.6 s


In [98]:
lr = 0.01
new_rnn_optimizer = torch.optim.Adam(new_rnn.parameters(),lr = lr)

In [99]:
%%time
new_train_model(np.transpose([xs[n][:] for n in range(cs)]),y)

Loss: 109.0212
CPU times: user 1min 25s, sys: 1.68 s, total: 1min 27s
Wall time: 30.2 s


In [100]:
def new_get_next(inp):
    idxs = [char_indices[c] for c in inp]
    #print(idxs)
    #arrs = [char2Var(i)[np.newaxis] for i in idxs]
    arrs = char2Var(idxs).unsqueeze(0)
    #print(arrs)
    out = new_rnn(arrs)
    i = np.argmax(out.data.numpy())
    return chars[i]

In [101]:
new_get_next('for thos')

' '

In [102]:
new_get_next('part of ')

'i'

# Exercise

As stated during the course, this code is very preliminary and does not run on GPU. Fix it!

Also, instead of `nn.RNN`, use [GRU](http://pytorch.org/docs/master/nn.html#torch.nn.GRU)
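
As a possible starting point (not a full solution), the sketch below shows that `nn.GRU` accepts the same arguments and tensor shapes as `nn.RNN`, and that the layer, its input and its hidden state must all be on the same device. It assumes a recent PyTorch version (0.4 or later); the sizes reuse the defaults of this notebook.

```python
import torch
import torch.nn as nn

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Sizes matching the defaults used in this notebook (batch and seq_len are arbitrary).
embedding_dim, hidden_dim, batch, seq_len = 42, 256, 2, 8

gru = nn.GRU(input_size=embedding_dim, hidden_size=hidden_dim, batch_first=True).to(device)

# Inputs and the initial hidden state must live on the same device as the layer.
x = torch.randn(batch, seq_len, embedding_dim, device=device)
h0 = torch.zeros(1, batch, hidden_dim, device=device)

out, hn = gru(x, h0)
print(out.shape, hn.shape)
```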