RNN
* Main idea is like a neuron gets part of its input from a different neuron on the same level
* This is good for series of data
    * time series prediction
    * text generation (sentences have a particular series)
    * etc.
* Extreme is have neuron to neuron to neuron then go to the neurons that produce the final input

Various RNNs

```
nn.RNN # hidden_size is the secret input / output to another neuron
nn.GRU
nn.LSTM
```

In [1]:
import torch as t
h = [1,0,0,0]
e = [0,1,0,0]
l = [0,0,1,0]
o = [0,0,0,1]

In [2]:
cell = t.nn.RNN(input_size=4, hidden_size=2, num_layers=3, batch_first=True, bidirectional=True) # 2 out, 2 hidden, add seq_len for length of 3rd dim
inputs = t.Tensor([[[1, 0, 0, 0]]])

In [3]:
hidden = t.randn(6,1,2) ## depth, rows, cols

In [4]:
out_t1, hidden_t1 = cell(inputs, hidden)

In [5]:
print(f'out {out_t1.data}, hidden {hidden_t1.data}') 
# 3 layers, each 1 with hidden, out_1 is result of going through 3 layers

out tensor([[[-0.2562,  0.3281,  0.9079,  0.3072]]]), hidden tensor([[[-0.8171,  0.9373]],

        [[-0.6565,  0.6108]],

        [[ 0.7963,  0.9485]],

        [[-0.7855,  0.9595]],

        [[-0.2562,  0.3281]],

        [[ 0.9079,  0.3072]]])


### How to deal with multiple words ?

In [6]:
# one word
inputs_2 = t.Tensor([[
    h,e,l,l,o
]])
# input_size = columns, or one hot size
cell_2 = t.nn.RNN(input_size=4, hidden_size=2, batch_first=True)
print(f'inputs 2 size {inputs_2.size()}')

inputs 2 size torch.Size([1, 5, 4])


In [7]:
out, hidden = cell_2(inputs_2,  t.randn(1,1,2))
print(f'out {out.data} {out.data.size()}, hidden {hidden.data}')

out tensor([[[-0.5892,  0.5320],
         [ 0.3550, -0.7357],
         [ 0.6484,  0.4687],
         [ 0.4873,  0.2668],
         [ 0.1915,  0.3272]]]) torch.Size([1, 5, 2]), hidden tensor([[[ 0.1915,  0.3272]]])


In [8]:
inputs_3 = t.Tensor([
    [h,e,l,l,o],
    [e, o, l, l, l],
    [l, l, e, e, l]
])
hidden_3 = t.randn(1,3,2) # (num_layers * num_directions, batch, hidden_size)
cell_3 = t.nn.RNN(input_size=4, hidden_size=2, batch_first=True)
print(f'inputs 3 size {inputs_3.size()}')
out, hidden = cell_3(inputs_3, hidden_3)
print(f'out {out.data} \n {out.data.size()} \n hidden {hidden.data}')

inputs 3 size torch.Size([3, 5, 4])
out tensor([[[-0.0048, -0.0478],
         [-0.4024,  0.2319],
         [-0.4744,  0.5936],
         [-0.4182,  0.6447],
         [-0.1237,  0.8420]],

        [[-0.3012,  0.3575],
         [-0.1574,  0.8211],
         [-0.2656,  0.6674],
         [-0.3377,  0.6501],
         [-0.3643,  0.6493]],

        [[-0.4941,  0.3444],
         [-0.4766,  0.6116],
         [-0.3967,  0.3792],
         [-0.4243,  0.3316],
         [-0.4599,  0.6083]]]) 
 torch.Size([3, 5, 2]) 
 hidden tensor([[[-0.1237,  0.8420],
         [-0.3643,  0.6493],
         [-0.4599,  0.6083]]])


Output prob of a particular letter

Data is fed in where the batch (3rd dimension) is one entire sequence, or entire word at a time

Num of sequences x seq length x length of a single sequence (data encoded for one period of time)

In [9]:
idx2char = ['h', 'i', 'e', 'l', 'o']
x_data = [0,1,0,2,3,3] # 'hihell'
one_hot_lookup = [
    [1,0,0,0,0], # h
    [0,1,0,0,0], # i
    [1,0,0,0,0],
    [0,0,1,0,0],
    [0,0,0,1,0]
]
y_data = [1,0,2,3,3,4] # ihello
x_one_hot = [one_hot_lookup[x] for x in x_data]
x_one_hot

[[1, 0, 0, 0, 0],
 [0, 1, 0, 0, 0],
 [1, 0, 0, 0, 0],
 [1, 0, 0, 0, 0],
 [0, 0, 1, 0, 0],
 [0, 0, 1, 0, 0]]

In [10]:
from torch.autograd import Variable

inputs = t.Tensor(x_one_hot)
labels = t.LongTensor(y_data)

In [11]:
num_classes = 5
input_size = 5 # one hot length
hidden_size = 5 # LSTM output, we need to feed into next cell
batch_size = 1 # one sentence
sequence_length = 1 # 1 by 1
num_layers = 1 # 1 layer rnn

In [12]:
class Model(t.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.rnn = t.nn.RNN(input_size=input_size, hidden_size=hidden_size, batch_first=True)
    def forward(self, x, hidden=None):
        if hidden is None:
            hidden = self.init_hidden()
        x = x.view(batch_size, sequence_length, input_size)
        out, hidden = self.rnn(x, hidden)
        out = out.view(-1, num_classes)
        return hidden, out
    def init_hidden(self):
        return t.zeros(num_layers, batch_size, hidden_size) # hidden is 1 per hidden layer? or implicit cell input

In [21]:
model = Model()
criterion = t.nn.CrossEntropyLoss()
optimizer = t.optim.Adam(model.parameters(), lr = 0.1)

for epoch in range(100):
    optimizer.zero_grad()
    loss = 0
    hidden = model.init_hidden()
    for input, label in zip(inputs, labels):
        hidden, output = model(input, hidden)
        val, idx = output.max(1)
        #print(idx2char[idx.data[0]])
        #print(output, label.view(-1))
        loss += criterion(output, label.view(-1))
    print(f'epoch {epoch + 1}, loss {loss.data[0]}')
    loss.backward()
    optimizer.step()

tensor([[ 0.1296, -0.0692,  0.3919, -0.3089,  0.2677]]) tensor([ 1])
tensor([[ 0.1174, -0.0303,  0.4079, -0.0409,  0.3867]]) tensor([ 0])
tensor([[ 0.3537, -0.0704,  0.3279,  0.0441,  0.2974]]) tensor([ 2])
tensor([[ 0.2882, -0.1989,  0.3538,  0.0616,  0.2519]]) tensor([ 3])
tensor([[ 0.2257,  0.2637, -0.0953,  0.5952,  0.0730]]) tensor([ 3])
tensor([[ 0.0205,  0.1227, -0.1650,  0.1063,  0.2180]]) tensor([ 4])
epoch 1, loss 9.388077735900879
tensor([[ 0.2263,  0.2267,  0.6132, -0.0193,  0.3578]]) tensor([ 1])
tensor([[ 0.5170, -0.0228,  0.4883,  0.1029,  0.4820]]) tensor([ 0])
tensor([[ 0.5701, -0.1278,  0.5864,  0.5900,  0.3387]]) tensor([ 2])
tensor([[ 0.3026, -0.3340,  0.3707,  0.5749,  0.2331]]) tensor([ 3])
tensor([[ 0.1160,  0.1223, -0.2890,  0.8006,  0.2345]]) tensor([ 3])
tensor([[-0.0713,  0.1850, -0.2699,  0.4226,  0.5581]]) tensor([ 4])
epoch 2, loss 8.202107429504395
tensor([[ 0.2905,  0.4841,  0.7464,  0.0092,  0.2525]]) tensor([ 1])
tensor([[ 0.7712,  0.0174,  0.6741, -0.

  from ipykernel import kernelapp as app


tensor([[-0.7191, -0.8027, -0.4842,  0.9501, -0.9979]]) tensor([ 3])
tensor([[-0.9996, -0.4529, -0.9999,  0.6707,  0.0857]]) tensor([ 3])
tensor([[-0.9932,  0.7811, -0.9996, -0.8657,  0.9848]]) tensor([ 4])
epoch 22, loss 4.774805545806885
tensor([[-0.1715,  0.8075,  0.3959, -0.6290, -0.8580]]) tensor([ 1])
tensor([[ 0.9928,  0.1083, -0.0131, -0.9952, -0.9465]]) tensor([ 0])
tensor([[ 0.4132,  0.0616,  0.9855, -0.3090, -0.9975]]) tensor([ 2])
tensor([[ 0.3605, -0.6199,  0.7940,  0.8721, -0.9996]]) tensor([ 3])
tensor([[-0.9767, -0.9991, -0.9972,  0.9997, -0.9968]]) tensor([ 3])
tensor([[-1.0000,  0.3999, -1.0000, -0.2111,  0.7210]]) tensor([ 4])
epoch 23, loss 4.8970112800598145
tensor([[-0.2028,  0.8131,  0.3655, -0.7036, -0.8640]]) tensor([ 1])
tensor([[ 0.9937,  0.2815,  0.0804, -0.9972, -0.9466]]) tensor([ 0])
tensor([[ 0.6550, -0.1088,  0.9896, -0.4723, -0.9974]]) tensor([ 2])
tensor([[ 0.4750, -0.7076,  0.9093,  0.9369, -0.9999]]) tensor([ 3])
tensor([[-0.9772, -0.9996, -0.9969, 

tensor([[-0.5939,  0.6974, -0.1730, -0.9470, -0.8794]]) tensor([ 1])
tensor([[ 0.9860,  0.9472, -0.2598, -1.0000, -0.7687]]) tensor([ 0])
tensor([[ 0.9135, -0.5796,  0.9899, -0.9932, -0.9779]]) tensor([ 2])
tensor([[ 0.8000, -0.8265,  0.9847,  0.9402, -1.0000]]) tensor([ 3])
tensor([[-0.9868, -1.0000, -0.9854,  1.0000, -0.9999]]) tensor([ 3])
tensor([[-1.0000, -0.9659, -1.0000, -0.8920,  0.9831]]) tensor([ 4])
epoch 41, loss 4.618303298950195
tensor([[-0.6073,  0.6751, -0.1726, -0.9480, -0.8824]]) tensor([ 1])
tensor([[ 0.9868,  0.9574, -0.1903, -1.0000, -0.8291]]) tensor([ 0])
tensor([[ 0.9309, -0.6963,  0.9915, -0.9904, -0.9878]]) tensor([ 2])
tensor([[ 0.7732, -0.8557,  0.9843,  0.9614, -1.0000]]) tensor([ 3])
tensor([[-0.9898, -1.0000, -0.9870,  1.0000, -0.9999]]) tensor([ 3])
tensor([[-1.0000, -0.9662, -1.0000, -0.8984,  0.9833]]) tensor([ 4])
epoch 42, loss 4.606808662414551
tensor([[-0.6206,  0.6563, -0.1739, -0.9486, -0.8867]]) tensor([ 1])
tensor([[ 0.9873,  0.9651, -0.1307, -

tensor([[-0.8709,  0.8880, -0.6276, -0.9441, -0.9523]]) tensor([ 1])
tensor([[ 0.9925,  0.9984, -0.7040, -1.0000, -0.9440]]) tensor([ 0])
tensor([[ 0.8953, -0.8335,  0.9612, -0.9993, -0.9953]]) tensor([ 2])
tensor([[-0.7163, -0.7902,  0.9688,  0.9938, -1.0000]]) tensor([ 3])
tensor([[-0.9999, -0.9957, -0.9996,  0.9997, -0.9909]]) tensor([ 3])
tensor([[-1.0000, -0.9287, -1.0000, -0.9664,  0.9895]]) tensor([ 4])
epoch 62, loss 4.090289115905762
tensor([[-0.8786,  0.8947, -0.6460, -0.9441, -0.9544]]) tensor([ 1])
tensor([[ 0.9926,  0.9985, -0.7349, -1.0000, -0.9472]]) tensor([ 0])
tensor([[ 0.8847, -0.8353,  0.9571, -0.9994, -0.9952]]) tensor([ 2])
tensor([[-0.7751, -0.7727,  0.9660,  0.9937, -1.0000]]) tensor([ 3])
tensor([[-1.0000, -0.9936, -0.9997,  0.9996, -0.9886]]) tensor([ 3])
tensor([[-1.0000, -0.9265, -1.0000, -0.9681,  0.9898]]) tensor([ 4])
epoch 63, loss 4.076082229614258
tensor([[-0.8857,  0.9006, -0.6626, -0.9441, -0.9565]]) tensor([ 1])
tensor([[ 0.9927,  0.9986, -0.7606, -

tensor([[ 0.4788, -0.9698,  0.9791, -0.9998, -0.9991]]) tensor([ 2])
tensor([[-0.9914, -0.8769,  0.0143,  0.9963, -1.0000]]) tensor([ 3])
tensor([[-1.0000, -0.9828, -1.0000,  0.8924, -0.7955]]) tensor([ 3])
tensor([[-1.0000, -0.8961, -1.0000, -0.9710,  0.9570]]) tensor([ 4])
epoch 90, loss 3.667187452316284
tensor([[-0.9466,  0.8254, -0.8620, -0.9340, -0.9878]]) tensor([ 1])
tensor([[ 0.9909,  0.9994, -0.8644, -1.0000, -0.9964]]) tensor([ 0])
tensor([[ 0.4628, -0.9690,  0.9806, -0.9998, -0.9991]]) tensor([ 2])
tensor([[-0.9917, -0.8735,  0.0004,  0.9967, -1.0000]]) tensor([ 3])
tensor([[-1.0000, -0.9834, -1.0000,  0.9028, -0.8081]]) tensor([ 3])
tensor([[-1.0000, -0.9021, -1.0000, -0.9637,  0.9510]]) tensor([ 4])
epoch 91, loss 3.653881311416626
tensor([[-0.9472,  0.8311, -0.8617, -0.9313, -0.9882]]) tensor([ 1])
tensor([[ 0.9911,  0.9994, -0.8661, -1.0000, -0.9966]]) tensor([ 0])
tensor([[ 0.4549, -0.9678,  0.9822, -0.9998, -0.9991]]) tensor([ 2])
tensor([[-0.9919, -0.8750,  0.0170,  

In [14]:
batch_s = 1
seq_l = 6
class Model2(t.nn.Module):
    def __init__(self):
        super(Model2, self).__init__()
        self.rnn = t.nn.RNN(input_size=input_size, hidden_size=hidden_size, batch_first=True)
    def forward(self, x):
        hidden = self.init_hidden()
        x = x.view(batch_s, seq_l, input_size)
        out, hidden = self.rnn(x, hidden)
        return out.view(-1, num_classes)
    def init_hidden(self):
        return t.zeros(num_layers, batch_s, hidden_size) # hidden is 1 per hidden layer? or implicit cell input

In [15]:
model = Model2()
criterion = t.nn.CrossEntropyLoss()
optimizer = t.optim.Adam(model.parameters(), lr = 0.1)
for epoch in range(100):
    optimizer.zero_grad()
    loss = 0
    outputs = model(inputs)
    optimizer.zero_grad()
    loss = criterion(outputs, labels)
    print(f'epoch {epoch + 1}, loss {loss.data[0]}')
    loss.backward()
    optimizer.step()
    _, idx = outputs.max(1)
    idx = idx.data.numpy()
    result_str = [idx2char[c] for c in idx.squeeze()]
    #print("epoch: %d, loss: %1.3f" % (epoch + 1, loss.data[0]))
    print("Predicted string: ", ''.join(result_str))

epoch 1, loss 1.6415597200393677
Predicted string:  eeeeee
epoch 2, loss 1.4516552686691284
Predicted string:  eellll
epoch 3, loss 1.3398113250732422
Predicted string:  lhllll
epoch 4, loss 1.2325706481933594
Predicted string:  lhllll
epoch 5, loss 1.1222553253173828
Predicted string:  lhelll
epoch 6, loss 1.0544041395187378
Predicted string:  lhelll
epoch 7, loss 0.9917203783988953
Predicted string:  lhelll
epoch 8, loss 0.929325520992279
Predicted string:  ihelll
epoch 9, loss 0.87046879529953
Predicted string:  ihelll
epoch 10, loss 0.8156734108924866
Predicted string:  ihello
epoch 11, loss 0.7674911618232727
Predicted string:  ihello
epoch 12, loss 0.7274844646453857
Predicted string:  ihello
epoch 13, loss 0.6987817287445068
Predicted string:  ihello
epoch 14, loss 0.6784595847129822
Predicted string:  ihello
epoch 15, loss 0.6577062606811523
Predicted string:  ihello
epoch 16, loss 0.6338006854057312
Predicted string:  ihello
epoch 17, loss 0.6107373833656311
Predicted string: 

  # Remove the CWD from sys.path while we load stuff.


Predicted string:  ihello
epoch 46, loss 0.471571683883667
Predicted string:  ihello
epoch 47, loss 0.4706486165523529
Predicted string:  ihello
epoch 48, loss 0.47025489807128906
Predicted string:  ihello
epoch 49, loss 0.4696974754333496
Predicted string:  ihello
epoch 50, loss 0.4689004719257355
Predicted string:  ihello
epoch 51, loss 0.4684046804904938
Predicted string:  ihello
epoch 52, loss 0.46818193793296814
Predicted string:  ihello
epoch 53, loss 0.4676450788974762
Predicted string:  ihello
epoch 54, loss 0.4671362340450287
Predicted string:  ihello
epoch 55, loss 0.4668966233730316
Predicted string:  ihello
epoch 56, loss 0.46656182408332825
Predicted string:  ihello
epoch 57, loss 0.4660802185535431
Predicted string:  ihello
epoch 58, loss 0.4657630920410156
Predicted string:  ihello
epoch 59, loss 0.46556004881858826
Predicted string:  ihello
epoch 60, loss 0.46519505977630615
Predicted string:  ihello
epoch 61, loss 0.46489599347114563
Predicted string:  ihello
epoch 62,

### 12-2
Softmax as an output is more stables because forces numbers between 0 / 1

Embedding idea

Use a lookup table versus one hot ... like a combination of features to represent something versus one

In [16]:
test = t.nn.Embedding(5, 10) # 5 parts to a sentence, now it's represented by 10 pieces uniquely

In [17]:
test(t.LongTensor(x_data))

tensor([[-0.4162,  1.1143,  1.1127,  1.6606,  0.3824, -0.1259, -1.1290,
          1.3560,  0.1015, -1.5492],
        [-0.1946, -0.3707,  1.2166, -0.5027,  1.8302, -0.3568,  0.3462,
         -1.5733, -1.5489, -0.3795],
        [-0.4162,  1.1143,  1.1127,  1.6606,  0.3824, -0.1259, -1.1290,
          1.3560,  0.1015, -1.5492],
        [ 0.5127, -0.5082, -0.0649, -0.5959,  1.1138,  0.2975,  0.8366,
          0.2352, -0.5933, -0.6783],
        [ 0.7313,  0.3983,  2.1342, -0.0237,  0.2575, -1.1401,  1.2834,
          0.0958, -1.0707, -1.0808],
        [ 0.7313,  0.3983,  2.1342, -0.0237,  0.2575, -1.1401,  1.2834,
          0.0958, -1.0707, -1.0808]])

### 12-3 TODO

https://colah.github.io/posts/2015-08-Understanding-LSTMs/

http://blog.varunajayasiri.com/numpy_lstm.html

$b, c$ are bias terms

$h^{t-1}$ is hidden from before, with $w$ as its weight

$U$ is weight for input $x^t$

$V$ is weight for $h^t$

$y^t = softmax(o^t)$ output

$o^t = c + Vh^t$ 

$h^t = tanh(a^t)$

$a^t = b + w^{t-1} + Ux^t$

http://cs224d.stanford.edu/lecture_notes/notes4.pdf for diagrams

Batch first means depth dimension is first in [n,m,z] notation

Bi direcitonal means 2 directions
https://towardsdatascience.com/understanding-bidirectional-rnn-in-pytorch-5bd25a5dd66 -> input the sequence forward and backward, hidden is 1x for forward, 1x for backward 2x for output