# Thinking in tensors in PyTorch

Hands-on training by [Piotr Migdał](https://p.migdal.pl) (2019).

Version for [AI & NLP Workshop Day](https://nlpday.pl/), 31 May 2019, Warsaw, Poland: **Understanding LSTM and GRU networks in PyTorch**.



## NLP & AI: 4. LSTM GRU anatomy


[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/stared/thinking-in-tensors-writing-in-pytorch/blob/master/extra/4%20LSTM%20GRU%20anatomy.ipynb)
 

In [1]:
import torch
from torch import nn

## LSTM

See the [`nn.LSTM` documentation](https://pytorch.org/docs/stable/nn.html#lstm) for details.

In [2]:
# 5 input features per time step, hidden state of size 3
lstm = nn.LSTM(input_size=5, hidden_size=3)

In [3]:
# input shape is (L, B, C):
# L = 8 (sequence length)
# B = 1 (batch size)
# C = 5 (channels, i.e. input features per time step)
x = torch.randn(8, 1, 5)

In [4]:
x

tensor([[[-1.9475,  0.0295, -0.9345,  0.0686,  0.1173]],

        [[ 0.4105,  1.0467, -0.5570, -0.9523,  0.3352]],

        [[ 0.3378,  0.6555, -0.2683, -0.3765,  0.2690]],

        [[ 0.8649,  0.7311,  0.0932, -1.3769, -1.3036]],

        [[-0.7326, -0.7776,  1.1224,  0.2065, -0.5358]],

        [[ 1.1138, -2.7978, -1.3544, -0.3246, -0.5382]],

        [[-0.3468,  0.2607, -0.0527, -0.2501, -1.4180]],

        [[ 0.9567, -0.1102,  1.1552,  0.6020, -0.8374]]])
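The `(L, B, C)` layout used above is the default for `nn.LSTM`. A minimal sketch of the alternative `(B, L, C)` layout, selected with `batch_first=True`, follows. The seed and the names `lstm_bf`, `x_bf` and `same_values` are assumptions made for this illustration, not part of the original notebook.

```python
import torch
from torch import nn

torch.manual_seed(0)  # assumption: fixed seed, only so the sketch is reproducible

L, B, C, H = 8, 1, 5, 3  # length, batch size, channels, hidden size

lstm = nn.LSTM(C, H)                        # expects input of shape (L, B, C)
lstm_bf = nn.LSTM(C, H, batch_first=True)   # expects input of shape (B, L, C)
lstm_bf.load_state_dict(lstm.state_dict())  # copy weights so both compute the same function

x = torch.randn(L, B, C)
x_bf = x.transpose(0, 1).contiguous()       # same data, rearranged to (B, L, C)

output, _ = lstm(x)
output_bf, _ = lstm_bf(x_bf)

print(output.shape)     # torch.Size([8, 1, 3])
print(output_bf.shape)  # torch.Size([1, 8, 3])

# Only the layout differs; the values agree once the axes are swapped back.
same_values = torch.allclose(output, output_bf.transpose(0, 1), atol=1e-6)
print(same_values)
```

Note that `batch_first` changes the layout of the input and of `output` only; the returned hidden and cell states keep their batch dimension in the second position.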

In [5]:
output, (hidden, cell) = lstm(x)

In [6]:
output

tensor([[[-0.0337,  0.2362,  0.4257]],

        [[-0.1345, -0.0277,  0.2995]],

        [[-0.2241, -0.0752,  0.3286]],

        [[-0.3758, -0.0412,  0.3113]],

        [[-0.4010,  0.0884,  0.3689]],

        [[-0.3639,  0.1844,  0.3982]],

        [[-0.4993,  0.1345,  0.4895]],

        [[-0.7292,  0.1251,  0.2716]]], grad_fn=<StackBackward>)

In [7]:
hidden

tensor([[[-0.7292,  0.1251,  0.2716]]], grad_fn=<StackBackward>)

In [8]:
cell

tensor([[[-1.3856,  0.3687,  0.5565]]], grad_fn=<StackBackward>)

In [9]:
# for a single-layer, unidirectional LSTM the last output step is the final hidden state
output[-1] == hidden

tensor([[[1, 1, 1]]], dtype=torch.uint8)
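The relation between `output`, `hidden` and `cell` can be summarized by their shapes. This is a small sketch under the same setup as above (single layer, one direction); the seed and the name `last_step_matches` are assumptions added for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)  # assumption: fixed seed, only for reproducibility

lstm = nn.LSTM(5, 3)
x = torch.randn(8, 1, 5)  # (L, B, C)

output, (hidden, cell) = lstm(x)

print(output.shape)  # torch.Size([8, 1, 3]): one hidden vector per time step
print(hidden.shape)  # torch.Size([1, 1, 3]): hidden state after the last step
print(cell.shape)    # torch.Size([1, 1, 3]): cell state after the last step

# The last slice of `output` holds the same values as the final hidden state.
last_step_matches = torch.allclose(output[-1], hidden[0])
print(last_step_matches)
```

The cell state is never part of `output`; it is only available as the final state returned alongside `hidden`.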

## Step by step

In [11]:
# process the first four steps, then continue from the returned state
output1, (hidden1, cell1) = lstm(x[:4])
output2, (hidden2, cell2) = lstm(x[4:], (hidden1, cell1))

In [12]:
output1

tensor([[[-0.0337,  0.2362,  0.4257]],

        [[-0.1345, -0.0277,  0.2995]],

        [[-0.2241, -0.0752,  0.3286]],

        [[-0.3758, -0.0412,  0.3113]]], grad_fn=<StackBackward>)

In [13]:
output2

tensor([[[-0.4010,  0.0884,  0.3689]],

        [[-0.3639,  0.1844,  0.3982]],

        [[-0.4993,  0.1345,  0.4895]],

        [[-0.7292,  0.1251,  0.2716]]], grad_fn=<StackBackward>)
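The two printed chunks match the first and second half of the full output. A minimal sketch of checking this programmatically, rather than by eye, is given here; the seed, the tolerance `atol=1e-6` and the variable names are assumptions chosen for this example.

```python
import torch
from torch import nn

torch.manual_seed(0)  # assumption: fixed seed, only for reproducibility

lstm = nn.LSTM(5, 3)
x = torch.randn(8, 1, 5)

# Whole sequence in one call.
output, (hidden, cell) = lstm(x)

# Same sequence in two calls, passing the state from the first into the second.
output1, state1 = lstm(x[:4])
output2, (hidden2, cell2) = lstm(x[4:], state1)

chunks_match = torch.allclose(torch.cat([output1, output2]), output, atol=1e-6)
state_matches = (
    torch.allclose(hidden2, hidden, atol=1e-6)
    and torch.allclose(cell2, cell, atol=1e-6)
)
print(chunks_match, state_matches)
```

If the state is not passed to the second call, the LSTM starts again from zeros, so the second chunk would in general differ from the corresponding part of the full output.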

## Iteration

In [None]:
lstm

In [14]:
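# start from zero hidden and cell states, shape (num_layers, B, hidden_size)
hidden = torch.zeros(1, 1, 3)
cell = torch.zeros(1, 1, 3)
# feed one time step at a time, carrying the state forward
for i in range(len(x)):
    output, (hidden, cell) = lstm(x[i:i+1], (hidden, cell))
    print(output)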

tensor([[[-0.0337,  0.2362,  0.4257]]], grad_fn=<StackBackward>)
tensor([[[-0.1345, -0.0277,  0.2995]]], grad_fn=<StackBackward>)
tensor([[[-0.2241, -0.0752,  0.3286]]], grad_fn=<StackBackward>)
tensor([[[-0.3758, -0.0412,  0.3113]]], grad_fn=<StackBackward>)
tensor([[[-0.4010,  0.0884,  0.3689]]], grad_fn=<StackBackward>)
tensor([[[-0.3639,  0.1844,  0.3982]]], grad_fn=<StackBackward>)
tensor([[[-0.4993,  0.1345,  0.4895]]], grad_fn=<StackBackward>)
tensor([[[-0.7292,  0.1251,  0.2716]]], grad_fn=<StackBackward>)
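The loop above still treats each step as a black box. A sketch of what a single step computes, written out from the layer's own parameters, may help; it assumes the gate order (input, forget, cell candidate, output) used by `nn.LSTM` for its stacked weights. The helper `lstm_step`, the seed and the tolerance are hypothetical choices for this illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)  # assumption: fixed seed, only for reproducibility

lstm = nn.LSTM(5, 3)
x = torch.randn(8, 1, 5)

# Parameters of the single layer; the four gates are stacked along dim 0.
W_ih, W_hh = lstm.weight_ih_l0, lstm.weight_hh_l0  # shapes (12, 5) and (12, 3)
b_ih, b_hh = lstm.bias_ih_l0, lstm.bias_hh_l0      # shapes (12,) and (12,)

def lstm_step(x_t, h, c):
    """One LSTM step for x_t of shape (B, C) and states of shape (B, H)."""
    gates = x_t @ W_ih.t() + b_ih + h @ W_hh.t() + b_hh  # (B, 4 * H)
    i, f, g, o = gates.chunk(4, dim=-1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)
    c = f * c + i * g          # forget part of the old cell, write the new candidate
    h = o * torch.tanh(c)      # expose a gated view of the cell
    return h, c

with torch.no_grad():
    h = torch.zeros(1, 3)
    c = torch.zeros(1, 3)
    steps = []
    for x_t in x:              # iterate over the time dimension
        h, c = lstm_step(x_t, h, c)
        steps.append(h)
    manual = torch.stack(steps)  # (L, B, H)
    reference, _ = lstm(x)

manual_matches = torch.allclose(manual, reference, atol=1e-5)
print(manual_matches)
```

Writing the step by hand makes it visible that `hidden` is a gated, squashed read-out of `cell`, which is why the two states are returned separately.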


## GRU

See the [`nn.GRU` documentation](https://pytorch.org/docs/stable/nn.html#gru) for details.

In [15]:
gru = nn.GRU(5, 3)

In [16]:
# unlike the LSTM, a GRU has no cell state, so it returns only the hidden state
output, hidden = gru(x)

In [17]:
output

tensor([[[ 0.1094, -0.2984,  0.1891]],

        [[-0.2014,  0.1291,  0.2487]],

        [[-0.4000,  0.4376,  0.1978]],

        [[-0.3034,  0.5555, -0.0470]],

        [[-0.3015, -0.1367, -0.2690]],

        [[-0.1638,  0.3410, -0.5762]],

        [[-0.1888, -0.1596, -0.4027]],

        [[-0.7913, -0.0711, -0.6111]]], grad_fn=<StackBackward>)

In [18]:
hidden

tensor([[[-0.7913, -0.0711, -0.6111]]], grad_fn=<StackBackward>)
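As with the LSTM, the last output step equals the final hidden state. A sketch of a single GRU step written out from the layer's parameters follows, assuming the gate order (reset, update, new candidate) that `nn.GRU` uses for its stacked weights. The helper `gru_step`, the seed and the tolerance are hypothetical choices for this illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)  # assumption: fixed seed, only for reproducibility

gru = nn.GRU(5, 3)
x = torch.randn(8, 1, 5)

# Parameters of the single layer; the three gates are stacked along dim 0.
W_ih, W_hh = gru.weight_ih_l0, gru.weight_hh_l0  # shapes (9, 5) and (9, 3)
b_ih, b_hh = gru.bias_ih_l0, gru.bias_hh_l0      # shapes (9,) and (9,)

def gru_step(x_t, h):
    """One GRU step for x_t of shape (B, C) and h of shape (B, H)."""
    i_r, i_z, i_n = (x_t @ W_ih.t() + b_ih).chunk(3, dim=-1)
    h_r, h_z, h_n = (h @ W_hh.t() + b_hh).chunk(3, dim=-1)
    r = torch.sigmoid(i_r + h_r)       # reset gate
    z = torch.sigmoid(i_z + h_z)       # update gate
    n = torch.tanh(i_n + r * h_n)      # candidate state
    return (1 - z) * n + z * h         # blend candidate and previous state

with torch.no_grad():
    h = torch.zeros(1, 3)
    steps = []
    for x_t in x:                      # iterate over the time dimension
        h = gru_step(x_t, h)
        steps.append(h)
    manual = torch.stack(steps)        # (L, B, H)
    reference, final_hidden = gru(x)

manual_matches = torch.allclose(manual, reference, atol=1e-5)
last_step_matches = torch.allclose(reference[-1], final_hidden[0])
print(manual_matches, last_step_matches)
```

The GRU keeps a single state vector and uses three gate blocks instead of four, so for the same sizes it has fewer parameters than the LSTM.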

## Bidirectional LSTM

See also: [Understanding Bidirectional RNN in PyTorch](https://towardsdatascience.com/understanding-bidirectional-rnn-in-pytorch-5bd25a5dd66) by Ceshine Lee

In [19]:
bilstm = nn.LSTM(5, 3, bidirectional=True)

In [20]:
output, (hidden, cell) = bilstm(x)

In [21]:
output.size()

torch.Size([8, 1, 6])

In [22]:
hidden.size()

torch.Size([2, 1, 3])

In [23]:
cell.size()

torch.Size([2, 1, 3])

In [24]:
output

tensor([[[-0.2669, -0.0221,  0.2699, -0.0720,  0.1682, -0.1084]],

        [[-0.1504,  0.0893,  0.1900, -0.3417, -0.0821, -0.0734]],

        [[-0.1624,  0.1026,  0.2249, -0.2898, -0.0608, -0.0698]],

        [[-0.1411,  0.0305,  0.0147, -0.2639, -0.1124,  0.0594]],

        [[-0.2125, -0.0408,  0.1421, -0.0245,  0.0406,  0.0628]],

        [[-0.0033,  0.3355,  0.3107, -0.0323, -0.0737, -0.1409]],

        [[-0.2040, -0.0687,  0.1029, -0.1220, -0.1299, -0.0108]],

        [[-0.0140, -0.3609,  0.1000, -0.1549, -0.1223, -0.0972]]],
       grad_fn=<CatBackward>)

In [25]:
hidden

tensor([[[-0.0140, -0.3609,  0.1000]],

        [[-0.0720,  0.1682, -0.1084]]], grad_fn=<StackBackward>)
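In the output above, the first three channels of each step come from the forward direction and the last three from the backward direction. The forward final state is therefore found at the last time step, and the backward final state at the first one. A minimal sketch of that check follows; the seed and the variable names are assumptions for this example.

```python
import torch
from torch import nn

torch.manual_seed(0)  # assumption: fixed seed, only for reproducibility

H = 3
bilstm = nn.LSTM(5, H, bidirectional=True)
x = torch.randn(8, 1, 5)

output, (hidden, cell) = bilstm(x)

# Split the concatenated channels back into the two directions.
forward_out = output[..., :H]   # (L, B, H), reads the sequence left to right
backward_out = output[..., H:]  # (L, B, H), reads the sequence right to left

# hidden[0] is the forward direction's final state: it has seen all steps at t = L - 1.
forward_ok = torch.allclose(hidden[0], forward_out[-1])
# hidden[1] is the backward direction's final state: it has seen all steps at t = 0.
backward_ok = torch.allclose(hidden[1], backward_out[0])
print(forward_ok, backward_ok)
```

A consequence is that `output[-1]` alone is not a full summary of the sequence for a bidirectional model: its backward half has seen only the last token.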

## Multi-layer bidirectional LSTM

In [33]:
multilstm = nn.LSTM(5, 3, num_layers=2, bidirectional=True)

In [34]:
output, (hidden, cell) = multilstm(x)

In [35]:
output.size()

torch.Size([8, 1, 6])

In [36]:
hidden.size()

torch.Size([4, 1, 3])

In [37]:
cell.size()

torch.Size([4, 1, 3])

In [38]:
output

tensor([[[ 0.1479,  0.1397,  0.0706, -0.1380,  0.3392, -0.2939]],

        [[ 0.2466,  0.1548,  0.1389, -0.1487,  0.2985, -0.2765]],

        [[ 0.2980,  0.1424,  0.1756, -0.1356,  0.2838, -0.2714]],

        [[ 0.3256,  0.1475,  0.1853, -0.1240,  0.2768, -0.2555]],

        [[ 0.3314,  0.1064,  0.1445, -0.1045,  0.2236, -0.2474]],

        [[ 0.3407,  0.0577,  0.1794, -0.1044,  0.2029, -0.2230]],

        [[ 0.4067,  0.1002,  0.1415, -0.0825,  0.2006, -0.1969]],

        [[ 0.3843,  0.0553,  0.1478, -0.0613,  0.0966, -0.0805]]],
       grad_fn=<CatBackward>)

In [39]:
hidden

tensor([[[ 0.3624,  0.5248, -0.1750]],

        [[ 0.0545,  0.0592, -0.2942]],

        [[ 0.3843,  0.0553,  0.1478]],

        [[-0.1380,  0.3392, -0.2939]]], grad_fn=<StackBackward>)
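With two layers and two directions, `hidden` has four entries along its first dimension, while `output` contains only the top layer. A sketch of separating layers from directions follows, assuming the layout (layer 0 forward, layer 0 backward, layer 1 forward, layer 1 backward), which is consistent with the values printed above. The seed and the variable names are assumptions for this example.

```python
import torch
from torch import nn

torch.manual_seed(0)  # assumption: fixed seed, only for reproducibility

num_layers, num_directions, B, H = 2, 2, 1, 3
multilstm = nn.LSTM(5, H, num_layers=num_layers, bidirectional=True)
x = torch.randn(8, B, 5)

output, (hidden, cell) = multilstm(x)

# Unfold the first dimension into (layer, direction).
hidden_by_layer = hidden.reshape(num_layers, num_directions, B, H)
print(hidden_by_layer.shape)  # torch.Size([2, 2, 1, 3])

top_forward = hidden_by_layer[-1, 0]   # last layer, forward direction
top_backward = hidden_by_layer[-1, 1]  # last layer, backward direction

# `output` exposes only the last layer, both directions concatenated.
forward_ok = torch.allclose(top_forward, output[-1, :, :H])
backward_ok = torch.allclose(top_backward, output[0, :, H:])
print(forward_ok, backward_ok)
```

The final states of the lower layer, `hidden_by_layer[0]`, do not appear anywhere in `output`; they are available only through `hidden`.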