# Working with Sequential Models

## Articles used for this Tutorial

- [Official PyTorch Tutorial](https://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html)
- [Official PyTorch Documentation](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html)
- [LSTMs In PyTorch](https://towardsdatascience.com/lstms-in-pytorch-528b0440244)
- [Time Series Prediction using LSTM with PyTorch in Python](https://stackabuse.com/time-series-prediction-using-lstm-with-pytorch-in-python/)
- [https://blog.floydhub.com/long-short-term-memory-from-zero-to-hero-with-pytorch/]()
- [LSTM Reference Card](https://www.gregcondit.com/articles/lstm-ref-card)

## Importing the Libraries

In [112]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

## Experimenting with an LSTM Cell

In [113]:
lstm = nn.LSTM(3,3) # Input dim is 3, output dim is 3
lstm.eval()

LSTM(3, 3)

In [114]:
# Making a Sequence on length 5
inputs = [torch.randn(1, 3) for _ in range(5)]

In [115]:
# Initializing the hidden states
hidden = (torch.zeros(1, 1, 3),
          torch.zeros(1, 1, 3))

Before getting to the example, note a few things. Pytorch’s LSTM expects all of its inputs to be 3D tensors. The semantics of the axes of these tensors is important. The first axis is the sequence itself, the second indexes instances in the mini-batch, and the third indexes elements of the input. We haven’t discussed mini-batching, so let’s just ignore that and assume we will always have just 1 dimension on the second axis. If we want to run the sequence model over the sentence “The cow jumped”, our input should look like

In [116]:
print(inputs[0].size())
print(inputs[0].view(1, 1, -1).size())

torch.Size([1, 3])
torch.Size([1, 1, 3])


In [117]:
i = 0
for inp in inputs:
    out, hidden = lstm(inp.view(1, 1, -1), hidden)
    print('Iteration : {}'.format(i+1))
    print('Input Shape : {}'.format(inp.size()))
    print('H Shape : {}'.format(hidden[0].size()))
    print('C Shape: {}'.format(hidden[1].size()))
    print('Out Shape : {}'.format(out.size()))
    print('Input : {}'.format(inp))
    print('H : {}'.format(hidden[0]))
    print('C : {}'.format(hidden[1]))
    print('Out : {}'.format(out))
    print('---------------------------------------')
    i = i + 1 

Iteration : 1
Input Shape : torch.Size([1, 3])
H Shape : torch.Size([1, 1, 3])
C Shape: torch.Size([1, 1, 3])
Out Shape : torch.Size([1, 1, 3])
Input : tensor([[ 1.0202,  2.0414, -0.8369]])
H : tensor([[[-0.0651,  0.1030,  0.0016]]], grad_fn=<StackBackward>)
C : tensor([[[-0.1387,  0.2331,  0.0077]]], grad_fn=<StackBackward>)
Out : tensor([[[-0.0651,  0.1030,  0.0016]]], grad_fn=<StackBackward>)
---------------------------------------
Iteration : 2
Input Shape : torch.Size([1, 3])
H Shape : torch.Size([1, 1, 3])
C Shape: torch.Size([1, 1, 3])
Out Shape : torch.Size([1, 1, 3])
Input : tensor([[-1.8159,  0.2232, -0.2229]])
H : tensor([[[-0.1223,  0.0708,  0.0736]]], grad_fn=<StackBackward>)
C : tensor([[[-0.4496,  0.2071,  0.1689]]], grad_fn=<StackBackward>)
Out : tensor([[[-0.1223,  0.0708,  0.0736]]], grad_fn=<StackBackward>)
---------------------------------------
Iteration : 3
Input Shape : torch.Size([1, 3])
H Shape : torch.Size([1, 1, 3])
C Shape: torch.Size([1, 1, 3])
Out Shape : 

In [118]:
# alternatively, we can do the entire sequence all at once.
# the first value returned by LSTM is all of the hidden states throughout
# the sequence. the second is just the most recent hidden state
# (compare the last slice of "out" with "hidden" below, they are the same)
# The reason for this is that:
# "out" will give you access to all hidden states in the sequence
# "hidden" will allow you to continue the sequence and backpropagate,
# by passing it as an argument  to the lstm at a later time
# Add the extra 2nd dimension
inputs = torch.cat(inputs).view(len(inputs), 1, -1)
print(inputs.size())

torch.Size([5, 1, 3])


In [119]:
print('Inputs')
print(inputs)
print('-----------')
hidden = (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3))  # clean out hidden state
out, hidden = lstm(inputs, hidden)
print('Output:')
print(out.size())
print(out)
print('-----------')
print('Hidden')
print(hidden)

Inputs
tensor([[[ 1.0202,  2.0414, -0.8369]],

        [[-1.8159,  0.2232, -0.2229]],

        [[ 1.8058, -0.7358,  0.3719]],

        [[-0.7690,  0.3371, -1.1757]],

        [[ 0.4484,  1.6172,  1.2343]]])
-----------
Output:
torch.Size([5, 1, 3])
tensor([[[-0.0651,  0.1030,  0.0016]],

        [[-0.1223,  0.0708,  0.0736]],

        [[-0.0501,  0.0941,  0.0378]],

        [[-0.0660,  0.2091,  0.1333]],

        [[-0.2193,  0.1993,  0.0046]]], grad_fn=<StackBackward>)
-----------
Hidden
(tensor([[[-0.2193,  0.1993,  0.0046]]], grad_fn=<StackBackward>), tensor([[[-0.4893,  0.8654,  0.0258]]], grad_fn=<StackBackward>))


## Creating a Full LSTM CELL

In [131]:
class LSTM(nn.Module):
    
    def __init__(self, input_size = 1, hidden_layer_size = 100, output_size = 1):
        super(LSTM, self).__init__()
        self.hidden_layer_size = hidden_layer_size
        self.lstm = nn.LSTM(input_size, hidden_layer_size)
        self.linear = nn.Linear(hidden_layer_size, output_size)
        self.hidden_cell = (torch.zeros(1, 1, self.hidden_layer_size ),
                            torch.zeros(1, 1, self.hidden_layer_size ))
        
    def forward(self, input_seq):
        lstm_out, self.hidden_cell = self.lstm(input_seq.view(len(input_seq), 1, -1), self.hidden_cell)
        print(lstm_out.size())
        print(len(input_seq))
        print(lstm_out.view(len(input_seq), -1).size())
        predictions = self.linear(lstm_out.view(len(input_seq), -1))
        return predictions

In [132]:
model = LSTM()
model.eval()

LSTM(
  (lstm): LSTM(1, 100)
  (linear): Linear(in_features=100, out_features=1, bias=True)
)

In [133]:
model_input = torch.randn((10, 1, 1))

In [134]:
output = model(model_input)

torch.Size([10, 1, 100])
10
torch.Size([10, 100])


In [135]:
print(output.size())
print(output)

torch.Size([10, 1])
tensor([[-0.0140],
        [-0.0017],
        [-0.0080],
        [-0.0166],
        [-0.0062],
        [-0.0129],
        [-0.0073],
        [-0.0129],
        [-0.0080],
        [-0.0017]], grad_fn=<AddmmBackward>)


## Checking the Gradient Computations

In [136]:
def loss(y1, y2):
    return torch.sum((y1-y2)*(y1-y2).t())

In [137]:
y1 = torch.rand(10, 1)

In [138]:
loss1 = loss(y1, output)
print(loss1.size())
print(loss1)

torch.Size([])
tensor(44.6129, grad_fn=<SumBackward0>)


In [139]:
loss1.backward()

In [140]:
i = 0
for w in model.parameters():
    print(i+1)
    print(w.size())
    i += 1
    print('-------------------------------')

1
torch.Size([400, 1])
-------------------------------
2
torch.Size([400, 100])
-------------------------------
3
torch.Size([400])
-------------------------------
4
torch.Size([400])
-------------------------------
5
torch.Size([1, 100])
-------------------------------
6
torch.Size([1])
-------------------------------


In [141]:
model.parameters()

<generator object Module.parameters at 0x0000029856324CF0>

In [142]:
print(model.lstm.weight_ih_l0.size())
print(model.lstm.weight_hh_l0.size())
print(model.lstm.bias_ih_l0.size())
print(model.lstm.bias_hh_l0.size())

torch.Size([400, 1])
torch.Size([400, 100])
torch.Size([400])
torch.Size([400])
