# Documentation tutorial

## Simple LSTM example
#### Resource: https://www.udacity.com/course/computer-vision-nanodegree--nd891

```python
import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt

%matplotlib inline

# so that random variables will be consistent and repeatable for testing
torch.manual_seed(2)
```


```python
# define an LSTM with an input dim of 4 and hidden dim of 3
# this expects to see 4 values as input and generates 3 values as output
input_dim = 4
hidden_dim = 3
lstm = nn.LSTM(input_size=input_dim, hidden_size=hidden_dim)

# make a sequence of 5 inputs, each with 4 random values
inputs_list = [torch.randn(1, input_dim) for _ in range(5)]
print('inputs: \n', inputs_list)
print('\n')

# initialize the hidden state
# (1 layer, 1 batch_size, 3 outputs)
# first tensor is the hidden state, h0
# second tensor initializes the cell memory, c0
h0 = torch.randn(1, 1, hidden_dim)
c0 = torch.randn(1, 1, hidden_dim)

print('Initial hidden state', h0)
print('Initial cell memory', c0)
print()
# step through the inputs one element at a time
for i in inputs_list:
    # i.view(1, 1, -1) reshapes each input to (seq_len=1, batch=1, input_size=4)
    # note: (h0, c0) is passed at every step, so each element starts from the same
    # initial state; pass the returned `hidden` instead to carry memory across steps
    out, hidden = lstm(i.view(1, 1, -1), (h0, c0))
    print('out: \n', out)
    print('hidden: \n', hidden)
```


```
inputs: 
 [tensor([[1.4934, 0.4987, 0.2319, 1.1746]]), tensor([[-1.3967,  0.8998,  1.0956, -0.5231]]), tensor([[-0.8462, -0.9946,  0.6311,  0.5327]]), tensor([[-0.8454,  0.9406, -2.1224,  0.0233]]), tensor([[ 0.4836,  1.2895,  0.8957, -0.2465]])]


Initial hidden state tensor([[[ 1.9422, -0.3628, -1.0494]]])
Initial cell memory tensor([[[-1.0264,  1.3494,  0.8018]]])

out: 
 tensor([[[-0.4372,  0.2583,  0.2947]]], grad_fn=<StackBackward>)
hidden: 
 (tensor([[[-0.4372,  0.2583,  0.2947]]], grad_fn=<StackBackward>), tensor([[[-0.7344,  0.6209,  0.4191]]], grad_fn=<StackBackward>))
out: 
 tensor([[[-0.2836,  0.1314,  0.4133]]], grad_fn=<StackBackward>)
hidden: 
 (tensor([[[-0.2836,  0.1314,  0.4133]]], grad_fn=<StackBackward>), tensor([[[-0.5041,  0.2672,  0.6370]]], grad_fn=<StackBackward>))
out: 
 tensor([[[-0.3404,  0.4880,  0.1949]]], grad_fn=<StackBackward>)
hidden: 
 (tensor([[[-0.3404,  0.4880,  0.1949]]], grad_fn=<StackBackward>), tensor([[[-0.5552,  0.7909,  0.3300]]], grad_fn=<S
```

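In the example above, we have a sequence of `5` input elements, each with `4` features. Each element is fed to the LSTM on its own, as a sequence of length 1, and produces one output of size `3` (because `hidden_dim` is 3), so we get `5` outputs in total. For a single-layer LSTM fed one step at a time, `out` is identical to the new hidden state, the first tensor of `hidden`.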

## Guide to LSTM in PyTorch: defining the shapes

**Resource**: https://pytorch.org/docs/stable/_modules/torch/nn/modules/rnn.html#LSTM

**First step**:<br>

Recall how the `nn.LSTM` module is set up in a simple case:<br>
- We have a sequence (the whole training dataset as a time series) in which each element has `4` features (`input_dim = 4`).
- The hidden state has `hidden_dim = 3` features, so the LSTM produces `3` values per element.
- The input has shape `(seq_len, batch, input_size)`, where:
  - `seq_len` is the number of timesteps, i.e. how many elements one sequence contains (`t`, or `T_x` in some sources);
  - `batch` is the number of sequences fed in parallel: `m = 1` if we feed the whole dataset as a single sequence, or the mini-batch size `m` if we split it into mini-batches;
  - `input_size` is the number of features of the element at time `t`.
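
To make these shapes concrete, here is a small sketch, assuming the same `input_dim = 4` and a list of five random `(1, 4)` tensors as in the first example. It reshapes one element into a sequence of length 1 and the whole list into a single sequence of length 5:

```python
import torch

input_dim = 4
inputs_list = [torch.randn(1, input_dim) for _ in range(5)]

# one element as a sequence of length 1: (seq_len=1, batch=1, input_size=4)
single_step = inputs_list[0].view(1, 1, -1)
# all five elements as one sequence: (seq_len=5, batch=1, input_size=4)
whole_sequence = torch.cat(inputs_list).view(len(inputs_list), 1, -1)

print(single_step.shape)     # torch.Size([1, 1, 4])
print(whole_sequence.shape)  # torch.Size([5, 1, 4])
```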

**Second step**:

Next, we initialize the hidden state `h0` (the short-term memory, STM, which is also the output) and the cell state `c0` (the long-term memory, LTM).<br>
We initialize both randomly with `torch.randn`, each with shape **`(1, 1, hidden_dim)`**.<br>

Wrapping the tensors in `Variable` is no longer needed: since PyTorch 0.4, `Variable` has been merged into `Tensor`, so plain tensors can be passed to the LSTM directly.

```
h0 = torch.randn(1, 1, hidden_dim)
c0 = torch.randn(1, 1, hidden_dim)
```

> **What's the shape of `h0` and `c0`?** <br> <br>
 It's `(num_layers * num_directions, batch, hidden_size)` for both `h0` and `c0`.<br>
 In our case, `num_layers` and `num_directions` are both 1 (the LSTM is unidirectional). We feed the inputs one at a time, so `batch` = 1. The size of the hidden state is `hidden_size` = `hidden_dim` = 3.
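
A minimal sketch of how `num_layers` and `num_directions` enter the state shape. The 2-layer bidirectional configuration and all sizes are hypothetical, picked only so that both factors equal 2. Passing a state with the wrong first dimension makes PyTorch raise a `RuntimeError`:

```python
import torch
import torch.nn as nn

# hypothetical configuration: num_layers = 2 and num_directions = 2
num_layers, num_directions = 2, 2
seq_len, batch, input_dim, hidden_dim = 5, 3, 4, 3

lstm = nn.LSTM(input_dim, hidden_dim, num_layers=num_layers, bidirectional=True)

# h0 and c0 share the shape (num_layers * num_directions, batch, hidden_size)
h0 = torch.zeros(num_layers * num_directions, batch, hidden_dim)
c0 = torch.zeros(num_layers * num_directions, batch, hidden_dim)

x = torch.randn(seq_len, batch, input_dim)
out, (h_n, c_n) = lstm(x, (h0, c0))
print(h0.shape)   # torch.Size([4, 3, 3])
print(out.shape)  # torch.Size([5, 3, 6]): both directions are concatenated

# a state shaped for a single-layer, unidirectional LSTM is rejected
rejected = False
try:
    lstm(x, (torch.zeros(1, batch, hidden_dim), torch.zeros(1, batch, hidden_dim)))
except RuntimeError as err:
    rejected = True
    print('shape mismatch:', err)
```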

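**Third step**:<br>

We feed all elements of the training sequence to `nn.LSTM` one by one, passing the state returned by each step as the initial state of the next one so that the LSTM keeps its memory across the sequence. <br>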
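
The loop in the first example passes `(h0, c0)` at every step, so no memory carries over. The sketch below, assuming the same `input_dim = 4` and `hidden_dim = 3` with an arbitrary seed, feeds the previous step's `hidden` back in and checks that the step-by-step result matches feeding the whole sequence in a single call:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for repeatability

input_dim, hidden_dim, seq_len = 4, 3, 5
lstm = nn.LSTM(input_size=input_dim, hidden_size=hidden_dim)

# one sequence of 5 elements, shaped (seq_len, batch=1, input_size)
seq = torch.randn(seq_len, 1, input_dim)
h0 = torch.randn(1, 1, hidden_dim)
c0 = torch.randn(1, 1, hidden_dim)

# 1) step through the sequence, carrying the state from step to step
hidden = (h0, c0)
step_outputs = []
for t in range(seq_len):
    out_t, hidden = lstm(seq[t].view(1, 1, -1), hidden)
    step_outputs.append(out_t)
stepwise_out = torch.cat(step_outputs, dim=0)  # (seq_len, 1, hidden_dim)

# 2) feed the whole sequence in one call, starting from the same state
full_out, (h_n, c_n) = lstm(seq, (h0, c0))

print(stepwise_out.shape)                               # torch.Size([5, 1, 3])
print(torch.allclose(stepwise_out, full_out, atol=1e-5))  # True
```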

## Arguments, Inputs, Outputs
#### Additional source to read: https://towardsdatascience.com/pytorch-basics-how-to-train-your-neural-net-intro-to-rnn-cb6ebc594677

Notes on the **arguments**, **inputs**, and **outputs** of `nn.LSTM`.<br>
**Arguments**:

> - `input_size`: the number of features of the input at each timestep (in our case `input_dim = 4`). Some other sources (e.g. Andrew Ng's Deep Learning course) call it `n_x`.
> - `hidden_size`: the number of features in the hidden state (the output, or STM). In our case it is `hidden_dim = 3`, and the output has this size.
> - `num_layers`: the number of recurrent layers; in our case it is 1, the default. E.g., setting `num_layers=2` stacks two LSTMs together to form a *stacked LSTM*, with the second LSTM taking in the outputs of the first LSTM and computing the final results. Default: 1
> - `bias`: if `False`, the layer does not use the bias weights `b_ih` and `b_hh`. Default: `True`
> - `batch_first`: if `True`, the input and output tensors are provided as `(batch, seq, feature)`. Default: `False`
> - `dropout`: if non-zero, applies dropout with this probability to the outputs of each LSTM layer except the last one. Default: 0
> - `bidirectional`: if `True`, becomes a bidirectional LSTM. Default: `False`
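
A short sketch combining several of these arguments; the sizes are hypothetical. `dropout` only acts between stacked layers, so it is paired here with `num_layers=2` (with a single layer, PyTorch warns that a non-zero dropout has no effect). Note also that `batch_first` changes the layout of the input and `output`, but not of the hidden and cell states:

```python
import torch
import torch.nn as nn

# hypothetical sizes; dropout is applied between the two stacked layers
lstm = nn.LSTM(input_size=4, hidden_size=3, num_layers=2, bias=True,
               batch_first=True, dropout=0.2, bidirectional=False)

x = torch.randn(8, 5, 4)     # (batch, seq, feature) because batch_first=True
out, (h_n, c_n) = lstm(x)    # the state defaults to zeros when omitted

print(out.shape)  # torch.Size([8, 5, 3]): (batch, seq, num_directions * hidden_size)
print(h_n.shape)  # torch.Size([2, 8, 3]): still (num_layers * num_directions, batch, hidden_size)
```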

**Inputs**: `input`, `(h_0, c_0)`

> - `(h_0, c_0)`: tuple holding the initial hidden state (STM) and the initial cell state (LTM).
> - `input` of shape `(seq_len, batch, input_size)`: tensor containing the features of the input sequence, where `seq_len` = `t` is the number of timesteps (`T_x` in Andrew Ng's course).<br><br>
> In our case this is `i.view(1, 1, -1)`, where `i` is one element of the sequence.<br>
> Note that if `batch_first=True`, the input has shape `(batch, seq, feature)` instead. `batch` is the batch size, called `m` in Andrew Ng's course and sometimes `B`.<br><br>
> For the **input**, `feature` is `input_size`; for the **output**, `feature` is `num_directions * hidden_size`, where `hidden_size` = `hidden_dim`.
> - `h_0` of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the initial hidden state for each element in the batch. In Andrew Ng's course `hidden_size` is `n_a`. `num_directions` is 2 for a bidirectional LSTM and 1 otherwise. `batch_first` does not change this layout.
> - `c_0` of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the initial cell state for each element in the batch.
>
> *Remark: if `(h_0, c_0)` is not provided, both `h_0` and `c_0` default to zeros.*
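
The remark about the default state can be checked directly. This sketch, with arbitrary sizes and seed, runs the same input once without `(h_0, c_0)` and once with explicit zero tensors:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed
lstm = nn.LSTM(input_size=4, hidden_size=3)
x = torch.randn(5, 1, 4)  # (seq_len, batch, input_size)

out_default, _ = lstm(x)  # no (h_0, c_0) given
zeros = (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3))
out_zeros, _ = lstm(x, zeros)

print(torch.allclose(out_default, out_zeros))  # True
```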

**Output**: `output`, `(h_n, c_n)`

> - `(h_n, c_n)`: tuple of the final hidden state (STM) and the final cell state (LTM), taken at the last timestep `t = seq_len` for every layer.
> - `output` of shape `(seq_len, batch, num_directions * hidden_size)`: tensor with the output of the **last layer** of the network at **all timesteps**. `seq_len` is the number of timesteps (`T_x` in some sources).
> - `h_n` of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the hidden state at the **last timestep** for **all layers**. `num_directions` is 2 for a bidirectional LSTM and 1 otherwise.
> - `c_n` of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the cell state for `t = seq_len`.
>
> *Remark: if there are multiple layers or directions, they can be separated with `h_n.view(num_layers, num_directions, batch, hidden_size)`, and likewise with `c_n.view(num_layers, num_directions, batch, hidden_size)`.*
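
A sketch of the relationship between `output` and `h_n`, assuming a hypothetical 2-layer bidirectional LSTM. After splitting `h_n` with `view`, the final state of the last layer's forward direction matches `output` at the last timestep, while the backward direction, which reads the sequence in reverse, ends at the first timestep:

```python
import torch
import torch.nn as nn

num_layers, num_directions, hidden_size = 2, 2, 3
seq_len, batch, input_size = 5, 1, 4  # hypothetical sizes

lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers, bidirectional=True)
output, (h_n, c_n) = lstm(torch.randn(seq_len, batch, input_size))

# split the first dimension of h_n into (layer, direction)
h = h_n.view(num_layers, num_directions, batch, hidden_size)

# forward half of output at the last timestep vs. last layer, forward direction
forward_last = output[-1, :, :hidden_size]
# backward half of output at the first timestep vs. last layer, backward direction
backward_first = output[0, :, hidden_size:]

print(torch.allclose(forward_last, h[-1, 0]))    # True
print(torch.allclose(backward_first, h[-1, 1]))  # True
```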

## Use LSTM with batching

**Resources**:  <br>
1. https://towardsdatascience.com/pytorch-basics-how-to-train-your-neural-net-intro-to-rnn-cb6ebc594677
2. https://towardsdatascience.com/all-you-need-to-know-about-rnns-e514f0b00c7c

When batching with a given `batch_size` (or `m`), we split the whole dataset into sequences of a fixed length `seq_len` and feed `m` of them to the LSTM at once. `seq_len` corresponds to the number of timesteps `t`. <br>
An important point is that the LSTM produces an output for every one of the `seq_len` elements of every sequence in the batch.

Consider the following example: we want to feed 50 elements into the LSTM by breaking them into 5 sequences of 10 elements each, and produce the output.
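
One way to prepare these 50 elements, sketched under the assumption that `input_tensors` is a list of five 1-D tensors of length 10 (as created below): stack them into a `(5, 10)` tensor and add a feature dimension. Calling `view(10, 5, 1)` directly on the `(5, 10)` stack would instead mix elements from different sequences, which is why `transpose` is used for the sequence-first layout:

```python
import torch

# five sequences of 10 scalar elements, as in input_tensors
input_tensors = [torch.randn(10) for _ in range(5)]

stacked = torch.stack(input_tensors)                 # (5, 10): one row per sequence
batch_first_input = stacked.unsqueeze(-1)            # (batch=5, seq_len=10, input_size=1)
seq_first_input = batch_first_input.transpose(0, 1)  # (seq_len=10, batch=5, input_size=1)

print(batch_first_input.shape)  # torch.Size([5, 10, 1])
print(seq_first_input.shape)    # torch.Size([10, 5, 1])
```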

```python
input_tensors = [torch.randn(10) for _ in range(5)]
```

```python
print(input_tensors[0])
```

```
tensor([-0.2380, -1.7623,  0.4873,  1.4592,  1.4165,  1.0032, -0.5644,  0.3819,
         1.7595,  1.2146])
```


### Approach 1: ``m = 1``

We start with `batch_size = 1` and feed the sequences one by one, generating an output at each timestep.

```python
class LSTM(nn.Module):
    def __init__(self, input_size, hidden_dim, batch_size):
        super(LSTM, self).__init__()
        # define size of input, hidden units and batch size
        self.hidden_dim = hidden_dim
        self.batch_size = batch_size
        # define LSTM model (default layout: (seq_len, batch, input_size))
        self.lstm = nn.LSTM(input_size, hidden_dim)
        # initialize hidden state
        self.hidden = self.init_hidden()

    def init_hidden(self):
        # (num_layers * num_directions, batch, hidden_size), all zeros
        h0 = torch.zeros(1, self.batch_size, self.hidden_dim)
        c0 = torch.zeros(1, self.batch_size, self.hidden_dim)
        return (h0, c0)

    def forward(self, sequence):
        # input should be of size (sequence_length, batch_size, input_size)
        # the returned state is stored, so the next call continues from it
        output, self.hidden = self.lstm(sequence.view(len(sequence), self.batch_size, -1),
                                        self.hidden)
        return output, self.hidden
```

```python
input_size = 1  # input size is 1: we feed one item of size 1 per timestep
hidden_dim = 5
batch_size = 1
```

```python
# instantiate the model
model = LSTM(input_size, hidden_dim, batch_size)
```

Next, prepare the sequence of data.

```python
print('Number of sequences:', len(input_tensors))
print('The length of one sequence:', len(input_tensors[0]))
```

```
Number of sequences: 5
The length of one sequence: 10
```


```python
for sequence in input_tensors:
    # after each step, hidden contains the hidden state
    out, hidden = model(sequence)
    print('out: \n', out)
    print('hidden: \n', hidden)
```

```
out: 
 tensor([[[-1.6173e-02,  2.1318e-02,  8.9537e-03,  1.3004e-01, -1.3322e-01]],

        [[-2.3461e-01, -3.0454e-04,  4.3860e-02,  1.2459e-01, -1.0315e-01]],

        [[-4.5394e-03,  4.6846e-02,  4.3741e-02,  1.6021e-01, -1.6202e-01]],

        [[ 9.2667e-02,  8.7011e-02, -2.3487e-02,  1.7278e-01, -1.9596e-01]],

        [[ 1.3717e-01,  1.0959e-01, -5.3626e-02,  2.0487e-01, -2.3011e-01]],

        [[ 1.6384e-01,  1.1752e-01, -4.0300e-02,  2.4500e-01, -2.4689e-01]],

        [[ 9.9148e-02,  1.0298e-01,  1.6554e-02,  3.1784e-01, -2.1384e-01]],

        [[ 1.0508e-01,  9.0466e-02,  2.5842e-02,  2.7210e-01, -1.7975e-01]],

        [[ 1.2585e-01,  1.0373e-01, -4.9865e-02,  2.0447e-01, -1.8706e-01]],

        [[ 1.6040e-01,  1.1981e-01, -5.2092e-02,  2.4728e-01, -2.3765e-01]]],
       grad_fn=<StackBackward>)
hidden: 
 (tensor([[[ 0.1604,  0.1198, -0.0521,  0.2473, -0.2376]]],
       grad_fn=<StackBackward>), tensor([[[ 0.5049,  0.4117, -0.1259,  0.9625, -0.6305]]],
       grad_fn=<Stack
```

```python
print('The shape of output from a single sequence for m = 1:', tuple(out.shape))
```

```
The shape of output from a single sequence for m = 1: (10, 1, 5)
```


```python
out.view(10, -1)
```

```
tensor([[ 0.0234,  0.0610,  0.0648,  0.2846, -0.1588],
        [ 0.0550,  0.0663,  0.0585,  0.2735, -0.1675],
        [ 0.1093,  0.0833,  0.0232,  0.2462, -0.1857],
        [ 0.1153,  0.0870,  0.0269,  0.2906, -0.2033],
        [ 0.0230,  0.0703,  0.0515,  0.3141, -0.1640],
        [ 0.0732,  0.0746,  0.0473,  0.2658, -0.1650],
        [ 0.0886,  0.0780,  0.0406,  0.2816, -0.1826],
        [ 0.0481,  0.0702,  0.0525,  0.3028, -0.1704],
        [-0.0506,  0.0524,  0.0692,  0.2981, -0.1373],
        [-0.1329,  0.0340,  0.0840,  0.2673, -0.1152]], grad_fn=<ViewBackward>)
```

```python
out.view(10, 1, -1)[-1].shape
```

```
torch.Size([1, 5])
```

### Approach 2: mini-batching  ``m = 5``


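Let's switch the model to `batch_first=True` and check its output. Note that `batch_size` is still `1` in the cell below, so each call processes one sequence of length 10 laid out as `(batch, seq_len, input_size)`; a true `m = 5` batch would stack all five sequences into one tensor of shape `(5, 10, 1)`.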

```python
input_size = 1  # input size is 1: we feed one item of size 1 per timestep
hidden_dim = 5
batch_size = 1  # still 1: each call processes a single sequence of length 10
seq_len = 10
```

```python
class LSTM_batch(nn.Module):
    def __init__(self, input_size, hidden_dim, batch_size, seq_len, num_layers=1):
        super(LSTM_batch, self).__init__()
        # define size of input, hidden units, batch size and number of layers
        self.hidden_dim = hidden_dim
        self.batch_size = batch_size
        self.num_layers = num_layers
        self.seq_len = seq_len
        # define LSTM model; batch_first=True expects (batch, seq_len, input_size)
        self.lstm = nn.LSTM(input_size, hidden_dim, num_layers=num_layers, batch_first=True)
        # initialize hidden state
        self.hidden = self.init_hidden()

    def init_hidden(self):
        # the state layout is (num_layers, batch, hidden_size) even with batch_first=True
        h0 = torch.zeros(self.num_layers, self.batch_size, self.hidden_dim)
        c0 = torch.zeros(self.num_layers, self.batch_size, self.hidden_dim)
        return (h0, c0)

    def forward(self, sequence):
        # split the flat sequence into batch_size rows of equal length
        seq_len = len(sequence) // self.batch_size
        # input should be of size (batch_size, sequence_length, input_size)
        output, self.hidden = self.lstm(sequence.view(self.batch_size, seq_len, -1),
                                        self.hidden)
        return output, self.hidden
```

```python
# instantiate the model
model_batch = LSTM_batch(input_size, hidden_dim, batch_size, seq_len)
```

```python
for sequence in input_tensors:
    # after each step, hidden contains the hidden state
    out, hidden = model_batch(sequence)
    print('out: \n', out)
    print('hidden: \n', hidden)
```

```
out: 
 tensor([[[-0.0074, -0.0056,  0.0566,  0.1231,  0.0049],
         [ 0.0064, -0.0015,  0.0713,  0.1677, -0.0076],
         [-0.0181,  0.0030,  0.1053,  0.2300, -0.0222],
         [-0.0247,  0.0065,  0.1203,  0.2518, -0.0426],
         [-0.0987,  0.0076,  0.1979,  0.4096, -0.0521],
         [-0.1307,  0.0129,  0.2494,  0.4450, -0.0879],
         [-0.0488,  0.0133,  0.1891,  0.2970, -0.1414],
         [-0.0234,  0.0080,  0.1256,  0.2759, -0.1338],
         [ 0.0525,  0.0050,  0.0445,  0.1847, -0.1274],
         [-0.0316,  0.0010,  0.1157,  0.2979, -0.0894]]],
       grad_fn=<TransposeBackward0>)
hidden: 
 (tensor([[[-0.0316,  0.0010,  0.1157,  0.2979, -0.0894]]],
       grad_fn=<StackBackward>), tensor([[[-0.0809,  0.0014,  0.2184,  0.7585, -0.2072]]],
       grad_fn=<StackBackward>))
out: 
 tensor([[[ 1.1231e-01,  3.3039e-03, -4.5702e-03,  1.1615e-01, -1.0374e-01],
         [ 1.4703e-03,  1.7051e-03,  8.5396e-02,  2.4820e-01, -5.9641e-02],
         [-4.2029e-02,  4.8467e-03,  1.373
```

```python
print('The shape of output from a single sequence with batch_first=True:', tuple(out.shape))
```

```
The shape of output from a single sequence with batch_first=True: (1, 10, 5)
```
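
For comparison, here is a sketch of a real mini-batch with `m = 5`, assuming the same sizes as above (`input_size = 1`, `hidden_dim = 5`, sequences of length 10) but a fresh model and random data. All five sequences go through the LSTM in one call, and each row of the batch gives the same result as running that sequence on its own from a zero initial state:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed
input_size, hidden_dim, batch_size, seq_len = 1, 5, 5, 10

lstm = nn.LSTM(input_size, hidden_dim, batch_first=True)
input_tensors = [torch.randn(seq_len) for _ in range(batch_size)]

# all five sequences in one call: (batch, seq_len, input_size)
batch = torch.stack(input_tensors).unsqueeze(-1)
out, (h_n, c_n) = lstm(batch)  # zero initial state
print(out.shape)  # torch.Size([5, 10, 5])
print(h_n.shape)  # torch.Size([1, 5, 5])

# rows of a batch are independent: sequence 3 on its own gives the same output
out_single, _ = lstm(input_tensors[3].view(1, seq_len, input_size))
print(torch.allclose(out[3], out_single[0], atol=1e-5))  # True
```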


### Additional notes about input/output shape


**input**<br>

The input can also be a **packed variable length sequence**.<br>
See functions: `torch.nn.utils.rnn.pack_padded_sequence` or `torch.nn.utils.rnn.pack_sequence` for details.<br>

*References:*
- https://github.com/HarshTrivedi/packing-unpacking-pytorch-minimal-tutorial
- https://pytorch.org/docs/master/generated/torch.nn.utils.rnn.pack_padded_sequence.html
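
A minimal sketch of packing, assuming three hypothetical scalar sequences of lengths 10, 7 and 3. They are zero-padded with `pad_sequence`, packed so that the LSTM skips the padding, and unpacked again with `pad_packed_sequence`; `h_n` then holds each sequence's state at its real last element:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

lstm = nn.LSTM(input_size=1, hidden_size=5, batch_first=True)

# hypothetical variable-length sequences of scalars, shaped (length, input_size)
seqs = [torch.randn(10, 1), torch.randn(7, 1), torch.randn(3, 1)]
lengths = torch.tensor([len(s) for s in seqs])

padded = pad_sequence(seqs, batch_first=True)  # (3, 10, 1), zero-padded
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)

packed_out, (h_n, c_n) = lstm(packed)
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)  # (3, 10, 5)

# the second sequence has length 7, so its final state is the output at index 6
print(torch.allclose(h_n[0, 1], out[1, 6]))  # True
```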