## Parameters of `nn.LSTM` in PyTorch

- `input_size`: The number of expected features in the input `x`.
- `hidden_size`: Equivalent to `units` in TensorFlow, this is the number of features in the hidden state `h`.
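- `num_layers`: The number of LSTM layers to stack (default 1). TensorFlow's `LSTM` layer has no such argument; there you stack separate `LSTM` layers instead.
- `batch_first`: When set to `True`, the input and output tensors are expected to be in the form of (batch, seq, feature). The default is `False`, i.e. (seq, batch, feature). This flag does not affect the shape of the hidden and cell states.
- `dropout`: The dropout probability for regularization, applied to the outputs of each LSTM layer except the last one. It only has an effect when `num_layers > 1`.
- `bidirectional`: If set to `True`, creates a bidirectional LSTM, whose output has `2 * hidden_size` features.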
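
As a minimal sketch of how these arguments interact, the cell below builds a two-layer bidirectional LSTM and prints the shapes it produces. The sizes (4, 7, 3, 5) and the dropout value are arbitrary illustration values, not recommendations.

```python
import torch
import torch.nn as nn

# Arbitrary illustration sizes (assumptions, not prescribed values).
input_size, hidden_size, num_layers = 3, 5, 2

lstm = nn.LSTM(
    input_size=input_size,
    hidden_size=hidden_size,
    num_layers=num_layers,
    batch_first=True,    # tensors are laid out as (batch, seq, feature)
    dropout=0.1,         # applied between the stacked layers, not after the last one
    bidirectional=True,  # doubles the feature dimension of the output
)

x = torch.rand(4, 7, input_size)  # (batch=4, seq=7, feature=3)
out, (h_n, c_n) = lstm(x)

print(out.shape)  # (batch, seq, 2 * hidden_size)
print(h_n.shape)  # (2 * num_layers, batch, hidden_size), never batch-first
```

Note that `h_n` and `c_n` keep the layer dimension first even when `batch_first=True`.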

Keep in mind that PyTorch's `nn.LSTM` expects a floating-point input tensor whose dtype matches the layer's weights, which are `torch.float32` by default. As in TensorFlow, the default floating-point type is 32-bit, but PyTorch does not convert the input for you: an integer tensor, or a `float64` tensor created from a NumPy array, raises an error. In that case, cast your data to `torch.float32` before passing it to the LSTM layer.
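
The cast described above can be sketched as follows. The integer tensor is made-up example data; `.to(torch.float32)` and `.float()` are interchangeable here.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=3, hidden_size=5, batch_first=True)

# Made-up integer data; randint returns torch.int64 by default.
x_int = torch.randint(0, 10, size=(2, 4, 3))

# Cast to float32 so the input dtype matches the LSTM weights.
x = x_int.to(torch.float32)  # equivalent to x_int.float()

out, _ = lstm(x)
print(x_int.dtype, '->', x.dtype, '| output dtype:', out.dtype)
```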

## Inputs to `forward()` of `nn.LSTM` in PyTorch

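- `input`: shape = (batch, seq_len, input_size) when `batch_first=True`, otherwise (seq_len, batch, input_size).
- `h_0`: The initial hidden state for every layer and direction. If not provided, it is initialized to zeros. Shape = (D * num_layers, batch, hidden_size), where D = 2 if `bidirectional=True` and D = 1 otherwise. For unbatched input the batch dimension is dropped: (D * num_layers, hidden_size).
- `c_0`: The initial cell state for every layer and direction. If not provided, it is initialized to zeros. Shape = (D * num_layers, batch, hidden_size), with the same D; for unbatched input, (D * num_layers, hidden_size).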
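
To illustrate the state shapes listed above, the sketch below passes explicit zero states to a bidirectional, two-layer LSTM and checks that the result matches a call that omits them. All sizes are arbitrary example values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Arbitrary example sizes.
batch, seq_len, input_size, hidden_size, num_layers = 4, 6, 3, 5, 2
D = 2  # two directions because bidirectional=True

lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers,
               batch_first=True, bidirectional=True)
x = torch.rand(batch, seq_len, input_size)

# States are always (D * num_layers, batch, hidden_size), even with batch_first=True.
h_0 = torch.zeros(D * num_layers, batch, hidden_size)
c_0 = torch.zeros(D * num_layers, batch, hidden_size)

out_default, _ = lstm(x)               # states default to zeros
out_explicit, _ = lstm(x, (h_0, c_0))  # same zeros, passed explicitly

same = torch.allclose(out_default, out_explicit)
print(h_0.shape, same)
```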



In [2]:
import torch 

import torch.nn as nn

In [3]:
device='cuda' if torch.cuda.is_available() else 'cpu'

In [4]:
device

'cuda'

In [6]:
## creating input
batch_size=10
seq_len=17 ## number of tokens
num_features=3 ## each token is embedded into three features
output_feature_size=5 ## hidden size of the LSTM

sample_input=torch.rand(size=(batch_size,seq_len,num_features),dtype=torch.float32)
print('sample_input.dtype :',sample_input.dtype)
print('sample_input.shape :', sample_input.shape)

sample_input.dtype : torch.float32
sample_input.shape : torch.Size([10, 17, 3])


In [7]:
lstm_layer=nn.LSTM(num_features,5,batch_first=True,num_layers=2) ## parameters: input_size, hidden_size, batch_first, num_layers

In [12]:
## initial states: shape = (num_layers, batch_size, hidden_size)
h_i=torch.zeros(size=(2,batch_size,5))
c_i=torch.zeros(size=(2,batch_size,5))

In [13]:
output,(h_o,c_o)=lstm_layer(sample_input,(h_i,c_i))

In [16]:
output.shape

torch.Size([10, 17, 5])

In [29]:
output.shape ## `output` was already unpacked from the returned tuple, so it is not indexed with [0]

torch.Size([10, 17, 5])

In [32]:
print('output.shape :',output.shape) ## (batch, seq_len, hidden_size)
## Unlike TensorFlow, where `return_sequences` defaults to False, PyTorch's nn.LSTM always returns the output for every time step

output.shape : torch.Size([10, 17, 5])
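
Since the full sequence is always returned, the counterpart of TensorFlow's `return_sequences=False` has to be taken by hand. A minimal sketch, assuming a unidirectional LSTM like the one above (the layer and input are re-created here so the cell runs on its own):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=3, hidden_size=5, batch_first=True)
x = torch.rand(10, 17, 3)  # (batch, seq_len, input_size)

output, (h_n, c_n) = lstm(x)

# Option 1: slice the last time step out of the full output sequence.
last_step = output[:, -1, :]  # (batch, hidden_size)

# Option 2: use the final hidden state of the top layer.
last_hidden = h_n[-1]         # (batch, hidden_size)

print(last_step.shape)
print(torch.allclose(last_step, last_hidden))
```

For a bidirectional LSTM the two options differ, because the backward direction finishes at the first time step rather than the last.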


In [40]:
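### calling two LSTM layers one after the other
lstm_layer_1=nn.LSTM(num_features,5,batch_first=True)
lstm_layer_2=nn.LSTM(5,5,batch_first=True) ## input_size must equal the hidden_size (5) of lstm_layer_1
output1=lstm_layer_1(sample_input) ## returns a tuple: (output, (h_n, c_n))
output2=lstm_layer_2(output1[0])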

print(output2[0].shape)

torch.Size([10, 17, 5])


In [41]:
## if both LSTM layers use the same hidden_size, the code above can be shortened with num_layers
lstm_layer_stacked=nn.LSTM(num_features,5,num_layers=2,batch_first=True)

In [42]:
output_from_stacked=lstm_layer_stacked(sample_input)

In [43]:
print(output_from_stacked[0].shape)

torch.Size([10, 17, 5])
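
As a rough check that the stacked form computes the same thing as two chained layers, the sketch below copies the weights of a two-layer `nn.LSTM` into two single-layer ones and compares the outputs. It relies on PyTorch's parameter naming (`weight_ih_l0`, `weight_ih_l1`, and so on) and assumes no dropout between layers; the variable names are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

stacked = nn.LSTM(3, 5, num_layers=2, batch_first=True)
layer_1 = nn.LSTM(3, 5, batch_first=True)
layer_2 = nn.LSTM(5, 5, batch_first=True)  # input size = hidden size of layer_1

# Parameters ending in _l0 belong to the first stacked layer, _l1 to the second.
with torch.no_grad():
    for name in ('weight_ih', 'weight_hh', 'bias_ih', 'bias_hh'):
        getattr(layer_1, f'{name}_l0').copy_(getattr(stacked, f'{name}_l0'))
        getattr(layer_2, f'{name}_l0').copy_(getattr(stacked, f'{name}_l1'))

x = torch.rand(10, 17, 3)
out_stacked, _ = stacked(x)

out_1, _ = layer_1(x)           # first layer on the raw input
out_manual, _ = layer_2(out_1)  # second layer on the first layer's output

print(out_stacked.shape)
print(torch.allclose(out_stacked, out_manual, atol=1e-5))
```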
