### Sequence Models & Long Short-Term Memory Networks

For NLP, we often need models that carry state, or dependency information, across the steps of a sequence. Plain feed-forward networks keep no state between inputs, so we use recurrent neural networks instead.

A recurrent network feeds its output at one step back in as an input at the next step, so that information can propagate along the sequence. In an LSTM, each element of the sequence has a corresponding hidden state, which can contain information from earlier points in the sequence. This hidden state is what we use for prediction.

PyTorch's LSTM expects all inputs to be 3D tensors. With the default `batch_first=False`, the first axis is the sequence itself, the second indexes instances in a mini-batch, and the third indexes the elements (features) of each input.
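
To make the axis convention concrete, here is a minimal sketch. The sizes (a sequence of 5 steps, a batch of 1, 3 input features, 4 hidden units) and the variable names are assumptions chosen only for illustration; they are not taken from the cells that follow.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions, not from the original notebook)
seq_len, batch_size, input_size, hidden_size = 5, 1, 3, 4

lstm = nn.LSTM(input_size, hidden_size)  # batch_first=False by default

# Axis 0: position in the sequence, axis 1: batch index, axis 2: input features
x = torch.randn(seq_len, batch_size, input_size)

# When no initial state is passed, h_0 and c_0 default to zeros
out, (h_n, c_n) = lstm(x)

print(out.shape)  # one hidden state per sequence element: (seq_len, batch, hidden_size)
print(h_n.shape)  # final hidden state: (num_layers, batch, hidden_size)
print(c_n.shape)  # final cell state, same shape as h_n
```

Using a hidden size different from the input size makes it easier to see which axis of the output comes from which argument.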

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)

<torch._C.Generator at 0x7f0e341b26b0>

In [2]:
# nn.LSTM takes the input feature size and the hidden state size (here both 3)
lstm = nn.LSTM(3, 3)
# a sequence of 5 inputs, each a (1, 3) tensor
inputs = [torch.randn(1, 3) for _ in range(5)]

In [6]:
# initial (hidden state, cell state), each of shape (num_layers, batch, hidden_size)
hidden = (torch.randn(1, 1, 3),
          torch.randn(1, 1, 3))

In [11]:
# step through the sequence one element at a time; `hidden` carries state between steps
for i in inputs:
    out, hidden = lstm(i.view(1, 1, -1), hidden)

In [None]:
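# alternatively, process the whole sequence at once as a (seq_len, batch, input_size) tensor
inputs = torch.cat(inputs).view(len(inputs), 1, -1)
hidden = (torch.randn(1, 1, 3), torch.randn(1, 1, 3))  # fresh (h_0, c_0) initial state
out, hidden = lstm(inputs, hidden)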
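
The two ways of running the LSTM, one element at a time or the whole sequence in a single call, should give the same per-step outputs when they start from the same initial state. The following is a self-contained sketch of that check; the names `h0`, `step_outs`, `full_out`, `same_outputs`, and `last_is_h_n` are introduced here for illustration only.

```python
import torch
import torch.nn as nn

torch.manual_seed(1)

lstm = nn.LSTM(3, 3)
inputs = [torch.randn(1, 3) for _ in range(5)]
# Shared initial (hidden state, cell state) so both runs start identically
h0 = (torch.randn(1, 1, 3), torch.randn(1, 1, 3))

# Run 1: feed one element at a time, threading the state through the loop
hidden = h0
step_outs = []
for x in inputs:
    out, hidden = lstm(x.view(1, 1, -1), hidden)
    step_outs.append(out)
step_outs = torch.cat(step_outs)  # shape (5, 1, 3)

# Run 2: feed the whole sequence in a single call
seq = torch.cat(inputs).view(len(inputs), 1, -1)
full_out, full_hidden = lstm(seq, h0)

# The per-step outputs agree up to floating-point tolerance
same_outputs = torch.allclose(step_outs, full_out, atol=1e-5)
# For a single-layer, unidirectional LSTM the last output is the final hidden state
last_is_h_n = torch.allclose(full_out[-1], full_hidden[0][0], atol=1e-5)

print(same_outputs, last_is_h_n)
```

In other words, `out` exposes every hidden state along the sequence, while the returned `hidden` tuple holds only the most recent hidden and cell states, which is what lets you continue the sequence later.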