## Long Short-Term Memory Networks in PyTorch

So far we have seen various feed-forward networks, which maintain no state at all between inputs. That is not always the behavior we want. Sequence models are central to NLP: they are models in which there is some dependence through time between the inputs. The classical example of a sequence model is the Hidden Markov Model for part-of-speech tagging. Another example is the conditional random field.

A recurrent neural network is a network that maintains some kind of state. For example, its output could be used as part of the next input, so that information can propagate along as the network passes over the sequence. In the case of an LSTM, for each element in the sequence there is a corresponding hidden state `h_t`, which in principle can contain information from arbitrary points earlier in the sequence. We can use the hidden state to predict words in a language model, part-of-speech tags, and many other things.
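To make the idea of carried state concrete, here is a minimal sketch of a plain recurrent update written by hand. It is deliberately simpler than an LSTM (a single `tanh` update with no gates), and the names `W_ih`, `W_hh`, `b`, and `rnn_step` are hypothetical, chosen only for this illustration.

```python
import torch

torch.manual_seed(0)

input_size, hidden_size = 3, 4

# Hypothetical parameters for a plain (non-LSTM) recurrent cell
W_ih = torch.randn(hidden_size, input_size) * 0.1
W_hh = torch.randn(hidden_size, hidden_size) * 0.1
b = torch.zeros(hidden_size)


def rnn_step(x_t, h_prev):
    """One recurrent update: the new state depends on the input and the old state."""
    return torch.tanh(W_ih @ x_t + W_hh @ h_prev + b)


sequence = [torch.randn(input_size) for _ in range(5)]
h = torch.zeros(hidden_size)  # initial state
states = []
for x_t in sequence:
    h = rnn_step(x_t, h)  # the state is carried forward to the next step
    states.append(h)

print(len(states), states[-1].shape)
```

Each `h` is computed from the previous one, so the final state can in principle reflect every earlier input. An LSTM follows the same pattern but adds gates and a separate cell state to control what is kept and what is forgotten.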

**_Before getting to the example, note a few things. PyTorch’s LSTM expects all of its inputs to be 3D tensors, and the semantics of the axes of these tensors matter. With the default `batch_first=False`, the first axis indexes the steps of the sequence, the second indexes instances in the mini-batch, and the third indexes elements of the input._**
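The axis convention can be checked with a short shape experiment. This is a sketch under assumed sizes (sequence length 5, batch of 2, input size 3, hidden size 4); none of these numbers come from the text above.

```python
import torch
import torch.nn as nn

# Illustrative sizes, chosen only for this sketch
seq_len, batch_size, input_size, hidden_size = 5, 2, 3, 4

lstm = nn.LSTM(input_size, hidden_size)  # batch_first=False by default

# Input: (seq_len, batch, input_size)
x = torch.randn(seq_len, batch_size, input_size)

# Initial hidden and cell states: (num_layers, batch, hidden_size)
h0 = torch.zeros(1, batch_size, hidden_size)
c0 = torch.zeros(1, batch_size, hidden_size)

out, (hn, cn) = lstm(x, (h0, c0))

print(out.shape)  # one output per time step: (seq_len, batch, hidden_size)
print(hn.shape)   # final hidden state: (num_layers, batch, hidden_size)
print(cn.shape)   # final cell state: same shape as hn
```

Passing a tensor with the batch on the first axis would not raise an error as long as the last dimension matches. The LSTM would simply treat the batch as time steps, which is why the axis order is worth verifying.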

An example:


In [4]:
import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np

torch.manual_seed(1)  # fix the random seed so the run is reproducible

<torch._C.Generator at 0x7f2317ff2590>

In [13]:
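# Note: autograd.Variable is a deprecated no-op wrapper since PyTorch 0.4;
# plain tensors behave the same way in current versions.
lstm = nn.LSTM(3, 3)  # input dimension 3, hidden (output) dimension 3

# A sequence of 5 elements, each a 1x3 tensor
inputs = [autograd.Variable(torch.rand(1, 3)) for _ in range(5)]

# Initial state is a tuple (h_0, c_0), each of shape (num_layers, batch, hidden_size)
hidden = (autograd.Variable(torch.randn(1, 1, 3)),
          autograd.Variable(torch.randn(1, 1, 3)))

# Option 1: step through the sequence one element at a time, feeding in
# each new input vector together with the state from the previous step
for i in inputs:
    out, hidden = lstm(i.view(1, 1, -1), hidden)

# Option 2: process the whole sequence in a single call.
# Stack the inputs into one (seq_len, batch, input_size) = (5, 1, 3) tensor
inputs = torch.cat(inputs).view(len(inputs), 1, -1)
hidden = (autograd.Variable(torch.randn(1, 1, 3)),  # fresh random initial state
          autograd.Variable(torch.randn(1, 1, 3)))
out, hidden = lstm(inputs, hidden)
print("OUT", out)
print("HIDDEN", hidden)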


('OUT', Variable containing:
(0 ,.,.) = 
 -0.4696  0.0255  0.2446

(1 ,.,.) = 
 -0.4593  0.2174  0.1853

(2 ,.,.) = 
 -0.4381  0.2919  0.2162

(3 ,.,.) = 
 -0.5070  0.2419  0.1992

(4 ,.,.) = 
 -0.4843  0.1492  0.1986
[torch.FloatTensor of size 5x1x3]
)
('HIDDEN', (Variable containing:
(0 ,.,.) = 
 -0.4843  0.1492  0.1986
[torch.FloatTensor of size 1x1x3]
, Variable containing:
(0 ,.,.) = 
 -0.8128  0.2265  0.2971
[torch.FloatTensor of size 1x1x3]
))
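In the printed output, the last row of `OUT` (`-0.4843  0.1492  0.1986`) is identical to the first tensor in `HIDDEN`: `out` holds the hidden state at every time step, while the first element of `hidden` holds only the final one. The sketch below checks this relationship. It also checks that stepping through the sequence one element at a time gives the same result as a single call, provided both start from the same initial state (the example above draws a fresh random state for the second pass, so its two passes differ). It uses plain tensors rather than `Variable`, and the names `out_all` and `out_steps` are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(1)

lstm = nn.LSTM(3, 3)
inputs = torch.rand(5, 1, 3)  # (seq_len, batch, input_size)
h0 = torch.randn(1, 1, 3)
c0 = torch.randn(1, 1, 3)

with torch.no_grad():
    # Whole sequence in one call
    out_all, (hn_all, cn_all) = lstm(inputs, (h0, c0))

    # Same sequence, one time step per call, carrying the state forward
    hidden = (h0, c0)
    step_outs = []
    for t in range(inputs.size(0)):
        out_t, hidden = lstm(inputs[t].view(1, 1, -1), hidden)
        step_outs.append(out_t)
    out_steps = torch.cat(step_outs)  # back to (seq_len, batch, hidden_size)

# The last time step of the output is the final hidden state
last_is_final = torch.allclose(out_all[-1], hn_all[0])

# Stepwise and whole-sequence processing agree up to floating-point error
stepwise_matches = torch.allclose(out_all, out_steps, atol=1e-6)

print(last_is_final, stepwise_matches)
```

The single-call form is usually preferable when the whole sequence is available up front. The stepwise form is useful when each input depends on the previous output, as in text generation.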
