# LSTM Structure and Hidden State
The hidden state is a function of the data an LSTM has seen so far. In PyTorch it is returned as a pair of tensors, (h, c): h acts as the short-term memory and the cell state c acts as the long-term memory. Both are values computed from the inputs, not learned weights. So, for an LSTM that is looking at words in a sentence, the hidden state changes with each new word it sees, and we can use it to predict the next word in the sequence.

**LSTMs in PyTorch**

In PyTorch, an LSTM can be defined as: lstm = nn.LSTM(input_size=input_dim, hidden_size=hidden_dim, num_layers=n_layers).
The tensors passed to the LSTM when it is called should all be 3D. The constructor arguments are as follows:


*   input_dim = the number of features in the input at each time step (e.g., a dimension of 20 means each input vector holds 20 values)
*   hidden_dim = the size of the hidden state; this is the number of outputs that each LSTM layer produces at each time step.
*   n_layers = the number of stacked LSTM layers to use; this is typically a value between 1 and 3. Each layer keeps its own hidden state, so a value of 1 means there is a single hidden state per sequence. The default value is 1.
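
As a rough illustration of how these three arguments shape the layer, the sketch below builds a two-layer LSTM and lists its parameter shapes. The specific values (4, 3, 2) are arbitrary choices for this example, not values required by the text.

```python
import torch
import torch.nn as nn

# Illustrative values only (assumed for this sketch).
input_dim, hidden_dim, n_layers = 4, 3, 2

lstm = nn.LSTM(input_size=input_dim, hidden_size=hidden_dim, num_layers=n_layers)

# Each layer stores the weights of its four gates stacked along dim 0,
# which is why the leading dimension is 4 * hidden_dim.
param_shapes = {name: tuple(p.shape) for name, p in lstm.named_parameters()}
for name, shape in param_shapes.items():
    print(name, shape)
```

The first layer's input weights have input_dim columns, while the second layer's have hidden_dim columns, because the second layer reads the first layer's output.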


**Hidden State**
Once an LSTM has been defined with input and hidden dimensions, we can call it and retrieve the output and hidden state at every time step. 

out, hidden = lstm(input.view(1, 1, -1), (h0, c0))

The inputs to an LSTM are (input, (h0, c0)):

* input = a Tensor containing the values in an input sequence; this has shape: (seq_len, batch, input_size)
* h0 = a Tensor containing the initial hidden state for each element in the batch
* c0 = a Tensor containing the initial cell memory for each element in the batch

h0 and c0 default to zeros if they are not specified. Their shape is: (n_layers, batch, hidden_dim).
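
The shapes and the zero default described above can be checked with a small sketch. The sizes here (a sequence of 5 steps, a batch of 2) are assumptions made for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes (assumed for this sketch).
seq_len, batch, input_dim, hidden_dim, n_layers = 5, 2, 4, 3, 1

lstm = nn.LSTM(input_size=input_dim, hidden_size=hidden_dim, num_layers=n_layers)
x = torch.randn(seq_len, batch, input_dim)

# Explicit zero-valued initial states with shape (n_layers, batch, hidden_dim).
h0 = torch.zeros(n_layers, batch, hidden_dim)
c0 = torch.zeros(n_layers, batch, hidden_dim)

out_explicit, (h_n, c_n) = lstm(x, (h0, c0))
out_default, _ = lstm(x)  # no initial states given, so zeros are used

print(out_explicit.shape)  # torch.Size([5, 2, 3])
print(h_n.shape, c_n.shape)  # torch.Size([1, 2, 3]) torch.Size([1, 2, 3])
print(torch.allclose(out_explicit, out_default))
```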

In [1]:
import torch
import torch.nn as nn
from matplotlib import pyplot as plt

%matplotlib inline

torch.manual_seed(2) # so that random variables will be consistent and repeatable for testing

<torch._C.Generator at 0x7f4de565c8f0>

In [4]:
# The hidden_dim and the size of the output will be the same unless
# you define your own network and change the number of outputs by adding a linear layer at the end,
# e.g. fc = nn.Linear(hidden_dim, output_dim).

# Note: Variable has been a deprecated no-op wrapper since PyTorch 0.4;
# plain tensors behave the same way.
from torch.autograd import Variable

input_dim = 4
hidden_dim = 3

lstm = nn.LSTM(input_size=input_dim, hidden_size=hidden_dim)

input_list = [torch.randn(1, input_dim) for _ in range(5)]
print("input_list:: ", input_list)

input_list::  [tensor([[ 0.7757,  0.9996, -0.2380, -1.7623]]), tensor([[0.4873, 1.4592, 1.4165, 1.0032]]), tensor([[-0.5644,  0.3819,  1.7595,  1.2146]]), tensor([[ 1.0031,  0.0828, -0.5953, -1.5689]]), tensor([[-1.7744, -1.2860, -0.4395,  1.0293]])]


In [7]:
# initialize the hidden state
h0 = torch.randn(1, 1, hidden_dim)  # (1 layer, batch_size of 1, 3 outputs)
c0 = torch.randn(1, 1, hidden_dim)  # cell memory

# note: every step below is started from the same (h0, c0),
# so the state returned by one step is not fed into the next
for x in input_list:
    out, hidden = lstm(x.view(1, 1, -1), (h0, c0))
    # the last dimension of out and of each hidden tensor is always 3,
    # which we specified when we defined the LSTM with hidden_dim.
    print("\nout:: ", out)
    print("hidden:: ", hidden)


out::  tensor([[[-0.1500, -0.0961,  0.2719]]], grad_fn=<StackBackward>)
hidden::  (tensor([[[-0.1500, -0.0961,  0.2719]]], grad_fn=<StackBackward>), tensor([[[-0.3462, -0.1863,  0.8630]]], grad_fn=<StackBackward>))

out::  tensor([[[ 0.0745, -0.2756,  0.4170]]], grad_fn=<StackBackward>)
hidden::  (tensor([[[ 0.0745, -0.2756,  0.4170]]], grad_fn=<StackBackward>), tensor([[[ 0.2830, -0.5666,  0.8108]]], grad_fn=<StackBackward>))

out::  tensor([[[ 0.0724, -0.3120,  0.2302]]], grad_fn=<StackBackward>)
hidden::  (tensor([[[ 0.0724, -0.3120,  0.2302]]], grad_fn=<StackBackward>), tensor([[[ 0.5445, -0.6505,  0.3713]]], grad_fn=<StackBackward>))

out::  tensor([[[-0.0685,  0.0069,  0.3191]]], grad_fn=<StackBackward>)
hidden::  (tensor([[[-0.0685,  0.0069,  0.3191]]], grad_fn=<StackBackward>), tensor([[[-0.1566,  0.0116,  0.6756]]], grad_fn=<StackBackward>))

out::  tensor([[[ 0.1004, -0.1014,  0.2815]]], grad_fn=<StackBackward>)
hidden::  (tensor([[[ 0.1004, -0.1014,  0.2815]]], grad_fn=<Sta
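
In the loop above, every call receives the same initial (h0, c0), so each input is processed as if it were the first step of a sequence. A minimal sketch of the alternative, assuming we want memory to carry across steps, feeds the returned hidden state back in; the seed and variable names here are choices made for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

input_dim, hidden_dim = 4, 3
lstm = nn.LSTM(input_size=input_dim, hidden_size=hidden_dim)
steps = [torch.randn(1, input_dim) for _ in range(5)]

h0 = torch.randn(1, 1, hidden_dim)
c0 = torch.randn(1, 1, hidden_dim)

# Carry the state forward: the hidden tuple returned by one step
# becomes the initial state of the next step.
hidden = (h0, c0)
step_outputs = []
for x in steps:
    out, hidden = lstm(x.view(1, 1, -1), hidden)
    step_outputs.append(out)
looped = torch.cat(step_outputs)  # shape (5, 1, hidden_dim)

# Processing the whole sequence in one call should give matching outputs.
full_out, full_hidden = lstm(torch.cat(steps).view(len(steps), 1, -1), (h0, c0))
print(torch.allclose(looped, full_out, atol=1e-5))
```

With the state carried forward, the step-by-step loop and the single call over the full sequence compute the same thing, up to floating-point tolerance.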

A for loop is not very efficient for long sequences of data, so we can also process all of these inputs at once:

1. concatenate all our inputs into one big tensor, with an explicit batch_size dimension
2. define the shape of our hidden state
3. get the outputs and the most recent hidden state (created after the last word in the sequence has been seen)

In [10]:
# turn the list of inputs into a single tensor with 5 rows of data,
# then add a 2nd dimension of size 1 for batch_size

# torch.cat(tensors, dim=0) concatenates the given tensors along dim; all tensors
#   must have the same shape (except in the concatenating dimension) or be empty.
# view(*shape) returns a tensor with the same data as the original but a different shape.
#   A view shares its underlying data with its base tensor, so reshaping avoids a copy,
#   and editing the data through the view also changes the base tensor.
inputs = torch.cat(input_list).view(len(input_list), 1, -1)

# print out our inputs and their shape
# you should see (seq_len, batch size, input_dim)
print('inputs size: \n', inputs.size())
print('\n')

print('inputs: \n', inputs)
print('\n')

# initialize the hidden state
h0 = torch.randn(1, 1, hidden_dim)
c0 = torch.randn(1, 1, hidden_dim)

# plain tensors can be passed in directly; wrapping them in Variable is no longer needed
# get the outputs and the final hidden state
out, hidden = lstm(inputs, (h0, c0))

print('out: \n', out)
print('hidden: \n', hidden)

inputs size: 
 torch.Size([5, 1, 4])


inputs: 
 tensor([[[ 0.7757,  0.9996, -0.2380, -1.7623]],

        [[ 0.4873,  1.4592,  1.4165,  1.0032]],

        [[-0.5644,  0.3819,  1.7595,  1.2146]],

        [[ 1.0031,  0.0828, -0.5953, -1.5689]],

        [[-1.7744, -1.2860, -0.4395,  1.0293]]])


out: 
 tensor([[[-0.2600, -0.0558,  0.0722]],

        [[-0.0359, -0.1937,  0.0391]],

        [[ 0.0771, -0.2851, -0.3132]],

        [[-0.0210, -0.0216, -0.0668]],

        [[ 0.0898,  0.0230, -0.1873]]], grad_fn=<StackBackward>)
hidden: 
 (tensor([[[ 0.0898,  0.0230, -0.1873]]], grad_fn=<StackBackward>), tensor([[[ 0.2407,  0.0637, -0.2616]]], grad_fn=<StackBackward>))
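
In the output above, the last row of out matches the first tensor in hidden: for a single-layer LSTM, out holds h at every time step and hidden holds only the final (h, c). The sketch below checks that relationship and then adds the linear layer mentioned in the earlier code comment, fc = nn.Linear(hidden_dim, output_dim). The value of output_dim and the random inputs are assumptions for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# output_dim is an illustrative value (assumed for this sketch).
input_dim, hidden_dim, output_dim = 4, 3, 2

lstm = nn.LSTM(input_size=input_dim, hidden_size=hidden_dim)
fc = nn.Linear(hidden_dim, output_dim)

inputs = torch.randn(5, 1, input_dim)  # (seq_len, batch, input_dim)
out, (h_n, c_n) = lstm(inputs)

# For one layer, the output at the last time step is the final hidden state h_n.
last_step_matches = torch.allclose(out[-1], h_n[0])
print(last_step_matches)

# nn.Linear acts on the last dimension, so it maps every time step
# from hidden_dim values to output_dim values.
scores = fc(out)
print(scores.shape)  # torch.Size([5, 1, 2])
```

Applying the linear layer to out gives one score vector per time step; applying it to h_n instead would give a single prediction for the whole sequence.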
