# Long Short-Term Memory
LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is practically their default behavior, not something they struggle to learn.

The key to LSTMs is the cell state, the horizontal line running through the top of the diagram.
The cell state is kind of like a conveyor belt. It runs straight down the entire chain, with only some minor linear interactions. It’s very easy for information to just flow along it unchanged.

![LSTM cell diagram](https://stanford.edu/~shervine/images/lstm.png)

$$\tilde{c}^{< t >}=\textrm{tanh}(W_c[\Gamma_r\star a^{< t-1 >},x^{< t >}]+b_c)$$
$$c^{< t >}= \Gamma_u\star\tilde{c}^{< t >}+\Gamma_f\star c^{< t-1 >}$$
$$a^{< t >}=\Gamma_o\star c^{< t >}$$

The LSTM does have the ability to remove or add information to the cell state, carefully regulated by structures called gates.

**Gates**:
A system of gating units that controls the flow of information.
* Update gate $\Gamma_u$ --> How much should the past matter now?
* Forget gate $\Gamma_f$ --> Erase a cell or not?
* Output gate $\Gamma_o$ --> How much of a cell to reveal?
* Relevance gate $\Gamma_r$ --> Drop previous information?
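
The three equations and four gates above can be sketched as a single time step in PyTorch. This is only an illustration under stated assumptions: the notebook does not give the gate formula, so the sketch assumes the common form $\Gamma = \sigma(W[a^{< t-1 >}, x^{< t >}] + b)$, and the helper `lstm_step` and the `params` dictionary are hypothetical names. Note that `nn.LSTM`, used later in this notebook, implements a slightly different variant (no relevance gate, and a `tanh` applied to the cell state before the output gate).

```python
import torch

def lstm_step(x_t, a_prev, c_prev, params):
    """One time step following the equations above (hypothetical helper)."""
    concat = torch.cat([a_prev, x_t], dim=1)  # [a^{<t-1>}, x^{<t>}]
    # Assumed gate form: sigmoid(W [a_prev, x_t] + b), so every gate lies in (0, 1)
    gate_u = torch.sigmoid(concat @ params["W_u"] + params["b_u"])  # update
    gate_f = torch.sigmoid(concat @ params["W_f"] + params["b_f"])  # forget
    gate_o = torch.sigmoid(concat @ params["W_o"] + params["b_o"])  # output
    gate_r = torch.sigmoid(concat @ params["W_r"] + params["b_r"])  # relevance
    # Candidate cell state uses the relevance-gated previous activation
    concat_r = torch.cat([gate_r * a_prev, x_t], dim=1)
    c_tilde = torch.tanh(concat_r @ params["W_c"] + params["b_c"])
    c_t = gate_u * c_tilde + gate_f * c_prev  # new cell state
    a_t = gate_o * c_t                        # new activation
    return a_t, c_t

torch.manual_seed(0)
batch, n_x, n_a = 4, 28, 8  # illustrative sizes, not taken from the notebook
params = {}
for name in ["u", "f", "o", "r", "c"]:
    params[f"W_{name}"] = torch.randn(n_a + n_x, n_a) * 0.1
    params[f"b_{name}"] = torch.zeros(n_a)

x_t = torch.randn(batch, n_x)
a_prev = torch.zeros(batch, n_a)
c_prev = torch.zeros(batch, n_a)
a_t, c_t = lstm_step(x_t, a_prev, c_prev, params)
print(a_t.shape, c_t.shape)
```

Because the cell state is updated only by element-wise scaling and addition, setting the forget gate near 1 and the update gate near 0 would carry `c_prev` forward almost unchanged, which is the conveyor-belt behavior described earlier.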

[Understanding LSTMs](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)


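Here we build a many-to-one LSTM network that classifies MNIST digits: each 28x28 image is read as a sequence of 28 rows, and a single class label is predicted after the last row.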

### Many to One

$$T_x>1, T_y=1$$
![Many-to-one RNN diagram](https://stanford.edu/~shervine/images/rnn-many-to-one.png)
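
A minimal sketch of the many-to-one pattern, assuming a step-by-step loop with `nn.LSTMCell` rather than the `nn.LSTM` layer used later. The sizes and the names `cell` and `readout` are illustrative only. The loop consumes $T_x$ inputs, discards the intermediate states, and produces one output from the final hidden state.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
T_x, batch, n_x, n_a, n_y = 5, 3, 28, 16, 10  # illustrative sizes

cell = nn.LSTMCell(n_x, n_a)      # processes one time step at a time
readout = nn.Linear(n_a, n_y)     # applied once, after the last step

x = torch.randn(batch, T_x, n_x)  # T_x > 1 inputs per sequence
h = torch.zeros(batch, n_a)
c = torch.zeros(batch, n_a)
for t in range(T_x):
    h, c = cell(x[:, t, :], (h, c))  # intermediate states are overwritten

y = readout(h)                    # T_y = 1: a single output per sequence
print(y.shape)  # torch.Size([3, 10])
```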



In [0]:
! pip3 install torch torchvision
# ! pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cu92/torch_nightly.html

In [0]:
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

In [0]:
# Device configuration: use the GPU when one is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [0]:
# Hyperparameters
sequence_length = 28   # time steps: one per image row
input_size = 28        # features per time step: pixels in one row
hidden_size = 128
num_layers = 2
num_classes = 10
batch_size = 100
epochs = 2
learning_rate = 0.01
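
The values `sequence_length=28` and `input_size=28` come from treating each 28x28 image as 28 time steps of 28 pixels. The sketch below uses a synthetic tensor as a stand-in for a loader batch (an assumption, so that it runs without downloading MNIST) and checks that time step `t` of the reshaped sequence is row `t` of the image.

```python
import torch

sequence_length, input_size = 28, 28  # same values as the hyperparameters above

# Stand-in for a batch from the loader: (batch, channels, height, width)
images = torch.arange(2 * 1 * 28 * 28, dtype=torch.float32).reshape(2, 1, 28, 28)

# Same reshape as the training loop: drop the channel axis, keep rows as time steps
sequences = images.reshape(-1, sequence_length, input_size)
print(sequences.shape)  # torch.Size([2, 28, 28])

# Time step 3 of the first sequence is row 3 of the first image
same_row = torch.equal(sequences[0, 3], images[0, 0, 3])
print(same_row)  # True
```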

### MNIST dataset

In [12]:
# MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='../../data/',
                                           train=True, 
                                           transform=transforms.ToTensor(),
                                           download=True)

test_dataset = torchvision.datasets.MNIST(root='../../data/',
                                          train=False, 
                                          transform=transforms.ToTensor())

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Processing...
Done!


In [0]:
# Data loader
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size, 
                                          shuffle=False)

## LSTM 
Many to One

In [0]:
class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(LSTM, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)
    
    def forward(self, x):
        # Set initial hidden and cell states, shape (num_layers, batch_size, hidden_size),
        # on the same device as the input
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        
        # Forward propagate LSTM
        out, _ = self.lstm(x, (h0, c0))  # out: tensor of shape (batch_size, seq_length, hidden_size)
        
        # Decode the hidden state of the last time step
        out = self.fc(out[:, -1, :])
        return out
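
To see what `out[:, -1, :]` selects, the following sketch runs a standalone `nn.LSTM` with the same sizes on random input (a stand-in for MNIST batches). For a unidirectional LSTM, the last time step of `out` is the final hidden state of the top layer, `h_n[-1]`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=28, hidden_size=128, num_layers=2, batch_first=True)
x = torch.randn(4, 28, 28)   # (batch, seq_length, input_size)

out, (h_n, c_n) = lstm(x)    # initial states default to zeros when omitted
print(out.shape)  # torch.Size([4, 28, 128])  top-layer output at every step
print(h_n.shape)  # torch.Size([2, 4, 128])   final hidden state of each layer

last_step = out[:, -1, :]    # what the model feeds to the fc layer
matches = torch.allclose(last_step, h_n[-1])
print(matches)    # True
```

Since `nn.LSTM` initializes missing states to zeros, the explicit `h0` and `c0` in the class above make that default visible rather than change the result.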

In [0]:

model = LSTM(input_size, hidden_size, num_layers, num_classes).to(device)


### Loss and optimizer

In [0]:
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
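
`nn.CrossEntropyLoss` expects raw logits and integer class labels, which is why the model above ends in a plain `nn.Linear` with no softmax. A small check on random values (illustrative data, not from the notebook) shows that it matches log-softmax followed by negative log-likelihood.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)               # raw scores, like the fc layer output
targets = torch.tensor([3, 0, 9, 1, 7])   # integer class labels

loss = nn.CrossEntropyLoss()(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(loss, manual))  # True
```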

### Train Model

In [23]:
total_step = len(train_loader)
for epoch in range(epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.reshape(-1, sequence_length, input_size).to(device)
        labels = labels.to(device)
        
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)
        
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        if (i + 1) % 100 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                  .format(epoch + 1, epochs, i + 1, total_step, loss.item()))

Epoch [1/2], Step [100/600], Loss: 0.2355
Epoch [1/2], Step [200/600], Loss: 0.1477
Epoch [1/2], Step [300/600], Loss: 0.2432
Epoch [1/2], Step [400/600], Loss: 0.1224
Epoch [1/2], Step [500/600], Loss: 0.0158
Epoch [1/2], Step [600/600], Loss: 0.1991
Epoch [2/2], Step [100/600], Loss: 0.2247
Epoch [2/2], Step [200/600], Loss: 0.0908
Epoch [2/2], Step [300/600], Loss: 0.0690
Epoch [2/2], Step [400/600], Loss: 0.0720
Epoch [2/2], Step [500/600], Loss: 0.1179
Epoch [2/2], Step [600/600], Loss: 0.1952


### Test and evaluate

In [24]:
model.eval()  # inference mode; matters once dropout or batch norm is added
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, sequence_length, input_size).to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)  # index of the largest logit = predicted class
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Test Accuracy of the model on the 10000 test images: {} %'.format(100 * correct / total)) 

# Save the model checkpoint
torch.save(model.state_dict(), 'model.ckpt')

Test Accuracy of the model on the 10000 test images: 97.68 %
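
The saved `state_dict` holds only the weights, so restoring it requires building the same architecture first. The sketch below is a minimal round trip under stated assumptions: a small `nn.LSTM` stands in for the trained model, and the checkpoint is written to a temporary directory instead of `model.ckpt`.

```python
import os
import tempfile

import torch
import torch.nn as nn

torch.manual_seed(0)
# Stand-in for the trained model (assumption: any nn.Module works the same way)
model = nn.LSTM(28, 16, num_layers=2, batch_first=True)

path = os.path.join(tempfile.mkdtemp(), "model.ckpt")
torch.save(model.state_dict(), path)

# Rebuild the same architecture, then load the saved weights into it
restored = nn.LSTM(28, 16, num_layers=2, batch_first=True)
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()

x = torch.randn(3, 28, 28)
with torch.no_grad():
    same = torch.allclose(model(x)[0], restored(x)[0])
print(same)  # True
```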
