# Long Short-Term Memory (LSTM) network

Long Short-Term Memory (LSTM) networks are a type of recurrent neural network (RNN) architecture designed to address the vanishing gradient problem in traditional RNNs.
LSTMs were introduced by Hochreiter & Schmidhuber in 1997 and have been widely used for sequence modeling tasks such as language modeling, speech recognition, and time series prediction.

Key features of LSTMs:

- Memory cell: A special unit that can maintain information over long periods.
- Gates: Mechanisms to control the flow of information (input gate, forget gate, output gate).
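- Ability to learn long-term dependencies: the additive cell-state update mitigates vanishing gradients, although exploding gradients can still occur and are usually handled with gradient clipping.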
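To make the gate mechanics concrete, here is a minimal sketch of a single LSTM time step written out by hand. It borrows the weights of an `nn.LSTMCell` so the result can be compared against PyTorch's own computation. The helper name `lstm_step` and the tensor sizes are illustrative assumptions, not part of the model built later in this notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def lstm_step(x, h, c, cell):
    """One LSTM time step by hand, reusing the weights of an nn.LSTMCell (hypothetical helper)."""
    # PyTorch stacks the four gate projections in the order:
    # input gate, forget gate, cell candidate, output gate
    gates = x @ cell.weight_ih.T + cell.bias_ih + h @ cell.weight_hh.T + cell.bias_hh
    i, f, g, o = gates.chunk(4, dim=1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)
    # The forget gate scales the old memory; the input gate scales the new candidate
    c_next = f * c + i * g
    # The output gate decides how much of the memory is exposed as the hidden state
    h_next = o * torch.tanh(c_next)
    return h_next, c_next

cell = nn.LSTMCell(input_size=10, hidden_size=20)
x = torch.randn(4, 10)   # batch of 4 inputs
h = torch.zeros(4, 20)   # initial hidden state
c = torch.zeros(4, 20)   # initial cell state

h_manual, c_manual = lstm_step(x, h, c, cell)
h_ref, c_ref = cell(x, (h, c))
print(torch.allclose(h_manual, h_ref, atol=1e-5), torch.allclose(c_manual, c_ref, atol=1e-5))
```

The cell state is updated additively (`f * c + i * g`), which is what lets gradients flow across many time steps more easily than in a plain RNN.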

In [1]:
%pip install torch

import torch
import torch.nn as nn
import torch.optim as optim

We define a SimpleLSTM class that includes:

- An LSTM layer
- A fully connected (linear) layer for output

In the forward pass, we initialize the hidden state and cell state with zeros, then pass the input through the LSTM and final linear layer.

In [None]:
class SimpleLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        # Initialize hidden and cell states with zeros, matching the input's device and dtype
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device, dtype=x.dtype)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device, dtype=x.dtype)

        # h0 and c0 are created fresh on every call, so they carry no gradient history.
        # No detach() is needed: each batch is backpropagated independently of earlier ones.
        out, _ = self.lstm(x, (h0, c0))
        
        # Index hidden state of last time step
        out = self.fc(out[:, -1, :])
        return out
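The shapes involved in the forward pass can be checked with a short sketch like the one below. It uses a standalone `nn.LSTM` with small, arbitrarily chosen sizes (an assumption for illustration). It also shows two facts about `nn.LSTM`: omitting the initial state is equivalent to passing zeros, and for a unidirectional LSTM the last time step of the output equals the top layer's final hidden state.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Small illustrative sizes (assumed, not the notebook's hyperparameters)
batch_size, seq_length, input_size, hidden_size, num_layers = 3, 5, 10, 20, 2
lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
x = torch.randn(batch_size, seq_length, input_size)

# Omitting the initial state makes nn.LSTM start from zeros
out, (h_n, c_n) = lstm(x)

print(out.shape)  # (batch, seq, hidden): top-layer output at every time step
print(h_n.shape)  # (num_layers, batch, hidden): final hidden state of every layer

# Passing explicit zero states gives the same result as the default
h0 = torch.zeros(num_layers, batch_size, hidden_size)
c0 = torch.zeros(num_layers, batch_size, hidden_size)
out_explicit, _ = lstm(x, (h0, c0))
same_as_default = torch.allclose(out, out_explicit)

# The last time step of `out` is the top layer's final hidden state
last_matches = torch.allclose(out[:, -1, :], h_n[-1])
print(same_as_default, last_matches)
```

This is why indexing `out[:, -1, :]` is enough to summarize the whole sequence before the linear layer.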

We set up hyperparameters for the model.

In [None]:
input_size = 10
hidden_size = 20
num_layers = 2
output_size = 1
num_epochs = 100
learning_rate = 0.01

We instantiate the model, define a loss function (Mean Squared Error), and set up an optimizer (Adam).

In [None]:
model = SimpleLSTM(input_size, hidden_size, num_layers, output_size)

# Loss and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
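As a sanity check on the model size, the number of trainable parameters can be worked out by hand and compared with what PyTorch reports. The sketch below rebuilds the two layers with the hyperparameters above. The helper `lstm_layer_params` is a hypothetical name introduced only for this illustration.

```python
import torch.nn as nn

input_size, hidden_size, num_layers, output_size = 10, 20, 2, 1

lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
fc = nn.Linear(hidden_size, output_size)

def lstm_layer_params(in_features, hidden):
    # Four gates, each with input weights, recurrent weights, and two bias vectors
    return 4 * (in_features * hidden + hidden * hidden + 2 * hidden)

expected = (
    lstm_layer_params(input_size, hidden_size)     # first layer reads the raw input
    + lstm_layer_params(hidden_size, hidden_size)  # second layer reads the first layer's output
    + hidden_size * output_size + output_size      # linear head: weights plus bias
)
actual = sum(p.numel() for p in lstm.parameters()) + sum(p.numel() for p in fc.parameters())
print(expected, actual)
```

With these settings the first LSTM layer contributes 2,560 parameters, the second 3,360, and the linear layer 21.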

We set the sequence length and batch size for the dummy data. The random tensors themselves are created inside the training loop below.

In [None]:
seq_length = 20
batch_size = 32

# Training loop

We implement a training loop that runs for a specified number of epochs. In each epoch:

- Generate a new batch of random inputs and targets
- Perform a forward pass
- Calculate the loss
- Perform backpropagation and update the model parameters

In [None]:
# Training loop
for epoch in range(num_epochs):
    # Generate a new batch of data for each epoch
    x = torch.randn(batch_size, seq_length, input_size)
    y = torch.randn(batch_size, output_size)
    
    # Forward pass
    outputs = model(x)
    loss = criterion(outputs, y)
    
    # Backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')
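Because the targets in the loop above are random noise drawn independently of the inputs, there is nothing for the model to learn, and the loss should be expected to hover around the variance of the targets (about 1). The sketch below is one possible variation, not part of the original notebook: it uses an assumed toy target that depends on the input (the first feature of the last time step) and adds optional gradient clipping with `torch.nn.utils.clip_grad_norm_`. The names `TinyLSTM` and `make_batch` are hypothetical, and the step count and `max_norm` value are arbitrary choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyLSTM(nn.Module):
    """Hypothetical single-layer stand-in for the SimpleLSTM model."""
    def __init__(self, input_size=10, hidden_size=20, output_size=1):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.lstm(x)          # initial states default to zeros
        return self.fc(out[:, -1, :])  # predict from the last time step

def make_batch(batch_size=32, seq_length=20, input_size=10):
    x = torch.randn(batch_size, seq_length, input_size)
    # Assumed toy target: the first feature of the last time step, so y depends on x
    y = x[:, -1, :1]
    return x, y

model = TinyLSTM()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Fixed held-out batch to compare the loss before and after training
x_val, y_val = make_batch(batch_size=256)
with torch.no_grad():
    loss_before = criterion(model(x_val), y_val).item()

for step in range(300):
    x, y = make_batch()
    loss = criterion(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    # Optional safeguard against exploding gradients: cap the global gradient norm
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()

with torch.no_grad():
    loss_after = criterion(model(x_val), y_val).item()

print(f"Held-out loss before: {loss_before:.4f}, after: {loss_after:.4f}")
```

Evaluating on a fixed held-out batch makes the comparison less noisy than reading the per-step training loss, since every training batch is freshly sampled.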

# Inference 

Finally, we test the trained model with a single random input sequence.

In [None]:
model.eval()
with torch.no_grad():
    test_input = torch.randn(1, seq_length, input_size)
    predicted = model(test_input)
    print(f"Predicted output for test input: {predicted.item():.4f}")