# Long Short-Term Memory (LSTM)

*This notebook is created with assistance from [ChatGPT 4](https://openai.com/gpt-4).*

A good article about the LSTM model is [*Understanding LSTM Networks*](https://colah.github.io/posts/2015-08-Understanding-LSTMs/) by Christopher Olah, a former OpenAI researcher.

<img src="https://raw.githubusercontent.com/Climbo-Dev/climbo-code-samples/main/images/LSTM3-chain.png" width="800">
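
For reference, the bias-free cell implemented in this notebook computes the following update at each time step, where $\sigma$ is the logistic sigmoid and $\odot$ is element-wise multiplication. The symbols mirror the parameter names in the code (`Wi`, `Ri`, and so on). Standard formulations, including the one in the article above, also add a bias term to each gate.

$$
\begin{aligned}
i_t &= \sigma(x_t W_i + h_{t-1} R_i) \\
f_t &= \sigma(x_t W_f + h_{t-1} R_f) \\
g_t &= \tanh(x_t W_g + h_{t-1} R_g) \\
o_t &= \sigma(x_t W_o + h_{t-1} R_o) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$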

In [58]:
# The code here is generated by ChatGPT 4
import torch
from torch import nn
import torch.nn.functional as F

class LSTMCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(LSTMCell, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size

        # Weights for x
        self.Wi = nn.Parameter(torch.Tensor(input_size, hidden_size))
        self.Wf = nn.Parameter(torch.Tensor(input_size, hidden_size))
        self.Wg = nn.Parameter(torch.Tensor(input_size, hidden_size))
        self.Wo = nn.Parameter(torch.Tensor(input_size, hidden_size))

        # Weights for h
        self.Ri = nn.Parameter(torch.Tensor(hidden_size, hidden_size))
        self.Rf = nn.Parameter(torch.Tensor(hidden_size, hidden_size))
        self.Rg = nn.Parameter(torch.Tensor(hidden_size, hidden_size))
        self.Ro = nn.Parameter(torch.Tensor(hidden_size, hidden_size))

        self.init_weights()
    
    def init_weights(self):
        for p in self.parameters():
            if p.data.ndimension() >= 2:
                nn.init.xavier_uniform_(p.data)
            else:
                nn.init.zeros_(p.data)

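    def forward(self, x, init_states):
        # x: (batch, input_size); h_t and c_t: (batch, hidden_size)
        h_t, c_t = init_states

        # Input, forget, and output gates, plus the candidate cell state g_t
        i_t = torch.sigmoid(x @ self.Wi + h_t @ self.Ri)
        f_t = torch.sigmoid(x @ self.Wf + h_t @ self.Rf)
        g_t = torch.tanh(x @ self.Wg + h_t @ self.Rg)
        o_t = torch.sigmoid(x @ self.Wo + h_t @ self.Ro)

        # Keep part of the old cell state, add part of the candidate,
        # then expose a gated view of the result as the new hidden state
        c_next = f_t * c_t + i_t * g_t
        h_next = o_t * torch.tanh(c_next)

        return h_next, c_next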


class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(LSTM, self).__init__()
        self.hidden_size = hidden_size
        self.lstm_cell = LSTMCell(input_size, hidden_size)
        self.fc = nn.Linear(hidden_size, output_size)
        
    def forward(self, x):
        batch_size = x.size(0)
        sequence_length = x.size(1)

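        # Start from zero states that match the input's device and dtype,
        # so the model also works after .half() or .to("cuda")
        h_t = torch.zeros(batch_size, self.hidden_size, device=x.device, dtype=x.dtype)
        c_t = torch.zeros(batch_size, self.hidden_size, device=x.device, dtype=x.dtype)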

        for i in range(sequence_length):
            h_t, c_t = self.lstm_cell(x[:, i, :], (h_t, c_t))

        out = self.fc(h_t)

        return out
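
As a sanity check on the gate equations used in `LSTMCell` above, the sketch below recomputes one step by hand from the weights of PyTorch's built-in `torch.nn.LSTMCell` and compares the results. It assumes `bias=False` to match the bias-free cell in this notebook. The helper `manual_step` and the sizes are illustrative names, not part of the notebook.

```python
import torch
from torch import nn

torch.manual_seed(0)
input_size, hidden_size, batch_size = 10, 5, 3

# Reference cell from PyTorch, without biases to match the bias-free cell above
ref = nn.LSTMCell(input_size, hidden_size, bias=False)

# PyTorch stacks the four gate matrices row-wise in the order i, f, g, o.
# Each chunk has shape (hidden_size, in_features), hence the transposes below.
W_i, W_f, W_g, W_o = ref.weight_ih.detach().chunk(4, dim=0)
R_i, R_f, R_g, R_o = ref.weight_hh.detach().chunk(4, dim=0)

def manual_step(x, h, c):
    """One LSTM step written out gate by gate (illustrative helper)."""
    i = torch.sigmoid(x @ W_i.T + h @ R_i.T)
    f = torch.sigmoid(x @ W_f.T + h @ R_f.T)
    g = torch.tanh(x @ W_g.T + h @ R_g.T)
    o = torch.sigmoid(x @ W_o.T + h @ R_o.T)
    c_next = f * c + i * g
    h_next = o * torch.tanh(c_next)
    return h_next, c_next

x = torch.randn(batch_size, input_size)
h0 = torch.randn(batch_size, hidden_size)
c0 = torch.randn(batch_size, hidden_size)

with torch.no_grad():
    h_ref, c_ref = ref(x, (h0, c0))
h_man, c_man = manual_step(x, h0, c0)

# Both comparisons should agree up to floating-point error
print(torch.allclose(h_ref, h_man, atol=1e-5), torch.allclose(c_ref, c_man, atol=1e-5))
```

Note that the notebook's cell stores its weights as `(input_size, hidden_size)`, so it multiplies without a transpose. The arithmetic is the same.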

ChatGPT 4 suggested a few datasets that could be used for training an LSTM model:

- **IMDB Movie Reviews Dataset**: This dataset contains 50,000 movie reviews labeled as either positive or negative, which makes it suitable for training an LSTM model for sentiment analysis. It is available in the `keras.datasets` module.

- **Human Activity Recognition Using Smartphones Data Set**: This dataset contains data from the accelerometer and gyroscope of a smartphone, capturing human activities like WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING. You can find this dataset on the UCI Machine Learning Repository. An LSTM model could be used to recognize these activities based on the sensor data.

- **Air Passengers Data Set**: This dataset contains monthly totals of international airline passengers from 1949 to 1960. You can use an LSTM model to forecast the number of airline passengers. It is available through seaborn as `seaborn.load_dataset("flights")`.

- **Penn Treebank (PTB) Dataset**: This dataset is a large collection of sentences, widely used for training and benchmarking models for natural language processing tasks, such as language modeling. LSTMs are often used for such tasks because of their ability to capture long-term dependencies in the data.

- **Daily Minimum Temperatures in Melbourne**: This dataset describes the minimum daily temperatures over 10 years (1981-1990) in Melbourne, Australia. The units are in degrees Celsius and there are 3,650 observations. This dataset can be used for univariate time series forecasting.
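
For the forecasting datasets in the list above, a univariate series first has to be cut into fixed-length input windows with a next-step target. The sketch below shows one possible way to do that on a synthetic series. The helper `make_windows` and the window length are assumptions for illustration. The resulting input shape `(batch, sequence_length, input_size)` is what the `LSTM` class in this notebook expects.

```python
import torch

def make_windows(series, window):
    """Split a 1-D series into overlapping inputs and next-step targets.

    Returns inputs of shape (n, window, 1) and targets of shape (n, 1).
    Illustrative helper, not part of the notebook.
    """
    series = torch.as_tensor(series, dtype=torch.float32)
    # Each chunk holds `window` inputs followed by the one value to predict
    chunks = series.unfold(0, window + 1, 1)   # (n, window + 1)
    inputs = chunks[:, :window].unsqueeze(-1)  # (n, window, 1)
    targets = chunks[:, window:].clone()       # (n, 1)
    return inputs, targets

# Synthetic stand-in for a real series such as monthly passenger counts
series = torch.arange(10, dtype=torch.float32)
inputs, targets = make_windows(series, window=3)

print(inputs.shape, targets.shape)
print(inputs[0].squeeze(-1).tolist(), targets[0].item())
```

With a series of length 10 and a window of 3, this yields 7 training pairs. The first pair uses the values 0, 1, 2 to predict 3.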

## IMDB Movie Reviews - Sentiment Analysis

A good reference is [Getting started with NLP for absolute beginners](https://www.kaggle.com/code/jhoward/getting-started-with-nlp-for-absolute-beginners).

This notebook focuses on a minimal example of model training. The Exploratory Data Analysis (EDA) can be found in a sibling notebook.

### Load Data

In [2]:
from itertools import islice
import numpy as np
import pandas as pd
from fastai.data.external import fastai_cfg, URLs, untar_data

imdb_path = untar_data(URLs.IMDB)

### Using Native Python + PyTorch

In [3]:
from torch import tensor
from torch.nn.functional import binary_cross_entropy
from transformers import AutoTokenizer

tokz = AutoTokenizer.from_pretrained('distilbert-base-uncased')

#### Train an LSTM Model with Sample Data

In [59]:
sample = pd.DataFrame([
    dict(sentiment=1, text="This movie is one of the best"),
    dict(sentiment=0, text="This movie is among the worst"),
])
sample["tokens"] = sample.text.apply(lambda t: tokz(t, return_tensors='pt').input_ids.squeeze())

# Model Configuration
embedding_dim = 10
learning_rate = 0.01
args = pd.Series(dict(input_size=embedding_dim, hidden_size=5, output_size=1))

# Setup Model & Toolkit
embed = torch.nn.Embedding(len(tokz), embedding_dim)

model = LSTM(**args)

preds, targets = [], []
for _, r in sample.iterrows():
    x_embed = embed(r.tokens)[None,:]
    p = model(x_embed)
    preds.append(p)
    targets.append(r.sentiment)

probs = torch.sigmoid(torch.concat(preds).squeeze())
targets = torch.tensor(targets).to(torch.float32)
loss = binary_cross_entropy(probs, targets)
loss.backward()

with torch.no_grad():
    for param in model.parameters():
        param -= learning_rate * param.grad
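
The manual update above is one step of plain stochastic gradient descent. The sketch below checks that claim on a small stand-in model by comparing it with `torch.optim.SGD`. Two assumptions are worth flagging: an `nn.Linear` layer stands in for the LSTM to keep the example short, and the loss is `binary_cross_entropy_with_logits`, a numerically more stable alternative to applying `torch.sigmoid` followed by `binary_cross_entropy` as the notebook does.

```python
import copy

import torch
from torch import nn
from torch.nn.functional import binary_cross_entropy_with_logits

torch.manual_seed(0)
learning_rate = 0.01

# Stand-in model (assumption): a linear layer instead of the notebook's LSTM
model_a = nn.Linear(4, 1)
model_b = copy.deepcopy(model_a)

x = torch.randn(2, 4)
targets = torch.tensor([1.0, 0.0])

# Manual update, as in the cell above, plus an explicit gradient reset
loss_a = binary_cross_entropy_with_logits(model_a(x).squeeze(-1), targets)
loss_a.backward()
with torch.no_grad():
    for p in model_a.parameters():
        p -= learning_rate * p.grad
        p.grad.zero_()

# The same step expressed with an optimizer
opt = torch.optim.SGD(model_b.parameters(), lr=learning_rate)
opt.zero_grad()
loss_b = binary_cross_entropy_with_logits(model_b(x).squeeze(-1), targets)
loss_b.backward()
opt.step()

same = all(
    torch.allclose(pa, pb)
    for pa, pb in zip(model_a.parameters(), model_b.parameters())
)
print(same)
```

When the update runs in a loop, the gradients must be reset after each step (`p.grad.zero_()` or `opt.zero_grad()`). Otherwise `backward()` keeps adding to the gradients of earlier batches.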

#### Train a Full Model

In [60]:
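# Load Data
sentiments, tokens_list = [], []

for s, folder in [(1, "pos"), (0, "neg")]:
    for path in (imdb_path / 'train' / folder).glob('*.txt'):
        # Use a separate name for the file handle so it does not shadow the path
        with open(path, 'r', encoding='utf-8') as fh:
            text = fh.read().strip()
        sentiments.append(s)
        # No truncation is applied, so long reviews produce long token sequences
        tokens_list.append(tokz(text, return_tensors='pt').input_ids.squeeze())

# np.float_ was removed in NumPy 2.0, so build the float array explicitly
df = pd.DataFrame(dict(sentiment=np.asarray(sentiments, dtype=np.float64), tokens=tokens_list))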

In [48]:
valid_pct = 0.2
num_valid = int(df.shape[0] * valid_pct)

df = df.sample(frac=1).reset_index(drop=True)
valid = df.iloc[:num_valid, :].reset_index(drop=True)
train = df.iloc[num_valid:, :]

In [56]:
from tqdm import tqdm
bar_format='{l_bar}{bar:100}{r_bar}{bar:-100b}'

embedding_dim = 10
args = pd.Series(dict(input_size=embedding_dim, hidden_size=5, output_size=1))

# Setup Model & Toolkit
embed = torch.nn.Embedding(len(tokz), embedding_dim, dtype=torch.float16)
model = LSTM(**args).half()

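def evaluate(data):
    preds, targets = [], []
    # The embedding stays where it was created; its output is moved to the model's device
    device = next(model.parameters()).device

    with torch.no_grad():
        for idx, r in tqdm(list(data.iterrows()), bar_format=bar_format):
            x_embed = embed(r.tokens)[None,:].to(device)
            p = model(x_embed)
            preds.append(p)
            targets.append(r.sentiment)

    # Compute the loss in float32, since the model itself runs in float16
    probs = torch.sigmoid(torch.concat(preds).view(-1).float())
    targets = torch.tensor(targets, dtype=torch.float32, device=probs.device)
    loss = binary_cross_entropy(probs, targets)
    return float(loss), pd.DataFrame(dict(prob=probs.detach().cpu().numpy(), target=targets.cpu().numpy()))
# evaluate(valid[:10])

def one_epoch(data, bs, learning_rate: float=0.01):
    num_train = data.shape[0]
    device = next(model.parameters()).device

    # Shuffle, then assign consecutive rows to batches of size bs
    df = data.sample(frac=1).reset_index(drop=True)
    df['batch'] = np.repeat(range(num_train // bs + 1), bs)[:num_train]

    for _, sdf in tqdm(list(df.groupby("batch")), bar_format=bar_format):
        preds, targets = [], []
        for _, r in sdf.iterrows():
            x_embed = embed(r.tokens)[None,:].to(device)
            p = model(x_embed)
            preds.append(p)
            targets.append(r.sentiment)

        # view(-1) keeps a 1-D shape even when the last batch has a single row
        probs = torch.sigmoid(torch.concat(preds).view(-1).float())
        targets = torch.tensor(targets, dtype=torch.float32, device=probs.device)
        loss = binary_cross_entropy(probs, targets)
        loss.backward()

        # Plain SGD step on the LSTM parameters; the embedding table is not updated
        with torch.no_grad():
            for param in model.parameters():
                param -= learning_rate * param.grad
                # Reset so gradients do not accumulate across batches
                param.grad.zero_()
        embed.zero_grad()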
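
The `batch` column in `one_epoch` is built with `np.repeat`, which is easy to misread. The small sketch below isolates that expression on toy sizes. The wrapper name `batch_ids` is hypothetical and used only for illustration.

```python
import numpy as np

def batch_ids(num_rows, bs):
    """Assign consecutive rows to batches of size bs (illustrative helper)."""
    # Repeat each batch index bs times, then trim to the number of rows
    return np.repeat(range(num_rows // bs + 1), bs)[:num_rows]

ids = batch_ids(7, 3)
print(ids.tolist())  # [0, 0, 0, 1, 1, 1, 2]
```

The last batch may be smaller than `bs`. When the number of rows is an exact multiple of `bs`, the trim removes the extra index, so no empty batch is created.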

In [10]:
valid_loss, _ = evaluate(valid)
print(f"valid loss: {valid_loss:10.6f}")

100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 5000/5000 [01:45<00:00, 47.60it/s]

valid loss:   0.704275

In [None]:
model.to("cuda")
for _ in range(5):
    one_epoch(train, bs=32, learning_rate=0.01)
    valid_loss, _ = evaluate(valid)
    print(f"valid loss: {valid_loss:10.6f}")


### Using fastai

In [451]:
from fastai.text.data import TextDataLoaders