# 1. Introduction to LSTMs
## What are LSTMs?

Long Short-Term Memory (LSTM) networks are a type of recurrent neural network (RNN) capable of learning long-term dependencies. They were introduced to address the vanishing gradient problem that can occur when training traditional RNNs.

**Key Points:**
- LSTMs are a special kind of RNN whose gated memory cell lets them retain information across many time steps.
- They are widely used for sequential data such as time series, speech, and text.


**Applications of LSTMs:**
- Time series prediction
- Natural language processing
- Speech recognition
- Anomaly detection


# 2. Understanding LSTM Architecture

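## LSTM Cell Structure
The LSTM architecture consists of a memory cell, an input gate, a forget gate, and an output gate. These gates control the flow of information into, within, and out of the cell. Because the cell state is updated additively, gradients can flow across many time steps, which mitigates the vanishing gradient problem; exploding gradients are typically handled separately, for example with gradient clipping.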

1. Forget Gate: Decides what information to discard from the cell state.
2. Input Gate: Decides which new information to add to the cell state.
3. Output Gate: Decides which parts of the cell state are exposed as the next hidden state.

![Chain of LSTM cells, from Christopher Olah's "Understanding LSTM Networks"](https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-chain.png)

In [None]:
from IPython.display import Image
Image(url='https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-chain.png')


LSTMs improve upon standard RNNs by introducing a more complex structure composed of memory cells and gating mechanisms. Here’s a breakdown of the LSTM architecture:


1. Memory Cell:

The core of an LSTM unit is the memory cell, which maintains information over long periods of time. Unlike the hidden state in a standard RNN, the memory cell has mechanisms to add or remove information, allowing it to capture and retain long-term dependencies.

2. Gates:

LSTMs use gates to control the flow of information into and out of the memory cell. There are three main gates in an LSTM:

Forget Gate (f_t): This gate decides what portion of the memory cell's previous state should be forgotten. It takes the current input and the previous hidden state and passes them through a sigmoid layer, which outputs a number between 0 and 1 for each value in the memory cell. A value of 1 means "keep this information," while 0 means "forget this information."

Input Gate (i_t): This gate determines how much of the new information should be added to the memory cell. It works in two parts: a sigmoid layer (the input gate itself) decides which values to update, and a tanh layer creates a vector of new candidate values (often written C̃_t) that could be added.

Output Gate (o_t): This gate controls what portion of the memory cell's state is exposed as the hidden state. Like the other gates, it is computed from the current input and the previous hidden state; the new hidden state is then the output gate multiplied element-wise by tanh of the updated memory cell state.
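In equation form, using common notation, the gates and the candidate values can be written as follows. The weight matrices W_f, W_i, W_C, W_o and biases b_f, b_i, b_C, b_o are symbols introduced here for illustration, σ is the logistic sigmoid, and [h_{t-1}, x_t] is the concatenation of the previous hidden state and the current input:

$$
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right) \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right)
\end{aligned}
$$

Each sigmoid output lies in (0, 1), which is what lets the gates act as soft, per-element switches, while tanh keeps the candidate values in (-1, 1).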


3. Memory Cell Update:

The memory cell state (C_t) is updated from the previous memory cell state, filtered by the forget gate, and the new candidate values, filtered by the input gate.
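Written out, with ⊙ denoting element-wise multiplication and C̃_t the vector of candidate values created alongside the input gate, the cell update and the resulting hidden state are:

$$
\begin{aligned}
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
$$

Because C_t depends on C_{t-1} through an element-wise product with f_t, rather than through a repeated matrix multiplication and squashing nonlinearity, gradients can pass back through many time steps largely intact when the forget gate stays close to 1.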

## LSTM Cell Computation Steps

Forget Gate Calculation:

Compute the forget gate’s activation using the previous hidden state and the current input.

Input Gate Calculation:

Compute the input gate’s activation and the candidate memory cell values.

Memory Cell State Update:

Update the memory cell state by combining the previous memory cell state (scaled by the forget gate) and the new candidate values (scaled by the input gate).

Output Gate Calculation:

Compute the output gate’s activation and determine the final hidden state using the updated memory cell state.
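The four computation steps can be sketched for a single time step in NumPy. This is an illustrative sketch, not the notebook's PyTorch model: the function name `lstm_step`, the single stacked weight matrix `W`, and the gate order (forget, input, candidate, output) are our own conventions. PyTorch's `nn.LSTM` stacks its weights in input, forget, cell, output order instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step (illustrative; gate order is our own convention).

    x_t: (n_in,); h_prev, c_prev: (n_hidden,)
    W: (4 * n_hidden, n_hidden + n_in) stacked weights for forget, input,
       candidate, output; b: (4 * n_hidden,)
    """
    n = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b  # all four pre-activations at once
    f_t = sigmoid(z[0 * n:1 * n])              # forget gate
    i_t = sigmoid(z[1 * n:2 * n])              # input gate
    c_tilde = np.tanh(z[2 * n:3 * n])          # candidate values
    o_t = sigmoid(z[3 * n:4 * n])              # output gate
    c_t = f_t * c_prev + i_t * c_tilde         # memory cell state update
    h_t = o_t * np.tanh(c_t)                   # new hidden state
    return h_t, c_t

# Usage: run a short random sequence through the cell
rng = np.random.default_rng(0)
n_in, n_hidden, seq_len = 3, 5, 4
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden + n_in))
b = np.zeros(4 * n_hidden)
h = np.zeros(n_hidden)
c = np.zeros(n_hidden)
for x_t in rng.normal(size=(seq_len, n_in)):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)  # (5,) (5,)
```

Computing all four pre-activations with one matrix product mirrors how libraries usually implement the cell; the slicing then recovers each gate.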

## Applications of LSTMs in NLP
LSTMs are widely used in NLP tasks, including:

1. Language modeling
2. Text generation
3. Sentiment analysis
4. Machine translation


**Notation:**
- x_t: input at time step t
- h_{t-1}: hidden state from the previous time step
- C_t: updated memory cell state
- f_t: forget gate output
- i_t: input gate output
- o_t: output gate output
- h_t: hidden state output

Reference: https://en.wikipedia.org/wiki/Long_short-term_memory

# 3. Preparing Text Data for LSTM
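We'll preprocess the text with plain Python and PyTorch: clean it, map each character to an integer with a simple character-level tokenizer, and slice the result into fixed-length input sequences, each paired with the character that follows it. Every window has the same length, so no padding is needed.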

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import re

In [None]:
np.random.seed(42)
torch.manual_seed(42)


In [None]:
text = """
A financial transaction is an agreement, or communication, between a buyer and seller to exchange goods, services, or assets for payment. Any transaction involves a change in the status of the finances of two or more businesses or individuals. A financial transaction always involves one or more financial asset, most commonly money or another valuable item such as gold or silver.

There are many types of financial transactions. The most common type, purchases, occur when a good, service, or other commodity is sold to a consumer in exchange for money. Most purchases are made with cash payments, including physical currency, debit cards, or cheques. The other main form of payment is credit, which gives immediate access to funds in exchange for repayment at a later date.

History

Silver coin of the Maurya Empire, from the 3rd century BC

There is no evidence to support the theory that ancient civilizations worked on systems of barter. Instead, most historians believe that ancient cultures worked on principles of gift economy and debt. In a gift economy, valuables are given without any formal declaration of repayment, often thought to be a form of reciprocal altruism. Official systems of credit and debt were first created around 1800 BCE by the Babylonians, who established the first formal interest rate limits with the Code of Hammurabi.

Many cultures around the world began using commodity money—objects whose value comes from their intrinsic value. These often included gold or silver coins, along with non-metal objects such as cowrie shells, beaver pelts, and dried corn. Between 1000 BCE and the first millennium CE, coinage became increasingly common throughout Europe and Asia. In England, banknotes were introduced starting in the 17th century. Each note promised to pay the bearer the value in gold upon demand—this is called a gold standard. In the 20th century, many countries gradually phased out the gold standard in favour of fiat money—money that is not backed by any commodity.

Since the start of the 21st century, online banking has become much more widespread. By 2001, tens of millions of people were doing their banking on the internet. By 2012, between 46 and 82 percent of all transactions were done electronically. Digital currencies, currency that is stored on electronic systems, have gained popularity. Bitcoin, invented in 2009, reached a cap of over US$1 trillion in 2021. One of the downsides of cryptocurrencies is that since they are not tethered to any tangible assets, their price can fluctuate wildly, sometimes by 20% or more in a single day.

Types of transactions

Purchases can be made through the use of physical currency, such as cash.

Cash transactions
A cash transaction is any transaction where money is exchanged for a good, service, or other commodity. Cash transactions can refer to items bought with physical money, such as coins or cash, or with a debit card. These differ from credit transactions because the money is immediately taken from the buyer and given to the seller.

Credit transactions
Transactions that use credit involve a deferred payment for the goods or services rendered. When something is bought using credit, it gives the seller an asset (the payment at a later date) and gives the buyer a liability (the amount that must be paid at a later date). Credit cards are an example of when credit is used, where the card issuer (usually a bank) gives the customer a line of credit with which they can make purchases. The liabilities the customer accrues with the card are usually paid off at a set date, and any unpaid liabilities create interest for the issuer.

Loans and mortgages are examples of credit. The lender agrees to give out a lump sum (the "principal") to the borrower, who pays back the loaned amount over a set period of time (called a "term"). The lender usually charges an additional percentage on top of the initial amount borrowed, called the "interest rate". Mortgages are similar to loans, but are usually for a larger amount of money and over a longer term, often for buying real estate. Mortgages are almost always secured by collateral, most commonly the real estate they are being used to purchase. If the borrower fails to make the necessary payments on the mortgage, the lender has the right to claim and sell the property in a process known as foreclosure.

Internal and external transactions
External transactions are any business transactions that involve more than one party. For example, a company buying inventory from a supplier would be considered external. All cash and credit transactions are external, since they affect the finances of more than one person or group. On the other hand, internal transactions only affect one business. Shifting goods between different departments in a business is an internal transaction, since it does not change the overall finances of the company.
"""

In [None]:
def clean_text(text):
    text = re.sub(r'\[.*?\]', '', text)  # Remove references like [1], [2], etc.
    text = re.sub(r'\s+', ' ', text)     # Collapse runs of whitespace (including newlines) into one space
    text = text.lower()                  # Convert to lowercase
    return text

cleaned_text = clean_text(text)

In [None]:
print("Cleaned Text:")
print(cleaned_text[:500])

Cleaned Text:
 a financial transaction is an agreement, or communication, between a buyer and seller to exchange goods, services, or assets for payment. any transaction involves a change in the status of the finances of two or more businesses or individuals. a financial transaction always involves one or more financial asset, most commonly money or another valuable item such as gold or silver. there are many types of financial transactions. the most common type, purchases, occur when a good, service, or other


In [None]:
class CharTokenizer:
    def __init__(self):
        self.char2idx = {}
        self.idx2char = {}

    def fit(self, text):
        unique_chars = sorted(set(text))
        self.char2idx = {c: i for i, c in enumerate(unique_chars)}
        self.idx2char = {i: c for i, c in enumerate(unique_chars)}

    def texts_to_sequences(self, text):
        return [self.char2idx[char] for char in text]

    def sequences_to_texts(self, sequence):
        # Map indices back to characters and join them into a single string
        return ''.join(self.idx2char[idx] for idx in sequence)

In [None]:
tokenizer = CharTokenizer()
tokenizer.fit(cleaned_text)

In [None]:
# Convert text to sequences of integers
sequences = tokenizer.texts_to_sequences(cleaned_text)

In [None]:
# Create input-output pairs
def create_dataset(sequences, step):
    X, y = [], []
    for i in range(0, len(sequences) - step):
        X.append(sequences[i:i + step])
        y.append(sequences[i + step])
    return np.array(X), np.array(y)
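
The loop above is fine for a short text. For a larger corpus, the same input/target pairs can be built without a Python loop using NumPy's `sliding_window_view` (NumPy 1.20 or later). A sketch; `create_dataset_vectorized` is a hypothetical name, and unlike the loop version it assumes the sequence is longer than `step` (otherwise NumPy raises an error rather than returning empty arrays):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def create_dataset_vectorized(sequences, step):
    """Same (input, target) pairs as the loop version, built from a sliding view."""
    seq = np.asarray(sequences)
    # windows[i] == seq[i:i + step + 1]; the first `step` items are the input, the last is the target
    windows = sliding_window_view(seq, step + 1)
    return windows[:, :-1].copy(), windows[:, -1].copy()

X_demo, y_demo = create_dataset_vectorized([0, 1, 2, 3, 4, 5], step=3)
print(X_demo.tolist())  # [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
print(y_demo.tolist())  # [3, 4, 5]
```

The `.copy()` calls turn the read-only strided views into ordinary arrays.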

In [None]:
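# Prepare the data
step = 40  # Length of each input sequence
X, y = create_dataset(sequences, step)

# Convert data to PyTorch tensors.
# Each character index is fed as a single float feature, so X has shape (num_samples, step, 1);
# y holds the index of the next character, as CrossEntropyLoss expects.
X = torch.tensor(X, dtype=torch.float32).unsqueeze(-1)
y = torch.tensor(y, dtype=torch.long)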

In [None]:
# Create a PyTorch Dataset and DataLoader
class TextDataset(Dataset):
    def __init__(self, X, y):
        self.X = X
        self.y = y

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

In [None]:
dataset = TextDataset(X, y)
dataloader = DataLoader(dataset, batch_size=128, shuffle=True)

In [None]:
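# Define LSTM model
class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)  # inputs: (batch, seq_len, input_size)
        self.fc = nn.Linear(hidden_size, output_size)  # maps the final hidden state to per-character scores

    def forward(self, x):
        _, (h_n, _) = self.lstm(x)   # h_n: (num_layers, batch, hidden_size)
        output = self.fc(h_n[-1])    # last layer's final hidden state: (batch, hidden_size)
        return output                # raw logits; CrossEntropyLoss applies log-softmax internally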
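
To see why the model uses the final hidden state, it helps to check the shapes `nn.LSTM` returns. A small sketch with random inputs; the sizes mirror the notebook (40-step windows, one feature, 128 hidden units), and the variable names are our own:

```python
import torch
import torch.nn as nn

batch, seq_len, n_features, n_hidden = 4, 40, 1, 128
lstm_demo = nn.LSTM(n_features, n_hidden, batch_first=True)
x_demo = torch.randn(batch, seq_len, n_features)

out_demo, (h_n_demo, c_n_demo) = lstm_demo(x_demo)
print(out_demo.shape)  # torch.Size([4, 40, 128]): hidden state at every time step
print(h_n_demo.shape)  # torch.Size([1, 4, 128]): (num_layers, batch, hidden); batch_first does not apply here
print(c_n_demo.shape)  # torch.Size([1, 4, 128]): final cell state

# For a single-layer, unidirectional LSTM the last time step of the output is the final hidden state
assert torch.allclose(h_n_demo[-1], out_demo[:, -1, :])
```

Taking `h_n[-1]` (or squeezing the layer dimension of a one-layer `h_n`) yields a `(batch, hidden)` tensor, which is what the `nn.Linear` head expects.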

In [None]:
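# Initialize model, loss function, and optimizer
input_size = 1                          # one feature per time step: the character index
hidden_size = 128
output_size = len(tokenizer.char2idx)   # one output score per character in the vocabulary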
model = LSTMModel(input_size, hidden_size, output_size)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

In [None]:
# Training loop
num_epochs = 20
model.train()  # Enable training mode (matters for layers such as dropout)
for epoch in range(num_epochs):
    epoch_loss = 0.0
    for inputs, targets in dataloader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    print(f"Epoch {epoch + 1}/{num_epochs}, Loss: {epoch_loss / len(dataloader)}")

Epoch 1/20, Loss: 3.188887671420449
Epoch 2/20, Loss: 2.87601007285871
Epoch 3/20, Loss: 2.8111952919709053
Epoch 4/20, Loss: 2.7771467472377576
Epoch 5/20, Loss: 2.7516434318140934
Epoch 6/20, Loss: 2.730541229248047
Epoch 7/20, Loss: 2.7155682224976387
Epoch 8/20, Loss: 2.7009178713748327
Epoch 9/20, Loss: 2.6802993071706673
Epoch 10/20, Loss: 2.6647705906315853
Epoch 11/20, Loss: 2.647393929330926
Epoch 12/20, Loss: 2.6305983443009224
Epoch 13/20, Loss: 2.6165057608955786
Epoch 14/20, Loss: 2.604543836493241
Epoch 15/20, Loss: 2.5922984449486983
Epoch 16/20, Loss: 2.5720849790071187
Epoch 17/20, Loss: 2.557548849206222
Epoch 18/20, Loss: 2.5470267847964636
Epoch 19/20, Loss: 2.5357374517541182
Epoch 20/20, Loss: 2.5183053518596448
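
Gating mainly addresses vanishing gradients; exploding gradients in recurrent networks are commonly controlled with gradient-norm clipping, which the loop above does not use. A sketch of one training step with clipping, using a tiny stand-in model and random data; the class `TinyCharLSTM`, the `demo_` variable names, and `max_norm=1.0` are our own assumptions, not tuned settings:

```python
import torch
import torch.nn as nn
import torch.optim as optim

class TinyCharLSTM(nn.Module):
    """Stand-in with the same input/output contract as LSTMModel above."""
    def __init__(self, input_size=1, hidden_size=16, output_size=10):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        _, (h_n, _) = self.lstm(x)
        return self.fc(h_n[-1])

demo_model = TinyCharLSTM()
demo_criterion = nn.CrossEntropyLoss()
demo_optimizer = optim.Adam(demo_model.parameters(), lr=0.001)

demo_inputs = torch.randn(8, 40, 1)        # (batch, seq_len, features)
demo_targets = torch.randint(0, 10, (8,))  # one class index per sequence

demo_model.train()
demo_optimizer.zero_grad()
demo_loss = demo_criterion(demo_model(demo_inputs), demo_targets)
demo_loss.backward()
# Rescale gradients so their combined L2 norm is at most max_norm; returns the norm before clipping
total_norm = torch.nn.utils.clip_grad_norm_(demo_model.parameters(), max_norm=1.0)
demo_optimizer.step()
print(f"loss={demo_loss.item():.4f}, grad norm before clipping={total_norm.item():.4f}")
```

In the notebook's loop, the clipping call would sit between `loss.backward()` and `optimizer.step()`.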


In [None]:
# Define the path where the model will be saved
model_path = "lstm_text_generation_model.pth"

# Save the model's state dictionary
torch.save(model.state_dict(), model_path)

print(f"Model saved to {model_path}")


Model saved to lstm_text_generation_model.pth
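
The saved file stores only the weights (the `state_dict`), so reloading requires rebuilding the same architecture first. A minimal round-trip sketch, assuming a temporary directory instead of the notebook's working directory and a placeholder vocabulary size of 30 (in the notebook this would be `len(tokenizer.char2idx)`); the class repeats `LSTMModel` so the snippet runs on its own:

```python
import os
import tempfile

import torch
import torch.nn as nn

class LSTMModel(nn.Module):
    # Same architecture as the notebook's model; load_state_dict needs matching layer names and shapes
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        _, (h_n, _) = self.lstm(x)
        return self.fc(h_n[-1])

original = LSTMModel(1, 128, 30)
path = os.path.join(tempfile.mkdtemp(), "lstm_text_generation_model.pth")
torch.save(original.state_dict(), path)

# Rebuild the architecture, then load the weights (map_location lets a GPU-saved file load on CPU)
restored = LSTMModel(1, 128, 30)
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()
```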


In [None]:
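# Function to generate text (greedy decoding: always pick the most likely next character)
def generate_text(model, tokenizer, seed_text, length=100):
    model.eval()  # Set the model to evaluation mode
    # The seed may only contain characters seen when the tokenizer was fitted
    input_sequence = tokenizer.texts_to_sequences(seed_text)
    # Shape (1, seq_len, 1), matching the training inputs
    input_sequence = torch.tensor(input_sequence, dtype=torch.float32).unsqueeze(0).unsqueeze(-1)
    result = seed_text
    with torch.no_grad():  # No gradients are needed for inference
        for _ in range(length):
            output = model(input_sequence)
            predicted_index = torch.argmax(output, dim=1).item()
            predicted_char = tokenizer.idx2char[predicted_index]
            result += predicted_char
            # Slide the window: drop the oldest character and append the prediction
            next_step = torch.tensor([[[predicted_index]]], dtype=torch.float32)
            input_sequence = torch.cat([input_sequence[:, 1:], next_step], dim=1)
    return result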
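
Greedy `argmax` decoding is deterministic and prone to loops. A common alternative is temperature sampling: divide the logits by a temperature, apply softmax, and draw the next index at random. A minimal sketch; `sample_next_index` is a hypothetical helper, not part of the notebook:

```python
import torch

def sample_next_index(logits, temperature=1.0, generator=None):
    """Sample a character index from raw logits of shape (1, vocab_size).

    temperature < 1 sharpens the distribution toward argmax; > 1 flattens it.
    """
    if temperature <= 0:
        return int(torch.argmax(logits, dim=-1).item())  # treat 0 as greedy decoding
    probs = torch.softmax(logits / temperature, dim=-1)
    return int(torch.multinomial(probs, num_samples=1, generator=generator).item())

gen = torch.Generator().manual_seed(0)
logits_demo = torch.tensor([[2.0, 1.0, 0.1]])
print(sample_next_index(logits_demo, temperature=0.0))  # 0 (greedy)
print([sample_next_index(logits_demo, 0.8, gen) for _ in range(5)])
```

To try it, the `torch.argmax(...)` line in `generate_text` could be replaced with `predicted_index = sample_next_index(output, temperature=0.8)`; 0.8 is an arbitrary starting point, not a tuned value.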


In [None]:
# Define the seed text and generate text
seed_text = "a financial transaction is an agreement"
generated_text = generate_text(model, tokenizer, seed_text, length=100)

# Print the generated text
print("Generated Text:")
print(generated_text)

Generated Text:
a financial transaction is an agreement a coren th the the the the the the the the the the the the the the the the the the the the the the 
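
The generated text quickly collapses into a repeated "the", and the training loss is still around 2.5 after 20 epochs. The notebook does not diagnose the cause, but two design choices commonly contribute: greedy `argmax` decoding always takes the single most likely next character, and feeding each character as one float (`input_size = 1`) imposes an artificial numeric ordering on the vocabulary. A common alternative on the input side is a learned vector per character via `nn.Embedding`. The sketch below is not the trained model above; the class name `EmbeddingLSTMModel`, `embed_dim=32`, and the vocabulary size of 30 are placeholders of our own:

```python
import torch
import torch.nn as nn

class EmbeddingLSTMModel(nn.Module):
    """Character-level LSTM that looks up a learned vector for each character index."""
    def __init__(self, vocab_size, embed_dim=32, hidden_size=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, x):
        # x: (batch, seq_len) integer character indices
        emb = self.embedding(x)         # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(emb)
        return self.fc(h_n[-1])         # (batch, vocab_size) logits for the next character

vocab_size = 30  # placeholder; in the notebook this would be len(tokenizer.char2idx)
emb_model = EmbeddingLSTMModel(vocab_size)
x_idx = torch.randint(0, vocab_size, (4, 40))  # integer (torch.long) inputs, no trailing feature dim
print(emb_model(x_idx).shape)  # torch.Size([4, 30])
```

With the notebook's data, this model would take `torch.tensor(X, dtype=torch.long)` without the `.unsqueeze(-1)`, and generation would likewise pass integer indices rather than floats.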
