# Building, Training, and Executing a Long Short-Term Memory (LSTM) Model Using Lightning Helper Functions

This notebook builds an LSTM model using Lightning helper functions.

I am showing how much easier this can be with built-in functions.

#### Specifically, I am using `PyTorch LSTM, nn.LSTM()`

---

## Importing Modules

I have added copious comments to help better understand what each import is accomplishing.

In [10]:
import lightning as L # Lightning has tons of cool tools that make neural networks easier

import torch # torch will allow us to create tensors.
import torch.nn as nn # torch.nn allows us to create a neural network.
import torch.nn.functional as F # nn.functional gives us access to the activation and loss functions.
from torch.optim import Adam # optim contains many optimizers. This time I am using Adam
from torch.utils.data import TensorDataset, DataLoader # needed for training data

----
## Example - Building a Long Short-Term Memory Unit using `PyTorch and Lightning`


For this LSTM example, I imagine that I have two companies, Company A and Company B, with five days' worth of stock prices.

![company stock](imgs/company_stock_prices.png)

Given this sequential data, I want to see if I can get the LSTM to remember what happened on Day 1 through Day 4, to see if I can correctly predict what will happen on Day 5. 

`The objective`: I will run the data from Day 1 through Day 4 through the LSTM to see if I can predict the values for Day 5 for both Company A and Company B.

For Company A, the goal is to predict that the value on Day 5 = 0, and for Company B, the goal is to predict that the value on Day 5 = 1.

### Creating the LSTM Model Class
#### Using and optimizing the PyTorch LSTM, `nn.LSTM()`

Here I take advantage of PyTorch's built-in `nn.LSTM()` module.

For the most part, using `nn.LSTM()` allows me to simplify the `__init__()` function and the `forward()` function. 

The other big difference is that this time, I can set the learning rate for the optimizer, Adam, to **0.1**. This change will speed up training a lot. Everything else stays the same.
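Before the full class, here is a minimal sketch of what `nn.LSTM()` takes in and hands back, assuming the same single-feature setup used in this notebook. The seed is arbitrary and `wide_lstm` is a hypothetical second model, included only to show what a larger `hidden_size` does to the output shape.

```python
import torch
import torch.nn as nn

torch.manual_seed(42)  # arbitrary seed, only so the sketch is repeatable

# One feature per day and one hidden unit, as in this notebook.
lstm = nn.LSTM(input_size=1, hidden_size=1)

# Four days of values for one company, reshaped to (sequence length, features).
days = torch.tensor([0., 0.5, 0.25, 1.]).view(4, 1)

# nn.LSTM returns the short-term memory for every day, plus a tuple
# holding the final short-term (h_n) and long-term (c_n) memories.
lstm_out, (h_n, c_n) = lstm(days)

print(lstm_out.shape)        # torch.Size([4, 1])
print(h_n.shape, c_n.shape)  # torch.Size([1, 1]) torch.Size([1, 1])

# The last row of lstm_out is the final short-term memory.
print(torch.allclose(lstm_out[-1], h_n[0]))  # True

# Hypothetical wider model: hidden_size sets how many values come out per day.
wide_lstm = nn.LSTM(input_size=1, hidden_size=50)
wide_out, _ = wide_lstm(days)
print(wide_out.shape)        # torch.Size([4, 50])
```

This is why `forward()` can make its prediction from the last row of the output alone.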

In [None]:
# Instead of coding an LSTM manually, let's see what I can do with PyTorch's nn.LSTM()
class LightningLSTM(L.LightningModule):

    def __init__(self): # __init__() is the class constructor function, and I use it to initialize the Weights and Biases.

        super().__init__() # initialize an instance of the parent class, LightningModule.

        L.seed_everything(seed=42)

        # input_size = the number of features (or variables) in the data. In my example,
        #              I only have a single feature (value).
        # hidden_size = the dimension of the output at each step.
        #               In other words, if I set hidden_size=1, then I will have 1 output node;
        #               if I set hidden_size=50, then I will have 50 output nodes (that can then be 50 input
        #               nodes to a subsequent fully connected neural network).
        self.lstm = nn.LSTM(input_size=1, hidden_size=1)


    def forward(self, input):
        # Reshape the 1-D input into a column: one row per day, one feature per row
        input_trans = input.view(len(input), 1)

        lstm_out, temp = self.lstm(input_trans)

        # lstm_out has the short-term memories for all inputs. I make my prediction with the last one
        prediction = lstm_out[-1]
        return prediction


    def configure_optimizers(self): # This method configures the optimizer I want to use for backpropagation.
        return Adam(self.parameters(), lr=0.1) # Set the learning rate to 0.1


    def training_step(self, batch, batch_idx): # Take a step during gradient descent.
        input_i, label_i = batch # Collect input
        output_i = self.forward(input_i[0]) # Run input through the neural network
        loss = (output_i - label_i)**2 # Loss = squared residual

        ###################
        #
        # Logging the loss and the predicted values so I can evaluate the training
        #
        ###################
        self.log("train_loss", loss)

        if (label_i == 0):
            self.log("out_0", output_i)
        else:
            self.log("out_1", output_i)

        return loss

## Creating an Instance for the LSTM

Now that I have created a class, `LightningLSTM`, that defines an LSTM, I can use it to create a model and print out the randomly initialized `Weights` and `Biases`.

Then, just for fun, I'll see what those random Weights and Biases predict for **Company A** and **Company B**. If they are good predictions, then I am done! However, the chances of getting good predictions from random values are very small.

In [12]:
# Create the model object, print out parameters and see how well
## the untrained LSTM can make predictions...
model = LightningLSTM()

print("Before optimization, the parameters are...")
for name, param in model.named_parameters():
    print(name, param.data)

print("\nNow let's compare the observed and predicted values...")
# NOTE: To make predictions, we pass in the first 4 days' worth of stock values
# in an array for each company. In this case, the only difference between the
# input values for Company A and B occurs on the first day. Company A has 0 and
# Company B has 1.
print("Company A: Observed = 0, Predicted =",
      model(torch.tensor([0., 0.5, 0.25, 1.])).detach())
print("Company B: Observed = 1, Predicted =",
      model(torch.tensor([1., 0.5, 0.25, 1.])).detach())

Seed set to 42


Before optimization, the parameters are...
lstm.weight_ih_l0 tensor([[ 0.7645],
        [ 0.8300],
        [-0.2343],
        [ 0.9186]])
lstm.weight_hh_l0 tensor([[-0.2191],
        [ 0.2018],
        [-0.4869],
        [ 0.5873]])
lstm.bias_ih_l0 tensor([ 0.8815, -0.7336,  0.8692,  0.1872])
lstm.bias_hh_l0 tensor([ 0.7388,  0.1354,  0.4822, -0.1412])

Now let's compare the observed and predicted values...
Company A: Observed = 0, Predicted = tensor([0.6675])
Company B: Observed = 1, Predicted = tensor([0.6665])


## Initial Results 
With the unoptimized parameters (i.e., using the initial random weights), the predicted value for **Company A**, **0.6675**, is bad, because it is relatively far from the observed value, **0**. The predicted value for **Company B**, **0.6665**, is closer to its observed value, **1**, but still not good. The two predictions are also nearly identical, so the untrained LSTM does not yet tell the two companies apart. So, that means I need to train the LSTM.

Note: I would want to train the model regardless; this was just a first check to see whether the random starting values happened to be close enough.
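Each parameter tensor printed above has four rows because `nn.LSTM()` stacks the parameters for its four gates. The sketch below assumes the row order given in the PyTorch documentation (input gate, forget gate, candidate memory, output gate). It counts the parameters, then recomputes the forward pass by hand to confirm it matches the built-in module. `manual_lstm` is a hypothetical helper written only for this illustration, and the seed is arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(42)  # arbitrary seed, only so the sketch is repeatable
lstm = nn.LSTM(input_size=1, hidden_size=1)

# 4 gates x (1 input weight + 1 hidden weight + 2 biases) = 16 parameters
n_params = sum(p.numel() for p in lstm.parameters())
print(n_params)  # 16

def manual_lstm(lstm, values):
    """Hypothetical helper: run a 1-feature, 1-unit LSTM by hand."""
    w_ih, w_hh = lstm.weight_ih_l0, lstm.weight_hh_l0  # each has shape (4, 1)
    bias = lstm.bias_ih_l0 + lstm.bias_hh_l0            # shape (4,)
    h = torch.zeros(1)  # short-term memory starts at 0
    c = torch.zeros(1)  # long-term memory starts at 0
    for x in values:
        gates = w_ih @ x.view(1) + w_hh @ h + bias  # shape (4,)
        i = torch.sigmoid(gates[0])  # input gate
        f = torch.sigmoid(gates[1])  # forget gate
        g = torch.tanh(gates[2])     # candidate long-term memory
        o = torch.sigmoid(gates[3])  # output gate
        c = f * c + i * g            # update the long-term memory
        h = o * torch.tanh(c)        # update the short-term memory
    return h

values = torch.tensor([0., 0.5, 0.25, 1.])
with torch.no_grad():
    by_hand = manual_lstm(lstm, values)
    built_in, _ = lstm(values.view(4, 1))

print(torch.allclose(by_hand, built_in[-1], atol=1e-6))  # True
```

The count of 16 agrees with the parameter total that Lightning reports in its model summary during training.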




---
---

### Time to Train my LSTM

Train the LSTM unit, then use `Lightning` and `TensorBoard` to evaluate the results.



### Use `DataLoader`


In [13]:
## create the training data as a tensor for the neural network.
inputs = torch.tensor([[0., 0.5, 0.25, 1.], [1., 0.5, 0.25, 1.]]) # Days 1-4 for Company A and Company B
labels = torch.tensor([0., 1.]) # Anticipated output predictions for company A and company B

dataset = TensorDataset(inputs, labels)
dataloader = DataLoader(dataset)
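
To see why `training_step()` indexes its input with `input_i[0]`, it helps to look at what the `DataLoader` actually yields. The sketch below rebuilds the same two-row dataset and prints the shape of each batch. It relies on the `DataLoader` default of `batch_size=1`.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

inputs = torch.tensor([[0., 0.5, 0.25, 1.], [1., 0.5, 0.25, 1.]])
labels = torch.tensor([0., 1.])
dataloader = DataLoader(TensorDataset(inputs, labels))  # default batch_size=1

for batch_idx, (input_i, label_i) in enumerate(dataloader):
    # Each batch carries a leading batch dimension of size 1:
    # input_i has shape (1, 4) and label_i has shape (1,).
    print(batch_idx, input_i.shape, label_i.shape)
    # input_i[0] drops the batch dimension, leaving the 4 daily values.
    print(input_i[0].shape)
```

With two rows and one row per batch, each epoch consists of two training steps.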

Next, I have to create a `Lightning Trainer`.

* `L.Trainer` - A class that I use to handle training the model on the data
    * I start with 300 epochs, which may or may not be good enough
    * Recall that I set Adam's learning rate to 0.1 (instead of the default, 0.001), which makes learning fast
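For intuition, here is roughly what the trainer automates, written as a plain PyTorch loop. This is a simplified sketch, not Lightning's actual implementation: it leaves out logging, checkpointing, and device placement. `TinyLSTM` and `total_loss` are hypothetical stand-ins written for this illustration.

```python
import torch
import torch.nn as nn
from torch.optim import Adam
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(42)  # arbitrary seed, only so the sketch is repeatable

class TinyLSTM(nn.Module):
    """Hypothetical plain-PyTorch stand-in for the LightningLSTM class."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=1)

    def forward(self, values):
        lstm_out, _ = self.lstm(values.view(len(values), 1))
        return lstm_out[-1]

inputs = torch.tensor([[0., 0.5, 0.25, 1.], [1., 0.5, 0.25, 1.]])
labels = torch.tensor([0., 1.])
dataloader = DataLoader(TensorDataset(inputs, labels))

model = TinyLSTM()
optimizer = Adam(model.parameters(), lr=0.1)

def total_loss(model):
    """Sum of squared residuals over both companies."""
    with torch.no_grad():
        return sum(((model(x) - y) ** 2).item() for x, y in zip(inputs, labels))

loss_before = total_loss(model)

for epoch in range(300):                  # max_epochs=300
    for input_i, label_i in dataloader:   # one step per row of training data
        output_i = model(input_i[0])
        loss = ((output_i - label_i) ** 2).sum()  # squared residual
        optimizer.zero_grad()  # clear gradients from the previous step
        loss.backward()        # backpropagation
        optimizer.step()       # update the Weights and Biases

loss_after = total_loss(model)
print(loss_before, loss_after)
```

The exact numbers depend on the seed and hardware, so they may not match the Lightning run in this notebook.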

In [14]:
trainer = L.Trainer(max_epochs=300, log_every_n_steps=2) # log every 2 steps (once per epoch) instead of the default 50
trainer.fit(model, train_dataloaders=dataloader)

Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs

  | Name | Type | Params | Mode 
--------------------------------------
0 | lstm | LSTM | 16     | train
--------------------------------------
16        Trainable params
0         Non-trainable params
16        Total params
0.000     Total estimated model params size (MB)
1         Modules in train mode
0         Modules in eval mode


Training: |          | 0/? [00:00<?, ?it/s]

`Trainer.fit` stopped: `max_epochs=300` reached.


### Okay, now that I have trained the model for 300 epochs, I can see how good the predictions are.

NOTE: Because I set Adam's learning rate to 0.1, the model trains much, much faster. Before, with the manually built LSTM and the default learning rate, 0.001, it took about 5000 epochs to fully train the model. Now, with the learning rate set to 0.1, I only need 300 epochs.

Because I am running so few epochs, I have to tell the trainer (through its `log_every_n_steps` argument) to write to the log files every 2 steps, which is once per epoch since I have two rows of training data. With the default logging setting, which updates the log files every 50 steps, the resulting graphs look terrible.

In [15]:
print("After optimization, the parameters are...")
for name, param in model.named_parameters():
    print(name, param.data)

After optimization, the parameters are...
lstm.weight_ih_l0 tensor([[3.5364],
        [1.3869],
        [1.5390],
        [1.2488]])
lstm.weight_hh_l0 tensor([[5.2070],
        [2.9577],
        [3.2652],
        [2.0678]])
lstm.bias_ih_l0 tensor([-0.9143,  0.3724, -0.1815,  0.6376])
lstm.bias_hh_l0 tensor([-1.0570,  1.2414, -0.5685,  0.3092])


In [None]:
# Now that training is done, I print out the new predictions...
print("\nNow let's compare the observed and predicted values...")
print("Company A: Observed = 0, Predicted =", model(torch.tensor([0., 0.5, 0.25, 1.])).detach())
print("Company B: Observed = 1, Predicted =", model(torch.tensor([1., 0.5, 0.25, 1.])).detach())


Now let's compare the observed and predicted values...
Company A: Observed = 0, Predicted = tensor([6.7842e-05])
Company B: Observed = 1, Predicted = tensor([0.9809])


### Summarizing these results after 300 epochs:

* The predictions are great. 
    * Company A - Day 5 prediction is 6.7842e-05 --  much closer to 0
    * Company B - Day 5 prediction is 0.9809 -- very close to 1

* TensorBoard
    * Have a look at the `loss` values and `predictions` that were saved in the log files using `TensorBoard`
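To put numbers on "great", the short sketch below computes the squared residual (the same loss used in `training_step()`) for the predictions printed before and after training. The values are copied from the outputs in this notebook, and `squared_residual` is a hypothetical helper.

```python
# Values copied from the printed outputs in this notebook.
observed = {"Company A": 0.0, "Company B": 1.0}
before = {"Company A": 0.6675, "Company B": 0.6665}
after = {"Company A": 6.7842e-05, "Company B": 0.9809}

def squared_residual(predicted, target):
    """Hypothetical helper: the loss for a single prediction."""
    return (predicted - target) ** 2

for name, target in observed.items():
    loss_before = squared_residual(before[name], target)
    loss_after = squared_residual(after[name], target)
    print(f"{name}: before = {loss_before:.6f}, after = {loss_after:.2e}")
```

Before training the losses are roughly 0.446 and 0.111. After training they drop to about 4.6e-09 and 3.6e-04.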


[TensorBoard](https://www.tensorflow.org/tensorboard) is a visualization toolkit for TensorFlow that provides tools and visualizations for machine learning experimentation. It is particularly useful for understanding, debugging, and optimizing machine learning models. 


### To get TensorBoard working with VS Code

* Open the command palette (Ctrl/Cmd + Shift + P)
    * You may need to add TensorBoard to your current virtual environment first
        * In the terminal, I used `uv add tensorboard`, since I use uv to add modules
        * Note: all of this should already be done for you in this project, as all dependencies are in the `pyproject.toml` file.

* I had to restart and rerun the code in the notebook, which resulted in a message prompting me to add the TensorBoard VS Code extension

* Or, if need be, search for the command "Python: Launch TensorBoard" and press Enter.

* You will be able to select the folder where your TensorBoard log files are located. By default, the current working directory will be used. Here, I used the `lightning_logs` directory

* VS Code will then open a new tab with TensorBoard and manage its lifecycle as well. This means that to kill the TensorBoard process, all you have to do is close the TensorBoard tab.

### Looking at the TensorBoard Training Loss Results

In the figures below, I show the TensorBoard figures for the loss (`train_loss`), the predictions for Company A (`out_0`), and the predictions for Company B (`out_1`).
* Note that the x-axis shows the training step (two steps per epoch, since there are two rows of training data)
* Note that the y-axis shows the loss in the first figure and the predicted stock value in the other two
* Recall:
    * For Company A, I want to predict 0
    * For Company B, I want to predict 1

![train_loss](imgs/loss_part_4.png)

![out_0](imgs/out_0_part4.png)

![out_1](imgs/out_1_part4.png)



#### Summary
In all three figures, the loss (`train_loss`) and the predictions for Company A (`out_0`) and Company B (`out_1`) started to taper off after 500 steps, or 250 epochs (recall that I logged every two steps, which is once per epoch). This suggests that adding more epochs may not improve the predictions much, so I am done. I have built the Day 5 LSTM predictor!
