# Lecture 11: Hierarchical PyTorch and high-level frameworks

https://bit.ly/torch-intro3 / https://colab.research.google.com/drive/1ZITGTm6UED8qNs9HbBS45QCB3tk2UnvF?usp=sharing 
In this lecture, we dive into more advanced concepts in PyTorch, focusing on creating hierarchical neural networks.

We'll also leverage PyTorch Lightning to streamline our training process and TensorBoard for logging and visualization.

## Setup and Preliminaries

Before we start, ensure that PyTorch Lightning and TensorBoard are installed in your environment:

In [14]:
!pip install -q -U lightning tensorboard

In [15]:
import torch
import torch.nn as nn
import torch.optim as optim
import lightning as L
from lightning import Trainer
from lightning.pytorch.loggers import TensorBoardLogger
from torch.utils.data import DataLoader, TensorDataset
import numpy as np
import matplotlib.pyplot as plt

## Generating the Dataset
Let's generate synthetic data based on the equation $y = \cos(X_0 \cdot 2.1 - 0.9) + X_2 \cdot \exp(-X_3^2)$.

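Each sample in the dataset has features $[X_0, X_1, X_2, X_3]$ and a target $y$. Note that $X_1$ does not appear in the equation, so it acts as a pure distractor feature that the model must learn to ignore.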

In [16]:
num_samples = 100_000
n_features = 4

X = np.random.rand(num_samples, n_features) * 2 - 1  # Generate random samples in the range [-1, 1]
y = np.cos(X[:, 0] * 2.1 - 0.9) + X[:, 2] * np.exp(-X[:, 3]**2)


X = torch.tensor(X, dtype=torch.float)
y = torch.tensor(y, dtype=torch.float)
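As a quick sanity check of the target formula, the sketch below evaluates it at a single hand-picked point, $X = [0, 0, 1, 0]$, where the result is easy to verify: $y = \cos(-0.9) + 1 \cdot \exp(0)$. The helper name `target_fn` is an assumption made for illustration and is not part of the notebook.

```python
import numpy as np

def target_fn(X):
    """Same expression as used to build y above, wrapped in a (hypothetical) helper."""
    return np.cos(X[:, 0] * 2.1 - 0.9) + X[:, 2] * np.exp(-X[:, 3] ** 2)

# One sample: X_0 = 0, X_1 = 0, X_2 = 1, X_3 = 0
point = np.array([[0.0, 0.0, 1.0, 0.0]])
y_point = target_fn(point)[0]

# Changing X_1 should leave the target untouched, since the formula never uses it.
point_shifted = np.array([[0.0, 0.7, 1.0, 0.0]])
y_shifted = target_fn(point_shifted)[0]

print(y_point, y_shifted)
```

Both values should equal $\cos(0.9) + 1 \approx 1.6216$.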


In [17]:
print(X)

tensor([[-0.2691, -0.5700,  0.9034, -0.4306],
        [ 0.6267, -0.1377, -0.0991, -0.4562],
        [-0.9838, -0.2826, -0.6672, -0.7430],
        ...,
        [ 0.3953,  0.3029, -0.1189,  0.9153],
        [ 0.6491,  0.7937, -0.6526, -0.7236],
        [ 0.2078,  0.4156,  0.9341, -0.1918]])


In [18]:
print(y)

tensor([ 0.8560,  0.8342, -1.3688,  ...,  0.9461,  0.5081,  1.7948])

Next, we set up our data loader (with no train/validation split, for simplicity):

In [19]:
dataset = TensorDataset(X, y)
batch_size = 1024
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=4)
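The lecture skips the train/validation split for simplicity. If you do want one, a minimal sketch using `torch.utils.data.random_split` might look like the following. The small demo tensors, the 900/100 split, the batch size, and the seed are all assumptions chosen to keep the example fast; they are not values from the lecture.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Small stand-in dataset built with the same formula as above (sizes are illustrative).
X_demo = torch.rand(1000, 4) * 2 - 1
y_demo = torch.cos(X_demo[:, 0] * 2.1 - 0.9) + X_demo[:, 2] * torch.exp(-X_demo[:, 3] ** 2)
full_set = TensorDataset(X_demo, y_demo)

# Split into disjoint train/validation subsets; the generator makes the split reproducible.
train_set, val_set = random_split(
    full_set, [900, 100], generator=torch.Generator().manual_seed(0)
)

# Shuffle only the training data; validation order does not matter.
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = DataLoader(val_set, batch_size=128)

print(len(train_set), len(val_set), len(train_loader))
```

With a validation loader available, one could pass both loaders to `trainer.fit` and add a `validation_step` to the Lightning module.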

## Defining the Hierarchical Model


We will define a hierarchical neural network in PyTorch that builds the known structure of the problem into the model itself.

One sub-network will learn the argument of the cosine, while another learns the remaining term, $X_2 \cdot \exp(-X_3^2)$. Both are instances of the same generic MLP building block, which we define first:

In [20]:
class MLP(nn.Module):
    def __init__(self, hidden_size=64, n_features=4, n_layers=2):
        super().__init__()

        self.net = nn.Sequential(
            nn.Linear(n_features, hidden_size),
            nn.ReLU(),
            *[
                nn.Sequential(
                    nn.Linear(hidden_size, hidden_size),
                    nn.ReLU()
                )
                for _ in range(n_layers)
            ],
            nn.Linear(hidden_size, 1)
        )

    def forward(self, x):
        return self.net(x)

## Integrating Sub-networks into the Hierarchical Model

In this section, we combine the two sub-networks into a single hierarchical model: the output of the first MLP is passed through `torch.cos`, and the output of the second MLP is added to the result.

In [21]:
class HierarchicalModel(nn.Module):
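    def __init__(self, hidden_size=64, n_features=4, n_layers=2):
        super().__init__()

        # Sub-networks assigned as attributes are registered automatically,
        # so their parameters are included in this model's parameters().
        # n_features is an explicit argument rather than a notebook global.
        self.cosine_argument = MLP(hidden_size=hidden_size, n_features=n_features, n_layers=n_layers)
        self.exp_term = MLP(hidden_size=hidden_size, n_features=n_features, n_layers=n_layers)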

    def forward(self, x):
        return torch.cos(self.cosine_argument(x)) + self.exp_term(x)
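To see what "wrapping" sub-networks means in practice, here is a stripped-down sketch of the same pattern. The class `TwoBranch` is a hypothetical stand-in that uses single `nn.Linear` layers in place of the MLPs, so that the registered parameter names are short enough to read at a glance.

```python
import torch
import torch.nn as nn

class TwoBranch(nn.Module):
    """Toy version of the hierarchical model: cos(branch_1(x)) + branch_2(x)."""

    def __init__(self, n_features=4):
        super().__init__()
        # Assigning nn.Module instances as attributes registers them as sub-modules.
        self.cosine_argument = nn.Linear(n_features, 1)
        self.exp_term = nn.Linear(n_features, 1)

    def forward(self, x):
        return torch.cos(self.cosine_argument(x)) + self.exp_term(x)

toy = TwoBranch()

# Parameter names are prefixed with the attribute name of the sub-module they live in.
param_names = [name for name, _ in toy.named_parameters()]
print(param_names)

# A batch of 8 samples yields one prediction per sample, with a trailing dimension of 1.
out = toy(torch.zeros(8, 4))
print(out.shape)
```

The same naming scheme applies at every level of nesting, which is why the printed model later in this lecture shows paths such as `model.cosine_argument.net`.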


## Refactoring with PyTorch Lightning
Next, we refactor our model to leverage PyTorch Lightning's features, simplifying training and logging procedures.

In [22]:
class LitHierarchicalModel(L.LightningModule):
    def __init__(self, hidden_size=64, n_layers=2, learning_rate=1e-3):
        super().__init__()
        # Automatically track hyperparameters:
        self.save_hyperparameters()

        self.model = HierarchicalModel(
            hidden_size=self.hparams["hidden_size"],
            n_layers=self.hparams["n_layers"]
        )
    
    def configure_optimizers(self):
        # Declare optimizers *within* lightning model:
        optimizer = optim.Adam(self.parameters(), lr=self.hparams["learning_rate"])
        return optimizer
    
    def forward(self, x):
        return self.model(x)
    
    def training_step(self, batch, batch_idx):
        x, y = batch
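        # The model returns shape (batch, 1) but y has shape (batch,);
        # drop the trailing dimension so the loss does not broadcast to (batch, batch).
        y_pred = self(x).squeeze(-1)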
        loss = nn.MSELoss()(y_pred, y)

        # For tracking metrics:
        self.log('train_loss', loss)

        # (Can also track anything else we want here.)
        return loss
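One pitfall to watch for with a loss like the one in `training_step`: the model returns predictions of shape `(batch, 1)`, while the targets stored in `TensorDataset(X, y)` have shape `(batch,)`. The sketch below uses two hand-picked values to show how the mean squared error changes when those shapes are left mismatched. The tensors are illustrative and not taken from the lecture's data.

```python
import warnings

import torch
import torch.nn.functional as F

y_pred = torch.tensor([[0.0], [1.0]])  # shape (2, 1), like the model output
y_true = torch.tensor([0.0, 1.0])      # shape (2,), like the targets in the dataset

with warnings.catch_warnings():
    # PyTorch warns about the size mismatch but does not raise an error.
    warnings.simplefilter("ignore")
    # Shapes broadcast to (2, 2): every prediction is compared with every target.
    loss_broadcast = F.mse_loss(y_pred, y_true)

# Dropping the trailing dimension compares each prediction with its own target.
loss_matched = F.mse_loss(y_pred.squeeze(-1), y_true)

print(loss_broadcast.item(), loss_matched.item())
```

Here the predictions are exactly right, so the matched loss is 0, yet the broadcast version reports 0.5. Making the shapes agree before computing the loss avoids this.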
    

## Setting Up TensorBoard Logging
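We'll use TensorBoard to visualize our training process. With the default settings, the logger records the scalars we pass to `self.log` (here, the training loss) along with the saved hyperparameters. Logging the model graph additionally requires passing `log_graph=True` to the logger and setting an `example_input_array` on the module.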

In [23]:
logger = TensorBoardLogger("tb_logs", name="my_model")

## Training the Model
With everything set up, we can now train our model using PyTorch Lightning's Trainer, which integrates seamlessly with our TensorBoard logger.

In [24]:
model = LitHierarchicalModel(hidden_size=32, n_layers=2, learning_rate=1e-3)
model

LitHierarchicalModel(
  (model): HierarchicalModel(
    (cosine_argument): MLP(
      (net): Sequential(
        (0): Linear(in_features=4, out_features=32, bias=True)
        (1): ReLU()
        (2): Sequential(
          (0): Linear(in_features=32, out_features=32, bias=True)
          (1): ReLU()
        )
        (3): Sequential(
          (0): Linear(in_features=32, out_features=32, bias=True)
          (1): ReLU()
        )
        (4): Linear(in_features=32, out_features=1, bias=True)
      )
    )
    (exp_term): MLP(
      (net): Sequential(
        (0): Linear(in_features=4, out_features=32, bias=True)
        (1): ReLU()
        (2): Sequential(
          (0): Linear(in_features=32, out_features=32, bias=True)
          (1): ReLU()
        )
        (3): Sequential(
          (0): Linear(in_features=32, out_features=32, bias=True)
          (1): ReLU()
        )
        (4): Linear(in_features=32, out_features=1, bias=True)
      )
    )
  )
)

In [25]:
trainer = Trainer(max_epochs=100, logger=logger)
trainer

GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


<lightning.pytorch.trainer.trainer.Trainer at 0x2859b71d0>

In [26]:
trainer.fit(model, dataloader)


  | Name  | Type              | Params
--------------------------------------------
0 | model | HierarchicalModel | 4.6 K 
--------------------------------------------
4.6 K     Trainable params
0         Non-trainable params
4.6 K     Total params
0.018     Total estimated model params size (MB)


Epoch 0:   0%|          | 0/98 [00:00<?, ?it/s] 

Epoch 87:   0%|          | 0/98 [00:00<?, ?it/s, v_num=2]         
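The 4.6 K parameter count reported in the summary can be checked by hand. Each `nn.Linear(a, b)` layer has `a * b` weights plus `b` biases, and ReLU layers have none. The helper `count_mlp_params` below is a hypothetical function written for this check; its defaults match the `hidden_size=32`, `n_layers=2` model trained above.

```python
def count_mlp_params(n_features=4, hidden_size=32, n_layers=2):
    """Count weights and biases of the MLP: input layer, n_layers hidden layers, output layer."""
    first = n_features * hidden_size + hidden_size                  # Linear(4, 32)
    hidden = n_layers * (hidden_size * hidden_size + hidden_size)   # n_layers x Linear(32, 32)
    last = hidden_size * 1 + 1                                      # Linear(32, 1)
    return first + hidden + last

per_mlp = count_mlp_params()
total = 2 * per_mlp  # the hierarchical model holds two such MLPs

print(per_mlp, total)
```

This gives 2305 parameters per MLP and 4610 in total, which Lightning rounds to 4.6 K.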

## Visualizing Training with TensorBoard
To view the training logs, launch TensorBoard from the terminal:
```bash
tensorboard --logdir tb_logs/
```
Open the URL it prints (by default http://localhost:6006) to view the loss curve and any other metrics you have logged.

## Lightning options

https://lightning.ai/docs/pytorch/stable/common/trainer.html

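Many of these options, such as mixed precision (`precision`), gradient clipping (`gradient_clip_val`), and gradient accumulation (`accumulate_grad_batches`), are widely used performance tricks that can be enabled with a single `Trainer` argument.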

## Karpathy Blog Post

For any deep learning project, try following Andrej Karpathy's "A Recipe for Training Neural Networks":

https://karpathy.github.io/2019/04/25/recipe/

## Practical Exercises




1. Experiment with different architectures for the cosine and exponential sub-networks. How does changing the depth or width of these sub-networks affect performance?
2. Utilize additional features of TensorBoard, such as custom scalars or model profiling, to gain deeper insights into the training process.
3. Explore the advanced features of PyTorch Lightning, such as callbacks for model checkpointing and early stopping, to further improve your model training workflow.

