<a href="https://colab.research.google.com/github/jeffheaton/app_deep_learning/blob/main/t81_558_class_04_2_schedule.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# T81-558: Applications of Deep Neural Networks

**Module 4: Training for Tabular Data**

- Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
- For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).


# Module 4 Material

- Part 4.1: Using K-Fold Cross-validation with PyTorch [[Video]](https://www.youtube.com/watch?v=Q8ZQNvZwsNE&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/app_deep_learning/blob/main/t81_558_class_04_1_kfold.ipynb)
- **Part 4.2: Training Schedules for PyTorch**  [[Video]](https://www.youtube.com/watch?v=lMMlbmfvKDQ&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/app_deep_learning/blob/main/t81_558_class_04_2_schedule.ipynb)
- Part 4.3: Dropout Regularization [[Video]](https://www.youtube.com/watch?v=4ixjgw6Q42U&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/app_deep_learning/blob/main/t81_558_class_04_3_dropout.ipynb)
- Part 4.4: Batch Normalization [[Video]](https://www.youtube.com/watch?v=1U5nOKh9OLQ&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/app_deep_learning/blob/main/t81_558_class_04_4_batch_norm.ipynb)
- Part 4.5: RAPIDS for Tabular Data [[Video]](https://www.youtube.com/watch?v=KgoXuhG_kfs&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/app_deep_learning/blob/main/t81_558_class_04_5_rapids.ipynb)


# Google CoLab Instructions

The following code detects whether the notebook is running in Google CoLab and initializes the PyTorch device to GPU/MPS (if available) or CPU. It also defines the **EarlyStopping** class from module 3.4, which the training loop below uses.


In [1]:
import copy
import torch

try:
    import google.colab

    COLAB = True
    print("Note: using Google CoLab")
except ImportError:
    print("Note: not using Google CoLab")
    COLAB = False

# Make use of a GPU or MPS (Apple) if one is available.  (see module 3.2)
device = (
    "mps"
    if torch.backends.mps.is_available()
    else "cuda"
    if torch.cuda.is_available()
    else "cpu"
)
print(f"Using device: {device}")


# Early stopping (see module 3.4)
class EarlyStopping:
    def __init__(self, patience=5, min_delta=0, restore_best_weights=True):
        self.patience = patience
        self.min_delta = min_delta
        self.restore_best_weights = restore_best_weights
        self.best_model = None
        self.best_loss = None
        self.counter = 0
        self.status = ""

    def __call__(self, model, val_loss):
        if self.best_loss is None:
            self.best_loss = val_loss
            self.best_model = copy.deepcopy(model.state_dict())
        elif self.best_loss - val_loss >= self.min_delta:
            self.best_model = copy.deepcopy(model.state_dict())
            self.best_loss = val_loss
            self.counter = 0
            self.status = f"Improvement found, counter reset to {self.counter}"
        else:
            self.counter += 1
            self.status = f"No improvement in the last {self.counter} epochs"
            if self.counter >= self.patience:
                self.status = f"Early stopping triggered after {self.counter} epochs."
                if self.restore_best_weights:
                    model.load_state_dict(self.best_model)
                return True
        return False

Note: not using Google CoLab
Using device: mps


# Part 4.2: Training Schedules for PyTorch

Learning rate schedules adjust the learning rate over the course of training. Most of them decrease the learning rate as training progresses: the network makes large adjustments in the early stages, when the weights are likely far from their optimal values, and smaller adjustments later to fine-tune the weights. This reduces the risk of overshooting a minimum of the loss function and helps training converge more smoothly.

In PyTorch, one of the learning rate scheduling tools is the **StepLR** class, found in the **torch.optim.lr_scheduler** module. **StepLR** decreases the learning rate by a fixed factor every few epochs, so the rate falls in steps rather than continuously. This can be beneficial in some cases, as it gives the model time to 'settle' into areas of the loss landscape before the learning rate is reduced further.

**StepLR** takes three main parameters:

* **optimizer:** The optimizer you're using to train your model (e.g., SGD, Adam).
* **step_size:** This is the number of epochs after which you want to reduce the learning rate. For instance, if step_size=10, then the learning rate will be reduced every 10 epochs.
* **gamma:** This is the factor by which the learning rate will be reduced at each step. For instance, if gamma=0.1, the learning rate will be multiplied by 0.1 at each step, effectively reducing it by 90%.
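
Taken together, these parameters give a simple closed form for the schedule. As a sketch, write $\eta_0$ for the optimizer's initial learning rate, $s$ for **step_size**, $\gamma$ for **gamma**, and $t$ for the number of times the scheduler has been stepped. These symbols are introduced here only for illustration, and the formula assumes one scheduler step per epoch:

$$\eta_t = \eta_0 \cdot \gamma^{\lfloor t / s \rfloor}$$

For example, with an initial rate of 0.1, step_size=10, and gamma=0.1, the learning rate is 0.1 for epochs 0 through 9, 0.01 for epochs 10 through 19, and 0.001 for epochs 20 through 29.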

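The **StepLR** scheduler is used during the training loop. You call **scheduler.step()** after **optimizer.step()**, typically once per epoch after all mini-batches have been processed. **StepLR** counts calls to **scheduler.step()**, so **step_size** is measured in epochs only when the scheduler is stepped once per epoch; stepping it after every batch would reduce the learning rate every **step_size** batches instead.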
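
The placement of the scheduler call can be seen in a minimal sketch. The tiny linear model, the random data, and the `lr_history` list are hypothetical stand-ins chosen only for illustration; the sketch assumes the scheduler is stepped once per epoch and reads the current rate with `get_last_lr()`.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

torch.manual_seed(0)

# Hypothetical stand-in model and data, for illustration only
model = nn.Linear(4, 1)
x = torch.randn(64, 4)
y = torch.randn(64, 1)

optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)
loss_fn = nn.MSELoss()

lr_history = []
for epoch in range(30):
    # Record the learning rate in effect during this epoch
    lr_history.append(scheduler.get_last_lr()[0])

    # One pass over the data in mini-batches of 16
    for i in range(0, len(x), 16):
        optimizer.zero_grad()
        loss = loss_fn(model(x[i:i + 16]), y[i:i + 16])
        loss.backward()
        optimizer.step()

    # Step the scheduler once per epoch, after the optimizer updates
    scheduler.step()

print(lr_history[0], lr_history[10], lr_history[20])
```

Under these assumptions the recorded rate stays at 0.1 for the first ten epochs and is then multiplied by 0.1 every ten epochs.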

The choice of **step_size** and **gamma** matters and may need to be tuned for your specific problem and dataset. If **step_size** is too large, the learning rate may not fall quickly enough; if it is too small, the rate may fall too quickly. Similarly, a **gamma** too close to 1 may not reduce the learning rate enough, while a **gamma** that is too small may reduce it too quickly.
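
To get a feel for how sensitive the schedule is to **gamma**, the following sketch compares three values over a 500-epoch run with step_size=50. The helper `step_lr` is a hypothetical function written for this illustration (it is not part of PyTorch), and the starting rate of 0.001 is an assumption that matches Adam's default learning rate in PyTorch.

```python
def step_lr(base_lr, epoch, step_size, gamma):
    """Closed-form StepLR value after `epoch` calls to scheduler.step().

    Hypothetical helper for illustration; assumes one scheduler step per epoch.
    """
    return base_lr * gamma ** (epoch // step_size)


BASE_LR = 0.001  # assumed starting rate (Adam's default in PyTorch)
STEP_SIZE = 50
EPOCHS = 500

final_lrs = {}
for gamma in (0.99, 0.90, 0.10):
    final_lrs[gamma] = step_lr(BASE_LR, EPOCHS, STEP_SIZE, gamma)
    print(f"gamma={gamma:.2f}: lr after {EPOCHS} epochs = {final_lrs[gamma]:.3e}")
```

With ten reductions over the run, gamma=0.99 leaves the rate at roughly 90% of its starting value, gamma=0.90 brings it down to about 35%, and gamma=0.10 shrinks it by ten orders of magnitude, at which point training has effectively stopped.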

We now apply a learning rate schedule to the k-fold cross-validation example from the previous section.


In [2]:
import pandas as pd
from scipy.stats import zscore
from sklearn.model_selection import train_test_split

# Read the data set
df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/jh-simple-dataset.csv",
    na_values=['NA','?'])

# Generate dummies for job
df = pd.concat([df,pd.get_dummies(df['job'],prefix="job",dtype=int)],axis=1)
df.drop('job', axis=1, inplace=True)

# Generate dummies for area
df = pd.concat([df,pd.get_dummies(df['area'],prefix="area",dtype=int)],axis=1)
df.drop('area', axis=1, inplace=True)

# Generate dummies for product
df = pd.concat([df,pd.get_dummies(df['product'],prefix="product",dtype=int)],axis=1)
df.drop('product', axis=1, inplace=True)

# Missing values for income
med = df['income'].median()
df['income'] = df['income'].fillna(med)

# Standardize ranges
df['income'] = zscore(df['income'])
df['aspect'] = zscore(df['aspect'])
df['save_rate'] = zscore(df['save_rate'])
df['subscriptions'] = zscore(df['subscriptions'])

Now that the feature vector is created, a 5-fold cross-validation can be performed to generate out-of-sample predictions. We will allow a maximum of 500 epochs per fold and use early stopping to end training sooner once the validation loss stops improving. Later we will see how we can estimate a more optimal epoch count.


In [3]:
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler

# Convert to PyTorch Tensors
x_columns = df.columns.drop(['age', 'id'])
x = torch.tensor(df[x_columns].values, dtype=torch.float32, device=device)
y = torch.tensor(df['age'].values, dtype=torch.float32, device=device).view(-1, 1)

# Set random seed for reproducibility
torch.manual_seed(42)

# Cross-Validate
kf = KFold(n_splits=5, shuffle=True, random_state=42)

# Early stopping uses the EarlyStopping class defaults (patience=5)

fold = 0
for train_idx, test_idx in kf.split(x):
    fold += 1
    print(f"Fold #{fold}")

    x_train, x_test = x[train_idx], x[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]

    # PyTorch DataLoader
    train_dataset = TensorDataset(x_train, y_train)
    train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

    # Create the model and optimizer
    model = nn.Sequential(
        nn.Linear(x.shape[1], 20),
        nn.ReLU(),
        nn.Linear(20, 10),
        nn.ReLU(),
        nn.Linear(10, 1)
    )
    model = torch.compile(model,backend="aot_eager").to(device)
    
    optimizer = optim.Adam(model.parameters())
    # multiply the learning rate by 0.90 every 50 epochs
    scheduler = StepLR(optimizer, step_size=50, gamma=0.90)
    loss_fn = nn.MSELoss()

    # Training loop
    EPOCHS = 500
    epoch = 0
    done = False
    es = EarlyStopping()

    while not done and epoch<EPOCHS:
        epoch += 1
        model.train()
        for x_batch, y_batch in train_loader:
            optimizer.zero_grad()
            output = model(x_batch)
            loss = loss_fn(output, y_batch)
            loss.backward()
            optimizer.step()

        scheduler.step()  # apply learning rate schedule
        # Validation
        model.eval()
        with torch.no_grad():
            val_output = model(x_test)
            val_loss = loss_fn(val_output, y_test)

        if es(model, val_loss):
            done = True

    print(f"Epoch {epoch}/{EPOCHS}, Validation Loss: "
      f"{val_loss.item()}, {es.status}")

# Final evaluation (runs after the fold loop, so only the last fold is scored)
model.eval()
with torch.no_grad():
    oos_pred = model(x_test)
score = torch.sqrt(loss_fn(oos_pred, y_test)).item()
print(f"Fold score (RMSE): {score}")


Fold #1
Epoch 199/500, Validation Loss: 0.5704286694526672, Early stopping triggered after 5 epochs.
Fold #2
Epoch 146/500, Validation Loss: 0.4538898766040802, Early stopping triggered after 5 epochs.
Fold #3
Epoch 165/500, Validation Loss: 0.7377960085868835, Early stopping triggered after 5 epochs.
Fold #4
Epoch 165/500, Validation Loss: 0.4583687484264374, Early stopping triggered after 5 epochs.
Fold #5
Epoch 137/500, Validation Loss: 1.2551430463790894, Early stopping triggered after 5 epochs.
Fold score (RMSE): 1.1172937154769897
