# **04.2 Model Training - Deep Learning ML**

In [None]:
!pip install torch torchvision mlflow 

In [4]:
import pandas as pd

import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import train_test_split

import mlflow
import mlflow.pytorch


In [5]:
# Path to the processed, model-ready dataset
path = '/lakehouse/default/Files/data/processed/model_ready_data.csv'

# Load the dataset into a pandas DataFrame
df = pd.read_csv(path)

features = [
    'VV', 'SA', 'AR', 'WBLR', 'FA', 'TWD', 'ORF', 'ORR', 'GHBHR', 'UBC',
    'CWVWR', 'CWVR', 'TO', 'ATW', 'BWR', 'CWLR', 'FAR', 'WDFR_RATIO', 'CW_WD_IMPACT',
    'WDF', 'WDR'  
]

# Define features (X) and the three targets (y) for the model
X = df[features]
y = df[['CITYE_KWH/100MI', 'HIGHWAYE_KWH/100MI', 'COMBE_KWH/100MI']]

# Hold out 20% of the rows for testing; the fixed seed keeps the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)



# Energy Consumption Model Training

This notebook defines and trains a neural network that predicts three energy consumption targets from the input features. Both the model and the training process are implemented in PyTorch.

## Model Definition

The `EnergyConsumptionModel` class inherits from `nn.Module`, the base class for all neural network modules in PyTorch. It defines an initializer (`__init__`) and a forward pass method (`forward`).

- **Initializer (`__init__`)**: Takes `input_size`, `hidden_size`, and `output_size` as parameters to define the network architecture. It builds a sequential container (`nn.Sequential`) that chains together:
  - A linear transformation (`nn.Linear`) from input to hidden layer.
  - A ReLU activation function (`nn.ReLU`) for non-linear transformation.
  - Another linear transformation from the hidden layer to the output layer.

- **Forward Pass (`forward`)**: Defines how the data flows through the network. It takes an input tensor `x` and passes it through the network defined in `__init__`, returning the network's output.
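
To get a feel for the size of this architecture, the number of trainable parameters can be worked out by hand: each `nn.Linear` layer holds a weight matrix plus a bias vector, and `nn.ReLU` holds nothing. The sketch below does that arithmetic in plain Python. The helper name `count_parameters` is hypothetical; the sizes (21 inputs, 3 outputs) come from the feature and target lists in the data-loading cell, and the hidden sizes are the ones tried later in this notebook.

```python
def count_parameters(input_size: int, hidden_size: int, output_size: int) -> int:
    """Trainable parameters of Linear -> ReLU -> Linear (hypothetical helper)."""
    # First nn.Linear: one weight per (input, hidden) pair plus one bias per hidden unit.
    hidden_layer = input_size * hidden_size + hidden_size
    # Second nn.Linear: one weight per (hidden, output) pair plus one bias per output.
    output_layer = hidden_size * output_size + output_size
    # nn.ReLU has no trainable parameters, so it adds nothing.
    return hidden_layer + output_layer


# 21 feature columns and 3 target columns, as in the data-loading cell.
for hidden_size in (50, 100, 200):
    print(hidden_size, count_parameters(21, hidden_size, 3))
```

With 100 hidden units this comes to 2,503 parameters, so the model is small enough to train quickly on a CPU.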

## Model Training Function

The `train_model` function is responsible for training the model. It takes training and testing datasets (`X_train`, `y_train`, `X_test`, `y_test`), along with a dictionary of parameters (`params`), and performs the following steps:

1. **Data Preparation**: Converts the training and testing DataFrames into `float32` PyTorch tensors. Tensors are multi-dimensional arrays similar to NumPy arrays that can also be placed on a GPU to accelerate computation; this function does not move them to a GPU.

2. **Model Initialization**: Creates an `EnergyConsumptionModel` with `input_size` set to the number of feature columns, `hidden_size` taken from `params`, and `output_size` set to the number of target columns.

3. **Loss Function and Optimizer**:
   - Uses Mean Squared Error Loss (`nn.MSELoss`) as the criterion for evaluating how well the model is performing.
   - Uses the Adam optimizer (`optim.Adam`) for updating model weights based on the computed gradients, with a learning rate specified in `params`.

4. **Training Loop**: Iterates over the number of epochs specified in `params`, performing the following steps for each epoch:
   - Clears old gradients from the last step (`optimizer.zero_grad()`).
   - Computes the model's output on the full training set. There is no mini-batching, so each epoch is a single gradient step.
   - Calculates the loss between the model's predictions and the actual values.
   - Backpropagates the loss to compute gradients (`loss.backward()`).
   - Updates the model parameters (`optimizer.step()`).
   - Logs the training loss every 10 epochs using `mlflow`, a platform for managing the machine learning lifecycle.

5. **Model Evaluation**: Switches the model to evaluation mode (`model.eval()`), disables gradient computations, and calculates the loss on the testing dataset. Logs the test loss using `mlflow`.

6. **Logging**: Logs training parameters and the trained model using `mlflow` for tracking experiments and model management.
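
For a multi-output target, it helps to be explicit about what the single loss number in step 3 means. With its default `reduction='mean'`, `nn.MSELoss` averages the squared error over every element, that is, over all samples and all three outputs together. The plain-Python sketch below reproduces that arithmetic on made-up numbers; `mse_all_elements` is a hypothetical helper and not part of PyTorch.

```python
def mse_all_elements(predictions, targets):
    """Mean squared error over every element of two equally shaped 2-D lists."""
    squared_errors = [
        (p - t) ** 2
        for pred_row, target_row in zip(predictions, targets)
        for p, t in zip(pred_row, target_row)
    ]
    # Divide by the total element count (samples x outputs), not by the sample count.
    return sum(squared_errors) / len(squared_errors)


# Two made-up samples with three outputs each (city, highway, combined).
predictions = [[30.0, 34.0, 32.0], [26.0, 31.0, 28.0]]
targets = [[28.0, 34.0, 31.0], [26.0, 30.0, 30.0]]

# Squared errors are 4, 0, 1, 0, 1, 4, so the result is 10 / 6.
print(mse_all_elements(predictions, targets))
```

Because the three targets share a unit (kWh/100 mi), a single averaged loss is easy to interpret here. If the targets were on different scales, the largest one would dominate the average.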

This approach demonstrates a structured and modular way to define, train, and evaluate neural network models using PyTorch and manage experiments with MLflow.
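
One detail of the logging step is easy to miss: logging every 10 epochs is implemented with the condition `epoch % 10 == 0`, which records the loss at epochs 0, 10, 20, and so on. The final epoch is therefore not logged, because the last index in `range(epochs)` is `epochs - 1`. The sketch below lists the logged steps for a given epoch count; `logged_steps` is a hypothetical helper used only for illustration.

```python
def logged_steps(epochs: int, every: int = 10) -> list:
    """Epoch indices in range(epochs) for which `epoch % every == 0` holds."""
    return [epoch for epoch in range(epochs) if epoch % every == 0]


# Epoch counts used in the parameter grid of this notebook.
for epochs in (50, 100, 200):
    steps = logged_steps(epochs)
    print(f"epochs={epochs}: {len(steps)} points, last logged step {steps[-1]}")
```

If the final training loss matters when comparing runs, one option is to log it once more after the loop ends.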


In [6]:
class EnergyConsumptionModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, output_size)
        )
    
    def forward(self, x):
        return self.network(x)


def train_model(X_train, y_train, X_test, y_test, params):
    # Convert data to PyTorch tensors
    X_train_tensor = torch.tensor(X_train.values, dtype=torch.float32)
    y_train_tensor = torch.tensor(y_train.values, dtype=torch.float32)
    X_test_tensor = torch.tensor(X_test.values, dtype=torch.float32)
    y_test_tensor = torch.tensor(y_test.values, dtype=torch.float32)

    # Initialize the model
    input_size = X_train.shape[1]
    output_size = y_train.shape[1]
    model = EnergyConsumptionModel(input_size, params['hidden_size'], output_size)

    # Loss function (MSE averaged over all samples and outputs) and optimizer
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=params['learning_rate'])

    # Training loop
    model.train()
    for epoch in range(params['epochs']):
        optimizer.zero_grad()
        output = model(X_train_tensor)
        loss = criterion(output, y_train_tensor)
        loss.backward()
        optimizer.step()

        # Log loss every 10 epochs
        if epoch % 10 == 0:
            mlflow.log_metric("training_loss", loss.item(), step=epoch)

    # Evaluate the model
    model.eval()
    with torch.no_grad():
        predictions = model(X_test_tensor)
        test_loss = criterion(predictions, y_test_tensor)
        mlflow.log_metric("test_loss", test_loss.item())

    # Log parameters and the model
    mlflow.log_params(params)
    mlflow.pytorch.log_model(model, "model")


The next cell uses MLflow to run and track a series of experiments for this multi-output regression problem, in which the model predicts three continuous energy consumption targets at once.

1. **Setting the Experiment Name**: The experiment is uniquely identified as "Multi-output_Energy_Consumption_Prediction". This allows MLflow to organize and track the runs under this specific experiment, facilitating easier comparison and analysis of different models or parameter sets.

2. **Defining Parameter Grid**: A parameter grid is defined as a list of dictionaries, where each dictionary represents a different combination of hyperparameters for the model. This grid includes variations in:
   - `hidden_size`: The number of units in the single hidden layer, which affects the model's capacity to learn complex patterns.
   - `learning_rate`: The step size the optimizer uses when moving toward a minimum of the loss function, which affects how fast the model learns and whether training stays stable.
   - `epochs`: The number of passes over the entire training dataset. Because training is full-batch, this is also the number of optimizer steps.

   Each configuration changes one hyperparameter relative to the baseline (`hidden_size` 100, `learning_rate` 0.001, `epochs` 100), so the effect of that hyperparameter can be observed in isolation.

3. **Iterating Through Parameter Grid**: A for-loop iterates over each set of parameters in the grid. For each set of parameters:
   - A new MLflow run is started using `mlflow.start_run()`. This creates a unique run within the experiment for tracking and logging purposes.
   - The current set of parameters is logged using `mlflow.log_params(params)`. This records which parameters were used in each run, which matters for reproducibility and analysis. `train_model` logs the same parameters again after training, so one of the two calls is redundant.
   - The model is trained with `train_model(X_train, y_train, X_test, y_test, params)`, where `X_train` and `y_train` are the training features and targets and `X_test` and `y_test` are the testing features and targets. This function is defined in the previous code cell; it trains the model with the given parameters and logs the losses and the trained model to the active run.

This approach allows for systematic exploration of the parameter space to understand how different hyperparameters affect model performance, leveraging MLflow for tracking and managing these experiments.
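
The grid in this notebook varies one hyperparameter at a time around a baseline. As an alternative to writing the list by hand, it could be generated as in the sketch below. The names `baseline`, `variations`, and `one_at_a_time_grid` are hypothetical, and the values mirror the ones used in this notebook.

```python
def one_at_a_time_grid(baseline: dict, variations: dict) -> list:
    """Baseline first, then one config per alternative value of each hyperparameter."""
    grid = [dict(baseline)]
    for name, values in variations.items():
        for value in values:
            config = dict(baseline)
            config[name] = value  # change exactly one hyperparameter
            grid.append(config)
    return grid


baseline = {'hidden_size': 100, 'learning_rate': 0.001, 'epochs': 100}
variations = {
    'learning_rate': [0.01, 0.0001],
    'hidden_size': [200, 50],
    'epochs': [200, 50],
}

param_grid = one_at_a_time_grid(baseline, variations)
for params in param_grid:
    print(params)
```

This design keeps the number of runs linear in the number of values tried (seven here), at the cost of not exploring interactions between hyperparameters, which a full Cartesian grid would cover.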


In [8]:
# Set the experiment name
mlflow.set_experiment("Multi-output_Energy_Consumption_Prediction")

# Define params to iterate through
param_grid = [
    {'hidden_size': 100, 'learning_rate': 0.001, 'epochs': 100},    # Baseline
    {'hidden_size': 100, 'learning_rate': 0.01, 'epochs': 100},     # Increased Learning Rate
    {'hidden_size': 100, 'learning_rate': 0.0001, 'epochs': 100},   # Decreased Learning Rate
    {'hidden_size': 200, 'learning_rate': 0.001, 'epochs': 100},    # Increased Hidden Size
    {'hidden_size': 50, 'learning_rate': 0.001, 'epochs': 100},     # Decreased Hidden Size
    {'hidden_size': 100, 'learning_rate': 0.001, 'epochs': 200},    # Increased Epochs
    {'hidden_size': 100, 'learning_rate': 0.001, 'epochs': 50}      # Decreased Epochs
]

for params in param_grid:
    with mlflow.start_run():
        mlflow.log_params(params)  # Log current parameters
        print(f"Training model with params: {params}")
        train_model(X_train, y_train, X_test, y_test, params)


2024/03/03 19:12:28 INFO mlflow.tracking.fluent: Experiment with name 'Multi-output_Energy_Consumption_Prediction' does not exist. Creating a new experiment.

Training model with params: {'hidden_size': 100, 'learning_rate': 0.01, 'epochs': 100}
Training model with params: {'hidden_size': 100, 'learning_rate': 0.0001, 'epochs': 100}
Training model with params: {'hidden_size': 200, 'learning_rate': 0.001, 'epochs': 100}
Training model with params: {'hidden_size': 50, 'learning_rate': 0.001, 'epochs': 100}
Training model with params: {'hidden_size': 100, 'learning_rate': 0.001, 'epochs': 200}
Training model with params: {'hidden_size': 100, 'learning_rate': 0.001, 'epochs': 50}