## MLflow with Optuna: Hyperparameter Optimization and Tracking

A critical part of building production-grade models is choosing hyperparameters that give the best possible model performance. However, the sheer number of combinations and their resulting metrics can become overwhelming to track manually. That's where tools like MLflow and Optuna come into play.

### Objective: 
In this notebook, you'll learn how to integrate MLflow with Optuna for hyperparameter optimization. We'll guide you through the process of:

* Setting up your environment with MLflow tracking.
* Generating our training and evaluation data sets.
* Defining an objective function that fits a machine learning model.
* Using Optuna for hyperparameter tuning.
* Leveraging child runs within MLflow to keep track of each iteration during the hyperparameter tuning process.

### Why Optuna?
Optuna is an open-source hyperparameter optimization framework in Python. It provides an efficient approach to searching over hyperparameters, incorporating the latest research and techniques. With its integration into MLflow, every trial can be systematically recorded.

### Child Runs in MLflow:
One of the core features we will be emphasizing is the concept of 'child runs' in MLflow. When performing hyperparameter tuning, each iteration (or trial) in Optuna can be considered a 'child run'. This allows us to group all the runs under one primary 'parent run', ensuring that the MLflow UI remains organized and interpretable. Each child run will track the specific hyperparameters used and the resulting metrics, providing a consolidated view of the entire optimization process.

### What's Ahead?

**Data Preparation**: We'll start by loading and preprocessing our dataset.

**Model Definition**: Defining a machine learning model that we aim to optimize.

**Optuna Study**: Setting up an Optuna study to find the best hyperparameters for our model.

**MLflow Integration**: Tracking each Optuna trial as a child run in MLflow.

**Analysis**: Reviewing the tracked results in the MLflow UI.

By the end of this notebook, you'll have hands-on experience in setting up an advanced hyperparameter tuning workflow, emphasizing best practices and clean organization using MLflow and Optuna. 

**Let's dive in!**

In [1]:
import math
from datetime import datetime, timedelta

import numpy as np
import optuna
import pandas as pd
import xgboost as xgb
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

import mlflow

### Configure the tracking server URI

Depending on where you are running this notebook, your configuration may vary for how you initialize the interface with the MLflow Tracking Server. 

For this example, we're using a locally running tracking server, but other options are available (the easiest is to use the free managed service within [Databricks Community Edition](https://community.cloud.databricks.com/)).

Please see [the guide to running notebooks here](https://www.mlflow.org/docs/latest/getting-started/running-notebooks/index.html) for more information on setting the tracking server URI and configuring access to either managed or self-managed MLflow tracking servers.

In [2]:
# NOTE: review the links mentioned above for guidance on connecting to a managed tracking server, such as the free Databricks Community Edition

mlflow.set_tracking_uri("http://localhost:8080")

In [10]:
experiment_id = get_or_create_experiment("Apples Demand")
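
`get_or_create_experiment` is not defined in the cells shown here. A minimal sketch of what such a helper might look like follows, assuming it simply looks the experiment up by name and creates it when it is missing. It relies on `mlflow.get_experiment_by_name` and `mlflow.create_experiment`; the `client` parameter and the `FakeClient` class are hypothetical additions so the sketch can be exercised without MLflow installed or a tracking server running.

```python
from types import SimpleNamespace


def get_or_create_experiment(experiment_name, client=None):
    """Return the ID of the named experiment, creating the experiment if needed.

    `client` is a hypothetical hook for offline testing; by default the
    top-level `mlflow` module is used.
    """
    if client is None:
        import mlflow  # imported lazily so the offline demo below needs no MLflow install

        client = mlflow

    experiment = client.get_experiment_by_name(experiment_name)
    if experiment is not None:
        return experiment.experiment_id
    return client.create_experiment(experiment_name)


class FakeClient:
    """In-memory stand-in that mimics the two calls used above (illustration only)."""

    def __init__(self):
        self._experiments = {}

    def get_experiment_by_name(self, name):
        return self._experiments.get(name)

    def create_experiment(self, name):
        experiment_id = str(len(self._experiments) + 1)
        self._experiments[name] = SimpleNamespace(experiment_id=experiment_id)
        return experiment_id


fake = FakeClient()
first_id = get_or_create_experiment("Apples Demand", client=fake)
second_id = get_or_create_experiment("Apples Demand", client=fake)
print(first_id == second_id)  # the second call reuses the existing experiment
```

Called without the `client` argument, the helper would talk to whichever tracking server was configured with `mlflow.set_tracking_uri`.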

In [12]:
# Set the current active MLflow experiment
mlflow.set_experiment(experiment_id=experiment_id)

# Preprocess the dataset
X = df.drop(columns=["date", "demand"])
y = df["demand"]
train_x, valid_x, train_y, valid_y = train_test_split(X, y, test_size=0.25)
dtrain = xgb.DMatrix(train_x, label=train_y)
dvalid = xgb.DMatrix(valid_x, label=valid_y)

#### Hyperparameter Tuning and Model Training using Optuna and MLflow

The `objective` function serves as the core of our hyperparameter tuning process using Optuna. Additionally, it trains an XGBoost model using the selected hyperparameters and logs metrics and parameters to MLflow.

##### MLflow Nested Runs

The function starts a new nested run in MLflow. Nested runs are useful for organizing hyperparameter tuning experiments as they allow you to group individual runs under a parent run.

##### Defining Hyperparameters

Optuna's `trial.suggest_*` methods are used to define a range of possible values for hyperparameters. Here's what each hyperparameter does:

- `objective` and `eval_metric`: Define the loss function and evaluation metric.
- `booster`: Type of boosting to be used (`gbtree`, `gblinear`, or `dart`).
- `lambda` and `alpha`: L2 and L1 regularization strengths on the weights, respectively.
- Additional parameters like `max_depth`, `eta`, `gamma`, and `grow_policy` are only set for tree-based boosters (`gbtree` and `dart`).
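
To illustrate what a conditional search space with log-scaled ranges produces, here is a rough sketch that swaps the real Optuna trial for a hypothetical `RandomTrial` stand-in. It draws values uniformly at random, which is not how Optuna's samplers choose values; the point is only to show the effect of `log=True` and of adding tree-specific parameters for some boosters.

```python
import math
import random


class RandomTrial:
    """Hypothetical stand-in for an Optuna trial; samples purely at random."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)

    def suggest_categorical(self, name, choices):
        return self._rng.choice(choices)

    def suggest_float(self, name, low, high, log=False):
        if log:
            # Sample the exponent uniformly so every order of magnitude is equally likely.
            return math.exp(self._rng.uniform(math.log(low), math.log(high)))
        return self._rng.uniform(low, high)

    def suggest_int(self, name, low, high):
        return self._rng.randint(low, high)  # both ends inclusive


def sample_params(trial):
    """Mirror the search space described above."""
    params = {
        "booster": trial.suggest_categorical("booster", ["gbtree", "gblinear", "dart"]),
        "lambda": trial.suggest_float("lambda", 1e-8, 1.0, log=True),
        "alpha": trial.suggest_float("alpha", 1e-8, 1.0, log=True),
    }
    # Tree-specific parameters are only sampled for tree-based boosters.
    if params["booster"] in ("gbtree", "dart"):
        params["max_depth"] = trial.suggest_int("max_depth", 1, 9)
        params["eta"] = trial.suggest_float("eta", 1e-8, 1.0, log=True)
        params["gamma"] = trial.suggest_float("gamma", 1e-8, 1.0, log=True)
        params["grow_policy"] = trial.suggest_categorical(
            "grow_policy", ["depthwise", "lossguide"]
        )
    return params


samples = [sample_params(RandomTrial(seed=i)) for i in range(200)]
small = sum(p["lambda"] < 1e-4 for p in samples)
print(f"{small} of {len(samples)} lambda draws fall below 1e-4")
```

With a log-scaled range from `1e-8` to `1.0`, roughly half of the draws should land below `1e-4`, whereas a linear range would almost never produce values that small.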

##### Model Training

An XGBoost model is trained using the chosen hyperparameters and the preprocessed training dataset (`dtrain`). Predictions are made on the validation set (`dvalid`), and the mean squared error (`mse`) is calculated.
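
As a small worked example of the two metrics, the sketch below computes MSE and RMSE by hand. The numbers are made up for illustration; in the notebook, `sklearn.metrics.mean_squared_error` does the same calculation on the validation set.

```python
import math

actual = [100.0, 110.0, 120.0]     # made-up demand values
predicted = [90.0, 115.0, 130.0]   # made-up model predictions

# MSE: the average of the squared differences.
squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
mse = sum(squared_errors) / len(squared_errors)

# RMSE: the square root of the MSE, expressed in the same units as the target.
rmse = math.sqrt(mse)

print(mse)             # 75.0
print(round(rmse, 4))
```

The objective function returns the MSE to Optuna, while the RMSE is logged alongside it because it is easier to read in units of demand.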

##### Logging with MLflow

All the selected hyperparameters and metrics (`mse` and `rmse`) are logged to MLflow for later analysis and comparison.

- `mlflow.log_params`: Logs the hyperparameters.
- `mlflow.log_metric`: Logs the metrics.

##### Why This Function is Important

- **Automated Tuning**: Optuna automates the process of finding the best hyperparameters.
- **Experiment Tracking**: MLflow allows us to keep track of each run's hyperparameters and performance metrics, making it easier to analyze, compare, and reproduce experiments later.

In the next step, this objective function will be used by Optuna to find the optimal set of hyperparameters for our XGBoost model.


#### Housekeeping: Streamlining Logging for Optuna Trials

Hyperparameter tuning with Optuna can generate a large number of runs: so many, in fact, that the standard output (stdout) from the default logger can quickly fill with pages of log reports.

While the verbosity of the default logging configuration is undeniably valuable during the code development phase, initiating a full-scale trial can result in an overwhelming amount of information. Considering this, logging every single detail to stdout becomes less practical, especially when we have dedicated tools like MLflow to meticulously track our experiments.

To strike a balance, we'll utilize callbacks to tailor our logging behavior.

##### Implementing a Logging Callback:

The callback we're about to introduce will modify the default reporting behavior. Instead of logging every trial, we'll only receive updates when a new hyperparameter combination yields an improvement over the best metric value recorded thus far.

This approach offers two salient benefits:

1. **Enhanced Readability**: By filtering out the extensive log details and focusing only on the trials that show improvement, we can gauge the efficacy of our hyperparameter search. For instance, if we observe a diminishing frequency of 'best result' reports early on, it might suggest that fewer iterations would suffice to pinpoint an optimal hyperparameter set. On the other hand, a consistent rate of improvement might indicate that our feature set requires further refinement.
   
2. **Progress Indicators**: Especially pertinent for extensive trials that span hours or even days, receiving periodic updates provides assurance that the process is still in motion. These 'heartbeat' notifications affirm that our system is diligently at work, even if it's not flooding stdout with every minute detail.

Moreover, MLflow's user interface (UI) complements this strategy. As each trial concludes, MLflow logs the child run, making it accessible under the umbrella of the parent run.

In the ensuing code, we:

1. Adjust Optuna's logging level to report only errors, ensuring a decluttered stdout.
2. Define a `champion_callback` function, tailored to log only when a trial surpasses the previously recorded best metric.

Let's dive into the implementation:


In [13]:
# override Optuna's default logging to ERROR only
optuna.logging.set_verbosity(optuna.logging.ERROR)

# define a logging callback that will report on only new challenger parameter configurations if a
# trial has usurped the state of 'best conditions'


def champion_callback(study, frozen_trial):
    """
    Logging callback that will report when a new trial iteration improves upon existing
    best trial values.

    Note: This callback is not intended for use in distributed computing systems such as Spark
    or Ray due to the micro-batch iterative implementation for distributing trials to a cluster's
    workers or agents.
    The race conditions with file system state management for distributed trials will render
    inconsistent values with this callback.
    """

    winner = study.user_attrs.get("winner", None)

    if study.best_value and winner != study.best_value:
        study.set_user_attr("winner", study.best_value)
        if winner:
            improvement_percent = (abs(winner - study.best_value) / study.best_value) * 100
            print(
                f"Trial {frozen_trial.number} achieved value: {frozen_trial.value} with "
                f"{improvement_percent: .4f}% improvement"
            )
        else:
            print(f"Initial trial {frozen_trial.number} achieved value: {frozen_trial.value}")
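
To see which trials a callback like this would report, the following sketch drives a trimmed-down copy of the logic with a stand-in study. `StubStudy`, `report_if_champion`, and the trial values are all hypothetical; a real Optuna study maintains `best_value` itself.

```python
class StubStudy:
    """Hypothetical stand-in exposing only the attributes the callback reads."""

    def __init__(self):
        self.user_attrs = {}
        self.best_value = None

    def set_user_attr(self, key, value):
        self.user_attrs[key] = value


def report_if_champion(study, trial_number):
    """Trimmed-down copy of the callback logic; returns True when it reports."""
    winner = study.user_attrs.get("winner")
    if study.best_value is not None and winner != study.best_value:
        study.set_user_attr("winner", study.best_value)
        print(f"Trial {trial_number} set a new best value: {study.best_value}")
        return True
    return False


study = StubStudy()
trial_values = [100.0, 120.0, 80.0, 80.0, 60.0]  # made-up MSE values
reported = []

for number, value in enumerate(trial_values):
    # Mimic a study created with direction="minimize".
    if study.best_value is None or value < study.best_value:
        study.best_value = value
    if report_if_champion(study, number):
        reported.append(number)

print(reported)  # [0, 2, 4]
```

Only trials 0, 2, and 4 improve on the best value so far, so those are the only ones that print anything.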

In [14]:
def objective(trial):
    with mlflow.start_run(nested=True):
        # Define hyperparameters
        params = {
            "objective": "reg:squarederror",
            "eval_metric": "rmse",
            "booster": trial.suggest_categorical("booster", ["gbtree", "gblinear", "dart"]),
            "lambda": trial.suggest_float("lambda", 1e-8, 1.0, log=True),
            "alpha": trial.suggest_float("alpha", 1e-8, 1.0, log=True),
        }

        if params["booster"] in ("gbtree", "dart"):
            params["max_depth"] = trial.suggest_int("max_depth", 1, 9)
            params["eta"] = trial.suggest_float("eta", 1e-8, 1.0, log=True)
            params["gamma"] = trial.suggest_float("gamma", 1e-8, 1.0, log=True)
            params["grow_policy"] = trial.suggest_categorical(
                "grow_policy", ["depthwise", "lossguide"]
            )

        # Train XGBoost model
        bst = xgb.train(params, dtrain)
        preds = bst.predict(dvalid)
        error = mean_squared_error(valid_y, preds)

        # Log to MLflow
        mlflow.log_params(params)
        mlflow.log_metric("mse", error)
        mlflow.log_metric("rmse", math.sqrt(error))

    return error

#### Orchestrating Hyperparameter Tuning, Model Training, and Logging with MLflow

This section of the code serves as the orchestration layer, bringing together Optuna for hyperparameter tuning and MLflow for experiment tracking. 

##### Initiating Parent Run

We begin by starting a parent MLflow run with a descriptive name, taken from the `run_name` variable (`"first_attempt"` in this notebook). All subsequent operations, including Optuna's trials, are nested under this parent run, providing a structured way to organize our experiments.

##### Hyperparameter Tuning with Optuna

- `study = optuna.create_study(direction='minimize')`: We create an Optuna study object aiming to minimize our objective function.
- `study.optimize(objective, n_trials=500, callbacks=[champion_callback])`: The `objective` function is optimized over 500 trials, with `champion_callback` controlling what is printed.

##### Logging Best Parameters and Metrics

After Optuna finds the best hyperparameters, we log these, along with the best mean squared error (`mse`) and root mean squared error (`rmse`), to MLflow.

##### Logging Additional Metadata

Using `mlflow.set_tags`, we log additional metadata like the project name, optimization engine, model family, and feature set version. This helps in better categorizing and understanding the context of the model run.

##### Model Training and Artifact Logging

- We train an XGBoost model using the best hyperparameters.
- Various plots—correlation with demand, feature importance, and residuals—are generated and logged as artifacts in MLflow.
  
##### Model Serialization and Logging

Finally, the trained model is logged to MLflow using `mlflow.xgboost.log_model`, along with an example input and additional metadata. The model is stored in a specified artifact path and its URI is retrieved.

##### Why This Block is Crucial

- **End-to-End Workflow**: This code block represents an end-to-end machine learning workflow, from hyperparameter tuning to model evaluation and logging.
- **Reproducibility**: All details about the model, including hyperparameters, metrics, and visual diagnostics, are logged, ensuring that the experiment is fully reproducible.
- **Analysis and Comparison**: With all data logged in MLflow, it becomes easier to analyze the performance of various runs and choose the best model for deployment.

In the next steps, we'll explore how to retrieve and use the logged model for inference.


#### Setting a Descriptive Name for the Model Run

Before proceeding with model training and hyperparameter tuning, it's beneficial to assign a descriptive name to our MLflow run. This name serves as a human-readable identifier, making it easier to track, compare, and analyze different runs.

##### The Importance of Naming Runs:

- **Reference by Name**: While MLflow provides unique identifying keys like `run_id` for each run, having a descriptive name allows for more intuitive referencing, especially when using particular APIs and navigating the MLflow UI.
  
- **Clarity and Context**: A well-chosen run name can provide context about the hypothesis being tested or the specific modifications made, aiding in understanding the purpose and rationale of a particular run.
  
- **Automatic Naming**: If you don't specify a run name, MLflow will generate a unique random name for you. However, this might lack the context and clarity of a manually chosen name.

##### Best Practices:

When naming your runs, consider the following:

1. **Relevance to Code Changes**: The name should reflect any code or parameter modifications made for that run.
2. **Iterative Runs**: If you're executing multiple runs iteratively, it's a good idea to update the run name for each iteration to avoid confusion.

In the subsequent steps, we will set a name for our parent run. Remember, if you execute the model training multiple times, consider updating the run name for clarity.
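
One simple way to keep names distinct across repeated executions is to append a timestamp. The helper below is a sketch of that idea; `make_run_name` and its naming format are illustrative assumptions, not something MLflow provides or requires.

```python
from datetime import datetime


def make_run_name(prefix, now=None):
    """Build a run name such as 'first_attempt_20240131_154500'."""
    now = now or datetime.now()
    return f"{prefix}_{now.strftime('%Y%m%d_%H%M%S')}"


run_name = make_run_name("first_attempt")
print(run_name)
```

The resulting string can be passed as `run_name` to `mlflow.start_run` in the same way as a hand-written name.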


In [15]:
run_name = "first_attempt"

In [16]:
# Initiate the parent run and call the hyperparameter tuning child run logic
with mlflow.start_run(experiment_id=experiment_id, run_name=run_name, nested=True):
    # Initialize the Optuna study
    study = optuna.create_study(direction="minimize")

    # Execute the hyperparameter optimization trials.
    # Note the addition of the `champion_callback` inclusion to control our logging
    study.optimize(objective, n_trials=500, callbacks=[champion_callback])

    mlflow.log_params(study.best_params)
    mlflow.log_metric("best_mse", study.best_value)
    mlflow.log_metric("best_rmse", math.sqrt(study.best_value))

    # Log tags
    mlflow.set_tags(
        tags={
            "project": "Apple Demand Project",
            "optimizer_engine": "optuna",
            "model_family": "xgboost",
            "feature_set_version": 1,
        }
    )

    # Train a final model using the best hyperparameters found
    model = xgb.train(study.best_params, dtrain)

    # Log the correlation plot
    mlflow.log_figure(figure=correlation_plot, artifact_file="correlation_plot.png")

    # Log the feature importances plot
    importances = plot_feature_importance(model, booster=study.best_params.get("booster"))
    mlflow.log_figure(figure=importances, artifact_file="feature_importances.png")

    # Log the residuals plot
    residuals = plot_residuals(model, dvalid, valid_y)
    mlflow.log_figure(figure=residuals, artifact_file="residuals.png")

    artifact_path = "model"

    mlflow.xgboost.log_model(
        xgb_model=model,
        artifact_path=artifact_path,
        input_example=train_x.iloc[[0]],
        model_format="ubj",
        metadata={"model_data_version": 1},
    )

    # Get the logged model uri so that we can load it from the artifact store
    model_uri = mlflow.get_artifact_uri(artifact_path)

Initial trial 0 achieved value: 1593256.879424474
Trial 1 achieved value: 1593250.8071099266 with  0.0004% improvement
Trial 2 achieved value: 30990.735000917906 with  5041.0552% improvement
Trial 5 achieved value: 22804.947010998963 with  35.8948% improvement
Trial 7 achieved value: 18232.507769997483 with  25.0785% improvement
Trial 10 achieved value: 15670.64645523901 with  16.3482% improvement
Trial 11 achieved value: 15561.843005727616 with  0.6992% improvement
Trial 21 achieved value: 15144.954353687495 with  2.7527% improvement
Trial 23 achieved value: 14846.71981618512 with  2.0088% improvement
Trial 55 achieved value: 14570.287261018764 with  1.8972% improvement
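
The very large percentage reported for trial 2 follows from how `champion_callback` computes it: the drop is divided by the new best value rather than by the previous one. The arithmetic can be checked with the values printed for trials 1 and 2; the second figure uses an alternative convention that is shown only for comparison and is not what the callback does.

```python
previous_best = 1593250.8071099266  # value reported for trial 1
new_best = 30990.735000917906       # value reported for trial 2

drop = abs(previous_best - new_best)

# Convention used by `champion_callback`: divide by the new best value.
relative_to_new = drop / new_best * 100

# Alternative convention (an assumption, for comparison): divide by the previous best.
relative_to_previous = drop / previous_best * 100

print(f"{relative_to_new:.4f}% relative to the new best")
print(f"{relative_to_previous:.4f}% relative to the previous best")
```

The first line reproduces the figure reported for trial 2. Dividing by the previous best instead keeps the percentage between 0 and 100 when minimizing a positive metric.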




#### Understanding the Artifact URI in MLflow

The `model_uri` value, which looks like 'mlflow-artifacts:/908436739760555869/c8d64ce51f754eb698a3c09239bcdcee/artifacts/model' (the identifiers in the path will differ for each run), is a Uniform Resource Identifier (URI) for the trained model artifacts within MLflow. This URI is a crucial component of MLflow's architecture, and here's why:

##### Simplified Access to Model Artifacts

The `model_uri` abstracts away the underlying storage details, providing a consistent and straightforward way to reference model artifacts, regardless of where they are stored. Whether your artifacts are on a local filesystem, in a cloud storage bucket, or on a network mount, the URI remains a consistent reference point.

##### Abstraction of Storage Details

MLflow is designed to be storage-agnostic. This means that while you might switch the backend storage from, say, a local directory to an Amazon S3 bucket, the way you interact with MLflow remains consistent. The URI ensures that you don't need to know the specifics of the storage backend; you only need to reference the model's URI.

##### Associated Information and Metadata

Beyond just the model files, the URI provides access to associated metadata, the model artifact, and other logged artifacts (files and images). This ensures that you have a comprehensive set of information about the model, aiding in reproducibility, analysis, and deployment.

##### In Summary

The `model_uri` serves as a consistent, abstracted reference to your model and its associated data. It simplifies interactions with MLflow, ensuring that users don't need to worry about the specifics of underlying storage mechanisms and can focus on the machine learning workflow.
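
For a closer look at what the URI contains, the sketch below splits one apart with the standard library. Reading the first two path segments as the experiment ID and the run ID is an assumption based on the layout seen in this notebook; other artifact stores may use a different layout.

```python
from urllib.parse import urlparse

model_uri = (
    "mlflow-artifacts:/908436739760555869/"
    "c28196b19e1843bca7e22f07d796e740/artifacts/model"
)

parsed = urlparse(model_uri)
segments = parsed.path.strip("/").split("/")

print(parsed.scheme)  # the artifact-store scheme
print(segments[0])    # assumed to be the experiment ID
print(segments[1])    # assumed to be the run ID
print(segments[2:])   # location of the model within the run's artifacts
```

In normal use there is no need to parse the URI at all; it is passed as-is to loaders such as `mlflow.xgboost.load_model`.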


In [17]:
model_uri

'mlflow-artifacts:/908436739760555869/c28196b19e1843bca7e22f07d796e740/artifacts/model'

#### Loading the Trained Model with MLflow

With the line:

```python
loaded = mlflow.xgboost.load_model(model_uri)
```
we're leveraging MLflow's native model loader for XGBoost. Instead of using the generic pyfunc loader, which provides a universal Python function interface for models, we're using the XGBoost-specific loader.

##### Benefits of Native Loading:

- **Fidelity**: Loading the model using the native loader ensures that you're working with the exact same model object as it was during training. This means all nuances, specifics, and intricacies of the original model are preserved.

- **Functionality**: With the native model object in hand, you can utilize all of its inherent methods and properties. This allows for more flexibility, especially when you need advanced features or fine-grained control during inference.

- **Performance**: Using the native model object might offer performance benefits, especially when performing batch inference or deploying the model in environments optimized for the specific machine learning framework.

In essence, by loading the model natively, we ensure maximum compatibility and functionality, allowing for a seamless transition from training to inference.

In [18]:
loaded = mlflow.xgboost.load_model(model_uri)


#### Example: Batch Inference Using the Loaded Model

After loading the model natively, performing batch inference is straightforward. 

In the following cell, we're going to perform a prediction based on the entire source feature set. 
Although doing an inference action on the entire training and validation dataset features is of very limited utility in a real-world application, we'll use our generated synthetic data here to illustrate using the native model for inference. 


#### Performing Batch Inference and Augmenting Data

In this section, we're taking our entire dataset and performing batch inference using our loaded XGBoost model. We'll then append these predictions back into our original dataset to compare, analyze, or further process.

##### Steps Explained:

1. **Creating a DMatrix**: `batch_dmatrix = xgb.DMatrix(X)`: We first convert our features (`X`) into XGBoost's optimized DMatrix format. This data structure is specifically designed for efficiency and speed in XGBoost.

2. **Predictions**: `inference = loaded.predict(batch_dmatrix)`: Using the previously loaded model (`loaded`), we perform batch inference on the entire dataset.

3. **Creating a New DataFrame**: `infer_df = df.copy()`: We create a copy of the original DataFrame to ensure that we're not modifying our original data.

4. **Appending Predictions**: `infer_df["predicted_demand"] = inference`: The predictions are then added as a new column, `predicted_demand`, to this DataFrame.

##### Best Practices:

- **Always Copy Data**: When augmenting or modifying datasets, it's generally a good idea to work with a copy. This ensures that the original data remains unchanged, preserving data integrity.

- **Batch Inference**: When predicting on large datasets, using batch inference (as opposed to individual predictions) can offer significant speed improvements.

- **DMatrix Conversion**: Converting to DMatrix might seem like an extra step, but the native XGBoost `Booster.predict` method expects a `DMatrix` as input, and the format is optimized for XGBoost's internal computations.

In the subsequent steps, we can further analyze the differences between the actual demand and our model's predicted demand, potentially visualizing the results or calculating performance metrics.


In [19]:
batch_dmatrix = xgb.DMatrix(X)

inference = loaded.predict(batch_dmatrix)

infer_df = df.copy()

infer_df["predicted_demand"] = inference

#### Visualizing the Augmented DataFrame

Below, we display the `infer_df` DataFrame. This augmented dataset now includes both the actual demand (`demand`) and the model's predictions (`predicted_demand`). By examining this table, we can get a quick sense of how well our model's predictions align with the actual demand values.


In [20]:
infer_df

| | date | average_temperature | rainfall | weekend | holiday | price_per_kg | promo | demand | previous_days_demand | competitor_price_per_kg | marketing_intensity | predicted_demand |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2010-01-14 11:52:20.662955 | 30.584727 | 1.199291 | 0 | 0 | 1.726258 | 0 | 851.375336 | 851.276659 | 1.935346 | 0.098677 | 953.708496 |
| 1 | 2010-01-15 11:52:20.662954 | 15.465069 | 1.037626 | 0 | 0 | 0.576471 | 0 | 906.855943 | 851.276659 | 2.344720 | 0.019318 | 1013.409973 |
| 2 | 2010-01-16 11:52:20.662954 | 10.786525 | 5.656089 | 1 | 0 | 2.513328 | 0 | 1108.304909 | 906.836626 | 0.998803 | 0.409485 | 1152.382446 |
| 3 | 2010-01-17 11:52:20.662953 | 23.648154 | 12.030937 | 1 | 0 | 1.839225 | 0 | 1099.833810 | 1157.895424 | 0.761740 | 0.872803 | 1352.879272 |
| 4 | 2010-01-18 11:52:20.662952 | 13.861391 | 4.303812 | 0 | 0 | 1.531772 | 0 | 983.949061 | 1148.961007 | 2.123436 | 0.820779 | 1121.233032 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4995 | 2023-09-18 11:52:20.659592 | 21.643051 | 3.821656 | 0 | 0 | 2.391010 | 0 | 1140.210762 | 1563.064082 | 1.504432 | 0.756489 | 1070.676636 |
| 4996 | 2023-09-19 11:52:20.659591 | 13.808813 | 1.080603 | 0 | 1 | 0.898693 | 0 | 1285.149505 | 1189.454273 | 1.343586 | 0.742145 | 1156.580688 |
| 4997 | 2023-09-20 11:52:20.659590 | 11.698227 | 1.911000 | 0 | 0 | 2.839860 | 0 | 965.171368 | 1284.407359 | 2.771896 | 0.742145 | 1086.527710 |
| 4998 | 2023-09-21 11:52:20.659589 | 18.052081 | 1.000521 | 0 | 0 | 1.188440 | 0 | 1368.369501 | 1014.429223 | 2.564075 | 0.742145 | 1085.064087 |
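
As a quick check of how closely the predictions track actual demand, the sketch below computes the residuals, mean absolute error, and RMSE for the first five rows displayed above. It uses plain Python lists copied from the table rather than the full `infer_df`, so the numbers describe only those five rows.

```python
import math

# `demand` and `predicted_demand` for rows 0-4 of the displayed table
actual = [851.375336, 906.855943, 1108.304909, 1099.833810, 983.949061]
predicted = [953.708496, 1013.409973, 1152.382446, 1352.879272, 1121.233032]

# Positive residuals mean the model predicted more demand than was observed.
residuals = [p - a for a, p in zip(actual, predicted)]

mae = sum(abs(r) for r in residuals) / len(residuals)
rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))

print([round(r, 2) for r in residuals])
print(f"MAE:  {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
```

On these five rows every residual is positive, meaning the model over-predicts each time. Five rows are far too few to draw conclusions from, so the same calculation would need to run over the whole DataFrame for a meaningful picture.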


#### Wrapping Up: Reflecting on Our Comprehensive Machine Learning Workflow

Throughout this guide, we embarked on a detailed exploration of an end-to-end machine learning workflow. We began with data preprocessing, delved deeply into hyperparameter tuning with Optuna, leveraged MLflow for structured experiment tracking, and concluded with batch inference. 

##### Key Takeaways:

- **Hyperparameter Tuning with Optuna**: We harnessed the power of Optuna to systematically search for the best hyperparameters for our XGBoost model, aiming to optimize its performance.

- **Structured Experiment Tracking with MLflow**: MLflow's capabilities shone through as we logged experiments, metrics, parameters, and artifacts. We also explored the benefits of nested child runs, allowing us to logically group and structure our experiment iterations.

- **Model Interpretation**: Various plots and metrics equipped us with insights into our model's behavior. We learned to appreciate its strengths and identify potential areas for refinement.

- **Batch Inference**: The nuances of batch predictions on extensive datasets were explored, alongside methods to seamlessly integrate these predictions back into our primary data.

- **Logging Visual Artifacts**: A significant portion of our journey emphasized the importance of logging visual artifacts, like plots, to MLflow. These visuals serve as invaluable references, capturing the state of the model, its performance, and any alterations to the feature set that might sway the model's performance metrics.

By the end of this guide, you should possess a robust understanding of a well-structured machine learning workflow. This foundation not only empowers you to craft effective models but also ensures that each step, from data wrangling to predictions, is transparent, reproducible, and efficient.

We're grateful you accompanied us on this comprehensive journey. The practices and insights gleaned will undoubtedly be pivotal in all your future machine learning endeavors!
