Iris Classification with MLflow and Optuna

This project demonstrates how to perform hyperparameter tuning for a logistic regression model using the Iris dataset. The code leverages MLflow for experiment tracking and Optuna for hyperparameter optimization.

Project Structure

iris_classification.py: Main script for training and tuning the logistic regression model.
mlflow_utils.py: Utility functions for MLflow.
optuna_utils.py: Utility functions for Optuna.
blob_storage_deploy.py: Script for uploading files to Azure Blob Storage.
start_mlflow_server.sh: Script to start the MLflow server.
mlartifacts/: Directory containing MLflow artifacts.
mlruns/: Directory containing MLflow run data.
models/: Directory containing registered models.

Requirements

Python 3.x
MLflow
Optuna
scikit-learn
Azure Storage Blob

Setup

Install the required Python packages:

```
pip install mlflow optuna scikit-learn azure-storage-blob
```

Start the MLflow server:
```
./start_mlflow_server.sh
```

Running the Scripts

Iris Classification

To run the iris_classification.py script, execute the following command:

```
python iris_classification.py
```

Blob Storage Deployment

To run the blob_storage_deploy.py script, execute the following command:

```
python blob_storage_deploy.py
```

Script Explanations

`iris_classification.py`

The iris_classification.py script performs the following steps:

Import Libraries: Imports necessary libraries including MLflow, Optuna, and scikit-learn.
Load Dataset: Loads the Iris dataset using scikit-learn's load_iris function.

Split Data: Splits the dataset into training and validation sets.

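```
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
iris = load_iris()  # 150 samples, 4 features, 3 classes
X_train, X_valid, y_train, y_valid = train_test_split(
    iris.data, iris.target, test_size=0.3
)
```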
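With Iris's 150 samples, `test_size=0.3` holds out 45 rows for validation and keeps 105 for training. The standard-library sketch below reproduces that arithmetic with a plain shuffle-and-slice. `split_indices` is a hypothetical helper for illustration, not scikit-learn's implementation, which also supports stratification and other options. The call above passes no `random_state`, so the split, and therefore the tuning results, can differ between runs. Passing `random_state` to `train_test_split` makes the split reproducible.

```python
import math
import random


def split_indices(n_samples, test_size=0.3, seed=None):
    """Shuffle row indices, then hold out ceil(n_samples * test_size) for validation."""
    n_valid = math.ceil(n_samples * test_size)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # a fixed seed makes the split repeatable
    return indices[n_valid:], indices[:n_valid]  # (train, valid)


train_idx, valid_idx = split_indices(150, test_size=0.3, seed=42)
print(len(train_idx), len(valid_idx))  # 105 45
```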

Set MLflow Experiment: Sets the current active MLflow experiment using the get_or_create_experiment function from mlflow_utils.py.
```
import mlflow
from mlflow_utils import get_or_create_experiment
experiment_id = get_or_create_experiment("Iris Classification")
mlflow.set_experiment(experiment_id=experiment_id)
```
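`mlflow_utils.py` is not reproduced in this README. The sketch below shows one minimal form of a `get_or_create_experiment` helper, assuming it looks the experiment up by name and creates it only when none exists. The optional `client` parameter is a hypothetical addition that lets the logic be exercised with a stub. `MlflowClient.get_experiment_by_name` and `MlflowClient.create_experiment` are real MLflow client methods.

```python
def get_or_create_experiment(experiment_name, client=None):
    """Return the ID of `experiment_name`, creating the experiment if it does not exist."""
    if client is None:
        # Deferred import so the lookup logic can be tested with a stub client.
        from mlflow.tracking import MlflowClient

        client = MlflowClient()
    experiment = client.get_experiment_by_name(experiment_name)  # None when missing
    if experiment is not None:
        return experiment.experiment_id
    return client.create_experiment(experiment_name)  # returns the new experiment ID
```

Because `create_experiment` only runs when the lookup comes back empty, re-running the script reuses the existing "Iris Classification" experiment. Without that check, the script would fail on the duplicate name.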

Start MLflow Run: Initiates an MLflow run and creates an Optuna study for hyperparameter tuning.

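```
import mlflow
import optuna

# Parent run; the Optuna trials are logged as nested child runs under it.
with mlflow.start_run(nested=True):
    study = optuna.create_study(
        direction="maximize",
        study_name="Iris Classification",
        load_if_exists=True,
    )
```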

Optimize Hyperparameters: Uses Optuna to optimize the hyperparameters of a logistic regression model. The logistic_regression_error function from optuna_utils.py is used as the objective function.
```
study.optimize(
    lambda trial: logistic_regression_error(
        trial, X_train, X_valid, y_train, y_valid
    ),
    n_trials=10,
)
```

Load Best Model: Loads the best model of the study from a file if it exists.

```
import os
import joblib
if os.path.exists("best_model.pkl"):
    best_model = joblib.load("best_model.pkl")
```
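The README does not show where `best_model.pkl` comes from. This sketch assumes the objective writes the model to that file whenever a trial beats the best score so far. Both `save_if_best` and `load_best_model` are hypothetical helpers. The sketch uses the standard-library `pickle` module, and `joblib.dump` and `joblib.load` can be swapped in with the same file path.

```python
import os
import pickle

BEST_MODEL_PATH = "best_model.pkl"  # the filename the script checks for


def save_if_best(model, score, best_score, path=BEST_MODEL_PATH):
    """Persist `model` only when `score` beats `best_score` (higher is better).

    Returns the best score after this call.
    """
    if best_score is None or score > best_score:
        with open(path, "wb") as fh:
            pickle.dump(model, fh)
        return score
    return best_score


def load_best_model(path=BEST_MODEL_PATH):
    """Return the persisted model, or None if no trial has saved one yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as fh:
        return pickle.load(fh)
```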

Log Parameters and Metrics: Logs the best hyperparameters and the best accuracy to MLflow.
Set Tags: Logs tags related to the project, optimizer engine, model family, and feature set version.
Train Model: Trains a logistic regression model using the best hyperparameters found by Optuna.
Log Model: Logs the trained model as an artifact in MLflow.
Print Model URI: Prints the URI of the logged model.
Evaluate Model: Evaluates the model on the validation set and logs the evaluation metrics to MLflow.
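The logging steps above can be grouped as in the sketch below, which collects what the parent run records once the study has finished. The helper name `build_best_run_payload`, the metric name `best_accuracy`, and all tag keys and values are hypothetical. They only mirror the tag categories listed above (project, optimizer engine, model family, feature set version). The commented lines use real MLflow functions (`mlflow.log_params`, `mlflow.log_metrics`, `mlflow.set_tags`) and Optuna attributes (`study.best_params`, `study.best_value`).

```python
def build_best_run_payload(best_params, best_value, extra_tags=None):
    """Group the params, metrics, and tags the parent run logs after tuning."""
    params = dict(best_params)  # e.g. study.best_params
    metrics = {"best_accuracy": float(best_value)}  # e.g. study.best_value
    tags = {
        # Hypothetical keys and values for the tag categories in the README.
        "project": "Iris Classification",
        "optimizer_engine": "optuna",
        "model_family": "logistic_regression",
        "feature_set_version": "1",
    }
    tags.update(extra_tags or {})  # caller-supplied tags override the defaults
    return params, metrics, tags


# Inside the parent run (requires mlflow and a finished `study`):
# params, metrics, tags = build_best_run_payload(study.best_params, study.best_value)
# mlflow.log_params(params)
# mlflow.log_metrics(metrics)
# mlflow.set_tags(tags)
```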

`blob_storage_deploy.py`

The blob_storage_deploy.py script performs the following steps:

Import Libraries: Imports necessary libraries including os and azure.storage.blob.
Define Function: Defines the upload_directory_to_blob function to upload a directory to Azure Blob Storage.
Connect to Azure Blob Storage: Connects to Azure Blob Storage using the connection string from the environment variable AZURE_STORAGE_CONNECTION_STRING.
Recursively Upload Files: Recursively uploads each file in the specified local directory to Azure Blob Storage, excluding certain files like requirements.txt, python_env.yaml, and conda.yaml.
Get Blob Client: Gets the blob client for each file and uploads the file to the specified container and blob name.
Handle Exceptions: Handles any exceptions that occur during the upload process and prints an error message.
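The sketch below shows the upload flow described above, assuming a signature of `upload_directory_to_blob(local_dir, container_name, blob_prefix)`. The parameters, the helper `iter_upload_targets`, the prefix scheme, and `overwrite=True` are assumptions rather than the script's actual code. The Azure calls (`BlobServiceClient.from_connection_string`, `get_blob_client`, `upload_blob`) come from the `azure-storage-blob` package.

```python
import os

EXCLUDED = ("requirements.txt", "python_env.yaml", "conda.yaml")


def iter_upload_targets(local_dir, blob_prefix="", excluded=EXCLUDED):
    """Yield (local_path, blob_name) for each file under local_dir, skipping excluded names."""
    for root, _dirs, files in os.walk(local_dir):
        for name in sorted(files):
            if name in excluded:
                continue
            local_path = os.path.join(root, name)
            # Blob names always use "/" regardless of the local OS separator.
            relative = os.path.relpath(local_path, local_dir).replace(os.sep, "/")
            yield local_path, (f"{blob_prefix}/{relative}" if blob_prefix else relative)


def upload_directory_to_blob(local_dir, container_name, blob_prefix=""):
    """Upload a directory tree using the AZURE_STORAGE_CONNECTION_STRING variable."""
    from azure.storage.blob import BlobServiceClient  # deferred: only needed for uploads

    service = BlobServiceClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONNECTION_STRING"]
    )
    for local_path, blob_name in iter_upload_targets(local_dir, blob_prefix):
        try:
            blob_client = service.get_blob_client(container=container_name, blob=blob_name)
            with open(local_path, "rb") as data:
                blob_client.upload_blob(data, overwrite=True)
        except Exception as exc:  # report the failure and continue with the next file
            print(f"Failed to upload {local_path}: {exc}")
```

Separating the directory walk from the upload lets you list the blob names that would be written without Azure credentials. A call might look like `upload_directory_to_blob("mlartifacts", "models", blob_prefix="iris")`, where the container name and prefix are hypothetical.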

Utility Functions

mlflow_utils.py: Contains the get_or_create_experiment function to manage MLflow experiments.
optuna_utils.py: Contains the champion_callback and logistic_regression_error functions for Optuna optimization.
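The sketch below shows a champion-style callback, modeled on the pattern in MLflow's Optuna hyperparameter-tuning tutorial. The repository's own `champion_callback` may differ. It stores the previous best value in `study.user_attrs["winner"]` and prints a message only when a trial sets a new best. It relies only on the real Optuna attributes `Study.best_value`, `Study.user_attrs`, and `Study.set_user_attr`.

```python
def champion_callback(study, frozen_trial):
    """Optuna callback: report each trial that sets a new best value."""
    try:
        best = study.best_value
    except ValueError:  # Optuna raises this until at least one trial has completed
        return
    winner = study.user_attrs.get("winner")
    if winner == best:
        return  # this trial did not improve on the recorded best
    study.set_user_attr("winner", best)
    if winner is None:
        print(f"Initial trial {frozen_trial.number} achieved value: {frozen_trial.value}")
    else:
        # Percentage change, expressed relative to the new best value.
        improvement = abs(best - winner) / best * 100
        print(f"Trial {frozen_trial.number} achieved value: {frozen_trial.value} "
              f"with {improvement:.4f}% improvement")
```

Optuna accepts such a function through the `callbacks` argument, for example `study.optimize(objective, n_trials=10, callbacks=[champion_callback])`.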

Output in MLflow Dashboard

When you run the iris_classification.py script, the following will be logged and displayed in the MLflow dashboard:

Experiment Creation/Loading: An experiment will be created or loaded in MLflow. This experiment will contain all the runs related to the Iris classification project.
Nested Runs: A parent run will be logged within the experiment. It encapsulates several nested child runs, each corresponding to a training run performed during the Optuna hyperparameter tuning process.
Training Runs: At the lowest level, there will be multiple child training runs, one per Optuna trial. Each run represents a different set of hyperparameters evaluated during tuning, and metrics such as accuracy and loss will be logged for each run.
Best Parameters Logging: Once Optuna identifies the best hyperparameters, they will be logged in the parent run, which takes the name of the best trial. The parent run holds the best hyperparameters, the corresponding performance metrics, and the model itself, so there is no need to retrain.
Model Registration: The best model will be loaded from the best trial and registered in the MLflow model registry, with its validation status set to "pending" by default.
Tags and Metrics: Various tags and metrics will be logged for both the parent and child runs, providing detailed information about the experiment, optimizer engine, model family, and feature set version.

By navigating to the MLflow dashboard, you can visualize and analyze the experiment, nested runs, and the performance of different hyperparameter configurations. The registered model can be accessed and managed through the MLflow model registry.

License

This project is licensed under the MIT License. See the LICENSE file for details.
