
MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It provides tools for tracking experiments, packaging and sharing code, and deploying models.

By covering everything from experimentation to deployment in one place, it helps increase productivity, collaboration, and reproducibility in data science projects.

Here are the main components of MLflow:

> 1. Experiment Tracking: This component helps you to track your machine learning experiments by recording and visualizing metrics, parameters, and artifacts. It allows you to easily compare different runs and reproduce results.

> 2. Model Packaging: This component provides a standard format for packaging data science code and trained models in a reusable, reproducible way. You can specify dependencies, such as libraries and data files, and run the same code in different environments. It supports popular frameworks such as TensorFlow, PyTorch, and scikit-learn, and provides tools for deploying models to targets such as Docker containers and cloud services.

> 3. Model Registry: This component provides a centralized repository for managing and sharing machine learning models. It allows you to track model versions, assign permissions, and share models with other users.

### Experiment Tracking


In [None]:
!pip install mlflow --quiet


In [None]:
import mlflow
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import Ridge

mlflow.set_experiment('BostonHousing')

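# Start an MLflow run. It is left active on purpose: later cells save and
# register the model under this run. nested=True allows the cell to be
# re-run while an earlier run is still active.
mlflow.start_run(run_name="run_2", nested=True)

# Log parameters
alpha = 0.5
mlflow.log_param("alpha", alpha)

# Load data
data = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv")

# Split data into features and target
X = data.drop("medv", axis=1)
y = data["medv"]

# Train a Ridge regression model with the logged alpha
model = Ridge(alpha=alpha)
model.fit(X, y)

# Log metrics (MSE on the training data itself, not on a held-out set)
y_pred = model.predict(X)
mse = mean_squared_error(y, y_pred)
mlflow.log_metric("mse", mse)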


2023/04/12 09:42:23 INFO mlflow.tracking.fluent: Experiment with name 'BostonHousing' does not exist. Creating a new experiment.


In MLflow, an experiment is a named container for a set of runs. A run is a single execution of a machine learning training or inference process.

### Model Packaging


In [None]:
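# Save the model to a local directory named "model"
mlflow.sklearn.save_model(model, "model")

# Also log the model as a run artifact so that "runs:/<run_id>/model" resolves
mlflow.sklearn.log_model(model, "model")

# Keep the run active: the next cell reads mlflow.active_run().
# Call mlflow.end_run() once the model has been registered.
# mlflow.end_run()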

### Model Registry


In [None]:
# Register the model in the MLflow registry
run_id_active = mlflow.active_run().info.run_id
model_uri = "runs:/" + run_id_active + "/model"
model_version = mlflow.register_model(model_uri, "MyModel")

Successfully registered model 'MyModel'.
2023/04/12 09:42:28 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: MyModel, version 1
Created version '1' of model 'MyModel'.


In [None]:
!mlflow experiments search -v all 

Experiment Id       Name           Artifact Location                        
------------------  -------------  -----------------------------------------
0                   Default        file:///content/mlruns/0                 
609375362595481804  BostonHousing  file:///content/mlruns/609375362595481804


In [None]:
import mlflow

# Set the name of the experiment
experiment_name = "BostonHousing"

# Get the experiment ID for the experiment with the specified name
experiment_id = mlflow.get_experiment_by_name(experiment_name).experiment_id

# Search for runs associated with the experiment ID (expects a list of IDs)
runs = mlflow.search_runs(experiment_ids=[experiment_id])

# # Print information about each run (search_runs returns a pandas DataFrame)
# for _, run in runs.iterrows():
#     print(f"Run {run['run_id']} completed at {run['end_time']} with status {run['status']}")


In [None]:
runs

|  | run_id | experiment_id | status | artifact_uri | start_time | end_time | metrics.mse | params.alpha | tags.mlflow.user | tags.mlflow.source.name | tags.mlflow.runName | tags.mlflow.source.type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 52bec99a2dd145d19cbd2c9dcede1af2 | 609375362595481804 | RUNNING | file:///content/mlruns/609375362595481804/52be... | 2023-04-12 09:42:24.111000+00:00 |  | 21.952713 | 0.5 | root | /usr/local/lib/python3.9/dist-packages/ipykern... | run_2 | LOCAL |
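
Because `mlflow.search_runs` returns a pandas DataFrame, runs can be compared with ordinary DataFrame operations. The sketch below uses a hand-built frame with the same column naming (`metrics.*`, `params.*`); the run IDs and metric values are made up for illustration and do not come from this notebook.

```python
import pandas as pd

# Hypothetical rows shaped like the output of mlflow.search_runs();
# the run IDs and metric values are invented for this example.
demo_runs_df = pd.DataFrame(
    {
        "run_id": ["run_a", "run_b", "run_c"],
        "status": ["FINISHED", "FINISHED", "RUNNING"],
        "metrics.mse": [24.1, 21.9, 23.0],
        "params.alpha": ["1.0", "0.5", "0.1"],  # parameters come back as strings
    }
)

# Keep finished runs only, then rank them by the logged metric.
finished_runs = demo_runs_df[demo_runs_df["status"] == "FINISHED"]
best_run = finished_runs.sort_values("metrics.mse").iloc[0]

# Convert the parameter back to a number before reusing it.
best_alpha = float(best_run["params.alpha"])
print(best_run["run_id"], best_alpha)
```

Filtering on `status` first avoids picking a run that is still in progress, such as the `RUNNING` row in the output above.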


In [None]:
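import mlflow.pyfunc

# Load the model from the local directory written by save_model
# (to load from the registry instead, use a "models:/<name>/<version>" URI)
model_uri = '/content/model'
loaded_model = mlflow.pyfunc.load_model(model_uri)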

In [None]:
import mlflow

print(mlflow.__version__)

2.2.2


In [None]:
!mlflow deployments --help

Usage: mlflow deployments [OPTIONS] COMMAND
                          [ARGS]...

  Deploy MLflow models to custom targets. Run
  `mlflow deployments help --target-name <target-
  name>` for more details on the supported URI
  format and config options for a given target.
  Support is currently installed for deployment
  to: sagemaker

  See all supported deployment targets and
  installation instructions in https://mlflow.org/
  docs/latest/plugins.html#community-plugins

  You can also write your own plugin for
  deployment to a custom target. For instructions
  on writing and distributing a plugin, see https:
  //mlflow.org/docs/latest/plugins.html#writing-
  your-own-mlflow-plugins.

Options:
  --help  Show this message and exit.

Commands:
  create           Deploy the model at...
  create-endpoint  Create an endpoint with...
  delete           Delete the deployment with...
  delete-endpoint  Delete the specified...
  explain          Generate explanations of...
  get              

In [None]:
# # mlflow.create_experiment('BostonHousing') raises an error if the experiment
# # already exists, so look it up by name instead
# experiment = mlflow.get_experiment_by_name('BostonHousing')
# experiment_id = experiment.experiment_id

# # Search for runs in the experiment and sort them by start time
# runs = mlflow.search_runs(experiment_ids=[experiment_id], order_by=["start_time desc"])

# # Get the run ID of the most recent run
# run_id = runs.iloc[0]["run_id"]