MLflow is a tool for managing end-to-end machine learning pipelines. It helps you track and record machine learning experiments. It has four major components:


1. MLflow Tracking
2. MLflow Projects
3. MLflow Models
4. MLflow Model Registry

I have written about each of them in the blog. 


MLflow concepts:

MLflow Tracking is organized around the concept of runs, which are executions of some piece of data science code. Each run records the following information:

1. Code Version: Git commit hash used for the run, if it was run from an MLflow Project.

2. Start & End Time: Start and end time of the run.

3. Parameters: Key-value input parameters of your choice. Both keys and values are strings.

4. Metrics: Key-value metrics, where the value is numeric. Each metric can be updated throughout the course of the run (for example, to track how your model’s loss function is converging), and MLflow records and lets you visualize the metric’s full history.

5. Artifacts: Output files in any format, such as the trained model, images, or data files.


MLflow runs can be recorded to:
1. local files,
2. a SQLAlchemy-compatible database, or
3. a remote tracking server.

By default, the MLflow Python API logs runs locally to files in an mlruns directory wherever you ran your program. You can then run mlflow ui to see the logged runs.

MLflow uses two components for storage: a backend store and an artifact store.
The **backend store** records run metadata (runs, parameters, metrics, tags, notes, etc.), while the **artifact store** holds the artifacts themselves (files, models, images, in-memory objects, model summaries, etc.).



### MLflow setup:

Tracking server: no

Backend store: local filesystem

Artifacts store: local filesystem


In this example, both the backend store and the artifact store live on the local filesystem.

In [2]:
import mlflow

In [3]:
print(f"tracking URI: '{mlflow.get_tracking_uri()}'")

tracking URI: 'file:///home/shivam/MLflow_examples/Experiment_tracking/mlruns'


So far, there is no mlruns directory. Executing the next cell creates it in the current working directory.

In [4]:
mlflow.list_experiments()
## This will create the mlruns directory
## (list_experiments() was removed in MLflow 2.0; use mlflow.search_experiments() there)

[<Experiment: artifact_location='file:///home/shivam/MLflow_examples/Experiment_tracking/mlruns/0', experiment_id='0', lifecycle_stage='active', name='Default', tags={}>]

**Creating an example and starting a new run**

As described above, each run records its code version, start and end time, parameters, metrics, and artifacts.

In [5]:
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score


In [6]:
mlflow.set_experiment("experiment-local-run")

2023/01/17 15:14:48 INFO mlflow.tracking.fluent: Experiment with name 'experiment-local-run' does not exist. Creating a new experiment.


<Experiment: artifact_location='file:///home/shivam/MLflow_examples/Experiment_tracking/mlruns/1', experiment_id='1', lifecycle_stage='active', name='experiment-local-run', tags={}>

We can log models in two ways: one using **mlflow.sklearn.log_model** (the model-flavor API) and one using **mlflow.log_artifact**. We will see the second approach in a while.

In [7]:

with mlflow.start_run():

    X, y = load_iris(return_X_y=True)

    params = {"C": 0.1, "random_state": 42}
    mlflow.log_params(params)

    lr = LogisticRegression(**params).fit(X, y)
    y_pred = lr.predict(X)
    mlflow.log_metric("accuracy", accuracy_score(y, y_pred))

    mlflow.sklearn.log_model(lr, artifact_path="models")
    print(f"default artifacts URI: '{mlflow.get_artifact_uri()}'")
    mlflow.set_tag(key="Message", value="This uses the mlflow.sklearn.log_model flavor to log the model")

default artifacts URI: 'file:///home/shivam/MLflow_examples/Experiment_tracking/mlruns/1/819b61d672f84cdfa675f6c0ce0a16a9/artifacts'




Now you can run **mlflow ui** from the directory that contains mlruns to see your experiment results. Go to "http://localhost:5000/#/experiments/1" to see this experiment.

So far, "experiment-local-run" has one run and, correspondingly, one folder inside mlruns/1/. Let's now change 'C' to 0.5 and see what happens.



In [8]:
## let's now change C to 0.5 and then record the experiment
mlflow.set_experiment("experiment-local-run")

with mlflow.start_run():

    X, y = load_iris(return_X_y=True)

    params = {"C": 0.5, "random_state": 42}
    mlflow.log_params(params)

    lr = LogisticRegression(**params).fit(X, y)
    y_pred = lr.predict(X)
    mlflow.log_metric("accuracy", accuracy_score(y, y_pred))

    mlflow.sklearn.log_model(lr, artifact_path="models")
    print(f"default artifacts URI: '{mlflow.get_artifact_uri()}'")
    mlflow.set_tag(key="Message", value="This uses the mlflow.sklearn.log_model flavor to log the model but with a different C value")
    

default artifacts URI: 'file:///home/shivam/MLflow_examples/Experiment_tracking/mlruns/1/5265c129630040a9b2ef8657d48eb84e/artifacts'


Go inside the mlruns folder. The folder mlruns/1/ corresponds to the experiment. It now contains two folders, one for each run.

In [11]:
mlflow.list_experiments()

[<Experiment: artifact_location='file:///home/shivam/MLflow_examples/Experiment_tracking/mlruns/1', experiment_id='1', lifecycle_stage='active', name='my-experiment-1', tags={}>,
 <Experiment: artifact_location='file:///home/shivam/MLflow_examples/Experiment_tracking/mlruns/0', experiment_id='0', lifecycle_stage='active', name='Default', tags={}>]

### Creating another experiment

Here, we will change how the model is saved: we pickle it ourselves and log the file with mlflow.log_artifact instead of using a model flavor.

In [14]:
import pickle

In [16]:
import os

mlflow.set_experiment("experiment-local-run-diff-flavor-model")

with mlflow.start_run():

    X, y = load_iris(return_X_y=True)

    params = {"C": 0.8, "random_state": 56}
    mlflow.log_params(params)

    lr = LogisticRegression(**params).fit(X, y)
    y_pred = lr.predict(X)
    mlflow.log_metric("accuracy", accuracy_score(y, y_pred))

    # Pickle the model to a local file first; create the folder if it is missing
    os.makedirs("Local_model_folder", exist_ok=True)
    with open("Local_model_folder/logistic_regression.pkl", "wb") as f:
        pickle.dump(lr, f)

    # Log the pickle as a plain artifact (instead of mlflow.sklearn.log_model)
    mlflow.log_artifact("Local_model_folder/logistic_regression.pkl", artifact_path="models")

    print(f"default artifacts URI: '{mlflow.get_artifact_uri()}'")

2023/01/17 15:48:50 INFO mlflow.tracking.fluent: Experiment with name 'experiment-local-run-diff-flavor-model' does not exist. Creating a new experiment.


default artifacts URI: 'file:///home/shivam/MLflow_examples/Experiment_tracking/mlruns/2/4cf3a14847cf47a0b2cca7c79aa184f5/artifacts'


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
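The warning above comes from scikit-learn, not MLflow, and it names two remedies. Both are sketched here under assumptions: the max_iter value of 1000 is an arbitrary generous cap, and StandardScaler is one possible way to scale the data.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Option 1: give the solver more iterations (1000 is an arbitrary cap).
lr = LogisticRegression(C=0.8, random_state=56, max_iter=1000).fit(X, y)
print("iterations used:", lr.n_iter_)

# Option 2: standardize the features before fitting.
scaled_lr = make_pipeline(
    StandardScaler(), LogisticRegression(C=0.8, random_state=56)
).fit(X, y)
print("iterations used after scaling:", scaled_lr[-1].n_iter_)
```

If you adopt either option inside a run, remember to log the extra setting (for example max_iter) with the other parameters so the run stays reproducible.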


In [18]:
mlflow.list_experiments()

[<Experiment: artifact_location='file:///home/shivam/MLflow_examples/Experiment_tracking/mlruns/1', experiment_id='1', lifecycle_stage='active', name='experiment-local-run', tags={}>,
 <Experiment: artifact_location='file:///home/shivam/MLflow_examples/Experiment_tracking/mlruns/0', experiment_id='0', lifecycle_stage='active', name='Default', tags={}>,
 <Experiment: artifact_location='file:///home/shivam/MLflow_examples/Experiment_tracking/mlruns/2', experiment_id='2', lifecycle_stage='active', name='experiment-local-run-diff-flavor-model', tags={}>]

In [19]:
print("Artifact uri is {}".format(str(mlflow.get_artifact_uri())))

Artifact uri is file:///home/shivam/MLflow_examples/Experiment_tracking/mlruns/2/3534052c37904b068e7101d485316ccb/artifacts


Bonus tip: If you open the UI, you will see differences between the model artifacts of the runs in "experiment-local-run" and those in "experiment-local-run-diff-flavor-model". Saving with an MLflow model flavor gives you a head start, because MLflow stores model metadata (such as an MLmodel file and environment files) alongside the model itself. This is the core idea behind the **MLflow Models** component.