# **MLflow: A Unified Platform for Experiment Tracking and Model Registry**

MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It provides a suite of tools and components designed to streamline the development, experimentation, productionization, and collaboration aspects of machine learning projects. MLflow is widely used by data scientists, machine learning engineers, and researchers to track experiments, package and share code, and deploy models at scale.

## **Key Features:**
1. Experiment Tracking
2. Model Registry

<img src="img/tracking_experiments.PNG">


## **Introduction to Experiment Tracking**

#### **Why Track?**
1. Organization  
2. Optimization  
3. Reproducibility  


#### **What do you want to track for each Experiment Run?**
1. Training and Validation Data Used
2. Hyperparameters
3. Metrics
4. Models
5. Training Time
6. Model Size


#### **Tool - MLflow**  
MLflow organizes your work into experiments, each made up of individual runs (one run per execution of your training code).


#### **Terminology:**
1. Experiment  
2. Run  
3. Metadata (i.e. tags, parameters, metrics)  
4. Artifacts (i.e. output files associated with experiment runs)


#### **MLflow keeps track of:**
> Tags  
> Parameters  
> Metrics  
> Models  
> Artifacts  
> Source code, start and end time, author, etc.


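**Run the following command to install MLflow on your system:**  
```
pip install mlflow
```

**Run one of the following commands to open the MLflow dashboard (served at http://127.0.0.1:5000 by default):**
```
# Launch the UI against the default local store (./mlruns)
mlflow ui
# Or launch it against a SQLite database instead
mlflow ui --backend-store-uri sqlite:///mlflow.db
```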

## **Loading the Data**

```python
%pip install seaborn
```

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```

```python
iris = pd.read_csv('data/iris.csv')
```

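```python
iris.head()
```

|   | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|----|---------------|--------------|---------------|--------------|---------|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |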


```python
print(iris.shape)
```
```
(150, 6)
```


## **Running the Experiment**

```python
from sklearn.model_selection import train_test_split

from sklearn.preprocessing import StandardScaler, MinMaxScaler

from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

from sklearn.model_selection import GridSearchCV

from sklearn.pipeline import Pipeline
```

```python
import warnings

# Suppress all warnings (e.g. convergence warnings) to keep the grid-search output readable
warnings.filterwarnings('ignore')
```

```python
X = iris[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']]

y = iris['Species']
```

```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

print(X_train.shape, X_test.shape)
```
```
(112, 4) (38, 4)
```
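
The split sizes follow from how scikit-learn rounds a fractional `test_size`: the test count is rounded up and, with `train_size` left unset, the training set takes the remaining rows. A worked sketch of that arithmetic in plain Python (it mirrors the rule rather than calling scikit-learn):

```python
import math

n_samples = 150     # rows in iris.csv, from iris.shape above
test_size = 0.25    # fraction passed to train_test_split

# A float test_size is rounded up to a whole number of rows: ceil(37.5) = 38
n_test = math.ceil(test_size * n_samples)
# With train_size left unset, the training set gets the remaining rows
n_train = n_samples - n_test

print(n_train, n_test)  # 112 38
```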


## **Auto Logging KNN Experiment Run using MLflow**

**Step 1 - Import MLflow and set the experiment name**
```python
import mlflow

# Creates the experiment if it does not exist yet, then makes it the active one
mlflow.set_experiment("EXPERIMENT_NAME")
```

**Step 2 - Start the auto logger**
```python
# Initialize the auto logger.
# max_tuning_runs=None records a child run for every parameter combination;
# by default only the best 5 combinations of each search are recorded.
mlflow.sklearn.autolog(max_tuning_runs=None)
```
**Step 3 - Start the experiment run**
```python
# clf is any scikit-learn estimator, e.g. the GridSearchCV object defined below
with mlflow.start_run() as run:
    clf.fit(X_train, y_train)
```



<img src="img/tracking_experiments_hyperparameters.JPG">

```python
import mlflow

mlflow.set_experiment("iris_species_prediction")
```
```
<Experiment: artifact_location='file:///G:/Innomatics%20Internship/Task%20-%20%20Experiment%20Tracking%20and%20Model%20Management/mlruns/338614161403077691', creation_time=1713371809282, experiment_id='338614161403077691', last_update_time=1713371809282, lifecycle_stage='active', name='iris_species_prediction', tags={}>
```

```python
# Define pipeline steps
pipe_1 = Pipeline(
    [
        ('scaler', StandardScaler()),
        ('classifier', KNeighborsClassifier())
    ]
)


# Keys use the '<step name>__<parameter>' format to reach into pipeline steps;
# the bare 'scaler' key swaps the whole scaler step
parameter_grid_1 = [
    {
        'scaler': [StandardScaler(), MinMaxScaler()],
        'classifier__n_neighbors': list(range(3, 21, 2)),
        'classifier__p': [1, 2, 3]
    }
]
```

```python
clf = GridSearchCV(
    estimator=pipe_1,
    param_grid=parameter_grid_1,
    scoring='accuracy',
    cv=5,
    return_train_score=True,
    verbose=1
)

# Initialize the auto logger.
# max_tuning_runs=None records a child run for every parameter combination;
# by default only the best 5 combinations of each search are recorded.
mlflow.sklearn.autolog(max_tuning_runs=None)

with mlflow.start_run() as run:
    %time clf.fit(X_train, y_train)
```
```
Fitting 5 folds for each of 54 candidates, totalling 270 fits
CPU times: total: 21 s
Wall time: 24 s
```
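
The "54 candidates, totalling 270 fits" line follows directly from `parameter_grid_1`: a single grid dict expands to the Cartesian product of its value lists, and each candidate is fitted once per CV fold. A worked sketch of that count in plain Python (placeholder strings stand in for the scaler objects, since only the number of options matters):

```python
import math

# Same value lists as parameter_grid_1; strings stand in for the scaler objects
knn_grid = {
    "scaler": ["StandardScaler()", "MinMaxScaler()"],   # 2 options
    "classifier__n_neighbors": list(range(3, 21, 2)),   # 3, 5, ..., 19 -> 9 options
    "classifier__p": [1, 2, 3],                         # 3 options
}

# One grid dict -> Cartesian product of its value lists
n_candidates = math.prod(len(values) for values in knn_grid.values())
n_folds = 5
print(n_candidates, n_candidates * n_folds)  # 54 270
```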


## **Auto Logging SVM Experiment Run using MLflow**

```python
pipe_2 = Pipeline(
    [
        ('scaler', StandardScaler()),
        ('classifier', SVC())
    ]
)


# A list of grids: each dict is searched separately, so 'classifier__degree'
# is only combined with the 'poly' kernel
parameter_grid_2 = [
    {
        'scaler': [StandardScaler(), MinMaxScaler()],
        'classifier__kernel': ['rbf'],
        'classifier__C': [0.1, 0.01, 1, 10, 100]
    },
    {
        'scaler': [StandardScaler(), MinMaxScaler()],
        'classifier__kernel': ['poly'],
        'classifier__degree': [2, 3, 4, 5],
        'classifier__C': [0.1, 0.01, 1, 10, 100]
    },
    {
        'scaler': [StandardScaler(), MinMaxScaler()],
        'classifier__kernel': ['linear'],
        'classifier__C': [0.1, 0.01, 1, 10, 100]
    }
]
```

```python
clf = GridSearchCV(
    estimator=pipe_2,
    param_grid=parameter_grid_2,
    scoring='accuracy',
    cv=5,
    return_train_score=True,
    verbose=1
)

# Initialize the auto logger.
# max_tuning_runs=None records a child run for every parameter combination;
# by default only the best 5 combinations of each search are recorded.
mlflow.sklearn.autolog(max_tuning_runs=None)

with mlflow.start_run() as run:
    %time clf.fit(X_train, y_train)
```
```
Fitting 5 folds for each of 60 candidates, totalling 300 fits
CPU times: total: 22.3 s
Wall time: 31.5 s
```
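
`parameter_grid_2` is a list of three dicts. `GridSearchCV` expands each dict on its own and adds up the results, which gives the 60 candidates above. A plain-Python sketch of that arithmetic (strings stand in for the scaler objects, and the `classifier__` prefixes are dropped for brevity):

```python
import math

# Value lists copied from parameter_grid_2; strings stand in for scaler objects
svc_grids = [
    {"scaler": ["std", "minmax"], "kernel": ["rbf"], "C": [0.1, 0.01, 1, 10, 100]},
    {"scaler": ["std", "minmax"], "kernel": ["poly"], "degree": [2, 3, 4, 5],
     "C": [0.1, 0.01, 1, 10, 100]},
    {"scaler": ["std", "minmax"], "kernel": ["linear"], "C": [0.1, 0.01, 1, 10, 100]},
]

# Each dict is expanded separately; a list of dicts sums the sub-grid sizes
sizes = [math.prod(len(v) for v in grid.values()) for grid in svc_grids]
n_candidates = sum(sizes)
print(sizes, n_candidates, n_candidates * 5)  # [10, 40, 10] 60 300
```

Keeping `degree` in its own sub-grid matters. A single flat grid over all three kernels and four degrees would have 2 × 3 × 4 × 5 = 120 candidates, 60 of them redundant rbf and linear fits that differ only in the ignored `degree` value.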


## **Auto Logging All Experiment Runs using MLflow**

```python
# One pipeline per algorithm; the 'scaler' step is swapped through the grids below
pipelines = {
    'knn': Pipeline([
        ('scaler', StandardScaler()),
        ('classifier', KNeighborsClassifier())
    ]),
    'svc': Pipeline([
        ('scaler', StandardScaler()),
        ('classifier', SVC())
    ]),
    'logistic_regression': Pipeline([
        ('scaler', StandardScaler()),
        ('classifier', LogisticRegression())
    ]),
    'random_forest': Pipeline([
        ('scaler', StandardScaler()),
        ('classifier', RandomForestClassifier())
    ]),
    'decision_tree': Pipeline([
        ('scaler', StandardScaler()),
        ('classifier', DecisionTreeClassifier())
    ]),
    'naive_bayes': Pipeline([
        ('scaler', StandardScaler()),
        ('classifier', GaussianNB())
    ])
}

# Define parameter grid for each algorithm
param_grids = {
    'knn': [
        {
            'scaler': [StandardScaler(), MinMaxScaler()],
            'classifier__n_neighbors': list(range(3, 21, 2)),
            'classifier__p': [1, 2, 3]
        }
    ],
    'svc': [
        {
            'scaler': [StandardScaler(), MinMaxScaler()],
            'classifier__kernel': ['rbf'],
            'classifier__C': [0.1, 0.01, 1, 10, 100]
        },
        {
            'scaler': [StandardScaler(), MinMaxScaler()],
            'classifier__kernel': ['poly'],
            'classifier__degree': [2, 3, 4, 5],
            'classifier__C': [0.1, 0.01, 1, 10, 100]
        },
        {
            'scaler': [StandardScaler(), MinMaxScaler()],
            'classifier__kernel': ['linear'],
            'classifier__C': [0.1, 0.01, 1, 10, 100]
        }
    ],
    'logistic_regression': [
        {
            'scaler': [StandardScaler(), MinMaxScaler()],
            'classifier__C': [0.1, 1, 10],
            'classifier__penalty': ['l2']
        },
        {
            'scaler': [StandardScaler(), MinMaxScaler()],
            'classifier__C': [0.1, 1, 10],
            'classifier__penalty': ['l1'],
            'classifier__solver': ['liblinear']
        },
        {
            'scaler': [StandardScaler(), MinMaxScaler()],
            'classifier__C': [0.1, 1, 10],
            'classifier__penalty': ['elasticnet'],
            'classifier__l1_ratio': [0.4, 0.5, 0.6],
            'classifier__solver': ['saga']
        }
    ],
    'random_forest': [
        {
            'scaler': [StandardScaler(), MinMaxScaler()],
            'classifier__n_estimators': [50, 100, 200]
        }
    ],
    'decision_tree': [
        {
            'scaler': [StandardScaler(), MinMaxScaler()],
            'classifier__max_depth': [None, 5, 10]
        }
    ],
    'naive_bayes': [
        {
            'scaler': [StandardScaler(), MinMaxScaler()]
        }
    ]
}
```

```python
best_models = {}

# Run a grid search for every pipeline, each inside its own MLflow run
for algo in pipelines.keys():
    print("*"*10, algo, "*"*10)
    grid_search = GridSearchCV(estimator=pipelines[algo],
                               param_grid=param_grids[algo],
                               cv=5,
                               scoring='accuracy',
                               return_train_score=True,
                               verbose=1
                              )

    mlflow.sklearn.autolog(max_tuning_runs=None)

    with mlflow.start_run() as run:
        %time grid_search.fit(X_train, y_train)

    # best_score_ is the mean cross-validated accuracy of the best candidate
    print('Train Score: ', grid_search.best_score_)
    print('Test Score: ', grid_search.score(X_test, y_test))

    best_models[algo] = grid_search.best_estimator_
    print()
```
```
********** knn **********
Fitting 5 folds for each of 54 candidates, totalling 270 fits
CPU times: total: 14.4 s
Wall time: 19 s
Train Score:  0.9644268774703558
Test Score:  0.9736842105263158

********** svc **********
Fitting 5 folds for each of 60 candidates, totalling 300 fits
CPU times: total: 12.9 s
Wall time: 18 s
Train Score:  0.9644268774703558
Test Score:  0.9736842105263158

********** logistic_regression **********
Fitting 5 folds for each of 30 candidates, totalling 150 fits
CPU times: total: 6.97 s
Wall time: 11.7 s
Train Score:  0.9640316205533598
Test Score:  0.9736842105263158

********** random_forest **********
Fitting 5 folds for each of 6 candidates, totalling 30 fits
CPU times: total: 12.6 s
Wall time: 18.8 s
Train Score:  0.9553359683794467
Test Score:  0.9736842105263158

********** decision_tree **********
Fitting 5 folds for each of 6 candidates, totalling 30 fits
CPU times: total: 1.86 s
Wall time: 7.19 s
Train Score:  0.9640316205533598
Test Score:  0.97368
```

```python
# Stop the auto logger

mlflow.sklearn.autolog(disable=True)
```

## **Custom Experiment Tracking and Database Integration with MLflow**

**Step 1 - Import MLFlow**
```python
import mlflow
```

**Step 2 - Set the tracker and experiment**
```python
mlflow.set_tracking_uri(DATABASE_URI)  # e.g. "sqlite:///mlflow.db"
mlflow.set_experiment("EXPERIMENT_NAME")
```

**Step 3 - Start an experiment run**
```python
with mlflow.start_run():
    ...  # logging calls from Steps 4 and 5 go here
```

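**Step 4 - Logging the metadata**
```python
mlflow.set_tag(KEY, VALUE)     # free-form label, e.g. the developer's name
mlflow.log_param(KEY, VALUE)   # input setting, stored as a string
mlflow.log_metric(KEY, VALUE)  # numeric result, e.g. accuracy
```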

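**Step 5 - Logging the model and other files (two ways)**  
Way 1 - log a model in MLflow's format for its framework (e.g. `mlflow.sklearn`):
```python
mlflow.<FRAMEWORK>.log_model(MODEL_OBJECT, artifact_path="PATH")
```
Way 2 - log any local file as an artifact:
```python
mlflow.log_artifact(LOCAL_PATH, artifact_path="PATH")
```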

```python
import time
import joblib
import os
```

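```python
# Uncomment to log to a SQLite database instead of the default local store;
# view it with: mlflow ui --backend-store-uri sqlite:///mlflow_1.db
# mlflow.set_tracking_uri("sqlite:///mlflow_1.db")

# mlflow.set_experiment("Iris Species Prediction")
```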

```python
dev = "Kanav Bansal"
best_models = {}

# joblib.dump does not create folders, so make sure the target exists
os.makedirs('best_models', exist_ok=True)

for algo in pipelines.keys():
    print("*"*10, algo, "*"*10)
    grid_search = GridSearchCV(estimator=pipelines[algo],
                               param_grid=param_grids[algo],
                               cv=5,
                               scoring='accuracy',
                               return_train_score=True,
                               verbose=1
                              )

    # Fit
    start_fit_time = time.time()
    grid_search.fit(X_train, y_train)
    end_fit_time = time.time()

    # Predict
    start_predict_time = time.time()
    y_pred = grid_search.predict(X_test)
    end_predict_time = time.time()

    # Saving the best model
    joblib.dump(grid_search.best_estimator_, f'best_models/{algo}.pkl')
    model_size = os.path.getsize(f'best_models/{algo}.pkl')  # size on disk in bytes

    # best_score_ is the mean cross-validated accuracy, not accuracy on the training set
    train_score = grid_search.best_score_
    test_score = grid_search.score(X_test, y_test)

    # Print log
    print('Train Score: ', train_score)
    print('Test Score: ', test_score)
    print("Fit Time: ", end_fit_time - start_fit_time)
    print("Predict Time: ", end_predict_time - start_predict_time)
    print("Model Size: ", model_size)

    print()

    # Start the experiment run
    with mlflow.start_run() as run:
        # Log tags with mlflow.set_tag()
        mlflow.set_tag("developer", dev)

        # Log parameters with mlflow.log_param()
        mlflow.log_param("algorithm", algo)
        mlflow.log_param("hyperparameter_grid", param_grids[algo])
        mlflow.log_param("best_hyperparameter", grid_search.best_params_)

        # Log metrics with mlflow.log_metric()
        mlflow.log_metric("train_score", train_score)
        mlflow.log_metric("test_score", test_score)
        mlflow.log_metric("fit_time", end_fit_time - start_fit_time)
        mlflow.log_metric("predict_time", end_predict_time - start_predict_time)
        mlflow.log_metric("model_size", model_size)

        # Log the model using mlflow.sklearn.log_model()
        mlflow.sklearn.log_model(grid_search.best_estimator_, f"{algo}_model")
```
```
********** knn **********
Fitting 5 folds for each of 54 candidates, totalling 270 fits
Train Score:  0.9644268774703558
Test Score:  0.9736842105263158
Fit Time:  6.395898103713989
Predict Time:  0.03989386558532715
Model Size:  11965

********** svc **********
Fitting 5 folds for each of 60 candidates, totalling 300 fits
Train Score:  0.9644268774703558
Test Score:  0.9736842105263158
Fit Time:  3.4437897205352783
Predict Time:  0.001995086669921875
Model Size:  5634

********** logistic_regression **********
Fitting 5 folds for each of 30 candidates, totalling 150 fits
Train Score:  0.9640316205533598
Test Score:  0.9736842105263158
Fit Time:  2.0594899654388428
Predict Time:  0.0029954910278320312
Model Size:  2142

********** random_forest **********
Fitting 5 folds for each of 6 candidates, totalling 30 fits
Train Score:  0.9553359683794467
Test Score:  0.9736842105263158
Fit Time:  8.827395915985107
Predict Time:  0.0069806575775146484
Model Size:  80839

********** decision_tre
```
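
`time.time()` follows the wall clock, which can jump if the system clock is adjusted; `time.perf_counter()` is the standard library's clock intended for measuring short durations. A sketch of alternative helpers for the fit/predict timing and model size logged above. The helper names `timed` and `serialized_size` and the `fake_model` object are hypothetical.

```python
import pickle
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def serialized_size(obj):
    """Size in bytes of obj's pickle, measured in memory without writing a file."""
    return len(pickle.dumps(obj))


# Illustrative stand-in for a fitted model: any picklable object works
fake_model = {"weights": list(range(1000))}
_, elapsed = timed(sorted, fake_model["weights"])
size = serialized_size(fake_model)
print(f"elapsed={elapsed:.6f}s size={size} bytes")
```

Measuring the pickle in memory avoids writing files, but the number will not exactly match the size of the file written by `joblib.dump`.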

## **Introduction to Model Registry**

Model Registry provides functionality for managing and versioning machine learning models and their associated metadata. It allows data scientists and machine learning engineers to track, share, and collaborate on models throughout their lifecycle, from experimentation to production deployment.

Key Features:
1. Model Registration
2. Model Versioning
3. Stage Transitions
4. Intra-Team Collaboration

#### **Model Versioning**  
Each time a model is registered under the same name, the registry creates a new version. Versions can then be moved through stages that mirror the usual software release lifecycle: "Staging", "Production", and "Archived" (a newly registered version starts with no stage). These stages help teams manage model releases and keep transitions between development and deployment controlled. Here's a brief explanation of each:

1. **Archived**: These versions are no longer in active use.
   - This stage is usually assigned to model versions that have been retired or superseded.
   - Archived versions are no longer maintained or served in production environments.
   - They may be kept for historical reference or auditing purposes but are not intended for active use.

2. **Staging**: These versions are ready for deployment pending final validation.
   - The "Staging" stage typically marks versions that have passed testing and are candidates for production.
   - Staging versions have usually passed through development, testing, and quality assurance and are considered stable and reliable.
   - In some workflows, they are deployed to a pre-production (staging) environment for final validation before being promoted to production.

3. **Production**: These versions are actively serving users in live environments.
   - The "Production" stage marks versions that are running in a live environment and serving end users or customers.
   - Production versions are expected to be stable, performant, and reliable, as they handle real-world traffic and interactions.
   - Changes to production versions usually follow strict release procedures and may use deployment strategies such as blue-green deployment or canary releases to minimize disruption.
  

<img src="img/model_management.PNG">
