<h1>Part 4 - Experiment Tracking</h1>

# Experiment Tracking and Model Management with MLflow

## Exploring MLflow

MLflow setup:
* Tracking server: no
* Backend store: local filesystem
* Artifacts store: local filesystem

The experiments can be explored locally by launching the MLflow UI.

Let's print the tracking URI, which is where the experiments and runs are going to be logged. We observe that it refers to a local path.

In [31]:
import mlflow

print(f"tracking URI: '{mlflow.get_tracking_uri()}'")

tracking URI: 'file:///c:/Users/moume/Downloads/ESVIL/MLO/esilv-mlops-crashcourse-24/lessons/01-model-and-experiment-management/mlruns'


After this initialization, we can create a client to connect to the API and see which experiments are present.

By referring to MLflow's [documentation](https://mlflow.org/docs/latest/python_api/mlflow.client.html), create a client and display the list of available experiments using the `search_experiments` function. This function could prove useful later to explore experiments programmatically rather than in the UI.

In [32]:
from mlflow.tracking import MlflowClient

client = MlflowClient()

experiments = client.search_experiments()

print(experiments)


[<Experiment: artifact_location='file:///c:/Users/moume/Downloads/MLO/esilv-mlops-crashcourse-24/lessons/01-model-and-experiment-management/mlruns/611815272231168090', creation_time=1729526927524, experiment_id='611815272231168090', last_update_time=1729526927524, lifecycle_stage='active', name='NY Taxi Expeirment', tags={}>, <Experiment: artifact_location='file:///c:/Users/moume/Downloads/MLO/esilv-mlops-crashcourse-24/lessons/01-model-and-experiment-management/mlruns/682279027198655690', creation_time=1729098690606, experiment_id='682279027198655690', last_update_time=1729098690606, lifecycle_stage='active', name='iris-experiment-1', tags={}>, <Experiment: artifact_location='file:///c:/Users/moume/Downloads/MLO/esilv-mlops-crashcourse-24/lessons/01-model-and-experiment-management/mlruns/0', creation_time=1729098218598, experiment_id='0', last_update_time=1729098218598, lifecycle_stage='active', name='Default', tags={}>]


We see that there is a default experiment for which the runs are stored locally in the mlruns folder.

### Creating an experiment and logging a new run

An experiment is a logical entity that groups the logs of multiple attempts at solving the same problem; each attempt is called a run. \
We will now work with the classic sklearn iris dataset. Our goal here is to classify the different iris species. To track our models' performance, we will log every attempt as a "run" and create a new experiment, "iris-experiment-1", to group them.

Look up the mlflow.run and mlflow.start_run functions [here](https://mlflow.org/docs/latest/python_api/mlflow.html?highlight=start_run#mlflow.start_run) to find out how to manage runs.
Explore [this part](https://mlflow.org/docs/latest/python_api/mlflow.html) to learn more about the log_params, log_metrics and log_artifact functions. Find out how to log sklearn models [here](https://mlflow.org/docs/latest/python_api/mlflow.sklearn.html).

Complete the following in order to log the parameters, interesting metrics and the model.

In [33]:
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

mlflow.set_experiment("iris-experiment-1")

with mlflow.start_run() as run:
    run_id = run.info.run_id

    X, y = load_iris(return_X_y=True)

    params = {"C": 0.1, "random_state": 42}
    mlflow.log_params(params)

    model = LogisticRegression(**params).fit(X, y)
    y_pred = model.predict(X)
    
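    # Accuracy is computed on the training data itself
    acc = accuracy_score(y, y_pred)

    mlflow.log_metric("accuracy", acc)

    # Log the fitted model under the "models" artifact path, which is the
    # path referenced when the model is registered later on
    mlflow.sklearn.log_model(model, "models")

    print(f"default artifacts URI: '{mlflow.get_artifact_uri()}'")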

default artifacts URI: 'file:///c:/Users/moume/Downloads/MLO/esilv-mlops-crashcourse-24/lessons/01-model-and-experiment-management/mlruns/682279027198655690/080d29f8a6724927adc0b419bfacf197/artifacts'


In [34]:
experiments = client.search_experiments()
experiments

[<Experiment: artifact_location='file:///c:/Users/moume/Downloads/MLO/esilv-mlops-crashcourse-24/lessons/01-model-and-experiment-management/mlruns/611815272231168090', creation_time=1729526927524, experiment_id='611815272231168090', last_update_time=1729526927524, lifecycle_stage='active', name='NY Taxi Expeirment', tags={}>,
 <Experiment: artifact_location='file:///c:/Users/moume/Downloads/MLO/esilv-mlops-crashcourse-24/lessons/01-model-and-experiment-management/mlruns/682279027198655690', creation_time=1729098690606, experiment_id='682279027198655690', last_update_time=1729098690606, lifecycle_stage='active', name='iris-experiment-1', tags={}>,
 <Experiment: artifact_location='file:///c:/Users/moume/Downloads/MLO/esilv-mlops-crashcourse-24/lessons/01-model-and-experiment-management/mlruns/0', creation_time=1729098218598, experiment_id='0', last_update_time=1729098218598, lifecycle_stage='active', name='Default', tags={}>]

Try running the training code with various parameters to have several runs to compare.
You can now explore your run(s) using the UI: \
(Paste "mlflow ui --host 0.0.0.0 --port 5002" in your terminal, or run the cell below)

**N.B.** Make sure you are in the lecture folder and not the repo root!

In [35]:
!mlflow ui --host 127.0.0.1 --port 5002

  from google.protobuf import service as _service
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\moume\miniconda3\Scripts\mlflow.exe\__main__.py", line 4, in <module>
  File "C:\Users\moume\miniconda3\Lib\site-packages\mlflow\__init__.py", line 32, in <module>
    import mlflow.tracking._model_registry.fluent
  File "C:\Users\moume\miniconda3\Lib\site-packages\mlflow\tracking\__init__.py", line 8, in <module>
    from mlflow.tracking.client import MlflowClient
  File "C:\Users\moume\miniconda3\Lib\site-packages\mlflow\tracking\client.py", line 16, in <module>
    from mlflow.entities import Experiment, Run, RunInfo, Param, Metric, RunTag, FileInfo, ViewType
  File "C:\Users\moume\miniconda3\Lib\site-packages\mlflow\entities\__init__.py", line 6, in <module>
    from mlflow.entities.experiment import Experiment
  File "C:\Users\moume\miniconda3\Lib\site-packages\mlflow\entities\

You will have to kill the cell to continue experimenting

### Interacting with the model registry

If you are satisfied with the last run's model, you can transform the logged model into a registered model. It will be logged in the Model Registry, which makes it easier to use in production and manage versions.

In [36]:
# We already have our run id from above. Let's use it to register the model

result = mlflow.register_model(f"runs:/{run_id}/models", "iris_lr_model")

Registered model 'iris_lr_model' already exists. Creating a new version of this model...
Created version '5' of model 'iris_lr_model'.


# Use Case

Now we will get back to our taxi rides use case: 

In [37]:
import pandas as pd
import seaborn as sns
import numpy as np

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LinearRegression

from sklearn.metrics import mean_squared_error

from typing import List
from scipy.sparse import csr_matrix

## 0 - Download Data

In [38]:
!pip install gdown






In [39]:
import gdown
import os

DATA_FOLDER = "../../data"
train_path = f"{DATA_FOLDER}/yellow_tripdata_2021-01.parquet"
test_path = f"{DATA_FOLDER}/yellow_tripdata_2021-02.parquet"
predict_path = f"{DATA_FOLDER}/yellow_tripdata_2021-03.parquet"


if not os.path.exists(DATA_FOLDER):
    os.makedirs(DATA_FOLDER)
    print(f"New directory {DATA_FOLDER} created!")

gdown.download(
    "https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet",
    train_path,
    quiet=False,
)
gdown.download(
    "https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-02.parquet",
    test_path,
    quiet=False,
)
gdown.download(
    "https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-03.parquet",
    predict_path,
    quiet=False,
)

Downloading...
From: https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet
To: c:\Users\moume\Downloads\ESVIL\MLO\esilv-mlops-crashcourse-24\data\yellow_tripdata_2021-01.parquet
100%|██████████| 21.7M/21.7M [00:02<00:00, 10.4MB/s]
Downloading...
From: https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-02.parquet
To: c:\Users\moume\Downloads\ESVIL\MLO\esilv-mlops-crashcourse-24\data\yellow_tripdata_2021-02.parquet
100%|██████████| 21.8M/21.8M [00:02<00:00, 10.7MB/s]
Downloading...
From: https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-03.parquet
To: c:\Users\moume\Downloads\ESVIL\MLO\esilv-mlops-crashcourse-24\data\yellow_tripdata_2021-03.parquet
100%|██████████| 30.0M/30.0M [00:02<00:00, 13.2MB/s]


'../../data/yellow_tripdata_2021-03.parquet'

## 1 - Load data

In [40]:
def load_data(path: str):
    return pd.read_parquet(path)


train_df = load_data(train_path)
train_df.head()

Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,airport_fee
0,1,2021-01-01 00:30:10,2021-01-01 00:36:12,1.0,2.1,1.0,N,142,43,2,8.0,3.0,0.5,0.0,0.0,0.3,11.8,2.5,
1,1,2021-01-01 00:51:20,2021-01-01 00:52:19,1.0,0.2,1.0,N,238,151,2,3.0,0.5,0.5,0.0,0.0,0.3,4.3,0.0,
2,1,2021-01-01 00:43:30,2021-01-01 01:11:06,1.0,14.7,1.0,N,132,165,1,42.0,0.5,0.5,8.65,0.0,0.3,51.95,0.0,
3,1,2021-01-01 00:15:48,2021-01-01 00:31:01,0.0,10.6,1.0,N,138,132,1,29.0,0.5,0.5,6.05,0.0,0.3,36.35,0.0,
4,2,2021-01-01 00:31:49,2021-01-01 00:48:21,1.0,4.94,1.0,N,68,33,1,16.5,0.5,0.5,4.06,0.0,0.3,24.36,2.5,


## 2 - Prepare the data

Let's prepare the data to make it Machine Learning ready. \
For this, we need to clean it, compute the target (what we want to predict), and compute some features to help the model understand the data better.

### 2-1 Compute the target

We want to predict the duration of a taxi trip in minutes. Let's compute it as the difference between the drop-off time and the pick-up time of each trip.

In [41]:
def compute_target(
    df: pd.DataFrame,
    pickup_column: str = "tpep_pickup_datetime",
    dropoff_column: str = "tpep_dropoff_datetime",
) -> pd.DataFrame:
    df["duration"] = df[dropoff_column] - df[pickup_column]
    df["duration"] = df["duration"].dt.total_seconds() / 60
    return df


train_df = compute_target(train_df)
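As a quick check of the computation, here is the same logic applied by hand to the timestamps of the first two rows shown earlier (the small DataFrame `trips` is built only for this illustration):

```python
import pandas as pd

trips = pd.DataFrame(
    {
        "tpep_pickup_datetime": pd.to_datetime(["2021-01-01 00:30:10", "2021-01-01 00:51:20"]),
        "tpep_dropoff_datetime": pd.to_datetime(["2021-01-01 00:36:12", "2021-01-01 00:52:19"]),
    }
)

# Subtracting two datetime columns yields a Timedelta column ...
delta = trips["tpep_dropoff_datetime"] - trips["tpep_pickup_datetime"]
# ... which is converted to seconds, then to minutes.
trips["duration"] = delta.dt.total_seconds() / 60
print(trips["duration"].round(2).tolist())  # [6.03, 0.98]
```

The first trip lasts 6 minutes and 2 seconds (362 seconds), the second one 59 seconds.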

In [42]:
train_df["duration"].describe()

count    1.369769e+06
mean     1.391168e+01
std      1.312006e+02
min     -1.350846e+05
25%      5.566667e+00
50%      9.066667e+00
75%      1.461667e+01
max      2.881770e+04
Name: duration, dtype: float64

Let's remove outliers and restrict the scope to trips lasting between 1 minute and 1 hour.

In [43]:
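MIN_DURATION = 1
MAX_DURATION = 60


def filter_outliers(
    df: pd.DataFrame, min_duration: int = MIN_DURATION, max_duration: int = MAX_DURATION
) -> pd.DataFrame:
    # Keep trips whose duration (in minutes) lies in [min_duration, max_duration];
    # .copy() avoids pandas' SettingWithCopyWarning when columns are modified later
    return df[df["duration"].between(min_duration, max_duration)].copy()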


train_df = filter_outliers(train_df)
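The filter relies on `Series.between`, which is inclusive on both ends by default. A small illustration on made-up durations:

```python
import pandas as pd

durations = pd.DataFrame({"duration": [-5.0, 0.5, 1.0, 12.3, 60.0, 75.0]})

# Both bounds are included, so trips of exactly 1 and 60 minutes are kept.
mask = durations["duration"].between(1, 60)
kept = durations[mask]
print(kept["duration"].tolist())  # [1.0, 12.3, 60.0]
```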

### 2-2 Prepare features

#### 2-2-1 Categorical features

Most machine learning models cannot consume raw categorical values directly. Because of this, categorical features must be encoded numerically before the model can use them.

In [44]:
CATEGORICAL_COLS = ["PULocationID", "DOLocationID", "passenger_count"]


def encode_categorical_cols(df: pd.DataFrame, categorical_cols: List[str] = None) -> pd.DataFrame:
    if categorical_cols is None:
        categorical_cols = CATEGORICAL_COLS
    # Missing values become -1, then every category is cast to a string
    df[categorical_cols] = df[categorical_cols].fillna(-1).astype("int")
    df[categorical_cols] = df[categorical_cols].astype("str")
    return df


train_df = encode_categorical_cols(train_df)

In [45]:
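def extract_x_y(
    df: pd.DataFrame,
    categorical_cols: List[str] = None,
    dv: DictVectorizer = None,
    with_target: bool = True,
) -> tuple:
    """Return the feature matrix, the target (or None) and the vectorizer."""
    if categorical_cols is None:
        categorical_cols = ["PULocationID", "DOLocationID", "passenger_count"]
    dicts = df[categorical_cols].to_dict(orient="records")

    # Fit a new vectorizer only when none is supplied (i.e. on training data)
    if dv is None:
        dv = DictVectorizer()
        dv.fit(dicts)

    # The target is only available when the data contains a duration column
    y = df["duration"].values if with_target else None

    x = dv.transform(dicts)
    return x, y, dv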


X_train, y_train, dv = extract_x_y(train_df)

## 3 - Train model

We train a basic linear regression model to get a baseline performance.

In [46]:
def train_model(x_train: csr_matrix, y_train: np.ndarray):
    lr = LinearRegression()
    lr.fit(x_train, y_train)
    return lr


model = train_model(X_train, y_train)

## 4 - Evaluate model

We evaluate the model on the train and test data.

### 4-1 On train data

In [47]:
def predict_duration(input_data: csr_matrix, model: LinearRegression):
    return model.predict(input_data)


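def evaluate_model(y_true: np.ndarray, y_pred: np.ndarray):
    # Root mean squared error; np.sqrt is used because the `squared` argument
    # of mean_squared_error is deprecated in recent scikit-learn releases
    return np.sqrt(mean_squared_error(y_true, y_pred))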

    
prediction = predict_duration(X_train, model)
train_me = evaluate_model(y_train, prediction)
train_me

6.782412053170702
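The metric is a root mean squared error (RMSE), expressed in the same unit as the target, here minutes. A hand computation on three made-up values shows what the function returns:

```python
import numpy as np

y_true = np.array([10.0, 20.0, 30.0])
y_hat = np.array([12.0, 18.0, 33.0])

# Errors are -2, 2, -3; their squares are 4, 4, 9; the mean is 17/3.
rmse = np.sqrt(np.mean((y_true - y_hat) ** 2))
print(round(rmse, 2))  # 2.38
```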

### 4-2 On test data

In [48]:
test_df = load_data(test_path)

test_df.head()

Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,airport_fee
0,1,2021-02-01 00:40:47,2021-02-01 00:48:28,1.0,2.3,1.0,N,141,226,2,8.5,3.0,0.5,0.0,0.0,0.3,12.3,2.5,
1,1,2021-02-01 00:07:44,2021-02-01 00:20:31,1.0,1.6,1.0,N,43,263,2,9.5,3.0,0.5,0.0,0.0,0.3,13.3,0.0,
2,1,2021-02-01 00:59:36,2021-02-01 01:24:13,1.0,5.3,1.0,N,114,263,2,19.0,3.0,0.5,0.0,0.0,0.3,22.8,2.5,
3,2,2021-02-01 00:03:26,2021-02-01 00:16:32,1.0,2.79,1.0,N,236,229,1,11.0,0.5,0.5,2.96,0.0,0.3,17.76,2.5,
4,2,2021-02-01 00:20:20,2021-02-01 00:24:03,2.0,0.64,1.0,N,229,140,1,4.5,0.5,0.5,1.66,0.0,0.3,9.96,2.5,


In [49]:
test_df = compute_target(test_df)
print(test_df.head())
test_df = encode_categorical_cols(test_df)
print(test_df.head())
X_test, y_test, _ = extract_x_y(test_df, dv=dv)

   VendorID tpep_pickup_datetime tpep_dropoff_datetime  passenger_count  \
0         1  2021-02-01 00:40:47   2021-02-01 00:48:28              1.0   
1         1  2021-02-01 00:07:44   2021-02-01 00:20:31              1.0   
2         1  2021-02-01 00:59:36   2021-02-01 01:24:13              1.0   
3         2  2021-02-01 00:03:26   2021-02-01 00:16:32              1.0   
4         2  2021-02-01 00:20:20   2021-02-01 00:24:03              2.0   

   trip_distance  RatecodeID store_and_fwd_flag  PULocationID  DOLocationID  \
0           2.30         1.0                  N           141           226   
1           1.60         1.0                  N            43           263   
2           5.30         1.0                  N           114           263   
3           2.79         1.0                  N           236           229   
4           0.64         1.0                  N           229           140   

   payment_type  fare_amount  extra  mta_tax  tip_amount  tolls_amount  \


In [50]:
y_pred_test = predict_duration(X_test, model)
test_me = evaluate_model(y_test, y_pred_test)
test_me

58.375054515981205
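The test error is far larger than the training error. One thing worth checking is that the test cell above does not call `filter_outliers`, so extreme durations remain in `y_test`; whether this fully explains the gap is an assumption, not something verified here. The made-up numbers below only illustrate how a single extreme value can dominate an RMSE:

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error of two equally sized arrays."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))


y_pred = np.full(100, 12.0)   # the model predicts 12 minutes for every trip
y_clean = np.full(100, 10.0)  # every trip actually lasts 10 minutes
y_with_outlier = y_clean.copy()
y_with_outlier[0] = 1500.0    # one trip recorded as lasting 25 hours

clean_rmse = rmse(y_clean, y_pred)
outlier_rmse = rmse(y_with_outlier, y_pred)
print(clean_rmse, outlier_rmse)
```

With 99 well-behaved trips and one extreme record, the RMSE jumps from 2 minutes to well over 100 minutes.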

## 5 - Log Model Parameters to MLflow

Now that all our development functions are built and tested, let's create a training pipeline and log the training parameters, metrics and model to MLflow.

Create a training flow, log all the important parameters, metrics and model. Try to find what could be important and needs to be logged.

In [51]:
# Set the experiment name
experiment_name = "NY Taxi Expeirment"
mlflow.set_experiment(experiment_name)

# Start a run
with mlflow.start_run() as run:
    run_id = run.info.run_id

    # Set tags for the run
    mlflow.set_tag("version", "3.0")
    # Load data
    train_data = load_data(train_path)
    # Compute target
    train_data = compute_target(train_data)

    # Filter outliers
    train_data = filter_outliers(train_data)

    # Encode categorical columns
    train_data = encode_categorical_cols(train_data)

    # Extract X and y
    X_train, y_train, dv = extract_x_y(train_data)

    # Train model
    model = train_model(X_train, y_train)

    # Evaluate model on train set
    prediction = predict_duration(X_train, model)
    train_me = evaluate_model(y_train, prediction)

    # Evaluate model on test set
    test_data = load_data(test_path)
    test_data = compute_target(test_data)
    test_data = encode_categorical_cols(test_data)
    X_test, y_test, _ = extract_x_y(test_data, dv=dv)
    y_pred_test = predict_duration(X_test, model)
    test_me = evaluate_model(y_test, y_pred_test)

    # Log the model type and the train/test metrics
    mlflow.log_param("model_type", type(model).__name__)
    mlflow.log_metric("train_metric", train_me)
    mlflow.log_metric("test_metric", test_me)

    # Log your model
    mlflow.sklearn.log_model(model, "model")

    # Register your model in the MLflow model registry
    mlflow.register_model(f"runs:/{run_id}/model", "NY_Taxi_Model")


Registered model 'NY_Taxi_Model' already exists. Creating a new version of this model...
Created version '3' of model 'NY_Taxi_Model'.


If the model is satisfactory, we stage the appropriate version as production. This will help us retrieve it for predictions.

Create an MLflow client and use the [mlflow documentation](https://mlflow.org/docs/latest/python_api/mlflow.client.html?highlight=transition_model_version_stage#mlflow.client.MlflowClient.transition_model_version_stage) to stage the appropriate model as being in "production".

In [52]:
client = MlflowClient()

model_name = "NY_Taxi_Model"
model_version = 1  


client.transition_model_version_stage(
    name=model_name,
    version=model_version,
    stage="Production",
    archive_existing_versions=True 
)

print(f"Model {model_name} version {model_version} is now in production.")


Model NY_Taxi_Model version 1 is now in production.


## 6 - Predict

We can now use our model on fresh, unseen data to predict the duration of a taxi trip from its characteristics.

In [56]:
# Load prediction data
predict_df = load_data(predict_path)

# Apply feature engineering, reusing the vectorizer fitted on the training data
predict_df = encode_categorical_cols(predict_df)
X_pred, _, _ = extract_x_y(predict_df, dv=dv, with_target=False)

# Load production model
model_uri = f"models:/{model_name}/Production"
model = mlflow.sklearn.load_model(model_uri)

# Make predictions
y_pred = predict_duration(X_pred, model)
y_pred


array([11.32720643, 11.53781398, 11.53781398, ..., 13.78254575,
       20.56516851, 23.34317854])
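Reusing the vectorizer fitted on the training data keeps the feature columns aligned with what the model was trained on. Categories that were never seen during fitting are silently ignored by `DictVectorizer.transform`, as this small made-up example shows:

```python
from sklearn.feature_extraction import DictVectorizer

dv_demo = DictVectorizer()
dv_demo.fit([{"PULocationID": "142"}, {"PULocationID": "238"}])

# "999" never appeared during fit, so it has no column:
# the corresponding row is encoded as all zeros.
X_new = dv_demo.transform([{"PULocationID": "142"}, {"PULocationID": "999"}])
print(X_new.toarray())
```

The output keeps the two columns learned during fitting, so the model can always consume it, at the cost of losing information about unknown locations.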

In [None]:
import pickle
from typing import Any


def save_pickles(path: str, obj: Any):
    # Serialize any Python object (here the fitted DictVectorizer) to disk
    with open(path, "wb") as f:
        pickle.dump(obj, f)
save_pickles("C:/Users/moume/Downloads/ESVIL/MLO/esilv-mlops-crashcourse-24/lessons/02-model-deployment/web_service/local_models/dv_v0.0.2.pkl",dv)

## 7 - To go further

If you managed to get this far, you can try solving the use case with another regression model, such as [XGBoost](https://xgboost.readthedocs.io/en/stable/).

In [None]:
%pip install xgboost

In [None]:
import xgboost as xgb


