When a new model arrives, we want to ask some questions. What has changed from the previous version of the model to the new one? Is any preprocessing needed? What extra libraries do we need to run the new model?

And what if we hit issues while running the new model in production and have to roll back to the old one? We need to know where the old model is stored.

When working on an ML task, we use the MLflow Tracking Server to log parameters, metrics, artifacts, and the many different model versions we produce.

Once we believe a model is fit for production, we "register" it in the MLflow Model Registry.

The Model Registry is where we store production-ready models. Whenever a deployment engineer wants to update the deployed models, they can look in the Model Registry to find the new production-ready ones.

The MLflow Model Registry component is a centralized model store, set of APIs, and UI, to collaboratively manage the full lifecycle of an MLflow Model. It provides model lineage (which MLflow experiment and run produced the model), model versioning, model aliasing, model tagging, and annotations.

The Model Registry does not deploy models; it only stores the ones that are production ready.

### Tracking Experiments with a Local Database

So far we have stored tracking data in local files. Now we will use a local database, SQLite, and store the information there.

We start the MLflow UI against that database with the following CLI command: `mlflow ui --port 8080 --backend-store-uri sqlite:///mlruns.db`

In [5]:
# set the following environment variable
%env MLFLOW_TRACKING_URI=sqlite:///../mlruns.db

env: MLFLOW_TRACKING_URI=sqlite:///../mlruns.db
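The CLI command uses `sqlite:///mlruns.db` while the notebook uses `sqlite:///../mlruns.db`. Both are relative paths: with three slashes the rest of the URI is resolved against the working directory of the process, and four slashes would make it absolute. The sketch below illustrates that rule with a hypothetical helper, `sqlite_uri_to_path`. The directories `/project` and `/project/notebooks` are assumptions standing in for wherever `mlflow ui` and the notebook are actually run, and POSIX-style paths are assumed.

```python
import posixpath

SQLITE_PREFIX = "sqlite:///"


def sqlite_uri_to_path(uri, cwd):
    """Return the file a SQLite URI points at when opened from directory cwd."""
    if not uri.startswith(SQLITE_PREFIX):
        raise ValueError(f"not a SQLite file URI: {uri!r}")
    raw = uri[len(SQLITE_PREFIX):]
    # posixpath.join keeps `raw` unchanged when it is already absolute
    # (the four-slash form) and otherwise resolves it against cwd.
    return posixpath.normpath(posixpath.join(cwd, raw))


# `mlflow ui` started in /project (assumed location)
print(sqlite_uri_to_path("sqlite:///mlruns.db", "/project"))
# -> /project/mlruns.db

# Notebook running in /project/notebooks (assumed location)
print(sqlite_uri_to_path("sqlite:///../mlruns.db", "/project/notebooks"))
# -> /project/mlruns.db
```

If the two processes run in those directories, both URIs point at the same file, so runs logged from the notebook show up in the UI.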


We will start logging information

In [6]:
import mlflow

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
# from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

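# set_experiment creates the experiment if it does not exist and makes it active;
# create_experiment alone would leave new runs in the "Default" experiment.
mlflow.set_experiment('mysql-experiment-4')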
mlflow.sklearn.autolog()

db = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)

# Create and train models.
rf = LinearRegression()
rf.fit(X_train, y_train)

# Use the model to make predictions on the test dataset.
predictions = rf.predict(X_test)

2024/08/31 17:42:03 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2024/08/31 17:42:03 INFO mlflow.store.db.utils: Updating database tables
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> 451aebb31d03, add metric step
INFO  [alembic.runtime.migration] Running upgrade 451aebb31d03 -> 90e64c465722, migrate user column to tags
INFO  [alembic.runtime.migration] Running upgrade 90e64c465722 -> 181f10493468, allow nulls for metric values
INFO  [alembic.runtime.migration] Running upgrade 181f10493468 -> df50e92ffc5e, Add Experiment Tags Table
INFO  [alembic.runtime.migration] Running upgrade df50e92ffc5e -> 7ac759974ad8, Update run tags with larger limit
INFO  [alembic.runtime.migration] Running upgrade 7ac759974ad8 -> 89d4b8295536, create latest metrics table
INFO  [89d4b8295536_create_latest_metrics_table_py] Migration complete!
INFO  
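The migration log above suggests that MLflow has created its tables in the SQLite file. One way to check, sketched below with only the standard library, is to list the tables in that file. The helper name `list_tables` is hypothetical, and the database is opened read-only so that the check cannot create or modify it.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in a SQLite file, sorted by name."""
    # mode=ro opens the file read-only and fails if the file does not exist.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Calling `list_tables("../mlruns.db")` from the notebook directory should then return MLflow's tables; names such as `experiments`, `runs`, and `metrics` would be expected, though the exact set depends on the MLflow version.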

In [7]:
X_test.shape

(111, 10)

In [9]:
import pandas as pd

# load_diabetes() returns a Bunch, so build the DataFrame from its arrays explicitly.
df = pd.DataFrame(db.data, columns=db.feature_names)
df["target"] = db.target

In [None]:
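# X_test is a NumPy array, so wrap it in a DataFrame before adding columns.
eval_data = pd.DataFrame(X_test, columns=db.feature_names)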
eval_data["label"] = y_test
eval_data["predictions"] = predictions

# Create the PandasDataset for use in mlflow evaluate
pd_dataset = mlflow.data.from_pandas(
    eval_data, predictions="predictions", targets="label"
)

mlflow.set_experiment("Diabetes")

# Log the Dataset, model, and execute an evaluation run using the configured Dataset
with mlflow.start_run() as run:
    # The dataset is built from the held-out test split, so tag it as evaluation data.
    mlflow.log_input(pd_dataset, context="evaluation")

    # Log the fitted scikit-learn model (rf is the LinearRegression trained above).
    mlflow.sklearn.log_model(
        sk_model=rf, artifact_path="diabetes-linreg", input_example=X_test[:5]
    )

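    # The diabetes target is continuous, so evaluate as a regressor; the predictions
    # are read from the dataset's "predictions" column.
    result = mlflow.evaluate(data=pd_dataset, predictions=None, model_type="regressor")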