# Demo 1: Simple MLflow integration with Azure ML

**Authored by:** Joshua Isanan

**Date:** 07/28/2024

Let's first create our MLClient. It serves as our interface to the Azure ML workspace, and we'll use it to load our saved Data Asset.

In [None]:
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
import numpy as np

resource_group = "RESOURCE GROUP"
subscription_id = "SUBSCRIPTION ID"
workspace = "WORKSPACE NAME"

ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)

Load the dataset that we're going to use for our ML Model.

In [None]:
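data_asset = ml_client.data.get(name="purchase-prediction-dataset", version="1")

In [None]:
import pandas as pd
import numpy as np

# Read the CSV behind the data asset into a DataFrame.
# A separate name for the asset keeps this cell safe to re-run.
dataset = pd.read_csv(data_asset.path)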

Convert the numeric columns to float. This makes it easier for MLflow to infer the signature, since integer columns cannot represent missing values.

In [None]:
numeric = dataset.select_dtypes(np.number)
dataset[numeric.columns] = numeric.astype("float")
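
To see what the conversion above does, here is a minimal sketch on a toy DataFrame. The column values and the `Segment` column are made up for illustration and are not taken from the real dataset.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values): one integer, one float and one text column.
toy = pd.DataFrame({
    "Age": [24, 31, 45],
    "AnnualIncome": [52000.5, 61000.0, 78000.25],
    "Segment": ["a", "b", "a"],
})

# Select only the numeric columns and cast them to float.
numeric = toy.select_dtypes(np.number)
toy[numeric.columns] = numeric.astype("float")

# Age is now float64; the text column is left untouched.
print(toy.dtypes)
```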

In [None]:
dataset

Let's import the mlflow library, then create the experiment.

In [None]:
import mlflow

mlflow.set_experiment(experiment_name="purchase-prediction-classification")

Split the dataset into training and test sets, holding out 30% of the rows for testing.

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    dataset.drop("PurchaseStatus", axis=1),
    dataset["PurchaseStatus"],
    test_size=0.3,
    random_state=5,
)
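
As a quick sanity check of how `test_size` and `random_state` behave, here is a small sketch on a ten-row toy frame. The values are invented; only the column name `PurchaseStatus` comes from the notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data (hypothetical values) with the same target column name as above.
toy = pd.DataFrame({
    "Age": [float(a) for a in range(20, 30)],
    "PurchaseStatus": [0.0, 1.0] * 5,
})

def split(frame):
    # Same arguments as the notebook: 30% held out, fixed seed.
    return train_test_split(
        frame.drop("PurchaseStatus", axis=1),
        frame["PurchaseStatus"],
        test_size=0.3,
        random_state=5,
    )

X_tr, X_te, y_tr, y_te = split(toy)
X_tr2, X_te2, y_tr2, y_te2 = split(toy)

# 7 rows for training, 3 for testing; a fixed seed gives the same rows each time.
print(len(X_tr), len(X_te))
print(X_te.index.equals(X_te2.index))
```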

Install the libraries that we'll need.

In [None]:
%pip install xgboost
%pip install matplotlib
%pip install shap

Create the model pipeline. It drops the columns we don't want to use and applies ordinal encoding to our categorical column.

In [None]:
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder
from xgboost import XGBClassifier

preprocessor = ColumnTransformer(
    [
        ("column_dropper", "drop", ["Gender"]),
        (
            "cat_encoding",
            OrdinalEncoder(
                categories="auto",
                encoded_missing_value=np.nan,
            ),
            ["ProductCategory"],
        ),
    ],
    remainder="passthrough",
    verbose_feature_names_out=False,
)

model = XGBClassifier(eval_metric="auc")

pipeline = Pipeline(
    steps=[
        ("preprocessor", preprocessor),
        ("model", model)
    ]
)

Create the signature. This lets MLflow record the input and output structure of our data.

In [None]:
from mlflow.models import infer_signature

signature = infer_signature(X_test, y_test)

Let's fit the model, then start our MLflow run.
We'll also log the entire pipeline (the data processing steps and the model) to our MLflow experiment.

Then we'll call MLflow's evaluate method, which lets us check how the model performs on unseen data.

Afterwards, we'll register the model in our Azure Machine Learning Model Registry and call it "initial_model".

In [None]:
pipeline.fit(X_train, y_train)

with mlflow.start_run(run_name="Initial Model Training") as run:
    # Build the evaluation frame on a copy so X_test itself is not modified.
    eval_data = X_test.copy()
    eval_data["PurchaseStatus"] = y_test

    pipeline_model = mlflow.sklearn.log_model(pipeline, artifact_path="pipeline", signature=signature)

    mlflow.evaluate(
        pipeline_model.model_uri,
        eval_data,
        targets="PurchaseStatus",
        model_type="classifier",
        evaluators=["default"],
    )

mlflow_model = mlflow.register_model(
    pipeline_model.model_uri, "initial_model"
)
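
The evaluation step expects a single frame holding both the features and the target column. As a minimal sketch on toy data (hypothetical values), one way to build that frame without adding the target to the feature frame is `DataFrame.assign`, which returns a new frame:

```python
import pandas as pd

# Toy features and target (hypothetical values).
X_toy = pd.DataFrame({
    "Age": [24.0, 31.0],
    "AnnualIncome": [52000.0, 61000.0],
})
y_toy = pd.Series([1.0, 0.0], name="PurchaseStatus")

# assign() returns a new DataFrame, so X_toy keeps only the feature columns.
eval_toy = X_toy.assign(PurchaseStatus=y_toy)

print(list(eval_toy.columns))
print(list(X_toy.columns))
```

Keeping the feature frame free of the target matters if it is reused later, for example for predictions or for inferring a signature.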

Let's try loading the model that we registered.

In [None]:
# saved_model = ml_client.models.get("", version="1")

Take a look at its content.

In [None]:
# saved_model

Load the model with mlflow.

In [None]:
# saved_model = mlflow.pyfunc.load_model(saved_model.properties['mlflow.modelSourceUri'])

Test out a prediction.

In [None]:
# saved_model.predict({
#     "Age": 24.0,
#     "Gender": 0.0,
#     "AnnualIncome": 1223322.0,
#     "NumberOfPurchases": 8.0,
#     "ProductCategory": 2.0,
#     "TimeSpentOnWebsite": 7.0,
#     "LoyaltyProgram": 7.0,
#     "DiscountsAvailed": 33.0
# })