# 🌍 Overview

This demo is a minimal MLOps project that showcases how to put ML workflows into production. It features:

- A feature engineering pipeline that loads data and prepares it for training.
- A training pipeline that loads the preprocessed dataset and trains a model.
- A batch inference pipeline that uses the trained model to run predictions on new data.
- Stack switching, using the SageMaker step operator to offload training to the cloud.
- An analysis of training artifacts and their lineage, including the connection to Weights & Biases (W&B).

<img src="_assets/pipeline_overview.png" width="50%" alt="Pipelines Overview">

# 👶 Step 0: Install Requirements

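Let's install ZenML to get started. First, we'll install the latest version of
ZenML as well as its `sklearn` and `xgboost` integrations. The cell below also connects
to the ZenML server, deletes any existing `breast_cancer_classifier` model so the demo
starts from a clean slate, and restarts the kernel: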

In [None]:
! pip3 install -r requirements.txt
! zenml integration install sklearn xgboost -y
! zenml connect --url https://1cf18d95-zenml.cloudinfra.zenml.io 
! zenml model delete breast_cancer_classifier -y

import IPython
IPython.Application.instance().kernel.do_shutdown(restart=True)

In [3]:
# Initialize ZenML and set the active stack
!zenml init
!zenml stack set local-sagemaker-step-operator-wandb

[?25l[2;36mFound existing ZenML repository at path [0m
[2;32m'/home/htahir1/workspace/zenml_io/zenml-projects/classifier-e2e'[0m[2;36m.[0m
[2;32m⠋[0m[2;36m Initializing ZenML repository at [0m
[2;36m/home/htahir1/workspace/zenml_io/zenml-projects/classifier-e2e.[0m
[2K[1A[2K[1A[2K[32m⠋[0m Initializing ZenML repository at 
/home/htahir1/workspace/zenml_io/zenml-projects/classifier-e2e.

[1A[2K[1A[2K[1A[2K[?25l[32m⠋[0m Setting the repository active stack to 
[2K[1A[2K[2;36mActive repository stack set to: [0m[2;32m'local-sagemaker-step-operator-wandb'[0m
[2;32m⠋[0m[2;36m Setting the repository active stack to [0m
[2K[1A[2K[32m⠋[0m Setting the repository active stack to 
'local-sagemaker-step-operator-wandb'...
[1A[2K[1A[2K

In [2]:
# Do the imports at the top
from zenml import Model
from zenml.client import Client
from zenml.logger import get_logger

from pipelines import training, inference

logger = get_logger(__name__)

# Initialize the ZenML client to fetch objects from the ZenML Server
client = Client()

# ⌚ Step 1: Training pipeline

Now that we have our data, it makes sense to train some models to get a sense of
how difficult the task is. The problem is complex enough that we are unlikely to train
a model that behaves perfectly, but we can get a sense of what a reasonable baseline looks like.
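A trivial model gives a concrete floor for what "reasonable" means: a classifier that ignores the features and always predicts the majority class. The sketch below assumes scikit-learn's bundled breast cancer dataset as a stand-in for the project's data; the project's own loading step is not shown in this notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's bundled breast cancer dataset stands in for the
# project's own data loading step.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# A "classifier" that ignores the features and always predicts the most
# frequent class seen during training.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)
baseline_test_accuracy = baseline.score(X_test, y_test)
print(f"Majority-class baseline test accuracy: {baseline_test_accuracy:.2%}")
```

A trained model is only worth keeping if it clearly beats this number.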

We'll start with two models: an SGD classifier from `sklearn` and an XGBoost
classifier, which exposes an `sklearn`-compatible interface. We'll train both on the
same data and then compare their performance.

<img src="_assets/cloud_mcp.png" width="60%" alt="Model Control Plane">

In [4]:
# let's have a look at model training step
%pycat steps/model_trainer.py


Our training step can return different kinds of classifier models depending on
`model_type`, so we use the generic `ClassifierMixin` type hint for the return type.
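The idea behind that return type can be sketched as a plain function. The real `model_trainer` is a ZenML step that receives the `model_type` argument used in the pipeline calls below (`"sgd"` or `"xgboost"`); the function name `train_classifier` and its body here are hypothetical, written without ZenML so the dispatch logic runs on its own.

```python
from sklearn.base import ClassifierMixin
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import SGDClassifier


def train_classifier(X, y, model_type: str = "sgd", random_state: int = 42) -> ClassifierMixin:
    """Hypothetical stand-in for the project's `model_trainer` step."""
    if model_type == "sgd":
        model = SGDClassifier(random_state=random_state)
    elif model_type == "xgboost":
        # Imported lazily so the sketch also runs without xgboost installed.
        # XGBClassifier follows the sklearn classifier interface.
        from xgboost import XGBClassifier

        model = XGBClassifier(random_state=random_state)
    else:
        raise ValueError(f"Unknown model_type: {model_type!r}")
    model.fit(X, y)
    return model


X, y = load_breast_cancer(return_X_y=True)
sgd_model = train_classifier(X, y, model_type="sgd")
print(type(sgd_model).__name__, isinstance(sgd_model, ClassifierMixin))
```

Because both branches return an object implementing the `ClassifierMixin` interface, downstream steps can call `score` and `predict` without caring which model was trained.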

ZenML allows you to load any version of any artifact it tracks, datasets included, directly
into a pipeline using the `Client().get_artifact_version` interface. This is convenient
here, as we want to feed the preprocessed dataset from the earlier feature engineering
pipeline directly into the training pipeline.

In [5]:
# let's have a look at training pipeline
%pycat pipelines/training.py


The end goal of this quick baseline evaluation is to understand which of the two
models performs better. We'll use the `model_evaluator` step to compare them. This
step takes in the model from the trainer step and computes its accuracy on the
training and test sets.
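Stripped of ZenML and W&B, the metric computation in such an evaluator comes down to a few scikit-learn calls. A minimal sketch, assuming an already fitted classifier (an `SGDClassifier` here) and in-memory train/test splits; the function name `evaluate` is hypothetical, while the dictionary keys mirror the metadata the project's `model_evaluator` logs (shown in Step 4).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split


def evaluate(model, X_train, y_train, X_test, y_test) -> dict:
    """Hypothetical evaluator core: accuracy on both splits plus a confusion matrix."""
    # For scikit-learn classifiers, `score` returns the mean accuracy.
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    predictions = model.predict(X_test)
    return {
        "train_accuracy": float(train_acc),
        "test_accuracy": float(test_acc),
        # For binary labels, ravel() flattens the 2x2 matrix to [tn, fp, fn, tp].
        "confusion_matrix": confusion_matrix(y_test, predictions).ravel().tolist(),
    }


X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = SGDClassifier(random_state=42).fit(X_train, y_train)
metrics = evaluate(model, X_train, y_train, X_test, y_test)
print(metrics)
```

Converting the scores and the matrix to plain Python `float` and `list` values keeps the result easy to store as metadata.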

As you will soon see, training ML models with ZenML pipelines is relatively easy. Tracking
all the models produced as you develop your experiments and use cases, however, can be clunky.
Luckily, ZenML offers a *Model Control Plane*, a central registry of all your ML models.

You can easily create a ZenML Model and associate it with your pipelines using the `Model` object:

In [6]:
pipeline_settings = {}

# Let's add some metadata to the model to make it identifiable
pipeline_settings["model"] = Model(
    name="breast_cancer_classifier",
    license="Apache 2.0",
    description="A breast cancer classifier",
)

In [7]:
# Train the XGBoost model and tag it with "xgboost"
pipeline_settings["model"].tags = ["breast_cancer", "classifier", "xgboost"]

# Use an XGBoost model with a fixed seed
training.with_options(enable_cache=False, **pipeline_settings)(
    model_type="xgboost",
    random_state=42
)

xgboost_run = client.get_pipeline("training").last_run

Initiating a new run for the pipeline: training.
Reusing registered pipeline version: (version: 146).
New model version 2 was created.
Executing a new run.
Caching is disabled by default for training.
Using user: hamza@zenml.io
Using stack: local-sagemaker-step-operator-wandb
  step_operator: sagemaker-eu
  container_registry: aws-eu
  experiment_tracker: zenml_wandb
  orchestrator: default
  image_builder: local
  artifact_store: s3-zenfiles
Could not import GCP service connector: No module named 'google.api_core'.
Could not import Azure service connector: No module named 'azure.identity'.
Could not import Kub

Training model XGBClassifier(base_score=None, booster=None, callbacks=None,
              colsample_bylevel=None, colsample_bynode=None,
              colsample_bytree=None, device=None, early_stopping_rounds=None,
              enable_categorical=False, eval_metric=None, feature_types=None,
              gamma=None, grow_policy=None, importance_type=None,
              interaction_constraints=None, learning_rate=None, max_bin=None,
              max_cat_threshold=None, max_cat_to_onehot=None,
              max_delta_step=None, max_depth=None, max_leaves=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              multi_strategy=None, n_estimators=None, n_jobs=None,
              num_parallel_tree=None, random_state=None, ...)...

Step model_trainer has finished in 23.844s.
Step model_evaluator has started.
Initializing wandb with entity None, project name: None, run_name: training-2024_02_09-15_38_23_251071_model_evaluator.

Your artifact was materialized under Python version 'unknown' but you are currently using '3.8.10'. This might cause unexpected behavior since pickle is not reproducible across Python versions. Attempting to load anyway...
Train accuracy=100.00%
Test accuracy=97.25%

test_accuracy   ▁
train_accuracy  ▁

test_accuracy   0.97248
train_accuracy  1.0

Implicitly linking artifact output to model breast_cancer_classifier version 2.
Step model_evaluator has finished in 18.366s.
Step model_promoter has started.
Implicitly linking artifact output to model breast_cancer_classifier version 2.
Step model_promoter has finished in 17.530s.
Pipeline run has finished in 2m0s.
Dashboard URL: https://1cf18d95-zenml.cloudinfra.zenml.io/workspaces/default/pipelines/7b61c1dd-cc67-4d51-b5e1-f609cf7794c3/runs/28a6e92a-5c69-4397-830f-afca1d4

In [None]:
# Train the SGD model and tag it with "sgd"
pipeline_settings["model"].tags = ["breast_cancer", "classifier", "sgd"]

# Use an SGD classifier with a fixed seed
training.with_options(enable_cache=True, **pipeline_settings)(
    model_type="sgd",
    random_state=42
)

sgd_run = client.get_pipeline("training").last_run

Initiating a new run for the pipeline: training.
Reusing registered pipeline version: (version: 147).
Provided model configuration does not match existing model breast_cancer_classifier with the following changes: {'tags added': ['sgd'], 'tags removed': ['xgboost']}. If you want to update the model configuration, please use the zenml model update command.
Executing a new run.
Using user: hamza@zenml.io
Using stack: local-sagemaker-step-operator-wandb
  step_operator: sagemaker-eu
  container_registry: aws-eu
  experiment_tracker: zenml_wandb
  orchestrator: default
  image_builder: local
  artifact_store: s3-zenfiles
Using cached version of [

Your artifact was materialized under Python version 'unknown' but you are currently using '3.8.10'. This might cause unexpected behavior since pickle is not reproducible across Python versions. Attempting to load anyway...

test_accuracy   ▁
train_accuracy  ▁

test_accuracy   0.74312
train_accuracy  0.66204

Implicitly linking artifact output to model breast_cancer_classifier version 3.
Step model_evaluator has finished in 21.367s.
Step model_promoter has started.

You can already see from the logs how model training went: the
`XGBClassifier` (97.25% test accuracy) performed considerably better than the
`SGDClassifier` (74.31% test accuracy). We can use the ZenML `Client` to verify this:

In [None]:
# The evaluator returns a float value with the accuracy
xgboost_run.steps["model_evaluator"].output.load() >= sgd_run.steps["model_evaluator"].output.load()

Running both pipelines has created two associated **model versions**.
You can fetch your ZenML model and list its versions as follows:

In [None]:
zenml_model = client.get_model("breast_cancer_classifier")
print(zenml_model)

versions = zenml_model.versions

print(f"Model {zenml_model.name} has {len(versions)} versions")

versions[-2].version, versions[-1].version

The interesting part is that ZenML linked all artifacts produced by each pipeline
run to the corresponding model version, including the pickled SGD and XGBoost
classifiers. We can access these artifacts directly from the model
version object:

In [None]:
# Let's load the XGBoost version
xgboost_zenml_model_version = client.list_model_versions("breast_cancer_classifier", tag="xgboost")[-1]

# We can now load our classifier directly as well
xgboost_classifier = xgboost_zenml_model_version.get_artifact("breast_cancer_classifier").load()

xgboost_classifier

If you are a [ZenML Cloud](https://zenml.io/cloud) user, you can see all of this visualized in the dashboard:

<img src="_assets/cloud_mcp_screenshot.png" width="70%" alt="Model Control Plane">

There is a lot more you can do with ZenML models, including tracking metrics
by attaching metadata to them, or persisting them in a model registry. These
topics are covered in more depth in the
[ZenML docs](https://docs.zenml.io).

For now, we will use the ZenML Model Control Plane to promote our best
model to `production`. You can do this by setting the `stage` of
your chosen model version to `production`. With `force=True`, any other version
currently in the `production` stage is archived.

In [None]:
# Set our best classifier to production
xgboost_zenml_model_version.set_stage("production", force=True)

In practice, you would promote a model only after comparing it against all other model
versions and running additional tests, which is a more advanced use case. See the
[e2e_batch example](https://github.com/zenml-io/zenml/tree/main/examples/e2e) for
more insight into that sort of flow!
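A comparison-based promotion rule can be sketched in plain Python. This is an illustration, not the logic of the project's `model_promoter` step (which is not shown here): the function name and the minimum-accuracy threshold are hypothetical, and in a real flow the accuracies would be read from each model version's metadata before calling `set_stage` on the winner.

```python
from typing import Dict, Optional


def pick_version_to_promote(
    test_accuracies: Dict[str, float], min_test_accuracy: float = 0.0
) -> Optional[str]:
    """Hypothetical promotion rule: return the version with the highest test
    accuracy, or None if no version reaches `min_test_accuracy`."""
    eligible = {
        version: acc
        for version, acc in test_accuracies.items()
        if acc >= min_test_accuracy
    }
    if not eligible:
        return None
    return max(eligible, key=lambda version: eligible[version])


# Test accuracies reported by the two training runs above:
# model version 2 (XGBoost) and model version 3 (SGD).
accuracies = {"2": 0.97248, "3": 0.74312}
print(pick_version_to_promote(accuracies, min_test_accuracy=0.8))  # → 2
```

Keeping the rule a pure function of the metrics makes it easy to test in isolation before wiring it into a pipeline step.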

Once the model is promoted, we can consume the right model version directly in our
batch inference pipeline. Let's see how that works.

# 🫅 Step 2: Consuming the model in production

The batch inference pipeline takes the model version marked as `production` and uses it to run
inference on live data. The critical step here is `inference_predict`, where we load the model into memory
and generate predictions:

<img src="_assets/inference_pipeline.png" width="45%" alt="Inference pipeline">

In [None]:
# let's have a look at the inference step
%pycat steps/inference_predict.py


Apart from loading the model, we must also load the preprocessing pipeline fitted during feature engineering,
so that inference data goes through exactly the same transformations as the training data. Let's bring it all together:
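Reusing the fitted preprocessing object matters because a transformer learns statistics when it is fitted, and inference data must be transformed with those same statistics. A minimal sketch, assuming the feature engineering step produces a fitted scikit-learn `Pipeline`; `MinMaxScaler` and the train/"live" split are illustrative choices, not necessarily what the project uses.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_breast_cancer(return_X_y=True)
# X_live plays the role of new data arriving at inference time.
X_train, X_live, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature engineering: fit the preprocessing on training data only.
preprocess = Pipeline([("scale", MinMaxScaler())])
X_train_prepared = preprocess.fit_transform(X_train)

# Inference: reuse the *fitted* pipeline. `transform` (not `fit_transform`)
# applies the per-feature minima/maxima learned from the training data.
X_live_prepared = preprocess.transform(X_live)

# Anti-pattern for comparison: refitting on live data learns different
# statistics, so the model would see differently scaled features.
X_live_refit = MinMaxScaler().fit_transform(X_live)
print(np.allclose(X_live_prepared, X_live_refit))
```

Persisting the fitted pipeline as an artifact is what lets the inference pipeline apply the training-time transformation without refitting.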

In [None]:
# let's have a look at the inference pipeline
%pycat pipelines/inference.py

To load the right model, we pass the `production` stage as the `version` in the `Model` config this time.
This ensures the pipeline always loads whichever version is currently in production, decoupled from all other pipelines:

In [None]:
pipeline_settings = {"enable_cache": False}

# Point the pipeline at the model version in the `production` stage
pipeline_settings["model"] = Model(
    name="breast_cancer_classifier",
    version="production", # We can pass in the stage name here!
)

In [None]:
# The `with_options` method lets us pass in pipeline settings
# and returns a configured pipeline
inference.with_options(**pipeline_settings)()

ZenML automatically links all artifacts from this run to the `production` model version as well, including the predictions
returned by the pipeline. This completes the MLOps loop from training to inference:

In [None]:
# Fetch production model
production_model_version = client.get_model_version("breast_cancer_classifier", "production")

# Get the predictions artifact
production_model_version.get_artifact("predictions").load()

You can also see all predictions ever created as a complete history in the dashboard:

<img src="_assets/cloud_mcp_predictions.png" width="70%" alt="Model Control Plane">

# 🙏 Step 3: Bringing it all together

Let's run all the pieces from the previous steps using the production-ready Python script `run.py`:

In [None]:
# let's clean up previous partial runs first
! zenml model delete breast_cancer_classifier -y

In [None]:
!zenml stack set local-wandb
!zenml stack describe local-wandb

In [None]:
!python3 run.py --training-pipeline --inference-pipeline

The full run has now executed on the local stack, and the experiment is tracked in both the Model Control Plane and Weights & Biases.

Next, let's move the heavy-lifting steps to SageMaker while keeping the lightweight ones local, which helps reduce costs. We can achieve this with step operators and step-level configuration.

To make this happen, we use the following step-level settings:
```yaml
steps:
  model_trainer:
    step_operator: sagemaker-eu
    settings:
      step_operator.sagemaker:
        estimator_args:
          instance_type: ml.m5.large # select instance type
```

<img src="_assets/local_sagmaker_so_stack.png" width="60%" alt="Sagemaker step_op stack">

In [None]:
!zenml stack set local-sagemaker-step-operator-wandb
!zenml stack describe local-sagemaker-step-operator-wandb

In [None]:
!python3 run.py --training-pipeline --inference-pipeline --custom-training-suffix _sagemaker

# 🐙 Step 4: Analyzing results

In [None]:
sgd_model_version = client.list_model_versions("breast_cancer_classifier", tag="sgd")[-1]
xgboost_model_version = client.list_model_versions("breast_cancer_classifier", tag="xgboost")[-1]
print(f"SGD version is staged as `{sgd_model_version.stage}`")
print(f"XGBoost version is staged as `{xgboost_model_version.stage}`")

First, let's pull some of the metadata collected while evaluating the models. As a reminder, this is the evaluator step we used:
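```python
from typing import Optional

import pandas as pd
import wandb
from sklearn.base import ClassifierMixin
from sklearn.metrics import confusion_matrix
from zenml import log_artifact_metadata, log_model_metadata, step


@step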
def model_evaluator(
    model: ClassifierMixin,
    dataset_trn: pd.DataFrame,
    dataset_tst: pd.DataFrame,
    min_train_accuracy: float = 0.0,
    min_test_accuracy: float = 0.0,
    target: Optional[str] = "target",
) -> float:
    # Calculate the model accuracy on the train and test set
    trn_acc = model.score(...)
    tst_acc = model.score(...)

    ...
    
    predictions = model.predict(dataset_tst.drop(columns=[target]))
    metadata = {
        "train_accuracy": float(trn_acc),
        "test_accuracy": float(tst_acc),
        "confusion_matrix": confusion_matrix(dataset_tst[target], predictions)
        .ravel()
        .tolist(),
    }
    log_model_metadata(metadata={"wandb_url": wandb.run.url})
    log_artifact_metadata(
        metadata=metadata,
        artifact_name="breast_cancer_classifier",
    )

    wandb.log({"train_accuracy": metadata["train_accuracy"]})
    wandb.log({"test_accuracy": metadata["test_accuracy"]})
    wandb.log(
        {
            "confusion_matrix": wandb.sklearn.plot_confusion_matrix(
                dataset_tst[target], predictions, ["No Cancer", "Cancer"]
            )
        }
    )
    return float(tst_acc)
```
We start by pulling the accuracy metrics out of both model versions for comparison:

In [None]:
sgd_clf_metadata = sgd_model_version.get_artifact("breast_cancer_classifier").run_metadata
xgboost_clf_metadata = xgboost_model_version.get_artifact("breast_cancer_classifier").run_metadata
print(f"SGD{' (production)' if sgd_model_version.stage == 'production' else ''} metrics: train={sgd_clf_metadata['train_accuracy'].value*100:.2f}% test={sgd_clf_metadata['test_accuracy'].value*100:.2f}%")
print(f"XGBoost{' (production)' if xgboost_model_version.stage == 'production' else ''} metrics: train={xgboost_clf_metadata['train_accuracy'].value*100:.2f}% test={xgboost_clf_metadata['test_accuracy'].value*100:.2f}%")

Now let's plot the collected confusion matrices:

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns


def plot_confusion_matrix(metadata_pointer, title: str, ax):
    # The evaluator stored the matrix flattened; restore its 2x2 shape
    confusion_matrix = np.array(metadata_pointer["confusion_matrix"].value, dtype=float).reshape((2, 2))
    # Normalize counts to fractions of the whole test set
    confusion_matrix /= np.sum(confusion_matrix)
    sns.heatmap(confusion_matrix, annot=True, fmt=".2%", cmap="coolwarm", ax=ax)
    ax.set_title(f"{title} confusion matrix")
    ax.set_ylabel("True Label")
    ax.set_xlabel("Predicted Label")


fig, ax = plt.subplots(1, 2, figsize=(15, 4))
plot_confusion_matrix(sgd_clf_metadata, "SGD", ax[0])
plot_confusion_matrix(xgboost_clf_metadata, "XGBoost", ax[1])
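The plotting helper relies on the layout that `confusion_matrix(...).ravel()` produces for binary labels: the flat list is `[tn, fp, fn, tp]`, so reshaping it to 2×2 restores rows as true labels and columns as predicted labels. Dividing by the total turns counts into fractions of the test set, and the diagonal of the normalized matrix then sums to the accuracy. A small numeric check, using made-up counts rather than the project's results:

```python
import numpy as np

# Made-up counts for illustration, in the order produced by
# `confusion_matrix(y_true, y_pred).ravel()` for binary labels: [tn, fp, fn, tp].
flat = [40, 3, 2, 64]
tn, fp, fn, tp = flat

# Same reshaping as in `plot_confusion_matrix`: rows are true labels,
# columns are predicted labels.
matrix = np.array(flat, dtype=float).reshape((2, 2))

# Normalize to fractions of the whole test set, as the heatmap does.
normalized = matrix / matrix.sum()

print(normalized.round(3))
# The diagonal of the normalized matrix sums to the accuracy.
print(f"accuracy: {normalized[0, 0] + normalized[1, 1]:.2%}")
```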

So far we have explored the information tracked in the Model Control Plane, but we also had Weights & Biases tracking enabled, so let's dive into that as well.

Because the evaluator logged the W&B run URL as model metadata (`wandb_url`), each model version links directly to its W&B run:

In [None]:
print(f'SGD version: {sgd_model_version.run_metadata["wandb_url"].value}')
print(f'XGBoost version: {xgboost_model_version.run_metadata["wandb_url"].value}')

The Model Control Plane also makes it easy to track the lineage of artifacts and pipeline runs:

In [None]:
for artifact_name, versions in sgd_model_version.data_artifacts.items():
    if versions:
        print(f"Existing version of `{artifact_name}`:")
        for version_name, artifact_ in versions.items():
            print(version_name, artifact_.data_type.attribute)

In [None]:
for run_name, run_ in sgd_model_version.pipeline_runs.items():
    print(run_name, run_.id)