# ZenML Quickstart Guide

<a href="https://colab.research.google.com/github/zenml-io/zenml/blob/main/examples/quickstart/notebooks/quickstart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Introduction

ZenML is an extensible, open-source MLOps framework for creating portable, 
production-ready MLOps pipelines.

ZenML pipelines are infrastructure-agnostic so you can switch between 
development and production environments without requiring any code changes. 
This gives both Data Scientists and MLOps Engineers more freedom and 
independence in how they approach their work:

TODO: Architecture / Workflow visualization

### ZenML for MLOps Engineers

As an MLOps engineer, you can use ZenML to define, deploy, and manage
sophisticated production infrastructure and tooling setups that are easy to 
share with your colleagues.

Since infrastructure is decoupled from code, you are never vendor-locked. With
ZenML you have the freedom to switch to a different tooling stack whenever it 
suits you:

```bash
zenml stack set gcp
python run.py  # Run your ML workflows in GCP
zenml stack set aws
python run.py  # Now your ML workflow runs in AWS
```

To share your tooling stack with your colleagues, simply set up a ZenML server, 
register your production environment as a ZenML stack, and invite your 
colleagues to run their ML workflows on it:

```bash
zenml deploy  # Deploy ZenML
zenml stack register production ...  # Register your production environment
zenml stack share production  # Make it available to your colleagues
```



### ZenML for Data Scientists

As a Data Scientist, you can develop ML models locally with all your 
favourite tools without having to worry about production issues at all. Once you
are happy with your results, you can switch to a production environment via a
single command and all the code you wrote will just work:

```bash
python run.py  # develop your code locally with all your favourite tools
zenml stack set production
python run.py  # run your code in production without any code changes
```

ZenML is designed to be as unintrusive as possible. Adding a ZenML `@step` or
`@pipeline` decorator to your Python functions is enough to turn your current 
code into ZenML pipelines:

```python
@step
def step_1() -> str:
  return "world"

@step
def step_2(input_one: str, input_two: str) -> None:
  combined_str = input_one + ' ' + input_two
  print(combined_str)

@pipeline
def my_pipeline():
  output_step_one = step_1()
  step_2(input_one="hello", input_two=output_step_one)

my_pipeline()
```

Let's see it in action and use ZenML to bring an LLM training and deployment
workflow into production. As an example, we will first train and deploy a
small LLM locally. Then we will switch the entire workflow to a
production environment in the cloud that will automatically train the
model on GPU-enabled hardware and deploy it to a scalable Kubernetes cluster.

## 1. Install Requirements

In [None]:
!pip install gradio

In [None]:
!zenml integration install pytorch mlflow -y

In [None]:
!zenml hub install mingpt_example mlflow_steps -y

## 2. Define the Local Training Environment

In [None]:
# Register the MLflow experiment tracker
!zenml experiment-tracker register mlflow_tracker --flavor=mlflow

# Register the MLflow model registry
!zenml model-registry register mlflow_registry --flavor=mlflow

# Register the MLflow model deployer
!zenml model-deployer register mlflow_deployer --flavor=mlflow

# Register a new stack with the new stack components
!zenml stack register quickstart_stack -a default\
                                       -o default\
                                       -d mlflow_deployer\
                                       -e mlflow_tracker\
                                       -r mlflow_registry

!zenml stack set quickstart_stack

In [None]:
!zenml stack describe

## 3. Train and deploy GPT-nano locally
TODO: show source code of steps here and motivate the task

In [None]:
# Enable autoreload so that edited modules are reloaded automatically
%load_ext autoreload
%autoreload 2

In [None]:
from zenml.hub.mingpt_example import url_dataset_loader_step, gpt_nano_loader_step, pretrained_gpt_xl_loader_step, mingpt_trainer_step
from zenml.hub.mlflow_steps.mlflow_deployer import mlflow_model_deployer_step
# from zenml.integrations.mlflow.steps import mlflow_model_deployer_step

In [None]:
from zenml.client import Client

experiment_tracker = Client().active_stack.experiment_tracker

mingpt_trainer = mingpt_trainer_step()

mingpt_trainer.configure(experiment_tracker=experiment_tracker.name)

### TODO: Delete step below once new pipeline definition is live and simply set decision=True in deployer step

In [None]:
from zenml.steps import step

@step
def deployment_trigger() -> bool:
    return True

### TODO: Switch to new pipeline definition style

In [None]:
from zenml.pipelines import pipeline


@pipeline(enable_cache=False)
def training_pipeline(
    load_training_data,
    model_definition,
    trainer,
    deployment_trigger,
    model_deployer,
):
    """Train, evaluate, and deploy a model."""
    dataset = load_training_data()
    model = model_definition()
    model = trainer(dataset, model)
    deployment_decision = deployment_trigger()
    model_deployer(deployment_decision, model)

In [None]:
pip = training_pipeline(
    load_training_data=url_dataset_loader_step(),
    model_definition=gpt_nano_loader_step(),
    trainer=mingpt_trainer,
    deployment_trigger=deployment_trigger(),
    model_deployer=mlflow_model_deployer_step(),
)
pip.run(enable_cache=False)

In [None]:
!zenml model-deployer models list

## 4. Show run in dashboard TODO

## 5. Spin up gradio app

### TODO: use deployed model

In [None]:
pipeline_run = pip.get_runs()[0]
deployer_step = pipeline_run.get_step("model_deployer")
deployed_model_url = deployer_step.metadata["deployed_model_url"].value
deployed_model_url

In [None]:
deployed_model_url = "http://127.0.0.1:8002/invocations"

In [None]:
import json
import requests
import torch

def send_request_to_deployed_model(data):
    response = requests.post(
        headers={"Content-Type": "application/json"},
        url=deployed_model_url,
        data=json.dumps({"instances": data.numpy().tolist()})
    ).json()
    if "predictions" in response:
        return torch.tensor(response["predictions"])
    else:
        raise Exception(response)

In [None]:
import torch
from zenml.hub.mingpt_example.mingpt.bpe import BPETokenizer

def generate(prompt, steps):
    tokenizer = BPETokenizer()
    # TODO: If prompt empty, ask for input
    tokens = tokenizer(prompt)
    for _ in range(steps):
        logits = send_request_to_deployed_model(tokens)
        _, idx_next = torch.topk(logits, k=1, dim=-1)
        tokens = torch.cat((tokens, idx_next.reshape(tokens.shape)), dim=1)
    return tokenizer.decode(tokens.flatten())

In [None]:
def question_answerer(question):
    prompt = "Question: " + question + "\nAnswer: "
    result = generate(prompt=prompt, steps=2)
    answer = result#.split("\n")[1][8:].split(".")[0] + "."
    return answer

In [None]:
import gradio as gr

gr.Interface(
    title="My ZenML Chatbot", 
    fn=question_answerer,
    inputs=["text"], 
    outputs=["text"]
).launch()

## 6. Train GPT-XL on remote stack

In [None]:
# from zenml.integrations.seldon.steps import seldon_model_deployer_step

%load_ext autoreload
%autoreload 2

from zenml.integrations.mlflow.steps import mlflow_model_deployer_step
from zenml.hub.quickstart_example import pretrained_gpt_xl_loader_step

training_pipeline(
    load_training_data=load_url_dataset(),
    model_definition=load_pretrained_gpt_xl(),
    trainer=train_llm(),
    deployment_trigger=deployment_trigger(),
    model_deployer=mlflow_model_deployer_step(),
).run()

This quickstart helps you get your first practical experience with ZenML and gives you a brief overview of various MLOps terms. 

Throughout this quickstart, we will:
- Train a model, evaluate it, register the model version, deploy it, and embed it in an inference pipeline,
- Automatically version, track, and cache data, models, and other artifacts,
- Track model hyperparameters and metrics in an experiment tracking tool,
- Measure and visualize train-test skew, training-serving skew, and data drift.

# Introduction

Before we dive into the code, let us briefly introduce you to some of the 
fundamental concepts of ZenML that we will use in this quickstart. If you are 
already familiar with these concepts, feel free to skip to the next section.

#### Steps

The first concept that we will cover is the ZenML **Step**. In 
ZenML, a step provides a simple Python interface for defining a 
stand-alone process in an ML workflow. Steps consume input artifacts 
and generate output artifacts. As an example, take a closer look at a 
simple step:

```python
import pandas as pd

from zenml.steps import step

@step
def my_dataset_loader() -> pd.DataFrame:
    """My dataset loader step."""
    # Implement logic here and return the dataset...
    return ...
```

#### Pipelines

The next concept is the ZenML **Pipeline**. A pipeline provides a simple 
Python interface for designing an ML workflow by linking different steps 
together. For instance, a very simple pipeline might look like this:

```python
from zenml.pipelines import pipeline

@pipeline
def my_pipeline(
    my_data_loader,
    my_model_trainer,
):
    """Load the dataset and train a model."""
    dataset = my_data_loader()
    model = my_model_trainer(dataset=dataset)
```
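To make the idea of linking steps concrete, here is a minimal plain-Python sketch that mirrors the shape of `my_pipeline` without using ZenML at all. The functions `load_numbers`, `compute_mean`, and `toy_pipeline` are hypothetical stand-ins invented for this illustration; they only show how the output of one step becomes the input of the next.

```python
from typing import Callable, Dict, List


def load_numbers() -> List[int]:
    """Stand-in for a data loader step: produces an output artifact."""
    return [1, 2, 3, 4]


def compute_mean(dataset: List[int]) -> float:
    """Stand-in for a trainer step: consumes the loader's output."""
    return sum(dataset) / len(dataset)


def toy_pipeline(
    my_data_loader: Callable[[], List[int]],
    my_model_trainer: Callable[..., float],
) -> Dict[str, object]:
    """Wire two steps together, passing the first output to the second."""
    dataset = my_data_loader()
    model = my_model_trainer(dataset=dataset)
    return {"dataset": dataset, "model": model}


artifacts = toy_pipeline(load_numbers, compute_mean)
print(artifacts["model"])  # 2.5
```

Because the pipeline only receives its steps as arguments, you can swap in a different loader or trainer without touching the pipeline body.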

#### Stacks & Stack Components

To execute these pipelines, you need a **stack**. In ZenML, 
a stack is a set of configurations for your MLOps tools and 
infrastructure. Each stack consists of multiple **stack components**, and
depending on their type, these components serve different purposes.

Examples of different stack component flavors include:

- [Airflow**Orchestrator**](https://docs.zenml.io/component-gallery/orchestrators/airflow) which orchestrates your ML workflows on Airflow 
- [MLflow**ExperimentTracker**](https://docs.zenml.io/component-gallery/experiment-trackers/mlflow) which can track your experiments with MLflow
- [Evidently**DataValidator**](https://docs.zenml.io/component-gallery/data-validators/evidently) which can help you validate your data

Any such combination of tools and infrastructure can be registered as a 
separate stack in ZenML. Since ZenML code is tooling-independent, you can 
switch between stacks with a single command and then automatically execute your
ML workflows on the desired stack without having to modify your code.
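The following plain-Python sketch illustrates why switching stacks does not require code changes. It is not ZenML's implementation: `ToyStack`, `ToyStackRegistry`, and `run_workflow` are hypothetical names, and the component values are made up. The point is only that the workflow code asks for "the active stack" and never names a concrete tool.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ToyStack:
    """A named bundle of tooling choices (hypothetical, for illustration)."""

    name: str
    orchestrator: str
    artifact_store: str


class ToyStackRegistry:
    """Keeps registered stacks and remembers which one is active."""

    def __init__(self) -> None:
        self._stacks: Dict[str, ToyStack] = {}
        self._active: Optional[str] = None

    def register(self, stack: ToyStack) -> None:
        self._stacks[stack.name] = stack

    def set_active(self, name: str) -> None:
        if name not in self._stacks:
            raise KeyError(f"Unknown stack: {name}")
        self._active = name

    @property
    def active(self) -> ToyStack:
        if self._active is None:
            raise RuntimeError("No active stack set")
        return self._stacks[self._active]


def run_workflow(registry: ToyStackRegistry) -> str:
    """Workflow code that only refers to the active stack."""
    stack = registry.active
    return (
        f"running on {stack.orchestrator}, "
        f"storing artifacts in {stack.artifact_store}"
    )


registry = ToyStackRegistry()
registry.register(ToyStack("local", "local orchestrator", "local disk"))
registry.register(ToyStack("production", "Airflow", "cloud bucket"))

registry.set_active("local")
local_message = run_workflow(registry)

registry.set_active("production")  # the only thing that changes
production_message = run_workflow(registry)

print(local_message)
print(production_message)
```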

#### Integrations

Finally, ZenML comes equipped with a wide variety of stack component flavors. 
While some of these flavors are built into the ZenML package, others 
are implemented as part of one of our integrations. Since our quickstart 
features some of these integrations, you will see a practical example of how 
to use them in the upcoming sections.

## Dependencies

Now, for the quickstart, we need to install some dependencies. Once you have ZenML installed, you can use our CLI to install the required integrations.

In [None]:
%pip install "zenml[server]"  # install ZenML
!zenml integration install sklearn mlflow evidently -y  # install ZenML integrations
!zenml init  # Initialize a ZenML repository
%pip install pyparsing==2.4.2  # required for Colab

import IPython

# automatically restart kernel
IPython.Application.instance().kernel.do_shutdown(restart=True)

Please wait for the installation to complete before running subsequent cells. At the end of the installation, the notebook kernel will automatically restart.

## Using Google Colab

If you follow this quickstart in Google's Colab, you will need an [ngrok account](https://dashboard.ngrok.com/signup) to view some of the visualizations later. Please set up an account, then set your user token below:

In [None]:
NGROK_TOKEN = ""  # TODO: set your ngrok token if you are working on Colab

In [None]:
from zenml.environment import Environment

if Environment.in_google_colab():  # Colab only setup
    # install ngrok and set auth token
    !pip install pyngrok
    !ngrok authtoken {NGROK_TOKEN}

## Create an MLOps Stack

ZenML decouples your code from the infrastructure and tooling you use.
This enables you to quickly take your code from experimentation to production.
Furthermore, using ZenML prevents vendor lock-in by allowing you to switch out any part of your MLOps stack easily.
See the [ZenML Integrations](https://zenml.io/integrations) page for a list of all tools we currently support.

Throughout this quickstart, we will use the following MLOps stack: a local orchestrator, a local artifact store, an [MLflow](https://mlflow.org/) experiment tracker, model registry, and model deployer, and an [Evidently](https://evidentlyai.com/) data validator.

![Quickstart MLOps Stack Overview](_assets/stack_overview_2.png)

Before we start, we need to register all stack components that require configuration into our ZenML MLOps stack:

In [None]:
# Register the MLflow experiment tracker
!zenml experiment-tracker register mlflow_tracker --flavor=mlflow

# Register the MLflow model registry
!zenml model-registry register mlflow_registry --flavor=mlflow

# Register the MLflow model deployer
!zenml model-deployer register mlflow_deployer --flavor=mlflow

# Register the Evidently data validator
!zenml data-validator register evidently_validator --flavor=evidently

# Register a new stack with the new stack components
!zenml stack register quickstart_stack -a default\
                                       -o default\
                                       -d mlflow_deployer\
                                       -e mlflow_tracker\
                                       -r mlflow_registry\
                                       -dv evidently_validator\
                                       --set

# Visualize the current ZenML stack
!zenml stack describe

## Define ML Pipelines
Let us now use ZenML to write two ML pipelines for continuous training and serving.

The training pipeline will:
- Load the [iris flower classification dataset](https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html),
- Train a model on the training data (and track hyperparameters using [MLflow](https://mlflow.org/)),
- Test the model on the test data,
- Register the model (with [MLflow](https://mlflow.org/)).

The inference pipeline will:
- Load inference data,
- Deploy a chosen version of registered model,
- Run model inference on the inference data,
- Check for data drift (with [Evidently](https://evidentlyai.com/)).

You can see a visualization of the two pipelines below:

![Overview of Quickstart Pipelines](_assets/quickstart_pipelines.png)

Let's now define those pipelines with ZenML. To do so, we simply write a Python function that defines how the data will move through the different steps and decorate it with ZenML's `@pipeline` decorator. Under the hood, ZenML will build a [DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph) that determines the order in which the steps need to be executed. 

In [None]:
from zenml.pipelines import pipeline


@pipeline
def training_pipeline(
    training_data_loader,
    trainer,
    evaluator,
    model_register,
):
    """Train, evaluate, and register a model."""
    X_train, X_test, y_train, y_test = training_data_loader()
    model = trainer(X_train=X_train, y_train=y_train)
    test_acc = evaluator(X_test=X_test, y_test=y_test, model=model)
    model_register(model)


@pipeline
def inference_pipeline(
    inference_data_loader,
    mlflow_model_deployer,
    predictor,
    training_data_loader,
    drift_detector,
):
    """Inference pipeline with skew and drift detection."""
    inference_data = inference_data_loader()
    model_deployment_service = mlflow_model_deployer()
    predictor(model_deployment_service, inference_data)
    training_data, _, _, _ = training_data_loader()
    drift_detector(training_data, inference_data)
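As a rough illustration of how a DAG fixes the execution order, the sketch below writes down the data dependencies of the training pipeline by hand and sorts them with Python's standard-library `graphlib` (Python 3.9+). This is not how ZenML builds its DAG internally; the `dependencies` dictionary is an assumption derived from reading the pipeline function above.

```python
from graphlib import TopologicalSorter

# Map each step to the steps whose outputs it consumes
# (read off the training pipeline definition by hand).
dependencies = {
    "training_data_loader": set(),
    "trainer": {"training_data_loader"},
    "evaluator": {"training_data_loader", "trainer"},
    "model_register": {"trainer"},
}

# static_order() yields every step after all of its dependencies.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
```

Note that `evaluator` and `model_register` do not depend on each other, so any order between them is valid; only the constraint "dependencies first" is guaranteed.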

## Implement Pipeline Steps

Next, we need to implement the steps that make up these pipelines. 
Again, we can do this by writing simple Python functions and decorating them with ZenML's `@step` decorator.

In total, the two pipelines use eight steps:
- Training data loader
- Model trainer
- Model evaluator
- Model registerer
- Inference data loader
- Registered model deployer
- Predictor
- Drift detection

### Data Loaders
Let's start with data loading. We load the iris dataset for training and, for simplicity, use some random samples for inference.

In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

from zenml.steps import Output, step


@step
def training_data_loader() -> Output(
    X_train=pd.DataFrame,
    X_test=pd.DataFrame,
    y_train=pd.Series,
    y_test=pd.Series,
):
    """Load the iris dataset as tuple of Pandas DataFrame / Series."""
    iris = load_iris(as_frame=True)
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.2, shuffle=True, random_state=42
    )
    return X_train, X_test, y_train, y_test

In [None]:
@step
def inference_data_loader() -> pd.DataFrame:
    """Load some (random) inference data."""
    return pd.DataFrame(
        data=np.random.rand(10, 4) * 10,  # assume range [0, 10]
        columns=load_iris(as_frame=True).data.columns,
    )

### Model Trainer
To train our model, we define two steps: one uses the [sklearn SVC](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html) model and the other a [Decision Tree](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html) classifier, and each fits its model on the given training data. Additionally, we log all model hyperparameters and metrics to [MLflow](https://mlflow.org/).

Note that we do not need to save the model within the step explicitly; ZenML is automatically taking care of this for us. Under the hood, ZenML persists all step inputs and outputs in an [Artifact Store](https://docs.zenml.io/component-gallery/artifact-stores). This also means that all of our data and models are automatically versioned and tracked.
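To give a feel for what "persisting and versioning step outputs" can mean, here is a small self-contained sketch. It is an assumption-laden toy, not ZenML's artifact store: the class `ToyArtifactStore`, its directory layout, and the use of `pickle` are all invented for this example.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any


class ToyArtifactStore:
    """Persist each step output under <root>/<step_name>/<version>.pkl."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def save(self, step_name: str, value: Any) -> int:
        """Store a new version of a step's output and return its number."""
        step_dir = self.root / step_name
        step_dir.mkdir(parents=True, exist_ok=True)
        version = len(list(step_dir.glob("*.pkl"))) + 1
        with open(step_dir / f"{version}.pkl", "wb") as f:
            pickle.dump(value, f)
        return version

    def load(self, step_name: str, version: int) -> Any:
        """Load a specific version of a step's output."""
        with open(self.root / step_name / f"{version}.pkl", "rb") as f:
            return pickle.load(f)


store = ToyArtifactStore(Path(tempfile.mkdtemp()))
first = store.save("trainer", {"gamma": 0.01})
second = store.save("trainer", {"gamma": 0.1})

print(first, second)  # 1 2
print(store.load("trainer", first))  # {'gamma': 0.01}
```

Every output gets its own version, so earlier results stay retrievable after later runs.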

In [None]:
import mlflow

from sklearn.base import ClassifierMixin
from sklearn.svm import SVC

from zenml.client import Client

experiment_tracker = Client().active_stack.experiment_tracker

@step(enable_cache=False, experiment_tracker=experiment_tracker.name)
def svc_trainer_mlflow(
    X_train: pd.DataFrame,
    y_train: pd.Series,
) -> ClassifierMixin:
    """Train a sklearn SVC classifier and log to MLflow."""
    mlflow.sklearn.autolog()  # log all model hparams and metrics to MLflow
    model = SVC(gamma=0.01)
    model.fit(X_train.to_numpy(), y_train.to_numpy())
    train_acc = model.score(X_train.to_numpy(), y_train.to_numpy())
    print(f"Train accuracy: {train_acc}")
    return model

In [None]:
import mlflow

from sklearn.base import ClassifierMixin
from sklearn.tree import DecisionTreeClassifier

from zenml.client import Client
from zenml.integrations.mlflow.steps import mlflow_model_deployer_step, MLFlowDeployerParameters

experiment_tracker = Client().active_stack.experiment_tracker

@step(enable_cache=False, experiment_tracker=experiment_tracker.name)
def tree_trainer_mlflow(
    X_train: pd.DataFrame,
    y_train: pd.Series,
) -> ClassifierMixin:
    """Train a decision tree classifier and log to MLflow."""
    mlflow.sklearn.autolog()  # log all model hparams and metrics to MLflow
    model = DecisionTreeClassifier()
    model.fit(X_train.to_numpy(), y_train.to_numpy())
    train_acc = model.score(X_train.to_numpy(), y_train.to_numpy())
    print(f"Train accuracy: {train_acc}")
    return model

### Model Evaluator and Deployment Trigger

Since our model is a [sklearn Model](https://scikit-learn.org/stable/developers/develop.html), we can simply call `model.score` to compute its test accuracy.

We then use the output of this step to only trigger deployment for models that achieved >90% test accuracy.

In [None]:
@step
def evaluator(
    X_test: pd.DataFrame,
    y_test: pd.Series,
    model: ClassifierMixin,
) -> float:
    """Calculate the accuracy on the test set"""
    test_acc = model.score(X_test.to_numpy(), y_test.to_numpy())
    print(f"Test accuracy: {test_acc}")
    return test_acc

In [None]:
@step
def deployment_trigger(test_acc: float) -> bool:
    """Only deploy if the test accuracy > 90%."""
    return test_acc > 0.9

### Model Registry, Deployer and Drift Detection

ZenML provides default steps for MLflow model registry, deployment and Evidently drift detection, which we can simply import:

In [None]:
from zenml.integrations.mlflow.steps.mlflow_deployer import MLFlowDeployerParameters, mlflow_model_registry_deployer_step
from zenml.integrations.mlflow.steps.mlflow_registry import MLFlowRegistryParameters, mlflow_register_model_step
from zenml.model_registries.base_model_registry import ModelRegistryModelMetadata

In [None]:
from zenml.integrations.evidently.steps import (
    EvidentlyProfileParameters,
    evidently_profile_step,
)

evidently_profile_params = EvidentlyProfileParameters(
    profile_sections=["datadrift"]
)
drift_detector = evidently_profile_step(
    step_name="drift_detector", params=evidently_profile_params
)

### Prediction Service Loader and Predictor

Lastly, we need to write the inference pipeline steps for loading a deployed model and computing its predictions on the inference data.

To load the deployed model, we query the model deployer of our active MLOps stack to find a model server deployed with the given training pipeline and deployment step names (more on this later):

In [None]:
from zenml.services import BaseService
from zenml.client import Client


@step(enable_cache=False)
def prediction_service_loader() -> BaseService:
    """Load the model service deployed by the `training_pipeline`."""
    client = Client()
    model_deployer = client.active_stack.model_deployer
    # Look up running model servers created by the given pipeline and step
    services = model_deployer.find_model_server(
        pipeline_name="training_pipeline",
        pipeline_step_name="model_deployer",
        running=True,
    )
    if not services:
        raise RuntimeError("No running model server found.")
    return services[0]

To run inference against the deployed model, we simply call its `predict()` method to get logits and compute the `argmax` to obtain the final prediction:

In [None]:
@step
def predictor(
    service: BaseService,
    data: pd.DataFrame,
) -> Output(predictions=list):
    """Run an inference request against a prediction service."""
    service.start(timeout=10)  # should be a NOP if already started
    prediction = service.predict(data.to_numpy())
    prediction = prediction.argmax(axis=-1)
    print(f"Prediction is: {[prediction.tolist()]}")
    return [prediction.tolist()]
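The `argmax` over the last axis can be illustrated without any prediction service. The sketch below is a pure-Python equivalent for a two-dimensional list of scores; the helper `argmax_rows` and the score values are hypothetical and only serve to show how per-class scores turn into class indices.

```python
from typing import List, Sequence


def argmax_rows(scores: Sequence[Sequence[float]]) -> List[int]:
    """Return the index of the largest score in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# One row per sample, one column per class (made-up numbers).
scores = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.2, 0.3, 0.5],
]

print(argmax_rows(scores))  # [1, 0, 2]
```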

## Run the pipeline and continuously deploy with caching

Running pipelines is as simple as calling the `run()` method on an instance of the defined pipeline. Let's connect the concrete step functions to our defined pipeline.

In [None]:
training_pipeline(
    training_data_loader=training_data_loader(),
    trainer=svc_trainer_mlflow(),
    evaluator=evaluator(),
    model_register=mlflow_register_model_step(
        params=MLFlowRegistryParameters(
            name="zenml-quickstart-model",
            metadata=ModelRegistryModelMetadata(gamma=0.01, arch="svc"),
            description="The first run of the Quickstart pipeline.",
        )
    ),
).run(unlisted=True)

And now let's replace the SVC trainer with the Tree trainer.

In [None]:
training_pipeline(
    training_data_loader=training_data_loader(),
    trainer=tree_trainer_mlflow(),
    evaluator=evaluator(),
    model_register=mlflow_register_model_step(
        params=MLFlowRegistryParameters(
            name="zenml-quickstart-model",
            metadata=ModelRegistryModelMetadata(arch="decision_tree"),
            description="The second run of the Quickstart pipeline.",
        )
    ),
).run(unlisted=True)

Notice that the second pipeline ran slightly faster than the first? That's because ZenML understands that the `training_data_loader` step of your pipeline is unchanged, so it just reloads the output from your previous run and goes straight to the trainer part. This saves valuable time as you iterate on your pipeline.
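The caching behaviour described here can be sketched in a few lines of plain Python. This is only a conceptual toy, not ZenML's caching logic: the decorator `cached_step`, the cache key (function name, bytecode, and arguments), and the two example steps are assumptions made for illustration.

```python
import hashlib
import pickle
from typing import Any, Callable, Dict, List

_cache: Dict[str, Any] = {}
call_log: List[str] = []  # records which steps actually executed


def cached_step(func: Callable[..., Any]) -> Callable[..., Any]:
    """Skip execution when the same code was already run on the same inputs."""

    def wrapper(*args: Any) -> Any:
        key_material = pickle.dumps(
            (func.__name__, func.__code__.co_code, args)
        )
        key = hashlib.sha256(key_material).hexdigest()
        if key not in _cache:
            call_log.append(func.__name__)
            _cache[key] = func(*args)
        return _cache[key]

    return wrapper


@cached_step
def load_data() -> list:
    return [3, 1, 2]


@cached_step
def train(data: list, strategy: str) -> list:
    return sorted(data, reverse=(strategy == "desc"))


train(load_data(), "asc")   # both steps execute
train(load_data(), "desc")  # load_data is served from cache, train reruns

print(call_log)  # ['load_data', 'train', 'train']
```

The second run only re-executes the step whose inputs changed, which is the same effect you observed when swapping the trainer.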

## Run the inference pipeline to deploy the registered model and run inference

After the training pipeline runs have finished, the trained models will have been registered in the MLflow model registry. We can use `zenml model-registry models list` to get an overview of all currently registered models and `zenml model-registry models list-versions` to get an overview of all versions of a specific model.

In [None]:
!zenml model-registry models list

!zenml model-registry models list-versions zenml-quickstart-model

When we run the inference pipeline, the `mlflow_model_registry_deployer_step` will load the given model version and deploy it locally. After that, the `predictor` step will use the deployed model service to make predictions on the inference data.

In [None]:
inference_pipeline(
    inference_data_loader=inference_data_loader(),
    mlflow_model_deployer=mlflow_model_registry_deployer_step(
        params=MLFlowDeployerParameters(
            registry_model_name="zenml-quickstart-model",
            registry_model_version="1",
            # or you can use the model stage if you have set it in the MLflow registry
            # registered_model_stage="None" # "Staging", "Production", "Archived"
        )
    ),
    predictor=predictor(),
    training_data_loader=training_data_loader(),
    drift_detector=drift_detector,
).run()

You can run `zenml model-deployer models list` to get an overview of all currently deployed models.

In [None]:
!zenml model-deployer models list

# Inspecting the outcomes

## ZenML dashboard

Once the pipeline runs have completed, we can visualize all of our ZenML 
resources in the ZenML dashboard. 
In order to spin up the dashboard, please execute the following code cell.

**Colab Note:** On Colab, you can access the ZenML dashboard via the 
`...ngrok.io` URL that will be shown in the first line of the output of the 
following code cell.
Please wait for the server to fully start up before accessing the dashboard URL, 
otherwise some resources might not have been fully loaded yet.

In [None]:
from zenml.environment import Environment
from zenml.integrations.mlflow.mlflow_utils import get_tracking_uri


def start_zenml_dashboard(port=8237):
    if Environment.in_google_colab():
        from pyngrok import ngrok

        public_url = ngrok.connect(port)
        print(f"\x1b[31mIn Colab, use this URL instead: {public_url}!\x1b[0m")
        !zenml up --blocking --port {port}

    else:
        !zenml up --port {port}

start_zenml_dashboard()

This will create a local ZenML server and connect you to it. Once connected, 
the dashboard will be available for you at the URL displayed in the command
output above. You can log in with username `default` and an empty password.

![ZenML Server Up](_assets/zenml-up.gif)

On this dashboard, you will be able to manage your pipelines and the corresponding pipeline runs, your stacks and stack components and your personal settings.

## Visualize Data Skew and Data Drift

ZenML provides a variety of visualization tools in addition to the dashboard shown above. For example, using the `EvidentlyVisualizer`, we can visualize data drift:

In [None]:
from zenml.integrations.evidently.visualizers import EvidentlyVisualizer

inference_run = inference_pipeline.get_runs()[0]
drift_detection_step = inference_run.get_step(step="drift_detector")

EvidentlyVisualizer().visualize(drift_detection_step)

Because our inference data is randomly generated, Evidently detects data drift for all four features:

<img src="_assets/data_drift.png" alt="Evidently Data Drift Visualization" width="50%"/>
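To build some intuition for what a drift check compares, the sketch below computes a two-sample Kolmogorov-Smirnov statistic, which is one common way to quantify how far apart two feature distributions are. It is not claimed to be the test Evidently runs for this report; the function `ks_statistic` and all sample values are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Largest gap between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    ref_sorted = sorted(reference)
    cur_sorted = sorted(current)
    largest_gap = 0.0
    # The gap can only change at observed values, so checking those suffices.
    for x in ref_sorted + cur_sorted:
        ref_cdf = bisect_right(ref_sorted, x) / len(ref_sorted)
        cur_cdf = bisect_right(cur_sorted, x) / len(cur_sorted)
        largest_gap = max(largest_gap, abs(ref_cdf - cur_cdf))
    return largest_gap


# Made-up values for a single feature.
training_feature = [4.9, 5.0, 5.1, 5.4, 5.8, 6.1, 6.3, 6.7]
similar_feature = [5.0, 5.2, 5.3, 5.9, 6.0, 6.4]
shifted_feature = [8.2, 8.9, 9.1, 9.5, 9.7, 9.9]

print(ks_statistic(training_feature, training_feature))  # 0.0
print(ks_statistic(training_feature, similar_feature))
print(ks_statistic(training_feature, shifted_feature))   # 1.0
```

A statistic close to 1, as for the shifted sample, indicates that the new data no longer resembles the reference data.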

## MLflow Experiment Tracking

Lastly, remember how we added MLflow experiment tracking to our `svc_trainer_mlflow` step before?
Those two simple lines of code automatically configured and initialized MLflow and logged all hyperparameters and metrics there.

Let's start up the MLflow UI and check it out!

**Colab Note:** On Colab, you can access the MLflow UI via the `...ngrok.io` URL
that will be shown in the first line of the output of the following code cell.

In [None]:
from zenml.environment import Environment
from zenml.integrations.mlflow.mlflow_utils import get_tracking_uri


def open_mlflow_ui(port=4997):
    if Environment.in_google_colab():
        from pyngrok import ngrok

        public_url = ngrok.connect(port)
        print(f"\x1b[31mIn Colab, use this URL instead: {public_url}!\x1b[0m")

    !mlflow ui --backend-store-uri="{get_tracking_uri()}" --port={port}


open_mlflow_ui()

![MLflow UI](_assets/mlflow_ui.png)

## Congratulations!

You just built your first ML Pipeline! You not only trained a model, you also deployed it, served it, and learned how to monitor and visualize everything that's going on. Did you notice how easy it was to bring all of the different components together using ZenML's abstractions? And that is just the tip of the iceberg of what ZenML can do; check out the [**Integrations**](https://zenml.io/integrations) page for a list of all the cool MLOps tools that ZenML supports!

To improve upon the ML workflows we built in this quickstart, you could, for instance:
- Deploy ZenML on the cloud to collaborate with your teammates,
- Experiment with more sophisticated models, such as [XGBoost](https://zenml.io/integrations/xgboost),
- Set up automated [Slack alerts](https://zenml.io/integrations/zen-ml-slack-integration) to get notified when data drift happens,
- Run the pipelines on scalable, distributed stacks like [Kubeflow](https://zenml.io/integrations/kubeflow).

## Where to go next

* If you have questions or feedback... 
  * Join our [**Slack Community**](https://zenml.io/slack-invite) and become part of the ZenML family!
* If this quickstart was a bit too quick for you... 
  * Check out [**ZenBytes**](https://github.com/zenml-io/zenbytes), our lesson series on practical MLOps, where we cover each MLOps concept in much more detail.
* If you want to learn more about using or extending ZenML...
  * Check out our [**Docs**](https://docs.zenml.io/) or read through our code on [**GitHub**](https://github.com/zenml-io/zenml).
* If you want to quickly learn how to use a specific tool with ZenML...
  * Check out our collection of [**Examples**](https://github.com/zenml-io/zenml/tree/doc/hamza-misc-updates/examples).
* If you want to see some advanced ZenML use cases... 
  * Check out [**ZenML Projects**](https://github.com/zenml-io/zenml-projects), our collection of production-grade ML use-cases.