# Intro to MLOps using ZenML

## 🌍 Overview

This repository is a minimalistic MLOps project intended as a starting point to learn how to put ML workflows in production.

Within this notebook we will show you how simple it is to switch from running code locally to running it remotely. You will then be able to explore all the metadata of your run in the ZenML Dashboard.

<img src=".assets/Overview.png" width="50%" alt="Quickstart Overview">

Follow along with this notebook to understand how you can use ZenML to productionize your ML workflows!

## Run on Colab

You can use Google Colab to run this notebook; no local installation is
required!

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zenml-io/zenml/blob/main/examples/quickstart/quickstart.ipynb)

## 👶 Step 0: Install Requirements

Let's install ZenML and all requirements to get started.

In [1]:
!pip install "zenml[server]"

Collecting zenml[server]
  Using cached zenml-0.63.0-py3-none-any.whl.metadata (21 kB)
Collecting alembic<1.9.0,>=1.8.1 (from zenml[server])
  Using cached alembic-1.8.1-py3-none-any.whl.metadata (7.2 kB)
Collecting bcrypt==4.0.1 (from zenml[server])
  Using cached bcrypt-4.0.1-cp36-abi3-manylinux_2_28_x86_64.whl.metadata (9.0 kB)
Collecting click<8.1.4,>=8.0.1 (from zenml[server])
  Using cached click-8.1.3-py3-none-any.whl.metadata (3.2 kB)
Collecting click-params<0.4.0,>=0.3.0 (from zenml[server])
  Using cached click_params-0.3.0-py3-none-any.whl.metadata (3.0 kB)
Collecting cloudpickle<3,>=2.0.0 (from zenml[server])
  Using cached cloudpickle-2.2.1-py3-none-any.whl.metadata (6.9 kB)
Collecting distro<2.0.0,>=1.6.0 (from zenml[server])
  Using cached distro-1.9.0-py3-none-any.whl.metadata (6.8 kB)
Collecting docker<7.2.0,>=7.1.0 (from zenml[server])
  Using cached docker-7.1.0-py3-none-any.whl.metadata (3.8 kB)
Collecting fastapi<=0.110,>=0.100 (from zenml[server])
  Using cached f

In [2]:
from zenml.environment import Environment

# In case we are in a google colab, clone all additional relevant files
if Environment.in_google_colab():
    # Pull required modules from this example
    !git clone -b main https://github.com/zenml-io/zenml
    !cp -r zenml/examples/quickstart/* .
    !rm -rf zenml

!pip install -r requirements.txt

Collecting pyarrow (from -r requirements.txt (line 3))
  Using cached pyarrow-17.0.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting datasets (from -r requirements.txt (line 4))
  Using cached datasets-2.20.0-py3-none-any.whl.metadata (19 kB)
Collecting transformers (from -r requirements.txt (line 5))
  Downloading transformers-4.44.0-py3-none-any.whl.metadata (43 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 43.7/43.7 kB 2.0 MB/s eta 0:00:00
Collecting torch (from -r requirements.txt (line 7))
  Using cached torch-2.4.0-cp311-cp311-manylinux1_x86_64.whl.metadata (26 kB)
Collecting sentencepiece (from -r requirements.txt (line 8))
  Using cached sentencepiece-0.2.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.7 kB)
Collecting filelock (from datasets->-r requirements.txt (line 4))
  Using cached filelock-3.15.4-py3-none-any.whl.metadata (2.9 kB)
Collecting pyarrow-hotfix (from dat

In [5]:
# Restart Kernel to ensure all libraries are properly loaded
import IPython
IPython.Application.instance().kernel.do_shutdown(restart=True)

{'status': 'ok', 'restart': True}


Please wait for the installation to complete before running subsequent cells. At
the end of the installation, the notebook kernel will restart.

## ☁️ Step 1: Connect to your ZenML Server
To run this quickstart you need to connect to a ZenML Server. You can deploy it [yourself on your own infrastructure](https://docs.zenml.io/getting-started/deploying-zenml) or try it out for free, no credit card required, with our [ZenML Pro managed service](https://zenml.io/pro).
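
Before connecting, it can help to check that the server URL really has the `https://URL_TO_SERVER` form, because leaving the placeholder unchanged makes `zenml connect` fail with a traceback. The following is a minimal sketch using only the standard library; the helper name `looks_like_server_url` and the example address are assumptions made for illustration and are not part of ZenML.

```python
from urllib.parse import urlparse

# Placeholder value used in the connection cell of this notebook.
PLACEHOLDER = "INSERT_YOUR_SERVER_URL_HERE"


def looks_like_server_url(url: str) -> bool:
    """Return True if `url` has an http(s) scheme and a host part."""
    if url == PLACEHOLDER:
        return False
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# "https://zenml.example.com" is a made-up address, used only for illustration.
print(looks_like_server_url(PLACEHOLDER))                  # False
print(looks_like_server_url("https://zenml.example.com"))  # True
```

This only checks the shape of the URL; it does not verify that a ZenML Server is actually reachable at that address.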

In [3]:
zenml_server_url = "INSERT_YOUR_SERVER_URL_HERE"  # in the form "https://URL_TO_SERVER"

!zenml connect --url $zenml_server_url

Connecting to: 'INSERT_YOUR_SERVER_URL_HERE'...
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /home/alexej/.pyenv/versions/clean3/bin/zenml:8 in <module>                  │
│                                                                              │
│   5 from zenml.cli.cli import cli                                            │
│   6 if __name__ == '__main__':                                               │
│   7 │   sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.

In [2]:
# Initialize ZenML and define the root for imports and docker builds
!zenml init

Found existing ZenML repository at path
'/home/alexej/PycharmProjects/zenml/examples/quickstart'.
⠋ Initializing ZenML repository at
/home/alexej/PycharmProjects/zenml/examples/quickstart.

## 🥇 Step 2: Build and run your first pipeline

In this quickstart we'll be working with a small dataset of sentences in Old English paired with more modern formulations. The task is a text-to-text transformation.

When you're getting started with a machine learning problem, you'll want to break down your code into distinct functions that load your data, bring it into the correct shape, and finally produce a model. This is the experimentation phase, where we try to massage our data into the right format and feed it into our model training.

<img src=".assets/Experiment.png" width="50%" alt="Experiment">

In [5]:
from zenml import step
from datasets import Dataset
from typing_extensions import Annotated


@step
def load_data() -> Annotated[Dataset, "raw_dataset"]:
    """Load and prepare the dataset."""

    def read_data(file_path):
        inputs = []
        targets = []

        with open(file_path, "r", encoding="utf-8") as file:
            for line in file:
                old, modern = line.strip().split("|")
                inputs.append(f"Translate Old English to Modern English: {old}")
                targets.append(modern)

        return {"input": inputs, "target": targets}

    # Assuming your file is named 'translations.txt'
    data = read_data("translations.txt")
    return Dataset.from_dict(data)

ZenML is built in a way that allows you to experiment with your data and build
your pipelines one step at a time. If you want to see how this step works, you
can call it directly like a regular function. Here we take a look at the first
row of the training dataset.

In [11]:
dataset = load_data()
dataset[0]

{'input': "translate Old English to Modern English: Shall I compare thee to a summer's day?",
 'target': 'Should I compare you to a summer day?'}

Everything looks as we'd expect and the input/output pair looks to be in the right format 🥳.
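
To make the expected file format concrete, here is a minimal standalone sketch of the parsing that `load_data` performs, assuming each line of `translations.txt` holds one `old|modern` pair. The helper name `parse_pairs` and the choice to skip malformed lines are assumptions for illustration: the step above unpacks `line.strip().split("|")` directly, which raises a `ValueError` for any line that does not contain exactly one `|`.

```python
import io

# Task prefix that `load_data` prepends to every input sentence.
PREFIX = "Translate Old English to Modern English: "


def parse_pairs(lines):
    """Turn `old|modern` lines into the dict layout that `load_data` returns."""
    inputs, targets = [], []
    for line in lines:
        line = line.strip()
        # Skip blank lines and lines without exactly one separator
        # instead of failing on tuple unpacking.
        if line.count("|") != 1:
            continue
        old, modern = line.split("|")
        inputs.append(PREFIX + old)
        targets.append(modern)
    return {"input": inputs, "target": targets}


# In-memory stand-in for `translations.txt`; the second line is deliberately malformed.
sample = io.StringIO(
    "Shall I compare thee to a summer's day?|Should I compare you to a summer day?\n"
    "a line without a separator\n"
)
pairs = parse_pairs(sample)
print(len(pairs["input"]), pairs["target"][0])
```

Whether silently skipping bad lines is the right behavior depends on your data; raising an error with the offending line number is an equally reasonable choice.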

For the sake of this quickstart, we have prepared a few steps in the `steps` directory. We'll now connect these together into a pipeline. To do this, plug the steps together through their inputs and outputs, then add the `@pipeline` decorator to the function that connects them.

In [14]:
from zenml import pipeline
from zenml.client import Client

from steps import load_data, tokenize_data, train_model, evaluate_model, test_random_sentences
from steps.model_trainer import T5_Model

# Initialize the ZenML client to fetch objects from the ZenML Server
client = Client()

client.activate_stack("default")  # Start with the default stack, which runs locally

@pipeline(enable_cache=True)
def english_translation_pipeline(
    model_type: T5_Model,
    num_train_epochs: int,
    per_device_train_batch_size: int,
    gradient_accumulation_steps: int,
    dataloader_num_workers: int,
):
    """Define a pipeline that connects the steps."""
    dataset = load_data()
    tokenized_dataset = tokenize_data(dataset)
    model, tokenizer = train_model(
        tokenized_dataset,
        model_type,
        num_train_epochs,
        per_device_train_batch_size,
        gradient_accumulation_steps,
        dataloader_num_workers,
    )
    evaluate_model(model, tokenized_dataset)
    test_random_sentences(model, tokenizer)

We're ready to run the pipeline now, which we can do just as with the step: by calling the
pipeline function itself.

In [15]:
# Run the pipeline and configure some parameters at runtime
pipeline_run = english_translation_pipeline(
    model_type="t5-small",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    dataloader_num_workers=4
)

Initiating a new run for the pipeline: english_translation_pipeline.
Executing a new run.
Caching is disabled by default for english_translation_pipeline.
Using user: alexej@zenml.io
Using stack: default
  artifact_store: default
  orchestrator: default
Dashboard URL: https://cloud.zenml.io/organizations/fc992c14-d960-4db7-812e-8f070c99c6f0/tenants/8a462fb6-b1fe-48df-9677-edc76bc8352d/runs/6fefd974-20bf-4cde-a7e1-977b8617c6d6
Step load_data has started.
No materializer is registered for type <class 'datasets.arrow_dataset.Dataset'>, so the default Pickle materializer was used. Pickle is not production ready and should only be used for prototyping as the artifacts cannot be loaded when running with a different Python version. Please consider imp

Map:   0%|          | 0/456 [00:00<?, ? examples/s]

No materializer is registered for type <class 'datasets.arrow_dataset.Dataset'>, so the default Pickle materializer was used. Pickle is not production ready and should only be used for prototyping as the artifacts cannot be loaded when running with a different Python version. Please consider implementing a custom materializer for type <class 'datasets.arrow_dataset.Dataset'> according to the instructions at https://docs.zenml.io/how-to/handle-data-artifacts/handle-custom-data-types
Step tokenize_data has finished in 1.795s.
Step tokenize_data completed successfully.
Caching disabled explicitly for train_model.
Step train_model has started.


| Step | Training Loss |
|------|---------------|
| 10   | 10.4456       |
| 20   | 6.8383        |
| 30   | 4.886         |
| 40   | 3.0748        |
| 50   | 2.0462        |
| 60   | 1.5721        |
| 70   | 1.3931        |
| 80   | 1.2386        |
| 90   | 1.0514        |
| 100  | 0.9951        |


Test translation: Translat Translate Old English Translate English Old
Model training completed and saved.
Step train_model has finished in 6m5s.
Step train_model completed successfully.
Step evaluate_model has started.
Average loss on the dataset: 0.40942612290382385
Step evaluate_model has finished in 30.768s.
Step evaluate_model completed successfully.
Step test_random_sentences has started.
Generated Old English: Zounds mew'd wicked trouble Colossus perfumes hungry Tartar's stirring Yond
Model Translation: Englisch: Vom Englischen zu Englisch  Die Übersetzungen: Englisch - Anglais: Die Sprache Deutsch: Spanisch: Von Englisch zu Sprachen: Sprache: English: Anglo-Zeit: Das ganze Jahr über:

Generated Old English: thee Tongue mew'd Yond Bestride Brave doth Valiant mew'd
Model Tr

As you can see, the pipeline has run successfully. It also printed out some examples; however, it seems the model did not perform well.

Let's have a look at this pipeline in the ZenML Dashboard. You can find the URL in the logs above.

We can also fetch the pipeline from the server and view the results directly in the notebook:

In [None]:
client = Client()
run = client.get_pipeline("english_translation_pipeline").last_run
print(run.name)

We can also access the trained model directly through the run. The cool thing here is that this direct access is available not just now, within this notebook session, but at any later point when you or your colleagues might need it.

In [None]:
# load the model object
model = run.steps["train_model"].outputs["model"].load()
tokenizer = run.steps["train_model"].outputs["tokenizer"].load()

With these in hand we can now play around with the model directly and try out some examples ourselves:

In [None]:
import torch

# Sentence to translate; feel free to try your own
test_text = "I do desire we may be better strangers"

input_ids = tokenizer(
    test_text,
    return_tensors="pt",
    max_length=128,
    truncation=True,
    padding="max_length",
).input_ids

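# Generate without tracking gradients. Note: top_k, top_p and temperature
# only take effect when sampling is enabled (do_sample=True).
with torch.no_grad():
    outputs = model.generate(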
        input_ids,
        max_length=128,
        num_return_sequences=1,
        no_repeat_ngram_size=2,
        top_k=50,
        top_p=0.95,
        temperature=0.7,
    )

decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(decoded_output)

## Let's recap what we've done so far

1) We have created a pipeline that takes in a dataset and trains a small model on it
2) This pipeline is broken down in such a way that we can easily iterate on the individual parts without breaking the whole

As expected, the performance of this model is not good. To train a model that can solve our task well, we would have to train a larger model. For this, we'll need to move on to the next section.

# ⌚ Step 3: Scale it up in the cloud

The model we have trained on our local machine is proof that our pipeline works. However, we want to train a much more powerful model. For this we'll need to scale onto more powerful machines.

So let's take this to the next level and run the pipeline in the environment of your choice. In ZenML we use the word "stack" to describe the environment for a pipeline run. A stack consists of different components; for this example, you only really need to care about compute and data storage.

To try this step, you will need access to some cloud compute and cloud storage (AWS, GCP, Azure, etc.). ZenML wraps around all the major cloud providers and orchestration tools and lets you easily plug your code into them.

To do this, let's head over to the Stack section of your ZenML Dashboard. Here you'll be able to either connect to an existing environment or deploy a new one. Choose one of the options presented to you there and come back when you have a stack ready to go.

<img src=".assets/stack_creation.png" width="50%" alt="Pipelines Overview">
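
To make the stack idea concrete, here is a plain-Python illustration of a stack as a named bundle of two components: one for compute and one for storage. This is only a mental model, not ZenML's own classes; the `Stack` dataclass, the `describe` helper, and the cloud component names are all hypothetical. The printed layout mirrors the `Using stack: ...` lines in the run logs earlier in this notebook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stack:
    """Illustrative stand-in for a stack: a name plus two components."""

    name: str
    orchestrator: str    # where the steps execute (compute)
    artifact_store: str  # where step outputs are stored (data storage)


def describe(stack: Stack) -> str:
    """Format a stack the way the run logs list it."""
    return (
        f"Using stack: {stack.name}\n"
        f"  artifact_store: {stack.artifact_store}\n"
        f"  orchestrator: {stack.orchestrator}"
    )


# The local stack used so far, as shown in the run logs.
local = Stack(name="default", orchestrator="default", artifact_store="default")
# The cloud names below are invented for illustration.
cloud = Stack(
    name="my-cloud-stack",
    orchestrator="my-cloud-orchestrator",
    artifact_store="my-bucket-store",
)

for stack in (local, cloud):
    print(describe(stack))
```

The point of the model is that the pipeline code stays the same while only the stack it runs on changes.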

## GCP

In [None]:
!zenml integration install gcp -y

from zenml.client import Client
from zenml.config import DockerSettings, ResourceSettings
from zenml.integrations.gcp.flavors.vertex_orchestrator_flavor import VertexOrchestratorSettings

# Set the name of your stack here
stack_name = "alexej-gcp-quickstart-stack2"

Client().activate_stack(stack_name)

configured_english_translation_pipeline = english_translation_pipeline.with_options(
    settings={
        "docker": DockerSettings(
            parent_image="pytorch/pytorch:2.4.0-cuda11.8-cudnn9-runtime",
            requirements=["zenml==0.63.0","pyarrow","datasets","transformers[torch]","sentencepiece"],
            environment={"ZENML_DISABLE_STEP_LOGS_STORAGE": True}
        ),
        "resources": ResourceSettings(memory="32GB"),
        "orchestrator.vertex": VertexOrchestratorSettings(node_selector_constraint=("cloud.google.com/gke-accelerator", "NVIDIA_TESLA_P4"))
    }
)

## AWS

In [None]:
!zenml integration install aws s3 -y

from zenml.client import Client
from zenml.config import DockerSettings, ResourceSettings
from zenml.integrations.aws.flavors.sagemaker_orchestrator_flavor import SagemakerOrchestratorSettings

# Set the name of your stack here
stack_name = "alexej-aws-quickstart-stack"

Client().activate_stack(stack_name)

configured_english_translation_pipeline = english_translation_pipeline.with_options(
    settings={
        "docker": DockerSettings(
            parent_image="pytorch/pytorch:2.4.0-cuda11.8-cudnn9-runtime",
            requirements=["zenml==0.63.0","pyarrow","datasets","transformers[torch]","sentencepiece"],
            environment={"ZENML_DISABLE_STEP_LOGS_STORAGE": True}
        ),
        "resources": ResourceSettings(memory="32GB"),
        "orchestrator.sagemaker": SagemakerOrchestratorSettings(instance_type="ml.p2.xlarge")
    }
)

## Azure

In [None]:
!zenml integration install azure -y

from zenml.client import Client
from zenml.config import DockerSettings
from zenml.integrations.skypilot.flavors.skypilot_orchestrator_base_vm_config import SkypilotBaseOrchestratorSettings

# Set the name of your stack here
stack_name = "azure-quickstart-stack"

Client().activate_stack(stack_name)

configured_english_translation_pipeline = english_translation_pipeline.with_options(
    settings={
        "docker": DockerSettings(
            parent_image="pytorch/pytorch:2.4.0-cuda11.8-cudnn9-runtime",
            requirements=["zenml==0.63.0","pyarrow","datasets","transformers[torch]","sentencepiece"],
            environment={"ZENML_DISABLE_STEP_LOGS_STORAGE": True}
        ),
        "orchestrator.vm_azure": SkypilotBaseOrchestratorSettings(accelerators='V100', memory="32+", cpus="8+")
    }
)

## Ready to launch

We have now configured ZenML to use your very own cloud infrastructure for the next pipeline run. Let's see this in action by running the pipeline again on the smaller T5 model (`t5-small`).

Note: The whole process may take a bit longer the first time around, as your pipeline code needs to be built into Docker containers to be run in the orchestration environment of your stack. Any subsequent run of the pipeline, even with different parameters set, will not take as long thanks to Docker caching.

In [None]:
pipeline_run = configured_english_translation_pipeline(
    model_type="t5-small",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=4,
    dataloader_num_workers=0
)
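
The batch-related parameters interact: with gradient accumulation, the number of examples that contribute to one optimizer update is the per-device batch size multiplied by the number of accumulation steps (and by the number of devices). The sketch below works through that arithmetic for the local run (2 and 4) and the cloud run (16 and 4). The helper names are hypothetical, a single device is assumed, and the step count is only a rough estimate, since the exact number depends on how the training library handles the final partial batch.

```python
import math


def effective_batch_size(per_device_batch_size, accumulation_steps, num_devices=1):
    """Number of examples that contribute to one optimizer update."""
    return per_device_batch_size * accumulation_steps * num_devices


def estimated_optimizer_steps(num_examples, epochs, batch_size):
    """Rough estimate of optimizer updates, rounding each epoch up."""
    return epochs * math.ceil(num_examples / batch_size)


local = effective_batch_size(2, 4)   # settings of the local run
cloud = effective_batch_size(16, 4)  # settings of the cloud run
print(local, cloud)  # 8 64

# 456 is the example count shown in the `Map` progress bar of the local run;
# that all of them are used for training is an assumption.
print(estimated_optimizer_steps(456, 3, local))  # 171
print(estimated_optimizer_steps(456, 3, cloud))  # 24
```

A larger effective batch means fewer, smoother updates per epoch, which is one reason the cloud run can use a bigger per-device batch when more accelerator memory is available.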

Once the pipeline has finished, let's check how this model performs on our example:

## Now it's up to you

...

## Congratulations!

You're a legit MLOps engineer now! You have created a training pipeline and you
have deployed it into a production-ready environment with the compute of your
choice. You have also gotten the hang of the ZenML Dashboard.

## Further exploration

This was just the tip of the iceberg of what ZenML can do; check out the [**docs**](https://docs.zenml.io/) to learn more
about the capabilities of ZenML. For example, you might want to:

- [Deploy ZenML](https://docs.zenml.io/user-guide/production-guide/connect-deployed-zenml) to collaborate with your colleagues.
- Run the same pipeline on a [cloud MLOps stack in production](https://docs.zenml.io/user-guide/production-guide/cloud-stack).
- Track your metrics in an experiment tracker like [MLflow](https://docs.zenml.io/stacks-and-components/component-guide/experiment-trackers/mlflow).

## What next?

* If you have questions or feedback, join our [**Slack Community**](https://zenml.io/slack) and become part of the ZenML family!
* If you want to quickly get started with ZenML, check out [ZenML Pro](https://zenml.io/pro).