### Introduction to the Project: Rapidly Ramping Up on ZenML

#### Overview

In the fast-evolving field of machine learning (ML), the ability to operationalize models efficiently is crucial. This project explores ZenML, an MLOps framework designed to simplify building and managing ML workflows. The objective is to ramp up on ZenML quickly and demonstrate its utility in a practical context.

#### Project Context

Machine learning engineers often face challenges with the scalability, reproducibility, and deployment of ML models. Traditional approaches can lead to cumbersome, disjointed workflows. ZenML addresses these issues by organizing ML operations (MLOps) work into pipelines built from well-defined steps. This project serves as a hands-on introduction to ZenML, showcasing its capabilities through a concrete example.

#### Learning Objectives

The key learning objectives of this project include:

- Gaining practical experience with ZenML as an MLOps tool.
- Understanding how to transition from traditional ML workflows to those managed by ZenML.
- Demonstrating the ease of building, running, and monitoring ML pipelines with ZenML.
- Highlighting the advantages of using an MLOps framework in terms of scalability, reproducibility, and efficiency.

## Simple ML Model for Handwriting Recognition
   - **Data Loading and Preprocessing**: The handwritten digits dataset is loaded with `load_digits` from `sklearn.datasets`. Each 8×8 image is flattened into a 64-element feature vector so the classifier can consume it.
   - **Train-Test Split**: The dataset is divided into training (80%) and testing (20%) sets without shuffling, so that model performance can be evaluated on unseen data.
   - **Model Training**: A Support Vector Classifier (`SVC`) from Scikit-learn is trained with `gamma=0.001`. SVC is a widely used algorithm for classification tasks.
   - **Model Evaluation**: After training, the model is scored on the test set and the test accuracy is printed.
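
To make the reshaping and the unshuffled split concrete, the sketch below uses a zero-filled stand-in array with the same shape as `load_digits().images` (1797 images of 8×8 pixels), so it runs with NumPy alone. The helper `unshuffled_split` is a hypothetical name introduced for illustration; it mirrors what `train_test_split(..., test_size=0.2, shuffle=False)` is expected to do, assuming the test share is rounded up to a whole sample and taken from the end of the data.

```python
import math
from typing import Tuple

import numpy as np

# Stand-in for load_digits().images: 1797 grayscale images of 8x8 pixels.
images = np.zeros((1797, 8, 8))

# Flatten each image into a 64-element feature vector, one row per sample.
data = images.reshape((len(images), -1))
print(data.shape)  # (1797, 64)


def unshuffled_split(n_samples: int, test_size: float) -> Tuple[range, range]:
    """Hypothetical helper: index ranges for a split without shuffling.

    Assumes the test share is rounded up and the test rows are the
    last rows of the dataset.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return range(n_train), range(n_train, n_samples)


train_idx, test_idx = unshuffled_split(len(data), test_size=0.2)
print(len(train_idx), len(test_idx))  # 1437 360
```

Because nothing is shuffled, the model is trained on the first 1437 samples and evaluated on the remaining 360.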

In [20]:
import numpy as np
from sklearn.base import ClassifierMixin
from sklearn.svm import SVC
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

def train_test() -> None:
    """Train and test a Scikit-learn SVC classifier on digits"""

    digits = load_digits()
    data = digits.images.reshape((len(digits.images), -1))
    X_train, X_test, y_train, y_test = train_test_split(
        data, digits.target, test_size=0.2, shuffle=False
    )

    model = SVC(gamma=0.001)
    model.fit(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    print(f"Test accuracy: {test_acc}")


train_test()

Test accuracy: 0.9583333333333334
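
As a quick sanity check on the reported figure, the arithmetic below converts the accuracy back into a count of correct predictions. It assumes the unshuffled 80/20 split leaves 360 of the 1797 samples in the test set (the test share rounded up); the variable names are illustrative only.

```python
import math

n_samples = 1797                     # size of the digits dataset
n_test = math.ceil(0.2 * n_samples)  # assumed test-set size: 360
reported_acc = 0.9583333333333334    # value printed by train_test()

# Accuracy is n_correct / n_test, so invert it to recover the count.
n_correct = round(reported_acc * n_test)
n_wrong = n_test - n_correct
print(n_correct, n_wrong)  # 345 15
```

Under that assumption, the classifier misclassifies 15 of the 360 held-out digits.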


## Creating ML Pipeline with ZenML
   - **Defining Steps with Decorators**:
     - `importer`: Loads the digits data, flattens it, and splits it into training and test sets.
     - `svc_trainer`: Trains the SVC model on the training data.
     - `evaluator`: Scores the trained model on the test set and returns the accuracy.
   - Each function is decorated with `@step`, which turns it into an isolated, reproducible step of an ML pipeline with typed inputs and outputs.
   - **Pipeline Definition**: `digits_pipeline` is defined with the ZenML `@pipeline` decorator and wires the data importing, model training, and evaluation steps together.
   - **Pipeline Execution**: The pipeline is executed by calling `digits_pipeline()`, which shows how little code is needed to run an ML pipeline with ZenML.


In [4]:
from typing import Annotated, Tuple

from zenml import step

# np, load_digits, train_test_split, SVC, and ClassifierMixin are imported
# in the earlier cell.


@step
def importer() -> Tuple[
    Annotated[np.ndarray, "X_train"],
    Annotated[np.ndarray, "X_test"],
    Annotated[np.ndarray, "y_train"],
    Annotated[np.ndarray, "y_test"],
]:
    """Load the digits dataset and return an unshuffled 80/20 split."""
    digits = load_digits()
    # Flatten each 8x8 image into a 64-element feature vector.
    data = digits.images.reshape((len(digits.images), -1))
    X_train, X_test, y_train, y_test = train_test_split(
        data, digits.target, test_size=0.2, shuffle=False
    )
    return X_train, X_test, y_train, y_test


@step
def svc_trainer(X_train: np.ndarray, y_train: np.ndarray) -> ClassifierMixin:
    """Train an SVC classifier on the training data."""
    model = SVC(gamma=0.001)
    model.fit(X_train, y_train)
    return model


@step
def evaluator(X_test: np.ndarray, y_test: np.ndarray, model: ClassifierMixin) -> float:
    """Score the trained model on the test set and return the accuracy."""
    test_acc = model.score(X_test, y_test)
    print(f"Test accuracy: {test_acc}")
    return test_acc

In [6]:
from zenml import pipeline

@pipeline
def digits_pipeline():
    X_train, X_test, y_train, y_test = importer()
    model = svc_trainer(X_train=X_train, y_train=y_train)
    evaluator(X_test=X_test, y_test=y_test, model=model)

In [19]:
digits_svc_pipeline = digits_pipeline()

Initiating a new run for the pipeline: digits_pipeline.
Reusing registered version: (version: 1).
Executing a new run.
Using user: default
Using stack: default
  orchestrator: default
  artifact_store: default
Using cached version of importer.
Step importer has started.
Using cached version of svc_trainer.
Linking artifact output to model None version None implicitly.
Step svc_trainer has started.
Using cached version of evaluator.
Linking artifact output to model None version None implicitly.
Step evaluator h

In [15]:
!zenml up --blocking --port {port}