### **Importing Libraries**

In [2]:
import IPython
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.base import ClassifierMixin
from sklearn.svm import SVC
from sklearn.datasets import load_digits

from zenml import step, pipeline
from typing_extensions import Annotated
import pandas as pd
from typing import Tuple


### **1. Creating and training the model**

We will train a **dummy** scikit-learn SVC classifier to classify images of handwritten digits. We load the data, fit the model on the training set, and then evaluate it on the test set.

In [3]:
def train_test() -> None:
    """Train and test a SKLearn SVC classifier on the digits dataset."""
    digits = load_digits()
    data = digits.images.reshape((len(digits.images), -1))
    X_train, X_test, y_train, y_test = train_test_split(
        data, digits.target, test_size=0.2, shuffle=False
    )
    
    model: ClassifierMixin = SVC(gamma="scale")
    model.fit(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    print(f"Test accuracy: {test_acc:.3f}")
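
As a quick check of what the reshape and split inside `train_test` produce, the sketch below prints the array shapes. It assumes scikit-learn's bundled digits dataset (1,797 images of 8×8 pixels) and reuses the variable names from the function above.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
# Flatten each 8x8 image into a 64-element feature vector.
data = digits.images.reshape((len(digits.images), -1))

# shuffle=False keeps the original order, so the last 20% of samples form the test set.
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.2, shuffle=False
)

print(digits.images.shape)          # (1797, 8, 8)
print(data.shape)                   # (1797, 64)
print(X_train.shape, X_test.shape)  # (1437, 64) (360, 64)
```

Because `shuffle=False`, the split is identical on every run.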

### **2. Turning experiments into ML pipelines (with ZenML)**

Real ML workflows are usually much more complex than the simple example above, involving steps such as data preprocessing, feature engineering, model training, evaluation, and deployment.

This is where ML pipelines come in: they let you define a workflow as modular steps that can be easily reused, modified, and shared. ZenML is a framework that simplifies the creation, management, and deployment of ML pipelines.
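
To make the idea of modular steps concrete, here is a minimal plain-Python sketch that does not use ZenML at all. The function names (`load_data`, `fit_mean`, `evaluate`, `toy_pipeline`) and the toy data are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def load_data() -> Tuple[List[float], List[float]]:
    """Hypothetical step 1: return a toy train/test split."""
    values = [1.0, 2.0, 3.0, 4.0, 5.0]
    return values[:4], values[4:]


def fit_mean(train: List[float]) -> float:
    """Hypothetical step 2: the 'model' is simply the training mean."""
    return sum(train) / len(train)


def evaluate(model: float, test: List[float]) -> float:
    """Hypothetical step 3: mean absolute error of the constant prediction."""
    return sum(abs(x - model) for x in test) / len(test)


def toy_pipeline() -> float:
    """Wire the steps together; each one only sees its declared inputs."""
    train, test = load_data()
    model = fit_mean(train)
    return evaluate(model, test)


error = toy_pipeline()
print(f"Toy pipeline error: {error:.2f}")
```

Since each step communicates only through its inputs and outputs, replacing `fit_mean` with another training function would leave the other steps untouched. The ZenML steps in this section follow the same shape, with decorators added.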

![zenml_pipeline_overview.png](src/importer-svc-evaluator.png)

In [4]:
@step
def importer() -> Tuple[
    Annotated[np.ndarray, "X_train"],
    Annotated[np.ndarray, "X_test"],
    Annotated[np.ndarray, "y_train"],
    Annotated[np.ndarray, "y_test"],
]:
    """Load and split the digits dataset."""
    digits = load_digits()
    data = digits.images.reshape((len(digits.images), -1))
    X_train, X_test, y_train, y_test = train_test_split(
        data, digits.target, test_size=0.2, shuffle=False
    )
    return X_train, X_test, y_train, y_test

@step
def svc_trainer(
    X_train: np.ndarray,
    y_train: np.ndarray,
) -> ClassifierMixin:
    """Train an SVC classifier."""
    model: ClassifierMixin = SVC(gamma="scale")
    model.fit(X_train, y_train)
    return model

@step
def evaluator(
    model: ClassifierMixin,
    X_test: np.ndarray,
    y_test: np.ndarray,
) -> float:
    """Evaluate the trained model."""
    test_acc = model.score(X_test, y_test)
    print(f"Test accuracy: {test_acc:.3f}")
    return test_acc
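
For scikit-learn classifiers, `score` returns the mean accuracy on the given data, which is the value the `evaluator` step passes on. The sketch below runs outside ZenML and, as an assumption, reuses the same data split as the `importer` step to compare `score` with a manual computation.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()
data = digits.images.reshape((len(digits.images), -1))
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.2, shuffle=False
)

model = SVC(gamma="scale").fit(X_train, y_train)

# Accuracy as reported by the estimator.
score = model.score(X_test, y_test)
# The same quantity computed by hand: the fraction of correct predictions.
manual = float(np.mean(model.predict(X_test) == y_test))

print(f"score={score:.3f}, manual={manual:.3f}")
```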



With the steps in place, we can assemble them into a ZenML pipeline that covers the entire workflow: loading the data, training a model, and evaluating it.

Because each step is defined as a separate function, the pipeline stays easy to manage and modify.

In [5]:
@pipeline
def digits_pipeline():
    """Pipeline for training and evaluating an SVC on the digits dataset."""
    X_train, X_test, y_train, y_test = importer()
    model = svc_trainer(X_train, y_train)
    evaluator(model, X_test, y_test)

### **Running Pipelines**

In [6]:
digits_svc_pipeline = digits_pipeline()
# Calling the decorated pipeline function is what starts the run logged below,
# so no separate `.run()` call is made here.

Initiating a new run for the pipeline: digits_pipeline.
Using user: default
Using stack: default
  artifact_store: default
  orchestrator: default
You can visualize your pipeline runs in the ZenML Dashboard. In order to try it locally, please run zenml login --local.
Using cached version of step importer.
Using cached version of step svc_trainer.
Using cached version of step evaluator.
All steps of the pipeline run were cached.
