### Lesson 1.1: ML Pipelines with ZenML   

Let's see how to easily convert existing ML code into ML pipelines using ZenML.

Since we will build our model with `sklearn`, we need to have the ZenML sklearn integration installed.

In [3]:
# pip install "zenml[server]" 

Then run the following command:

`!zenml integration install sklearn -y`

### ZenML Setup

We will define our ML pipelines using `ZenML`. It is an excellent tool for this task, as it is straightforward and intuitive to use and has `integrations` with most of the advanced MLOps tools we will want to use later. Make sure you have installed ZenML (via `pip install zenml`). Next, let's run some commands to make sure we start with a fresh ML stack.

In [4]:
# Remove any existing ZenML repository (in Windows cmd, use `rmdir /s /q .zen`).
!rm -rf .zen
!zenml init


Initializing the ZenML global configuration version to 0.66.0
Initializing ZenML repository at c:\Users\balde\OneDrive\Bureau\DA_DS\customer-satisfaction-mlops-main\Project_MLops_Customer_Satisfaction.
Creating database tables
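Clearing the `.zen` directory before `zenml init` relies on a shell command, and not every shell provides the same one (Windows `cmd`, for example, has no `rm`). As a hedged alternative, the same cleanup can be sketched in Python with the standard library only. The helper name `reset_zenml_repo` is hypothetical and not part of ZenML; it only assumes that the repository state lives in a `.zen` directory, as the setup step in this lesson implies.

```python
import shutil
from pathlib import Path


def reset_zenml_repo(root: str = ".") -> bool:
    """Delete the `.zen` directory under `root`; return True if one existed."""
    zen_dir = Path(root) / ".zen"
    existed = zen_dir.is_dir()
    # ignore_errors=True turns this into a no-op when the directory is absent.
    shutil.rmtree(zen_dir, ignore_errors=True)
    return existed
```

In a notebook, one could call `reset_zenml_repo()` in a cell and then run `!zenml init` as before.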

### Example Experimentation ML Code  

Let us get started with some simple example ML code. In the following, we train a scikit-learn SVC classifier to classify images of handwritten digits. We load the data, train a model on the training set, then evaluate it on the test set.

In [5]:
import numpy as np 
from sklearn.base import ClassifierMixin
from sklearn.svm import SVC
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

In [6]:
def train_test() -> None:
    """Train and Test a sklearn SVC classifier on digits"""
    digits = load_digits()
    data = digits.images.reshape((len(digits.images), -1))
    X_train, X_test, y_train, y_test = train_test_split(
        data, digits.target, test_size=0.2, shuffle=False
    )
    model = SVC(gamma=0.001)
    model.fit(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    print(f"Test accuracy: {test_acc}")
    
train_test()

Test accuracy: 0.9583333333333334


#### Turning experiments into ML pipelines with ZenML   

In practice, our ML workflows will, of course, be much more complicated than that. We might have complex preprocessing that we do not want to redo every time we train a model, we will need to compare the performance of different models, deploy them in a production setting, and much more. This is where ML pipelines come into play: they let us define our workflows as modular steps that we can then mix and match.

We can identify three distinct steps in our example: data loading, model training, and model evaluation. Let us now define each of them as a ZenML pipeline step, simply by moving each step into its own function and decorating it with ZenML's `@step` Python decorator.

In [7]:
from zenml import step
from typing_extensions import Annotated
import pandas as pd
from typing import Tuple

In [8]:
@step
def importer() -> Tuple[
    Annotated[np.ndarray, "X_train"],
    Annotated[np.ndarray, "X_test"],
    Annotated[np.ndarray, "y_train"],
    Annotated[np.ndarray, "y_test"],
]:
    """Load the digits dataset and split it into train and test sets."""
    digits = load_digits()
    data = digits.images.reshape((len(digits.images), -1))
    X_train, X_test, y_train, y_test = train_test_split(
        data, digits.target, test_size=0.2, shuffle=False
    )
    return X_train, X_test, y_train, y_test


@step
def svc_trainer(
    X_train: np.ndarray,
    y_train: np.ndarray,
) -> ClassifierMixin:
    """Train an Sklearn SVC Classifier."""
    model = SVC(gamma=0.001)
    model.fit(X_train, y_train)
    return model


@step
def evaluator(
    X_test: np.ndarray,
    y_test: np.ndarray,
    model: ClassifierMixin,
) -> float:
    """Calculate the test set accuracy of an Sklearn model."""
    test_acc = model.score(X_test, y_test)
    print(f"Test accuracy: {test_acc}")
    return test_acc
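The `Annotated[..., "name"]` return types used by `importer` attach a string label to each output. As a rough illustration of how such labels can be read back from a function signature, here is a minimal sketch using only the standard library (`typing.Annotated`, Python 3.9+, is equivalent to the `typing_extensions` import used in this lesson). The function `split_scores` and the helper `output_names` are hypothetical, and ZenML's own mechanism for naming outputs may work differently.

```python
from typing import Annotated, Tuple, get_args, get_origin, get_type_hints


def split_scores() -> Tuple[
    Annotated[float, "train_acc"],
    Annotated[float, "test_acc"],
]:
    """Hypothetical function with two labelled outputs."""
    return 0.99, 0.95


def output_names(func) -> list:
    """Collect the first metadata label of each Annotated item in a Tuple return type."""
    # include_extras=True keeps the Annotated wrappers instead of stripping them.
    return_hint = get_type_hints(func, include_extras=True)["return"]
    names = []
    for item in get_args(return_hint):
        if get_origin(item) is Annotated:
            # get_args on an Annotated type yields (base_type, *metadata).
            names.append(get_args(item)[1])
    return names


print(output_names(split_scores))
```

Labelling outputs this way keeps the names next to the types, so they do not have to be repeated elsewhere.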

Similarly, we can use ZenML's `@pipeline` decorator to connect all of our steps into an ML pipeline.

Note that the pipeline definition itself contains no ML logic; it merely establishes a recipe for how data moves through the steps. This means we can swap a step for another one with the same inputs and outputs, e.g., to run the same workflow with different models and compare their performance.

In [9]:
from zenml import pipeline

@pipeline
def digits_pipeline():
    """Links all the steps together in a pipeline"""
    X_train, X_test, y_train, y_test = importer()
    model = svc_trainer(X_train=X_train, y_train=y_train)
    evaluator(X_test=X_test, y_test=y_test, model=model)
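To make the idea of swapping steps concrete, here is a framework-free sketch in plain Python and scikit-learn. It is not ZenML code: the names `load_split`, `train_svc`, `train_tree`, and `run_recipe` are hypothetical, and the decision tree is only an arbitrary second model chosen for the comparison. The recipe stays fixed while the trainer is passed in as an argument.

```python
from typing import Callable, Tuple

import numpy as np
from sklearn.base import ClassifierMixin
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

Trainer = Callable[[np.ndarray, np.ndarray], ClassifierMixin]


def load_split() -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
    """Load the digits data and split it exactly as in the lesson."""
    digits = load_digits()
    data = digits.images.reshape((len(digits.images), -1))
    X_train, X_test, y_train, y_test = train_test_split(
        data, digits.target, test_size=0.2, shuffle=False
    )
    return X_train, X_test, y_train, y_test


def train_svc(X_train: np.ndarray, y_train: np.ndarray) -> ClassifierMixin:
    """Same model as in the lesson."""
    return SVC(gamma=0.001).fit(X_train, y_train)


def train_tree(X_train: np.ndarray, y_train: np.ndarray) -> ClassifierMixin:
    """Alternative model, assumed here only for comparison."""
    return DecisionTreeClassifier(random_state=0).fit(X_train, y_train)


def run_recipe(trainer: Trainer) -> float:
    """Fixed recipe: load data, train with the given trainer, score on the test set."""
    X_train, X_test, y_train, y_test = load_split()
    model = trainer(X_train, y_train)
    return float(model.score(X_test, y_test))


for name, trainer in [("svc", train_svc), ("tree", train_tree)]:
    print(f"{name}: test accuracy = {run_recipe(trainer):.3f}")
```

In ZenML the same effect would come from writing a second `@step` trainer with matching inputs and outputs and calling it in place of `svc_trainer`.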

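#### Running ZenML Pipelines

Finally, we run the pipeline. In the ZenML version used here (0.66.0), calling the `@pipeline`-decorated function executes a run and returns a `PipelineRunResponse`, so there is no separate `run()` method to call.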

In [13]:
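# Calling the pipeline function runs it and returns a PipelineRunResponse.
digits_svc_pipeline = digits_pipeline()
# Left commented out: PipelineRunResponse has no run() method (AttributeError).
# digits_svc_pipeline.run(unlisted=True)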


Initiating a new run for the pipeline: digits_pipeline.
Executing a new run.
Using user: default
Using stack: default
  orchestrator: default
  artifact_store: default
You can visualize your pipeline runs in the ZenML Dashboard. In order to try it locally, please run zenml up.
Using cached version of importer.
Step importer has started.
Using cached version of svc_trainer.
Step svc_trainer has started.
Using cached version of evaluator.
Step evaluator has started.
Pipeline run has finished in 0.648s.


AttributeError: 'PipelineRunResponse' object has no attribute 'run'

In [14]:
import platform

from zenml.environment import Environment

def start_zenml_dashboard(port=8237):
    # ZenML cannot run the local server as a background process on Windows,
    # so use --blocking there as on Colab (--docker is the other option).
    if Environment.in_google_colab() or platform.system() == "Windows":
        !zenml up --blocking --port {port}
    else:
        !zenml up --port {port}

start_zenml_dashboard()
