# Fundamentals of ZenML

ZenML is one of the simplest and most powerful MLOps packages. 
In this notebook I will build a basic ZenML pipeline to get familiar with how ZenML works. 
This notebook is inspired by the freeCodeCamp [MLOps Course](https://www.youtube.com/watch?v=-dJPoLm_gtE).

One of the advantages of ZenML is that we can version and track the history of our models and data.
To do so, we need to initialize a ZenML repository, which creates a .zen directory in the project folder.
To create it, go to the console and run **zenml init** from the project folder.

In [None]:
# modules

# first we check that the .zen directory exists
from pathlib import Path
p = Path('D:/Projects/chambas/MLOps-practice/fundamentals/.zen')

if p.exists():
    from zenml import step
    from typing_extensions import Annotated
    import numpy as np
    from typing import Tuple
    from sklearn.datasets import load_digits
    from sklearn.base import ClassifierMixin
    from sklearn.svm import SVC
    from sklearn.model_selection import train_test_split
    from zenml import pipeline
    from zenml.environment import Environment
    print('modules imported')
else:
    print('Error: .zen directory does not exist. Create it by running `zenml init` in a console.')

ZenML works with **steps**, which are the building blocks of our pipeline. 
Technically, steps are reusable units of computation.
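
ZenML steps describe their inputs and outputs through type hints, and `Annotated` lets us attach a name to each output. The sketch below only shows that this metadata can be read back from a plain Python function. It is not how ZenML implements steps internally; `split_scores` and its output names are hypothetical, and I use `Annotated` from the standard `typing` module (Python 3.9+) instead of `typing_extensions`.

```python
from typing import Annotated, Tuple, get_args, get_type_hints


def split_scores() -> Tuple[Annotated[float, "train_score"],
                            Annotated[float, "test_score"]]:
    """Hypothetical function, used only to illustrate named outputs."""
    return 0.9, 0.8


# include_extras=True keeps the Annotated metadata instead of stripping it
return_hint = get_type_hints(split_scores, include_extras=True)["return"]

# every element of the Tuple carries its name in __metadata__
output_names = [hint.__metadata__[0] for hint in get_args(return_hint)]
print(output_names)  # ['train_score', 'test_score']
```

Because the names live in the function signature, a framework can label each output without running the function.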

In [None]:
# our first step will be importing the data
# the step functions need input/output specifications for a better workflow
@step
def importer() -> Tuple[Annotated[np.ndarray, 'X_train'],
                        Annotated[np.ndarray, 'X_test'],
                        Annotated[np.ndarray, 'y_train'],
                        Annotated[np.ndarray, 'y_test']]:
    '''
    Documentation:
    Load the sklearn digits dataset and split it into train and test sets
    '''
    # digits is a dataset included in sklearn that contains 1,797 8x8 images of handwritten digits.
    digits = load_digits()

    # access each image and 'unfold' it into a single row of 64 pixels.
    data = digits.images.reshape((len(digits.images), -1)) # -1 lets numpy infer the row length

    # train test split
    X_train, X_test, y_train, y_test = train_test_split(data,
                                                        digits.target,
                                                        test_size=0.2,
                                                        shuffle=False)

    # return objects congruent to specification
    return X_train, X_test, y_train, y_test
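
To make the two data operations in this step more concrete, here is a minimal pure-Python sketch of unfolding an 8x8 image into one row and of splitting without shuffling. The helpers `flatten` and `ordered_split` are hypothetical names of my own, not sklearn functions. Rounding the test size up is an assumption: to my understanding it matches how sklearn treats a fractional `test_size`, but check the sklearn documentation to be sure.

```python
import math


def flatten(image):
    """Unfold a 2D image (a list of rows) into one flat row."""
    return [pixel for row in image for pixel in row]


def ordered_split(samples, test_size=0.2):
    """Split without shuffling: the last samples become the test set."""
    n_test = math.ceil(len(samples) * test_size)  # assumption: rounded up
    n_train = len(samples) - n_test
    return samples[:n_train], samples[n_train:]


image = [[r * 8 + c for c in range(8)] for r in range(8)]  # toy 8x8 "image"
row = flatten(image)
print(len(row))  # 64

train, test = ordered_split(list(range(1797)))
print(len(train), len(test))  # 1437 360
```

Without shuffling, the test set is simply the tail of the dataset, so the split is reproducible but depends on the order in which the samples are stored.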

In [None]:
# our second step will be training

@step
def svc_trainer(X_train: np.ndarray, y_train: np.ndarray) -> ClassifierMixin:
    '''
    Train a sklearn SVC classifier
    '''
    model = SVC(gamma=0.001) # gamma is a hyperparameter
    model.fit(X_train, y_train) # fit model with training data
    return model
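
`SVC` uses an RBF kernel by default, and `gamma` is the coefficient of that kernel: it controls how quickly the similarity between two samples decays with their distance. The sketch below computes the kernel value by hand for two toy vectors. `rbf_kernel` is a hypothetical helper written for illustration, not the sklearn function of the same name.

```python
import math


def rbf_kernel(x, y, gamma=0.001):
    """RBF similarity exp(-gamma * ||x - y||^2) between two flat vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


a = [0.0, 0.0]
b = [3.0, 4.0]  # squared distance to a is 25

print(rbf_kernel(a, a))              # identical points give 1.0
print(rbf_kernel(a, b, gamma=0.001)) # small gamma: still very similar
print(rbf_kernel(a, b, gamma=1.0))   # large gamma: similarity close to 0
```

A small `gamma` such as 0.001 means that even distant samples influence each other, which tends to give a smoother decision boundary.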

In [None]:
# our last step will be evaluation
@step
def evaluator(X_test: np.ndarray,
              y_test: np.ndarray,
              model: ClassifierMixin) -> float:
    '''
    Calculate accuracy of the model on the test set
    '''

    test_acc = model.score(X_test, y_test)
    print(f'Test accuracy: {test_acc}')
    return test_acc
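
For a classifier, `model.score` returns the mean accuracy, that is, the fraction of test samples whose predicted label matches the true label. A minimal sketch of that calculation, with a hypothetical `accuracy` helper and made-up labels:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


y_true = [0, 1, 2, 3, 4]
y_pred = [0, 1, 2, 3, 9]  # hypothetical predictions with one mistake
print(accuracy(y_true, y_pred))  # 0.8
```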

In [None]:
# now let's join every step in a pipeline

@pipeline
def digits_pipeline():
    '''
    Pipeline for digit image detection using a SVM classifier with sklearn.
    '''
    X_train, X_test, y_train, y_test = importer()
    model = svc_trainer(X_train, y_train)
    evaluator(X_test, y_test, model)
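
Passing the outputs of one step into another is what defines the order of the pipeline: a step can only run once the steps it depends on have finished. The sketch below reproduces that idea with the standard library's `graphlib` (Python 3.9+). It is only an illustration of dependency ordering, not ZenML's own scheduling code, and the `dependencies` dictionary is written by hand from the pipeline above.

```python
from graphlib import TopologicalSorter

# each step maps to the set of steps whose outputs it consumes
dependencies = {
    "importer": set(),
    "svc_trainer": {"importer"},
    "evaluator": {"importer", "svc_trainer"},
}

execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)  # ['importer', 'svc_trainer', 'evaluator']
```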

To run the pipeline, we simply call the pipeline function.

In [None]:
digits_svc_pipeline = digits_pipeline()

We can view a diagram of our pipeline in the ZenML dashboard, which this code starts.

In [None]:
def start_zenml_dashboard(port=8237):
    # print the URL first, because --blocking keeps this cell running
    print(f'go to http://localhost:{port}')
    !zenml up --port {port} --blocking

start_zenml_dashboard()