# Exercise 1: evalpipeline: Build a KFP Evaluation Pipeline

**In this exercise, you will create a pipeline that:**
- Loads the data from: `bucket = "exam-assets"`, `minio_path = 'datasets/wine/wine.parquet'`
- Loads the model from: `bucket = "exam-assets"`, `minio_path = 'model/wine/winemodel.joblib'`
- Tests the model on a sample of the data, then exports the accuracy as a pipeline metric and prints it, so that it shows up in the KFP experiment view.

**This exercise is worth 5 points, awarded as follows:**


| Criterion | Description | Score |
|---|---|---|
| A pipeline with the right name has been submitted to your namespace. | The name should follow `{{username-pipeline-exotitle-date}}`. Example for this exercise: `john-doe-evalpipeline-2023-02-06T09:08:09` | 1 |
| The pipeline has run successfully at least once. | All components of the final pipeline should have completed successfully at least once (green checkbox). | 2 |
| The accuracy has been exported as a pipeline metric. | In the experiment view, the accuracy metric is shown for your run. | 2 |
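
The run name expected by the first criterion can be assembled with the standard library. This is a minimal sketch, assuming the date part is the submission time formatted to the second as in the example above; `build_run_name` and its parameters are hypothetical names, not part of the exercise template.

```python
import datetime as dt


def build_run_name(username, exercise_title, now=None):
    """Return '<username>-<exercise_title>-<YYYY-MM-DDTHH:MM:SS>'."""
    # Assumption: the timestamp is the submission time, truncated to seconds.
    now = now or dt.datetime.now()
    return f"{username}-{exercise_title}-{now.strftime('%Y-%m-%dT%H:%M:%S')}"


# Reproduce the example from the scoring table with a fixed timestamp.
example = build_run_name("john-doe", "evalpipeline", dt.datetime(2023, 2, 6, 9, 8, 9))
print(example)
```

Calling `build_run_name(username, "evalpipeline")` without a timestamp uses the current time, which is probably what you want when submitting the run.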

![evalpipeline](./images/evalpipeline.png)

In [4]:
# Here are the dependencies you will need
import kfp
import kfp.dsl as dsl
from kfp import components
import os
from kfp.components import InputPath, OutputPath, create_component_from_func
from minio import Minio
import urllib3
import datetime as dt
import pandas as pd

## Components definition

**You will need at least 2 components**: one that loads the data from the MinIO bucket, and a second that loads the model from the MinIO bucket and runs predictions on a sample of the data loaded by the first component. You are free to try any other pipeline architecture.

You can start from the components of the datapipeline 2 technical session.

For the MinIO access key and secret key, use:

```
access_key="minio-kserve",
secret_key="minio-kserve",
```

### Get Data component

In [None]:
def get_data_from_minio(
    minio_path: str,
    bucket: str,
    file_path: OutputPath(),
    ):
    
    import numpy
    from io import BytesIO
    import pandas as pd
    import urllib3
    from minio import Minio
    import os
    import pyarrow

    ...

    try:
        ...
    finally:
        ...
        
    ### pass dataset to component output
    ...
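
One way to fill in the `try`/`finally` skeleton above is to read the object inside `try` and release the HTTP connection inside `finally`, which is the pattern the MinIO Python client documents for `get_object`. The sketch below is an assumption about how the body could look, not the reference solution: `download_object` is a hypothetical helper that accepts any client exposing `get_object(bucket, object_name)`, and the demonstration uses a stand-in client so it runs without a MinIO server.

```python
import os
import tempfile
from unittest.mock import MagicMock


def download_object(client, bucket, object_name, dest_path):
    """Copy one object from a bucket to a local file and return its size in bytes."""
    response = None
    try:
        # With minio.Minio, get_object returns a urllib3 HTTPResponse.
        response = client.get_object(bucket, object_name)
        data = response.read()
    finally:
        # Always hand the connection back to the pool, even on failure.
        if response is not None:
            response.close()
            response.release_conn()
    # In a KFP component, dest_path would be the OutputPath() argument.
    with open(dest_path, "wb") as out_file:
        out_file.write(data)
    return len(data)


# Offline demonstration: a stand-in client replaces the real MinIO client.
fake_client = MagicMock()
fake_client.get_object.return_value.read.return_value = b"parquet-bytes"
target = os.path.join(tempfile.mkdtemp(), "wine.parquet")
size = download_object(fake_client, "exam-assets", "datasets/wine/wine.parquet", target)
print(size)
```

With the real client, `client` would be a `minio.Minio` instance created with the access and secret keys given above; its endpoint is not specified in this notebook.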

### Build the get_data component

In [None]:
create_component_from_func(
    ...,
    output_component_file='components/get_data.yaml',
    base_image='python:3.8',
    packages_to_install=[
        'numpy==1.22.3',
        'minio==6.0.2',
        'pandas==1.0.5',
        'pyarrow==10.0.1'
    ],
)

### Eval model component

For the wine classification dataset, the pre-trained model is a `sklearn.tree.DecisionTreeClassifier()`.

It has been persisted as a `.joblib` file, so you can load it like this:

```python
from joblib import load
model = load(model_object)
```

In [None]:
def sklearn_read_predict(
    training_data_path: InputPath('CSV'),  # path to the dataset produced by the get_data component
    minio_path: str,
    bucket: str,
    mlpipeline_metrics_path: OutputPath('Metrics'),
    label_column: int = 0,
):
    from sklearn import metrics
    import urllib3
    from sklearn.model_selection import train_test_split
    from sklearn import tree
    import pandas as pd
    import numpy
    import pyarrow
    from io import BytesIO
    from minio import Minio
    import os
    from joblib import load
    import json
    
    # Evaluating
    from sklearn.metrics import accuracy_score

    minio_path = 'model/wine/winemodel.joblib'
    bucket = "exam-assets"

    ...

    try:
        ...
    finally:
        ...
    

    
    ## Load dataset
    ...
    
    ### split data ###

    ...

    # Evaluating the model
    ...
        
        
    ### Save and export metrics ###
    ...
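
For the accuracy to appear in the experiment view, KFP v1 expects the component to write a JSON document to the `mlpipeline_metrics_path` output, with a top-level `metrics` list whose entries carry `name`, `numberValue`, and optionally `format`. The sketch below shows that file layout using the standard library only. The `accuracy` helper re-implements what `sklearn.metrics.accuracy_score` computes by default so the block runs without scikit-learn; `write_metrics` is a hypothetical helper, and the labels are made-up illustration values, not results from the wine model.

```python
import json
import os
import tempfile


def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def write_metrics(metrics_path, accuracy_value):
    """Write one accuracy metric in the KFP v1 metrics JSON layout."""
    metrics = {
        "metrics": [
            {
                "name": "accuracy",
                "numberValue": float(accuracy_value),
                "format": "PERCENTAGE",
            }
        ]
    }
    with open(metrics_path, "w") as metrics_file:
        json.dump(metrics, metrics_file)
    return metrics


# Made-up labels, for illustration only: 3 of 4 predictions match.
score = accuracy([0, 1, 2, 1], [0, 1, 1, 1])
print("accuracy:", score)

# In the component, this path would be the mlpipeline_metrics_path argument.
metrics_file_path = os.path.join(tempfile.mkdtemp(), "mlpipeline-metrics.json")
written = write_metrics(metrics_file_path, score)
```

Printing the score inside the component also makes it visible in the component logs.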


### Build model eval component

In [None]:
create_component_from_func(
    ...,
    output_component_file='components/sklearn_read_predict.yaml',
    base_image='python:3.8',
    packages_to_install=[
        'urllib3==1.26.5',
        'minio==6.0.2',
        'scikit-learn==0.24.2',
        'numpy==1.22.3',
        'pandas==1.0.5',
        'pyarrow'
    ],
)

### Pipeline
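
The pipeline cell below uses a `namespace` variable that no cell shown in this notebook defines, so it has to be set beforehand. Here is a minimal sketch, assuming the notebook runs in a Kubernetes pod where the service account namespace is mounted at the standard path and that this namespace is the one you submit to. `read_namespace` and its `default` fallback are hypothetical names introduced for illustration.

```python
import os

# Standard location of the pod's namespace inside a Kubernetes container.
NAMESPACE_FILE = "/var/run/secrets/kubernetes.io/serviceaccount/namespace"


def read_namespace(path=NAMESPACE_FILE, default=None):
    """Return the namespace stored in `path`, or `default` if the file is missing."""
    if not os.path.exists(path):
        return default
    with open(path) as namespace_file:
        return namespace_file.read().strip()


# Outside a pod the file does not exist and this evaluates to None.
namespace = read_namespace()
print(namespace)
```

If the value printed is `None`, set `namespace` by hand to the namespace you were given for the exam.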

In [None]:
@dsl.pipeline(name='evaluate_model_performance')
def evaluate_model_pipeline(namespace=namespace):
    import datetime
    from kfp.onprem import use_k8s_secret
    
    bucket='exam-assets'
    
    data ...
    
    model ...

#### Create KFP client

In [None]:
### A token is automatically provided through the KF_PIPELINES_SA_TOKEN_PATH variable. This token only allows access to your own namespace.
token_file = os.getenv("KF_PIPELINES_SA_TOKEN_PATH")
with open(token_file) as f:
    token = f.readline()
client = kfp.Client(host='http://ml-pipeline.kubeflow.svc.cluster.local:8888',
               existing_token=token)

In [None]:
EXPERIMENT_NAME = ''
username=''

### Submit the pipeline

In [None]:
run_id = client.create_run_from_pipeline_func(
    pipeline_func=..., 
    namespace=namespace, 
    experiment_name=EXPERIMENT_NAME,
    run_name=...,
    arguments={},
).run_id
print("Run ID: ", run_id)

## End of the exercise