### Prerequisites
Check whether `kfp` is installed:

In [91]:
! pip3 show kfp

Name: kfp
Version: 1.6.6
Summary: KubeFlow Pipelines SDK
Home-page: https://github.com/kubeflow/pipelines
Author: The Kubeflow Authors
Author-email: 
License: UNKNOWN
Location: /home/jovyan/.local/lib/python3.8/site-packages
Requires: click, kubernetes, absl-py, PyYAML, tabulate, fire, google-cloud-storage, protobuf, cloudpickle, docstring-parser, jsonschema, Deprecated, google-auth, strip-hints, kfp-server-api, google-api-python-client, kfp-pipeline-spec, requests-toolbelt
Required-by: 


### 1. Configure Credentials
In order for KFServing to access MinIO, the credentials must be added to the default service account.

KFServing is imported as a pipeline component (ContainerOp) in this notebook. Consequently, it does not allow configuration of custom service accounts.

In [92]:
%%writefile minio_secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: minio-s3-secret
  annotations:
     serving.kubeflow.org/s3-endpoint: minio-service.kubeflow:9000
     serving.kubeflow.org/s3-usehttps: "0" # Default: 1. Must be 0 when testing with MinIO!
type: Opaque
data:
  AWS_ACCESS_KEY_ID: bWluaW8=
  AWS_SECRET_ACCESS_KEY: bWluaW8xMjM=
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
secrets:
  - name: minio-s3-secret

Overwriting minio_secret.yaml
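Values under `data` in a Kubernetes Secret are base64-encoded, not encrypted. As a minimal sketch, the two encoded values in the manifest above can be reproduced with Python's standard `base64` module. The helper name `encode_secret_value` is ours for illustration and is not part of any library.

```python
import base64


def encode_secret_value(plain: str) -> str:
    """Return the base64 form expected in the `data` field of a Secret."""
    return base64.b64encode(plain.encode("utf-8")).decode("ascii")


access_key = encode_secret_value("minio")
secret_key = encode_secret_value("minio123")
print(access_key)   # bWluaW8=
print(secret_key)   # bWluaW8xMjM=

# Decoding confirms the round trip back to the plain-text value.
decoded = base64.b64decode(access_key).decode("utf-8")
print(decoded)      # minio
```

If you change the MinIO credentials, re-encode them the same way before applying the manifest.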


In [93]:
! kubectl apply -f minio_secret.yaml

secret/minio-s3-secret unchanged
serviceaccount/default configured


### 2. Configure Access to MinIO
Upload your dataset to MinIO.
First, we configure credentials for mc, the MinIO command line client. We then use it to create a bucket, upload the dataset to it, and set access policy so that the pipeline can download it from MinIO.

Follow the steps below to download the MinIO client:


    wget https://dl.min.io/client/mc/release/linux-amd64/mc
    chmod +x mc
    ./mc --help
    

In [94]:
! wget https://dl.min.io/client/mc/release/linux-amd64/mc
! chmod +x mc
! ./mc --help

--2021-07-27 09:34:55--  https://dl.min.io/client/mc/release/linux-amd64/mc
Resolving dl.min.io (dl.min.io)... 178.128.69.202
Connecting to dl.min.io (dl.min.io)|178.128.69.202|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 21876736 (21M) [application/octet-stream]
Saving to: ‘mc.2’


2021-07-27 09:34:56 (33.9 MB/s) - ‘mc.2’ saved [21876736/21876736]

NAME:
  mc - MinIO Client for cloud storage and filesystems.

USAGE:
  mc [FLAGS] COMMAND [COMMAND FLAGS | -h] [ARGUMENTS...]

COMMANDS:
  alias      set, remove and list aliases in configuration file
  ls         list buckets and objects
  mb         make a bucket
  rb         remove a bucket
  cp         copy objects
  mirror     synchronize object(s) to a remote site
  cat        display object contents
  head       display first 'n' lines of an object
  pipe       stream STDIN to an object
  share      generate URL for temporary access to an object
  find       search for objects
  sql        run sql queries

#### a. Connect to the Minio Server

In [95]:
! ./mc alias set minio http://minio-service.kubeflow:9000 minio minio123

[m[32mAdded `minio` successfully.[0m
[0m

#### b. Create a bucket to store your data and export your model to Minio
Make sure you clear this bucket once you are done running your pipeline.

In [96]:
! ./mc mb minio/airlinecust2

[m[32;1mBucket created successfully `minio/airlinecust2`.[0m
[0m

#### c. Upload the dataset to your bucket in Minio.
Note: Make sure you have your dataset in a folder like we have here as datasets.

In [97]:
! tar --dereference -czf datasets.tar.gz ./datasets
! ./mc cp datasets.tar.gz minio/airlinecust2/datasets.tar.gz
! ./mc policy set download minio/airlinecust2

...ts.tar.gz:  1.55 MiB / 1.55 MiB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 27.88 MiB/s 0s
Access permission for `minio/airlinecust2` is set to `download`
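The archive step can also be reproduced in Python. The sketch below packs a folder into a `tar.gz` archive (following symlinks, like `tar --dereference`) and reads it back sequentially with mode `"r|gz"`, which suits non-seekable streams such as an HTTP response. The helper names `pack_directory` and `unpack_stream` and the `sample.csv` file are hypothetical and only illustrate the round trip.

```python
import io
import os
import tarfile
import tempfile


def pack_directory(src_dir: str) -> bytes:
    """Create a gzip-compressed tar archive of src_dir in memory."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz", dereference=True) as tar:
        tar.add(src_dir, arcname=os.path.basename(src_dir))
    return buffer.getvalue()


def unpack_stream(stream, dest_dir: str) -> list:
    """Extract a tar.gz byte stream, reading it sequentially (mode "r|gz")."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(fileobj=stream, mode="r|gz") as tar:
        tar.extractall(path=dest_dir)
    return sorted(os.listdir(dest_dir))


with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical stand-in for the real ./datasets folder.
    datasets = os.path.join(workdir, "datasets")
    os.makedirs(datasets)
    with open(os.path.join(datasets, "sample.csv"), "w") as f:
        f.write("a,b\n1,2\n")

    archive = pack_directory(datasets)
    top_level = unpack_stream(io.BytesIO(archive), os.path.join(workdir, "out"))
    extracted = sorted(os.listdir(os.path.join(workdir, "out", "datasets")))
    print(top_level, extracted)
```

Because the archive keeps the top-level `datasets` folder, files end up under `<destination>/datasets/` after extraction.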

#### If you have downloaded your data too many times while testing, use the following code to clear out your bucket

In [98]:
#! ./mc rm --recursive --force minio/airlinecust2

#### Minio Server URL and Credentials

In [99]:
MINIO_SERVER='minio-service.kubeflow:9000'
MINIO_ACCESS_KEY='minio'
MINIO_SECRET_KEY='minio123'

#### Implement Kubeflow Pipelines Components
In this pipeline, we have the following components:

- Dataset download component
- Preprocess the dataset component
- Train the model component
- Make predictions component
- Export the trained model component
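These components hand data to one another through file-system paths: a parameter annotated with `OutputPath` receives a location to write to, and the matching `InputPath` parameter of a downstream step receives the location of that same data. The following is a minimal local sketch of that handoff without KFP. The functions `produce` and `consume` and the file name `numbers.json` are hypothetical stand-ins, not the pipeline's real steps.

```python
import json
import os
import tempfile


def produce(out_dir: str) -> None:
    """Stand-in for a step with an OutputPath parameter: writes its result."""
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, "numbers.json"), "w") as f:
        json.dump([1, 2, 3], f)


def consume(in_dir: str) -> int:
    """Stand-in for a step with an InputPath parameter: reads the result."""
    with open(os.path.join(in_dir, "numbers.json")) as f:
        return sum(json.load(f))


with tempfile.TemporaryDirectory() as workdir:
    # In a real run KFP chooses this path; here we pick it ourselves.
    shared = os.path.join(workdir, "data")
    produce(shared)
    total = consume(shared)
    print(total)  # 6
```

Calling the real component functions this way, with temporary directories, is a cheap way to smoke-test them before building container ops.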

In [100]:
from typing import NamedTuple
import kfp
import kfp.components as components
import kfp.dsl as dsl
from kfp.components import InputPath, OutputPath #helps define the input & output between the components
NAMESPACE = 'sooter'

#### Component 1: Download the Data Set

In [101]:
def download_dataset(minio_server: str, data_dir: OutputPath(str)):
    """Download the data set to the KFP volume to share it among all steps"""
    import urllib.request
    import tarfile
    import os
    import subprocess

    if not os.path.exists(data_dir):
        os.makedirs(data_dir)

    url = f'http://{minio_server}/airlinecust2/datasets.tar.gz'
    print(url)
    stream = urllib.request.urlopen(url)
    print('done downloading')
    tar = tarfile.open(fileobj=stream, mode="r|gz")
    tar.extractall(path=data_dir)
    print('done extracting')
    
    
    subprocess.call(["ls", "-dlha", data_dir])

#### Component 2: Preprocess Dataset

In [102]:
#df = pd.read_csv("https://raw.githubusercontent.com/charlesa101/KubeflowUseCases/draft/Airline%20Customer%20Satisfaction/data/raw/Invistico_Airline.csv?token=AOWDH2M6SCIG4L7PTKFANZDBBECXM")

In [103]:
def preprocess(data_dir: InputPath(str), clean_data_dir: OutputPath(str)):
    
    import numpy as np
    import sys, subprocess;
    subprocess.run([sys.executable, '-m', 'pip', 'install', 'pandas'])
    subprocess.run([sys.executable, '-m', 'pip', 'install', 'scikit-learn'])
    from sklearn.model_selection import KFold
    from sklearn.model_selection import train_test_split  # splitting the data
    import pandas as pd
    import pickle
    import os
    
    # Get data
    
    # Read the CSV extracted by the download step. The tokenized GitHub URL
    # used previously expires and bypasses the copy stored in MinIO.
    df = pd.read_csv(os.path.join(data_dir, "datasets", "Invistico_Airline.csv"))
    
    #print(data)
    
    #drop rows with missing values
    df.dropna(inplace=True)
    #new column total delay
    df['total_delay'] = df['Departure Delay in Minutes'] + df['Arrival Delay in Minutes']
    
    #drop 'Departure Delay in Minutes',and 'Arrival Delay in Minutes'
    df.drop(columns=['Departure Delay in Minutes','Arrival Delay in Minutes'], inplace=True)
    
    #satisfied and dissatisfied in number 
    satisfaction_map = {"satisfied": 1,"dissatisfied": 0 }
    df['satisfaction']  = df['satisfaction'].map(satisfaction_map)

    #Male and Female in number 
    Gender_map = {"Male": 1,"Female": 2 }
    df['Gender']  = df['Gender'].map(Gender_map)

    #Loyal and disloyal in number 
    Customer_Type_map = {"Loyal Customer": 1,"disloyal Customer": 0 }
    df['Customer Type']  = df['Customer Type'].map(Customer_Type_map)

    #Business travel and Personal Travel in number 
    Type_of_Travel_map = {"Business travel": 1,"Personal Travel": 2 }
    df['Type of Travel']  = df['Type of Travel'].map(Type_of_Travel_map)

    #Business and Eco and Eco plus in number 
    Class_map = {"Business": 1,"Eco": 3, "Eco Plus": 2 }
    df['Class']  = df['Class'].map(Class_map)

    cols = ['Flight Distance', 'total_delay', 'Checkin service', 'On-board service']

    Q1 = df[cols].quantile(0.25)
    Q3 = df[cols].quantile(0.75)
    IQR = Q3 - Q1

    df = df[~((df[cols] < (Q1 - 1.5 * IQR)) |(df[cols] > (Q3 + 1.5 * IQR))).any(axis=1)]   
    
    #Split dataset
    
    X = df.drop('satisfaction',axis=1)
    y = df['satisfaction'] 
    X_train,X_test, y_train, y_test = train_test_split(X,y, test_size = 0.3, random_state = 111)
    
   
    data = {"X_train": X_train,"X_test": X_test, "y_train": y_train,"y_test": y_test}
    
    os.makedirs(clean_data_dir, exist_ok=True)

    with open(os.path.join(clean_data_dir,'clean_data.pickle'), 'wb') as f:
        pickle.dump(data, f)
    
    print(f"clean_data.pickle {clean_data_dir}")
    
    print(os.listdir(clean_data_dir))
    
    print("Preprocessing Done")
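The outlier filter in `preprocess` keeps only rows whose values lie within 1.5 × IQR of the first and third quartiles. Below is a minimal single-column sketch of the same rule using only the standard library. `statistics.quantiles(..., method="inclusive")` interpolates linearly between data points, which should correspond to the default behavior of pandas `quantile`. The function names and the sample values are hypothetical.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) outlier fences from linearly interpolated quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Keep only the values that fall inside the fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


# Hypothetical delays in minutes; 100 lies far from the rest.
delays = [1, 2, 3, 4, 5, 100]
bounds = iqr_bounds(delays)
kept = drop_outliers(delays)
print(bounds)  # (-1.5, 8.5)
print(kept)    # [1, 2, 3, 4, 5]
```

In the component the rule is applied to several columns at once, and a row is dropped if any one of them falls outside its fences.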

#### Component 3: Train the TensorFlow Model

In [104]:
def train_model(clean_data_dir: InputPath(str), model_dir: OutputPath(str)):
    
    # Install all the dependencies inside the function
    import numpy as np
    import pickle
    import os
    import sys, subprocess;
    subprocess.run([sys.executable, '-m', 'pip', 'install', 'catboost==0.24.2'])
    import pandas as pd
    # import libraries for training

    from numpy.random import seed

    import tensorflow as tf
    tf.random.set_seed(221)
    from tensorflow import keras
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
    from tensorflow.keras.optimizers import SGD, Adam, RMSprop
    
    #load the preprocessed data
    
    print(clean_data_dir)
    with open(os.path.join(clean_data_dir,'clean_data.pickle'), 'rb') as f:
        data = pickle.load(f)
        
    print(data)
    
    X_train = data['X_train']
    y_train = data['y_train']
    
    seed(1)
    model = Sequential()
    model.add(Dense(100, activation='relu', input_dim=21))
    model.add(BatchNormalization())
    model.add(Dense(40, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='sigmoid'))
    
    model.summary()
    #opt = args.optimizer
    model.compile(loss='binary_crossentropy',
                optimizer='adam',
                metrics=['accuracy'])
    
    # Fit the model to the training data
    model.fit(X_train, y_train, epochs=30)
    
    #Save the model to the designated directory
    os.makedirs(model_dir, exist_ok=True)
    
    #with open(os.path.join(model_dir,'model.pickle'), 'wb') as f:
        #pickle.dump(model, f)
    
    
    model.save(model_dir)
    
    print(f"Model saved {model_dir}")
    
    print(os.listdir(model_dir))

#### Component 4: Evaluate Model

In [105]:
def prediction(clean_data_dir: InputPath(str), model_dir: InputPath(str), metrics_path: OutputPath(str)
) -> NamedTuple("EvaluationOutput", [("mlpipeline_metrics", "Metrics")]):   
    import pickle
    import os
    import sys, subprocess;
    import numpy as np
    
    import json
    import tensorflow as tf
    import tensorflow_datasets as tfds
    from collections import namedtuple


    #Evaluate the model and print the results
    print(model_dir)
    model = tf.keras.models.load_model(model_dir)
    
    print(model)
    
    print(clean_data_dir)
    with open(os.path.join(clean_data_dir,'clean_data.pickle'), 'rb') as f:
        data = pickle.load(f)     
    print(data)
 
    X_test = data['X_test']
    y_test = data['y_test']
    X_train = data['X_train']
    y_train = data['y_train']
    

    (loss, accuracy) = model.evaluate(X_test,y_test, verbose=0) 
    
    metrics = {
        "metrics": [
            {"name": "loss", "numberValue": str(loss), "format": "RAW"},  # loss is not a ratio
            {"name": "accuracy", "numberValue": str(accuracy), "format": "PERCENTAGE"},
        ]
    }

    with open(metrics_path, "w") as f:
        json.dump(metrics, f)

    out_tuple = namedtuple("EvaluationOutput", ["mlpipeline_metrics"])

    return out_tuple(json.dumps(metrics))
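The metrics payload is a JSON object with a `metrics` list whose entries carry `name`, `numberValue`, and `format`. The sketch below is a hedged variant, assuming the KFP v1 metrics schema in which `format` is either `RAW` or `PERCENTAGE`. It keeps `numberValue` numeric (the evaluation step above passes strings via `str()`) and reports the loss as `RAW`. The helper `build_metrics` and the sample values are hypothetical.

```python
import json


def build_metrics(loss: float, accuracy: float) -> dict:
    """Assemble a metrics payload; values stay numeric rather than strings."""
    return {
        "metrics": [
            {"name": "loss", "numberValue": float(loss), "format": "RAW"},
            {"name": "accuracy", "numberValue": float(accuracy), "format": "PERCENTAGE"},
        ]
    }


# Hypothetical evaluation results, for illustration only.
payload = build_metrics(loss=0.25, accuracy=0.9)
serialized = json.dumps(payload)
print(serialized)
```

Casting with `float()` also converts NumPy scalars returned by `model.evaluate` into plain Python numbers that `json.dumps` can serialize.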

#### Component 5: Export the Model

In [106]:
def export_model(
    model_dir: InputPath(str),
    metrics: InputPath(str),
    export_bucket: str,
    model_name: str,
    model_version: int,
    minio_server: str,
    minio_access_key: str,
    minio_secret_key: str,
):
    import os
    import boto3
    from botocore.client import Config

    s3 = boto3.client(
        "s3",
        endpoint_url=f'http://{minio_server}',
        aws_access_key_id=minio_access_key,
        aws_secret_access_key=minio_secret_key,
        config=Config(signature_version="s3v4"),
    )

    # Create export bucket if it does not yet exist
    response = s3.list_buckets()
    export_bucket_exists = False

    for bucket in response["Buckets"]:
        if bucket["Name"] == export_bucket:
            export_bucket_exists = True

    if not export_bucket_exists:
        s3.create_bucket(ACL="public-read-write", Bucket=export_bucket)

    # Save model files to S3
    for root, dirs, files in os.walk(model_dir):
        for filename in files:
            local_path = os.path.join(root, filename)
            s3_path = os.path.relpath(local_path, model_dir)

            s3.upload_file(
                local_path,
                export_bucket,
                f"{model_name}/{model_version}/{s3_path}",
                ExtraArgs={"ACL": "public-read"},
            )

    response = s3.list_objects(Bucket=export_bucket)
    print(f"All objects in {export_bucket}:")
    for file in response["Contents"]:
        print("{}/{}".format(export_bucket, file["Key"]))
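The upload loop in `export_model` places every file under `<model_name>/<model_version>/`, preserving the sub-directory layout of the saved model. The following sketch reproduces only the key-building logic so it can be checked without a MinIO server. The helper `plan_upload_keys` and the placeholder file layout are hypothetical.

```python
import os
import tempfile


def plan_upload_keys(model_dir: str, model_name: str, model_version: int) -> list:
    """Map every file under model_dir to its object key in the bucket."""
    keys = []
    for root, _dirs, files in os.walk(model_dir):
        for filename in files:
            local_path = os.path.join(root, filename)
            rel_path = os.path.relpath(local_path, model_dir)
            # Object keys use forward slashes regardless of the local OS.
            rel_path = rel_path.replace(os.sep, "/")
            keys.append(f"{model_name}/{model_version}/{rel_path}")
    return sorted(keys)


with tempfile.TemporaryDirectory() as model_dir:
    # Hypothetical SavedModel-like layout with empty placeholder files.
    os.makedirs(os.path.join(model_dir, "variables"))
    for rel in ("saved_model.pb", os.path.join("variables", "variables.index")):
        open(os.path.join(model_dir, rel), "w").close()

    keys = plan_upload_keys(model_dir, "airlinecust2", 1)
    print(keys)
```

The numeric version folder sits between the model name and the model files, so the serving step can point at `s3://<bucket>/<model_name>` and find the versions beneath it.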

#### Component 6: Serve Model
Kubeflow Pipelines comes with [a pre-defined KFServing component](https://raw.githubusercontent.com/kubeflow/pipelines/f21e0fe726f8aec86165beca061f64fa730e0ac7/components/kubeflow/kfserving/component.yaml), which can be imported from the GitHub repo and reused across pipelines without redefining it each time. We include a copy with the tutorial so that it works in an air-gapped environment. Here's what the import looks like:

In [107]:
kfserving = components.load_component_from_file("kfserving-component.yaml")

#### Combine the Components into a Pipeline

In [108]:
def train_model_pipeline(
    data_dir: str,
    clean_data_dir: str,
    model_dir: str,
    export_bucket: str,
    model_name: str,
    model_version: int,
    minio_server: str,
    minio_access_key: str,
    minio_secret_key: str,
):
    # For GPU support, please add the "-gpu" suffix to the base image
    BASE_IMAGE = "mavencodev/minio:v.0.1"

    downloadOp = components.func_to_container_op(
        download_dataset, base_image=BASE_IMAGE
    )(minio_server)

    preprocessOp = components.func_to_container_op(preprocess, base_image=BASE_IMAGE)(
        downloadOp.output
    )
        
    trainOp = components.func_to_container_op(train_model, base_image=BASE_IMAGE)(
        preprocessOp.output
    )

    predictionOp = components.func_to_container_op(prediction, base_image=BASE_IMAGE)(
        preprocessOp.output, trainOp.output
    )

    exportOp = components.func_to_container_op(export_model, base_image=BASE_IMAGE)(
        trainOp.output, predictionOp.output, export_bucket, 
        model_name, model_version, minio_server, minio_access_key, minio_secret_key
    )
    
    kfservingOp = kfserving(
        action="apply",
        default_model_uri=f"s3://{export_bucket}/{model_name}",
        model_name="airlinecust2",
        namespace= NAMESPACE,
        framework="tensorflow",
    )
    kfservingOp.after(exportOp)

In [109]:
def op_transformer(op):
    op.add_pod_annotation(name="sidecar.istio.io/inject", value="false")
    return op


@dsl.pipeline(
    name="Serving Customer Satisfaction Prediction model",
    description="A KFServing pipeline",
)
def satisfaction_pipeline(
    model_dir: str = "/train/model",
    data_dir: str = "/train/data",
    clean_data_dir: str= "/train/data",
    export_bucket: str = "airlinecust2",
    model_name: str = "airlinecust2",
    model_version: int = 1,
):
    MINIO_SERVER='minio-service.kubeflow:9000'
    MINIO_ACCESS_KEY='minio'
    MINIO_SECRET_KEY='minio123'
    
    
    train_model_pipeline(
        data_dir=data_dir,
        clean_data_dir=clean_data_dir,
        model_dir=model_dir,
        export_bucket=export_bucket,
        model_name=model_name,
        model_version=model_version,
        minio_server=MINIO_SERVER,
        minio_access_key=MINIO_ACCESS_KEY,
        minio_secret_key=MINIO_SECRET_KEY,
    )
    
    dsl.get_pipeline_conf().add_op_transformer(op_transformer)

##### With that in place, let's compile the pipeline from our notebook:

In [110]:
pipeline_func = satisfaction_pipeline
run_name = pipeline_func.__name__ + " run"
experiment_name = "End-to-End-Demo"

kfp.compiler.Compiler().compile(pipeline_func,  'airline26.yaml')

###### Upload the generated YAML file to create a pipeline in the Kubeflow UI
###### Now delete your bucket when you have run the pipeline successfully in the Kubeflow UI.

In [111]:
#! ./mc rb minio/airlinecust2 --force