# Install necessary packages

We can install the necessary packages either by running `pip install --user <package_name>` for each one, or by listing everything in a `requirements.txt` file and running `pip install --user -r requirements.txt`.

> NOTE: Do not forget the `--user` argument. It is required if you want to use Kale to transform this notebook into a Kubeflow pipeline.

In [1]:
!pip3 install --user -r requirements.txt

Collecting scikit-learn==0.20.3
  Downloading scikit_learn-0.20.3-cp36-cp36m-manylinux1_x86_64.whl (5.4 MB)
     |████████████████████████████████| 5.4 MB 2.8 MB/s eta 0:00:01
Installing collected packages: scikit-learn
  Attempting uninstall: scikit-learn
    Found existing installation: scikit-learn 0.22.2
    Uninstalling scikit-learn-0.22.2:
      Successfully uninstalled scikit-learn-0.22.2
Successfully installed scikit-learn-0.20.3


# Imports

In this section, we import the packages we need for this example. Make it a habit to gather your imports in a single place. It will make your life easier if you later transform this notebook into a Kubeflow pipeline using Kale.

In [2]:
import joblib
import sklearn
import numpy as np

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score

In [6]:
assert sklearn.__version__ == "0.20.3"

# Project hyper-parameters

In this cell, we define the hyper-parameter variables. Defining them in one place makes it easier to experiment with their values and also makes it easier to run hyper-parameter tuning experiments using Kale and Katib.

In [7]:
N_ESTIMATORS = 500
MAX_DEPTH = 2

# Load and preprocess data

In this section, we load the dataset and split it into training and test sets, so that the model can use it directly.

In [8]:
x, y = datasets.load_iris(return_X_y=True)

In [9]:
x_trn, x_tst, y_trn, y_tst = train_test_split(x, y, test_size=.2)
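
The split above does not fix a random seed, so the test set, and therefore the metrics reported later, will differ from run to run. If reproducibility matters, one option is to pass `random_state` and `stratify` to `train_test_split`. The following is a minimal sketch; the seed value `42` is an arbitrary assumption and is not part of the original notebook.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

x, y = datasets.load_iris(return_X_y=True)

# random_state fixes the shuffle so that reruns produce the same split.
# stratify=y keeps the class proportions the same in both subsets.
# The seed 42 is an arbitrary choice for this sketch.
x_trn, x_tst, y_trn, y_tst = train_test_split(
    x, y, test_size=0.2, random_state=42, stratify=y
)

print(x_trn.shape, x_tst.shape)  # (120, 4) (30, 4)
```

Because Iris has 50 samples per class, the stratified 20% test set contains 10 samples of each class.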

# Define and train the model

We are now ready to define our model. In this example, we use the scikit-learn implementation of Random Forest.

In [10]:
model = RandomForestClassifier(n_estimators=N_ESTIMATORS,
                               max_depth=MAX_DEPTH)

In [11]:
model.fit(x_trn, y_trn)

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=2, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=500, n_jobs=None,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)

In [12]:
with open("model.joblib", "wb") as f:
    joblib.dump(model, f)
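
Before handing the file to a model server, it can be worth checking that the serialized model loads back and predicts identically. The sketch below is self-contained: it trains its own model, writes to a temporary directory instead of the notebook's `model.joblib`, and fixes `random_state=0` purely for illustration. Note that scikit-learn does not guarantee that a pickled estimator loads correctly under a different library version, which is presumably why this notebook pins and asserts the version.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

x, y = datasets.load_iris(return_X_y=True)

# Same hyper-parameters as the notebook; random_state is an assumption
# added here so the sketch is deterministic.
model = RandomForestClassifier(n_estimators=500, max_depth=2, random_state=0)
model.fit(x, y)

# Round-trip the model through a file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions exactly.
same_predictions = np.array_equal(model.predict(x), restored.predict(x))
print(same_predictions)  # True
```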

# Evaluate the model

Finally, we are ready to evaluate the model using the test set.

In [13]:
preds = model.predict(x_tst)

In [14]:
precision = precision_score(y_tst, preds, average='macro')
recall = recall_score(y_tst, preds, average='macro')
f1 = f1_score(y_tst, preds, average='macro')
accuracy = accuracy_score(y_tst, preds)
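
With `average='macro'`, the metric is computed for each class separately and the per-class values are then averaged with equal weight. The sketch below reproduces macro precision by hand on a small set of hypothetical labels (not the notebook's data) and compares it with `precision_score`.

```python
import numpy as np
from sklearn.metrics import precision_score

# Hypothetical labels for a three-class problem (not the notebook's results).
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

# Precision for class c = correct predictions of c / all predictions of c.
per_class_precision = []
for c in np.unique(y_true):
    predicted_c = y_pred == c
    true_positive = np.sum(predicted_c & (y_true == c))
    per_class_precision.append(true_positive / np.sum(predicted_c))

# Macro averaging: unweighted mean over the classes.
manual_macro = float(np.mean(per_class_precision))
sklearn_macro = precision_score(y_true, y_pred, average="macro")
print(manual_macro, sklearn_macro)
```

Here the per-class precisions are 1/2, 2/3 and 1, so both values equal 13/18.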

# Serving

We can deploy the model with KFServing and then use it just like any other web service. All we need is to create the necessary configuration file. Take a look at the `sklearn.yaml` file: it specifies an `sklearn` predictor and sets the `storageUri` parameter to the location where the `.joblib` model file is stored.
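
The contents of `sklearn.yaml` are not reproduced in this notebook. As a rough sketch of what such a configuration might contain, the snippet below builds an `InferenceService` manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` also accepts. It assumes the `serving.kubeflow.org/v1alpha2` schema; the field layout differs in other KFServing API versions, and the `storageUri` value is a placeholder that must be replaced with the real model location.

```python
import json

# Assumption: KFServing v1alpha2 layout (spec.default.predictor.sklearn).
# The storageUri below is a placeholder, not a real location.
manifest = {
    "apiVersion": "serving.kubeflow.org/v1alpha2",
    "kind": "InferenceService",
    "metadata": {"name": "sklearn-iris"},
    "spec": {
        "default": {
            "predictor": {
                "sklearn": {
                    "storageUri": "<storage-uri-of-the-model-directory>"
                }
            }
        }
    },
}

print(json.dumps(manifest, indent=2))
```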

In [15]:
!kubectl apply -f sklearn.yaml

inferenceservice.serving.kubeflow.org/sklearn-iris created


In [18]:
import requests

data = {
    "instances": [
        [6.8, 2.8, 4.8, 1.4],
        [5.1, 3.5, 1.4, 0.2]
    ]
}

headers = {
    "content-type": "application/json",
    "Host": "sklearn-iris.kubeflow-user.svc.cluster.local",
}
url = "http://cluster-local-gateway.istio-system/v1/models/sklearn-iris:predict"
response = requests.post(url, json=data, headers=headers)

In [19]:
response.text

'{"predictions": [1, 0]}'
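
The service returns class indices rather than species names. A minimal sketch of decoding the response body shown above follows; the names are listed in the order used by `load_iris().target_names`, and the body is hard-coded so the snippet runs without a cluster. In the live notebook, `response.json()` yields the same dictionary.

```python
import json

# Response body copied from the cell output above.
response_text = '{"predictions": [1, 0]}'

# Class names in the order used by sklearn.datasets.load_iris().target_names.
target_names = ["setosa", "versicolor", "virginica"]

# Parse the JSON body and map each class index to its species name.
predictions = json.loads(response_text)["predictions"]
labels = [target_names[i] for i in predictions]
print(labels)  # ['versicolor', 'setosa']
```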

# Pipeline metrics

In the last cell of the Notebook, we print the pipeline metrics. These will be picked up by Kubeflow Pipelines, which will make them available through its UI.

In [22]:
print(precision)
print(recall)
print(f1)
print(accuracy)

0.923076923076923
0.9285714285714285
0.9165217391304349
0.9
