# Install necessary packages

We can install the necessary packages either by running `pip install --user <package_name>` for each one, or by listing them all in a `requirements.txt` file and running `pip install --user -r requirements.txt`.

> NOTE: Do not forget the `--user` argument. It is required if you want to use Kale to transform this notebook into a Kubeflow pipeline.

In [4]:
!pip3 install --user -r requirements.txt
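
To confirm where `--user` installs end up for the current interpreter, a quick check with the standard-library `site` module can help. This is only a sketch: the printed path depends on your environment, and inside a virtual environment the user site is typically disabled.

```python
import site
import sys

# Directory where `pip install --user` places packages for this interpreter.
user_site = site.getusersitepackages()
print("User site-packages:", user_site)

# True only when the interpreter is configured to look in that directory
# (it is usually disabled inside virtual environments).
print("User site enabled:", bool(site.ENABLE_USER_SITE))

# Whether the directory is already on the import path of this session.
print("On sys.path:", user_site in sys.path)
```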



# Imports

In this section we import the packages we need for this example. Make it a habit to gather your imports in a single place; it will make your life easier if you later transform this notebook into a Kubeflow pipeline using Kale.

In [29]:
import sklearn
import numpy as np

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score

In [30]:
pip install --user scikit-learn

Note: you may need to restart the kernel to use updated packages.


In [31]:
import sklearn


# Project hyper-parameters

In this cell, we define the hyper-parameter variables. Defining them in one place makes it easier to experiment with their values and also simplifies running hyper-parameter (HP) tuning experiments with Kale and Katib.

In [32]:
N_ESTIMATORS = 500
MAX_DEPTH = 2
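
As an illustration of why central hyper-parameters are convenient, the sketch below tries a few value pairs by hand and compares them with cross-validation. It is a plain scikit-learn loop, not a Katib experiment; the candidate values, `random_state=0`, and the 5-fold setting are assumptions chosen only for the example.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

x, y = datasets.load_iris(return_X_y=True)

# Hypothetical (n_estimators, max_depth) candidates, for illustration only.
candidates = [(50, 2), (50, 4), (100, 2)]

results = {}
for n_estimators, max_depth in candidates:
    model = RandomForestClassifier(n_estimators=n_estimators,
                                   max_depth=max_depth,
                                   random_state=0)
    # Mean accuracy over 5 cross-validation folds.
    scores = cross_val_score(model, x, y, cv=5)
    results[(n_estimators, max_depth)] = scores.mean()

# Pick the pair with the highest mean accuracy.
best = max(results, key=results.get)
print("Best (n_estimators, max_depth):", best)
print("Mean CV accuracy:", results[best])
```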

# Load and preprocess data

In this section, we load the dataset and split it into training and test sets so that the model can use it directly.

In [33]:
x, y = datasets.load_iris(return_X_y=True)

In [37]:
x_trn, x_tst, y_trn, y_tst = train_test_split(x, y, test_size=.2)
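
The split above is random, so the metrics change between runs. If reproducibility matters, one option is to fix the seed and stratify by label, as sketched below. The `random_state=42` value is an arbitrary assumption, and `stratify=y` is an addition that the original cell does not use.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

x, y = datasets.load_iris(return_X_y=True)

# Fixed seed for repeatable splits; stratify keeps class proportions equal
# in the training and test sets.
x_trn, x_tst, y_trn, y_tst = train_test_split(
    x, y, test_size=0.2, random_state=42, stratify=y)

print(x_trn.shape, x_tst.shape)  # (120, 4) (30, 4)
print(np.bincount(y_tst))        # [10 10 10]
```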

# Define and train the model

We are now ready to define our model. In this example, we use the scikit-learn implementation of Random Forest.

In [35]:
model = RandomForestClassifier(n_estimators=N_ESTIMATORS,
                               max_depth=MAX_DEPTH)

In [36]:
model.fit(x_trn, y_trn)

KeyError: 'print_changed_only'
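
The full traceback is not shown here, so the cause is an assumption: this error is commonly reported when scikit-learn is upgraded while the kernel is still running, leaving modules from two versions loaded at once. Restarting the kernel and re-running the notebook from the top is the usual remedy. A small sanity check after the restart might look like this:

```python
import sklearn

# Version actually loaded in this kernel session.
print("scikit-learn version:", sklearn.__version__)

# The display setting that the error message refers to should be present
# in the global configuration of a consistent installation.
config = sklearn.get_config()
has_key = "print_changed_only" in config
print("print_changed_only available:", has_key)
```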


# Evaluate the model

Finally, we are ready to evaluate the model using the test set.

In [26]:
preds = model.predict(x_tst)

In [7]:
precision = precision_score(y_tst, preds, average='macro')
recall = recall_score(y_tst, preds, average='macro')
f1 = f1_score(y_tst, preds, average='macro')
accuracy = accuracy_score(y_tst, preds)

NameError: name 'precision_score' is not defined
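
To make `average='macro'` concrete, the sketch below uses a small hand-made set of labels (hypothetical, not taken from the iris run) and shows that the macro score is the unweighted mean of the per-class scores.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Hypothetical labels for three classes, two samples each.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

# average=None returns one score per class.
per_class_precision = precision_score(y_true, y_pred, average=None)
macro_precision = precision_score(y_true, y_pred, average='macro')
macro_recall = recall_score(y_true, y_pred, average='macro')

print(per_class_precision)   # class 2 has precision 2/3, the others 1.0
print(macro_precision)       # mean of the per-class values: 8/9
print(macro_recall)          # (1.0 + 0.5 + 1.0) / 3 = 5/6
```

Because every class counts equally, macro averaging is a reasonable fit for a balanced dataset such as iris.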

# Serving

We can deploy the model as an inference server to KFServing, using the Kale `serve` API.

In [8]:
from kale.common.serveutils import serve
kfserver = serve(model)

TypeError: expected str, bytes or os.PathLike object, not NoneType

In [None]:
import json
data = {"instances": [[6.8, 2.8, 4.8, 1.4], [5.1, 3.5, 1.4, 0.2]]}
res = kfserver.predict(json.dumps(data))

In [None]:
print(res)
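
The request body used above is a JSON object with an `instances` list, one row of four iris features per sample. The sketch below builds the same payload from a NumPy array and checks its shape before sending. It does not call `kfserver.predict`, and the response format is not covered here.

```python
import json
import numpy as np

# Two samples in iris feature order:
# sepal length, sepal width, petal length, petal width (cm).
samples = np.array([[6.8, 2.8, 4.8, 1.4],
                    [5.1, 3.5, 1.4, 0.2]])

# NumPy arrays are not JSON serializable, so convert to nested lists first.
payload = json.dumps({"instances": samples.tolist()})
print(payload)

# Round-trip check: every instance must carry exactly four features.
decoded = json.loads(payload)
print(all(len(row) == 4 for row in decoded["instances"]))
```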

# Pipeline metrics

In the last cell of the notebook, we print the pipeline metrics. Kubeflow Pipelines picks these up and makes them available through its UI.

In [None]:
print(precision)
print(recall)
print(f1)
print(accuracy)