# Install necessary packages

We can install the necessary packages either by running `pip install --user <package_name>` for each one, or by listing them all in a `requirements.txt` file and running `pip install --user -r requirements.txt`.

> NOTE: Do not forget the `--user` argument. It is required if you want to use Kale to transform this notebook into a Kubeflow pipeline.

In [1]:
!pip3 install --user -r requirements.txt



# Imports

In this section, we import the packages we need for this example. Make it a habit to gather your imports in a single place. It will make your life easier if you are going to transform this notebook into a Kubeflow pipeline using Kale.

In [2]:
import numpy as np
import xgboost as xgb

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score

In [3]:
# Compare the major version only; float() fails on strings such as "1.0.2"
assert int(xgb.__version__.split(".")[0]) < 1

# Project hyper-parameters

In this cell, we define the hyper-parameter variables. Defining them in one place makes it easier to experiment with their values and also facilitates hyper-parameter tuning experiments with Kale and Katib.

In [4]:
ETA = .3
MAX_DEPTH = 3
OBJECTIVE = "multi:softprob"
STEPS = 20

# Load and preprocess data

In this section, we load the dataset and convert it into a form that the model can use directly.

In [5]:
x, y = datasets.load_iris(return_X_y=True)

In [6]:
x_trn, x_tst, y_trn, y_tst = train_test_split(x, y, test_size=.2)
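
Because `train_test_split` shuffles the data randomly, the metrics printed at the end of this notebook can change from run to run. The following is a minimal sketch of a reproducible, class-balanced split. The seed value `42` and the use of `stratify` are assumptions made for illustration; they are not part of the original notebook.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

x, y = datasets.load_iris(return_X_y=True)

# random_state fixes the shuffle so that repeated runs give the same split
# (the value 42 is arbitrary). stratify=y keeps the class proportions of the
# full dataset in both the training set and the test set.
x_trn, x_tst, y_trn, y_tst = train_test_split(
    x, y, test_size=0.2, random_state=42, stratify=y
)

# Number of test samples per class
test_class_counts = np.bincount(y_tst)
print(x_trn.shape, x_tst.shape, test_class_counts)
```

With 150 samples split evenly across three classes, a stratified 20% test set contains 10 samples of each class.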

In [7]:
D_trn = xgb.DMatrix(x_trn, label=y_trn)
D_tst = xgb.DMatrix(x_tst, label=y_tst)

# Define and train the model

We are now ready to define our model. In this example, we use the Extreme Gradient Boosting algorithm implemented by [XGBoost](https://xgboost.ai/).

In [8]:
param = {"eta": ETA,
         "max_depth": MAX_DEPTH,
         "objective": OBJECTIVE,
         "num_class": 3}  # the Iris dataset has three classes

steps = STEPS

In [9]:
model = xgb.train(param, D_trn, steps)

In [10]:
model.save_model("model.bst")

# Evaluate the model

Finally, we are ready to evaluate the model using the test set.

In [11]:
preds = model.predict(D_tst)
max_preds = np.asarray([np.argmax(line) for line in preds])
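
With the `multi:softprob` objective, `model.predict` returns one row per sample and one probability per class, so the predicted label is the index of the largest value in each row. The list comprehension above can also be written as a single vectorized call. The sketch below uses the two probability rows from the serving response shown later in this notebook as stand-in data.

```python
import numpy as np

# Stand-in for model.predict(D_tst): one row per sample, one column per class
preds = np.array([
    [0.007211383432149887, 0.9620232582092285, 0.030765343457460403],
    [0.9892104864120483, 0.006796461995691061, 0.003993045538663864],
])

# Row-by-row version, as used in the notebook
loop_preds = np.asarray([np.argmax(line) for line in preds])

# Vectorized equivalent: take the argmax along the class axis
max_preds = np.argmax(preds, axis=1)

assert np.array_equal(loop_preds, max_preds)
print(max_preds)  # [1 0]
```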

In [12]:
precision = precision_score(y_tst, max_preds, average='macro')
recall = recall_score(y_tst, max_preds, average='macro')
f1 = f1_score(y_tst, max_preds, average='macro')
accuracy = accuracy_score(y_tst, max_preds)
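
The `average='macro'` argument computes each score per class and then takes the unweighted mean, so every class counts equally regardless of its size. The sketch below reproduces that calculation by hand with NumPy on a small set of hypothetical labels. It assumes every class is predicted at least once; otherwise the per-class precision would divide by zero.

```python
import numpy as np

# Hypothetical labels for six samples and three classes
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

per_class_precision = []
per_class_recall = []
for c in np.unique(y_true):
    true_pos = np.sum((y_pred == c) & (y_true == c))
    # Precision: share of samples predicted as c that really are c
    per_class_precision.append(true_pos / np.sum(y_pred == c))
    # Recall: share of samples of class c that were predicted as c
    per_class_recall.append(true_pos / np.sum(y_true == c))

# Macro averaging: unweighted mean of the per-class scores
macro_precision = float(np.mean(per_class_precision))
macro_recall = float(np.mean(per_class_recall))
# Accuracy: share of samples whose prediction matches the true label
overall_accuracy = float(np.mean(y_pred == y_true))

print(macro_precision, macro_recall, overall_accuracy)
```

Here the per-class precisions are 1/2, 2/3, and 1, which gives a macro precision of 13/18; the per-class recalls are 1/2, 1, and 1/2, which gives a macro recall of 2/3.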

# Serving

We can deploy the model with KFServing and use it like any other web service. All we need is the necessary configuration file; take a look at the `xgboost.yaml` file. It must specify an `xgboost` predictor and point the `storageUri` parameter to the folder where the `.bst` model file is stored.

> NOTE: The model file must be named `model.bst`.

In [18]:
!kubectl apply -f xgboost.yaml

inferenceservice.serving.kubeflow.org/xgboost-example unchanged
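
The contents of `xgboost.yaml` are not shown in this notebook. For orientation, the sketch below builds a manifest of the kind described above as a Python dictionary and prints it as JSON. It assumes the KFServing `v1alpha2` API layout, and the `storageUri` value is a placeholder, not the path used by the original example.

```python
import json

# Assumed layout of the manifest (KFServing v1alpha2 API). The storageUri
# value below is a placeholder, not the path used by the original notebook.
inference_service = {
    "apiVersion": "serving.kubeflow.org/v1alpha2",
    "kind": "InferenceService",
    "metadata": {"name": "xgboost-example"},
    "spec": {
        "default": {
            "predictor": {
                "xgboost": {
                    # Folder that contains model.bst
                    "storageUri": "pvc://<pvc-name>/<folder-with-model>",
                }
            }
        }
    },
}

# kubectl accepts JSON manifests as well as YAML ones
manifest = json.dumps(inference_service, indent=2)
print(manifest)
```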


In [14]:
import json
import requests

data = {
    "instances": [
        [6.8, 2.8, 4.8, 1.4],
        [5.1, 3.5, 1.4, 0.2]
    ]
}

headers = {"content-type": "application/json", "Host": "xgboost-example.kubeflow-user.svc.cluster.local"}
response = requests.post("http://cluster-local-gateway.istio-system/v1/models/xgboost-example:predict", json=data, headers=headers)

In [15]:
response.text

'{"predictions": [[0.007211383432149887, 0.9620232582092285, 0.030765343457460403], [0.9892104864120483, 0.006796461995691061, 0.003993045538663864]]}'
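
The service returns one list of class probabilities per instance. The sketch below shows one way to turn that response into class names. It parses the response text copied from the output above and assumes the class order of `sklearn.datasets.load_iris` (setosa, versicolor, virginica).

```python
import json

# Response body copied from the cell output above
response_text = (
    '{"predictions": [[0.007211383432149887, 0.9620232582092285, '
    '0.030765343457460403], [0.9892104864120483, 0.006796461995691061, '
    '0.003993045538663864]]}'
)

# Class names in the order used by sklearn.datasets.load_iris
class_names = ["setosa", "versicolor", "virginica"]

predictions = json.loads(response_text)["predictions"]

# For each instance, pick the index of the highest probability
predicted_classes = [
    max(range(len(row)), key=row.__getitem__) for row in predictions
]
predicted_names = [class_names[i] for i in predicted_classes]
print(predicted_names)  # ['versicolor', 'setosa']
```

In the live notebook, `response.json()` returns the same dictionary directly, without the call to `json.loads`.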

# Pipeline metrics

In the last cell of the notebook, we print the pipeline metrics. Kubeflow Pipelines picks them up and makes them available through its UI.

In [16]:
print(precision)
print(recall)
print(f1)
print(accuracy)

0.8925925925925925
0.8977272727272728
0.8935574229691877
0.9
