## Prerequisites

* An installed Wallaroo instance.
* The following Python libraries:
  * [`numpy`](https://pypi.org/project/numpy/)
  * [`pandas`](https://pypi.org/project/pandas/)
  * [`wallaroo`](https://pypi.org/project/wallaroo/): The Wallaroo SDK. Included with the Wallaroo JupyterHub service by default.
  * [`scikit-learn`](https://pypi.org/project/scikit-learn/): Version 1.1.2
  * [`xgboost`](https://pypi.org/project/xgboost/): Version 1.6.2
  * `os`, `json`, and `pickle`: Part of the Python standard library, so no separate installation is required.

In [1]:
import numpy as np
import pandas as pd

import sklearn
import sklearn.datasets

import xgboost as xgb

import pickle
import json

## XGB Generation

The following demonstrates how to create the XGBoost Regression and XGBoost Classification models used in the XGBoost Autoconversion demonstrations.

Wallaroo supports the following model versions:

* XGBoost: Version 1.6.2
* SKLearn: Version 1.1.2

In [2]:
print(xgb.__version__)
print(sklearn.__version__)

1.6.2
1.1.2
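
The version check above can also be done without importing the libraries. The following is a minimal sketch, assuming the supported versions listed in this section; the `SUPPORTED` dictionary and the `check_versions` helper are illustrative names, not part of the Wallaroo SDK.

```python
from importlib import metadata

# Versions taken from the list above; names here are illustrative only.
SUPPORTED = {"xgboost": "1.6.2", "scikit-learn": "1.1.2"}


def check_versions(supported):
    """Return {package: (installed_or_None, expected, matches)}."""
    report = {}
    for package, expected in supported.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (installed, expected, installed == expected)
    return report


for package, (installed, expected, ok) in check_versions(SUPPORTED).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{package}: installed={installed} expected={expected} [{status}]")
```

A mismatch does not necessarily mean the model will fail to convert, but training with the listed versions avoids one source of uncertainty.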


## Regression


Generate random data, split it into a training set and a small evaluation set, and fit an XGBoost regression model (`XGBRegressor`) on the training set. The evaluation set is saved for later testing.


In [3]:
# create data
Ntrain = 1000
Neval = 5
N = Ntrain+Neval

NF = 25
Ninformative = 10

X, Y = sklearn.datasets.make_regression(n_samples=N, n_features=NF, n_informative=Ninformative)

row_use = np.array(['train']*Ntrain + ['eval']*Neval)

Xtrain = X[row_use=='train', :]
Ytrain = Y[row_use=='train']

Xeval = X[row_use=='eval', :]
Yeval = Y[row_use=='eval']

print(Xtrain.shape)
print(Xeval.shape)

(1000, 25)
(5, 25)
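
Because the `'train'` and `'eval'` labels in `row_use` are contiguous, the mask-based split above should be equivalent to plain slicing. The sketch below illustrates this with random NumPy data standing in for the output of `make_regression`; the seed and the `*_mask` / `*_slice` names are assumptions made for the example.

```python
import numpy as np

Ntrain, Neval, NF = 1000, 5, 25

# Stand-in for the generated data: random features and targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(Ntrain + Neval, NF))
Y = rng.normal(size=Ntrain + Neval)

# Mask-based split, as in the cell above.
row_use = np.array(['train'] * Ntrain + ['eval'] * Neval)
Xtrain_mask = X[row_use == 'train', :]
Xeval_mask = X[row_use == 'eval', :]

# Slice-based split: the first Ntrain rows train, the remainder evaluate.
Xtrain_slice, Xeval_slice = X[:Ntrain], X[Ntrain:]
Ytrain_slice, Yeval_slice = Y[:Ntrain], Y[Ntrain:]

print(np.array_equal(Xtrain_mask, Xtrain_slice))
print(np.array_equal(Xeval_mask, Xeval_slice))
print(Xtrain_slice.shape, Xeval_slice.shape)
```

The mask form is more flexible if the rows are later shuffled or assigned to more than two groups; slicing is shorter when the groups are contiguous.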


In [4]:
# create and fit model
xgb_reg = xgb.XGBRegressor(nthread=2)
xgb_reg.fit(
    Xtrain,
    Ytrain,
    verbose=False,
)

In [5]:
# predict locally
xgb_reg.predict(Xeval)

array([197.56361 , 289.90128 , -25.926506, -73.84874 , -88.240814],
      dtype=float32)

In [6]:
# save the model
with open('xgb_reg.pickle', 'wb') as f:
    pickle.dump(xgb_reg, f)

In [7]:
# save the data

input_dict = {
    'tensor': Xeval.tolist()
}

with open('xgb_regression_eval.json', 'w') as f:
    json.dump(input_dict, f)
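
The saved file is a JSON object with a single `tensor` key holding one list of feature values per evaluation row. The sketch below writes and re-reads a file in that layout to confirm the shape survives the round trip. It uses random NumPy data as a stand-in for `Xeval`, and the `eval_check.json` file name is an assumption for the example.

```python
import json
import os
import tempfile

import numpy as np

# Stand-in for Xeval: 5 rows of 25 features.
Xeval = np.random.default_rng(0).normal(size=(5, 25))

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'eval_check.json')
    # Same layout as the cell above: {'tensor': [[...], [...], ...]}
    with open(path, 'w') as f:
        json.dump({'tensor': Xeval.tolist()}, f)
    with open(path) as f:
        loaded = json.load(f)

restored = np.array(loaded['tensor'])
print(list(loaded.keys()))
print(restored.shape)
print(np.array_equal(restored, Xeval))
```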


## Classification

Generate random data for a two-class problem, split it into a training set and a small evaluation set, and fit an XGBoost classification model (`XGBClassifier`) on the training set. The evaluation set is saved for later testing.

In [8]:
# create data
Ntrain = 1000
Neval = 5
N = Ntrain+Neval

NF = 25
Ninformative = 10

X, Y = sklearn.datasets.make_classification(n_samples=N, n_features=NF, n_informative=Ninformative, n_classes=2)

row_use = np.array(['train']*Ntrain + ['eval']*Neval)


Xtrain = X[row_use=='train', :]
Ytrain = Y[row_use=='train']

Xeval = X[row_use=='eval', :]
Yeval = Y[row_use=='eval']

print(Xtrain.shape)
print(Xeval.shape)

(1000, 25)
(5, 25)


In [9]:
# create and fit model
xgb_class = xgb.XGBClassifier(nthread=2, use_label_encoder=False, eval_metric='logloss')
xgb_class.fit(
    Xtrain,
    Ytrain,
    verbose=False,
)

In [10]:
# predict locally
xgb_class.predict(Xeval)

array([0, 0, 0, 1, 0])

In [11]:
xgb_class.predict_proba(Xeval)

array([[9.9977589e-01, 2.2410569e-04],
       [5.4563290e-01, 4.5436710e-01],
       [8.6630678e-01, 1.3369322e-01],
       [8.3226383e-02, 9.1677362e-01],
       [9.8913258e-01, 1.0867418e-02]], dtype=float32)
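
For this binary classifier, the labels returned by `predict` correspond to the column with the larger probability in `predict_proba`. The sketch below checks that relationship using the probabilities printed above, copied into a NumPy array; it does not call the model, so it only illustrates the relationship for these five rows.

```python
import numpy as np

# Probabilities copied from the predict_proba output above.
proba = np.array([
    [9.9977589e-01, 2.2410569e-04],
    [5.4563290e-01, 4.5436710e-01],
    [8.6630678e-01, 1.3369322e-01],
    [8.3226383e-02, 9.1677362e-01],
    [9.8913258e-01, 1.0867418e-02],
], dtype=np.float32)

# Column 0 is the probability of class 0, column 1 of class 1.
labels = np.argmax(proba, axis=1)
print(labels)  # [0 0 0 1 0], matching the predict output above

# Each row should sum to approximately 1.
print(np.allclose(proba.sum(axis=1), 1.0, atol=1e-5))
```

The second row shows why the probabilities can be more informative than the label alone: it is assigned class 0 with only about 55% confidence.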

In [12]:
# save the model
with open('xgb_class.pickle', 'wb') as f:
    pickle.dump(xgb_class, f)

In [13]:
# save the data

input_dict = {
    'tensor': Xeval.tolist()
}

with open('xgb_class_eval.json', 'w') as f:
    json.dump(input_dict, f)
