# Create an XGBoost Model for Deployment

In [7]:
import numpy
from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import os
import signal
import subprocess
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter('ignore')

In [8]:
# Generate dummy data for binary classification (labels are random, so accuracy near 50% is expected)
seed = 7
features = 9 # number of sample features
samples = 10000 # number of samples
X = numpy.random.rand(samples, features).astype('float32')
Y = numpy.random.randint(2, size=samples)

test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)

In [9]:
model = XGBClassifier()
model.fit(X_train, y_train)



XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=16,
              num_parallel_tree=1, predictor='auto', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)

In [10]:
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Test Accuracy: {:.2f}".format(accuracy * 100.0))

Test Accuracy: 49.64


# Export, load, and deploy the XGBoost model in Triton Inference Server

To deploy a trained model with Triton, set up a model repository directory that contains one directory per model. Each model directory holds a deployment configuration file and a numbered version subdirectory with the serialized model.

The structure of the model repository should look like the following:

<img src="triton_model_repository_layout.png" width=400 height=400 />
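In case the image does not render: the paths used in this notebook imply the layout `model_repository/fil/config.pbtxt` plus `model_repository/fil/1/xgboost.model`. The sketch below creates the directories for that layout. The helper name `create_model_repository` is hypothetical, and a temporary directory stands in for `/home/ubuntu/model_repository` so the example can run anywhere.

```python
import tempfile
from pathlib import Path


def create_model_repository(root, model_name="fil", version="1"):
    """Create <root>/<model_name>/<version>/ and return the version directory."""
    version_dir = Path(root) / model_name / version
    version_dir.mkdir(parents=True, exist_ok=True)
    return version_dir


# Use a temporary directory here instead of /home/ubuntu/model_repository
with tempfile.TemporaryDirectory() as tmp:
    version_dir = create_model_repository(Path(tmp) / "model_repository")
    # List every directory that was created, relative to the temporary root
    layout = sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*"))
    print(layout)
```

The model file would then be saved inside the returned version directory, and `config.pbtxt` one level above it.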


### Save model to model location

In [11]:
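os.makedirs('/home/ubuntu/model_repository/fil/1', exist_ok=True)  # make sure the version directory exists
model.save_model('/home/ubuntu/model_repository/fil/1/xgboost.model')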

### Create and save config.pbtxt

To deploy the model in Triton Inference Server, create a protobuf text configuration file named `config.pbtxt` in the `/home/ubuntu/model_repository/fil/` directory. It describes the model and how it should be deployed.

Triton reads this configuration file before loading the XGBoost model for inference and sets up the server parameters according to the values in `config.pbtxt`.

In [12]:
%%bash
# Writing config to file
cat > /home/ubuntu/model_repository/fil/config.pbtxt <<EOL 
name: "fil"                              # Name of the model directory (fil in our case)
backend: "fil"                           # Triton FIL backend for deploying forest models
max_batch_size: 8192
input [
 {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [ 9 ]                          # Input feature dimensions, in our sample case it's 9
  }
]
output [
 {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ 1 ]                          # One class label per sample, since predict_proba is "false"
  }
]
instance_group [{ kind: KIND_GPU }]
parameters [
  {
    key: "model_type"
    value: { string_value: "xgboost" }
  },
  {
    key: "predict_proba"
    value: { string_value: "false" }
  },
  {
    key: "output_class"
    value: { string_value: "true" }
  },
  {
    key: "threshold"
    value: { string_value: "0.5" }
  },
  {
    key: "algo"
    value: { string_value: "ALGO_AUTO" }
  },
  {
    key: "storage_type"
    value: { string_value: "AUTO" }
  },
  {
    key: "blocks_per_sm"
    value: { string_value: "0" }
  }
]

EOL
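
The input `dims` value must match the number of features the model was trained on, so it can be convenient to generate the configuration from Python instead of hard-coding it. The following is a minimal sketch, not part of the original workflow: the function name `render_fil_config` is hypothetical, and it reproduces only the keys and values shown in the configuration above.

```python
def render_fil_config(num_features, model_name="fil", max_batch_size=8192):
    """Return config.pbtxt text mirroring the configuration used in this notebook."""
    # Same FIL parameters as in the hand-written config above
    params = {
        "model_type": "xgboost",
        "predict_proba": "false",
        "output_class": "true",
        "threshold": "0.5",
        "algo": "ALGO_AUTO",
        "storage_type": "AUTO",
        "blocks_per_sm": "0",
    }
    param_entries = ",\n".join(
        '  {\n    key: "%s"\n    value: { string_value: "%s" }\n  }' % (key, value)
        for key, value in params.items()
    )
    sections = [
        'name: "%s"' % model_name,
        'backend: "fil"',
        "max_batch_size: %d" % max_batch_size,
        'input [\n  {\n    name: "input__0"\n    data_type: TYPE_FP32\n'
        "    dims: [ %d ]\n  }\n]" % num_features,
        'output [\n  {\n    name: "output__0"\n    data_type: TYPE_FP32\n'
        "    dims: [ 1 ]\n  }\n]",
        "instance_group [{ kind: KIND_GPU }]",
        "parameters [\n%s\n]" % param_entries,
    ]
    return "\n".join(sections) + "\n"


config_text = render_fil_config(num_features=9)
print(config_text)
```

In the notebook, passing `X_train.shape[1]` as `num_features` and writing `config_text` to `/home/ubuntu/model_repository/fil/config.pbtxt` would keep the configuration in sync with the training data.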

### Inference via the Triton Client

Test the deployment by sending a real inference request from the Triton client and checking the accuracy of the responses.

Start the Triton Server container from the directory that contains `model_repository` (here `/home/ubuntu`):\
`$ sudo docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/model_repository:/models nvcr.io/nvidia/tritonserver:22.01-py3 tritonserver --model-repository=/models`

Check the status of the server connection by running the following curl command:\
`curl -v <IP of machine>:8000/v2/health/ready`\
which should return `HTTP/1.1 200 OK`

In [13]:
! curl -v localhost:8000/v2/health/ready

*   Trying 127.0.0.1:8000...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET /v2/health/ready HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain
< 
* Connection #0 to host localhost left intact
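
The server can take a while to load the model after the container starts, so a script may want to poll the readiness endpoint rather than check it once. The sketch below does this with the standard library only. The helper names `server_is_ready` and `wait_until_ready` are hypothetical, and the URL assumes the server is reachable on `localhost:8000` as in the curl call above.

```python
import time
import urllib.error
import urllib.request


def server_is_ready(base_url="http://localhost:8000", timeout=2.0):
    """Return True if GET /v2/health/ready answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + "/v2/health/ready", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: treat as not ready
        return False


def wait_until_ready(check=server_is_ready, attempts=30, delay=1.0):
    """Call `check` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_ready()` before sending requests returns `True` as soon as the endpoint reports ready, or `False` once the attempts run out. The Triton HTTP client also provides `is_server_ready()`, which could be passed as `check` once a client object exists.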


You can run the Triton client either as a container or embedded in your code. To use it in your code, install the client via pip:

`pip3 install tritonclient[http]`

and import the library:

In [14]:
import tritonclient.http as triton_http

In [15]:
# Set up HTTP client.
http_client = triton_http.InferenceServerClient(
    url='localhost:8000',
    verbose=False,
    concurrency=1
)

# Set up Triton input and output objects for the HTTP client
triton_input_http = triton_http.InferInput(
    'input__0',
    (X_test.shape[0], X_test.shape[1]),
    'FP32'
)

triton_input_http.set_data_from_numpy(X_test, binary_data=True)

triton_output_http = triton_http.InferRequestedOutput(
    'output__0',
    binary_data=True
)

# Submit inference requests 
request_http = http_client.infer(
    'fil',
    model_version='1',
    inputs=[triton_input_http],
    outputs=[triton_output_http]
)

# Get results as numpy arrays
result_http = request_http.as_numpy('output__0')

# Check that we got the same accuracy as previously
accuracy = accuracy_score(y_test, result_http)
print("Accuracy: {:.2f}".format(accuracy * 100.0))

Accuracy: 49.79


The accuracy of the model deployed in Triton with the FIL backend (49.79) approximately matches the accuracy computed earlier with the XGBoost library's `predict` function (49.64).
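
The request above sends all of `X_test` (about 3,300 rows) in a single call, which fits within the `max_batch_size: 8192` set in `config.pbtxt`. For larger inputs, the rows would need to be split into several requests. The helper below is a hypothetical sketch of that splitting step only; it computes the row ranges and does not talk to the server.

```python
def batch_slices(num_rows, max_batch_size=8192):
    """Yield slices covering range(num_rows) in chunks of at most max_batch_size rows."""
    if max_batch_size <= 0:
        raise ValueError("max_batch_size must be positive")
    for start in range(0, num_rows, max_batch_size):
        yield slice(start, min(start + max_batch_size, num_rows))


# Example: 20,000 rows need three requests with a batch limit of 8192
slices = list(batch_slices(20000))
print([(s.start, s.stop) for s in slices])
```

Each slice `s` would select `X[s]` for its own `http_client.infer` call, with the per-batch results joined afterwards, for example with `numpy.concatenate`.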