Triton Inference Server FIL Backend

Triton is a machine learning inference server for easy and highly optimized deployment of models trained in almost any major framework. This backend specifically facilitates use of tree models in Triton (including models trained with XGBoost, LightGBM, Scikit-Learn, and cuML).

If you want to deploy a tree-based model for optimized real-time or batched inference in production, the FIL backend for Triton will allow you to do just that.

Not sure where to start?

If you aren't sure where to start with this documentation, consider one of the following paths:

I currently use XGBoost/LightGBM or other tree models and am trying to assess if Triton is the right solution for production deployment of my models.

Check out the FIL backend's blog post announcement.
Make sure your model is supported by looking at the model support section.
Look over the introductory example.
Try deploying your own model locally by consulting the FAQ notebook.
Check out the main Triton documentation for additional features and helpful tips on deployment (including example Helm charts).

I am familiar with Triton, but I am using it to deploy an XGBoost/LightGBM model for the first time.

Look over the introductory example
Try deploying your own model locally by consulting the FAQ notebook. Note that it includes specific example code for serialization of XGBoost and LightGBM models.
Review the FAQ notebook's tips for optimizing model performance.

I am familiar with Triton and the FIL backend, but I am using it to deploy a Scikit-Learn or cuML tree model for the first time.

Look at the section on preparing Scikit-Learn/cuML models for Triton.
Try deploying your model by consulting the FAQ notebook, especially the sections on Scikit-Learn and cuML.

I am a data scientist familiar with tree model training, and I am trying to understand how Triton might be used with my models.

Take a glance at the Triton product page to get a sense of what Triton is used for.
Download and run the introductory example for yourself. If you do not have access to a GPU locally, you can just look over this notebook and then jump to the FAQ notebook, which has specific information on CPU-only training and deployment.

I have never worked with tree models before.

Take a look at XGBoost's documentation.
Download and run the introductory example for yourself.
Try deploying your own model locally by consulting the FAQ notebook.

I don't like reading docs.

Look at the Quickstart below.
Open the FAQ notebook in a browser.
Try deploying your model. If you get stuck, Ctrl-F for keywords in the FAQ notebook.

Quickstart: Deploying a tree model in 3 steps

Copy your model into the following directory structure. In this example, we show an XGBoost JSON file, but XGBoost binary files, LightGBM text files, and Treelite checkpoint files are also supported.

model_repository/
├─ example/
│  ├─ 1/
│  │  ├─ model.json
│  ├─ config.pbtxt
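
The layout above can also be created programmatically. The following is a minimal sketch using only the Python standard library; the helper name `build_model_repository`, the placeholder file `my_model.json`, and the use of a temporary directory are assumptions made for illustration and are not part of the FIL backend.

```python
import shutil
import tempfile
from pathlib import Path


def build_model_repository(model_file: Path, repo_root: Path,
                           model_name: str = "example", version: int = 1) -> Path:
    """Copy a serialized model into <repo_root>/<model_name>/<version>/model.json."""
    version_dir = repo_root / model_name / str(version)
    version_dir.mkdir(parents=True, exist_ok=True)
    # The quickstart layout stores the XGBoost JSON file as model.json.
    target = version_dir / "model.json"
    shutil.copyfile(model_file, target)
    # config.pbtxt sits beside the version directory, not inside it.
    # It is created empty here and still has to be filled out.
    (repo_root / model_name / "config.pbtxt").touch()
    return target


# Demonstration with a placeholder file standing in for a real model.
demo_dir = Path(tempfile.mkdtemp())
placeholder = demo_dir / "my_model.json"  # hypothetical file name
placeholder.write_text("{}")
repo = demo_dir / "model_repository"
build_model_repository(placeholder, repo)

layout = sorted(p.relative_to(repo).as_posix() for p in repo.rglob("*"))
print(layout)
```

For a real deployment, pass the path of your own serialized model instead of the placeholder and point `repo_root` at the directory you intend to mount into the container.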

Fill out config.pbtxt as follows, replacing $NUM_FEATURES with the number of input features, $MODEL_TYPE with xgboost, xgboost_json, lightgbm or treelite_checkpoint, and $IS_A_CLASSIFIER with true or false depending on whether this is a classifier or regressor.

backend: "fil"
max_batch_size: 32768
input [
  {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [ $NUM_FEATURES ]
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ 1 ]
  }
]
instance_group [{ kind: KIND_AUTO }]
parameters [
  {
    key: "model_type"
    value: { string_value: "$MODEL_TYPE" }
  },
  {
    key: "output_class"
    value: { string_value: "$IS_A_CLASSIFIER" }
  }
]

dynamic_batching {}
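
If you prefer to generate this file rather than edit it by hand, the placeholder substitution can be sketched with Python's `string.Template`, which happens to use the same `$NAME` syntax. The function `render_config` and its argument names are hypothetical; the template text and the list of accepted model types are copied from the quickstart above.

```python
from string import Template

# Model types listed in the quickstart text.
SUPPORTED_MODEL_TYPES = {"xgboost", "xgboost_json", "lightgbm", "treelite_checkpoint"}

CONFIG_TEMPLATE = Template('''backend: "fil"
max_batch_size: 32768
input [
  {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [ $NUM_FEATURES ]
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ 1 ]
  }
]
instance_group [{ kind: KIND_AUTO }]
parameters [
  {
    key: "model_type"
    value: { string_value: "$MODEL_TYPE" }
  },
  {
    key: "output_class"
    value: { string_value: "$IS_A_CLASSIFIER" }
  }
]

dynamic_batching {}
''')


def render_config(num_features: int, model_type: str, is_classifier: bool) -> str:
    """Fill in the three placeholders of the quickstart config.pbtxt."""
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError(f"unsupported model_type: {model_type!r}")
    if num_features < 1:
        raise ValueError("num_features must be positive")
    return CONFIG_TEMPLATE.substitute(
        NUM_FEATURES=num_features,
        MODEL_TYPE=model_type,
        # The config expects the lowercase strings "true" or "false".
        IS_A_CLASSIFIER="true" if is_classifier else "false",
    )


# Example: an XGBoost JSON classifier with 500 input features.
config = render_config(500, "xgboost_json", True)
print(config)
```

Write the returned string to `model_repository/example/config.pbtxt` to complete this step.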

Start the server:

docker run -p 8000:8000 -p 8001:8001 --gpus all \
  -v ${PWD}/model_repository:/models \
  nvcr.io/nvidia/tritonserver:23.09-py3 \
  tritonserver --model-repository=/models

The Triton server will now be serving your model over both HTTP (port 8000) and gRPC (port 8001), using NVIDIA GPUs if they are available or the CPU if they are not. For information on how to submit inference requests, how to deploy other tree model types, or advanced configuration options, check out the FAQ notebook.
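
As a rough illustration of what an inference request over HTTP might look like, the sketch below builds a JSON body in the style of the KServe v2 inference protocol that Triton's HTTP endpoint accepts, using only the standard library. The helper names `build_infer_request` and `infer` are hypothetical, and the model name `example`, the tensor name `input__0`, and the address `http://localhost:8000` are assumed from the quickstart above. The FAQ notebook remains the authoritative guide to submitting requests.

```python
import json
import urllib.request


def build_infer_request(rows, input_name="input__0"):
    """Build a request body for a batch of feature rows (row-major FP32 data)."""
    if not rows:
        raise ValueError("at least one row is required")
    n_features = len(rows[0])
    if any(len(row) != n_features for row in rows):
        raise ValueError("all rows must have the same number of features")
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": [len(rows), n_features],
                "datatype": "FP32",
                # Flatten the batch into a single row-major list.
                "data": [float(value) for row in rows for value in row],
            }
        ]
    }


def infer(rows, model_name="example", url="http://localhost:8000"):
    """POST the batch to a running server and return the first output's data."""
    body = json.dumps(build_infer_request(rows)).encode("utf-8")
    request = urllib.request.Request(
        f"{url}/v2/models/{model_name}/infer",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["outputs"][0]["data"]


# Building the payload needs no server; calling infer() does.
payload = build_infer_request([[1, 2, 3], [4, 5, 6]])
print(json.dumps(payload, indent=2))
# predictions = infer([[1, 2, 3], [4, 5, 6]])  # requires the server started above
```

The number of values per row must match the `$NUM_FEATURES` value used in `config.pbtxt`.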
