Check the v2_api branch for code that uses the newer TRITON API.
This repository hosts the code for the Healthcare on Tap series webinar titled "Deeper Dive into TensorRT and TRITON", recorded on 08/06/2020.
All tests were performed using:
- Docker version 19.03
- NVIDIA GPUs (RTX 8000 and V100) with driver 450.57
Use the startDocker.sh script as follows to mount a data directory and select GPU 2 for your tests. The current setup uses nvcr.io/nvidian/pytorch:20.06-py3 as the base image.
./startDocker.sh 2 <PATH_TO_DATA>
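The repository ships the actual startDocker.sh; the sketch below is only an assumption about the typical shape of such a launcher (the `--shm-size`, mount point, and port mapping are illustrative, not taken from the repo). It prints the docker command rather than executing it, so it can be inspected without Docker installed.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of startDocker.sh.
# Arguments: $1 = GPU index (e.g. 2), $2 = host data directory to mount.
GPU_ID="${1:-2}"
DATA_DIR="${2:-$PWD/data}"
IMAGE="nvcr.io/nvidian/pytorch:20.06-py3"

# Build the docker command and print it (dry run) instead of running it.
CMD="docker run --rm -it --gpus device=${GPU_ID} --shm-size=8g \
-v ${DATA_DIR}:/workspace/data -p 8888:8888 ${IMAGE}"
echo "${CMD}"
```

The `--gpus device=N` flag pins the container to the GPU index passed as the first argument, matching the `./startDocker.sh 2 ...` invocation above.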
Once inside the container, use the following script to enable the GPU dashboards and start JupyterLab:
./start_jupyter_lab.sh
A separate container for the Triton server needs to be launched using the following script:
./start_triton_server.sh 2 <PATH_TO_MODEL_REPO>
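As with the client container, start_triton_server.sh is provided by the repo; this sketch only illustrates what such a launcher typically runs. The image tag, the `/models` mount point, the `--model-repository` flag, and the 8000/8001/8002 port mapping (HTTP, gRPC, metrics) are assumptions, not taken from the repository.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of start_triton_server.sh.
# Arguments: $1 = GPU index, $2 = host path to the model repository.
GPU_ID="${1:-2}"
MODEL_REPO="${2:-$PWD/model_repo}"
IMAGE="nvcr.io/nvidia/tritonserver:20.06-py3"

# Dry run: print the command instead of executing it.
CMD="docker run --rm --gpus device=${GPU_ID} \
-p 8000:8000 -p 8001:8001 -p 8002:8002 \
-v ${MODEL_REPO}:/models ${IMAGE} tritonserver --model-repository=/models"
echo "${CMD}"
```

Port 8002 is the one the monitoring stack below would scrape for Prometheus-format metrics.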
To launch the Grafana dashboards for monitoring metrics, run docker-compose up
from the monitoring folder and navigate to localhost:3000/. Additional steps here.
The three notebooks in this repository walk through the example steps:
- NB1_PyTorch_TRT_ONNX_Inference: using TensorRT
- NB2_TRITON_ClientInference: using TRITON
- NB3_lung_segmentation_3d: a simple 3D example with a GraphDef backend
- To replicate the experiments, additional clients can be launched to test inference with multiple models. For example:
python sim_inference_req_triton.py --model model_cxr_onnx
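To drive the server with several models at once, the client invocation above can be fanned out in a loop. A minimal sketch, assuming the clients run inside the client container; only model_cxr_onnx is named in this repo, and any further names added to the list would be your own deployed models.

```shell
#!/usr/bin/env bash
# Launch one simulated-inference client per deployed model.
MODELS="model_cxr_onnx"   # extend with additional deployed model names

LAUNCHED=""
for model in ${MODELS}; do
  echo "launching client for ${model}"
  # Uncomment inside the client container to actually start a client:
  # python sim_inference_req_triton.py --model "${model}" &
  LAUNCHED="${LAUNCHED} ${model}"
done
# wait   # block until all background clients exit
```

Running the clients in the background with `&` and collecting them with `wait` lets the requests overlap, which is what exercises Triton's concurrent scheduling.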
This project is distributed under the MIT License.
The following tools were used as part of this code base and are governed by their respective license agreements. These are in addition to tools distributed within the NGC Docker containers (PyTorch / TRITON).
Any contributions to this repository are subject to the Contributor License Agreement.