Skip to content
Branch: master
Find file History
AnushreeDixit fix(//MultiDeviceInteferencePipeline):Changed bounding box color
Signed-off-by: Anushree Dixit <>
Latest commit e5ab07c Jul 9, 2019

Multi-Device Inference Pipeline


This is an end-to-end DriveOS application in the scope of autonomous driving on Nvidia DRIVE AGX platform. In this app, we illustrate the entire workflow from image decoding, preprocessing to inferenece and visualization using DNN engines built by TensorRT. Two DNN applications are showcased, i.e., object detection and ego-lane segmentation.
Detection and segmentation

Currently the demo is image-based using a JPEG image as input. The TensorRT generated DNN engines are able to run concurrently on different devices, i.e., Xavier's integrated GPU (iGPU) and Deep Learning Accelerators (DLA). at various precisions (i.e. for DLA - INT8 and FP16 and for iGPU FP32, FP16 and INT8). In addition, we also provide training recipes on the networks we used such that the entire workflow is reproducible. To measure the performance, profiling is implemented in the inference module.

Sample result

Detection and segmentation

Target platform

The application is designed for NVIDIA DRIVE AGX Developer Kit running either Linux or QNX, and it can also be compiled and executed on x86_64 Linux.


  • CUDA: 10.1
  • CUDNN: 7.5
  • TensorRT:
  • DriveOS PDK: or Ubuntu 16.04 on x86
  • DALI: 0.9dev (aarch64 build)
  • OpenCV: 3.4.2 (for visualization)

How to compile

It is recommended to cross compile this program instead of natively compiling it on the Development Kit. In setting up this project you should have gone through the instructions to create a Docker image (if you have not done this refer to //docker/ which will have the cross compiling toolchain and necessary libraries setup.

There are two components to compile, the applications (i.e., preprocessing pipeline, engine creator, and inference pipeline) as well as the TensorRT plugin (i.e., Flatten Concatenation) for the object detector. All the applications and plugins can be compiled in one step.

For aarch64-linux:

dazel build //MultiDeviceInferencePipeline/... //plugins/... --config=D5L-toolchain

For aarch64-qnx:

dazel build //MultiDeviceInferencePipeline/... //plugins/... --config=D5Q-toolchain

For x86_64-linux:

dazel build //MultiDeviceInferencePipeline/... //plugins/...

Then after making sure all dependencies are on the target and available for linking, copy the contents of bazel-out/aarch64-fastbuild to the target and you should be ready to run.


1. Model training and conversion

We provide training recipes based on Tensorflow for the two applications in this app: object detection and ego-lane segmentation.

Object detection

The object detection model is trained on KITTI object detection dataset. The object categories include car, pedestrian, and cyclist. Details on training can be found here.

Ego-Lane segmentation

The ego-lane segmentation model is trained on KITTI road dataset. Details on training can be found here.

Convert models to UFF format

The trained models need to be converted to UFF format that is compatible with TensorRT. The python scripts for converting both models are provided along with the training recipes.

2. Build preprocessing pipeline

This application demonstrates how to build a serialized DALI pipeline. The pipeline reads images in JPEG format, followed by processing steps such as resizing and normalization. The serialized pipeline is saved in binary format, and it can be loaded for other applications. We demonstrate the usage of the serialized pipeline in the application of int 8 engine creation, where where image preprocessing is required for reading calibration images. For details, please refer to pipeline creator.

3. Create TensorRT engine

This step creates TensorRT optimized engines for the DNNs, the input file is the trained model in UFF format. The engine builder supports fp32 and fp16 precision as well as int8 precision by utilizing a calibration image set. The output is a serialized engine that can be used for inference. For object detector, this application also provides the option ot write the detection results in COCO standard JSON format, which can be used to calculate the performance metrics. Note that the engine has to be created on the same target where the inference is supposed to run. For details, please refer to engine creator.

4. Run models on DDPX

This is the inference part of the application. It loads two engines, an object detection engine and an ego-lane segmentation engie and takes a image as input. The who inference pipeline is configured via a TOML file. In the provided configuration, inference for both engines are performed concurrently on the iGPU and DLA using DALI pipelines. Afterwards the results are postprocessed and the input image is annotated with bounding boxes and segmentation masks.

We provide a simple profiling functionality here by reporting the FPS of the concurrent execution. For details, please refer to inference pipeline.

5. Generate profiling report (future work)

The inference pipeline module also integrates Nvidia's CUDA Profiling Tools Interface (CUPTI) for profiling the inference performance. The profiling metrics include sm efficiency, flop count and DRAM utilization.

You can’t perform that action at this time.