The TensorRT Laboratory (trtlab) is a general purpose set of tools to build customer-facing inference applications and services. NVIDIA Triton is a professional-grade production inference server.
This project is broken into five primary components:

- `memory`, based on foonathan/memory, is a module designed for writing custom allocators for both host and GPU memory. Several custom allocators are included.
- `core` contains host/CPU-side tools for common components such as thread pools, resource pools, and userspace threading based on Boost Fibers.
- `cuda` extends `memory` with a new `memory_type` for CUDA device memory. All custom allocators in `memory` can be used with `device_memory`, `device_managed_memory`, or `host_pinned_memory` (see the sketch after this list).
- `nvrpc` is an abstraction layer for building asynchronous microservices. The current implementation is based on gRPC.
- `tensorrt` provides an opinionated runtime built on the TensorRT API.
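The three CUDA memory types correspond to the standard CUDA runtime allocation primitives. The sketch below is illustrative only and uses plain CUDA runtime calls, not the trtlab allocator interface, to show what each memory type maps to:

```cpp
// Illustrative only: plain CUDA runtime equivalents of the three memory types.
// The trtlab `cuda` module wraps these primitives behind custom allocators;
// the trtlab class names and allocator interface are not shown here.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const size_t bytes = 1 << 20;  // 1 MiB

    void* device = nullptr;
    cudaMalloc(&device, bytes);          // device_memory: ordinary GPU global memory

    void* managed = nullptr;
    cudaMallocManaged(&managed, bytes);  // device_managed_memory: unified (managed) memory

    void* pinned = nullptr;
    cudaMallocHost(&pinned, bytes);      // host_pinned_memory: page-locked host memory

    std::printf("device=%p managed=%p pinned=%p\n", device, managed, pinned);

    cudaFree(device);
    cudaFree(managed);
    cudaFreeHost(pinned);
    return 0;
}
```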
The easiest way to manage the external NVIDIA dependencies is to leverage the containers hosted on NGC. For bare-metal installs, use the Dockerfile as a template for which NVIDIA libraries to install.
```
docker build -t trtlab .
```
For development purposes, the following set of commands first builds the base image, then maps the source code on the host into a running container:

```
docker build -t trtlab:dev --target base .
docker run --rm -ti --gpus=all -v $PWD:/work --workdir=/work --net=host trtlab:dev bash
```
This project is released under the BSD 3-clause license.
- Found a bug or have a feature request? Please let us know by filing a new issue.
- You can contribute by opening a pull request.
Pull requests with changes of 10 lines or more will require a Contributor License Agreement.