TritonForge is a fullstack Rust web application for converting ONNX deep learning models into TensorRT engines. It provides a practical browser-based workflow for model upload, TensorRT configuration, GPU selection, job tracking, log inspection, and output management.
The project is designed for developers and machine learning engineers who want a repeatable way to build optimized TensorRT artifacts without managing every conversion manually from the command line.
Building TensorRT engines often involves a sequence of manual steps: choosing the right TensorRT Docker image, selecting a GPU, preparing shape options, running trtexec, watching terminal output, collecting generated files, and keeping track of which model was built with which configuration.
TritonForge turns that workflow into an observable application flow. Each conversion is submitted as a job, executed inside a Docker container, tracked through its lifecycle, and stored with useful metadata for later inspection.
- Convert ONNX models into TensorRT engines.
- Upload ONNX files from the browser or select ONNX files from a server path.
- Select locally available or configured TensorRT Docker images.
- Detect and select NVIDIA GPUs through `nvidia-smi`.
- Configure TensorRT options such as dynamic shapes, workspace size, timing iterations, explicit batch, and FP16 (see the example `trtexec` invocation after this list).
- Track conversion jobs with status, progress, timestamps, and container logs.
- Download completed model outputs.
- Organize completed models into deployment-oriented model groups.
- View completed models grouped by TensorRT image tag when building model groups.
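To make these options concrete, here is an illustrative `trtexec` invocation of the kind a conversion job runs. The exact flags TritonForge emits are an assumption, and the input name and shapes are placeholders:

```bash
# Convert an ONNX model with FP16 and dynamic batch sizes (placeholder shapes)
trtexec --onnx=/workspace/model.onnx \
        --saveEngine=/workspace/model.engine \
        --fp16 \
        --minShapes=input:1x3x224x224 \
        --optShapes=input:8x3x224x224 \
        --maxShapes=input:16x3x224x224 \
        --workspace=4096 \
        --iterations=100
# Note: newer TensorRT releases replace --workspace with --memPoolSize=workspace:4096M
```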
At a high level, TritonForge follows this flow:
- The user provides an ONNX model.
- The user selects a TensorRT Docker image and target GPU.
- TritonForge creates a conversion job.
- The server stages the model and starts a Docker container.
- The container runs TensorRT conversion with `trtexec`.
- TritonForge records progress, logs, and job state.
- The generated TensorRT output is stored for download or grouping.
The conversion lifecycle is intentionally explicit: pending, preparing, converting, finalizing, completed, or failed.
- Focused ONNX workflow: the application is built around ONNX-to-TensorRT conversion.
- Docker-based isolation: conversion jobs run inside TensorRT Docker containers instead of relying on host-installed TensorRT tooling (see the illustrative `docker run` command after this list).
- GPU-aware execution: users can choose the GPU used for each conversion job.
- Observable jobs: progress, logs, status, and metadata are available from the web UI.
- Repeatable outputs: completed jobs and model groups make it easier to compare builds and prepare deployment experiments.
- Rust fullstack foundation: Dioxus, Tokio, SQLx, SQLite, Docker/Bollard, and structured tracing provide a reliable async application stack.
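As an illustration of this isolation model, a conversion container pinned to one GPU could look like the command below. The server drives Docker through Bollard, so this is not the literal command it constructs; the mount paths and image tag are placeholders:

```bash
docker run --rm \
  --gpus device=0 \
  -v /tmp/tensorrt-converter/uploads:/workspace/in:ro \
  -v /tmp/tensorrt-converter/outputs:/workspace/out \
  nvcr.io/nvidia/tensorrt:24.05-py3 \
  trtexec --onnx=/workspace/in/model.onnx --saveEngine=/workspace/out/model.engine
```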
- Rust 1.89.0 or newer, compatible with the repo toolchain.
- Dioxus CLI (`dx`).
- SQLx CLI for database migrations.
- Docker daemon access.
- NVIDIA GPU driver and `nvidia-smi` for GPU detection (see the example query after this list).
- TensorRT Docker images, for example `nvcr.io/nvidia/tensorrt:*`.
- SQLite database configured through `DATABASE_URL`.
- Node.js/npm, only if you need to update Tailwind-related frontend tooling.
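GPU detection requires `nvidia-smi` on the server's `PATH`. A quick way to confirm what the app can enumerate (the exact query fields TritonForge uses are an assumption):

```bash
nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader
```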
Create a local `.env` file for development. The minimal required value is:

```
DATABASE_URL=sqlite://data/converter.db
```

Supported runtime configuration:
| Variable | Purpose | Default |
|---|---|---|
| `DATABASE_URL` | SQLite database URL used by SQLx. | `sqlite://data/converter.db` in local `.env` |
| `UPLOAD_DIR` | Directory for staged uploaded ONNX files. | `/tmp/tensorrt-converter/uploads` |
| `OUTPUT_DIR` | Directory for completed TensorRT outputs. | `/tmp/tensorrt-converter/outputs` |
| `GROUPS_DIR` | Directory for model group outputs. | `/tmp/tensorrt-converter/groups` |
| `MAX_UPLOAD_SIZE_MB` | Maximum upload size in MiB. | `2048` |
| `CONVERSION_TIMEOUT_SECS` | Maximum runtime for one conversion job, in seconds. | `1800` |
| `DOCKER_SOCKET` | Docker daemon socket path. | `/var/run/docker.sock` |
| `TENSORRT_IMAGES_CONFIG` | Optional TOML file containing known TensorRT image entries. | `config/images.toml` |
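Putting the table together: only `DATABASE_URL` is required, but a fully specified development `.env` using the documented defaults would look like this:

```
DATABASE_URL=sqlite://data/converter.db
UPLOAD_DIR=/tmp/tensorrt-converter/uploads
OUTPUT_DIR=/tmp/tensorrt-converter/outputs
GROUPS_DIR=/tmp/tensorrt-converter/groups
MAX_UPLOAD_SIZE_MB=2048
CONVERSION_TIMEOUT_SECS=1800
DOCKER_SOCKET=/var/run/docker.sock
TENSORRT_IMAGES_CONFIG=config/images.toml
```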
Install the Rust and Dioxus tooling:

```bash
cargo install dioxus-cli --locked
```

Prepare local configuration:

```bash
printf 'DATABASE_URL=sqlite://data/converter.db\n' > .env
```
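The prerequisites call for the SQLx CLI. As a sketch, assuming the repository keeps its migrations in the default `migrations/` directory, installing the CLI and preparing the SQLite database might look like:

```bash
# Install sqlx-cli with only the SQLite backend (assumption: no other databases are needed)
cargo install sqlx-cli --no-default-features --features sqlite

# Create the database file referenced by DATABASE_URL, then apply pending migrations
sqlx database create
sqlx migrate run
```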
Start the fullstack development server:

```bash
dx serve --web --fullstack true
```

Generate Tailwind CSS after changing styles:

```bash
npx tailwindcss -i ./input.css -o ./assets/tailwind.css
```

During active frontend development, run Tailwind in watch mode in a separate terminal:

```bash
npx tailwindcss -i ./input.css -o ./assets/tailwind.css --watch
```

You can also bind the server explicitly:

```bash
dx serve --web --fullstack true --addr 127.0.0.1 --port 8080 --open false
```

```bash
# Start the web app with hot reload
dx serve --web --fullstack true

# Generate Tailwind CSS once
npx tailwindcss -i ./input.css -o ./assets/tailwind.css

# Watch Tailwind input changes during UI development
npx tailwindcss -i ./input.css -o ./assets/tailwind.css --watch

# Format Rust code
cargo fmt

# Compile and lint with warnings treated as errors
cargo clippy -- -D warnings

# Run tests
cargo test
```

Recommended pre-commit check:
```bash
cargo fmt && cargo clippy -- -D warnings && cargo test
```

Run the fullstack app in release mode:

```bash
dx serve --web --fullstack true --release
```

With an explicit bind address and port:

```bash
dx serve --web --fullstack true --release --addr 127.0.0.1 --port 8080 --open false
```

Create an optimized production build:

```bash
dx build --release
```

Create a deployment bundle:

```bash
dx bundle
```

Run the full test suite:

```bash
cargo test
```

Run library tests only:

```bash
cargo test --lib
```

Run a specific integration test target:

```bash
cargo test --test docker_test
```

Docker and GPU-related checks depend on the local machine environment. A machine without Docker daemon access, NVIDIA drivers, or `nvidia-smi` may not exercise the full conversion path.
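One way to avoid spurious failures on such machines is to gate the Docker-dependent target manually. A minimal sketch, assuming the default socket path (this gating is not built into the test suite):

```bash
# Run Docker-dependent integration tests only when the daemon socket exists
if [ -S /var/run/docker.sock ]; then
  cargo test --test docker_test
else
  echo "skipping docker_test: no Docker socket"
fi
```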
Dat Vo
- Contact: vtdat58@gmail.com
- GitHub: https://github.com/dat58/TritonForge
This project is licensed under the MIT License.