Current: 2026-02-20 — Repository status: actively maintained. Keywords: Current, SAM3, TensorRT, FP16, TensorRT-Python, Gradio, ONNX, NVIDIA
This project provides a complete pipeline to run SAM3 (Segment Anything Model 3) with TensorRT:
- System audit for CUDA / TensorRT readiness (`Check.py`)
- ONNX export of the SAM3 submodules (`SAM3_PyTorch_To_Onnx.py`)
- TensorRT engine building from ONNX (`Build_Engines.py`)
- High-performance inference with text and box prompts (`SAM3_TensorRT_Inference.py`)
- Interactive web UI for easy testing (`UI.py`)
The workflow is designed around FP16 TensorRT engines with dynamic shapes and explicit batch, supporting both bounding box detection and mask segmentation modes.
By migrating from native PyTorch to TensorRT (FP16), this pipeline substantially reduces both GPU memory use and inference latency:
| Metric | Original PyTorch | TensorRT (FP16) | Improvement |
|---|---|---|---|
| VRAM Usage | ~6-7 GB | ~2.6 GB | ~62% Reduction |
| Inference Time (T4 GPU) | ~1.6 sec | ~0.5 sec | ~3.2x Speedup |
Note: Benchmarks were measured on an NVIDIA T4 GPU. Performance may vary on other hardware.
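The Improvement column follows from the other two columns. A quick sketch of that arithmetic, using only the approximate figures from the table:

```python
# Approximate figures copied from the benchmark table (NVIDIA T4).
pytorch_vram_gb = (6.0, 7.0)  # reported range for native PyTorch
trt_vram_gb = 2.6
pytorch_latency_s = 1.6
trt_latency_s = 0.5


def reduction(before: float, after: float) -> float:
    """Fractional reduction from `before` to `after` (0.62 means 62% less)."""
    return (before - after) / before


low = reduction(pytorch_vram_gb[0], trt_vram_gb)   # baseline of 6 GB
high = reduction(pytorch_vram_gb[1], trt_vram_gb)  # baseline of 7 GB
speedup = pytorch_latency_s / trt_latency_s

print(f"VRAM reduction: {low:.0%} to {high:.0%}")  # VRAM reduction: 57% to 63%
print(f"Speedup: {speedup:.1f}x")                  # Speedup: 3.2x
```

With a 6-7 GB baseline the memory saving spans roughly 57-63%, so the ~62% figure sits toward the upper end of that range.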
Before starting, verify your system is properly configured:
python3 Check.py
This will check CUDA, TensorRT, and all required dependencies.
Install required Python packages:
pip install -r requirements.txt --upgrade
Important: make sure TensorRT is installed and added to your PATH.
For Linux, install TensorRT 10.14.1.48 with CUDA 12.9:
sudo apt-get install -y --allow-downgrades \
libnvinfer10=10.14.1.48-1+cuda12.9 \
libnvinfer-bin=10.14.1.48-1+cuda12.9 \
libnvinfer-dispatch10=10.14.1.48-1+cuda12.9 \
libnvinfer-lean10=10.14.1.48-1+cuda12.9 \
libnvinfer-plugin10=10.14.1.48-1+cuda12.9 \
libnvinfer-vc-plugin10=10.14.1.48-1+cuda12.9 \
libnvonnxparsers10=10.14.1.48-1+cuda12.9
Add TensorRT to your PATH (and its libraries to LD_LIBRARY_PATH):
echo 'export PATH="/usr/src/tensorrt/bin:$PATH"' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu:/usr/lib/tensorrt${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"' >> ~/.bashrc
source ~/.bashrc
Verify the installation:
python3 Check.py
You have two options for obtaining the ONNX models: download the pre-exported files, or export them from PyTorch yourself.
Download prebuilt ONNX models for 1008 resolution:
hf download --local-dir "Onnx-Models" kishanstar2003/SAM3_ONNX_FP16
This creates an `Onnx-Models` directory containing:
- `vision-encoder.onnx`
- `text-encoder.onnx`
- `geometry-encoder.onnx`
- `decoder.onnx`
- `tokenizer.json` (automatically copied to the engines directory)
If you want to manually export ONNX models:
- Download the original SAM3 PyTorch model:
  hf download facebook/sam3 --local-dir sam3
- Patch the transformers code:
  python3 Patch_Sam3_Interp_Rope.py
- Export to ONNX:
  python3 SAM3_PyTorch_To_Onnx.py --all --model-path "sam3" --output-dir "Onnx-Models" --device cuda --size 1008
Key points:
- The script exports four modules via wrappers:
  - `VisionEncoderWrapper` → `vision-encoder.onnx`
  - `TextEncoderWrapper` → `text-encoder.onnx`
  - `GeometryEncoderWrapper` → `geometry-encoder.onnx`
  - `DecoderWrapper` → `decoder.onnx`
- All exports use opset 20 and dynamic batch / prompt dimensions, compatible with TensorRT.
- The `--size 1008` parameter sets the resolution for the exported models.
Changing Resolution:
If you want to use a different resolution (e.g., 644), simply change the --size parameter when exporting ONNX:
python3 SAM3_PyTorch_To_Onnx.py --all --model-path "sam3" --output-dir "Onnx-Models" --device cuda --size 644
Then rebuild the engines. Because `Build_Engines.py` skips engines that already exist, delete the old `.engine` files first (or write to a new `--engine` directory):
python3 Build_Engines.py --onnx "Onnx-Models" --engine "Engines"
Once you have the ONNX models in `Onnx-Models`, build the TensorRT engines with `Build_Engines.py`:
python3 Build_Engines.py --onnx "Onnx-Models" --engine "Engines"
Arguments:
- `--base` (optional): base directory (default: current working directory).
- `--onnx`: directory containing the `.onnx` models (default: `BASE/Onnx-Models`).
- `--engine`: output directory for the `.engine` files (default: `BASE/Engines`).
The script:
- Runs `trtexec` with FP16 and appropriate min/opt/max shapes for each module: `vision-encoder`, `text-encoder`, `geometry-encoder`, and `decoder`.
- Skips engines that already exist.
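A minimal sketch of how a `trtexec` build step with min/opt/max shape profiles and skip-if-exists behaviour can be scripted. This is not `Build_Engines.py`: the input tensor names (`pixel_values`, `input_ids`, `attention_mask`) and every shape value are hypothetical placeholders, and only two of the four modules are shown. The flags used (`--onnx`, `--saveEngine`, `--fp16`, `--minShapes`, `--optShapes`, `--maxShapes`) are standard `trtexec` options.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]

# HYPOTHETICAL profiles: input names and (min, opt, max) shapes are placeholders,
# not the values used by Build_Engines.py. Only two of the four modules are shown.
SHAPE_PROFILES: Dict[str, Dict[str, Tuple[Shape, Shape, Shape]]] = {
    "vision-encoder": {
        "pixel_values": ((1, 3, 1008, 1008), (1, 3, 1008, 1008), (1, 3, 1008, 1008)),
    },
    "text-encoder": {
        "input_ids": ((1, 32), (4, 32), (16, 32)),
        "attention_mask": ((1, 32), (4, 32), (16, 32)),
    },
}


def shape_arg(inputs: Dict[str, Tuple[Shape, Shape, Shape]], which: int) -> str:
    """Format the min (0), opt (1) or max (2) shapes as 'name:AxB,name2:CxD'."""
    return ",".join(
        f"{name}:{'x'.join(str(d) for d in shapes[which])}" for name, shapes in inputs.items()
    )


def trtexec_command(onnx_path: Path, engine_path: Path, inputs) -> List[str]:
    """Build one trtexec invocation with an FP16 optimization profile."""
    return [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        "--fp16",  # allow FP16 kernels
        f"--minShapes={shape_arg(inputs, 0)}",
        f"--optShapes={shape_arg(inputs, 1)}",
        f"--maxShapes={shape_arg(inputs, 2)}",
    ]


def build_engines(onnx_dir: Path, engine_dir: Path, run: bool = False) -> List[List[str]]:
    """Return (and, if run=True, execute) one trtexec command per missing engine."""
    commands = []
    for module, inputs in SHAPE_PROFILES.items():
        engine_path = engine_dir / f"{module}.engine"
        if engine_path.exists():  # skip engines that were already built
            continue
        commands.append(trtexec_command(onnx_dir / f"{module}.onnx", engine_path, inputs))
    if run:
        if shutil.which("trtexec") is None:
            raise FileNotFoundError("trtexec is not on PATH")
        engine_dir.mkdir(parents=True, exist_ok=True)
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in build_engines(Path("Onnx-Models"), Path("Engines")):
        print(" ".join(cmd))
```

Keeping the dry run as the default makes it easy to inspect the shape profiles before spending minutes on an actual engine build.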
Use `Check.py` to audit your environment:
python3 Check.py
It reports:
- GPU hardware and driver via `nvidia-smi`
- NVCC presence and version
- PyTorch, CUDA version, and ONNX Runtime
- TensorRT Python bindings and builder creation
- `trtexec` availability
- Available ONNX Runtime providers (CUDA, TensorRT, etc.)
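The checks listed above can be approximated in a few lines of Python. This is a hedged sketch, not `Check.py` itself: the function names (`has_module`, `tool_output`, `audit`) are hypothetical, and the PyTorch, ONNX Runtime, and TensorRT probes only run when those packages are installed.

```python
import importlib.util
import shutil
import subprocess
from typing import List, Optional


def has_module(name: str) -> bool:
    """True if `name` is importable (checked without importing it)."""
    return importlib.util.find_spec(name) is not None


def tool_output(cmd: List[str]) -> Optional[str]:
    """Run a CLI tool and return its trimmed output, or None if it is unavailable."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return (result.stdout or result.stderr).strip()


def audit() -> dict:
    report = {
        "gpu": tool_output(["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]),
        "nvcc": tool_output(["nvcc", "--version"]),
        "trtexec": shutil.which("trtexec"),  # path to trtexec, or None
        "torch": has_module("torch"),
        "onnxruntime": has_module("onnxruntime"),
        "tensorrt": has_module("tensorrt"),
    }
    if report["torch"]:
        try:
            import torch
            report["torch_cuda"] = torch.version.cuda  # None for CPU-only builds
            report["cuda_available"] = torch.cuda.is_available()
        except Exception as exc:
            report["torch_cuda"] = f"failed: {exc}"
    if report["onnxruntime"]:
        try:
            import onnxruntime as ort
            report["ort_providers"] = ort.get_available_providers()
        except Exception as exc:
            report["ort_providers"] = f"failed: {exc}"
    if report["tensorrt"]:
        try:
            import tensorrt as trt
            # Creating a builder is a cheap smoke test that the TensorRT runtime loads.
            trt.Builder(trt.Logger(trt.Logger.WARNING))
            report["trt_builder"] = True
        except Exception as exc:  # missing GPU, driver mismatch, etc.
            report["trt_builder"] = f"failed: {exc}"
    return report


if __name__ == "__main__":
    for key, value in audit().items():
        print(f"{key}: {value}")
```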
With engines and tokenizer in place, you can run inference in two ways: command line or interactive web UI.
Run the end‑to‑end inference script:
Bounding Box Detection Mode:
python3 SAM3_TensorRT_Inference.py --input "Assets/Test.jpg" --prompt "person,bus" --conf 0.8 --output Results/Result-TextPrompt-BoundingBox-Mode.jpg --models "Engines"
Mask Segmentation Mode:
python3 SAM3_TensorRT_Inference.py --input "Assets/Test.jpg" --prompt "person,bus" --conf 0.8 --output Results/Result-TextPrompt-MaskSegmentation-Mode.jpg --models "Engines" --segment
With Bounding Box Prompts:
python3 SAM3_TensorRT_Inference.py --input "Assets/Test.jpg" --boxes "pos:414,655,53,115" --conf 0.7 --output Results/Result-BoxPrompt-BoundingBox-Mode.jpg --models "Engines"
With Bounding Box Prompts (Mask Segmentation Mode):
python3 SAM3_TensorRT_Inference.py --input "Assets/Test.jpg" --boxes "pos:414,655,53,115" --conf 0.7 --output Results/Result-BoxPrompt-MaskSegmentation-Mode.jpg --models "Engines" --segment
With Bounding Box Prompts (Group Similar):
python3 SAM3_TensorRT_Inference.py --input "Assets/Test.jpg" --boxes "pos:414,655,53,115" --conf 0.7 --output Results/Result-BoxPrompt-GroupSimilar-BoundingBox-Mode.jpg --models "Engines" --group-similar
With Bounding Box Prompts (Group Similar, Mask Segmentation Mode):
python3 SAM3_TensorRT_Inference.py --input "Assets/Test.jpg" --boxes "pos:414,655,53,115" --conf 0.7 --output Results/Result-BoxPrompt-GroupSimilar-MaskSegmentation-Mode.jpg --models "Engines" --segment --group-similar
Arguments:
- `--input`: path to the input image file.
- `--prompt`: text prompt for detection. Supports multiple prompts separated by commas (e.g., "person,bus,car").
- `--boxes`: box prompts in `xywh` format (`x` and `y` are the top-left coordinates, `w` is the width, `h` is the height). Prefix each box with `pos:` for a positive prompt or `neg:` for a negative prompt. Separate multiple boxes with semicolons and coordinates with commas (e.g., `pos:x,y,w,h;neg:x,y,w,h`). Quote the value in the shell, since `;` otherwise ends the command.
- `--group-similar` (optional): with box prompts, detects all objects similar to the marked box. If omitted, only the specific object at that position is detected.
- `--conf`: confidence threshold (0.0–1.0) applied to box scores.
- `--output`: path to save the annotated image.
- `--models`: directory containing the `.engine` files and `tokenizer.json` (typically `Engines`).
- `--segment` (optional): enable mask segmentation mode. If omitted, bounding box detection is used.
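A sketch of how the `--prompt` and `--boxes` strings described above can be parsed. The `BoxPrompt` class and the helper names are hypothetical, and `to_xyxy` is only an example conversion; the box format the geometry encoder actually expects is not specified in this README.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BoxPrompt:
    """One --boxes entry: top-left corner (x, y) and size (w, h), in pixels."""
    x: float
    y: float
    w: float
    h: float
    positive: bool  # True for "pos:", False for "neg:"

    def to_xyxy(self) -> Tuple[float, float, float, float]:
        """Convert xywh (top-left origin) to corner form (x1, y1, x2, y2)."""
        return (self.x, self.y, self.x + self.w, self.y + self.h)


def parse_prompts(text: str) -> List[str]:
    """Split a --prompt value such as "person,bus" into non-empty prompts."""
    return [p.strip() for p in text.split(",") if p.strip()]


def parse_boxes(spec: str) -> List[BoxPrompt]:
    """Parse a --boxes value such as "pos:414,655,53,115;neg:10,20,30,40"."""
    boxes = []
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        label, sep, coords = item.partition(":")
        if not sep or label not in ("pos", "neg"):
            raise ValueError(f"expected a 'pos:' or 'neg:' prefix, got {item!r}")
        values = [float(v) for v in coords.split(",")]
        if len(values) != 4:
            raise ValueError(f"expected four values x,y,w,h, got {coords!r}")
        x, y, w, h = values
        if w <= 0 or h <= 0:
            raise ValueError(f"box width and height must be positive, got {coords!r}")
        boxes.append(BoxPrompt(x, y, w, h, positive=(label == "pos")))
    return boxes


if __name__ == "__main__":
    print(parse_prompts("person, bus"))
    for box in parse_boxes("pos:414,655,53,115;neg:10,20,30,40"):
        print(box.positive, box.to_xyxy())
```

Validating the prefix and the coordinate count up front turns a malformed `--boxes` value into a clear error instead of a silently wrong prompt.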
For easier testing and experimentation, use the Gradio web interface:
python3 UI.py
This launches a web interface where you can:
- Upload images directly
- Enter text prompts interactively (supports multiple prompts separated by commas, e.g., "person,bus")
- Draw rectangles directly on the image to pick bounding box prompts using the visual picker
- Toggle "Group Similar" to detect all objects similar to the marked box(es)
- Adjust confidence thresholds with sliders
- Toggle between bounding box and segmentation modes
- View results instantly with performance metrics
The UI automatically loads the TensorRT engines from the `Engines` directory and provides fast, interactive inference.
What the script does:
- Wraps each engine with `TRTModule` for efficient execution using PyTorch CUDA tensors.
- Preprocesses the input image:
  - Resize to 1008 × 1008
  - Normalize to [-1, 1]
- Runs:
  - Vision encoder → FPN features + positional encodings
  - Text encoder → token embeddings + masks (via `tokenizers` and `tokenizer.json`)
  - Decoder → predicted boxes, logits, presence logits, and masks
- Computes combined scores from the logits and presence logits, filters them by `--conf`, denormalizes the boxes, and draws the results onto the original image:
  - Bounding Box Mode: draws rectangular boxes around detected objects
  - Segmentation Mode: generates and overlays pixel-accurate masks for detected objects
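The preprocessing steps above can be sketched with NumPy. Assumptions: the input is an HxWx3 uint8 RGB array, normalization to [-1, 1] is done as `(x / 255 - 0.5) / 0.5`, and the engine takes float32 NCHW input. The `preprocess` name is hypothetical, and nearest-neighbour resizing keeps the example dependency-free; the actual script may use a different interpolation.

```python
import numpy as np


def preprocess(image: np.ndarray, size: int = 1008) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB image into a 1x3xSIZExSIZE float32 array in [-1, 1]."""
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError(f"expected an HxWx3 RGB image, got shape {image.shape}")
    h, w = image.shape[:2]
    # Nearest-neighbour resize: pick the source row/column for every output pixel.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = image[rows][:, cols]
    x = resized.astype(np.float32) / 255.0  # [0, 255] -> [0, 1]
    x = (x - 0.5) / 0.5                      # [0, 1]   -> [-1, 1]
    return np.ascontiguousarray(x.transpose(2, 0, 1)[None])  # HWC -> NCHW, batch of 1
```

Resizing straight to a square (no letterboxing) changes the aspect ratio; that is harmless here as long as predicted boxes are later scaled back with the original width and height separately.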
Output:
- An image with bounding boxes/masks and scores, saved to `--output`.
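A sketch of the scoring and filtering step for a single prompt. The README only says the scores combine the logits and presence logits; this example assumes the common product of sigmoids (`sigmoid(logit) * sigmoid(presence)`) and assumes the decoder returns boxes as normalized `[x1, y1, x2, y2]` in [0, 1]. Both are assumptions, as are the array shapes and the `postprocess` name.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def postprocess(pred_logits, presence_logit, pred_boxes, conf, orig_w, orig_h):
    """Filter one prompt's detections and scale boxes to original pixel coordinates.

    pred_logits:    (Q,) per-query logits
    presence_logit: scalar logit for "the prompted concept is present in the image"
    pred_boxes:     (Q, 4) boxes, ASSUMED normalized [x1, y1, x2, y2] in [0, 1]
    """
    pred_logits = np.asarray(pred_logits, dtype=np.float32)
    pred_boxes = np.asarray(pred_boxes, dtype=np.float32).reshape(-1, 4)
    # Assumed combination: per-query probability times presence probability.
    scores = sigmoid(pred_logits) * sigmoid(np.float32(presence_logit))
    keep = scores >= conf  # the --conf threshold
    scale = np.array([orig_w, orig_h, orig_w, orig_h], dtype=np.float32)
    boxes = np.clip(pred_boxes[keep] * scale, 0, scale)  # denormalize, stay inside image
    kept_scores = scores[keep]
    order = np.argsort(-kept_scores)  # highest score first
    return boxes[order], kept_scores[order]
```

Multiplying by the presence probability lets a single low presence score suppress every query for a prompt whose concept is absent from the image.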
You can run the SAM3 TensorRT pipeline using the prebuilt Docker image while always pulling the latest code from GitHub.
The container mounts your current directory as /workspace.
Before starting, make sure you have downloaded the original SAM3 model from Hugging Face into your current directory:
hf download facebook/sam3 --local-dir sam3
Set the port (default: 7860):
export PORT=7860
Then run:
docker run --gpus all \
--ipc=host \
-p $PORT:$PORT \
-e GRADIO_SERVER_PORT=$PORT \
-v $(pwd):/workspace \
-it \
kishanstark2003/sam3_demo_gradio:latest \
/bin/bash -c "\
export PATH=\
This repository contains both original code and derivative works of Meta's Segment Anything Model 3 (SAM 3).
- Source Code: All Python scripts (`.py`), conversion logic, and TensorRT wrappers provided in this repository are licensed under the MIT License.
- SAM 3 Materials & Derivatives: The underlying model weights, architectures, and all exported ONNX/TensorRT engines generated by these scripts are subject to the Meta SAM License.
Per the SAM License (Section 1.b.ii), this project acknowledges the use of SAM Materials distributed by Meta Platforms, Inc. for the development and optimization of this TensorRT inference pipeline.

