SAM3-TENSORRT-PYTHON — SAM3 inference pipeline with TensorRT (FP16)

SAM3 TensorRT Pipeline

Current: 2026-02-20 — Repository status: actively maintained. Keywords: Current, SAM3, TensorRT, FP16, TensorRT-Python, Gradio, ONNX, NVIDIA

This project provides a complete pipeline to run SAM3 (Segment Anything Model 3) with TensorRT:

System audit for CUDA / TensorRT readiness (Check.py)
ONNX export of the SAM3 submodules (SAM3_PyTorch_To_Onnx.py)
TensorRT engine building from ONNX (Build_Engines.py)
High‑performance inference with text prompts (SAM3_TensorRT_Inference.py)
Interactive web UI for easy testing (UI.py)

The workflow is designed around FP16 TensorRT engines with dynamic shapes and explicit batch, supporting both bounding box detection and mask segmentation modes.

🚀 Performance Benchmarks

By migrating from native PyTorch to TensorRT (FP16), this pipeline cuts VRAM usage by more than half and runs roughly 3x faster on an NVIDIA T4:

Metric	Original PyTorch	TensorRT (FP16)	Improvement
VRAM Usage	~6-7 GB	~2.6 GB	~62% Reduction
Inference Time (T4 GPU)	~1.6 sec	~0.5 sec	~3.2x Speedup

Note: Benchmarks tested on NVIDIA T4 GPU. Performance may vary based on hardware.
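
The "Improvement" column can be rechecked from the rounded figures in the table. The sketch below uses only those numbers; because the PyTorch VRAM baseline is given as a range, it evaluates both ends. The function names are illustrative and not part of this repository.

```python
# Recompute the "Improvement" column from the table's rounded figures.

def reduction_pct(before: float, after: float) -> float:
    """Percentage saved when going from `before` to `after`."""
    return (1.0 - after / before) * 100.0


def speedup(before: float, after: float) -> float:
    """How many times faster `after` is than `before`."""
    return before / after


# VRAM: the PyTorch baseline is a range (~6-7 GB), so check both ends.
vram_low = reduction_pct(6.0, 2.6)
vram_high = reduction_pct(7.0, 2.6)

# Inference time on the T4: ~1.6 s down to ~0.5 s.
t4_speedup = speedup(1.6, 0.5)

print(f"VRAM reduction: {vram_low:.0f}% to {vram_high:.0f}%")
print(f"T4 speedup: {t4_speedup:.1f}x")
```

The quoted ~62% reduction sits at the 7 GB end of the baseline range; a 6 GB baseline gives a somewhat smaller saving.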

Quick Start

Check System Readiness

Before starting, verify your system is properly configured:

python3 Check.py

This will check CUDA, TensorRT, and all required dependencies.

1. Environment Setup

Python Packages

Install required Python packages:

pip install -r requirements.txt --upgrade

TensorRT Installation

MAKE SURE TENSORRT IS INSTALLED AND ADDED TO PATH

For Linux, install TensorRT 10.14.1.48 with CUDA 12.9:

sudo apt-get install -y --allow-downgrades \
    libnvinfer10=10.14.1.48-1+cuda12.9 \
    libnvinfer-bin=10.14.1.48-1+cuda12.9 \
    libnvinfer-dispatch10=10.14.1.48-1+cuda12.9 \
    libnvinfer-lean10=10.14.1.48-1+cuda12.9 \
    libnvinfer-plugin10=10.14.1.48-1+cuda12.9 \
    libnvinfer-vc-plugin10=10.14.1.48-1+cuda12.9 \
    libnvonnxparsers10=10.14.1.48-1+cuda12.9

Add TensorRT to your PATH:

echo 'export PATH="/usr/src/tensorrt/bin:$PATH"' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu:/usr/lib/tensorrt${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"' >> ~/.bashrc
source ~/.bashrc

Verify installation:

python3 Check.py

2. Download or Export ONNX Models

You have two options: download pre‑exported ONNX or export from PyTorch yourself.

2.1. Download pre‑exported ONNX (recommended)

Download prebuilt ONNX models for 1008 resolution:

hf download --local-dir "Onnx-Models" kishanstar2003/SAM3_ONNX_FP16

This will create an Onnx-Models directory containing:

vision-encoder.onnx
text-encoder.onnx
geometry-encoder.onnx
decoder.onnx
tokenizer.json (automatically copied to the engines directory)
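
A quick way to confirm the download is complete is to check for the files listed above. This is a minimal sketch, not part of the repository: the helper name `missing_files` is hypothetical, and the file names are copied from the list.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = (
    "vision-encoder.onnx",
    "text-encoder.onnx",
    "geometry-encoder.onnx",
    "decoder.onnx",
    "tokenizer.json",
)


def missing_files(model_dir: str, expected=EXPECTED_FILES) -> list[str]:
    """Return the expected file names that are absent from `model_dir`."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files("Onnx-Models")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected files are present.")
```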

2.2. Export ONNX from the SAM3 PyTorch model (Manual)

If you want to manually export ONNX models:

Download the original SAM3 PyTorch model:

hf download facebook/sam3 --local-dir sam3

Patch the transformers code:

python3 Patch_Sam3_Interp_Rope.py

Export to ONNX:

python3 SAM3_PyTorch_To_Onnx.py --all --model-path "sam3" --output-dir "Onnx-Models" --device cuda --size 1008

Key points:

The script exports four modules via wrappers:
- VisionEncoderWrapper → vision-encoder.onnx
- TextEncoderWrapper → text-encoder.onnx
- GeometryEncoderWrapper → geometry-encoder.onnx
- DecoderWrapper → decoder.onnx
All exports use opset 20 and dynamic batch / prompt dimensions, compatible with TensorRT.
The --size 1008 parameter sets the resolution for the exported models.

Changing Resolution:

If you want to use a different resolution (e.g., 644), simply change the --size parameter when exporting ONNX:

python3 SAM3_PyTorch_To_Onnx.py --all --model-path "sam3" --output-dir "Onnx-Models" --device cuda --size 644

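Then rebuild the engines with the same command used in section 3:

python3 Build_Engines.py --onnx "Onnx-Models" --engine "Engines"

The engine building command remains the same regardless of resolution. Because Build_Engines.py skips engines that already exist, delete the old .engine files (or pass a fresh --engine directory) before rebuilding; otherwise the engines for the previous resolution are kept.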

3. Build TensorRT Engines

Once you have the ONNX models in Onnx-Models, build TensorRT engines using Build_Engines.py.

python3 Build_Engines.py --onnx "Onnx-Models" --engine "Engines"

Arguments:

--base (optional): base directory (default: current working directory).
--onnx: directory containing .onnx models (default: BASE/Onnx-Models).
--engine: output directory for .engine files (default: BASE/Engines).

The script:

Runs trtexec with FP16 and appropriate min/opt/max shapes for each module:
- vision-encoder
- text-encoder
- geometry-encoder
- decoder
Skips engines that already exist.
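
The behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the code in Build_Engines.py: the trtexec flags (--onnx, --saveEngine, --fp16, --minShapes, --optShapes, --maxShapes) are real, but the input tensor name `images`, the batch range, and the `<module>.engine` naming are hypothetical. The sketch only assembles the commands; it does not run them.

```python
from pathlib import Path


def trtexec_command(onnx_path, engine_path, min_shapes, opt_shapes, max_shapes):
    """Assemble (but do not run) a trtexec FP16 build command."""
    return [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        "--fp16",
        f"--minShapes={min_shapes}",
        f"--optShapes={opt_shapes}",
        f"--maxShapes={max_shapes}",
    ]


def plan_builds(onnx_dir, engine_dir, shapes):
    """Return one command per module whose engine file does not exist yet."""
    commands = []
    for module, (mn, opt, mx) in shapes.items():
        # ASSUMPTION: engines are named <module>.engine.
        engine = Path(engine_dir) / f"{module}.engine"
        if engine.exists():
            continue  # mirror the documented skip-if-present behaviour
        onnx = Path(onnx_dir) / f"{module}.onnx"
        commands.append(trtexec_command(onnx, engine, mn, opt, mx))
    return commands


# HYPOTHETICAL tensor name and batch range; inspect the ONNX file for the real ones.
EXAMPLE_SHAPES = {
    "vision-encoder": (
        "images:1x3x1008x1008",
        "images:1x3x1008x1008",
        "images:4x3x1008x1008",
    ),
}

for cmd in plan_builds("Onnx-Models", "Engines", EXAMPLE_SHAPES):
    print(" ".join(cmd))
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` to perform the build.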

4. Verify System & TensorRT Installation

Use Check.py to audit your environment:

python3 Check.py

It reports:

GPU hardware and driver via nvidia-smi
NVCC presence and version
PyTorch, CUDA version, and ONNX Runtime
TensorRT Python bindings and builder creation
trtexec availability
Available ONNX Runtime providers (CUDA / TensorRT, etc.)

Run this once after setup to confirm everything is wired correctly.
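
For a rough idea of what such an audit involves, the sketch below covers a subset of the checks using only the standard library: whether the command-line tools are on PATH and whether the Python packages can be located. It is an assumption about the general approach, not a copy of Check.py; the `audit` function and its report keys are hypothetical.

```python
import importlib.util
import shutil


def audit(tools=("nvidia-smi", "nvcc", "trtexec"),
          modules=("torch", "onnxruntime", "tensorrt")):
    """Report which command-line tools and Python modules can be found."""
    report = {}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[f"tool:{tool}"] = shutil.which(tool) is not None
    for name in modules:
        # find_spec locates a top-level module without importing it.
        report[f"module:{name}"] = importlib.util.find_spec(name) is not None
    return report


for item, found in audit().items():
    print(f"{item:<22} {'OK' if found else 'MISSING'}")
```

Locating a module does not prove it works; creating a TensorRT builder, as Check.py is described as doing, is a stronger test.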

5. Run TensorRT Inference

With engines and tokenizer in place, you can run inference in two ways: command line or interactive web UI.

5.1. Command Line Inference

Run the end‑to‑end inference script:

With Text Prompts (Bounding Box Detection Mode):

python3 SAM3_TensorRT_Inference.py --input "Assets/Test.jpg" --prompt "person,bus" --conf 0.8 --output Results/Result-TextPrompt-BoundingBox-Mode.jpg --models "Engines"

With Text Prompts (Mask Segmentation Mode):

python3 SAM3_TensorRT_Inference.py --input "Assets/Test.jpg" --prompt "person,bus" --conf 0.8 --output Results/Result-TextPrompt-MaskSegmentation-Mode.jpg --models "Engines" --segment

With Bounding Box Prompts (Bounding Box Detection Mode):

python3 SAM3_TensorRT_Inference.py --input "Assets/Test.jpg" --boxes "pos:414,655,53,115" --conf 0.7 --output Results/Result-BoxPrompt-BoundingBox-Mode.jpg --models "Engines"

With Bounding Box Prompts (Mask Segmentation Mode):

python3 SAM3_TensorRT_Inference.py --input "Assets/Test.jpg" --boxes "pos:414,655,53,115" --conf 0.7 --output Results/Result-BoxPrompt-MaskSegmentation-Mode.jpg --models "Engines" --segment

With Bounding Box Prompts (Group Similar):

python3 SAM3_TensorRT_Inference.py --input "Assets/Test.jpg" --boxes "pos:414,655,53,115" --conf 0.7 --output Results/Result-BoxPrompt-GroupSimilar-BoundingBox-Mode.jpg --models "Engines" --group-similar

With Bounding Box Prompts (Group Similar, Mask Segmentation Mode):

python3 SAM3_TensorRT_Inference.py --input "Assets/Test.jpg" --boxes "pos:414,655,53,115" --conf 0.7 --output Results/Result-BoxPrompt-GroupSimilar-MaskSegmentation-Mode.jpg --models "Engines" --segment --group-similar

Arguments:

--input: path to input image file.
--prompt: text prompt for detection. Supports multiple prompts separated by commas (e.g., "person,bus,car").
--boxes: box prompts in xywh format (x and y are the top-left coordinates, w is width, h is height). Prefix with pos: for positive prompts and neg: for negative prompts. Separate multiple boxes with a semicolon, and coordinates with commas (e.g., pos:x,y,w,h;neg:x,y,w,h).
--group-similar: (optional) when using box prompts, detects all objects similar to the marked box. If omitted, detects only the specific object at that position.
--conf: confidence threshold (0.0–1.0) applied to box scores.
--output: path to save the annotated image.
--models: directory containing .engine files and tokenizer.json (typically Engines).
--segment: (optional) enable mask segmentation mode. If omitted, uses bounding box detection.
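
The --boxes format described above is compact enough to parse by hand. The following sketch shows one way to read it and to convert xywh boxes to corner coordinates; `parse_boxes` and `xywh_to_xyxy` are hypothetical helpers, and the script's own parser may differ (for example in how it validates input).

```python
def parse_boxes(spec: str):
    """Parse 'pos:x,y,w,h;neg:x,y,w,h' into (is_positive, (x, y, w, h)) tuples."""
    boxes = []
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        label, _, coords = item.partition(":")
        if label not in ("pos", "neg"):
            raise ValueError(f"expected 'pos:' or 'neg:' prefix in {item!r}")
        values = [float(v) for v in coords.split(",")]
        if len(values) != 4:
            raise ValueError(f"expected x,y,w,h in {item!r}")
        boxes.append((label == "pos", tuple(values)))
    return boxes


def xywh_to_xyxy(box):
    """Convert a top-left (x, y, w, h) box to corner form (x1, y1, x2, y2)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


parsed = parse_boxes("pos:414,655,53,115")
print(parsed)
print(xywh_to_xyxy(parsed[0][1]))
```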

5.2. Interactive Web UI

For easier testing and experimentation, use the Gradio web interface:

python3 UI.py

This launches a web interface where you can:

Upload images directly
Enter text prompts interactively (supports multiple prompts separated by commas, e.g., "person,bus")
Draw rectangles directly on the image to pick bounding box prompts using the visual picker
Toggle "Group Similar" to detect all objects similar to the marked box(es)
Adjust confidence thresholds with sliders
Toggle between bounding box and segmentation modes
View results instantly with performance metrics

The UI automatically loads the TensorRT engines from the Engines directory and runs inference interactively.

What the inference script (SAM3_TensorRT_Inference.py) does:

Wraps each engine with TRTModule for efficient execution using PyTorch CUDA tensors.
Preprocesses the input image:
- Resize to 1008 × 1008
- Normalize to [-1, 1]
Runs:
- Vision encoder → FPN features + positional encodings
- Text encoder → token embeddings + masks (via tokenizers and tokenizer.json)
- Decoder → predicted boxes, logits, presence logits, and masks
Computes combined scores from logits and presence logits, filters by --conf, denormalizes boxes, and draws them onto the original image.
Bounding Box Mode: Draws rectangular boxes around detected objects
Segmentation Mode: Generates and overlays pixel-accurate masks for detected objects
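
The preprocessing and score-filtering steps above can be sketched with NumPy. Several details here are assumptions rather than facts from the script: nearest-neighbour resizing stands in for whatever interpolation the script uses, the combined score is taken as sigmoid(logit) × sigmoid(presence logit), and the decoder boxes are assumed to be normalized (x1, y1, x2, y2). The function names are hypothetical.

```python
import numpy as np


def preprocess(image: np.ndarray, size: int = 1008) -> np.ndarray:
    """HxWx3 uint8 image -> 1x3xSIZExSIZE float32 tensor in [-1, 1]."""
    h, w = image.shape[:2]
    # Nearest-neighbour resize with plain indexing (a simplification; a real
    # pipeline would more likely use bilinear resizing from PIL or OpenCV).
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = image[rows][:, cols]
    # Map [0, 255] to [-1, 1].
    scaled = resized.astype(np.float32) / 127.5 - 1.0
    # HWC -> CHW, then add the batch dimension.
    return np.ascontiguousarray(scaled.transpose(2, 0, 1))[None]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def select_boxes(logits, presence_logit, boxes, conf, image_w, image_h):
    """Score, threshold and rescale detections.

    ASSUMPTIONS: score = sigmoid(logit) * sigmoid(presence_logit), and
    `boxes` are normalized (x1, y1, x2, y2) in [0, 1].
    """
    scores = sigmoid(np.asarray(logits, dtype=np.float32)) * sigmoid(np.float32(presence_logit))
    keep = scores >= conf
    scale = np.array([image_w, image_h, image_w, image_h], dtype=np.float32)
    return np.asarray(boxes, dtype=np.float32)[keep] * scale, scores[keep]


demo = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
batch = preprocess(demo)
print(batch.shape, batch.dtype)

boxes_px, kept = select_boxes(
    logits=[4.0, -2.0],
    presence_logit=5.0,
    boxes=[[0.1, 0.2, 0.5, 0.6], [0.0, 0.0, 1.0, 1.0]],
    conf=0.8,
    image_w=1000,
    image_h=500,
)
print(boxes_px, kept)
```

In the demo call only the first candidate survives the 0.8 threshold; the second has a low combined score and is dropped before the boxes are scaled back to pixel coordinates.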

Output:

An image with bounding boxes/masks and scores, saved to --output.

🐳 Docker Image Usage (Always Pull Latest Code)

You can run the SAM3 TensorRT pipeline using the prebuilt Docker image while always pulling the latest code from GitHub.

📦 Important

The container mounts your current directory as /workspace.

Before starting, make sure you have downloaded the original SAM3 model from Hugging Face into your current directory:

Download SAM3

hf download facebook/sam3 --local-dir sam3

🚀 Run Docker Container (Auto-Update Repo)

Set the port (default: 7860):

export PORT=7860

Then run:

docker run --gpus all \
  --ipc=host \
  -p $PORT:$PORT \
  -e GRADIO_SERVER_PORT=$PORT \
  -v $(pwd):/workspace \
  -it \
  kishanstark2003/sam3_demo_gradio:latest \
  /bin/bash -c "\
    export PATH=\

⚠️ Disclaimer

This project provides high-performance optimizations for SAM3. Note that TensorRT engine performance and stability are highly dependent on specific hardware (GPU architecture) and software (CUDA/TensorRT versions). Use these optimization scripts at your own risk.

⚖️ Licensing & Acknowledgments

This repository contains both original code and derivative works of Meta's Segment Anything Model 3 (SAM 3).

Source Code: All Python scripts (.py), conversion logic, and TensorRT wrappers provided in this repository are licensed under the MIT License.
SAM 3 Materials & Derivatives: The underlying model weights, architectures, and all exported ONNX/TensorRT engines generated by these scripts are subject to the Meta SAM License.

Research Acknowledgment

Per the SAM License (Section 1.b.ii), this project acknowledges the use of SAM Materials distributed by Meta Platforms, Inc. for the development and optimization of this TensorRT inference pipeline.
