Human pose estimation using YOLOv8/YOLO11 pose models with native C++, ONNX Runtime, and Docker deployment.
| Input | Pose Estimation |
|---|---|
⚠️ Note: GIFs may take time to load (~20 MB each). If they do not load, check the `./data/` folder.
- Multi-model support: YOLOv8n/YOLO11n pose ONNX models
- Batch inference: process multiple frames simultaneously for higher throughput (1-32 frames; see the batching sketch after this list)
- Dynamic input size: configurable at runtime (480×480, 640×640, 1280×640, etc.)
- Full pose pipeline: letterbox → ONNX inference → NMS → 17 COCO keypoints + skeleton
- Production CLI: images/videos/webcam + output saving
- Docker: 2GB (optimized, no PyTorch) / 23GB (with ultralytics export)
- Cross-platform: Windows (Visual Studio) + Linux (Docker)
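Batched inference packs N preprocessed frames into a single NCHW tensor so the model runs once per batch instead of once per frame. A minimal sketch of that idea with the ONNX Runtime C++ API (names here are illustrative, not this repo's actual code):

```cpp
#include <onnxruntime_cxx_api.h>
#include <array>
#include <vector>

// Illustrative sketch (not this repo's actual code): N preprocessed frames,
// packed back-to-back in NCHW float order, become one {N, 3, H, W} tensor,
// so a single Run() call scores the whole batch.
Ort::Value makeBatchTensor(std::vector<float>& blob, int64_t n, int64_t h, int64_t w) {
    const std::array<int64_t, 4> shape{n, 3, h, w};
    Ort::MemoryInfo mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    // blob must contain exactly n * 3 * h * w floats and must outlive the tensor,
    // since the tensor references the caller's memory rather than copying it.
    return Ort::Value::CreateTensor<float>(mem, blob.data(), blob.size(),
                                           shape.data(), shape.size());
}
```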
Tested on: Intel Xeon E5-2680 v4 @ 2.4GHz, 32GB RAM, Windows 10/11
```bash
git clone https://github.com/Shazy021/yolo-pose-cpp.git
cd yolo-pose-cpp
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `-i, --input` | str | required | Input file path (image/video) or `0` for webcam |
| `-o, --output` | str | none | Output file path (optional, saves result if specified) |
| `-m, --model` | str | Platform-dependent* | Path to ONNX model file |
| `-W, --width` | int | 640 | Input width for inference (must be a multiple of 32) |
| `-H, --height` | int | 640 | Input height for inference (must be a multiple of 32) |
| `-b, --batch` | int | 1 | Batch size for video inference (range: 1-32) |
| `-h, --help` | flag | - | Show help message |
Default model paths:
- Windows: `models/yolov8n-pose.onnx`
- Docker: `/app/models/yolov8n-pose.onnx`
Note: Input dimensions must be multiples of 32 (YOLO stride requirement). Valid examples: 320, 480, 640, 960, 1280.
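For illustration, a guard that enforces this constraint in C++ might look like the following (hypothetical helper, not taken from the repo's sources):

```cpp
#include <stdexcept>

// Hypothetical helper: validate that a requested inference dimension
// satisfies the YOLO stride-32 requirement described above.
int checkStride32(int dim) {
    if (dim <= 0 || dim % 32 != 0) {
        throw std::invalid_argument("Input width/height must be a positive multiple of 32");
    }
    return dim;  // e.g., 320, 480, 640, 960, 1280 all pass
}
```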
| Task | Command |
|---|---|
| Build image | `docker compose build` |
| Image → image | `docker compose run --rm pose -i /app/data/test_1.jpg -o /app/output/res_1.jpg` |
| Image (inference 1280×1280) | `docker compose run --rm pose -i /app/data/test_2.jpg -o /app/output/res_hd.jpg -W 1280 -H 1280` |
| Image (inference 480×1280) | `docker compose run --rm pose -i /app/data/test_2.jpg -o /app/output/res_custom_res.jpg -W 480 -H 1280` |
| Video → video | `docker compose run --rm pose -i /app/data/test_vid.mp4 -o /app/output/res_vid.mp4` |
| Use YOLO11 | `docker compose run --rm pose -m /app/models/yolo11n-pose.onnx -i /app/data/test_2.jpg -o /app/output/res_y11.jpg` |
| Task | Command (example) |
|---|---|
| Image → image | `PoseEstimation.exe -i data\test_1.jpg -o output\res_1.jpg` |
| Image (custom inference size) | `PoseEstimation.exe -i data\test_1.jpg -o output\res_480.jpg -W 480 -H 480` |
| Video → video | `PoseEstimation.exe -i data\Test_vid.mp4 -o output\res_vid.mp4` |
| Video (high-res) | `PoseEstimation.exe -i data\Test_vid.mp4 -o output\res_hd.mp4 -W 1280 -H 1280` |
| YOLO11 model | `PoseEstimation.exe -m models\yolo11n-pose.onnx -i data\test_2.jpg -o output\res_2.jpg` |
| Webcam | `PoseEstimation.exe -i 0` |
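The `-i 0` convention can be implemented by branching on the argument before opening the capture. A minimal sketch assuming OpenCV's `cv::VideoCapture` (illustrative, not this repo's actual input handler):

```cpp
#include <opencv2/opencv.hpp>
#include <string>

// Hypothetical sketch: "-i 0" selects the default webcam device, while any
// other value is treated as an image/video file path.
cv::VideoCapture openInput(const std::string& input) {
    if (input == "0") {
        return cv::VideoCapture(0);   // default webcam device
    }
    return cv::VideoCapture(input);   // image or video file
}
```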
⚠️ Experimental feature: GPU acceleration is currently tested only on Windows with NVIDIA GPUs. Docker GPU support is not yet tested.
- CUDA Toolkit 12.x
- cuDNN 9.x
- ONNX Runtime GPU 1.23.2+ (included in project)
The application automatically detects and uses GPU if available:
- Attempts TensorRT provider (best performance, not yet fully tested)
- Falls back to CUDA provider (tested and working)
- Falls back to CPU if GPU unavailable
Check the console output on startup to see which provider is active:

```
[ONNX] Available providers: TensorrtExecutionProvider CUDAExecutionProvider CPUExecutionProvider
[ONNX] Using CUDA ExecutionProvider (GPU)
```
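The fallback chain above can be expressed with the standard ONNX Runtime C++ API. The following is a sketch under that assumption; function and variable names are illustrative and not taken from this repo:

```cpp
#include <onnxruntime_cxx_api.h>
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

// Illustrative sketch of the fallback chain: TensorRT -> CUDA -> CPU.
// Names are hypothetical, not from this repo's sources.
Ort::SessionOptions makeSessionOptions() {
    Ort::SessionOptions options;
    const std::vector<std::string> available = Ort::GetAvailableProviders();
    auto has = [&](const char* name) {
        return std::find(available.begin(), available.end(), name) != available.end();
    };

    if (has("TensorrtExecutionProvider")) {
        try {
            OrtTensorRTProviderOptions trt{};
            options.AppendExecutionProvider_TensorRT(trt);
            std::cout << "[ONNX] Using TensorRT ExecutionProvider (GPU)\n";
            return options;
        } catch (const Ort::Exception&) { /* fall through to CUDA */ }
    }
    if (has("CUDAExecutionProvider")) {
        try {
            OrtCUDAProviderOptions cuda{};
            options.AppendExecutionProvider_CUDA(cuda);
            std::cout << "[ONNX] Using CUDA ExecutionProvider (GPU)\n";
            return options;
        } catch (const Ort::Exception&) { /* fall through to CPU */ }
    }
    std::cout << "[ONNX] Using CPU ExecutionProvider\n";
    return options;  // ONNX Runtime uses the CPU provider by default
}
```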
```
models/
├── yolov8n-pose.onnx   # default model
└── yolo11n-pose.onnx   # YOLO11 pose
```
You can either use these models or export your own from Ultralytics checkpoints.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `--model` | str | required | Path to YOLO pose `.pt` model or model name (e.g., `yolov8n-pose.pt`, `yolo11n-pose.pt`) |
| `--output-dir` | str | `models` | Directory to save the exported ONNX model |
| `--imgsz` | int [int] | 640 | Image size for export (single value or `H W` pair) |
| `--opset` | int | 17 | ONNX opset version (use 17+ for ONNX Runtime 1.16+) |
| `--dynamic` | flag | False | Enable dynamic input shapes (allows runtime size changes) |
| `--simplify` | flag | False | Run ONNX simplifier to optimize the graph |
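As a sanity check after exporting with `--dynamic`, dynamic dimensions show up as `-1` when the model's input shape is queried through ONNX Runtime, which is what allows choosing `-W`/`-H` at runtime. A small illustrative snippet (not repo code):

```cpp
#include <onnxruntime_cxx_api.h>
#include <iostream>
#include <vector>

// Hypothetical check (not this repo's code): print the model's first input
// shape; dynamic axes exported with --dynamic appear as -1.
void printInputShape(Ort::Session& session) {
    Ort::TypeInfo info = session.GetInputTypeInfo(0);
    std::vector<int64_t> shape = info.GetTensorTypeAndShapeInfo().GetShape();
    for (int64_t d : shape)
        std::cout << d << ' ';  // e.g. "-1 3 -1 -1" for dynamic batch/H/W
    std::cout << '\n';
}
```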
```bash
pip install ultralytics onnx
```

YOLOv8 pose → ONNX:

```bash
python scripts/export_yolo_pose_onnx.py \
    --model yolov8n-pose.pt \
    --output-dir models
```

YOLO11 pose → ONNX:

```bash
python scripts/export_yolo_pose_onnx.py \
    --model yolo11n-pose.pt \
    --output-dir models \
    --dynamic \
    --simplify
```

Then:

```bash
docker compose run --rm pose \
    -m /app/models/your_model.onnx \
    -i /app/data/test_1.jpg \
    -o /app/output/res_custom.jpg \
    -W 960 -H 960
```

By default the image is lightweight (~2 GB) and does not include Ultralytics / PyTorch.
If you need to export ONNX models inside Docker, there are two options.
The Dockerfile contains an optional block you can enable:
```dockerfile
# ============================================
# Optional: export YOLO pose model to ONNX inside Docker
# RECOMMENDED: run scripts/export_yolo_pose_onnx.py on the HOST
# to avoid pulling heavy torch/ultralytics into the image.
#
# If you really need to export inside Docker, uncomment:
#
# COPY scripts/export_yolo_pose_onnx.py /app/scripts/
# RUN pip3 install ultralytics onnx
# RUN python3 /app/scripts/export_yolo_pose_onnx.py \
#     --model yolov8n-pose.pt \
#     --output-dir /app/models
# ============================================
```
After uncommenting and rebuilding, the image will:
- install `ultralytics` + `onnx` inside the container
- export `yolov8n-pose.pt` → `yolov8n-pose.onnx` into `/app/models`

…but the image size will grow to roughly 23 GB.
If you already have a container with Python and Ultralytics available, you can run the export script once.
Example (from project root, using the existing image):
Windows (CMD):

```cmd
docker run --rm ^
  --entrypoint python3 ^
  -v "%cd%/models:/app/models" ^
  pose-estimation-cpu ^
  /app/scripts/export_yolo_pose_onnx.py ^
  --model yolov8n-pose.pt ^
  --output-dir /app/models ^
  --dynamic ^
  --simplify
```

Linux/macOS:

```bash
docker run --rm \
  --entrypoint python3 \
  -v "$(pwd)/models:/app/models" \
  pose-estimation-cpu \
  /app/scripts/export_yolo_pose_onnx.py \
  --model yolov8n-pose.pt \
  --output-dir /app/models \
  --dynamic \
  --simplify
```

The script will:
- call `YOLO("yolov8n-pose.pt")` inside the container
- let Ultralytics download the checkpoint by name (if needed)
- export it to `yolov8n-pose.onnx` and move it into `/app/models` (mapped to your local `./models`)
You can replace `--model` with any other Ultralytics pose model name, e.g. `--model yolo11n-pose.pt`.
For most users the recommended approach is still:
- export `.onnx` on the host, and
- keep it in the main Docker image.
```
[Image/Video/Webcam] → input_handler → pipeline →
├── preprocess_letterbox (BGR→NCHW, letterbox scale/pad)
├── OnnxEngine (ONNX Runtime CPU, opset 17+, dynamic input)
├── yolo_pose_postprocess (→ Person structs + NMS)
└── visualize_results (bbox (green) + keypoints (red) + skeleton (blue))
```
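For readers curious what the `preprocess_letterbox` stage involves, here is a minimal sketch of aspect-preserving resize, gray padding, and HWC→NCHW repacking, assuming OpenCV. It is an assumed implementation of the step named in the diagram, not the repo's exact code:

```cpp
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <vector>

// Assumed letterbox implementation (illustrative): resize with preserved
// aspect ratio, pad to (targetW, targetH), then repack BGR HWC bytes into
// float NCHW in [0, 1]. scale/padX/padY are returned so detections can be
// mapped back to the original image.
std::vector<float> letterboxToNCHW(const cv::Mat& bgr, int targetW, int targetH,
                                   float& scale, int& padX, int& padY) {
    scale = std::min(targetW / static_cast<float>(bgr.cols),
                     targetH / static_cast<float>(bgr.rows));
    const int newW = static_cast<int>(bgr.cols * scale);
    const int newH = static_cast<int>(bgr.rows * scale);
    padX = (targetW - newW) / 2;
    padY = (targetH - newH) / 2;

    cv::Mat resized;
    cv::resize(bgr, resized, cv::Size(newW, newH));
    cv::Mat canvas(targetH, targetW, CV_8UC3, cv::Scalar(114, 114, 114));  // gray padding
    resized.copyTo(canvas(cv::Rect(padX, padY, newW, newH)));

    // HWC uint8 -> CHW float in [0, 1]
    std::vector<float> blob(3ull * targetH * targetW);
    for (int c = 0; c < 3; ++c)
        for (int y = 0; y < targetH; ++y)
            for (int x = 0; x < targetW; ++x)
                blob[(static_cast<size_t>(c) * targetH + y) * targetW + x] =
                    canvas.at<cv::Vec3b>(y, x)[c] / 255.0f;
    return blob;
}
```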
| Input Size | Speed | Use Case |
|---|---|---|
| 320×320 | Faster | Fast tracking, low-res |
| 480×480 | Faster | Balanced speed/accuracy |
| 480×960 | Faster | Portrait/vertical videos |
| 640×640 | Default | Standard accuracy |
| 960×960 | Slower | High-res images |
| 1280×1280 | Slower | Maximum accuracy, distant persons |
- GPU support is experimental (Windows only; Docker GPU is not yet tested)
- COCO 17-keypoint pose only
- Input dimensions must be multiples of 32 (YOLO architecture requirement)
- Batch size must be between 1 and 32
Pose estimation (sample result images).