Monocular whole-body 3D human pose estimation using the SOMA body model
2D Keypoint Overlay | In-Camera Mesh | Global Mesh
GEM is a video-based 3D human pose estimation model developed by NVIDIA. It recovers full-body 77-joint motion — including body, hands, and face — from monocular video using the SOMA parametric body model. The pipeline handles dynamic cameras and recovers global motion trajectories. GEM includes a bundled 2D pose estimation model that detects 77 SOMA keypoints, making the system fully self-contained. Licensed under Apache 2.0 for commercial use.
- 77-joint SOMA body model — full body, hands, and face articulation
- Bundled 2D keypoint detector — 2D pose estimator trained for SOMA 77-joint skeleton
- Camera-space motion recovery — camera-space human motion estimation from dynamic monocular video
- World-space motion recovery — world-space human motion estimation from dynamic monocular video
- Apache 2.0 licensed — commercially usable, trained on NVIDIA-owned data only
Looking for multi-modal motion generation (text, audio, music conditioning)? Check out GEM-SMPL, our research model using the SMPL body model that supports both motion estimation and generation from diverse input modalities. Presented at ICCV 2025 (Highlight).
# 1. Clone
git clone --recursive https://github.com/NVlabs/GEM-X.git && cd GEM-X
# 2. Setup environment
pip install uv && uv venv .venv --python 3.10 && source .venv/bin/activate
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126
uv pip install -e third_party/soma && cd third_party/soma && git lfs pull && cd ../..
bash scripts/install_env.sh
# 3. Run demo
python scripts/demo/demo_soma.py --video path/to/video.mp4 --ckpt inputs/pretrained/gem_soma.ckpt

See docs/INSTALL.md for detailed installation instructions.
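The demo script processes one clip per invocation, so batch jobs are just a loop over the CLI. Below is a minimal sketch of such a wrapper: only `scripts/demo/demo_soma.py`, `--video`, and `--ckpt` come from the quickstart above; the `videos/` directory name and the dry-run option are illustrative assumptions.

```python
import subprocess
import sys
from pathlib import Path

# Default checkpoint location from the quickstart; adjust as needed.
CKPT = Path("inputs/pretrained/gem_soma.ckpt")

def demo_commands(video_dir="videos", ckpt=CKPT):
    """Yield one demo invocation per .mp4 clip, sorted for determinism."""
    for video in sorted(Path(video_dir).glob("*.mp4")):
        yield [sys.executable, "scripts/demo/demo_soma.py",
               "--video", str(video), "--ckpt", str(ckpt)]

def run_all(video_dir="videos", dry_run=True):
    """Run (or, with dry_run=True, just print) the demo on every clip."""
    cmds = list(demo_commands(video_dir))
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing clip
    return cmds
```

With `dry_run=True` the wrapper only prints the commands it would run, which is handy for checking paths before committing GPU time.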
| Document | Description |
|---|---|
| Installation | Prerequisites, step-by-step setup, Docker, troubleshooting |
| Demo | Full 3D pipeline, 2D keypoint-only demo, output formats |
| Training & Evaluation | Dataset preparation, training commands, config system |
| Model Overview | Architecture, SOMA body model, bundled 2D pose model |
| Related Projects | GENMO, SOMA, ecosystem cross-references |
| Model | Body Model | Joints | Download |
|---|---|---|---|
| GEM (SOMA) | SOMA | 77 (body + hands + face) | gem_soma.ckpt |
Place checkpoints under inputs/pretrained/ or pass the path via --ckpt. The demo scripts will automatically download the checkpoint from HuggingFace if --ckpt is not provided.
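Programmatic callers can mirror the same resolution order. The sketch below is an assumption about that logic, not the demo's actual implementation: an explicit path wins, then `inputs/pretrained/`, then a HuggingFace download; the `repo_id` shown is a placeholder, not the real repository name.

```python
from pathlib import Path

DEFAULT_DIR = Path("inputs/pretrained")

def resolve_checkpoint(ckpt=None, filename="gem_soma.ckpt"):
    """Resolve the GEM checkpoint path: an explicit --ckpt path wins,
    then the default inputs/pretrained/ location, then a download."""
    if ckpt is not None:
        return Path(ckpt)
    local = DEFAULT_DIR / filename
    if local.exists():
        return local
    # Placeholder repo_id -- substitute the repository the demo actually uses.
    from huggingface_hub import hf_hub_download
    return Path(hf_hub_download(repo_id="nvidia/GEM", filename=filename))
```

Keeping the download as a last resort means offline runs work as long as the checkpoint is already in place.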
GEM is part of a broader effort to provide humanoid motion data for robotics, physical AI, and other applications.
Check out these related works:
@inproceedings{genmo2025,
title = {GENMO: A GENeralist Model for Human MOtion},
author = {Li, Jiefeng and Cao, Jinkun and Zhang, Haotian and Rempe, Davis and Kautz, Jan and Iqbal, Umar and Yuan, Ye},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
year = {2025}
}

This project is released under Apache 2.0. It downloads and installs additional third-party open source software projects; review the license terms of those projects before use. See ATTRIBUTIONS.md for specifics.
Use of the source code is governed by the Apache License, Version 2.0. Use of the associated model is governed by the NVIDIA Open Model License Agreement.


