Monocular whole-body 3D human pose estimation using the SOMA body model
2D Keypoint Overlay | In-Camera Mesh | Global Mesh
GEM is a video-based 3D human pose estimation model developed by NVIDIA. It recovers full-body 77-joint motion — including body, hands, and face — from monocular video using the SOMA parametric body model. The pipeline handles dynamic cameras and recovers global motion trajectories. GEM includes a bundled 2D pose estimation model that detects 77 SOMA keypoints, making the system fully self-contained. Licensed under Apache 2.0 for commercial use.
- 77-joint SOMA body model — full body, hands, and face articulation
- Bundled 2D keypoint detector — 2D pose estimator trained for SOMA 77-joint skeleton
- Camera-space motion recovery — camera-space human motion estimation from dynamic monocular video
- World-space motion recovery — world-space human motion estimation from dynamic monocular video
- Apache 2.0 licensed — commercially usable, trained on NVIDIA-owned data only
Looking for multi-modal motion generation (text, audio, music conditioning)? Check out GEM-SMPL, our research model using the SMPL body model that supports both motion estimation and generation from diverse input modalities. Presented at ICCV 2025 (Highlight).
# 1. Clone
git clone --recursive https://github.com/NVlabs/GEM-X.git && cd GEM-X
# 2. Setup environment
pip install uv && uv venv .venv --python 3.10 && source .venv/bin/activate
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126
uv pip install -e third_party/soma && cd third_party/soma && git lfs pull && cd ../..
bash scripts/install_env.sh
# 3. Run demo
python scripts/demo/demo_soma.py --video path/to/video.mp4 --ckpt inputs/pretrained/gem_soma.ckpt

See docs/INSTALL.md for detailed installation instructions.
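The demo script processes one clip per invocation, so batch jobs are just a loop over the CLI. Below is a minimal sketch of such a wrapper: only `scripts/demo/demo_soma.py`, `--video`, and `--ckpt` come from the quickstart above; the `videos/` directory name and the dry-run option are illustrative assumptions.

```python
import subprocess
import sys
from pathlib import Path

# Default checkpoint location from the quickstart; adjust as needed.
CKPT = Path("inputs/pretrained/gem_soma.ckpt")

def demo_commands(video_dir="videos", ckpt=CKPT):
    """Yield one demo invocation per .mp4 clip, sorted for determinism."""
    for video in sorted(Path(video_dir).glob("*.mp4")):
        yield [sys.executable, "scripts/demo/demo_soma.py",
               "--video", str(video), "--ckpt", str(ckpt)]

def run_all(video_dir="videos", dry_run=True):
    """Run (or, with dry_run=True, just print) the demo on every clip."""
    cmds = list(demo_commands(video_dir))
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing clip
    return cmds
```

With `dry_run=True` the wrapper only prints the commands it would run, which is handy for checking paths before committing GPU time.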
| Document | Description |
|---|---|
| Installation | Prerequisites, step-by-step setup, Docker, troubleshooting |
| Demo | Full 3D pipeline, 2D keypoint-only demo, output formats |
| Training & Evaluation | Dataset preparation, training commands, config system |
| Model Overview | Architecture, SOMA body model, bundled 2D pose model |
| Related Projects | GENMO, SOMA, ecosystem cross-references |
| Model | Body Model | Joints | Download |
|---|---|---|---|
| GEM (SOMA) | SOMA | 77 (body + hands + face) | gem_soma.ckpt |
Place checkpoints under inputs/pretrained/ or pass the path via --ckpt. The demo scripts will automatically download the checkpoint from HuggingFace if --ckpt is not provided.
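Programmatic callers can mirror the same resolution order. The sketch below is an assumption about that logic, not the demo's actual implementation: an explicit path wins, then `inputs/pretrained/`, then a HuggingFace download; the `repo_id` shown is a placeholder, not the real repository name.

```python
from pathlib import Path

DEFAULT_DIR = Path("inputs/pretrained")

def resolve_checkpoint(ckpt=None, filename="gem_soma.ckpt"):
    """Resolve the GEM checkpoint path: an explicit --ckpt path wins,
    then the default inputs/pretrained/ location, then a download."""
    if ckpt is not None:
        return Path(ckpt)
    local = DEFAULT_DIR / filename
    if local.exists():
        return local
    # Placeholder repo_id -- substitute the repository the demo actually uses.
    from huggingface_hub import hf_hub_download
    return Path(hf_hub_download(repo_id="nvidia/GEM", filename=filename))
```

Keeping the download as a last resort means offline runs work as long as the checkpoint is already in place.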
GEM is part of a broader effort to provide humanoid motion data for robotics, physical AI, and other applications.
Check out these related works:
@inproceedings{genmo2025,
title = {GENMO: A GENeralist Model for Human MOtion},
author = {Li, Jiefeng and Cao, Jinkun and Zhang, Haotian and Rempe, Davis and Kautz, Jan and Iqbal, Umar and Yuan, Ye},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
year = {2025}
}

This project is released under Apache 2.0. It downloads and installs additional third-party open source software projects; review the license terms of those projects before use. See ATTRIBUTIONS.md for specifics.
Use of the source code is governed by the Apache License, Version 2.0. Use of the associated model is governed by the NVIDIA Open Model License Agreement.


