Part of the NVIDIA Cosmos project family — the training and serving framework repository.
Cosmos-Framework is an end-to-end framework for training and serving world models, including the Cosmos3 model family. Everything lives in a single top-level cosmos_framework/ Python package:
- Training: distributed FSDP / TP / CP / PP trainer, native DCP checkpoints with HuggingFace safetensors import/export, and JSONL / WebDataset / LeRobot dataset adapters. Entry point: cosmos_framework.scripts.train. See docs/training.md.
- Inference: Diffusers / Transformers / vLLM backends with offline batch generation and online serving (Ray + Gradio). Entry point: cosmos_framework.scripts.inference.

Ecosystem-facing shim libraries (lightweight standalone wrappers for downstream projects) live under packages/.
Cosmos 3 is our newest model family [Report] [Website]. It is a suite of omnimodal world models designed to jointly process and generate language, images, video, audio, and action sequences within a unified Mixture-of-Transformers architecture. By supporting highly flexible input-output configurations, it seamlessly unifies critical modalities for Physical AI — effectively subsuming vision-language models, video generators, world simulators, and world-action models into a single framework. For a guided experience to test out Cosmos3, please visit [Cosmos].
For more details and alternative installation methods, see Setup. Before installing, make sure your machine meets the System Requirements. If you want a curated PyTorch + CUDA environment, start from the recommended NVIDIA NGC base image.
Install system dependencies:
sudo apt-get install -y --no-install-recommends curl ffmpeg git-lfs libx11-dev tree wget
Install the package with uv (pick the dependency group that matches your CUDA toolkit; see CUDA Variants):
# CUDA 13.0 (recommended)
uv sync --all-extras --group=cu130-train
# Or, for CUDA 12.8:
# uv sync --all-extras --group=cu128-train
source .venv/bin/activate && export LD_LIBRARY_PATH=
If you are starting from the recommended NGC image (nvcr.io/nvidia/pytorch:25.09-py3), see the one-shot quickstart.
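If you script the install step, the CUDA-to-group choice can be captured in a small helper. The following is a minimal sketch, not part of the repository: the function name `uv_sync_command` and the version keys are hypothetical, and only the two group names (`cu130-train`, `cu128-train`) and the `uv sync --all-extras --group=...` form come from the commands above.

```python
import shlex

# Group names are copied from the install commands above. The version keys
# and this helper itself are illustrative assumptions.
UV_GROUPS = {
    "13.0": "cu130-train",  # recommended
    "12.8": "cu128-train",
}


def uv_sync_command(cuda_version: str) -> list[str]:
    """Return the `uv sync` argument list for a given CUDA toolkit version."""
    try:
        group = UV_GROUPS[cuda_version]
    except KeyError:
        known = ", ".join(sorted(UV_GROUPS))
        raise ValueError(
            f"no dependency group known for CUDA {cuda_version!r} (known: {known})"
        ) from None
    return ["uv", "sync", "--all-extras", f"--group={group}"]


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch has no side effects.
    print(shlex.join(uv_sync_command("13.0")))
```

Returning an argument list keeps the command safe to pass to `subprocess.run` without shell quoting. Check CUDA Variants for the authoritative list of groups before relying on this mapping.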
For the full guide (data preparation, base-checkpoint conversion, parallelism strategies, mixed precision, resuming), see Training. The number of GPUs required depends on the recipe; the shipped recipes under examples/ are 8-GPU configurations (tested on 8× H100 80 GB), each launched via its paired launch script, e.g.:
bash examples/launch_sft_vision_nano.sh
Users may adjust the GPU count to match their model and underlying hardware architecture: tune NPROC_PER_NODE and the parallelism degrees (DP/CP/FSDP shard) in the recipe accordingly.
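When changing the GPU count, it helps to sanity-check the numbers before launching. The sketch below assumes the common convention that the product of the parallelism degrees equals the total number of processes (NPROC_PER_NODE times the node count). That convention is an assumption here, not something this README states, so confirm it against Training. The helper name `check_parallelism` and its keyword names are hypothetical.

```python
from math import prod


def check_parallelism(nproc_per_node: int, nnodes: int = 1, **degrees: int) -> int:
    """Validate that the parallelism degrees multiply to the world size.

    ASSUMPTION: world size == product of all degrees. This is a widespread
    convention, but it is not confirmed for this framework's recipes.
    """
    world_size = nproc_per_node * nnodes
    for name, degree in degrees.items():
        if degree < 1:
            raise ValueError(f"{name} degree must be >= 1, got {degree}")
    product = prod(degrees.values())
    if product != world_size:
        raise ValueError(
            f"degrees {degrees} multiply to {product}, "
            f"but {nnodes} node(s) x {nproc_per_node} GPUs = {world_size}"
        )
    return world_size


if __name__ == "__main__":
    # Hypothetical 8-GPU split: 1-way DP, 2-way CP, 4-way FSDP shard.
    print(check_parallelism(8, dp=1, cp=2, fsdp_shard=4))
```

A mismatch raises a `ValueError` naming both sides of the inequality, which is usually quicker to act on than a failure partway through a distributed launch.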
See Inference for the full guide — launch commands, supported modes, parallelism presets, and troubleshooting.
Quick single-GPU launch:
python -m cosmos_framework.scripts.inference \
--parallelism-preset=latency \
-i "inputs/omni/t2v.json" \
-o outputs/omni_nano \
--checkpoint-path Cosmos3-Nano \
--seed=0

| Topic | What it covers |
|---|---|
| Setup | Hardware/software prerequisites, uv install paths, CUDA variants, Docker base image, and base-checkpoint downloading. |
| Code Structure | Repository layout and a per-subpackage tour of cosmos_framework/ — where each concern lives and where to add new code. |
| Training | Launching multi-GPU and multi-node runs; parallelism strategies; mixed precision; resuming. |
| Inference (from a trained checkpoint) | Loading a trained checkpoint into one of the inference backends. |
| FAQ | Troubleshooting (OOM, NCCL hangs, slow training), environment variables, and common pitfalls. |
