RHOS Team · https://mvig-rhos.com/
OpenD4RT reproduces D4RT-style 4D reconstruction and tracking with released WorldTrack evaluation, visualization tools, and Hugging Face checkpoints.
OpenD4RT is an unofficial open-source PyTorch/GPU implementation of D4RT, developed to reproduce the model architecture, training recipe, evaluation protocols, and implementation details described in the D4RT paper and appendix. The current public repo includes the released Hugging Face checkpoint, the model, WorldTrack evaluation, and Viser visualization tools, with complete training and evaluation code planned for release.
- [2026/06/04] Released the full OpenD4RT training code.
- [2025/05/20] Released the 48CLIP_9Mix_NoCropAUG checkpoint.
- [2026/05/02] Released the OpenD4RT WorldTrack evaluation pipeline, Viser visualization tools, and the first Hugging Face checkpoint.
D4RT is a feedforward video model for reconstructing and tracking dynamic
scenes. It uses a unified transformer architecture to infer depth,
spatio-temporal correspondence, and camera parameters from a single video. Its
query interface probes the 3D position of a source pixel (u, v, t_src) at a
target timestep t_tgt in a selected camera coordinate frame t_cam, enabling
sparse tracking, all-pixel tracking, and 4D scene reconstruction through the
same model interface.
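As a minimal sketch of that query interface, the snippet below builds query tuples for sparse tracking and for all-pixel probing. The `Query` class and both helper functions are hypothetical names introduced for illustration, not part of the OpenD4RT code, and the sketch assumes pixel coordinates are passed as plain floats (the repository may normalize them differently).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Query:
    """One D4RT-style query; field names mirror the notation in the text."""
    u: float     # source pixel x-coordinate
    v: float     # source pixel y-coordinate
    t_src: int   # frame in which the pixel is observed
    t_tgt: int   # timestep at which the 3D position is requested
    t_cam: int   # frame whose camera coordinates express the answer


def track_queries(u: float, v: float, t_src: int,
                  num_frames: int, t_cam: int = 0) -> List[Query]:
    """Sparse tracking: follow one source pixel through every target timestep."""
    return [Query(u, v, t_src, t_tgt, t_cam) for t_tgt in range(num_frames)]


def dense_frame_queries(width: int, height: int, t_src: int, t_tgt: int,
                        t_cam: int, stride: int = 1) -> List[Query]:
    """All-pixel probing: one query per (strided) pixel of a source frame."""
    return [
        Query(float(x), float(y), t_src, t_tgt, t_cam)
        for y in range(0, height, stride)
        for x in range(0, width, stride)
    ]


if __name__ == "__main__":
    track = track_queries(u=120.0, v=64.0, t_src=0, num_frames=48)
    grid = dense_frame_queries(width=8, height=4, t_src=3, t_tgt=3, t_cam=3)
    print(len(track), len(grid))
```

Both helpers only vary which of the five query fields changes, which is the sense in which tracking and reconstruction share one interface.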
See docs/D4RT_paper.pdf for the local paper PDF included in this repository.
Create the conda environment:
conda env create -f environment.yml
conda activate d4rt
Or install into an existing Python environment:
pip install -r requirements.txt
The visualization package builder calls the ffmpeg command-line tool to
write MP4 assets for Viser. The conda environment includes ffmpeg; if you
install with pip install -r requirements.txt instead, install ffmpeg separately.
| Variant | Data | Aug. | Frames | Status | Download |
|---|---|---|---|---|---|
| 32CLIP_9Dataset_NoAUG | 9Mix | color aug + No crop aug | 32 | Released | HF |
| 48CLIP_9Mix_NoCropAUG | 9Mix | color aug + No crop aug | 48 | Released | HF |
| 48CLIP_9Mix_AUG | 9Mix | color aug + crop aug | 48 | Coming | TBD |
| 32CLIP_10Mix_SynthVerse_NoAUG | 10Mix | color aug + No crop aug | 32 | Coming | TBD |
| 48CLIP_10Mix_SynthVerse_AUG | 10Mix | color aug + crop aug | 48 | Coming | TBD |
Released checkpoint local path:
checkpoints/OpenD4RT_32CLIP_9Dataset_NoAUG/opend4rt.ckpt.
Additional released checkpoint local path:
checkpoints/OpenD4RT_48CLIP_9Mix_NoCropAUG/opend4rt.ckpt.
Tip: all rows are OpenD4RT variants. The 9Mix setting uses PointOdyssey, Dynamic Replica, Kubric Full, TartanAir, Virtual KITTI 2, ScanNet, BlendedMVS, CO3D, and MVS-Synth. The 10Mix setting additionally includes SynthVerse.
Download the released checkpoints and model configs from Lijiaxin0111/OpenD4RT into the default paths used by the scripts:
pip install -U huggingface_hub
huggingface-cli download Lijiaxin0111/OpenD4RT \
--repo-type model \
--include "checkpoints/OpenD4RT_32CLIP_9Dataset_NoAUG/opend4rt.ckpt" \
--include "checkpoints/OpenD4RT_32CLIP_9Dataset_NoAUG/model.yaml" \
--include "checkpoints/OpenD4RT_48CLIP_9Mix_NoCropAUG/opend4rt.ckpt" \
--include "checkpoints/OpenD4RT_48CLIP_9Mix_NoCropAUG/model.yaml" \
--local-dir .
Expected local files:
checkpoints/OpenD4RT_32CLIP_9Dataset_NoAUG/
  opend4rt.ckpt
  model.yaml
checkpoints/OpenD4RT_48CLIP_9Mix_NoCropAUG/
  opend4rt.ckpt
  model.yaml
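The same download can also be scripted from Python. The sketch below is one possible equivalent, assuming the `huggingface_hub` package is installed; `checkpoint_patterns` and `download` are hypothetical helper names, while `snapshot_download` and its `allow_patterns` and `local_dir` arguments come from `huggingface_hub`.

```python
from typing import List, Sequence

REPO_ID = "Lijiaxin0111/OpenD4RT"
VARIANTS = ("OpenD4RT_32CLIP_9Dataset_NoAUG", "OpenD4RT_48CLIP_9Mix_NoCropAUG")
FILES = ("opend4rt.ckpt", "model.yaml")


def checkpoint_patterns(variants: Sequence[str] = VARIANTS) -> List[str]:
    """Repo-relative paths of the checkpoint and config for each variant."""
    return [f"checkpoints/{name}/{fname}" for name in variants for fname in FILES]


def download(local_dir: str = ".") -> str:
    """Fetch only the listed files into local_dir and return the local path."""
    # Imported lazily so the helper above works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="model",
        allow_patterns=checkpoint_patterns(),
        local_dir=local_dir,
    )
```

Calling `download()` from the repository root should reproduce the layout listed above; pass a single variant to `checkpoint_patterns` to build a narrower pattern list.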
Download the WorldTrack release from:
https://drive.google.com/drive/folders/1-JW88ru30irMYyFab_4YBQbGbd9tKpXV
Place the .npz files under:
data/worldtrack_release/
  adt_mini/*.npz
  po_mini/*.npz
  pstudio_mini/*.npz
  ds_mini/*.npz
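Before launching an evaluation, it can help to confirm that the data landed in this layout. The following is a small sketch that counts the .npz sequences per subset; `count_sequences` is a hypothetical helper, not a script shipped with the repository.

```python
from pathlib import Path
from typing import Dict

SUBSETS = ("adt_mini", "po_mini", "pstudio_mini", "ds_mini")


def count_sequences(data_root: str = "data/worldtrack_release") -> Dict[str, int]:
    """Return the number of .npz files found in each expected subset folder."""
    root = Path(data_root)
    counts = {}
    for name in SUBSETS:
        subset_dir = root / name
        # A missing folder is reported as zero sequences rather than an error.
        counts[name] = len(list(subset_dir.glob("*.npz"))) if subset_dir.is_dir() else 0
    return counts


if __name__ == "__main__":
    for subset, n in count_sequences().items():
        status = "ok" if n > 0 else "MISSING"
        print(f"{subset:14s} {n:5d} sequences  {status}")
```

A subset reported as missing usually means the archive was extracted one directory level too deep or too shallow.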
The main reproduction entrypoint for the 48-frame 9Mix run is:
VIDEOMAE2_CKPT=/path/to/vit_g_hybrid_pt_1200e.pth \
bash scripts/train_worldtrack_sota_ninemix_clip48_a_query_local_lr4e-6_8gpu.sh
This script launches torchrun, loads the reproduction configs under
configs/, initializes from the released 32-frame checkpoint, and runs
the 48-frame training recipe used for the WorldTrack setting.
For a quick preflight without starting training:
DRY_RUN=1 \
VIDEOMAE2_CKPT=/path/to/vit_g_hybrid_pt_1200e.pth \
bash scripts/train_worldtrack_sota_ninemix_clip48_a_query_local_lr4e-6_8gpu.sh
Full training setup, required checkpoints, dataset root overrides, and smoke test commands are documented in docs/training.md.
Run a quick smoke test on one adt_mini sequence:
LIMIT_SEQS=1 SUBSETS=adt_mini OUTPUT_DIR=tmp/eval_smoke bash run_eval_worldtrack.sh
Run the full WorldTrack evaluation:
bash run_eval_worldtrack.sh
Equivalent explicit command:
EXP=checkpoints/OpenD4RT_32CLIP_9Dataset_NoAUG
python eval_track3d_in_worldtrack.py \
--model-config "$EXP/model.yaml" \
--ckpt-path "$EXP/opend4rt.ckpt" \
--data-root data/worldtrack_release \
--subsets adt_mini,po_mini,pstudio_mini,ds_mini \
--num-frames 64 \
--query-chunk-size 4096 \
--output-dir tmp/eval_worldtrack \
--device cuda \
--save-per-sequence
Useful overrides:
QUERY_CHUNK_SIZE=1024 bash run_eval_worldtrack.sh
CUDA_VISIBLE_DEVICES=1 DEVICE=cuda bash run_eval_worldtrack.sh
SUBSETS=adt_mini LIMIT_SEQS=1 NUM_FRAMES=64 bash run_eval_worldtrack.sh
OpenD4RT_32CLIP_9Dataset_NoAUG detailed WorldTrack results:
| Subset | APD global | EPE global | APD global dyn | EPE global dyn | Queries |
|---|---|---|---|---|---|
| adt_mini | 0.6993 | 0.2964 | 0.6975 | 0.3628 | 22187 |
| po_mini | 0.6603 | 0.3397 | 0.7333 | 0.2722 | 53468 |
| pstudio_mini | 0.7863 | 0.1811 | 0.7863 | 0.1811 | 8720 |
| ds_mini | 0.7266 | 0.2944 | 0.7521 | 0.2699 | 52462 |
OpenD4RT_48CLIP_9Mix_NoCropAUG detailed WorldTrack results
(step_0006000, anchor_clip, evaluated with 64 frames):
| Subset | APD global | EPE global | APD global dyn | EPE global dyn | Queries |
|---|---|---|---|---|---|
| adt_mini | 0.7220 | 0.2758 | 0.7325 | 0.3199 | 22187 |
| po_mini | 0.6799 | 0.3178 | 0.7425 | 0.2593 | 53468 |
| pstudio_mini | 0.7960 | 0.1753 | 0.7960 | 0.1753 | 8720 |
| ds_mini | 0.7248 | 0.2959 | 0.7488 | 0.2755 | 52462 |
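The evaluation commands above expose a query chunk size (`--query-chunk-size`, or `QUERY_CHUNK_SIZE` for the wrapper script). The sketch below illustrates the general idea of chunked query processing, assuming the option limits how many queries are decoded per forward pass; `iter_chunks`, `run_in_chunks`, and `predict_fn` are hypothetical names, and the repository's actual implementation may differ.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

Q = TypeVar("Q")
R = TypeVar("R")


def iter_chunks(queries: Sequence[Q], chunk_size: int) -> Iterator[Sequence[Q]]:
    """Yield consecutive slices of at most chunk_size queries."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(queries), chunk_size):
        yield queries[start:start + chunk_size]


def run_in_chunks(queries: Sequence[Q], chunk_size: int,
                  predict_fn: Callable[[Sequence[Q]], List[R]]) -> List[R]:
    """Apply predict_fn chunk by chunk and concatenate results in order."""
    outputs: List[R] = []
    for chunk in iter_chunks(queries, chunk_size):
        outputs.extend(predict_fn(chunk))
    return outputs


if __name__ == "__main__":
    # Stand-in for a model call: echoes one output per query.
    fake_predict = lambda chunk: [q * 2 for q in chunk]
    print(run_in_chunks(list(range(10)), 4, fake_predict))
```

Under this assumption, a smaller chunk size trades throughput for lower peak memory, since results depend only on query order and not on how queries are grouped.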
Sparse point tracking comparison on WorldTrack-style subsets. APD is shown as
a percentage, higher APD is better, and lower EPE is better. Recent baseline
numbers are transcribed from the sparse point tracking table in the provided
reference image. OpenD4RT uses this repository's evaluation results, with
ds_mini reported in the DR column.
| Model | PStudio APD ↑ | PStudio EPE ↓ | PO APD ↑ | PO EPE ↓ | DR APD ↑ | DR EPE ↓ | ADT APD ↑ | ADT EPE ↓ |
|---|---|---|---|---|---|---|---|---|
| SpaTrackerV2 (2025) | 74.16 | 0.2272 | 69.57 | 0.3780 | 73.43 | 0.2732 | 92.22 | 0.0915 |
| St4RTrack (2025) | 69.67 | 0.2637 | 67.95 | 0.3140 | 73.74 | 0.2682 | 76.01 | 0.2680 |
| TraceAnything (2025) | 71.33 | 0.2727 | 39.83 | 1.0593 | 60.63 | 0.5758 | 75.65 | 0.2511 |
| Any4D (2025) | 60.03 | 0.3344 | 60.86 | 0.4194 | 68.39 | 0.3012 | 56.71 | 0.4320 |
| V-DPM (2026) | 76.36 | 0.1957 | 79.79 | 0.1994 | 76.38 | 0.2378 | 66.06 | 0.3426 |
| 4RC (2026) | 69.04 | 0.2603 | 80.27 | 0.2681 | 82.91 | 0.1889 | 84.28 | 0.1766 |
| OpenD4RT 32CLIP (Ours) | 78.63 | 0.1811 | 66.03 | 0.3397 | 72.66 | 0.2944 | 69.93 | 0.2964 |
| OpenD4RT 48CLIP (Ours) | 79.60 | 0.1753 | 67.99 | 0.3178 | 72.48 | 0.2959 | 72.20 | 0.2758 |
Tip: OpenD4RT has the strongest PStudio result in this comparison.
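For readers unfamiliar with the two metrics, here is a rough sketch of how an end-point error and a threshold-averaged accuracy are commonly computed for 3D tracks. This is an illustration only: the function names are hypothetical, the threshold values are placeholders, and the exact thresholds, alignment, and masking used by the WorldTrack evaluation in this repository are not specified here.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def epe(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Mean Euclidean distance between predicted and ground-truth 3D points."""
    errors = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(errors) / len(errors)


def apd(pred: Sequence[Point], gt: Sequence[Point],
        thresholds: Sequence[float] = (0.1, 0.3, 0.5, 1.0)) -> float:
    """Fraction of points within each threshold, averaged over thresholds.

    The default thresholds are placeholders, not the benchmark's values.
    """
    errors = [math.dist(p, g) for p, g in zip(pred, gt)]
    fractions = [sum(e < t for e in errors) / len(errors) for t in thresholds]
    return sum(fractions) / len(fractions)


if __name__ == "__main__":
    pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.2)]
    print(f"EPE={epe(pred, gt):.4f}  APD={100 * apd(pred, gt):.2f}%")
```

The comparison table reports APD multiplied by 100, while the per-subset tables report it as a fraction.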
| Case / Motion | RGB + 2D Tracking | GT vs Pred 3D Tracks |
|---|---|---|
| softball_25: Softball swing and fast ball motion | (image) | (image) |
| football_16: Football play with player and ball motion | (image) | (image) |
Build two example Viser demo packages. Each package uses the first 64 frames:
DEMO_CASE=pstudio_mini/juggle_5.npz OUTPUT_DIR=tmp/worldtrack_demo_juggle bash run_build_worldtrack_demo.sh
DEMO_CASE=pstudio_mini/softball_25.npz OUTPUT_DIR=tmp/worldtrack_demo_softball bash run_build_worldtrack_demo.sh
Open a demo package with Viser:
python vis/serve_demo_viser.py --root tmp/worldtrack_demo_juggle --port 8081
For a lighter/faster package:
DEMO_CASE=pstudio_mini/juggle_5.npz \
OUTPUT_DIR=tmp/worldtrack_demo_small \
POINT_GRID_COLS=32 POINT_GRID_ROWS=32 POINT_MAX_POINTS=1024 TRACK_MAX_POINTS=96 \
bash run_build_worldtrack_demo.sh
The generated demo package contains assets/demo_data.json,
assets/input_video.mp4, rendered diagnostic videos, and manifest.json.
- Release the OpenD4RT model runtime for the 32-frame 9-dataset checkpoint.
- Release WorldTrack evaluation scripts and archived metrics.
- Release Viser-based qualitative visualization tools.
- Release complete training code.
- Release additional checkpoints listed in the Checkpoint Zoo.
- Release SynthVerse evaluation results.
- Release full evaluation code for the benchmarks reported in the D4RT paper and appendix.
OpenD4RT is an unofficial implementation and is not affiliated with or endorsed by the original D4RT authors. The code in this repository is released under the Apache 2.0 license; see LICENSE. The D4RT paper, project page, datasets, third-party assets, and upstream dependencies remain under their respective licenses and terms.
This project is built upon the D4RT paper and official project materials. We thank the original D4RT authors for introducing the D4RT formulation, releasing the project page, and documenting the paper and appendix details that this implementation follows. We also acknowledge the contributors and resources credited on the official D4RT website, including colleagues who supported project advice, manuscript feedback, early development, code review, visualization, baseline comparisons, and data generation. We also thank the splat viewer authors for the WebGL renderer used by the official D4RT visualization pipeline. Please refer to the official D4RT project page for the full original acknowledgements.




