OpenD4RT

An unofficial PyTorch/GPU implementation of D4RT for 4D reconstruction and tracking

OpenD4RT reproduces D4RT-style 4D reconstruction and tracking with released WorldTrack evaluation, visualization tools, and Hugging Face checkpoints.

OpenD4RT is an unofficial open-source PyTorch/GPU implementation of D4RT, developed to reproduce the model architecture, training recipe, evaluation protocols, and implementation details described in the D4RT paper and appendix. The current public repo includes the released Hugging Face checkpoints, the model, the training code, WorldTrack evaluation, and Viser visualization tools; full evaluation code for the other benchmarks reported in the D4RT paper and appendix is planned for release.

🔥 News

[2026/06/04] Released the full OpenD4RT training code.
[2026/05/20] Released the 48CLIP_9Mix_NoCropAUG checkpoint.
[2026/05/02] Released the OpenD4RT WorldTrack evaluation pipeline, Viser visualization tools, and the first Hugging Face checkpoint.

🧠 What is D4RT?

D4RT is a feedforward video model for reconstructing and tracking dynamic scenes. It uses a unified transformer architecture to infer depth, spatio-temporal correspondence, and camera parameters from a single video. Its query interface probes the 3D position of a source pixel (u, v, t_src) at a target timestep t_tgt in a selected camera coordinate frame t_cam, enabling sparse tracking, all-pixel tracking, and 4D scene reconstruction through the same model interface.
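
To make the query interface concrete, here is a minimal Python sketch of how different tasks could be expressed as lists of query tuples. The `Query` class and both helper functions are hypothetical and are not part of this repository's API. Which indices are held fixed for each task is an assumption based on the description above, and pixel coordinates are assumed to be normalized to [0, 1].

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Query:
    """Hypothetical container for one D4RT-style query (not the repo's API)."""
    u: float    # horizontal source-pixel coordinate (assumed normalized to [0, 1])
    v: float    # vertical source-pixel coordinate (assumed normalized to [0, 1])
    t_src: int  # frame in which the pixel is observed
    t_tgt: int  # timestep at which its 3D position is requested
    t_cam: int  # frame whose camera coordinates express the answer


def sparse_track_queries(u: float, v: float, t_src: int,
                         num_frames: int, t_cam: int = 0) -> List[Query]:
    """Follow one source pixel through every target timestep, in one camera frame."""
    return [Query(u, v, t_src, t_tgt, t_cam) for t_tgt in range(num_frames)]


def dense_frame_queries(t: int, grid: int, t_cam: int = 0) -> List[Query]:
    """Probe a grid x grid lattice of pixels of frame t at their own timestep.

    Repeating this for every frame with a fixed t_cam would place all points in
    one shared coordinate frame (an assumed way to assemble a 4D reconstruction).
    """
    coords = [(i + 0.5) / grid for i in range(grid)]
    return [Query(u, v, t, t, t_cam) for v in coords for u in coords]


track = sparse_track_queries(0.5, 0.5, t_src=0, num_frames=4)
frame = dense_frame_queries(t=2, grid=3)
print(len(track), len(frame))  # prints: 4 9
```

Under the same assumptions, all-pixel tracking would combine the two patterns: the dense lattice of source pixels, each queried at every target timestep.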

See docs/D4RT_paper.pdf for the local paper PDF included in this repository.

🔧 Installation

Create the conda environment:

conda env create -f environment.yml
conda activate d4rt

Or install into an existing Python environment:

pip install -r requirements.txt

The visualization package builder calls the ffmpeg command-line tool to write MP4 assets for Viser. The conda environment includes ffmpeg; if you use pip install -r requirements.txt, install ffmpeg separately if needed.
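
If you installed with pip and are unsure whether ffmpeg is reachable, a quick check like the following can be run before building a demo package. This is a small standalone sketch using only the Python standard library; the function name `ffmpeg_available` is hypothetical and does not exist in this repository.

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found at", shutil.which("ffmpeg"))
    else:
        print("ffmpeg not found; install it before building Viser demo packages")
```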

📦 Checkpoint Zoo

Variant	Data	Aug.	Frames	Status	Download
`32CLIP_9Dataset_NoAUG`	9Mix	color aug + No crop aug	32	Released	HF
`48CLIP_9Mix_NoCropAUG`	9Mix	color aug + No crop aug	48	Released	HF
`48CLIP_9Mix_AUG`	9Mix	color aug + crop aug	48	Coming	TBD
`32CLIP_10Mix_SynthVerse_NoAUG`	10Mix	color aug + No crop aug	32	Coming	TBD
`48CLIP_10Mix_SynthVerse_AUG`	10Mix	color aug + crop aug	48	Coming	TBD

Released checkpoint local path: checkpoints/OpenD4RT_32CLIP_9Dataset_NoAUG/opend4rt.ckpt.

Additional released checkpoint local path: checkpoints/OpenD4RT_48CLIP_9Mix_NoCropAUG/opend4rt.ckpt.

Tip: all rows are OpenD4RT variants. The 9Mix setting uses PointOdyssey, Dynamic Replica, Kubric Full, TartanAir, Virtual KITTI 2, ScanNet, BlendedMVS, CO3D, and MVS-Synth. The 10Mix setting additionally includes SynthVerse.

⬇️ Checkpoint Download

Download the released checkpoints and their model configs from the Lijiaxin0111/OpenD4RT Hugging Face repository into the default paths used by the scripts:

pip install -U huggingface_hub

huggingface-cli download Lijiaxin0111/OpenD4RT \
  --repo-type model \
  --include "checkpoints/OpenD4RT_32CLIP_9Dataset_NoAUG/opend4rt.ckpt" \
  --include "checkpoints/OpenD4RT_32CLIP_9Dataset_NoAUG/model.yaml" \
  --include "checkpoints/OpenD4RT_48CLIP_9Mix_NoCropAUG/opend4rt.ckpt" \
  --include "checkpoints/OpenD4RT_48CLIP_9Mix_NoCropAUG/model.yaml" \
  --local-dir .

Expected local files:

checkpoints/OpenD4RT_32CLIP_9Dataset_NoAUG/
  opend4rt.ckpt
  model.yaml
checkpoints/OpenD4RT_48CLIP_9Mix_NoCropAUG/
  opend4rt.ckpt
  model.yaml
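
To confirm the download produced the layout above, a short check such as this one may help. It is a sketch, not a script shipped with the repository: `missing_checkpoint_files` is a hypothetical helper, and the paths are copied from the expected layout listed above.

```python
from pathlib import Path
from typing import List

# Relative paths copied from the expected local layout.
EXPECTED_FILES = [
    "checkpoints/OpenD4RT_32CLIP_9Dataset_NoAUG/opend4rt.ckpt",
    "checkpoints/OpenD4RT_32CLIP_9Dataset_NoAUG/model.yaml",
    "checkpoints/OpenD4RT_48CLIP_9Mix_NoCropAUG/opend4rt.ckpt",
    "checkpoints/OpenD4RT_48CLIP_9Mix_NoCropAUG/model.yaml",
]


def missing_checkpoint_files(repo_root: str = ".") -> List[str]:
    """Return the expected files that are not present under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoint_files(".")
    print("all checkpoint files present" if not missing else f"missing: {missing}")
```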

🌍 WorldTrack Data

Download the WorldTrack release from:

https://drive.google.com/drive/folders/1-JW88ru30irMYyFab_4YBQbGbd9tKpXV

Place the .npz files under:

data/worldtrack_release/
  adt_mini/*.npz
  po_mini/*.npz
  pstudio_mini/*.npz
  ds_mini/*.npz
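
A quick way to sanity-check the data layout is to count the .npz files found in each subset folder. The helper below is a hypothetical sketch (it is not part of the repository) and only assumes the directory structure shown above.

```python
from pathlib import Path
from typing import Dict

# Subset folder names taken from the layout above.
SUBSETS = ("adt_mini", "po_mini", "pstudio_mini", "ds_mini")


def count_sequences(data_root: str = "data/worldtrack_release") -> Dict[str, int]:
    """Count .npz files per subset; a missing subset folder counts as 0."""
    root = Path(data_root)
    counts = {}
    for name in SUBSETS:
        subset_dir = root / name
        counts[name] = len(list(subset_dir.glob("*.npz"))) if subset_dir.is_dir() else 0
    return counts


if __name__ == "__main__":
    for subset, n in count_sequences().items():
        print(f"{subset}: {n} sequence file(s)")
```

A count of 0 for a subset usually means the files were unpacked one directory level too deep or too shallow.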

🏋️ Training

The main reproduction entrypoint for the 48-frame 9Mix run is:

VIDEOMAE2_CKPT=/path/to/vit_g_hybrid_pt_1200e.pth \
bash scripts/train_worldtrack_sota_ninemix_clip48_a_query_local_lr4e-6_8gpu.sh

This script launches torchrun, loads the reproduction configs under configs/, initializes from the released 32-frame checkpoint, and runs the 48-frame training recipe used for the WorldTrack setting.

For a quick preflight without starting training:

DRY_RUN=1 \
VIDEOMAE2_CKPT=/path/to/vit_g_hybrid_pt_1200e.pth \
bash scripts/train_worldtrack_sota_ninemix_clip48_a_query_local_lr4e-6_8gpu.sh

Full training setup, required checkpoints, dataset root overrides, and smoke test commands are documented in docs/training.md.

📊 Evaluation

Run a quick smoke test on one adt_mini sequence:

LIMIT_SEQS=1 SUBSETS=adt_mini OUTPUT_DIR=tmp/eval_smoke bash run_eval_worldtrack.sh

Run the full WorldTrack evaluation:

bash run_eval_worldtrack.sh

Equivalent explicit command:

EXP=checkpoints/OpenD4RT_32CLIP_9Dataset_NoAUG

python eval_track3d_in_worldtrack.py \
  --model-config "$EXP/model.yaml" \
  --ckpt-path "$EXP/opend4rt.ckpt" \
  --data-root data/worldtrack_release \
  --subsets adt_mini,po_mini,pstudio_mini,ds_mini \
  --num-frames 64 \
  --query-chunk-size 4096 \
  --output-dir tmp/eval_worldtrack \
  --device cuda \
  --save-per-sequence
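
For scripted runs, for example evaluating both released checkpoints in a loop, the explicit command can be assembled in Python. The sketch below uses only the flags shown in the command above; `build_eval_command` is a hypothetical helper, not a function provided by this repository.

```python
import shlex
from typing import List, Sequence


def build_eval_command(
    exp_dir: str = "checkpoints/OpenD4RT_32CLIP_9Dataset_NoAUG",
    subsets: Sequence[str] = ("adt_mini", "po_mini", "pstudio_mini", "ds_mini"),
    num_frames: int = 64,
    query_chunk_size: int = 4096,
    output_dir: str = "tmp/eval_worldtrack",
    device: str = "cuda",
) -> List[str]:
    """Assemble the argument list for eval_track3d_in_worldtrack.py."""
    return [
        "python", "eval_track3d_in_worldtrack.py",
        "--model-config", f"{exp_dir}/model.yaml",
        "--ckpt-path", f"{exp_dir}/opend4rt.ckpt",
        "--data-root", "data/worldtrack_release",
        "--subsets", ",".join(subsets),
        "--num-frames", str(num_frames),
        "--query-chunk-size", str(query_chunk_size),
        "--output-dir", output_dir,
        "--device", device,
        "--save-per-sequence",
    ]


cmd = build_eval_command(
    exp_dir="checkpoints/OpenD4RT_48CLIP_9Mix_NoCropAUG",
    query_chunk_size=1024,
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.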

Useful overrides:

QUERY_CHUNK_SIZE=1024 bash run_eval_worldtrack.sh
CUDA_VISIBLE_DEVICES=1 DEVICE=cuda bash run_eval_worldtrack.sh
SUBSETS=adt_mini LIMIT_SEQS=1 NUM_FRAMES=64 bash run_eval_worldtrack.sh

🏆 Results

OpenD4RT_32CLIP_9Dataset_NoAUG detailed WorldTrack results:

Subset	APD global	EPE global	APD global dyn	EPE global dyn	Queries
`adt_mini`	0.6993	0.2964	0.6975	0.3628	22187
`po_mini`	0.6603	0.3397	0.7333	0.2722	53468
`pstudio_mini`	0.7863	0.1811	0.7863	0.1811	8720
`ds_mini`	0.7266	0.2944	0.7521	0.2699	52462

OpenD4RT_48CLIP_9Mix_NoCropAUG detailed WorldTrack results (step_0006000, anchor_clip, evaluated with 64 frames):

Subset	APD global	EPE global	APD global dyn	EPE global dyn	Queries
`adt_mini`	0.7220	0.2758	0.7325	0.3199	22187
`po_mini`	0.6799	0.3178	0.7425	0.2593	53468
`pstudio_mini`	0.7960	0.1753	0.7960	0.1753	8720
`ds_mini`	0.7248	0.2959	0.7488	0.2755	52462
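
For readers unfamiliar with the two metrics, the sketch below shows how they are commonly defined in 3D tracking benchmarks: EPE as the mean Euclidean distance between predicted and ground-truth 3D points, and APD as the fraction of points whose error falls under a distance threshold, averaged over several thresholds. This is an illustration under those assumptions only. The threshold values are placeholders, the alignment implied by "global" in the column names is not modeled, and `eval_track3d_in_worldtrack.py` remains the reference for the numbers reported here.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def epe(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Mean Euclidean distance between predicted and ground-truth 3D points."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def apd(pred: Sequence[Point], gt: Sequence[Point],
        thresholds: Sequence[float] = (0.1, 0.3, 0.5, 1.0)) -> float:
    """Fraction of points with error below each threshold, averaged over thresholds.

    The default thresholds are placeholder values (an assumption), not
    necessarily those used by the WorldTrack protocol.
    """
    errors = [math.dist(p, g) for p, g in zip(pred, gt)]
    fractions = [sum(e < th for e in errors) / len(errors) for th in thresholds]
    return sum(fractions) / len(fractions)


gt_points = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
pred_points = [(0.2, 0.0, 0.0), (0.0, 0.6, 0.0)]  # errors of about 0.2 and 0.6
print(round(epe(pred_points, gt_points), 4), apd(pred_points, gt_points))
```

With these definitions a better model has higher APD and lower EPE, which matches the direction of the arrows used in the comparison table.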

📈 Model Results

Sparse point tracking comparison on WorldTrack-style subsets. APD is shown as a percentage; higher APD is better and lower EPE is better. Recent baseline numbers are transcribed from the sparse point tracking table in the provided reference image. The OpenD4RT rows use this repository's evaluation results, with ds_mini reported in the DR column.

Model	PStudio APD ↑	PStudio EPE ↓	PO APD ↑	PO EPE ↓	DR APD ↑	DR EPE ↓	ADT APD ↑	ADT EPE ↓
SpaTrackerV2 (2025)	74.16	0.2272	69.57	0.3780	73.43	0.2732	92.22	0.0915
St4RTrack (2025)	69.67	0.2637	67.95	0.3140	73.74	0.2682	76.01	0.2680
TraceAnything (2025)	71.33	0.2727	39.83	1.0593	60.63	0.5758	75.65	0.2511
Any4D (2025)	60.03	0.3344	60.86	0.4194	68.39	0.3012	56.71	0.4320
V-DPM (2026)	76.36	0.1957	79.79	0.1994	76.38	0.2378	66.06	0.3426
4RC (2026)	69.04	0.2603	80.27	0.2681	82.91	0.1889	84.28	0.1766
OpenD4RT 32CLIP (Ours)	78.63	0.1811	66.03	0.3397	72.66	0.2944	69.93	0.2964
OpenD4RT 48CLIP (Ours)	79.60	0.1753	67.99	0.3178	72.48	0.2959	72.20	0.2758

Tip: OpenD4RT has the strongest PStudio result in this comparison.

🎬 Result Gallery

Case / Motion	RGB + 2D Tracking	GT vs Pred 3D Tracks
`softball_25` Softball swing and fast ball motion
`football_16` Football play with player and ball motion

👁️ Viser Demo Visualization

Build two example Viser demo packages. Each package uses the first 64 frames:

DEMO_CASE=pstudio_mini/juggle_5.npz OUTPUT_DIR=tmp/worldtrack_demo_juggle bash run_build_worldtrack_demo.sh
DEMO_CASE=pstudio_mini/softball_25.npz OUTPUT_DIR=tmp/worldtrack_demo_softball bash run_build_worldtrack_demo.sh

Open a demo package with Viser:

python vis/serve_demo_viser.py --root tmp/worldtrack_demo_juggle --port 8081

For a lighter/faster package:

DEMO_CASE=pstudio_mini/juggle_5.npz \
OUTPUT_DIR=tmp/worldtrack_demo_small \
POINT_GRID_COLS=32 POINT_GRID_ROWS=32 POINT_MAX_POINTS=1024 TRACK_MAX_POINTS=96 \
bash run_build_worldtrack_demo.sh

The generated demo package contains assets/demo_data.json, assets/input_video.mp4, rendered diagnostic videos, and manifest.json.

✅ ToDo

Release the OpenD4RT model runtime for the 32-frame 9-dataset checkpoint.
Release WorldTrack evaluation scripts and archived metrics.
Release Viser-based qualitative visualization tools.
Release complete training code.
Release additional checkpoints listed in the Checkpoint Zoo.
Release SynthVerse evaluation results.
Release full evaluation code for the benchmarks reported in the D4RT paper and appendix.

📄 License

OpenD4RT is an unofficial implementation and is not affiliated with or endorsed by the original D4RT authors. The code in this repository is released under the Apache 2.0 license; see LICENSE. The D4RT paper, project page, datasets, third-party assets, and upstream dependencies remain under their respective licenses and terms.

🙏 Acknowledgements

This project is built upon the D4RT paper and official project materials. We thank the original D4RT authors for introducing the D4RT formulation, releasing the project page, and documenting the paper and appendix details that this implementation follows. We also acknowledge the contributors and resources credited on the official D4RT website, including colleagues who supported project advice, manuscript feedback, early development, code review, visualization, baseline comparisons, and data generation. We also thank the splat viewer authors for the WebGL renderer used by the official D4RT visualization pipeline. Please refer to the official D4RT project page for the full original acknowledgements.
