OpenD4RT

An unofficial PyTorch/GPU implementation of D4RT for 4D reconstruction and tracking

RHOS Team · https://mvig-rhos.com/

OpenD4RT reproduces D4RT-style 4D reconstruction and tracking with released WorldTrack evaluation, visualization tools, and Hugging Face checkpoints.

OpenD4RT is an unofficial open-source PyTorch/GPU implementation of D4RT, developed to reproduce the model architecture, training recipe, evaluation protocols, and implementation details described in the D4RT paper and appendix. The public repo currently includes the model, the training code, the WorldTrack evaluation pipeline, Viser visualization tools, and the released Hugging Face checkpoints. Full evaluation code for the other benchmarks reported in the D4RT paper and appendix is planned for release.

[Figure: D4RT overview]

🔥 News

  • [2026/06/04] Released the full OpenD4RT training code.
  • [2026/05/20] Released the 48CLIP_9Mix_NoCropAUG checkpoint.
  • [2026/05/02] Released the OpenD4RT WorldTrack evaluation pipeline, Viser visualization tools, and the first Hugging Face checkpoint.

🧠 What is D4RT?

D4RT is a feedforward video model for reconstructing and tracking dynamic scenes. It uses a unified transformer architecture to infer depth, spatio-temporal correspondence, and camera parameters from a single video. Its query interface probes the 3D position of a source pixel (u, v, t_src) at a target timestep t_tgt in a selected camera coordinate frame t_cam, enabling sparse tracking, all-pixel tracking, and 4D scene reconstruction through the same model interface.
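
As a rough illustration of the query interface described above, the sketch below models one query as a small record and shows how sparse tracking queries and per-frame all-pixel queries could be generated from the same structure. The `Query` class and both helper functions are hypothetical names for illustration and are not OpenD4RT's API. Setting t_src = t_tgt = t_cam for per-frame queries is an assumed convention, not something this README specifies.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Query:
    """One point query: where is source pixel (u, v, t_src) at time t_tgt, in camera frame t_cam?"""
    u: float
    v: float
    t_src: int
    t_tgt: int
    t_cam: int


def track_queries(points: List[Tuple[float, float]], t_src: int,
                  num_frames: int, t_cam: int = 0) -> List[Query]:
    """Sparse tracking: follow each source pixel through every target timestep,
    expressed in one fixed camera coordinate frame."""
    return [Query(u, v, t_src, t_tgt, t_cam)
            for (u, v) in points
            for t_tgt in range(num_frames)]


def frame_queries(width: int, height: int, t: int, stride: int = 1) -> List[Query]:
    """All-pixel queries for one frame: each (strided) pixel of frame t,
    queried at its own timestep and in its own camera frame (assumed convention)."""
    return [Query(float(u), float(v), t, t, t)
            for v in range(0, height, stride)
            for u in range(0, width, stride)]
```

For example, `track_queries([(10.0, 20.0)], t_src=0, num_frames=48)` yields 48 queries for a single point, one per target timestep.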

See docs/D4RT_paper.pdf for the local paper PDF included in this repository.

🔧 Installation

Create the conda environment:

conda env create -f environment.yml
conda activate d4rt

Or install into an existing Python environment:

pip install -r requirements.txt

The visualization package builder calls the ffmpeg command-line tool to write MP4 assets for Viser. The conda environment includes ffmpeg; with a pip-only install (pip install -r requirements.txt), install ffmpeg separately if it is not already available.
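
To confirm that ffmpeg is reachable before building a demo package, a small check like the following can help. This is a minimal sketch using only the Python standard library; `find_ffmpeg` is a hypothetical helper name and is not part of this repository.

```python
import shutil


def find_ffmpeg() -> str:
    """Return the path of the ffmpeg executable, or raise if it is not on PATH."""
    path = shutil.which("ffmpeg")
    if path is None:
        raise RuntimeError("ffmpeg not found on PATH; install it before building demo packages")
    return path
```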

📦 Checkpoint Zoo

| Variant | Data | Aug. | Frames | Status | Download |
| --- | --- | --- | --- | --- | --- |
| 32CLIP_9Dataset_NoAUG | 9Mix | color aug + No crop aug | 32 | Released | HF |
| 48CLIP_9Mix_NoCropAUG | 9Mix | color aug + No crop aug | 48 | Released | HF |
| 48CLIP_9Mix_AUG | 9Mix | color aug + crop aug | 48 | Coming | TBD |
| 32CLIP_10Mix_SynthVerse_NoAUG | 10Mix | color aug + No crop aug | 32 | Coming | TBD |
| 48CLIP_10Mix_SynthVerse_AUG | 10Mix | color aug + crop aug | 48 | Coming | TBD |

Released checkpoint local path: checkpoints/OpenD4RT_32CLIP_9Dataset_NoAUG/opend4rt.ckpt.

Additional released checkpoint local path: checkpoints/OpenD4RT_48CLIP_9Mix_NoCropAUG/opend4rt.ckpt.

Tip: all rows are OpenD4RT variants. The 9Mix setting uses PointOdyssey, Dynamic Replica, Kubric Full, TartanAir, Virtual KITTI 2, ScanNet, BlendedMVS, CO3D, and MVS-Synth. The 10Mix setting additionally includes SynthVerse.

⬇️ Checkpoint Download

Download the released checkpoints and model configs from the Lijiaxin0111/OpenD4RT Hugging Face repository into the default paths used by the scripts:

pip install -U huggingface_hub

huggingface-cli download Lijiaxin0111/OpenD4RT \
  --repo-type model \
  --include "checkpoints/OpenD4RT_32CLIP_9Dataset_NoAUG/opend4rt.ckpt" \
  --include "checkpoints/OpenD4RT_32CLIP_9Dataset_NoAUG/model.yaml" \
  --include "checkpoints/OpenD4RT_48CLIP_9Mix_NoCropAUG/opend4rt.ckpt" \
  --include "checkpoints/OpenD4RT_48CLIP_9Mix_NoCropAUG/model.yaml" \
  --local-dir .
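
The same download can also be scripted from Python. The sketch below uses `snapshot_download` from `huggingface_hub` with `allow_patterns` restricted to the four files named in the command above. The helper names `checkpoint_patterns` and `download_checkpoints` are hypothetical and are not part of this repository.

```python
from typing import List, Sequence

VARIANTS = ("OpenD4RT_32CLIP_9Dataset_NoAUG", "OpenD4RT_48CLIP_9Mix_NoCropAUG")
FILES = ("opend4rt.ckpt", "model.yaml")


def checkpoint_patterns(variants: Sequence[str] = VARIANTS) -> List[str]:
    """Repo-relative paths of the checkpoint and model config for each variant."""
    return [f"checkpoints/{name}/{fname}" for name in variants for fname in FILES]


def download_checkpoints(local_dir: str = ".") -> str:
    """Download only the listed files into local_dir, keeping the repo's folder layout."""
    # Imported lazily so checkpoint_patterns() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="Lijiaxin0111/OpenD4RT",
        repo_type="model",
        allow_patterns=checkpoint_patterns(),
        local_dir=local_dir,
    )
```

Calling `download_checkpoints()` from the repository root places the files under `checkpoints/`.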

Expected local files:

checkpoints/OpenD4RT_32CLIP_9Dataset_NoAUG/
  opend4rt.ckpt
  model.yaml
checkpoints/OpenD4RT_48CLIP_9Mix_NoCropAUG/
  opend4rt.ckpt
  model.yaml
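
A quick way to verify the layout above before running evaluation is to list any expected file that is absent. This is a minimal sketch; `missing_checkpoint_files` is a hypothetical helper and not part of this repository.

```python
from pathlib import Path
from typing import List

EXPECTED_FILES = (
    "checkpoints/OpenD4RT_32CLIP_9Dataset_NoAUG/opend4rt.ckpt",
    "checkpoints/OpenD4RT_32CLIP_9Dataset_NoAUG/model.yaml",
    "checkpoints/OpenD4RT_48CLIP_9Mix_NoCropAUG/opend4rt.ckpt",
    "checkpoints/OpenD4RT_48CLIP_9Mix_NoCropAUG/model.yaml",
)


def missing_checkpoint_files(root: str = ".") -> List[str]:
    """Return the expected files (relative to root) that do not exist."""
    return [rel for rel in EXPECTED_FILES if not (Path(root) / rel).is_file()]
```

An empty list means both released checkpoints and their configs are in place.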

🌍 WorldTrack Data

Download the WorldTrack release from:

https://drive.google.com/drive/folders/1-JW88ru30irMYyFab_4YBQbGbd9tKpXV

Place the .npz files under:

data/worldtrack_release/
  adt_mini/*.npz
  po_mini/*.npz
  pstudio_mini/*.npz
  ds_mini/*.npz
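
Once the files are in place, it can be useful to confirm the folder layout and look inside one sequence. The sketch below counts `.npz` files per subset and lists the arrays stored in a file without assuming any key names, since the WorldTrack schema is not described here. Both helper names are hypothetical. `np.load` is used with its default `allow_pickle=False`, so a file containing object arrays would raise on access.

```python
from pathlib import Path
from typing import Dict, Tuple

import numpy as np

SUBSETS = ("adt_mini", "po_mini", "pstudio_mini", "ds_mini")


def count_sequences(root: str = "data/worldtrack_release") -> Dict[str, int]:
    """Number of .npz files found in each subset folder (0 if the folder is missing)."""
    counts = {}
    for name in SUBSETS:
        folder = Path(root) / name
        counts[name] = len(list(folder.glob("*.npz"))) if folder.is_dir() else 0
    return counts


def describe_npz(path: str) -> Dict[str, Tuple[int, ...]]:
    """Map each array name stored in an .npz file to its shape."""
    with np.load(path) as data:
        return {key: data[key].shape for key in data.files}
```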

🏋️ Training

The main reproduction entrypoint for the 48-frame 9Mix run is:

VIDEOMAE2_CKPT=/path/to/vit_g_hybrid_pt_1200e.pth \
bash scripts/train_worldtrack_sota_ninemix_clip48_a_query_local_lr4e-6_8gpu.sh

This script launches torchrun, loads the reproduction configs under configs/, initializes from the released 32-frame checkpoint, and runs the 48-frame training recipe used for the WorldTrack setting.

For a quick preflight without starting training:

DRY_RUN=1 \
VIDEOMAE2_CKPT=/path/to/vit_g_hybrid_pt_1200e.pth \
bash scripts/train_worldtrack_sota_ninemix_clip48_a_query_local_lr4e-6_8gpu.sh

Full training setup, required checkpoints, dataset root overrides, and smoke test commands are documented in docs/training.md.

📊 Evaluation

Run a quick smoke test on one adt_mini sequence:

LIMIT_SEQS=1 SUBSETS=adt_mini OUTPUT_DIR=tmp/eval_smoke bash run_eval_worldtrack.sh

Run the full WorldTrack evaluation:

bash run_eval_worldtrack.sh

Equivalent explicit command:

EXP=checkpoints/OpenD4RT_32CLIP_9Dataset_NoAUG

python eval_track3d_in_worldtrack.py \
  --model-config "$EXP/model.yaml" \
  --ckpt-path "$EXP/opend4rt.ckpt" \
  --data-root data/worldtrack_release \
  --subsets adt_mini,po_mini,pstudio_mini,ds_mini \
  --num-frames 64 \
  --query-chunk-size 4096 \
  --output-dir tmp/eval_worldtrack \
  --device cuda \
  --save-per-sequence

Useful overrides:

QUERY_CHUNK_SIZE=1024 bash run_eval_worldtrack.sh
CUDA_VISIBLE_DEVICES=1 DEVICE=cuda bash run_eval_worldtrack.sh
SUBSETS=adt_mini LIMIT_SEQS=1 NUM_FRAMES=64 bash run_eval_worldtrack.sh
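
When several evaluation runs need to be launched from a script, the environment-variable overrides above can be assembled in Python and passed to the shell script. This is a sketch under the assumption that `run_eval_worldtrack.sh` reads only the variables shown in this section; `eval_env` and `run_eval` are hypothetical helpers, not part of this repository.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional


def eval_env(overrides: Mapping[str, object],
             base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the base environment (default: os.environ) and add the overrides as strings."""
    env = dict(os.environ if base is None else base)
    env.update({key: str(value) for key, value in overrides.items()})
    return env


def run_eval(overrides: Mapping[str, object]) -> None:
    """Run the evaluation script with the given overrides; raises if it exits non-zero."""
    subprocess.run(["bash", "run_eval_worldtrack.sh"], env=eval_env(overrides), check=True)


# The one-sequence smoke test from this section, expressed as a dictionary.
SMOKE_TEST = {"LIMIT_SEQS": 1, "SUBSETS": "adt_mini", "OUTPUT_DIR": "tmp/eval_smoke"}
```

`run_eval(SMOKE_TEST)` then corresponds to the one-sequence smoke test command.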

🏆 Results

OpenD4RT_32CLIP_9Dataset_NoAUG detailed WorldTrack results:

| Subset | APD global | EPE global | APD global dyn | EPE global dyn | Queries |
| --- | --- | --- | --- | --- | --- |
| adt_mini | 0.6993 | 0.2964 | 0.6975 | 0.3628 | 22187 |
| po_mini | 0.6603 | 0.3397 | 0.7333 | 0.2722 | 53468 |
| pstudio_mini | 0.7863 | 0.1811 | 0.7863 | 0.1811 | 8720 |
| ds_mini | 0.7266 | 0.2944 | 0.7521 | 0.2699 | 52462 |

OpenD4RT_48CLIP_9Mix_NoCropAUG detailed WorldTrack results (step_0006000, anchor_clip, evaluated with 64 frames):

| Subset | APD global | EPE global | APD global dyn | EPE global dyn | Queries |
| --- | --- | --- | --- | --- | --- |
| adt_mini | 0.7220 | 0.2758 | 0.7325 | 0.3199 | 22187 |
| po_mini | 0.6799 | 0.3178 | 0.7425 | 0.2593 | 53468 |
| pstudio_mini | 0.7960 | 0.1753 | 0.7960 | 0.1753 | 8720 |
| ds_mini | 0.7248 | 0.2959 | 0.7488 | 0.2755 | 52462 |
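
To compare the two checkpoints side by side, the per-subset change in the global metrics can be computed directly from the two tables above. The values below are copied from those tables; the `deltas` helper is only an illustration and is not part of the evaluation code.

```python
from typing import Dict, Tuple

Metrics = Dict[str, Tuple[float, float]]  # subset -> (APD global, EPE global)

CKPT_32: Metrics = {
    "adt_mini": (0.6993, 0.2964),
    "po_mini": (0.6603, 0.3397),
    "pstudio_mini": (0.7863, 0.1811),
    "ds_mini": (0.7266, 0.2944),
}
CKPT_48: Metrics = {
    "adt_mini": (0.7220, 0.2758),
    "po_mini": (0.6799, 0.3178),
    "pstudio_mini": (0.7960, 0.1753),
    "ds_mini": (0.7248, 0.2959),
}


def deltas(old: Metrics, new: Metrics) -> Metrics:
    """Per-subset change (new - old) in APD and EPE, rounded to 4 decimals."""
    return {
        subset: (round(new[subset][0] - old[subset][0], 4),
                 round(new[subset][1] - old[subset][1], 4))
        for subset in old
    }


for subset, (d_apd, d_epe) in deltas(CKPT_32, CKPT_48).items():
    print(f"{subset:13s} APD {d_apd:+.4f}  EPE {d_epe:+.4f}")
```

On these numbers the 48-frame checkpoint improves both global metrics on adt_mini, po_mini, and pstudio_mini, and is marginally worse on ds_mini.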

📈 Model Results

Sparse point tracking comparison on WorldTrack-style subsets. APD is shown as a percentage; higher APD and lower EPE are better. Recent baseline numbers are transcribed from the sparse point tracking table in the provided reference image. OpenD4RT numbers come from this repository's evaluation results, with ds_mini reported in the DR column.

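| Model | PStudio APD ↑ | PStudio EPE ↓ | PO APD ↑ | PO EPE ↓ | DR APD ↑ | DR EPE ↓ | ADT APD ↑ | ADT EPE ↓ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SpaTrackerV2 (2025) | 74.16 | 0.2272 | 69.57 | 0.3780 | 73.43 | 0.2732 | 92.22 | 0.0915 |
| St4RTrack (2025) | 69.67 | 0.2637 | 67.95 | 0.3140 | 73.74 | 0.2682 | 76.01 | 0.2680 |
| TraceAnything (2025) | 71.33 | 0.2727 | 39.83 | 1.0593 | 60.63 | 0.5758 | 75.65 | 0.2511 |
| Any4D (2025) | 60.03 | 0.3344 | 60.86 | 0.4194 | 68.39 | 0.3012 | 56.71 | 0.4320 |
| V-DPM (2026) | 76.36 | 0.1957 | 79.79 | 0.1994 | 76.38 | 0.2378 | 66.06 | 0.3426 |
| 4RC (2026) | 69.04 | 0.2603 | 80.27 | 0.2681 | 82.91 | 0.1889 | 84.28 | 0.1766 |
| OpenD4RT 32CLIP (Ours) | 78.63 | 0.1811 | 66.03 | 0.3397 | 72.66 | 0.2944 | 69.93 | 0.2964 |
| OpenD4RT 48CLIP (Ours) | 79.60 | 0.1753 | 67.99 | 0.3178 | 72.48 | 0.2959 | 72.20 | 0.2758 |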

Tip: both OpenD4RT checkpoints have the highest PStudio APD and the lowest PStudio EPE in this comparison.
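
The PStudio ranking can be checked mechanically from the comparison table. The snippet below copies the PStudio APD and EPE values from that table and picks the best entry for each metric; it is only a small illustration of reading the table, not part of the evaluation code.

```python
from typing import Dict, Tuple

# model -> (PStudio APD in percent, PStudio EPE), copied from the comparison table.
PSTUDIO: Dict[str, Tuple[float, float]] = {
    "SpaTrackerV2": (74.16, 0.2272),
    "St4RTrack": (69.67, 0.2637),
    "TraceAnything": (71.33, 0.2727),
    "Any4D": (60.03, 0.3344),
    "V-DPM": (76.36, 0.1957),
    "4RC": (69.04, 0.2603),
    "OpenD4RT 32CLIP": (78.63, 0.1811),
    "OpenD4RT 48CLIP": (79.60, 0.1753),
}

# Higher APD is better, lower EPE is better.
best_apd = max(PSTUDIO, key=lambda model: PSTUDIO[model][0])
best_epe = min(PSTUDIO, key=lambda model: PSTUDIO[model][1])
by_apd = sorted(PSTUDIO, key=lambda model: PSTUDIO[model][0], reverse=True)

print("best APD:", best_apd)
print("best EPE:", best_epe)
print("top three by APD:", by_apd[:3])
```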

🎬 Result Gallery

| Case / Motion | RGB + 2D Tracking | GT vs Pred 3D Tracks |
| --- | --- | --- |
| softball_25: Softball swing and fast ball motion | [Figure: Softball RGB video with GT and OpenD4RT 2D tracking overlay] | [Figure: Softball GT and OpenD4RT 3D track comparison] |
| football_16: Football play with player and ball motion | [Figure: Football RGB video with GT and OpenD4RT 2D tracking overlay] | [Figure: Football GT and OpenD4RT 3D track comparison] |

👁️ Viser Demo Visualization

Build two example Viser demo packages. Each package uses the first 64 frames:

DEMO_CASE=pstudio_mini/juggle_5.npz OUTPUT_DIR=tmp/worldtrack_demo_juggle bash run_build_worldtrack_demo.sh
DEMO_CASE=pstudio_mini/softball_25.npz OUTPUT_DIR=tmp/worldtrack_demo_softball bash run_build_worldtrack_demo.sh

Open a demo package with Viser:

python vis/serve_demo_viser.py --root tmp/worldtrack_demo_juggle --port 8081

For a lighter/faster package:

DEMO_CASE=pstudio_mini/juggle_5.npz \
OUTPUT_DIR=tmp/worldtrack_demo_small \
POINT_GRID_COLS=32 POINT_GRID_ROWS=32 POINT_MAX_POINTS=1024 TRACK_MAX_POINTS=96 \
bash run_build_worldtrack_demo.sh

The generated demo package contains assets/demo_data.json, assets/input_video.mp4, rendered diagnostic videos, and manifest.json.
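
Before serving a package, a short check can confirm that the files named above were written. This is a sketch with hypothetical helper names. The rendered diagnostic videos are not checked because their file names are not listed here, and the manifest is returned as parsed JSON because its schema is not documented in this section.

```python
import json
from pathlib import Path
from typing import Any, List

REQUIRED_FILES = ("assets/demo_data.json", "assets/input_video.mp4", "manifest.json")


def missing_demo_files(root: str) -> List[str]:
    """Return the required files that are missing from a demo package directory."""
    return [rel for rel in REQUIRED_FILES if not (Path(root) / rel).is_file()]


def load_manifest(root: str) -> Any:
    """Parse manifest.json and return its content unchanged."""
    with open(Path(root) / "manifest.json", encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `missing_demo_files("tmp/worldtrack_demo_juggle")` should return an empty list after a successful build.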

✅ ToDo

  • Release the OpenD4RT model runtime for the 32-frame 9-dataset checkpoint.
  • Release WorldTrack evaluation scripts and archived metrics.
  • Release Viser-based qualitative visualization tools.
  • Release complete training code.
  • Release additional checkpoints listed in the Checkpoint Zoo.
  • Release SynthVerse evaluation results.
  • Release full evaluation code for the benchmarks reported in the D4RT paper and appendix.

📄 License

OpenD4RT is an unofficial implementation and is not affiliated with or endorsed by the original D4RT authors. The code in this repository is released under the Apache 2.0 license; see LICENSE. The D4RT paper, project page, datasets, third-party assets, and upstream dependencies remain under their respective licenses and terms.

🙏 Acknowledgements

This project is built upon the D4RT paper and official project materials. We thank the original D4RT authors for introducing the D4RT formulation, releasing the project page, and documenting the paper and appendix details that this implementation follows. We also acknowledge the contributors and resources credited on the official D4RT website, including colleagues who supported project advice, manuscript feedback, early development, code review, visualization, baseline comparisons, and data generation. We also thank the splat viewer authors for the WebGL renderer used by the official D4RT visualization pipeline. Please refer to the official D4RT project page for the full original acknowledgements.
