
PolyU-VCLab/DepthMaster


DepthMaster

DepthMaster: Unified Monocular Depth Estimation for Perspective and Panoramic Images.


Pengfei Wang1 | Shihao Wang1 | Liyi Chen1 | Zhiyuan Ma1 | Guowen Zhang1 | Lei Zhang1,†

1 The Hong Kong Polytechnic University

† Corresponding author.

DepthMaster teaser

📰 News

  • 2026-06: Released the paper on arXiv, the project page, the code repository, and the pretrained checkpoint on Hugging Face.
  • TBA: Hugging Face demo.



🔍 Overview

DepthMaster is a unified monocular depth estimator that works on both perspective images and equirectangular panoramas. The same network is applied to both modalities: panoramic inputs are projected onto a 6-face cubemap on the GPU, the six faces are processed jointly by the same backbone used for perspective images, and the per-face predictions are re-projected back onto the sphere with soft blending to suppress face seams.
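To make the ERP-to-cubemap step concrete, here is a minimal NumPy sketch of the geometry behind it: each equirectangular pixel is mapped to a unit ray, and the ray is assigned to the cube face whose axis dominates. The axis convention (x right, y down, z forward), the face ordering and the helper names are assumptions for illustration, not DepthMaster's `erp_to_cubemap_gpu`. The sketch also uses ideal 90° faces, whereas DepthMaster's faces span 95° and therefore overlap.

```python
import numpy as np

# Hypothetical face ordering used only by this sketch.
FACE_NAMES = ["+x", "-x", "+y", "-y", "+z", "-z"]


def erp_directions(height: int, width: int) -> np.ndarray:
    """Unit ray direction for every ERP pixel centre, shape (H, W, 3).

    Assumed convention: x right, y down, z forward; longitude spans
    [-pi, pi) left to right, latitude spans [pi/2, -pi/2] top to bottom.
    """
    u = (np.arange(width) + 0.5) / width    # horizontal pixel centres in (0, 1)
    v = (np.arange(height) + 0.5) / height  # vertical pixel centres in (0, 1)
    lon = (u - 0.5) * 2.0 * np.pi           # longitude in radians
    lat = (0.5 - v) * np.pi                 # latitude in radians (top row is positive)
    lon, lat = np.meshgrid(lon, lat)        # both (H, W)
    return np.stack(
        [np.cos(lat) * np.sin(lon), -np.sin(lat), np.cos(lat) * np.cos(lon)],
        axis=-1,
    )


def dominant_face(dirs: np.ndarray) -> np.ndarray:
    """Index into FACE_NAMES of the 90-degree cube face each direction hits."""
    axis = np.abs(dirs).argmax(axis=-1)  # 0 = x, 1 = y, 2 = z
    component = np.take_along_axis(dirs, axis[..., None], axis=-1)[..., 0]
    return axis * 2 + (component < 0).astype(int)  # even = positive, odd = negative
```

With faces wider than 90°, a ray close to a face border falls inside two faces, which is what gives the re-projection step room to blend across seams.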

Key features:

  • ✅ Single model, two modalities. Perspective + equirectangular panorama.
  • ✅ Metric depth output. Recovers depth in meters as well as scale-/affine-invariant predictions.
  • 🏆 State-of-the-art accuracy on both fronts. DepthMaster achieves top results on standard perspective benchmarks (NYUv2, KITTI, ETH3D, iBims-1, Sintel, DDAD, DIODE, Spring, HAMMER, GSO) and on panoramic benchmarks (Stanford2D3DS, Matterport3D, PanoSUNCG).

🔧 Installation

DepthMaster has been tested on Linux with Python 3.10, PyTorch 2.4.0 and CUDA 12.1.

# 1. Create a fresh environment.
conda create -n depthmaster python=3.10 -y
conda activate depthmaster

# 2. Install PyTorch (example: CUDA 12.1; adjust the index URL for your CUDA version).
pip install torch==2.4.0 torchvision --index-url https://download.pytorch.org/whl/cu121

# 3. Install the rest of the dependencies.
pip install -r requirements.txt

# 4. IMPORTANT: install the exact `utils3d` commit pinned by DepthMaster.
#    A different commit can break geometry / panorama utilities at runtime.
pip install --force-reinstall \
    git+https://github.com/EasternJournalist/utils3d.git@3fab839f0be9931dac7c8488eb0e1600c236e183

If you only need inference (e.g. to run the Quick Start examples below), the steps above are sufficient. Training additionally relies on the data preparation step described in Data Preparation.

Pretrained checkpoint

The official DepthMaster checkpoint is hosted on Hugging Face at VCLab-PolyU/DepthMaster. Download it once and reuse the local path everywhere below:

# Option 1: huggingface-cli (recommended).
huggingface-cli login    # one-off, only if not logged in already
huggingface-cli download VCLab-PolyU/DepthMaster depthmaster.pt \
    --local-dir checkpoints --local-dir-use-symlinks False

# Option 2: direct wget.
mkdir -p checkpoints
wget -O checkpoints/depthmaster.pt \
    https://huggingface.co/VCLab-PolyU/DepthMaster/resolve/main/depthmaster.pt

With either option, the checkpoint ends up at checkpoints/depthmaster.pt. Use this path for the --pretrained argument in Evaluation and for the from_pretrained(...) calls in Quick Start (written there as path/to/depthmaster.pt).
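If you prefer to fetch the checkpoint from Python, the `huggingface_hub` library's `hf_hub_download` function does the same job as Option 1. The wrapper below is a hypothetical convenience helper (`fetch_checkpoint` is not part of DepthMaster) that reuses the file when it is already present in the target directory.

```python
from pathlib import Path

REPO_ID = "VCLab-PolyU/DepthMaster"
FILENAME = "depthmaster.pt"


def fetch_checkpoint(local_dir: str = "checkpoints") -> Path:
    """Return the local checkpoint path, downloading it only if it is missing."""
    target = Path(local_dir) / FILENAME
    if target.is_file():
        return target  # reuse an earlier download
    # Imported lazily so this helper can be imported without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return Path(hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=local_dir))
```

Calling `fetch_checkpoint()` from the repository root places the file at `checkpoints/depthmaster.pt`, the same location the shell commands use.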


🚀 Quick Start

Perspective image inference
import torch
import cv2
import numpy as np
from depthmaster.model import DepthMasterModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# 1. Load a perspective image as a (3, H, W) float tensor in [0, 1].
img_bgr = cv2.imread("path/to/image.jpg")
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
img = torch.from_numpy(img_rgb).permute(2, 0, 1).float().to(device) / 255.0

# 2. Load the model.
model = DepthMasterModel.from_pretrained("path/to/depthmaster.pt").to(device).eval()

# 3. Run inference. `fov_x` is optional; omit it to let the model predict its own FOV.
with torch.inference_mode():
    output = model.infer(img, apply_mask=True, use_fp16=True)

depth      = output["depth"]       # (H, W) metric depth in meters
points     = output["points"]      # (H, W, 3) metric point map
intrinsics = output["intrinsics"]  # (3, 3) normalized intrinsics
mask       = output["mask"]        # (H, W) bool

# 4. Save a colorized depth visualization.
from depthmaster.utils.vis import colorize_depth
depth_np = np.where(mask.cpu().numpy(), depth.cpu().numpy(), np.inf)
cv2.imwrite("depth.png", cv2.cvtColor(colorize_depth(depth_np), cv2.COLOR_RGB2BGR))
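The model already returns `points`, but if you need to lift a depth map to 3D yourself (for example after editing or filtering it), the sketch below back-projects it through the intrinsics. It assumes that "normalized intrinsics" means pixel centres are expressed as fractions of the image width and height, so the principal point is typically near (0.5, 0.5); this convention is an assumption, and `unproject` is a hypothetical helper, so compare its result with `output["points"]` before relying on it.

```python
import numpy as np


def unproject(depth: np.ndarray, K_norm: np.ndarray) -> np.ndarray:
    """Lift an (H, W) planar depth map to an (H, W, 3) point map.

    Assumes K_norm acts on normalised pixel coordinates
    u = (col + 0.5) / W and v = (row + 0.5) / H.
    """
    H, W = depth.shape
    u = (np.arange(W) + 0.5) / W
    v = (np.arange(H) + 0.5) / H
    u, v = np.meshgrid(u, v)                          # (H, W) each
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)  # homogeneous pixel coordinates
    rays = pix @ np.linalg.inv(K_norm).T              # (H, W, 3) with z-component 1
    return rays * depth[..., None]                    # scale each ray by its depth
```

For the Quick Start outputs this would be called as `unproject(depth.cpu().numpy(), intrinsics.cpu().numpy())`.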
Panoramic (equirectangular) inference
import sys, torch, cv2
from depthmaster.model import DepthMasterModel
sys.path.insert(0, "eval_panorama")  # so that the helpers below are importable
from eval_panorama.eval import (
    erp_to_cubemap_gpu,
    cubemap_to_erp_gpu,
    _get_camera_params_cached,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# 1. Load an equirectangular panorama as a (1, 3, H, 2H) float tensor in [0, 255].
pano_bgr = cv2.imread("path/to/panorama.jpg")
pano_rgb = cv2.cvtColor(pano_bgr, cv2.COLOR_BGR2RGB)
pano = torch.from_numpy(pano_rgb).permute(2, 0, 1).float()[None].to(device)
H, W = pano.shape[-2:]

# 2. Load the model.
model = DepthMasterModel.from_pretrained("path/to/depthmaster.pt").to(device).eval()

# 3. ERP -> Cubemap (FOV = 95 deg, face resolution 518).
cubemap_size, fov_deg = 518, 95.0
faces = erp_to_cubemap_gpu(pano / 255.0, face_w=cubemap_size, fov_deg=fov_deg)

# 4. Run DepthMaster on the 6 cubemap faces jointly.
W2C, K = _get_camera_params_cached(fov_deg, cubemap_size, device, torch.float32)
with torch.inference_mode(), torch.autocast(device_type=device.type, dtype=torch.float16):
    raw = model.forward(
        faces.to(model.dtype),
        num_tokens=0,
        camera_type="Panorama",
        W2C=W2C[None].to(model.dtype),
        intrinsics=K[None].to(model.dtype),
    )

points = raw["pts3d"].float() * raw["metric_scale"].float()[:, None, None, None]
points = points.squeeze(0)                                       # (6, H, W, 3)
depth_faces = torch.sqrt((points * points).sum(dim=-1))          # (6, H, W) range depth

# 5. Cubemap -> ERP (with soft-blending to remove face seams).
erp_depth = cubemap_to_erp_gpu(depth_faces, pano_h=H, pano_w=W, fov_deg=fov_deg)
erp_depth = erp_depth.clamp_min(1e-6).cpu().numpy()              # (H, W)
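Step 4 turns the per-face point maps into range depth (the Euclidean distance along each ray) rather than planar z-depth, because range depth is what an equirectangular depth map stores. For a square pinhole face the two differ by the factor ||(x_n, y_n, 1)||, where (x_n, y_n) are normalized image-plane coordinates. The sketch below assumes the principal point sits at the face centre; the helper names are hypothetical. With a 95° FOV the half-angle at the face border is 47.5°, so each face extends 2.5° beyond the edge of an ideal 90° face and neighbouring faces overlap.

```python
import numpy as np


def face_focal_px(face_size: int, fov_deg: float) -> float:
    """Pinhole focal length (in pixels) of a square face with the given FOV."""
    return (face_size / 2.0) / np.tan(np.radians(fov_deg) / 2.0)


def planar_to_range(z: np.ndarray, fov_deg: float) -> np.ndarray:
    """Convert planar z-depth on square pinhole faces (..., S, S) to range depth."""
    S = z.shape[-1]
    f = face_focal_px(S, fov_deg)
    c = (np.arange(S) + 0.5) - S / 2.0  # pixel-centre offsets from the principal point
    xn, yn = np.meshgrid(c / f, c / f)  # normalised image-plane coordinates, (S, S)
    return z * np.sqrt(1.0 + xn**2 + yn**2)
```

For the settings above, `face_focal_px(518, 95.0)` gives a focal length of roughly 237 pixels.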

For full evaluation pipelines (with alignment, metrics and visualization), use bash eval_perspective.sh or bash eval_panorama.sh; see Evaluation.


🗂️ Data Preparation

All paths inside the configuration files are placeholders of the form <DATA_ROOT>/<dataset>. Replace them with the actual locations of the datasets on your machine.
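If you prefer not to edit every path by hand, a small script can substitute the placeholder. The sketch assumes the placeholder appears literally as `<DATA_ROOT>` inside JSON string values, with `<dataset>` already spelled out as each dataset's folder name; `fill_data_root` and `rewrite_config` are hypothetical helpers, and the YAML configs under training/configs would need a YAML loader instead.

```python
import json
from pathlib import Path
from typing import Any


def fill_data_root(node: Any, data_root: str) -> Any:
    """Recursively replace the literal "<DATA_ROOT>" in every string of a JSON tree."""
    if isinstance(node, dict):
        return {k: fill_data_root(v, data_root) for k, v in node.items()}
    if isinstance(node, list):
        return [fill_data_root(v, data_root) for v in node]
    if isinstance(node, str):
        return node.replace("<DATA_ROOT>", data_root)
    return node  # numbers, booleans and null are left untouched


def rewrite_config(src: str, dst: str, data_root: str) -> None:
    """Write a copy of the JSON config at `src` with placeholders filled in to `dst`."""
    cfg = json.loads(Path(src).read_text())
    Path(dst).write_text(json.dumps(fill_data_root(cfg, data_root), indent=2))
```

Writing to a separate `dst` file keeps the original placeholder config intact for reference.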

Training data

DepthMaster is trained on a mixture of perspective and panoramic datasets. You can either:

  1. Use a public mixture. Most academic depth datasets (Hypersim, BlendedMVS, ARKitScenes, Structured3D, Matterport3D, ...) are already covered by the data preparation scripts shipped with CUT3R and depth-anything-3. Once a dataset is processed by either of those repositories, simply point a new entry in configs/train/depthmaster_train.json at the resulting directory.
  2. Bring your own data for finetuning. A custom dataset only has to produce (image, depth, intrinsics) triplets (or, for panoramas, ERP (image, depth) pairs). Drop the assets under any folder, register a reader in depthmaster/train/dataset_readers.py following the existing load_hypersim / load_structured3d patterns, and add an entry to configs/train/depthmaster_train.json. No further changes to the training loop are required.
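The first thing a custom reader usually has to do is pair each image with its depth map and intrinsics. The sketch below covers only that file-matching step for a hypothetical on-disk layout (`<stem>.jpg`, `<stem>_depth.npy`, `<stem>_K.npy`); it does not reproduce the signature or return format of `load_hypersim` or `load_structured3d`, so adapt its output to whatever those readers produce.

```python
from pathlib import Path
from typing import List, Tuple


def collect_triplets(root: str) -> List[Tuple[Path, Path, Path]]:
    """Pair each RGB image with its depth map and intrinsics file.

    Hypothetical layout: <stem>.jpg, <stem>_depth.npy, <stem>_K.npy in one folder.
    Images missing either companion file are skipped.
    """
    triplets = []
    for img in sorted(Path(root).glob("*.jpg")):
        depth = img.with_name(f"{img.stem}_depth.npy")
        K = img.with_name(f"{img.stem}_K.npy")
        if depth.is_file() and K.is_file():
            triplets.append((img, depth, K))
    return triplets
```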

The released configuration ships with a minimal mixture (Hypersim + Structured3D) that exercises both modalities and is sufficient as a starting point for finetuning.

Perspective benchmark data

Our perspective evaluation reuses the unified benchmark suite released with MoGe. Download the processed datasets from Hugging Face Datasets:

mkdir -p data/eval
huggingface-cli login    # one-off, only if not logged in already
huggingface-cli download Ruicheng/monocular-geometry-evaluation \
    --repo-type dataset \
    --local-dir data/eval \
    --local-dir-use-symlinks False

# Unzip every benchmark.
cd data/eval
unzip '*.zip'
# rm *.zip   # optional: drop archives after extraction

Then edit configs/eval/all_benchmarks.json so that the path field of each benchmark points to the corresponding unzipped directory under data/eval/.

Panoramic benchmark data

Our panoramic evaluation follows the protocol of DA-2. Download the processed panoramic benchmarks from Hugging Face Datasets:

mkdir -p data/eval_panorama
huggingface-cli login    # one-off, only if not logged in already
hf download --repo-type dataset haodongli/DA-2-Evaluation \
    --local-dir data/eval_panorama

# Unzip every benchmark archive.
cd data/eval_panorama
for f in *.tar.gz; do tar -zxvf "$f"; done

Then update evaluation.datasets_dir in eval_panorama/configs/eval_panorama.json so that it points to your local data/eval_panorama directory. The split files used during evaluation are bundled under eval_panorama/eval/datasets/splits/.
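This edit can also be scripted. The sketch assumes `evaluation.datasets_dir` denotes a `datasets_dir` key nested under a top-level `evaluation` object in the JSON file; `set_dotted` and `point_panorama_eval_at` are hypothetical helpers. Resolving to an absolute path avoids surprises when the evaluator is launched from a different working directory.

```python
import json
from pathlib import Path
from typing import Any


def set_dotted(cfg: dict, dotted_key: str, value: Any) -> dict:
    """Set cfg["a"]["b"] = value for dotted_key "a.b", creating missing dicts."""
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return cfg


def point_panorama_eval_at(config_path: str, datasets_dir: str) -> None:
    """Rewrite evaluation.datasets_dir in place with an absolute path."""
    path = Path(config_path)
    cfg = json.loads(path.read_text())
    set_dotted(cfg, "evaluation.datasets_dir", str(Path(datasets_dir).resolve()))
    path.write_text(json.dumps(cfg, indent=2))
```

Usage: `point_panorama_eval_at("eval_panorama/configs/eval_panorama.json", "data/eval_panorama")`.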


πŸ‹οΈ Training

# Single-node, 8 GPUs.
bash train.sh 0 1 127.0.0.1

# Multi-node training (e.g. 2 nodes):
#   on the master node:
bash train.sh 0 2 192.168.1.1
#   on the worker node:
bash train.sh 1 2 192.168.1.1

# You can override any Hydra field on the command line:
bash train.sh 0 1 127.0.0.1 trainer.max_steps=200000 paths.root_dir=/path/to/output
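Judging from the examples, the three positional arguments of `train.sh` are the node rank, the number of nodes and the master node's address; this reading is inferred, not taken from `train.sh` itself. Under the usual contiguous DDP layout with 8 GPUs per node, those values determine the world size and which global ranks each node owns, as the hypothetical helpers below compute.

```python
from typing import List


def world_size(num_nodes: int, gpus_per_node: int = 8) -> int:
    """Total number of DDP processes across all nodes."""
    return num_nodes * gpus_per_node


def global_ranks(node_rank: int, num_nodes: int, gpus_per_node: int = 8) -> List[int]:
    """Global ranks owned by one node, assuming ranks are assigned contiguously per node."""
    if not 0 <= node_rank < num_nodes:
        raise ValueError(f"node_rank must be in [0, {num_nodes}), got {node_rank}")
    start = node_rank * gpus_per_node
    return list(range(start, start + gpus_per_node))
```

For the two-node example, the master node (rank 0) would own global ranks 0 to 7 and the worker node ranks 8 to 15, for a world size of 16.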

The default training schedule lives in training/configs/train/depthmaster.yaml (250k steps, cosine decay, 1.5k warmup steps) and matches the schedule used to produce the numbers reported in the paper. To warm-start from an existing checkpoint, set wrapper.pretrained=/path/to/ckpt.pt on the command line or edit training/configs/wrapper/depthmaster.yaml.
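A common way to implement cosine decay with warmup is a per-step multiplier on the base learning rate. The sketch below assumes a linear warmup over 1.5k steps followed by a half-cosine decay to `min_ratio` at 250k steps; the exact warmup shape and final learning rate in DepthMaster's Hydra config are not shown in this README, so treat `lr_multiplier` as an illustration rather than the project's scheduler.

```python
import math


def lr_multiplier(step: int, total_steps: int = 250_000, warmup_steps: int = 1_500,
                  min_ratio: float = 0.0) -> float:
    """Linear warmup followed by cosine decay; returns a factor for the base LR."""
    if step < warmup_steps:
        return (step + 1) / warmup_steps  # ramps linearly up to 1.0
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1.0 down to 0.0
    return min_ratio + (1.0 - min_ratio) * cosine
```

In plain PyTorch, a function like this can be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR` and stepped once per optimizer step.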


📊 Evaluation

Perspective benchmarks

bash eval_perspective.sh /path/to/depthmaster.pt output/eval_perspective.json

This internally runs:

python depthmaster/scripts/eval_baseline.py \
    --baseline baselines/depthmaster.py \
    --config configs/eval/all_benchmarks.json \
    --output output/eval_perspective.json \
    --pretrained /path/to/depthmaster.pt \
    --resolution_level 9 \
    --fp16

Pass --save_per_sample to additionally export per-sample metrics under output/eval_perspective_per_sample/<benchmark>.json.
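For reference, two metrics that are standard in monocular depth evaluation are the absolute relative error (AbsRel) and the δ1 accuracy, the fraction of pixels with max(pred/gt, gt/pred) < 1.25. The sketch below computes both over a validity mask; it shows the usual definitions and is not necessarily the exact metric set or naming written by `eval_baseline.py`.

```python
import numpy as np


def depth_metrics(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray) -> dict:
    """AbsRel and delta < 1.25 accuracy over the valid pixels selected by `mask`."""
    p, g = pred[mask], gt[mask]
    abs_rel = float(np.mean(np.abs(p - g) / g))
    ratio = np.maximum(p / g, g / p)  # symmetric ratio, always >= 1
    delta1 = float(np.mean(ratio < 1.25))
    return {"abs_rel": abs_rel, "delta1": delta1}
```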

Panoramic benchmarks

bash eval_panorama.sh /path/to/depthmaster.pt output/eval_panorama

The panoramic evaluator first renders each ERP image into a 6-face cubemap (FOV = 95°, face resolution 518×518), runs DepthMaster jointly on the six faces, and re-projects the predictions back onto the sphere before computing scale-invariant and affine-invariant metrics. The default alignment list is ["dm_scale", "dm_affine"] (configurable in eval_panorama/configs/eval_panorama.json).
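Scale- and affine-invariant metrics are computed after fitting the prediction to the ground truth. A minimal least-squares version of the two fits is sketched below on flattened valid pixels; the actual `dm_scale` and `dm_affine` routines are not shown in this README and may use robust or weighted solvers, or operate in a different space (for example disparity), so treat this only as an illustration of the idea.

```python
from typing import Tuple

import numpy as np


def align_scale(pred: np.ndarray, gt: np.ndarray) -> float:
    """Least-squares s minimising ||s * pred - gt||^2 (closed form)."""
    return float(np.dot(pred, gt) / np.dot(pred, pred))


def align_affine(pred: np.ndarray, gt: np.ndarray) -> Tuple[float, float]:
    """Least-squares (s, t) minimising ||s * pred + t - gt||^2."""
    A = np.stack([pred, np.ones_like(pred)], axis=1)  # (N, 2) design matrix
    (s, t), *_ = np.linalg.lstsq(A, gt, rcond=None)
    return float(s), float(t)
```

The aligned prediction (`s * pred` or `s * pred + t`) is then compared with the ground truth using the usual depth metrics.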


📁 Repository Layout

DepthMaster/
├── depthmaster/                          # Core package: model, training utilities, alignment,
│   │                                     # panorama helpers, dataset readers, ...
│   ├── model/model.py                    # DepthMaster model definition
│   ├── train/                            # Loss functions, equirect <-> cubemap utilities, ...
│   ├── test/                             # Evaluation interfaces (baseline / dataloader / metrics)
│   ├── utils/                            # Alignment, geometry, panorama helpers, ...
│   └── scripts/eval_baseline.py          # Perspective evaluation entry point
├── baselines/depthmaster.py              # DepthMaster wrapper for `eval_baseline.py`
├── configs/
│   ├── train/depthmaster_train.json      # Dataset mixture used during training
│   └── eval/all_benchmarks.json          # Perspective benchmark definitions
├── training/                             # PyTorch Lightning training framework
│   ├── launch.py
│   ├── wrapper.py                        # Lightning wrapper around DepthMasterModel
│   ├── data/datamodule.py
│   └── configs/                          # Hydra configs (train / wrapper / data / paths)
├── eval_panorama/                        # Panoramic evaluation
│   ├── eval.py                           # ERP -> Cubemap -> ERP evaluation pipeline (GPU)
│   ├── eval/                             # Datasets, alignment, metrics
│   └── configs/eval_panorama.json
├── train.sh                              # Multi-node DDP training launcher
├── eval_perspective.sh                   # Perspective benchmark launcher
├── eval_panorama.sh                      # Panoramic benchmark launcher
├── assets/teaser.png                     # Teaser image used in this README
├── requirements.txt
└── LICENSE

🙏 Acknowledgments

This project builds on top of several outstanding open-source efforts. We sincerely thank the authors and maintainers of:

  • MoGe: geometry-aware monocular depth/point estimator (training infrastructure and benchmark suite).
  • DA-2: panoramic evaluation pipeline reference and panoramic benchmark release.
  • CUT3R: academic dataset preparation pipeline.
  • depth-anything-3: academic dataset preparation pipeline.

📚 Citation

If you find DepthMaster useful for your research, please consider citing:

@article{wang2026depthmaster,
  title         = {DepthMaster: A Unified Perspective and Panoramic Monocular Depth Estimator},
  author        = {Wang, Pengfei and Wang, Shihao and Chen, Liyi and Ma, Zhiyuan and Zhang, Guowen and Zhang, Lei},
  journal       = {arXiv preprint arXiv:2606.12368},
  year          = {2026},
  eprint        = {2606.12368},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}

📬 Contact

If you have any questions or suggestions, please feel free to open an issue or contact pengfei.wang@connect.polyu.hk.


License

This project is released under the MIT License.
