DepthMaster: Unified Monocular Depth Estimation for Perspective and Panoramic Images.
Pengfei Wang1 | Shihao Wang1 | Liyi Chen1 | Zhiyuan Ma1 | Guowen Zhang1 | Lei Zhang1,†
1 The Hong Kong Polytechnic University
† Corresponding author.
- 2026-06: Released the paper on arXiv, the project page, the code repository, and the pretrained checkpoint on Hugging Face.
- TBA: Hugging Face demo.
- News
- Overview
- Installation
- Quick Start
- Data Preparation
- Training
- Evaluation
- Repository Layout
- Acknowledgments
- Citation
- Contact
DepthMaster is a unified monocular depth estimator that works on both perspective images and equirectangular panoramas. The same network backbone is applied to both modalities: panoramic inputs are projected to a 6-face cubemap on the GPU, processed jointly with the perspective branch, and the predictions are re-projected back onto the sphere with seam blending.
Key features:
- Single model, two modalities. Perspective + equirectangular panorama.
- Metric depth output. Recovers depth in meters as well as scale-/affine-invariant predictions.
- State-of-the-art accuracy on both fronts. DepthMaster achieves top results on standard perspective benchmarks (NYUv2, KITTI, ETH3D, iBims-1, Sintel, DDAD, DIODE, Spring, HAMMER, GSO) and on panoramic benchmarks (Stanford2D3DS, Matterport3D, PanoSUNCG).
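The cubemap projection mentioned above can be illustrated with a small NumPy sketch. It computes, for every pixel of one cubemap face, the equirectangular (ERP) pixel it samples from. This is not the repository's GPU implementation: the function name `face_to_erp_pixels`, the `FACE_AXES` table, and the axis convention (x right, y down, z forward, longitude 0 at the panorama center) are assumptions chosen for illustration only.

```python
import numpy as np

# Hypothetical face table: (forward, right, down) unit vectors per face.
# Convention assumed here: x right, y down, z forward.
FACE_AXES = {
    "front": ((0, 0, 1), (1, 0, 0), (0, 1, 0)),
    "right": ((1, 0, 0), (0, 0, -1), (0, 1, 0)),
    "back": ((0, 0, -1), (-1, 0, 0), (0, 1, 0)),
    "left": ((-1, 0, 0), (0, 0, 1), (0, 1, 0)),
    "up": ((0, -1, 0), (1, 0, 0), (0, 0, 1)),
    "down": ((0, 1, 0), (1, 0, 0), (0, 0, -1)),
}


def face_to_erp_pixels(face, face_w, fov_deg, pano_h, pano_w):
    """Return (u, v) ERP pixel coordinates sampled by each pixel of one face."""
    forward, right, down = (np.array(a, dtype=np.float64) for a in FACE_AXES[face])
    # Half extent of the image plane at unit distance from the camera.
    half = np.tan(np.radians(fov_deg) / 2.0)
    coords = ((np.arange(face_w) + 0.5) / face_w * 2.0 - 1.0) * half
    x, y = np.meshgrid(coords, coords)  # x varies along columns, y along rows
    # Ray direction through every face pixel, normalized to unit length.
    dirs = forward + x[..., None] * right + y[..., None] * down
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    # Unit direction -> longitude / latitude -> ERP pixel coordinates.
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])
    lat = np.arcsin(np.clip(-dirs[..., 1], -1.0, 1.0))
    u = ((lon / (2.0 * np.pi) + 0.5) * pano_w) % pano_w
    v = (0.5 - lat / np.pi) * pano_h
    return u, v
```

With a 90 degree field of view the six faces tile the sphere exactly; any larger value makes neighbouring faces overlap, which is what allows predictions to be blended across face boundaries.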
DepthMaster has been tested on Linux with Python 3.10, PyTorch 2.4.0 and CUDA 12.1.
# 1. Create a fresh environment.
conda create -n depthmaster python=3.10 -y
conda activate depthmaster
# 2. Install PyTorch (example: CUDA 12.1; adjust the index URL for your CUDA version).
pip install torch==2.4.0 torchvision --index-url https://download.pytorch.org/whl/cu121
# 3. Install the rest of the dependencies.
pip install -r requirements.txt
# 4. IMPORTANT: install the exact `utils3d` commit pinned by DepthMaster.
# A different commit can break geometry / panorama utilities at runtime.
pip install --force-reinstall \
    git+https://github.com/EasternJournalist/utils3d.git@3fab839f0be9931dac7c8488eb0e1600c236e183
If you only need inference (e.g. to run the Quick Start examples below), the steps above are sufficient. Training additionally relies on the data preparation step described in Data Preparation.
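To confirm that the environment matches the tested setup, a short check along the following lines may help. It is a sketch, not part of the repository: the helper name `find_version_mismatches` is hypothetical, and only the PyTorch version stated above is checked.

```python
from importlib import metadata

# Version this README reports as tested; extend the mapping as needed.
EXPECTED = {"torch": "2.4.0"}


def find_version_mismatches(expected):
    """Return {package: (expected_prefix, installed_or_None)} for each mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        # Prefix match so that local build tags such as "2.4.0+cu121" pass.
        if installed is None or not installed.startswith(wanted):
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_version_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed}")
```

An empty result means every listed package matches; anything printed points at a package to reinstall.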
The official DepthMaster checkpoint is hosted on Hugging Face at VCLab-PolyU/DepthMaster. Download it once and reuse the local path everywhere below:
# Option 1: huggingface-cli (recommended).
huggingface-cli login # one-off, only if not logged in already
huggingface-cli download VCLab-PolyU/DepthMaster depthmaster.pt \
--local-dir checkpoints --local-dir-use-symlinks False
# Option 2: direct wget.
mkdir -p checkpoints
wget -O checkpoints/depthmaster.pt \
    https://huggingface.co/VCLab-PolyU/DepthMaster/resolve/main/depthmaster.pt
After either command finishes, the checkpoint is available at checkpoints/depthmaster.pt. Use this path for the --pretrained argument in Evaluation and for the from_pretrained(...) calls in Quick Start.
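The same download can also be scripted from Python. The sketch below reuses a local copy when one exists and otherwise falls back to `hf_hub_download` from the `huggingface_hub` package. The helper name `ensure_checkpoint` is hypothetical, and the repository id and filename are simply the ones quoted above.

```python
from pathlib import Path


def ensure_checkpoint(local_dir="checkpoints", filename="depthmaster.pt",
                      repo_id="VCLab-PolyU/DepthMaster"):
    """Return the local checkpoint path, downloading it only when it is missing."""
    path = Path(local_dir) / filename
    if path.is_file() and path.stat().st_size > 0:
        return path  # already downloaded, nothing to do
    # Imported lazily so the helper works offline once the file is cached.
    from huggingface_hub import hf_hub_download
    return Path(hf_hub_download(repo_id=repo_id, filename=filename,
                                local_dir=local_dir))
```

Calling `ensure_checkpoint()` from the repository root should then yield a path that can be passed to `from_pretrained(...)`.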
Perspective image inference
import torch
import cv2
import numpy as np
from depthmaster.model import DepthMasterModel
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# 1. Load a perspective image as a (3, H, W) float tensor in [0, 1].
img_bgr = cv2.imread("path/to/image.jpg")
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
img = torch.from_numpy(img_rgb).permute(2, 0, 1).float().to(device) / 255.0
# 2. Load the model.
model = DepthMasterModel.from_pretrained("path/to/depthmaster.pt").to(device).eval()
# 3. Run inference. `fov_x` is an optional argument; it is omitted here so the model predicts its own FOV.
with torch.inference_mode():
    output = model.infer(img, apply_mask=True, use_fp16=True)
depth = output["depth"] # (H, W) metric depth in meters
points = output["points"] # (H, W, 3) metric point map
intrinsics = output["intrinsics"] # (3, 3) normalized intrinsics
mask = output["mask"] # (H, W) bool
# 4. Save a colorized depth visualization.
from depthmaster.utils.vis import colorize_depth
depth_np = np.where(mask.cpu().numpy(), depth.cpu().numpy(), np.inf)
cv2.imwrite("depth.png", cv2.cvtColor(colorize_depth(depth_np), cv2.COLOR_RGB2BGR))
Panoramic (equirectangular) inference
import sys, torch, cv2
from depthmaster.model import DepthMasterModel
sys.path.insert(0, "eval_panorama") # so that the helpers below are importable
from eval_panorama.eval import (
    erp_to_cubemap_gpu,
    cubemap_to_erp_gpu,
    _get_camera_params_cached,
)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# 1. Load an equirectangular panorama as a (1, 3, H, 2H) float tensor in [0, 255].
pano_bgr = cv2.imread("path/to/panorama.jpg")
pano_rgb = cv2.cvtColor(pano_bgr, cv2.COLOR_BGR2RGB)
pano = torch.from_numpy(pano_rgb).permute(2, 0, 1).float()[None].to(device)
H, W = pano.shape[-2:]
# 2. Load the model.
model = DepthMasterModel.from_pretrained("path/to/depthmaster.pt").to(device).eval()
# 3. ERP -> Cubemap (FOV = 95 deg, face resolution 518).
cubemap_size, fov_deg = 518, 95.0
faces = erp_to_cubemap_gpu(pano / 255.0, face_w=cubemap_size, fov_deg=fov_deg)
# 4. Run DepthMaster on the 6 cubemap faces jointly.
W2C, K = _get_camera_params_cached(fov_deg, cubemap_size, device, torch.float32)
with torch.inference_mode(), torch.autocast(device_type=device.type, dtype=torch.float16):
    raw = model.forward(
        faces.to(model.dtype),
        num_tokens=0,
        camera_type="Panorama",
        W2C=W2C[None].to(model.dtype),
        intrinsics=K[None].to(model.dtype),
    )
points = raw["pts3d"].float() * raw["metric_scale"].float()[:, None, None, None]
points = points.squeeze(0) # (6, H, W, 3)
depth_faces = torch.sqrt((points * points).sum(dim=-1)) # (6, H, W) range depth
# 5. Cubemap -> ERP (with soft-blending to remove face seams).
erp_depth = cubemap_to_erp_gpu(depth_faces, pano_h=H, pano_w=W, fov_deg=fov_deg)
erp_depth = erp_depth.clamp_min(1e-6).cpu().numpy() # (H, W)
For full evaluation pipelines (with alignment, metrics and visualization), use bash eval_perspective.sh or bash eval_panorama.sh; see Evaluation.
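The perspective example returns z-depth together with normalized intrinsics, while the panoramic example works with range depth (distance along the ray). The NumPy sketch below shows how the two relate by unprojecting a depth map into a point map. It assumes that "normalized" means the first row of the intrinsics is divided by the image width and the second row by the image height, and that pixel centers sit at half-integer coordinates. Both are assumptions to verify against the repository, and the helper names are hypothetical.

```python
import numpy as np


def denormalize_intrinsics(k_norm, width, height):
    """Scale normalized intrinsics to pixel units (assumed convention, see above)."""
    k = np.array(k_norm, dtype=np.float64)
    k[0, :] *= width   # fx, skew, cx
    k[1, :] *= height  # fy, cy
    return k


def unproject_depth(depth, k_pix):
    """Lift an (H, W) z-depth map to an (H, W, 3) point map in camera space."""
    h, w = depth.shape
    # Pixel centers at half-integer coordinates (assumption).
    u, v = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)
    x = (u - k_pix[0, 2]) / k_pix[0, 0] * depth
    y = (v - k_pix[1, 2]) / k_pix[1, 1] * depth
    return np.stack([x, y, depth], axis=-1)


def range_from_points(points):
    """Euclidean distance from the camera center to each 3D point."""
    return np.linalg.norm(points, axis=-1)
```

Range depth is never smaller than z-depth, and the two agree only along the optical axis, so metrics computed on one should not be compared directly with metrics computed on the other.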
All paths inside the configuration files are placeholders of the form
<DATA_ROOT>/<dataset>. Replace them with the actual locations of the
datasets on your machine.
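Replacing the placeholders by hand is error-prone when a config lists many datasets. The following sketch substitutes a placeholder string in every string value of a JSON file. It assumes that the placeholder appears literally as `<DATA_ROOT>` inside the JSON values; the helper names are hypothetical and not part of the repository.

```python
import json
from pathlib import Path


def replace_placeholder(node, placeholder, value):
    """Recursively replace `placeholder` in every string of a JSON-like tree."""
    if isinstance(node, str):
        return node.replace(placeholder, value)
    if isinstance(node, list):
        return [replace_placeholder(item, placeholder, value) for item in node]
    if isinstance(node, dict):
        return {key: replace_placeholder(item, placeholder, value)
                for key, item in node.items()}
    return node  # numbers, booleans and null are left untouched


def fill_config(config_path, data_root, placeholder="<DATA_ROOT>"):
    """Rewrite a JSON config in place with the placeholder substituted."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    filled = replace_placeholder(config, placeholder, str(data_root))
    path.write_text(json.dumps(filled, indent=2))
```

For example, `fill_config("configs/eval/all_benchmarks.json", "/mnt/datasets")` would rewrite that file in place, so keeping a backup or committing first is advisable.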
Training data
DepthMaster is trained on a mixture of perspective and panoramic datasets. You can either:
- Use a public mixture. Most academic depth datasets (Hypersim, BlendedMVS, ARKitScenes, Structured3D, Matterport3D, ...) are already covered by the data preparation scripts shipped with CUT3R and depth-anything-3. Once a dataset has been processed by either of those repositories, simply point a new entry in configs/train/depthmaster_train.json at the resulting directory.
- Bring your own data for finetuning. A custom dataset only has to produce (image, depth, intrinsics) triplets (or, for panoramas, ERP (image, depth) pairs). Drop the assets under any folder, register a reader in depthmaster/train/dataset_readers.py following the existing load_hypersim / load_structured3d patterns, and add an entry to configs/train/depthmaster_train.json. No further changes to the training loop are required.
The released configuration ships with a minimal mixture (Hypersim + Structured3D) that exercises both modalities and is sufficient as a starting point for finetuning.
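As a rough illustration of what a custom reader has to deliver, the sketch below iterates over a folder and yields validated (image, depth, intrinsics) triplets. Everything about it is hypothetical: the on-disk layout (one `.npz` file per sample with `image`, `depth` and `intrinsics` arrays), the `Sample` type and the function name. The actual reader signature should be copied from the existing `load_hypersim` / `load_structured3d` functions.

```python
from pathlib import Path
from typing import Iterator, NamedTuple

import numpy as np


class Sample(NamedTuple):
    image: np.ndarray       # (H, W, 3) uint8 RGB
    depth: np.ndarray       # (H, W) float32 depth
    intrinsics: np.ndarray  # (3, 3) float32 camera matrix


def iter_samples(root) -> Iterator[Sample]:
    """Yield samples from a folder of .npz files (hypothetical storage layout)."""
    for npz_path in sorted(Path(root).glob("*.npz")):
        with np.load(npz_path) as data:
            image = np.asarray(data["image"], dtype=np.uint8)
            depth = np.asarray(data["depth"], dtype=np.float32)
            intrinsics = np.asarray(data["intrinsics"], dtype=np.float32)
        # Basic shape checks catch most data preparation mistakes early.
        if image.shape[:2] != depth.shape or image.shape[2:] != (3,):
            raise ValueError(f"{npz_path}: image and depth shapes do not match")
        if intrinsics.shape != (3, 3):
            raise ValueError(f"{npz_path}: intrinsics must be a 3x3 matrix")
        yield Sample(image, depth, intrinsics)
```

For panoramas, the same pattern applies with the intrinsics dropped, since an ERP image needs only (image, depth) pairs.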
Perspective benchmark data
Our perspective evaluation re-uses the unified benchmark suite released with MoGe. Download the processed datasets from Hugging Face Datasets:
mkdir -p data/eval
huggingface-cli login # one-off, only if not logged in already
huggingface-cli download Ruicheng/monocular-geometry-evaluation \
--repo-type dataset \
--local-dir data/eval \
--local-dir-use-symlinks False
# Unzip every benchmark.
cd data/eval
unzip '*.zip'
# rm *.zip # optional: drop archives after extraction
Then edit configs/eval/all_benchmarks.json so that the path field of each benchmark points to the corresponding unzipped directory under data/eval/.
Panoramic benchmark data
Our panoramic evaluation follows the protocol of DA-2. Download the processed panoramic benchmarks from Hugging Face Datasets:
mkdir -p data/eval_panorama
huggingface-cli login # one-off, only if not logged in already
hf download --repo-type dataset haodongli/DA-2-Evaluation \
--local-dir data/eval_panorama
# Unzip every benchmark archive.
cd data/eval_panorama
for f in *.tar.gz; do tar -zxvf "$f"; done
Then update evaluation.datasets_dir in eval_panorama/configs/eval_panorama.json so that it points to your local data/eval_panorama directory. The split files used during evaluation are bundled under eval_panorama/eval/datasets/splits/.
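The `evaluation.datasets_dir` entry can also be set programmatically, which is convenient on shared clusters where the data location changes per machine. The sketch below assumes that the dotted name maps onto nested JSON objects (`{"evaluation": {"datasets_dir": ...}}`); the helper names are hypothetical.

```python
import json
from pathlib import Path


def set_nested(config, dotted_key, value):
    """Set config["a"]["b"] = value for dotted_key "a.b", creating levels as needed."""
    keys = dotted_key.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return config


def update_json_file(config_path, dotted_key, value):
    """Load a JSON file, set one nested key, and write the file back."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    path.write_text(json.dumps(set_nested(config, dotted_key, value), indent=2))
```

A call such as `update_json_file("eval_panorama/configs/eval_panorama.json", "evaluation.datasets_dir", "data/eval_panorama")` leaves every other field of the config untouched.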
# Single-node, 8 GPUs.
bash train.sh 0 1 127.0.0.1
# Multi-node training (e.g. 2 nodes):
# on the master node:
bash train.sh 0 2 192.168.1.1
# on the worker node:
bash train.sh 1 2 192.168.1.1
# You can override any Hydra field on the command line:
bash train.sh 0 1 127.0.0.1 trainer.max_steps=200000 paths.root_dir=/path/to/output
The default training schedule lives in training/configs/train/depthmaster.yaml (250k steps, cosine decay, 1.5k warmup), which matches the schedule used to produce the numbers reported in the paper. To warm-start from an existing checkpoint, set wrapper.pretrained=/path/to/ckpt.pt on the command line, or edit training/configs/wrapper/depthmaster.yaml.
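For readers who want to reason about the schedule when shortening a run, the shape described above can be sketched in a few lines. The step counts come from this README; the linear warmup, the decay to `min_lr=0.0` and the `base_lr` argument are assumptions, since the exact scheduler is defined in the Hydra config rather than here.

```python
import math


def lr_at_step(step, base_lr, warmup_steps=1_500, max_steps=250_000, min_lr=0.0):
    """Learning rate with linear warmup followed by cosine decay (assumed shape)."""
    if step < warmup_steps:
        # Linear ramp that reaches base_lr on the last warmup step.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, max_steps - warmup_steps))
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

When `trainer.max_steps` is overridden on the command line, passing the same value as `max_steps` here shows how the decay is compressed.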
bash eval_perspective.sh /path/to/depthmaster.pt output/eval_perspective.json
This internally runs:
python depthmaster/scripts/eval_baseline.py \
--baseline baselines/depthmaster.py \
--config configs/eval/all_benchmarks.json \
--output output/eval_perspective.json \
--pretrained /path/to/depthmaster.pt \
--resolution_level 9 \
    --fp16
Pass --save_per_sample to additionally export per-sample metrics under output/eval_perspective_per_sample/<benchmark>.json.
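When several checkpoints need to be evaluated, it can be convenient to assemble the command above from Python. The sketch below only reuses the flags listed in this section; the helper name `build_eval_command` is hypothetical.

```python
def build_eval_command(pretrained, output, save_per_sample=False):
    """Build the argv list for the perspective evaluation command shown above."""
    cmd = [
        "python", "depthmaster/scripts/eval_baseline.py",
        "--baseline", "baselines/depthmaster.py",
        "--config", "configs/eval/all_benchmarks.json",
        "--output", str(output),
        "--pretrained", str(pretrained),
        "--resolution_level", "9",
        "--fp16",
    ]
    if save_per_sample:
        cmd.append("--save_per_sample")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root, once per checkpoint.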
bash eval_panorama.sh /path/to/depthmaster.pt output/eval_panorama
The panoramic evaluator first renders each ERP image into a 6-face cubemap (FOV = 95°, face resolution 518×518), runs DepthMaster on each face, and re-projects the predictions back onto the sphere before computing scale-invariant and affine-invariant metrics. The default alignment list is ["dm_scale", "dm_affine"] (configurable in eval_panorama/configs/eval_panorama.json).
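To make the difference between the two alignment modes concrete, the following NumPy sketch fits a scale (and optionally a shift) by ordinary least squares before computing an error. It is a stand-in for illustration: the repository's `dm_scale` and `dm_affine` alignments may use a different (for example, more robust) estimator, and `abs_rel` is included here only as an example metric.

```python
import numpy as np


def align_scale(pred, gt, mask):
    """Scale-only alignment: find s minimizing ||s * pred - gt||^2 on valid pixels."""
    p, g = pred[mask], gt[mask]
    scale = float(np.dot(p, g) / np.dot(p, p))
    return scale * pred


def align_affine(pred, gt, mask):
    """Affine alignment: find (s, t) minimizing ||s * pred + t - gt||^2."""
    p, g = pred[mask], gt[mask]
    design = np.stack([p, np.ones_like(p)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(design, g, rcond=None)
    return scale * pred + shift


def abs_rel(pred, gt, mask):
    """Mean absolute relative error over valid pixels."""
    return float(np.mean(np.abs(pred[mask] - gt[mask]) / gt[mask]))
```

A prediction that differs from the ground truth only by a global scale scores perfectly after `align_scale`, whereas a prediction that also carries a constant offset needs `align_affine`.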
DepthMaster/
├── depthmaster/                     # Core package: model, training utilities, alignment,
│   │                                # panorama helpers, dataset readers, ...
│   ├── model/model.py               # DepthMaster model definition
│   ├── train/                       # Loss functions, equirect <-> cubemap utilities, ...
│   ├── test/                        # Evaluation interfaces (baseline / dataloader / metrics)
│   ├── utils/                       # Alignment, geometry, panorama helpers, ...
│   └── scripts/eval_baseline.py     # Perspective evaluation entry point
├── baselines/depthmaster.py         # DepthMaster wrapper for `eval_baseline.py`
├── configs/
│   ├── train/depthmaster_train.json # Dataset mixture used during training
│   └── eval/all_benchmarks.json     # Perspective benchmark definitions
├── training/                        # PyTorch Lightning training framework
│   ├── launch.py
│   ├── wrapper.py                   # Lightning wrapper around DepthMasterModel
│   ├── data/datamodule.py
│   └── configs/                     # Hydra configs (train / wrapper / data / paths)
├── eval_panorama/                   # Panoramic evaluation
│   ├── eval.py                      # ERP -> Cubemap -> ERP evaluation pipeline (GPU)
│   ├── eval/                        # Datasets, alignment, metrics
│   └── configs/eval_panorama.json
├── train.sh                         # Multi-node DDP training launcher
├── eval_perspective.sh              # Perspective benchmark launcher
├── eval_panorama.sh                 # Panoramic benchmark launcher
├── assets/teaser.png                # Teaser image used in this README
├── requirements.txt
└── LICENSE
This project builds on top of several outstanding open-source efforts. We sincerely thank the authors and maintainers of:
- MoGe: geometry-aware monocular depth/point estimator (training infrastructure and benchmark suite).
- DA-2: panoramic evaluation pipeline reference and panoramic benchmark release.
- CUT3R: academic dataset preparation pipeline.
- depth-anything-3: academic dataset preparation pipeline.
If you find DepthMaster useful for your research, please consider citing:
@article{wang2026depthmaster,
title = {DepthMaster: A Unified Perspective and Panoramic Monocular Depth Estimator},
author = {Wang, Pengfei and Wang, Shihao and Chen, Liyi and Ma, Zhiyuan and Zhang, Guowen and Zhang, Lei},
journal = {arXiv preprint arXiv:2606.12368},
year = {2026},
eprint = {2606.12368},
archivePrefix = {arXiv},
primaryClass = {cs.CV}
}
If you have any questions or suggestions, please feel free to open an issue or contact pengfei.wang@connect.polyu.hk.
This project is released under the MIT License.
