GemDepth: Geometry-Embedded Features for 3D-Consistent Video Depth

Yuecheng Liu¹, Junda Cheng¹†, Longliang Liu¹,², Wenjing Liao¹,², Hanrui Cheng¹,², Yuzhou Wang¹, Xin Yang¹,³

†Corresponding author
¹HUST, ²Carizon, ³Optics Valley Laboratory

If you like our project, please give us a star ⭐ on GitHub for the latest updates!

🤗 Demo Video

📢 News

[2026.05.18] 🤗🤗🤗 Evaluation datasets released on Hugging Face.
[2026.05.16] 🤗🤗🤗 Hugging Face Gradio demos released.
[2026.05.16] Added GPU memory adjustment schemes for inference and training.
[2026.05.15] 🤗🤗🤗 Pre-trained weights released on Hugging Face.
[2026.05.14] Added run_video_pointcloud for point cloud reconstruction.
[2026.05.09] 🔥🔥🔥 GemDepth is out! It effectively recovers fine-grained details and achieves better 3D temporal consistency.

👋 Introduction

Welcome to the official repository for GemDepth!

GemDepth is a framework built on the insight that an explicit awareness of camera motion and global 3D structure is a prerequisite for 3D consistency. Distinctively, GemDepth introduces a Geometry-Embedding Module (GEM) that predicts inter-frame camera poses to generate implicit geometric embeddings. This injection of motion priors equips the network with intrinsic 3D perception and alignment capabilities. Guided by these geometric cues, our Alternating Spatio-Temporal Transformer (ASTT) captures latent point-level correspondences to simultaneously enhance spatial precision for sharp details and enforce rigorous temporal consistency.

GemDepth achieves state-of-the-art performance across multiple datasets, particularly in complex dynamic scenarios.

📝 Benchmarks performance

Comparisons with state-of-the-art methods across four of the most widely used benchmarks.

⏳ Usage

Preparation

git clone https://github.com/Yuechengliu919/GemDepth
cd GemDepth
conda create -n gemdepth python=3.10
conda activate gemdepth
pip install -r requirements.txt

Model weights

| Model | Link |
| --- | --- |
| GemDepth | Download 🤗 |

The final structure should look like this:

GemDepth
├── checkpoint/
├──── gemdepth.pth
├── configs/
├── model/
├── ...
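
Before loading the model, it can help to confirm that the checkout matches the layout above. The following is a minimal sketch and not part of the repository: `EXPECTED` and `missing_entries` are hypothetical names, and the list only covers the entries shown in the tree above.

```python
from pathlib import Path

# Entries taken from the directory tree above (illustrative, not exhaustive).
EXPECTED = ("checkpoint/gemdepth.pth", "configs", "model")


def missing_entries(repo_root):
    """Return the expected files or folders that are absent under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the repository root; an empty list means nothing is missing.
    print(missing_entries("."))
```

An empty list means the checkpoint and folders are where the snippets in this README expect them.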

Use our model

import torch
from model.gemdepth import GemDepth

DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

# Encoder-specific settings, selected via args.encoder.
model_configs = {
    'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
    'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
}
gemdepth = GemDepth(**model_configs[args.encoder])

# Load the pre-trained weights on the CPU first, then move the model to the target device.
checkpoint = torch.load("./checkpoint/gemdepth.pth", map_location='cpu', weights_only=False)
gemdepth.load_state_dict(checkpoint, strict=True)
gemdepth = gemdepth.to(DEVICE).eval()

# args, video_path, and read_video_frames are not defined in this snippet;
# they are assumed to be set up as in the inference scripts.
frames, target_fps = read_video_frames(video_path, args.max_len, args.target_fps, 1280)
depths, fps = gemdepth.infer_video_depth(frames, target_fps, input_size=args.input_size, device=DEVICE, fp32=args.fp32)

Running script on video

# Only video depth output
python evaluation/inference/run_video.py --input_dir ./assets/example_videos --output_dir ./assets/example_result
# video depth & pointcloud output
python evaluation/inference/run_video_pointcloud.py --input_dir ./assets/example_videos --output_dir ./assets/example_result

Tip: If GPU memory is insufficient, you can adjust the inference settings in model/gemdepth.py. The default settings are:

INFER_LEN = 32
OVERLAP = 10
KEYFRAMES = [0, 12, 24, 25, 26, 27, 28, 29, 30, 31]
INTERP_LEN = 8

These settings require about 44 GB of GPU memory. You can reduce them as follows:

INFER_LEN = 16
OVERLAP = 6
KEYFRAMES = [0, 6, 12, 13, 14, 15]
INTERP_LEN = 4

These settings require about 25 GB of GPU memory. Alternatively:

INFER_LEN = 8
OVERLAP = 4
KEYFRAMES = [0, 3, 6, 7]
INTERP_LEN = 2

These settings require about 15 GB of GPU memory. Adjust the parameters to fit your GPU.
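
The three configurations above can be kept in a small lookup so that a script chooses one from the memory that is actually available. This is an illustrative sketch, not part of the repository: `PRESETS`, `check_preset`, and `pick_preset` are hypothetical names, the memory figures are the approximate ones quoted above, and the consistency check only encodes a pattern visible in the three listed configurations (as many keyframes as overlapping frames, all indices inside the window). It is not a documented constraint, and the chosen values still have to be written into model/gemdepth.py by hand.

```python
# Illustrative helper, not part of the GemDepth repository.
# Each entry: (approximate GPU memory in GB, settings), largest first.
PRESETS = [
    (44, {"INFER_LEN": 32, "OVERLAP": 10,
          "KEYFRAMES": [0, 12, 24, 25, 26, 27, 28, 29, 30, 31],
          "INTERP_LEN": 8}),
    (25, {"INFER_LEN": 16, "OVERLAP": 6,
          "KEYFRAMES": [0, 6, 12, 13, 14, 15],
          "INTERP_LEN": 4}),
    (15, {"INFER_LEN": 8, "OVERLAP": 4,
          "KEYFRAMES": [0, 3, 6, 7],
          "INTERP_LEN": 2}),
]


def check_preset(cfg):
    """Return True if cfg follows the pattern seen in the listed presets."""
    keys = cfg["KEYFRAMES"]
    return (
        len(keys) == cfg["OVERLAP"]                      # one keyframe per overlap frame
        and keys == sorted(set(keys))                    # strictly increasing indices
        and all(0 <= k < cfg["INFER_LEN"] for k in keys)  # indices inside the window
        and cfg["INTERP_LEN"] <= cfg["OVERLAP"] < cfg["INFER_LEN"]
    )


def pick_preset(free_gb):
    """Return the largest listed preset whose quoted memory need fits free_gb."""
    for needed_gb, cfg in PRESETS:
        if free_gb >= needed_gb:
            return cfg
    raise ValueError(f"No listed preset fits in {free_gb} GB of GPU memory")
```

With PyTorch, the free memory could be read from `torch.cuda.mem_get_info()`, which returns free and total bytes for the current device; dividing the first value by 1024**3 gives a figure to pass to `pick_preset`. Since the quoted requirements are approximate, leaving some headroom is advisable.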

Interactive Demo

We provide an interactive Gradio interface for you to easily test GemDepth on your own videos without writing any code.

pip install -r demo/requirements.txt
python demo/app.py

Our Gradio-based interface allows you to upload videos, run video depth prediction and pointcloud reconstruction, and interactively explore the 3D scene in your browser.

✏️ Training Data

✈️ Evaluation

Prepare Evaluation Datasets

| Dataset | Link |
| --- | --- |
| Sintel | Download 🤗 |
| KITTI | Download 🤗 |
| Bonn | Download 🤗 |
| ScanNet | Download 🤗 |

You can download the evaluation datasets directly via the links above, or follow the preprocessing steps below.

Following VideoDepthAnything, download the raw datasets from the following links: Sintel, KITTI, Bonn, ScanNet.

pip install natsort
cd dataset/dataset_extract
python dataset_extrtact${dataset}.py

This script will extract the dataset to the dataset/dataset_extract/dataset folder. It will also generate the json file for the dataset.

Run inference

python evaluation/inference/infer/infer.py \
    --infer_path ${out_path} \
    --json_file ${json_path} \
    --datasets ${dataset}

Options:

--infer_path: path to save the output results
--json_file: path to the json file for the dataset, like sintel_video.json, kitti_video_500.json, scannet_video_tae.json
--datasets: dataset name, choose from sintel, kitti, bonn, scannet
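
When running inference on several datasets in a row, the command above can be assembled programmatically. The sketch below is only an illustration: `build_infer_cmd` and `DATASETS` are hypothetical names, and `./outputs/sintel` is a placeholder output path. The script path, flags, dataset names, and the JSON file name are the ones listed above.

```python
import shlex

# Dataset names accepted by --datasets, as listed in the options above.
DATASETS = ("sintel", "kitti", "bonn", "scannet")


def build_infer_cmd(infer_path, json_file, dataset):
    """Build the argument list for the inference script for one dataset."""
    if dataset not in DATASETS:
        raise ValueError(f"Unknown dataset {dataset!r}; choose from {DATASETS}")
    return [
        "python", "evaluation/inference/infer/infer.py",
        "--infer_path", infer_path,   # where the predictions are saved
        "--json_file", json_file,     # dataset description file
        "--datasets", dataset,
    ]


if __name__ == "__main__":
    # Placeholder output path; replace it with your own.
    cmd = build_infer_cmd("./outputs/sintel", "sintel_video.json", "sintel")
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the run.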

Run evaluation

## ~500 frames
python evaluation/eval/eval.py \
    --infer_path ${pred_root} \
    --benchmark_path ${benchmark_root} \
    --datasets ${dataset}

✈️ Training

To train GemDepth on mixed datasets, run:

## stage1
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch train.py --config-name stage1
## stage2
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch train.py --config-name stage2

Tip: If GPU memory is insufficient, you can reduce seq_len in the config file.

✈️ Citation

If you find our work useful in your research, please consider citing our paper:

@inproceedings{Liu2026GemDepthGF,
  title={GemDepth: Geometry-Embedded Features for 3D-Consistent Video Depth},
  author={Yuecheng Liu and Junda Cheng and Longliang Liu and Wenjing Liao and Hanrui Cheng and Yuzhou Wang and Xin Yang},
  year={2026},
  url={https://api.semanticscholar.org/CorpusID:288258595}
}

Acknowledgements

This project is based on VideoDepthAnything, VGGT, and DepthAnythingV2. We thank the original authors for their excellent work.
