Skip to content

Latest commit

 

History

History
159 lines (105 loc) · 6.67 KB

README.md

File metadata and controls

159 lines (105 loc) · 6.67 KB

Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer

This repository contains code to compute depth from a single image. It accompanies our paper:

Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer
René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, Vladlen Koltun

and our preprint:

Vision Transformers for Dense Prediction
René Ranftl, Alexey Bochkovskiy, Vladlen Koltun

MiDaS was trained on 10 datasets (ReDWeb, DIML, Movies, MegaDepth, WSVD, TartanAir, HRWSI, ApolloScape, BlendedMVS, IRS) with multi-objective optimization. The original model that was trained on 5 datasets (MIX 5 in the paper) can be found here.

Changelog

Setup

  1. Pick one or more models and download corresponding weights to the weights folder:
  • For highest quality: dpt_large
  • For moderately less quality, but better speed on CPU and slower GPUs: dpt_hybrid
  • For real-time applications on resource-constrained devices: midas_v21_small
  • Legacy convolutional model: midas_v21
  1. Set up dependencies:

    conda install pytorch torchvision opencv
    pip install timm

    The code was tested with Python 3.7, PyTorch 1.8.0, OpenCV 4.5.1, and timm 0.4.5.

Usage

  1. Place one or more input images in the folder input.

  2. Run the model:

    python run.py --model_type dpt_large
    python run.py --model_type dpt_hybrid 
    python run.py --model_type midas_v21_small
    python run.py --model_type midas_v21
  3. The resulting inverse depth maps are written to the output folder.

via Docker

  1. Make sure you have installed Docker and the NVIDIA Docker runtime.

  2. Build the Docker image:

    docker build -t midas .
  3. Run inference:

    docker run --rm --gpus all -v $PWD/input:/opt/MiDaS/input -v $PWD/output:/opt/MiDaS/output midas

    This command passes through all of your NVIDIA GPUs to the container, mounts the input and output directories and then runs the inference.

via PyTorch Hub

The pretrained model is also available on PyTorch Hub

via TensorFlow or ONNX

See README in the tf subdirectory.

Currently only supports MiDaS v2.1. DPT-based models to be added.

via Mobile (iOS / Android)

See README in the mobile subdirectory.

via ROS1 (Robot Operating System)

See README in the ros subdirectory.

Currently only supports MiDaS v2.1. DPT-based models to be added.

Accuracy

Zero-shot error (the lower - the better) and speed (FPS):

Model DIW, WHDR Eth3d, AbsRel Sintel, AbsRel Kitti, δ>1.25 NyuDepthV2, δ>1.25 TUM, δ>1.25 Speed, FPS
Small models: iPhone 11
MiDaS v2 small 0.1248 0.1550 0.3300 21.81 15.73 17.00 0.6
MiDaS v2.1 small URL 0.1344 0.1344 0.3370 29.27 13.43 14.53 30
Big models: GPU RTX 3090
MiDaS v2 large URL 0.1246 0.1290 0.3270 23.90 9.55 14.29 51
MiDaS v2.1 large URL 0.1295 0.1155 0.3285 16.08 8.71 12.51 51
MiDaS v3.0 DPT-Hybrid URL 0.1106 0.0934 0.2741 11.56 8.69 10.89 46
MiDaS v3.0 DPT-Large URL 0.1082 0.0888 0.2697 8.46 8.32 9.97 47

Citation

Please cite our paper if you use this code or any of the models:

@article{Ranftl2020,
	author    = {Ren\'{e} Ranftl and Katrin Lasinger and David Hafner and Konrad Schindler and Vladlen Koltun},
	title     = {Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer},
	journal   = {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
	year      = {2020},
}

If you use a DPT-based model, please also cite:

@article{Ranftl2021,
	author    = {Ren\'{e} Ranftl and Alexey Bochkovskiy and Vladlen Koltun},
	title     = {Vision Transformers for Dense Prediction},
	journal   = {ArXiv preprint},
	year      = {2021},
}

Acknowledgements

Our work builds on and uses code from timm. We'd like to thank the author for making these libraries available.

License

MIT License