# Pretrained Models + Demo

> This page is for downloading and using the pretrained models in PyTorch. You can also try a <a href=//omnidata.vision/demo>demo in your browser</a> or <a href=//docs.omnidata.vision/training.html>train your own models</a>.

To demonstrate that the data can train strong models and is not overly limited by rendering quality, mesh coarseness, and similar factors, we trained models for several tasks. At the time of publication, these models were comparable to or better than the state of the art (see OASIS) on several common vision tasks.

We provide several of these models here.




## Installation

```bash
git clone https://github.com/Ainaz99/omnidata-tools
cd omnidata-tools/torch
conda create -n testenv -y python=3.8
conda activate testenv
pip install -r requirements.txt
```
The complete list of required packages is in [omnidata-tools/torch/requirements.txt](https://github.com/Ainaz99/omnidata-tools/blob/main/requirements.txt). We recommend installing into an isolated environment, such as the conda environment created above.

## Pretrained Models
We provide pretrained models that, at the time of publication, achieved state-of-the-art performance in depth and surface normal estimation.

#### Network Architecture
The surface normal network is based on the [UNet](https://arxiv.org/pdf/1505.04597.pdf) architecture (6 down/6 up). It is trained with both an angular loss and an L1 loss, at input resolutions between 256 and 512.
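As a rough illustration of combining an angular loss with an L1 loss, here is a minimal NumPy sketch. The function name `normal_loss`, the equal weights, and the use of mean reduction are assumptions made for this example; they are not taken from the released training code, and a PyTorch version would follow the same steps.

```python
import numpy as np

def normal_loss(pred, target, l1_weight=1.0, angular_weight=1.0, eps=1e-8):
    """Weighted sum of mean L1 error and mean angular error (in radians).

    pred, target: arrays of shape (..., 3) holding per-pixel normal vectors.
    The weights are illustrative assumptions, not values from the paper.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    # L1 term: mean absolute difference over all components.
    l1 = np.abs(pred - target).mean()

    # Angular term: angle between the two vectors at every pixel.
    dot = (pred * target).sum(axis=-1)
    norms = np.linalg.norm(pred, axis=-1) * np.linalg.norm(target, axis=-1)
    cos = np.clip(dot / np.maximum(norms, eps), -1.0, 1.0)
    angular = np.arccos(cos).mean()

    return l1_weight * l1 + angular_weight * angular
```

The L1 term penalizes per-component differences, while the angular term depends only on direction, so the two respond differently to predictions that are not unit length.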

The depth networks use DPT-based architectures (similar to [MiDaS v3.0](https://github.com/isl-org/MiDaS)). They are trained with the scale- and shift-invariant loss and the scale-invariant gradient matching term introduced in [MiDaS](https://arxiv.org/pdf/1907.01341v3.pdf), together with the [virtual normal loss](https://openaccess.thecvf.com/content_ICCV_2019/papers/Yin_Enforcing_Geometric_Constraints_of_Virtual_Normal_for_Depth_Prediction_ICCV_2019_paper.pdf). A public implementation of the MiDaS loss is available [here](#midas-implementation). We provide two pretrained depth models, one for each of the DPT-hybrid and DPT-large architectures, both with input resolution 384.
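The idea behind a scale- and shift-invariant loss can be sketched as follows: fit the scale and shift that best map the prediction onto the ground truth, then measure the remaining error. The NumPy sketch below uses a least-squares fit and a mean squared error; the name `ssi_mse` is hypothetical, and the sketch omits the gradient matching and virtual normal terms, so it is a simplified illustration rather than the loss used to train the released models.

```python
import numpy as np

def ssi_mse(pred, target, mask=None):
    """Mean squared error after least-squares scale/shift alignment.

    pred, target: depth maps of the same shape.
    mask: optional boolean array marking valid pixels.
    Returns (loss, scale, shift).
    """
    pred = np.asarray(pred, dtype=np.float64).ravel()
    target = np.asarray(target, dtype=np.float64).ravel()
    if mask is None:
        valid = np.ones(pred.shape, dtype=bool)
    else:
        valid = np.asarray(mask, dtype=bool).ravel()
    p, g = pred[valid], target[valid]

    # Solve min over (scale, shift) of ||scale * p + shift - g||^2.
    design = np.stack([p, np.ones_like(p)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(design, g, rcond=None)

    aligned = scale * p + shift
    loss = np.mean((aligned - g) ** 2)
    return float(loss), float(scale), float(shift)
```

Because the scale and shift are refitted for every image, a prediction that is correct up to an affine transformation of the depth values incurs (numerically) zero loss.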

#### Download pretrained models
```bash
sh ./tools/download_depth_models.sh
sh ./tools/download_surface_normal_models.sh
```
These scripts download the pretrained models for `depth` and `normals` to a folder called `./pretrained_models`.
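To confirm that the download scripts produced something, you could list the checkpoint files under `./pretrained_models`. The sketch below is a hypothetical helper, not part of the repository; the file suffixes it looks for (`.ckpt`, `.pth`, `.pt`) are an assumption, since this page does not state the checkpoint file names.

```python
from pathlib import Path

def list_checkpoints(root="./pretrained_models", suffixes=(".ckpt", ".pth", ".pt")):
    """Return sorted paths of files under `root` whose suffix looks like a checkpoint.

    The suffix list is an assumption; adjust it to match the downloaded files.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file() and p.suffix in suffixes)

if __name__ == "__main__":
    found = list_checkpoints()
    if not found:
        print("No checkpoints found; run the download scripts first.")
    for path in found:
        # Report size in megabytes to help spot truncated downloads.
        print(f"{path}  ({path.stat().st_size / 1e6:.1f} MB)")
```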

## Run our models on your own image
After downloading the [pretrained models](#pretrained-models), you can run them on your own image, or on a folder of images, with the following command:
```bash
python demo.py --task $TASK --img_path $PATH_TO_IMAGE_OR_FOLDER --output_path $PATH_TO_SAVE_OUTPUT
```
The `--task` flag should be either `normal` or `depth`. For example, to predict surface normals for an [example image](./assets/demo/test1.png):
```bash
python demo.py --task normal --img_path assets/demo/test1.png --output_path assets/
```
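If you want both predictions for the same input, one option is to call `demo.py` once per task from a small Python wrapper. The sketch below uses only the three flags shown above; the helper names `build_demo_command` and `run_demo` are hypothetical, and the script is assumed to be run from the directory that contains `demo.py`.

```python
import subprocess
import sys

VALID_TASKS = ("normal", "depth")

def build_demo_command(task, img_path, output_path):
    """Build the argument list for one demo.py invocation."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    return [
        sys.executable, "demo.py",
        "--task", task,
        "--img_path", str(img_path),
        "--output_path", str(output_path),
    ]

def run_demo(tasks, img_path, output_path):
    """Run demo.py once per task; raises CalledProcessError if a run fails."""
    for task in tasks:
        subprocess.run(build_demo_command(task, img_path, output_path), check=True)
```

For instance, `run_demo(("normal", "depth"), "assets/demo/test1.png", "assets/")` would issue the command above for each of the two tasks in turn.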


|  |   |   |   |  |  |  |
| :-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|
| <img src="/omnidata-tools/images/torch/demo/test1.png" style='max-width: 100%;'/> |  <img src="/omnidata-tools/images/torch/demo/test2.png" style='max-width: 100%;'/> | <img src="/omnidata-tools/images/torch/demo/test3.png" style='max-width: 100%;'/>  | <img src="/omnidata-tools/images/torch/demo/test4.png" style='max-width: 100%;'/>  | <img src="/omnidata-tools/images/torch/demo/test5.png" style='max-width: 100%;'/> | <img src="/omnidata-tools/images/torch/demo/test6.png" style='max-width: 100%;'/> | <img src="/omnidata-tools/images/torch/demo/test7.png" style='max-width: 100%;'/> |
| <img src="/omnidata-tools/images/torch/demo/test1_normal.png" style='max-width: 100%;'/> |  <img src="/omnidata-tools/images/torch/demo/test2_normal.png" style='max-width: 100%;'/> | <img src="/omnidata-tools/images/torch/demo/test3_normal.png" style='max-width: 100%;'/>  | <img src="/omnidata-tools/images/torch/demo/test4_normal.png" style='max-width: 100%;'/>  | <img src="/omnidata-tools/images/torch/demo/test5_normal.png" style='max-width: 100%;'/> | <img src="/omnidata-tools/images/torch/demo/test6_normal.png" style='max-width: 100%;'/> | <img src="/omnidata-tools/images/torch/demo/test7_normal.png" style='max-width: 100%;'/> |
| <img src="/omnidata-tools/images/torch/demo/test1_depth.png" style='max-width: 100%;'/> |  <img src="/omnidata-tools/images/torch/demo/test2_depth.png" style='max-width: 100%;'/> | <img src="/omnidata-tools/images/torch/demo/test3_depth.png" style='max-width: 100%;'/>  | <img src="/omnidata-tools/images/torch/demo/test4_depth.png" style='max-width: 100%;'/>  | <img src="/omnidata-tools/images/torch/demo/test5_depth.png" style='max-width: 100%;'/> | <img src="/omnidata-tools/images/torch/demo/test6_depth.png" style='max-width: 100%;'/> | <img src="/omnidata-tools/images/torch/demo/test7_depth.png" style='max-width: 100%;'/> |


## Citation
If you find the code or models useful, please cite our paper:
```bibtex
@inproceedings{eftekhar2021omnidata,
  title={Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets From 3D Scans},
  author={Eftekhar, Ainaz and Sax, Alexander and Malik, Jitendra and Zamir, Amir},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={10786--10796},
  year={2021}
}
```
