```python
#hide
from omnidata_tools.dataset.download import download
from fastcore.script import anno_parser
import os

# argparse wraps --help output to the width given by the COLUMNS environment variable
os.environ["COLUMNS"] = '100'
```

# How to download the starter dataset

> Download the starter dataset with a single command.

## Download/Installation

**`omnitools.download` is a one-line utility for rapidly downloading the starter (& similar) datasets.** For more about the tools themselves (`omnitools.download` and `omnitools.upload`), please see the [dedicated page](/omnidata-tools/omnitools.html).

<!-- **_NOTE:_  There's also a complementary `omnitools.upload` that compresses and stores datasets in a compliant format. If you use the omnidata annotator to create a new dataset, `omnitools.upload` can help you distribute it: other people will then be able to use the download tool to download your dataset.** -->

To download the starter dataset, make sure that `omnidata-tools` is installed and then run the download command below. It will prompt you to accept the component licenses before proceeding:

**Run the following:** (Estimated download time for [_RGB + 1 Task + Masks_]: **1 day**) (_Full dataset_ [30TB]: **5 days**)
<br>

```bash
# Make sure everything is installed
sudo apt-get install aria2
pip install 'omnidata-tools' # Just to make sure it's installed

# Download the 'debug' subset of the Replica and Taskonomy components of the dataset
omnitools.download rgb normals point_info \
  --components replica taskonomy \
  --subset debug \
  --dest ./omnidata_starter_dataset/ --agree-all
```

You should see the prompt:

<img src="https://epfl-vilab.github.io/omnidata-tools/images/download_example.jpg" alt="drawing" style='max-width: 100%;'/>
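The time estimates above translate into a required sustained bandwidth. A quick back-of-the-envelope check for the full dataset, assuming decimal units (1 TB = 10^12 bytes) and that the download runs continuously for the whole 5 days:

```python
# Back-of-the-envelope throughput implied by the full-dataset estimate (~30 TB in 5 days).
# Assumptions: decimal units (1 TB = 10**12 bytes) and no pauses during the download.
DATASET_BYTES = 30 * 10**12       # full dataset, ~30 TB
DURATION_S = 5 * 24 * 60 * 60     # 5 days expressed in seconds

bytes_per_s = DATASET_BYTES / DURATION_S
print(f"{bytes_per_s / 1e6:.1f} MB/s (~{bytes_per_s * 8 / 1e6:.0f} Mbit/s)")
# prints: 69.4 MB/s (~556 Mbit/s)
```

If your link is noticeably slower than this, expect the full download to take proportionally longer.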

## Examples

Here are some other examples:

Download the full Omnidata dataset and agree to the licenses:
```bash
omnitools.download all --components all --subset fullplus \
  --dest ./omnidata_starter_dataset/ \
  --connections_total 40 --agree
```

Download Taskonomy only:
```bash
omnitools.download all --components taskonomy --subset fullplus \
  --dest ./omnidata_starter_dataset/ \
  --connections_total 40 --agree
```

Download only RGB, depth, and masks for all of Omnidata:
```bash
omnitools.download rgb depth mask_valid --components all --subset fullplus \
  --dest ./omnidata_starter_dataset/ \
  --connections_total 40 --agree
```

Download meshes for Clevr (`clevr_simple`) and keep the compressed files:
```bash
omnitools.download mesh --components clevr_simple --subset fullplus \
  --dest ./omnidata_starter_dataset/ \
  --dest_compressed ./omnidata_starter_dataset_compressed --keep_compressed True \
  --connections_total 40 --agree
```

Split the Omnidata download across 100 workers. This command is for worker 7 of 100 (chunks are numbered from 0, hence `--num_chunk 6`) and only performs a dry run:
```bash
omnitools.download all --components all --subset fullplus \
  --num_chunk 6 --num_total_chunks 100 \
  --dest ./omnidata_starter_dataset/ \
  --connections_total 40 --agree --dryrun
```

...you get the idea :)
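For the multi-worker example above, each machine runs the same command with a different `--num_chunk`. The Python sketch below prints one command per worker. It reuses only the flags from that example and assumes chunks are numbered from `0` to `num_total_chunks - 1`, as the worker-7-of-100 example suggests; the helper `chunk_commands` is hypothetical and not part of `omnidata-tools`.

```python
import shlex


def chunk_commands(num_total_chunks, dest="./omnidata_starter_dataset/", dryrun=True):
    """Hypothetical helper: build one omnitools.download command per chunk (numbered from 0)."""
    commands = []
    for chunk in range(num_total_chunks):
        argv = [
            "omnitools.download", "all",
            "--components", "all",
            "--subset", "fullplus",
            "--num_chunk", str(chunk),
            "--num_total_chunks", str(num_total_chunks),
            "--dest", dest,
            "--connections_total", "40",
            "--agree",
        ]
        if dryrun:
            argv.append("--dryrun")
        commands.append(shlex.join(argv))  # shell-quoted, ready to paste into a terminal
    return commands


# Print the commands for a small 4-worker split; run each one on a different machine.
for cmd in chunk_commands(4):
    print(cmd)
```

Generating the commands from one function keeps `--num_total_chunks` identical on every worker, which a split like this presumably relies on.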

### Command-line options

`omnitools.download` is pretty configurable: you can choose which components, subset, split, and tasks to download and extract. The downloader then spawns multiple workers that download the matching compressed files, verify each download against checksums on the server, and unpack them. Here are the available options:

```bash
omnitools.download -h
```

```python
#hide_input
argparser = anno_parser(download)
argparser.prog = 'omnitools.download'
argparser.print_help()
```

```
usage: omnitools.download [-h] [--subset {debug,tiny,medium,full,fullplus}]
                          [--split {train,val,test,all}]
                          [--components {all,replica,taskonomy,gso_in_replica,hypersim,blendedmvs,hm3d,clevr_simple,clevr_complex} [{all,replica,taskonomy,gso_in_replica,hypersim,blendedmvs,hm3d,clevr_simple,clevr_complex} ...]]
                          [--dest DEST] [--dest_compressed DEST_COMPRESSED]
                          [--keep_compressed KEEP_COMPRESSED] [--only_download ONLY_DOWNLOAD]
                          [--max_tries_per_model MAX_TRIES_PER_MODEL]
                          [--connections_total CONNECTIONS_TOTAL]
                          [--connections_per_server_per_download CONNECTIONS_PER_SERVER_PER_DOWNLOAD]
                          [--n_workers N_WORKERS] [--num_chunk NUM_CHUNK]
                          [--num_total_chunks NUM_TOTAL_CHUNKS] [--ignore_checksum IGNORE_CHECKSUM]
                          [--dryrun] [--aria2_uri ARIA2_
```
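When scripting downloads from Python, it can help to catch a typo in `--subset`, `--split`, or `--components` before launching a multi-day job. A minimal sketch: the allowed values are copied from the help output above, while the helper `build_download_argv` is hypothetical and not part of `omnidata-tools`.

```python
# Allowed values, copied from the `omnitools.download -h` output above.
SUBSETS = {"debug", "tiny", "medium", "full", "fullplus"}
SPLITS = {"train", "val", "test", "all"}
COMPONENTS = {
    "all", "replica", "taskonomy", "gso_in_replica", "hypersim",
    "blendedmvs", "hm3d", "clevr_simple", "clevr_complex",
}


def build_download_argv(tasks, components, subset, dest, split=None, dryrun=True):
    """Hypothetical helper: validate choices against the help output, then return an argv list."""
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset: {subset!r}")
    if split is not None and split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    unknown = sorted(set(components) - COMPONENTS)
    if unknown:
        raise ValueError(f"unknown components: {unknown}")

    argv = ["omnitools.download", *tasks, "--components", *components,
            "--subset", subset, "--dest", dest]
    if split is not None:  # only pass --split when explicitly requested
        argv += ["--split", split]
    if dryrun:
        argv.append("--dryrun")  # same flag as in the multi-worker example
    return argv


argv = build_download_argv(["rgb", "normals", "point_info"], ["replica", "taskonomy"],
                           subset="debug", dest="./omnidata_starter_dataset/")
print(" ".join(argv))
```

Passing `argv` to `subprocess.run(argv, check=True)` would launch the command; the license prompt still applies unless you add one of the agreement flags used in the examples.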
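The checksum step the downloader performs can be pictured as hashing each downloaded archive and comparing the result with the value published on the server. The sketch below is illustrative only: the hash algorithm (MD5 by default here) and the helper name `verify_checksum` are assumptions, not the tool's actual implementation.

```python
import hashlib


def verify_checksum(path, expected_hex, algorithm="md5", block_size=1 << 20):
    """Hypothetical helper: hash the file at `path` and compare it with `expected_hex`."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size blocks so multi-gigabyte archives are never loaded into memory at once.
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest() == expected_hex.lower()
```

On a mismatch, an archive would normally be fetched again. The `--max_tries_per_model` and `--ignore_checksum` options in the help output above appear to control related behavior; check `omnitools.download -h` for their exact semantics.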

## Citation
If you find the code or models useful, please cite our paper:
```
@inproceedings{eftekhar2021omnidata,
  title={Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets From 3D Scans},
  author={Eftekhar, Ainaz and Sax, Alexander and Malik, Jitendra and Zamir, Amir},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={10786--10796},
  year={2021}
}
```
