# Exploring and Downloading Datasets and Models

Let's start by exploring the repository of datasets and models. 

You can do that at the different accessibility layers of EOTDL: the user interface, the API, the command line interface (CLI) and the Python library.

## The User Interface

The easiest way to get started with EOTDL is by exploring the user interface: [https://www.eotdl.com/](https://www.eotdl.com/). Through the UI you will be able to:

- Explore the datasets and models available in the repository (filtering by name, tags and likes).
- Edit the information of your own datasets and models.
- Read the tutorials on the blog.
- Read the documentation.
- Find useful links to other resources (GitHub, Discord, ...)

![web](images/web.png)

## Quality levels

Datasets and models in EOTDL are categorized into quality levels. The quality levels are:

- **Q0**: datasets in the form of an archive with arbitrary files without curation. This level is ideal for easy and fast upload/download of small datasets.
- **Q1**: datasets with STAC metadata but no QA. These datasets can leverage a limited set of EOTDL features.
- **Q2**: datasets with STAC metadata with the EOTDL custom extensions and automated QA. These datasets can leverage the full potential of the EOTDL.
- **Q3**: Q2 datasets that are manually curated. These datasets are the most reliable and can be used as benchmark datasets.

You will learn more about the quality levels in the [data curation](05_stac.ipynb) tutorial.
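
To keep the four levels straight, it can help to write them down as a small lookup table. The sketch below only transcribes the list above into Python; `QualityLevel`, `QUALITY_LEVELS` and `levels_with` are illustrative names made up for this tutorial and are not part of the EOTDL library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityLevel:
    """One row of the quality-level list above (illustrative, not an EOTDL type)."""
    code: str
    stac_metadata: bool
    eotdl_extensions: bool
    automated_qa: bool
    manually_curated: bool


# Transcribed by hand from the bullet list; adjust if the platform's definitions change.
QUALITY_LEVELS = {
    "Q0": QualityLevel("Q0", False, False, False, False),
    "Q1": QualityLevel("Q1", True, False, False, False),
    "Q2": QualityLevel("Q2", True, True, True, False),
    "Q3": QualityLevel("Q3", True, True, True, True),
}


def levels_with(**required):
    """Return the codes of all levels whose flags equal the requested values."""
    return [
        code
        for code, level in QUALITY_LEVELS.items()
        if all(getattr(level, flag) == value for flag, value in required.items())
    ]


print(levels_with(stac_metadata=True))
print(levels_with(automated_qa=True, manually_curated=False))
```

The first call lists every level that ships STAC metadata; the second isolates the level with automated QA but no manual curation.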

## The Command Line Interface

Even though the UI is the easiest way to get started, it is not the most convenient for actually working with the datasets and models. For that we recommend installing the CLI.

If you are running this notebook locally, consider creating a virtual environment before installing the CLI to avoid conflicts with other packages.

With conda:

```bash
conda create -n eotdl python=3.8
conda activate eotdl
```

With Python's built-in `venv` module:

```bash
python -m venv eotdl
source eotdl/bin/activate
```

You may also have to install Jupyter in the new environment and restart the notebook.

Then, you can install the CLI with pip:

In [5]:
# uncomment to install

# !pip install eotdl

Once installed, you can execute the CLI with different commands. 

In [1]:
!eotdl --help

 Usage: eotdl [OPTIONS] COMMAND [ARGS]...

 Welcome to EOTDL. Learn more at https://www.eotdl.com/

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --install-completion          Install completion for the current shell.      │
│ --show-completion             Show completion for the current shell, to copy │
│                               it or customize the installation.              │
│ --help                        Show

In [2]:
!eotdl version

EOTDL Version: 2024.04.25


In [3]:
!eotdl datasets --help

 Usage: eotdl datasets [OPTIONS] COMMAND [ARGS]...

 Explore, ingest and download training datasets.

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                  │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────╮
│ get        Download a dataset from the EOTDL.

You can explore datasets with the following command:

In [4]:
!eotdl datasets list 

['EuroSAT-Q1', 'EuroSAT-Q1-small', 'EuroSAT-Q2-small', 'SEN12MS-CR', 'DeepGlobeRoadExtraction', 'MassachusettsRoadsDataset', 'OpenEarthMap', 'ESA-Worldcover', 'AlignSAR-Groningen-Sentinel1-Q0', 'AI4EO-MapYourCity', 'Enhanced-Sentinel-2-Agriculture', 'WorldStrat', 'SEN2Venus', 'AlignSAR-Groningen-Sentinel1-Q1', 'SEN12MS', 'AlignSAR-Chennai-OilSpill-Sentinel1-Q0', 'PhilEO-downstream', 'Alignsar', 'boadella-dataset', 'EuroSAT-RGB', 'EuroSAT-RGB-Q1', 'EuroSAT-RGB-Q2', 'Boadella-BiDS23', 'COWC', 'Stanford-Drone-dataset', 'EuroSAT-RGB-STAC', 'BigEarthNet', 'xview2', 'LandcoverAI', 'open-cities-tt2-source', 'open-cities-tt1-source', 'open-cities-test', 'PASTIS-R', 'EuroCrops', 'SloveniaLandCover', 'ISPRS-Potsdam2D', 'SEN12-FLOOD', 'Urban3dChallenge', 'tropical-cyclone-dataset', 'Vessel-detection', 'Airplanes-detection', 'S2-SHIPS', 'SpaceNet-7', 'Sentinel-2-Cloud-Mask', 'PASTIS', 'FlodNet', 'SeCo100k', 'SeCo', 'AirbusAircraftDetection', 'AirbusWindTurbinesPatches', 'RoadNet', 'EuroSAT', 'UCMe

In [5]:
!eotdl datasets list --help

 Usage: eotdl datasets list [OPTIONS]

 Retrieve a list with all the datasets in the EOTDL.
 If using --name, it will filter the results by name. If no name is provided,
 it will return all the datasets.
 If using --limit, it will limit the number of results. If no limit is
 provided, it will return all the datasets.

 Examples
 --------

In [6]:
!eotdl datasets list -n eurosat

['EuroSAT-Q1', 'EuroSAT-Q1-small', 'EuroSAT-Q2-small', 'EuroSAT-RGB', 'EuroSAT-RGB-Q1', 'EuroSAT-RGB-Q2', 'EuroSAT-RGB-STAC', 'EuroSAT']


As you may have guessed, you can download a dataset with the following command:

In [9]:
!eotdl datasets get EuroSAT-RGB -v 1

100%|█████████████████████████████████████| 90.3M/90.3M [00:04<00:00, 20.2MiB/s]
100%|███████████████████████████████████████████| 1/1 [00:04<00:00,  4.98s/file

The first time you run the command, you will be asked to log in (which will require you to create an account if you haven't already). You can also log in explicitly with the following command:

In [4]:
!eotdl auth login

On your computer or mobile device navigate to:  https://earthpulse.eu.auth0.com/activate?user_code=CGZB-SPQB
Authenticated!
- Id Token: eyJhbGciOi...
Saved credentials to:  /home/juan/.cache/eotdl/creds.json
You are logged in as it@earthpulse.es


In [10]:
!eotdl auth --help

 Usage: eotdl auth [OPTIONS] COMMAND [ARGS]...

 Login to EOTDL.

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                  │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────╮
│ login              Login to the EOTDL.

In [11]:
!eotdl datasets get --help

 Usage: eotdl datasets get [OPTIONS] [DATASET]

 Download a dataset from the EOTDL.
 If using --path, it will download the dataset to the specified path. If no
 path is provided, it will download to ~/.eotdl/datasets.
 If using --file, it will download the specified file. If no file is provided,
 it will download the entire dataset.
 If using --version, it will download the specified version. If no version is
 provided, it will download the latest version.

By default, datasets will be downloaded to your `$HOME/.cache/eotdl/datasets` folder or the path in the `EOTDL_DOWNLOAD_PATH` environment variable. You can change this with the `--path` argument.

In [12]:
!eotdl datasets get EuroSAT-RGB -v 1 -p data

100%|█████████████████████████████████████| 90.3M/90.3M [00:04<00:00, 19.9MiB/s]
100%|███████████████████████████████████████████| 1/1 [00:05<00:00,  5.22s/file

You can choose a particular version to download with the `--version` argument. If you don't specify a version, the latest version will be downloaded.

In [14]:
!eotdl datasets get EuroSAT-RGB -p data -v 1

Dataset `EuroSAT-RGB v1` already exists at data/EuroSAT-RGB/v1. To force download, use force=True or -f in the CLI.


The version number will be used to create a folder with the same name inside the path you specified. Inside this folder you will find the dataset files.
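
The folder layout described above can be sketched in a few lines. `expected_dataset_dir` is a hypothetical helper written for this tutorial, not an EOTDL function. The `<path>/<dataset>/v<version>` layout matches the message printed by the CLI, but the precedence order (explicit path, then `EOTDL_DOWNLOAD_PATH`, then the cache folder) and the absence of any extra subfolder under the environment variable are assumptions.

```python
import os
from pathlib import Path


def expected_dataset_dir(name, version, path=None, env=None):
    """Guess where a dataset version lands on disk (hypothetical helper).

    Assumed precedence: explicit `path`, then the EOTDL_DOWNLOAD_PATH
    environment variable, then $HOME/.cache/eotdl/datasets.
    """
    env = os.environ if env is None else env
    if path is not None:
        base = Path(path)
    elif env.get("EOTDL_DOWNLOAD_PATH"):
        base = Path(env["EOTDL_DOWNLOAD_PATH"])
    else:
        base = Path.home() / ".cache" / "eotdl" / "datasets"
    # The version number becomes a folder named v<version> under the dataset name.
    return base / name / f"v{version}"


target = expected_dataset_dir("EuroSAT-RGB", 1, path="data")
print(target.as_posix(), "exists" if target.exists() else "not downloaded yet")
```

A check like this can be handy in scripts that want to know whether a download is needed before calling the CLI.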

If you try to re-download a dataset that already exists locally, the CLI will refuse and print a message. You can force a re-download with the `--force` argument.

In [15]:
!eotdl datasets get EuroSAT-RGB -p data -v 1 -f

100%|█████████████████████████████████████| 90.3M/90.3M [00:04<00:00, 19.8MiB/s]
100%|███████████████████████████████████████████| 1/1 [00:05<00:00,  5.07s/file

For Q1+ datasets, the `get` command will by default only download the STAC metadata of the dataset.

In [16]:
!eotdl datasets get EuroSAT-Q1-small -p data

To download assets, set assets=True or -a in the CLI.
Data available at data/EuroSAT-Q1-small/v1


Inside the metadata you will find the links to all the assets, so you can download them individually (maybe after some filtering or processing using only the metadata). However, you can download all assets with the command:

In [19]:
!eotdl datasets get EuroSAT-Q1-small -p data -a -f

100%|█████████████████████████████████████████| 200/200 [01:03<00:00,  3.16it/s]
Data available at data/EuroSAT-Q1-small/v1
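
As mentioned earlier, the STAC metadata holds the links to every asset, so you can inspect or filter it before downloading anything. The following is a minimal sketch that uses only the standard library and relies on the STAC convention that Items are GeoJSON Features with an `assets` mapping whose entries carry an `href`. The function name `collect_asset_hrefs`, the `label` property and the example URL are made up for illustration; the actual file layout of a downloaded dataset may differ.

```python
import json
import tempfile
from pathlib import Path


def collect_asset_hrefs(stac_dir, keep=lambda item: True):
    """Yield (item_id, asset_key, href) for each asset of each STAC Item under stac_dir."""
    for json_file in sorted(Path(stac_dir).rglob("*.json")):
        with open(json_file, encoding="utf-8") as f:
            doc = json.load(f)
        # STAC Items are GeoJSON Features; catalogs and collections are skipped.
        if not isinstance(doc, dict) or doc.get("type") != "Feature":
            continue
        if not keep(doc):
            continue
        for key, asset in doc.get("assets", {}).items():
            yield doc.get("id"), key, asset.get("href")


# Self-contained demo with a made-up item, so the sketch runs without any download.
with tempfile.TemporaryDirectory() as tmp:
    item = {
        "type": "Feature",
        "id": "demo-item",
        "properties": {"label": "Forest"},  # hypothetical property
        "assets": {"image": {"href": "https://example.com/demo.tif"}},
    }
    Path(tmp, "demo-item.json").write_text(json.dumps(item), encoding="utf-8")
    forest_only = lambda it: it["properties"].get("label") == "Forest"
    print(list(collect_asset_hrefs(tmp, keep=forest_only)))
```

Pointing the function at a folder such as `data/EuroSAT-Q1-small/v1` would, under these assumptions, give you the list of links to fetch selectively.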


Working with models is very much the same at this point.

In [20]:
!eotdl models --help

 Usage: eotdl models [OPTIONS] COMMAND [ARGS]...

 Explore, ingest and download ML models.

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                  │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────╮
│ get        Download a model from the EOTDL.

In [21]:
!eotdl models list

['RoadSegmentation', 'EuroSAT-RGB-BiDS23-Q1', 'MAVERICC', 'WALDO25', 'EuroSAT-RGB-BiDS23']


In [22]:
!eotdl models list --help

 Usage: eotdl models list [OPTIONS]

 Retrieve a list with all the models in the EOTDL.
 If using --name, it will filter the results by name. If no name is provided,
 it will return all the models.
 If using --limit, it will limit the number of results. If no limit is
 provided, it will return all the models.

 Examples
 --------

In [23]:
!eotdl models get RoadSegmentation

100%|███████████████████████████████████████████| 1/1 [00:05<00:00,  5.54s/file]
Data available at /home/juan/.cache/eotdl/models/RoadSegmentation/v1


In [24]:
!eotdl models get --help

 Usage: eotdl models get [OPTIONS] [MODEL]

 Download a model from the EOTDL.
 If using --path, it will download the model to the specified path. If no path
 is provided, it will download to ~/.eotdl/models.
 If using --file, it will download the specified file. If no file is provided,
 it will download the entire model.
 If using --version, it will download the specified version. If no version is
 provided, it will download the latest version.

We will explore how to ingest datasets and models in the next tutorials.

## The Library

Everything that we have done so far with the CLI is also enabled through the Python library. When installing the CLI, the library is automatically installed as well.

In [25]:
import eotdl

eotdl.__version__

'2024.04.25'

In [26]:
from eotdl.datasets import retrieve_datasets

datasets = retrieve_datasets()
len(datasets)

53

In [27]:
retrieve_datasets("eurosat")

['EuroSAT-Q1',
 'EuroSAT-Q1-small',
 'EuroSAT-Q2-small',
 'EuroSAT-RGB',
 'EuroSAT-RGB-Q1',
 'EuroSAT-RGB-Q2',
 'EuroSAT-RGB-STAC',
 'EuroSAT']

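With the library, the results come back as regular Python objects, so you can filter and process them however you like. For example, the same name filter can be written as a list comprehension: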

In [28]:
[d for d in datasets if "eurosat" in d.lower()]

['EuroSAT-Q1',
 'EuroSAT-Q1-small',
 'EuroSAT-Q2-small',
 'EuroSAT-RGB',
 'EuroSAT-RGB-Q1',
 'EuroSAT-RGB-Q2',
 'EuroSAT-RGB-STAC',
 'EuroSAT']
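
Since the result is just a list of names, any standard-library tool works on it. As a small sketch, the snippet below applies shell-style wildcards with `fnmatch` to the names printed above; `match_names` is an illustrative helper, not an EOTDL function, and the list is copied by hand so the example runs offline.

```python
from fnmatch import fnmatchcase

# Names copied from the output above so the example does not need network access.
names = [
    "EuroSAT-Q1",
    "EuroSAT-Q1-small",
    "EuroSAT-Q2-small",
    "EuroSAT-RGB",
    "EuroSAT-RGB-Q1",
    "EuroSAT-RGB-Q2",
    "EuroSAT-RGB-STAC",
    "EuroSAT",
]


def match_names(names, pattern):
    """Case-insensitive shell-style matching (e.g. 'eurosat-rgb*')."""
    pattern = pattern.lower()
    return [name for name in names if fnmatchcase(name.lower(), pattern)]


print(match_names(names, "eurosat-rgb*"))
print(match_names(names, "*-q[12]*"))
```

The first pattern keeps the RGB variants; the second keeps every name that carries a `Q1` or `Q2` tag.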

You can download datasets as well, but now you will have to manage potential errors.

In [29]:
from eotdl.datasets import download_dataset

download_dataset("EuroSAT-RGB")

Exception: Dataset `EuroSAT-RGB v3` already exists at /home/juan/.cache/eotdl/datasets/EuroSAT-RGB/v3. To force download, use force=True or -f in the CLI.
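
One way to manage this kind of error is a thin wrapper that treats "already exists" as a non-fatal condition and re-raises everything else. This is only a sketch: `download_or_reuse` is a hypothetical helper, and matching on the message text is an assumption based on the exception shown above, so it may break if the wording changes. A stand-in function replaces the real download so the example runs offline.

```python
def download_or_reuse(download_fn, name, **kwargs):
    """Call download_fn(name, **kwargs); return None if the data already exists locally."""
    try:
        return download_fn(name, **kwargs)
    except Exception as err:  # the cell above shows a plain Exception being raised
        if "already exists" in str(err):
            print(f"Skipping {name}: {err}")
            return None
        raise  # anything else (network, auth, ...) is still an error


def fake_download(name, **kwargs):
    """Stand-in that mimics the 'already exists' failure shown above."""
    raise Exception(f"Dataset `{name} v3` already exists at /tmp/{name}/v3.")


print(download_or_reuse(fake_download, "EuroSAT-RGB"))
```

With the real library you would pass the `download_dataset` function imported above instead of `fake_download`, for example `download_or_reuse(download_dataset, "EuroSAT-RGB")`.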

In [31]:
download_dataset("EuroSAT-RGB", version=1, force=True)

100%|██████████| 90.3M/90.3M [00:04<00:00, 20.1MiB/s]
100%|██████████| 1/1 [00:05<00:00,  5.01s/file]


'/home/juan/.cache/eotdl/datasets/EuroSAT-RGB/v1'

In [32]:
download_dataset("EuroSAT-RGB",  version=1, force=True, path="data")

100%|██████████| 90.3M/90.3M [00:04<00:00, 20.2MiB/s]
100%|██████████| 1/1 [00:04<00:00,  4.96s/file]


'data/EuroSAT-RGB/v1'

In fact, the CLI is built on top of the library.

The same functions are available for models:

In [33]:
from eotdl.models import retrieve_models

retrieve_models()

['RoadSegmentation',
 'EuroSAT-RGB-BiDS23-Q1',
 'MAVERICC',
 'WALDO25',
 'EuroSAT-RGB-BiDS23']

In [34]:
from eotdl.models import download_model 

path = download_model("RoadSegmentation", force=True)
path

100%|██████████| 1/1 [00:05<00:00,  5.85s/file]


'/home/juan/.cache/eotdl/models/RoadSegmentation/v1'

In [35]:
import os 

os.listdir(path)

['README.md', 'unet-resnet50.onnx']
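
Once a model is on disk, a script usually needs to locate the weights file inside the returned folder. A minimal sketch, assuming the weights are stored as a single file with a known extension (here `.onnx`, as in the listing above); `find_model_files` is an illustrative helper, not part of EOTDL, and the demo builds a throwaway folder with empty files so it runs without a download.

```python
import tempfile
from pathlib import Path


def find_model_files(model_dir, suffix=".onnx"):
    """Return the files in model_dir with the given extension, sorted by name."""
    return sorted(
        p for p in Path(model_dir).iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    )


# Demo on a temporary folder that mimics the listing above.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "README.md").touch()
    Path(tmp, "unet-resnet50.onnx").touch()
    print([p.name for p in find_model_files(tmp)])
```

In the notebook you would call `find_model_files(path)` with the `path` returned by `download_model` and hand the result to the inference runtime of your choice.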

## Discussion and Contribution opportunities

Feel free to ask questions now (live or through Discord) and make suggestions for future improvements.


- What features would you like to see for exploration and downloading?