# Exploring and Downloading Datasets and Models

Let's start by exploring the repository of datasets and models. 

You can do this through any of EOTDL's access layers: the user interface (UI), the command line interface (CLI), the Python library and the application programming interface (API).

## The User Interface

The easiest way to get started with EOTDL is by exploring the user interface: [https://eotdl.com/](https://www.eotdl.com/). Through the UI you will be able to:

- Explore the datasets and models available in the repository (filtering by name, tags, or the ones you liked)
- Edit the information of your own datasets and models.
- Read the tutorials on the blog.
- Read the documentation.
- Find useful links to other resources (GitHub, Discord, ...)

![web](images/web.png)

## Quality levels

Datasets and models in EOTDL are categorized into four quality levels:

- **Q0**: datasets in the form of an archive with arbitrary files without curation. This level is ideal for easy and fast upload/download of small datasets.
- **Q1**: datasets with STAC metadata but no QA. These datasets can leverage a limited set of EOTDL features.
- **Q2**: datasets with STAC metadata with the EOTDL custom extensions and automated QA. These datasets can leverage the full potential of the EOTDL.
- **Q3**: Q2 datasets that are manually curated. These datasets are the most reliable and can be used as benchmark datasets.

You will learn more about the quality levels in the [data curation](05_stac.ipynb) tutorial.

## The Command Line Interface

Although the UI is the easiest way to get started, it is not the most convenient way to actually work with datasets and models. For that, we recommend installing the CLI.

If you are running this notebook locally, consider creating a virtual environment before installing the CLI to avoid conflicts with other packages.

With conda:

```bash
conda create -n eotdl python=3.8
conda activate eotdl
```

With Python's built-in `venv` module:

```bash
python -m venv eotdl
source eotdl/bin/activate
```

You may also have to install Jupyter in the new environment and restart the notebook.

Then, you can install the CLI with pip:

In [1]:
# uncomment to install

# !pip install eotdl

Once installed, you can run the CLI with different commands. Add `--help` to any command to see its options and subcommands.

In [3]:
!eotdl --help

Usage: eotdl [OPTIONS] COMMAND [ARGS]...
Welcome to EOTDL. Learn more at https://www.eotdl.com/
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --install-completion          Install completion for the current shell.      │
│ --show-completion             Show completion for the current shell, to copy │
│                               it or customize the installation.              │
│ --help                        Show

In [4]:
!eotdl version

EOTDL Version: 2023.11.03


In [5]:
!eotdl datasets --help

Usage: eotdl datasets [OPTIONS] COMMAND [ARGS]...
Explore, ingest and download training datasets.
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                  │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────╮
│ get        Download a dataset from the EOTDL.

You can explore datasets with the following command:

In [6]:
!eotdl datasets list 

['EuroSAT-RGB', 'UCMerced', 'EuroSAT', 'SeCo100k', 'SeCo', 'AirbusAircraftDetection', 'AirbusWindTurbinesPatches', 'RoadNet', 'SloveniaLandCover', 'ISPRS-Potsdam2D', 'SEN12-FLOOD', 'Urban3dChallenge', 'tropical-cyclone-dataset', 'Vessel-detection', 'Airplanes-detection', 'S2-SHIPS', 'SpaceNet-7', 'Sentinel-2-Cloud-Mask', 'PASTIS', 'FlodNet', 'EuroCrops', 'open-cities-test', 'PASTIS-R', 'open-cities-tt1-source', 'open-cities-tt2-source', 'LandcoverAI', 'xview2', 'BigEarthNet', 'EuroSAT-RGB-STAC', 'EuroSAT-STAC', 'COWC', 'Stanford-Drone-dataset', 'EuroSAT-small', 'test-q0', 'Boadella-BiDS23']


In [7]:
!eotdl datasets list --help

Usage: eotdl datasets list [OPTIONS]
Retrieve a list with all the datasets in the EOTDL.
If using --name, it will filter the results by name. If no name is provided,
it will return all the datasets.
If using --limit, it will limit the number of results. If no limit is
provided, it will return all the datasets.
Examples
--------

In [8]:
!eotdl datasets list -n eurosat

['EuroSAT-RGB', 'EuroSAT', 'EuroSAT-RGB-STAC', 'EuroSAT-STAC', 'EuroSAT-small']


As you may have guessed, you can download a dataset with the following command:

In [10]:
!eotdl datasets get EuroSAT-small

100%|███████████████████████████████████████████| 7/7 [00:02<00:00,  2.41file/s]
Data available at /home/juan/.cache/eotdl/datasets/EuroSAT-small/v10


The first time you run the command, you will be asked to log in (which requires creating an account if you don't have one yet). You can also log in explicitly with the following command:

In [14]:
!eotdl auth login

You are logged in as it@earthpulse.es


In [11]:
!eotdl auth --help

Usage: eotdl auth [OPTIONS] COMMAND [ARGS]...
Login to EOTDL.
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                  │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────╮
│ login              Login to the EOTDL.

In [15]:
!eotdl datasets get --help

Usage: eotdl datasets get [OPTIONS] [DATASET]
Download a dataset from the EOTDL.
If using --path, it will download the dataset to the specified path. If no
path is provided, it will download to ~/.eotdl/datasets.
If using --file, it will download the specified file. If no file is provided,
it will download the entire dataset.
If using --version, it will download the specified version. If no version is
provided, it will download the latest version.

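By default, datasets are downloaded to `$HOME/.cache/eotdl/datasets` or, if it is set, to the path in the `EOTDL_DOWNLOAD_PATH` environment variable. The help text above still says `~/.eotdl/datasets`, but the earlier download output shows the `~/.cache` location being used. You can change the location with the `--path` (`-p`) argument.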
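The lookup order described above can be sketched as a small helper. `default_download_root` is hypothetical and is not part of the `eotdl` package. It only mirrors the documented behaviour, and it assumes two things: that an explicit `--path` takes precedence over everything else, and that `EOTDL_DOWNLOAD_PATH` replaces the default folder as a whole.

```python
import os
from pathlib import Path
from typing import Optional


def default_download_root(path: Optional[str] = None) -> Path:
    """Hypothetical helper mirroring the documented lookup order (not eotdl's own code).

    1. An explicit path (the CLI's `--path` / `-p`) wins.
    2. Otherwise `EOTDL_DOWNLOAD_PATH`, if set (assumed to replace the default folder).
    3. Otherwise `$HOME/.cache/eotdl/datasets`.
    """
    if path:
        return Path(path)
    env_path = os.environ.get("EOTDL_DOWNLOAD_PATH")
    if env_path:
        return Path(env_path)
    return Path.home() / ".cache" / "eotdl" / "datasets"
```

Joining the dataset name and a version folder to the result gives paths such as `data/EuroSAT-small/v10`, which is the layout the CLI prints when run with `-p data`.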

In [16]:
!eotdl datasets get EuroSAT-small -p data

100%|███████████████████████████████████████████| 7/7 [00:02<00:00,  2.71file/s]
Data available at data/EuroSAT-small/v10


You can choose a particular version to download with the `--version` (`-v`) argument. If you don't specify a version, the latest one is downloaded.

In [17]:
!eotdl datasets get EuroSAT-small -p data -v 1

Dataset `EuroSAT-small v1` already exists at data/EuroSAT-small/v1. To force download, use force=True or -f in the CLI.


The version is used to name a subfolder (for example `v1`) inside `<path>/<dataset name>`, and the dataset files are stored there.

If you try to re-download a dataset that already exists, the CLI will refuse and print a warning. You can force a re-download with the `--force` (`-f`) argument.
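The version folder and the re-download guard can be sketched as follows. The sketch assumes the `<path>/<dataset name>/v<version>` layout seen in the outputs. `prepare_target` is a hypothetical helper, not the library's implementation. It raises `FileExistsError` unless `force` is set. When forcing, it simply reuses the folder, so newly downloaded files overwrite the old ones.

```python
from pathlib import Path


def prepare_target(root: str, name: str, version: int, force: bool = False) -> Path:
    """Hypothetical helper: return `<root>/<name>/v<version>`, refusing to reuse it unless `force`."""
    target = Path(root) / name / f"v{version}"
    if target.exists() and not force:
        # same wording as the message printed by the CLI
        raise FileExistsError(
            f"Dataset `{name} v{version}` already exists at {target}. "
            "To force download, use force=True or -f in the CLI."
        )
    # when forcing, reuse the folder; downloaded files simply overwrite the old ones
    target.mkdir(parents=True, exist_ok=True)
    return target
```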

In [18]:
!eotdl datasets get EuroSAT-small -p data -v 1

Dataset `EuroSAT-small v1` already exists at data/EuroSAT-small/v1. To force download, use force=True or -f in the CLI.


In [19]:
!eotdl datasets get EuroSAT-small -p data -v 1 -f

100%|███████████████████████████████████████████| 6/6 [00:02<00:00,  2.48file/s]
Data available at data/EuroSAT-small/v1


For Q1+ datasets, the `get` command will only download the STAC metadata of the dataset.

In [21]:
!eotdl datasets get EuroSAT-RGB-Q1 -p data 

Downloading a STAC dataset is not implemented


The metadata contains links to all the assets, so you can download them individually (for example, after filtering or processing the metadata alone). Alternatively, you can download all the assets at once by adding the `-a` flag:
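Filtering with the metadata alone might look like the sketch below. It assumes the downloaded metadata is a folder of STAC JSON files. Per the STAC specification, an Item is a GeoJSON `Feature` whose `assets` object maps keys to asset objects with an `href`. `collect_asset_hrefs` is hypothetical, and the asset keys worth filtering on depend on each dataset.

```python
import json
from pathlib import Path
from typing import Iterable, List, Optional


def collect_asset_hrefs(root: str, keys: Optional[Iterable[str]] = None) -> List[str]:
    """Return the `href` of every asset in the STAC Items found under `root`.

    If `keys` is given, only assets whose key is in `keys` are kept.
    Catalogs and Collections (whose `type` is not "Feature") are skipped.
    """
    wanted = set(keys) if keys is not None else None
    hrefs = []
    for path in sorted(Path(root).rglob("*.json")):
        with open(path) as f:
            doc = json.load(f)
        if not isinstance(doc, dict) or doc.get("type") != "Feature":
            continue  # not a STAC Item
        for key, asset in doc.get("assets", {}).items():
            if wanted is None or key in wanted:
                hrefs.append(asset["href"])
    return hrefs
```

The resulting links can then be fetched with any HTTP client, so you only download the assets you actually selected.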

In [7]:
!eotdl datasets get eurosat-rgb -p data -a

Downloading a STAC dataset is not implemented


At this point, working with models is very much the same.

In [24]:
!eotdl models --help

Usage: eotdl models [OPTIONS] COMMAND [ARGS]...
Explore, ingest and download ML models.
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                  │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────╮
│ get        Download a model from the EOTDL.

In [25]:
!eotdl models list

['EuroSAT-RGB-BiDS23']


In [26]:
!eotdl models list --help

Usage: eotdl models list [OPTIONS]
Retrieve a list with all the models in the EOTDL.
If using --name, it will filter the results by name. If no name is provided,
it will return all the models.
If using --limit, it will limit the number of results. If no limit is
provided, it will return all the models.
Examples
--------

In [27]:
!eotdl models get EuroSAT-RGB-BiDS23

100%|███████████████████████████████████████████| 2/2 [00:04<00:00,  2.02s/file]
Data available at /home/juan/.cache/eotdl/models/EuroSAT-RGB-BiDS23/v1


In [28]:
!eotdl models get --help

Usage: eotdl models get [OPTIONS] [MODEL]
Download a model from the EOTDL.
If using --path, it will download the model to the specified path. If no path
is provided, it will download to ~/.eotdl/models.
If using --file, it will download the specified file. If no file is provided,
it will download the entire model.
If using --version, it will download the specified version. If no version is
provided, it will download the latest version.

We will explore how to ingest datasets and models in the next tutorials.

## The Library

Everything we have done so far with the CLI can also be done with the Python library. Installing the CLI (`pip install eotdl`) installs the library as well.

In [29]:
import eotdl

eotdl.__version__

'2023.11.02-3'

In [30]:
from eotdl.datasets import retrieve_datasets

datasets = retrieve_datasets()
len(datasets)

36

In [31]:
retrieve_datasets("eurosat")

['EuroSAT-RGB',
 'EuroSAT',
 'EuroSAT-RGB-STAC',
 'EuroSAT-STAC',
 'eurosat-rgb',
 'eurosat-rgb-q2',
 'EuroSAT-small']

Since the library returns plain Python objects, you have full control over the results. For example, you can filter the list yourself:

In [32]:
[d for d in datasets if "eurosat" in d.lower()]

['EuroSAT-RGB',
 'EuroSAT',
 'EuroSAT-RGB-STAC',
 'EuroSAT-STAC',
 'eurosat-rgb',
 'eurosat-rgb-q2',
 'EuroSAT-small']

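You can also download datasets, but now you have to handle potential errors yourself. Instead of printing a message, the library raises an exception, for example when the dataset already exists locally.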

In [35]:
from eotdl.datasets import download_dataset

download_dataset("EuroSAT-small")

Exception: Dataset `EuroSAT-small v9` already exists at /home/juan/.cache/eotdl/datasets/EuroSAT-small/v9. To force download, use force=True or -f in the CLI.

In [36]:
download_dataset("EuroSAT-small", force=True)

100%|██████████| 7/7 [00:02<00:00,  2.48file/s]


'/home/juan/.cache/eotdl/datasets/EuroSAT-small/v9'

In [37]:
download_dataset("EuroSAT-small", force=True, path="data")

100%|██████████| 7/7 [00:02<00:00,  2.66file/s]


'data/EuroSAT-small/v9'
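If you prefer to reuse a local copy rather than re-download it, one option is to catch that exception and recover the existing path from its message. This is a minimal sketch. It assumes the library keeps raising a plain `Exception` whose text matches the one shown above. `download_or_reuse` is a hypothetical wrapper. Parsing error messages is brittle, and paths containing spaces are not handled, so treat it as a convenience rather than a contract.

```python
import re
from typing import Callable

# matches "... already exists at <path>. To force download ..."
_ALREADY_EXISTS = re.compile(r"already exists at (\S+)\.\s")


def download_or_reuse(download: Callable[..., str], name: str, **kwargs) -> str:
    """Hypothetical wrapper: call `download(name, **kwargs)`, or return the existing folder."""
    try:
        return download(name, **kwargs)
    except Exception as exc:  # the library raises a plain Exception here
        match = _ALREADY_EXISTS.search(str(exc))
        if match is None:
            raise  # any other error is left to the caller
        return match.group(1)
```

For example, `download_or_reuse(download_dataset, "EuroSAT-small", path="data")` returns the dataset folder whether or not anything had to be downloaded.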

In fact, the CLI is built on top of the library.

The same applies to models:

In [3]:
from eotdl.models import retrieve_models

retrieve_models()

['EuroSAT-RGB-BiDS23']

In [6]:
from eotdl.models import download_model 

path = download_model("EuroSAT-RGB-BiDS23", force=True)
path

100%|██████████| 2/2 [00:03<00:00,  1.95s/file]


'/home/juan/.cache/eotdl/models/EuroSAT-RGB-BiDS23/v1'

In [7]:
import os 

os.listdir(path)

['metadata.yml', 'model.onnx']

## The Application Programming Interface

The last way to interact with EOTDL is through the API. You can explore its interactive documentation at [https://api.eotdl.com/docs](https://api.eotdl.com/docs).

You can get the full list of datasets hosted in EOTDL with the following API call:

In [8]:
import requests

datasets = requests.get("https://api.eotdl.com/datasets").json()
datasets

[{'uid': 'auth0|616b0057af0c7500691a026e',
  'id': '6454b4ba05740a8762edfcdb',
  'name': 'EuroSAT-RGB',
  'authors': [' Patrick Helber'],
  'source': 'http://madm.dfki.de/downloads',
  'license': '-',
  'files': '6526972d7d4d50bd035d033d',
  'versions': [{'version_id': 1,
    'createdAt': '2023-10-11T14:37:38.155',
    'size': 377122268},
   {'version_id': 2, 'createdAt': '2023-10-11T15:38:45.833', 'size': 0},
   {'version_id': 3, 'createdAt': '2023-10-11T15:38:45.833', 'size': 0},
   {'version_id': 4, 'createdAt': '2023-10-11T15:38:45.833', 'size': 0},
   {'version_id': 5, 'createdAt': '2023-10-11T15:38:45.833', 'size': 0},
   {'version_id': 6, 'createdAt': '2023-10-11T15:38:45.833', 'size': 0},
   {'version_id': 7, 'createdAt': '2023-10-11T15:38:45.833', 'size': 0},
   {'version_id': 8, 'createdAt': '2023-10-12T07:14:16.642', 'size': 5406191},
   {'version_id': 9, 'createdAt': '2023-10-12T07:14:16.642', 'size': 5610422},
   {'version_id': 10,
    'createdAt': '2023-10-12T07:14:16.642

As you can see, the API returns all the information about each dataset (authors, license, versions, etc.), not only its name. This makes the API ideal for building third-party applications on top of EOTDL.
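Because each record is a plain dictionary, summarizing the catalogue is straightforward. The sketch below picks the highest `version_id` of each dataset and reports its `size`. It uses only the field names visible in the response above. `latest_versions` is a hypothetical helper, and it assumes every version entry has a `version_id`.

```python
from typing import Any, Dict, List, Tuple


def latest_versions(datasets: List[Dict[str, Any]]) -> List[Tuple[str, int, int]]:
    """Return (name, latest version_id, size in bytes of that version) for each dataset record."""
    summary = []
    for d in datasets:
        versions = d.get("versions") or []
        if not versions:
            continue  # nothing published yet
        latest = max(versions, key=lambda v: v["version_id"])
        summary.append((d["name"], latest["version_id"], latest.get("size", 0)))
    return summary
```

Applied to `requests.get("https://api.eotdl.com/datasets").json()`, this yields one `(name, version, size)` tuple per dataset that has at least one version.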

In [9]:
datasets = requests.get("https://api.eotdl.com/datasets?match=eurosat-small&limit=1").json()
[(d['name'], d['id'], d['files'], len(d['versions'])) for d in datasets]	

[('EuroSAT-small', '6526accffd974011abc2413a', '6526accffd974011abc2413b', 9)]

In fact, the library (and therefore the CLI) is built on top of the API, so you can reproduce, or extend, the same functionality in your own applications.
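When calling the API from your own code, it helps to build the URLs in one place. The helpers below are hypothetical. They only cover the endpoints and query parameters used in this section (`match`, `limit`, `version`), so check the interactive documentation for anything else. They use only the standard library, and `urlencode` takes care of escaping query values.

```python
from typing import Optional
from urllib.parse import quote, urlencode

API = "https://api.eotdl.com"


def datasets_url(match: Optional[str] = None, limit: Optional[int] = None) -> str:
    """URL for listing datasets, optionally filtered by name and limited in count."""
    params = {k: v for k, v in {"match": match, "limit": limit}.items() if v is not None}
    return f"{API}/datasets" + (f"?{urlencode(params)}" if params else "")


def files_url(dataset_id: str, version: int) -> str:
    """URL listing the files of one version of a dataset."""
    return f"{API}/datasets/{dataset_id}/files?{urlencode({'version': version})}"


def download_url(dataset_id: str, filename: str, version: int) -> str:
    """URL for downloading a single file (this endpoint requires authentication)."""
    # quote() keeps "/" by default, so nested names like "Forest/Forest_3.tif" stay as path segments
    return f"{API}/datasets/{dataset_id}/download/{quote(filename)}?{urlencode({'version': version})}"
```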

In [10]:
files = requests.get("https://api.eotdl.com/datasets/6526accffd974011abc2413a/files?version=2").json()
len(files)

6

In [12]:
files[0]

{'filename': 'Forest/Forest_3.tif',
 'version': 1,
 'checksum': '3e7bb982f9db5f7dabc556016c3d081dfb1fb73d'}
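Each file record includes a 40-character hexadecimal `checksum`, which is the length of a SHA-1 digest. Assuming it is indeed the SHA-1 of the file contents (this section does not say so), a downloaded file could be verified as shown below. `sha1_of` and `matches_checksum` are hypothetical helpers.

```python
import hashlib


def sha1_of(path: str, chunk_size: int = 8192) -> str:
    """Hex SHA-1 of a file, read in chunks so large rasters don't need to fit in memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_checksum(path: str, record: dict) -> bool:
    """Compare a local file with a record from the `files` endpoint (assumes SHA-1)."""
    return sha1_of(path) == record["checksum"]
```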

Some API calls require authentication. To authenticate:

- Use the `auth/login` endpoint to get a login URL and a code
- Navigate to the login URL to log in
- Use the `auth/token` endpoint to get a token with the provided code
- Use the token to authenticate your requests

In [66]:
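import os

token = '...'  # the token obtained from the `auth/token` endpoint

# download the first file listed for the dataset
file = files[0]
filename = file["filename"]
filepath = f'data/{filename}'

# create the local folder structure (e.g. data/Forest/) before writing
os.makedirs(os.path.dirname(filepath), exist_ok=True)
response = requests.get(
    f'https://api.eotdl.com/datasets/6526accffd974011abc2413a/download/{filename}?version=1', 
    headers={'Authorization': f'Bearer {token}'},  # authenticated request
    stream=True  # stream the body instead of loading it all into memory
)
response.raise_for_status()  # raises HTTPError, e.g. 401 if the token is missing or invalid

# write the file to disk in 8 KB chunks
with open(filepath, 'wb') as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)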


HTTPError: 401 Client Error: Unauthorized for url: https://api.eotdl.com/datasets/6526accffd974011abc2413a/download/Forest/Forest_3.tif?version=1

## Roadmap

These are some features we are planning to add in order to enhance the exploration and download experience:

- Visual exploration tools in the UI
- Geographical and temporal search for Q1+ datasets (bounding box)
- Metadata queries for Q1+ datasets

## Discussion and Contribution opportunities

Feel free to ask questions now (live or through Discord) and make suggestions for future improvements.


- What features would you like to see for exploration and downloading?