In [41]:
%load_ext autoreload
%autoreload 2

import os
os.environ["EOTDL_API_URL"] = "http://localhost:8000/"

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# Exploring and Staging Datasets and Models

Let's start by exploring the repository of datasets and models. 

You can do so through any of EOTDL's accessibility layers: the user interface, the API, the command line interface (CLI) and the Python library.

## The User Interface

The easiest way to get started with EOTDL is by exploring the user interface: [https://www.eotdl.com/](https://www.eotdl.com/). Through the UI you will be able to:

- Explore the datasets and models available in the repository (filtering by name, tags and likes).
- Edit the information of your own datasets and models.
- Read the tutorials on the blog.
- Read the documentation.
- Find useful links to other resources (GitHub, Discord, ...).

## The Command Line Interface

Even though the UI is the easiest way to get started, it is not the most convenient for actually working with the datasets and models. For that we recommend installing the CLI.

If you are running this notebook locally, consider creating a virtual environment before installing the CLI to avoid conflicts with other packages.

For example, with uv:

```bash
uv venv 
source .venv/bin/activate
```

> Learn how to install and work with uv at https://docs.astral.sh/uv/getting-started/installation/#__tabbed_1_1

You may also have to install Jupyter in the new environment and restart the notebook.

Then, you can install the CLI with pip:

In [42]:
# uncomment to install

# !pip install eotdl

Once installed, you can execute the CLI with different commands. 

In [43]:
!eotdl --help

 Usage: eotdl [OPTIONS] COMMAND [ARGS]...

 Welcome to EOTDL. Learn more at https://www.eotdl.com/

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --install-completion          Install completion for the current shell.      │
│ --show-completion             Show completion for the current shell, to copy │
│                               it or customize the installation.              │
│ --help                        Show

In [44]:
!eotdl version

EOTDL Version: 2025.03.25


In [45]:
!eotdl datasets --help

 Usage: eotdl datasets [OPTIONS] COMMAND [ARGS]...

 Explore, ingest and download training datasets.

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                  │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────╮
│ get        Download a dataset from the EOTDL.

You can explore datasets with the following command:

In [46]:
!eotdl datasets list 

['Test-links', 'EuroSAT-RGB-small-STAC', 'EuroSAT-small']


In [47]:
!eotdl datasets list --help

 Usage: eotdl datasets list [OPTIONS]

 Retrieve a list with all the datasets in the EOTDL.
 If using --name, it will filter the results by name. If no name is provided,
 it will return all the datasets.
 If using --limit, it will limit the number of results. If no limit is
 provided, it will return all the datasets.

 Examples
 --------

In [48]:
!eotdl datasets list -n eurosat

['EuroSAT-RGB-small-STAC', 'EuroSAT-small']


As you may have guessed, you can stage a dataset with the following command:

In [49]:
!eotdl datasets get EuroSAT-small -f

Data available at /Users/juan/.cache/eotdl/datasets/EuroSAT-small


The first time you run the command, you will be asked to log in (which will require you to create an account if you haven't already). You can also log in explicitly with the following command:

In [11]:
!eotdl auth login

You are logged in as it@earthpulse.es


In [12]:
!eotdl auth --help

 Usage: eotdl auth [OPTIONS] COMMAND [ARGS]...

 Login to EOTDL.

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                  │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────╮
│ login              Login to the EOTDL.

In [13]:
!eotdl datasets get --help

 Usage: eotdl datasets get [OPTIONS] [DATASET]

 Download a dataset from the EOTDL.
 If using --path, it will download the dataset to the specified path. If no
 path is provided, it will download to ~/.eotdl/datasets.
 If using --file, it will download the specified file. If no file is provided,
 it will download the entire dataset.
 If using --version, it will download the specified version. If no version is
 provided, it will download the latest version.

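By default, datasets are staged to your `$HOME/.cache/eotdl/datasets` folder, or to the path set in the `EOTDL_DOWNLOAD_PATH` environment variable. Note that the help text above still mentions `~/.eotdl/datasets`, while the earlier command output shows the actual location. You can change the destination with the `--path` argument.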
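To reason about where a dataset will land before running the command, the lookup order can be sketched in plain Python. This is only an illustration: `resolve_staging_dir` is a hypothetical helper, not part of the `eotdl` package, and the precedence it encodes (explicit path first, then `EOTDL_DOWNLOAD_PATH`, then the default cache folder) is an assumption based on the description and outputs in this notebook.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_staging_dir(
    name: str,
    path: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Path:
    """Hypothetical helper: predict the folder a dataset would be staged to."""
    if path is not None:
        # Assumed: an explicit --path / path= argument takes priority.
        base = Path(path)
    elif "EOTDL_DOWNLOAD_PATH" in env:
        # Assumed: the environment variable replaces the default folder.
        base = Path(env["EOTDL_DOWNLOAD_PATH"])
    else:
        # Default location described in the text.
        base = Path.home() / ".cache" / "eotdl" / "datasets"
    # The dataset name is appended as a subfolder, as in the outputs shown.
    return base / name


print(resolve_staging_dir("EuroSAT-small", path="data"))
```

Passing `env` as a parameter keeps the helper easy to test without touching the real environment.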

In [14]:
!eotdl datasets get EuroSAT-small -p data -f

Data available at data/EuroSAT-small


You can choose a particular version to download with the `--version` argument. If you don't specify a version, the latest version will be downloaded.

In [16]:
!eotdl datasets get EuroSAT-small -p data -v 1 -f

Data available at data/EuroSAT-small


In [17]:
!eotdl datasets get EuroSAT-small -p data -v 2 -f

Data available at data/EuroSAT-small


If you try to re-download a dataset that is already staged, the CLI will complain. You can force a re-download with the `--force` argument.

In [18]:
!eotdl datasets get EuroSAT-small -p data

Dataset `EuroSAT-small` already exists at data/EuroSAT-small. To force download, use force=True or -f in the CLI.


In [19]:
!eotdl datasets get EuroSAT-small -p data -f

Data available at data/EuroSAT-small


By default, the `get` command will only stage the dataset metadata.

In [20]:
os.listdir("data/EuroSAT-small")

['catalog.v1.parquet', 'catalog.v2.parquet']

Inside the metadata you will find the links to all the assets, so you can download them individually (maybe after some filtering or processing using only the metadata). 
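The folder listing above contains one catalog file per staged version (`catalog.v1.parquet`, `catalog.v2.parquet`). If you want to pick the newest one programmatically, a small helper can parse the version number from the file name. The sketch below assumes the `catalog.v<N>.parquet` naming pattern inferred from that listing; `latest_catalog` is a hypothetical name, not an `eotdl` function.

```python
import re
from pathlib import Path
from typing import Optional

# Naming pattern inferred from the listing above (an assumption, not a documented contract).
CATALOG_PATTERN = re.compile(r"^catalog\.v(\d+)\.parquet$")


def latest_catalog(folder: str) -> Optional[Path]:
    """Return the catalog file with the highest version number, or None if there is none."""
    root = Path(folder)
    if not root.is_dir():
        return None
    versions = {}
    for entry in root.iterdir():
        match = CATALOG_PATTERN.match(entry.name)
        if match:
            # Compare versions numerically so that v10 sorts after v2.
            versions[int(match.group(1))] = entry
    if not versions:
        return None
    return versions[max(versions)]
```

For example, `latest_catalog("data/EuroSAT-small")` would return the path of the highest-numbered catalog in that folder, which you can then pass to `gpd.read_parquet`.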

In [21]:
import geopandas as gpd

gdf = gpd.read_parquet("data/EuroSAT-small/catalog.v1.parquet")
gdf.head()

| | type | stac_version | stac_extensions | datetime | id | bbox | geometry | assets | links | repository |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Feature | 1.0.0 | [] | 2025-03-25 15:32:28.130806 | README.md | {'xmax': 0.0, 'xmin': 0.0, 'ymax': 0.0, 'ymin'... | POLYGON EMPTY | {'asset': {'checksum': 'a6bb30a57d0f5ff0aaa65b... | [] | eotdl |
| 1 | Feature | 1.0.0 | [] | 2025-03-25 15:32:28.130935 | Forest/Forest_1.tif | {'xmax': 0.0, 'xmin': 0.0, 'ymax': 0.0, 'ymin'... | POLYGON EMPTY | {'asset': {'checksum': 'f3b8b9fef6b2df6f24792e... | [] | eotdl |
| 2 | Feature | 1.0.0 | [] | 2025-03-25 15:32:28.131050 | Forest/Forest_2.tif | {'xmax': 0.0, 'xmin': 0.0, 'ymax': 0.0, 'ymin'... | POLYGON EMPTY | {'asset': {'checksum': '2e38dab64435bfbab25bab... | [] | eotdl |
| 3 | Feature | 1.0.0 | [] | 2025-03-25 15:32:28.131141 | Forest/Forest_3.tif | {'xmax': 0.0, 'xmin': 0.0, 'ymax': 0.0, 'ymin'... | POLYGON EMPTY | {'asset': {'checksum': '3e7bb982f9db5f7dabc556... | [] | eotdl |
| 4 | Feature | 1.0.0 | [] | 2025-03-25 15:32:28.131231 | AnnualCrop/AnnualCrop_2.tif | {'xmax': 0.0, 'xmin': 0.0, 'ymax': 0.0, 'ymin'... | POLYGON EMPTY | {'asset': {'checksum': 'c406cb8920858b98898b9e... | [] | eotdl |


In [22]:
from eotdl.datasets import stage_dataset_file

for _, row in gdf.sample(3).iterrows(): # or do some filtering
    for k, v in row['assets'].items():
        path = stage_dataset_file(v['href'], "data/outputs")
        print(path)

data/outputs/AnnualCrop/AnnualCrop_3.tif
data/outputs/README.md
data/outputs/AnnualCrop/AnnualCrop_1.tif
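The `# or do some filtering` comment in the loop above can be made concrete. The sketch below selects the asset links of a single class by matching the `id` prefix. It works on plain dictionaries so it runs without any staged data; the `id` values come from the catalog preview above, while the `href` values are placeholders and `select_asset_hrefs` is a hypothetical helper name.

```python
from typing import Iterable, List


def select_asset_hrefs(records: Iterable[dict], prefix: str) -> List[str]:
    """Collect the asset hrefs of catalog rows whose id starts with `prefix`."""
    hrefs = []
    for record in records:
        if record["id"].startswith(prefix):
            # Each row stores its assets as a dict of {key: {"href": ..., ...}}.
            hrefs.extend(asset["href"] for asset in record["assets"].values())
    return hrefs


# Placeholder rows: ids taken from the preview above, hrefs are made up.
rows = [
    {"id": "README.md", "assets": {"asset": {"href": "https://example.invalid/README.md"}}},
    {"id": "Forest/Forest_1.tif", "assets": {"asset": {"href": "https://example.invalid/Forest_1.tif"}}},
    {"id": "Forest/Forest_2.tif", "assets": {"asset": {"href": "https://example.invalid/Forest_2.tif"}}},
    {"id": "AnnualCrop/AnnualCrop_2.tif", "assets": {"asset": {"href": "https://example.invalid/AnnualCrop_2.tif"}}},
]

forest_hrefs = select_asset_hrefs(rows, "Forest/")
print(len(forest_hrefs))
```

With the real catalog you could pass `gdf.to_dict("records")` instead of `rows` and feed each resulting link to `stage_dataset_file`, as in the cell above.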


If you prefer to stage all the assets at once, add the `-a` flag to the `get` command:

In [23]:
!eotdl datasets get EuroSAT-small -p data -a -f

Staging assets: 100%|█████████████████████████████| 8/8 [00:00<00:00, 82.43it/s]
Data available at data/EuroSAT-small


In [24]:
from glob import glob
from pathlib import Path

files = glob("data/EuroSAT-small/**/*", recursive=True)
files = [file for file in files if Path(file).is_file()]
len(files), files[:5]

(10,
 ['data/EuroSAT-small/README.md',
  'data/EuroSAT-small/catalog.v1.parquet',
  'data/EuroSAT-small/test.txt',
  'data/EuroSAT-small/catalog.v2.parquet',
  'data/EuroSAT-small/Forest/Forest_1.tif'])
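Once the assets are staged, a quick sanity check is to count how many images each class folder contains. The following sketch assumes that every class lives in its own subfolder (such as `Forest/` or `AnnualCrop/`, as the catalog ids suggest) and that the images use the `.tif` extension; `count_files_per_folder` is a hypothetical helper.

```python
from collections import Counter
from pathlib import Path


def count_files_per_folder(root: str, pattern: str = "*.tif") -> Counter:
    """Count files matching `pattern` under `root`, grouped by parent folder name."""
    root_path = Path(root)
    # rglob walks the tree recursively; the parent folder name acts as the class label.
    return Counter(
        p.parent.name for p in root_path.rglob(pattern) if p.is_file()
    )
```

Calling `count_files_per_folder("data/EuroSAT-small")` returns a `Counter` keyed by folder name, which makes class imbalance easy to spot before training.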

Working with models is very much the same at this point.

In [25]:
!eotdl models --help

 Usage: eotdl models [OPTIONS] COMMAND [ARGS]...

 Explore, ingest and download ML models.

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                  │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────╮
│ get        Download a model from the EOTDL.

In [26]:
!eotdl models list

['RoadSegmentation']


In [27]:
!eotdl models list --help

 Usage: eotdl models list [OPTIONS]

 Retrieve a list with all the models in the EOTDL.
 If using --name, it will filter the results by name. If no name is provided,
 it will return all the models.
 If using --limit, it will limit the number of results. If no limit is
 provided, it will return all the models.

 Examples
 --------

In [28]:
!eotdl models get RoadSegmentation

Data available at /Users/juan/.cache/eotdl/models/RoadSegmentation


In [29]:
!eotdl models get --help

 Usage: eotdl models get [OPTIONS] [MODEL]

 Download a model from the EOTDL.
 If using --path, it will download the model to the specified path. If no path
 is provided, it will download to ~/.eotdl/models.
 If using --file, it will download the specified file. If no file is provided,
 it will download the entire model.
 If using --version, it will download the specified version. If no version is
 provided, it will download the latest version.

We will explore how to ingest datasets and models in the next tutorials.

## The Library

Everything that we have done so far with the CLI is also available through the Python library. When installing the CLI, the library is automatically installed as well.

In [30]:
import eotdl

eotdl.__version__

'2025.03.25'

In [31]:
from eotdl.datasets import retrieve_datasets

datasets = retrieve_datasets()
len(datasets)

3

In [32]:
retrieve_datasets("eurosat")

['EuroSAT-RGB-small-STAC', 'EuroSAT-small']

Because the library returns plain Python objects (here, a list of names), you can process the results with regular Python code.

In [33]:
[d for d in datasets if "small" in d.lower()]

['EuroSAT-RGB-small-STAC', 'EuroSAT-small']

You can stage datasets as well, but now you have to handle potential errors yourself, since the library raises an exception where the CLI only prints a message.

In [34]:
from eotdl.datasets import stage_dataset

try:
    stage_dataset("EuroSAT-small")
except Exception as e:
    print("ERROR")
    print(e)


ERROR
Dataset `EuroSAT-small` already exists at /Users/juan/.cache/eotdl/datasets/EuroSAT-small. To force download, use force=True or -f in the CLI.
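If you would rather avoid the exception altogether, one option is to check for the target folder before staging. The sketch below is a hypothetical wrapper, not part of `eotdl`: it receives the staging function as an argument (for example `stage_dataset`) and only calls it when the folder is missing. It assumes the function accepts `path=` and returns the staged folder, as the calls in this notebook suggest.

```python
from pathlib import Path
from typing import Callable


def stage_if_missing(stage_fn: Callable[..., str], name: str, path: str) -> str:
    """Return the local folder for `name`, calling `stage_fn` only when it is absent."""
    target = Path(path) / name
    if target.is_dir():
        # Already staged: skip the call instead of triggering the "already exists" error.
        # Caveat: this does not check which version is currently in the folder.
        return str(target)
    return stage_fn(name, path=path)
```

A call such as `stage_if_missing(stage_dataset, "EuroSAT-small", "data")` would then be safe to repeat. When you need a specific version, passing `force=True` to `stage_dataset` directly remains the more reliable choice.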


In [35]:
stage_dataset("EuroSAT-small", force=True)

'/Users/juan/.cache/eotdl/datasets/EuroSAT-small'

In [36]:
stage_dataset("EuroSAT-small", force=True, path="data")

'data/EuroSAT-small'

In [37]:
stage_dataset("EuroSAT-small", force=True, path="data", version=1)

'data/EuroSAT-small'

In [38]:
stage_dataset("EuroSAT-small", force=True, path="data", assets=True)

Staging assets: 100%|██████████| 8/8 [00:00<00:00, 82.33it/s]


'data/EuroSAT-small'

In fact, the CLI is built on top of the library.

The same applies to models:

In [39]:
from eotdl.models import retrieve_models

retrieve_models()

['RoadSegmentation']

In [40]:
from eotdl.models import stage_model

path = stage_model("RoadSegmentation", force=True)
path

'/Users/juan/.cache/eotdl/models/RoadSegmentation'