In [1]:
%load_ext autoreload
%autoreload 2

import os
os.environ["EOTDL_API_URL"] = "http://localhost:8000/"

# Exploring and Downloading Datasets and Models

Let's start by exploring the repository of datasets and models.

You can do this through any of EOTDL's accessibility layers: the user interface, the API, the command line interface (CLI) and the Python library.

## The User Interface

The easiest way to get started with EOTDL is by exploring the user interface at [https://www.eotdl.com/](https://www.eotdl.com/). Through the UI you can:

- Explore the datasets and models available in the repository (filtering by name, tags and likes).
- Edit the information of your own datasets and models.
- Read the tutorials on the blog.
- Read the documentation.
- Find useful links to other resources (GitHub, Discord, ...)

## Quality levels

Datasets and models in EOTDL have an associated quality level, which depends on their metadata.

> TODO: quality

## The Command Line Interface

Even though the UI is the easiest way to get started, it is not the most convenient for actually working with the datasets and models. For that we recommend installing the CLI.

If you are running this notebook locally, consider creating a virtual environment before installing the CLI to avoid conflicts with other packages.

For example, with uv:

```bash
uv venv 
source .venv/bin/activate
```

> Learn how to install and work with uv at https://docs.astral.sh/uv/getting-started/installation/#__tabbed_1_1

You may also have to install Jupyter in the new environment and restart the notebook.

Then, you can install the CLI with pip:

In [1]:
# uncomment to install

# !pip install eotdl

Once installed, you can execute the CLI with different commands. 

In [4]:
!eotdl --help

 Usage: eotdl [OPTIONS] COMMAND [ARGS]...

 Welcome to EOTDL. Learn more at https://www.eotdl.com/

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --install-completion          Install completion for the current shell.      │
│ --show-completion             Show completion for the current shell, to copy │
│                               it or customize the installation.              │
│ --help                        Show

In [5]:
!eotdl version

EOTDL Version: 2024.10.07


In [6]:
!eotdl datasets --help

 Usage: eotdl datasets [OPTIONS] COMMAND [ARGS]...

 Explore, ingest and download training datasets.

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                  │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────╮
│ get        Download a dataset from the EOTDL.

You can explore datasets with the following command:

In [11]:
!eotdl datasets list 

['EuroSAT-RGB-small', 'EuroSAT-RGB']


In [12]:
!eotdl datasets list --help

 Usage: eotdl datasets list [OPTIONS]

 Retrieve a list with all the datasets in the EOTDL.
 If using --name, it will filter the results by name. If no name is provided,
 it will return all the datasets.
 If using --limit, it will limit the number of results. If no limit is
 provided, it will return all the datasets.

 Examples
 --------

In [13]:
!eotdl datasets list -n eurosat

['EuroSAT-RGB-small', 'EuroSAT-RGB']


As you may have guessed, you can stage a dataset with the following command:

In [15]:
!eotdl datasets get EuroSAT-RGB-small

Data available at /home/juan/.cache/eotdl/datasets/EuroSAT-RGB-small


The first time you run the command, you will be asked to log in (which requires creating an account if you don't have one yet). You can also log in explicitly with the following command:

In [16]:
!eotdl auth login

You are logged in as it@earthpulse.es


In [17]:
!eotdl auth --help

 Usage: eotdl auth [OPTIONS] COMMAND [ARGS]...

 Login to EOTDL.

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                  │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────╮
│ login              Login to the EOTDL.

In [18]:
!eotdl datasets get --help

 Usage: eotdl datasets get [OPTIONS] [DATASET]

 Download a dataset from the EOTDL.
 If using --path, it will download the dataset to the specified path. If no
 path is provided, it will download to ~/.eotdl/datasets.
 If using --file, it will download the specified file. If no file is provided,
 it will download the entire dataset.
 If using --version, it will download the specified version. If no version is
 provided, it will download the latest version.

By default, datasets are staged to your `$HOME/.cache/eotdl/datasets` folder, or to the path set in the `EOTDL_DOWNLOAD_PATH` environment variable. You can override this with the `--path` (`-p`) argument.

In [4]:
!eotdl datasets get EuroSAT-RGB-small -p data

Data available at data/EuroSAT-RGB-small
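
The lookup order described above can be sketched in a few lines of Python. This is only an illustration, not EOTDL's actual implementation: `resolve_stage_dir` is a hypothetical helper, and it assumes that an explicit path takes precedence over `EOTDL_DOWNLOAD_PATH` and that the dataset name is appended to whichever base folder is chosen.

```python
import os
from pathlib import Path


def resolve_stage_dir(dataset, path=None, env=None):
    """Hypothetical helper: guess where a dataset would be staged.

    Assumed precedence (not confirmed by the EOTDL source):
    explicit path > EOTDL_DOWNLOAD_PATH > $HOME/.cache/eotdl/datasets.
    """
    env = os.environ if env is None else env
    if path is not None:
        # An explicit --path / path= argument is assumed to win.
        base = Path(path)
    elif env.get("EOTDL_DOWNLOAD_PATH"):
        # Otherwise fall back to the environment variable, if set.
        base = Path(env["EOTDL_DOWNLOAD_PATH"])
    else:
        # Finally use the default cache folder in the home directory.
        base = Path.home() / ".cache" / "eotdl" / "datasets"
    return base / dataset


print(resolve_stage_dir("EuroSAT-RGB-small", path="data"))
```

Passing `env` explicitly makes the helper easy to try out without touching your real environment variables.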


You can choose a particular version to download with the `--version` argument. If you don't specify a version, the latest version will be downloaded.

In [5]:
# TODO

# !eotdl datasets get EuroSAT-RGB-small -p data -v 1

The version number will be used to create a folder with the same name inside the path you specified. Inside this folder you will find the dataset files.

If you try to re-download a dataset that is already staged, the CLI will raise an error. You can force a re-download with the `--force` (`-f`) argument.

In [6]:
# !eotdl datasets get EuroSAT-RGB-small -p data -v 1

# TODO

In [9]:
# !eotdl datasets get EuroSAT-RGB-small -p data -v 1 -f
!eotdl datasets get EuroSAT-RGB-small -p data -f

Data available at data/EuroSAT-RGB-small


By default, the `get` command only stages the dataset metadata (the `catalog.parquet` file).

In [11]:
os.listdir("data/EuroSAT-RGB-small")

['catalog.parquet']

Inside the metadata you will find the links to all the assets, so you can download them individually (for example, after filtering or processing using only the metadata). To stage all the assets at once, add the `-a` flag:

In [13]:
!eotdl datasets get EuroSAT-RGB-small -p data -a -f

Staging assets: 100%|████████████████████████| 102/102 [00:00<00:00, 104.01it/s]
Data available at data/EuroSAT-RGB-small


In [17]:
from glob import glob
from pathlib import Path

files = glob("data/EuroSAT-RGB-small/**/*", recursive=True)
files = [file for file in files if Path(file).is_file()]
len(files), files[:5]

(102,
 ['data/EuroSAT-RGB-small/catalog.parquet',
  'data/EuroSAT-RGB-small/README.md',
  'data/EuroSAT-RGB-small/Industrial/Industrial_1743.jpg',
  'data/EuroSAT-RGB-small/Industrial/Industrial_1273.jpg',
  'data/EuroSAT-RGB-small/Industrial/Industrial_1117.jpg'])
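
Once the assets are staged, a quick sanity check is to count the files in each subfolder. The sketch below assumes the layout seen in the listing above, where each image lives in a folder named after its class (for example `Industrial/`); `count_assets_by_folder` is a hypothetical helper, not part of EOTDL.

```python
from collections import Counter
from pathlib import Path


def count_assets_by_folder(root, suffix=".jpg"):
    """Hypothetical helper: count files with `suffix` per top-level subfolder of `root`."""
    root = Path(root)
    if not root.is_dir():
        # Nothing staged at this location yet.
        return {}
    counts = Counter()
    for file in root.rglob(f"*{suffix}"):
        parts = file.relative_to(root).parts
        # Skip files sitting directly in the root; the first path component
        # below the root is treated as the class name (an assumption).
        if file.is_file() and len(parts) > 1:
            counts[parts[0]] += 1
    return dict(counts)


print(count_assets_by_folder("data/EuroSAT-RGB-small"))
```

Files in the dataset root, such as `catalog.parquet` and `README.md`, are ignored because they do not belong to any class folder.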

Working with models is very much the same at this point.

In [None]:
# !eotdl models --help

# TODO

In [None]:
# !eotdl models list

# TODO

In [None]:
# !eotdl models list --help

# TODO

In [None]:
# !eotdl models get EuroSAT-RGB-BiDS23

In [None]:
# !eotdl models get --help

We will explore how to ingest datasets and models in the next tutorials.

## The Library

Everything that we have done so far with the CLI is also available through the Python library. When you install the CLI, the library is installed automatically as well.

In [23]:
import eotdl

eotdl.__version__

'2024.10.07'

In [24]:
from eotdl.datasets import retrieve_datasets

datasets = retrieve_datasets()
len(datasets)

2

In [25]:
retrieve_datasets("eurosat")

['EuroSAT-RGB-small', 'EuroSAT-RGB']

With the library, the results are plain Python objects, so you can filter and process the lists of datasets and models however you like.

In [27]:
[d for d in datasets if "small" in d.lower()]

['EuroSAT-RGB-small']

You can stage datasets as well, but you now have to handle potential errors yourself, such as the exception raised when the dataset has already been staged.

In [30]:
from eotdl.datasets import stage_dataset

stage_dataset("EuroSAT-RGB-small")

Exception: Dataset `EuroSAT-RGB-small` already exists at /home/juan/.cache/eotdl/datasets/EuroSAT-RGB-small. To force download, use force=True or -f in the CLI.

In [31]:
stage_dataset("EuroSAT-RGB-small", force=True)

'/home/juan/.cache/eotdl/datasets/EuroSAT-RGB-small'

In [32]:
stage_dataset("EuroSAT-RGB-small", force=True, path="data")

'data/EuroSAT-RGB-small'

In [33]:
stage_dataset("EuroSAT-RGB-small", force=True, path="data", assets=True)

Staging assets: 100%|██████████| 102/102 [00:00<00:00, 104.69it/s]


'data/EuroSAT-RGB-small'
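
One way to handle the "already exists" error is a small wrapper that retries with `force=True`. The sketch below is an example under stated assumptions rather than a library feature: `stage_or_force` is a hypothetical helper, and it relies on the error message containing the words "already exists", as in the exception shown above. The staging function is passed in as an argument so the snippet runs even without EOTDL installed.

```python
def stage_or_force(stage_fn, name, **kwargs):
    """Hypothetical helper: call `stage_fn`, retrying with force=True if needed.

    `stage_fn` is assumed to behave like `eotdl.datasets.stage_dataset` in the
    cells above: it returns the staged path, or raises an Exception whose
    message mentions "already exists".
    """
    try:
        return stage_fn(name, **kwargs)
    except Exception as error:
        if "already exists" not in str(error):
            # Unrelated failure (network, authentication, ...): let it propagate.
            raise
        # The dataset is already staged: retry and overwrite it.
        return stage_fn(name, **{**kwargs, "force": True})


# Possible usage with the real library (commented out so the block runs standalone):
# from eotdl.datasets import stage_dataset
# path = stage_or_force(stage_dataset, "EuroSAT-RGB-small", path="data")
```

Matching on the message text is brittle, since the exception shown above is a generic `Exception`. If a later version of the library raises a dedicated exception type, catching that type would be the safer choice.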

In fact, the CLI is built on top of the library.

The same applies to models:

In [34]:
# from eotdl.models import retrieve_models

# retrieve_models()

In [35]:
# from eotdl.models import download_model 

# path = download_model("EuroSAT-RGB-BiDS23", force=True)
# path

In [36]:
# import os 

# os.listdir(path)