In [4]:
%load_ext autoreload
%autoreload 2


# Q0 Training Datasets

Training Datasets (TDS) in EOTDL are categorized into different [quality levels](https://eotdl.com/docs/datasets/quality), which in turn determine the range of functionality available for each dataset.

In this tutorial you will learn about Q0 datasets, the lowest quality level. Q0 datasets have minimal standardized metadata, which makes this level ideal for quick and easy upload and download of small datasets.

## Explore datasets

One of the first things you may want to do in EOTDL is explore the available datasets. You can do this at all accessibility layers.

In the user interface, visit [datasets](/datasets) to explore all the available datasets. You can click on a dataset card to see more information about it, download it, etc.

You can also explore datasets using the API, CLI and library, which will give you more flexibility and control over the results. Check the [documentation](/docs/datasets/explore) to learn more.

In [12]:
from eotdl.datasets import retrieve_datasets

In [13]:
datasets = retrieve_datasets()
len(datasets)

0

You will receive an object with the dataset names as keys and the list of files associated with each dataset as values. You can also filter datasets by name.

In [7]:
datasets = retrieve_datasets("eurosat")
datasets

['EuroSAT-RGB',
 'EuroSAT',
 'EuroSAT-RGB-STAC',
 'EuroSAT-STAC',
 'eurosat-rgb',
 'eurosat-rgb-q2',
 'EuroSAT-small']
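
To make the idea of name filtering concrete, here is a minimal sketch that applies a case-insensitive substring match to the names returned above. The helper `filter_by_name` is hypothetical and the matching rule is an assumption; this tutorial does not state how EOTDL matches names internally.

```python
def filter_by_name(names, query):
    """Return the names that contain `query`, ignoring case (assumed rule)."""
    needle = query.lower()
    return [name for name in names if needle in name.lower()]


# Names copied from the output above.
names = [
    "EuroSAT-RGB",
    "EuroSAT",
    "EuroSAT-RGB-STAC",
    "EuroSAT-STAC",
    "eurosat-rgb",
    "eurosat-rgb-q2",
    "EuroSAT-small",
]

rgb_only = filter_by_name(names, "rgb")
print(rgb_only)
```

A local filter like this can be handy for narrowing a result list further without issuing another request.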

## Download datasets

Once you find a suitable dataset you can download it for training your models.

In the user interface, click on the `DOWNLOAD` button on the dataset page. You’ll need to be logged in to download datasets.

You can also download datasets using the API, CLI and library. Check the [documentation](/docs/datasets/download) to learn more.

In [8]:
from eotdl.datasets import download_dataset

In [None]:
dst_path = download_dataset('EuroSAT-RGB')
dst_path

By default, all the files in the dataset are downloaded to the directory `~/.eotdl/datasets`. You can also choose a different destination directory or download a single file.

In [None]:
dst_path = download_dataset('EuroSAT-RGB', file='EuroSAT-RGB.zip', path='example_data')
dst_path
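
To check what ended up on disk after a default download, a small standard-library sketch can help. The helpers `default_dataset_dir` and `list_downloaded_files` are hypothetical names introduced here, not part of the EOTDL library. The layout `~/.eotdl/datasets/<dataset name>` is inferred from the path returned by the last download in this tutorial.

```python
from pathlib import Path


def default_dataset_dir(name):
    """Build the expected default location: ~/.eotdl/datasets/<name>."""
    return Path.home() / ".eotdl" / "datasets" / name


def list_downloaded_files(directory):
    """Return the sorted file names in `directory`, or [] if it does not exist."""
    directory = Path(directory)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if p.is_file())


target = default_dataset_dir("EuroSAT-RGB")
print(target)
print(list_downloaded_files(target))  # empty until the dataset is downloaded
```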

## Ingest datasets

Additionally, you can ingest your own datasets into EOTDL. This will allow you to use them in your own projects and share them with other users. 

In the user interface, visit [datasets](/datasets) and click on the `INGEST` button. You’ll need to be logged in to ingest datasets.

You can also ingest datasets using the API, CLI and library (CLI recommended). Check the [documentation](/docs/datasets/ingest) to learn more.

In [None]:
from eotdl.datasets import ingest_dataset

To ingest a Q0 dataset, create a folder containing the data that you want to upload. A `metadata.yml` file with the following structure is also required:

```yaml
name: dataset-name
authors: 
  - author 1
  - author 2
license: dataset-license
source: http://link-to-source
```
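
If you prefer to generate this file from code, the following sketch writes the four fields shown above and checks that none is missing. It uses only the standard library and handles plain scalar values only. The helpers `render_metadata` and `missing_fields` are hypothetical, and treating all four fields as mandatory is an assumption based on the structure above.

```python
import tempfile
from pathlib import Path

REQUIRED_FIELDS = ("name", "authors", "license", "source")


def render_metadata(name, authors, dataset_license, source):
    """Render the four fields shown above as YAML text (plain scalars only)."""
    lines = [f"name: {name}", "authors:"]
    lines += [f"  - {author}" for author in authors]
    lines += [f"license: {dataset_license}", f"source: {source}"]
    return "\n".join(lines) + "\n"


def missing_fields(text):
    """Return the required top-level keys that are absent from the YAML text."""
    present = {
        line.split(":", 1)[0]
        for line in text.splitlines()
        if line and not line.startswith(" ") and ":" in line
    }
    return [field for field in REQUIRED_FIELDS if field not in present]


# Write the file into a temporary folder so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as folder:
    metadata_path = Path(folder) / "metadata.yml"
    metadata_path.write_text(
        render_metadata(
            name="dataset-name",
            authors=["author 1", "author 2"],
            dataset_license="dataset-license",
            source="http://link-to-source",
        )
    )
    written = metadata_path.read_text()

print(written)
print(missing_fields(written))
```

For values that contain special YAML characters (colons followed by spaces, leading dashes, quotes), a real YAML library would be a safer choice than hand-written text.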

In [None]:
import os 

os.listdir('data/EuroSAT2')

['EuroSAT-RGB.zip', 'metadata.yml', 'README.md']

In [None]:
ingest_dataset('data/EuroSAT2')

Uploading directory (only files, not recursive)
The following files will be uploaded:
EuroSAT-RGB.zip
README.md
Uploading file data/EuroSAT2/EuroSAT-RGB.zip...
Computing checksum...
de4455c5f375f3509f0f7144d3be3927b495b707
Ingesting file...


89.91/89.91 MB: : 9it [00:07,  1.16it/s]                     



Completing upload...
Done
Uploading file data/EuroSAT2/README.md...
Computing checksum...
dd5034ce10edabb9de02a171fc2b1f6a0f80852b
Ingesting file...
Done


{'dataset_id': '64c7be7c2a65dcd4ae2ca630',
 'dataset_name': 'EuroSAT2',
 'file_name': 'README.md'}
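
The log above prints a 40-character hexadecimal checksum for each file. That length is consistent with SHA-1, although the algorithm EOTDL actually uses is an assumption here. The sketch below shows how such a checksum could be computed locally in chunks; `file_checksum` is a hypothetical helper.

```python
import hashlib
import os
import tempfile


def file_checksum(path, chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so large archives never sit fully in memory."""
    digest = hashlib.sha1()  # assumption: SHA-1, inferred from the 40-hex-digit output
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstrate on a small temporary file rather than a real dataset.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example payload")
    tmp_path = tmp.name

checksum = file_checksum(tmp_path)
os.remove(tmp_path)
print(checksum, len(checksum))
```

Comparing a locally computed checksum with the one stored for a file is one way to tell whether the local copy has changed.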

You can re-upload new versions of existing files, and delete files from the repository if they no longer exist in your local folder.

In [None]:
os.remove('data/EuroSAT2/README.md')
os.listdir('data/EuroSAT2')

['EuroSAT-RGB.zip', 'metadata.yml']

In [None]:
ingest_dataset(
    'data/EuroSAT2',
    f=True, # force re-upload of existing files
    d=True, # delete files not in the dataset
)

Uploading directory (only files, not recursive)
The following files are no longer in your dataset (use --d to delete):
README.md
Deleting file README.md...
Done
The following files will be uploaded:
EuroSAT-RGB.zip
Uploading file data/EuroSAT2/EuroSAT-RGB.zip...
Computing checksum...
de4455c5f375f3509f0f7144d3be3927b495b707
Ingesting file...


89.91/89.91 MB: : 9it [00:08,  1.11it/s]                     



Completing upload...
Done


{'dataset': {'uid': 'auth0|616b019942cfbe00690b958a',
  'id': '64c7be7c2a65dcd4ae2ca630',
  'name': 'EuroSAT2',
  'authors': ['Patrick Helber'],
  'source': 'http://madm.dfki.de/downloads',
  'license': '-',
  'size': 188561134,
  'files': [{'name': 'EuroSAT-RGB.zip',
    'size': 94280567,
    'checksum': 'de4455c5f375f3509f0f7144d3be3927b495b707'}],
  'description': '',
  'tags': [],
  'createdAt': '2023-07-19T13:19:12.136',
  'updatedAt': '2023-07-31T16:03:16.204662',
  'likes': 0,
  'downloads': 0,
  'quality': 0}}
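
The log above can be read as a comparison between the local folder and the files already stored for the dataset. The sketch below models that comparison with set operations. It is illustrative only and not the library's implementation: `plan_sync` is a hypothetical helper, and both the exclusion of `metadata.yml` from the upload list and the skipping of existing files when `force` is off are assumptions drawn from the logs and flag comments in this tutorial.

```python
def plan_sync(local_files, remote_files, force=False, delete=False):
    """Decide which files to upload and which remote files to delete."""
    # Assumption: metadata.yml describes the dataset and is not uploaded as a data file.
    local = set(local_files) - {"metadata.yml"}
    remote = set(remote_files)

    # With force, every local file is re-uploaded; otherwise only new ones (assumed).
    to_upload = sorted(local if force else local - remote)

    # Remote files missing locally are only removed when deletion is requested.
    to_delete = sorted(remote - local) if delete else []
    return to_upload, to_delete


local_files = ["EuroSAT-RGB.zip", "metadata.yml"]
remote_files = ["EuroSAT-RGB.zip", "README.md"]

plan = plan_sync(local_files, remote_files, force=True, delete=True)
print(plan)
```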

From now on, your dataset will be available in EOTDL.

In [None]:
retrieve_datasets("eurosat")

{'EuroSAT-RGB-STAC': ['EuroSAT-RGB-STAC.zip'],
 'EuroSAT-STAC': ['EuroSAT-STAC.zip'],
 'EuroSAT2': ['EuroSAT-RGB.zip'],
 'EuroSAT': ['EuroSAT.zip'],
 'EuroSAT-RGB': ['EuroSAT-RGB.zip']}

In [None]:
download_dataset('EuroSAT2')

Downloading EuroSAT-RGB.zip


100%|██████████| 89.9M/89.9M [00:01<00:00, 50.9MiB/s]


Done


'/home/juan/.eotdl/datasets/EuroSAT2'