In [7]:
%load_ext autoreload
%autoreload 2

import os
os.environ["EOTDL_API_URL"] = "http://localhost:8001/"


An experiment with a new way to ingest data.

Currently, users can ingest a folder containing some data. If there is no `catalog.json` file, the data is ingested as a `Q0` dataset using metadata from a `README.md` file. Otherwise, the data is ingested as a `Q1` or `Q2` dataset, depending on the metadata.

The newly proposed way is similar, but if there is no `catalog.json` file, one will be created so that all datasets in EOTDL are STAC compliant. Also, the `README.md` file will be required for all datasets. These pseudo-catalogs can be generated from the filesystem for local datasets, or from a list of links in order to host only the metadata.
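Since the `README.md` becomes mandatory, it helps to see how its metadata block could be read. The following is a minimal sketch, not EOTDL's implementation: `parse_readme` is a hypothetical helper that handles only the flat `key: value` and `- item` list syntax used in this notebook's README files, whereas a real implementation would more likely rely on a YAML library.

```python
def parse_readme(text):
    """Split README text into (metadata, body).

    Assumes the text starts with a '---' delimited front-matter block that
    contains only `key: value` pairs and simple `- item` lists.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("README.md must start with a '---' front-matter block")
    # Index of the closing '---' delimiter, if any
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("front-matter block is not closed")

    metadata, list_key = {}, None
    for line in lines[1:end]:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("- "):
            # List entry: belongs to the most recent key that had no value
            if list_key is None:
                raise ValueError(f"list item without a key: {line!r}")
            metadata[list_key].append(stripped[2:].strip())
            continue
        # Split on the first ':' only, so URLs in values stay intact
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if value:
            metadata[key], list_key = value, None
        else:
            metadata[key], list_key = [], key

    body = "\n".join(lines[end + 1:]).strip()
    return metadata, body


readme = """---
name: demo
authors: 
  - A. Author
license: free
source: https://example.com/x
---

# demo

Body text.
"""

metadata, body = parse_readme(readme)
print(metadata)
print(body)
```

The parsed `metadata` dictionary has the same keys that appear under `eotdl` in the `catalog.json` shown in Example 3, which suggests (but does not confirm) that the front matter is what ends up there.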

This way, dataset quality can be treated as a spectrum rather than as discrete levels, depending on the metadata.

It also enables wiki-style metadata management, where users can add metadata to a dataset that was ingested by another user.

# Example 1 - ingesting a dataset from a folder without a catalog.json

A subsample of the EuroSAT dataset.

In [65]:
from glob import glob

path = 'data/EuroSAT-RGB-small'

# # retrieve all files in the folder recursively
# files = glob(path + '/**/*', recursive=True)

# files



In [12]:
# create README.md

text = """---
name: EuroSAT-RGB-small-prototype
authors: 
  - Juan B. Pedro
license: free
source: https://github.com/earthpulse/eotdl/blob/develop/tutorials/workshops/philab24/02_prototype_ingesting.ipynb
---

# EuroSAT-RGB-small-prototype

This is a prototype of the EuroSAT dataset.
"""

with open(f"{path}/README.md", "w") as outfile:
    outfile.write(text)

In [45]:
from eotdl.datasets import ingest_dataset_prototype

ingest_dataset_prototype(path)

Using EOTDL API URL: http://localhost:8001/
Using EOTDL API URL: http://localhost:8001/
Loading STAC catalog...
Using EOTDL API URL: http://localhost:8001/
New version created, version: 3


100%|██████████| 101/101 [00:01<00:00, 82.34it/s]

Ingesting STAC catalog...
Done





`ingest_dataset_prototype` will get all files in the folder recursively, create a simple `catalog.json` and ingest it into EOTDL.
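The folder-to-catalog step can be sketched with the standard library alone. This is an illustration under assumptions, not what `ingest_dataset_prototype` does internally: `build_pseudo_catalog` and `SKIPPED` are hypothetical names, the catalog fields are modelled on the `catalog.json` printed in Example 3, and the item layout (one asset per file, no geometry) is a guess at the simplest STAC-like structure.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Files that describe the dataset rather than belong to it (assumption)
SKIPPED = {"README.md", "catalog.json"}


def build_pseudo_catalog(folder, name):
    """Return (catalog, items) dicts describing every data file under `folder`."""
    root = Path(folder)
    timestamp = datetime.now(timezone.utc).isoformat()
    items = []
    # Walk the folder recursively, keeping files only, in a stable order
    for file in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = file.relative_to(root).as_posix()
        if rel in SKIPPED:
            continue
        items.append({
            "type": "Feature",
            "stac_version": "1.0.0",
            "id": rel,
            "geometry": None,  # no spatial information is known at this point
            "properties": {"datetime": timestamp},
            "links": [],
            "assets": {"asset": {"href": rel}},
        })
    catalog = {
        "type": "Catalog",
        "id": name,
        "stac_version": "1.0.0",
        "description": "STAC catalog",
        "links": [
            {"rel": "root", "href": "./catalog.json", "type": "application/json"},
        ],
    }
    return catalog, items


# Usage on a throwaway folder with one image and a README
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Forest").mkdir()
    (Path(tmp) / "Forest" / "Forest_1.jpg").write_bytes(b"")
    (Path(tmp) / "README.md").write_text("# demo\n")
    catalog, items = build_pseudo_catalog(tmp, "demo")

print(catalog["id"], [item["id"] for item in items])
```

Keeping the result in memory rather than writing it to disk makes the sketch easy to inspect; the actual function evidently writes a `catalog.json` and a `collection` folder, since the next cell removes both.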

In [16]:
!rm -rf data/EuroSAT-RGB-small/catalog.json
!rm -rf data/EuroSAT-RGB-small/collection

# Example 2 - ingesting a dataset from a list of links

We can ingest a new dataset from a list of links.

In [40]:
links = [
	'https://link1.com',
	'https://link2.com',
	'https://link3.com',
]

metadata = {
	'name': 'Test-links',
	'authors': ['Juan B. Pedro'],
	'license': 'free',
	'source': 'https://github.com/earthpulse/eotdl/blob/develop/tutorials/workshops/philab24/02_prototype_ingesting.ipynb',
	'description': """# Test links

Testing the ingestion of a dataset from a list of links.
"""
}

path = 'data/test-links'

ingest_dataset_prototype(path, metadata, links, replicate=False)

Using EOTDL API URL: http://localhost:8001/
Using EOTDL API URL: http://localhost:8001/
Loading STAC catalog...
Using EOTDL API URL: http://localhost:8001/
New version created, version: 1


100%|██████████| 3/3 [00:00<00:00, 7543.71it/s]

Ingesting STAC catalog...
Done





`ingest_dataset_prototype` will create a simple `catalog.json` with the links as items and ingest it into EOTDL. With the `replicate` argument we can choose whether to replicate the assets in EOTDL or not (that is, use the direct sources).
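To make "links as items" concrete, here is a rough sketch of how a list of URLs could be turned into STAC-like item dictionaries. `items_from_links` and the `link-<n>` id scheme are hypothetical; the sketch covers only the metadata side and does not model what `replicate` does with the underlying assets.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse


def items_from_links(links):
    """Return one STAC-like item dict per link, with the link as its only asset."""
    timestamp = datetime.now(timezone.utc).isoformat()
    items = []
    for index, link in enumerate(links):
        parsed = urlparse(link)
        # Reject anything that is not an absolute http(s) URL
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URL: {link!r}")
        items.append({
            "type": "Feature",
            "stac_version": "1.0.0",
            "id": f"link-{index}",  # hypothetical id scheme
            "geometry": None,
            "properties": {"datetime": timestamp},
            "links": [],
            "assets": {"asset": {"href": link}},  # points at the original source
        })
    return items


links = ["https://link1.com", "https://link2.com", "https://link3.com"]
items = items_from_links(links)
print([item["assets"]["asset"]["href"] for item in items])
```

Because each asset `href` keeps pointing at the original URL, a catalog built this way carries only metadata, which matches the idea of using direct sources instead of copying the data.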

In [39]:
!rm -rf data/test-links

# Example 3 - ingesting a dataset from a catalog


If a STAC catalog already exists, we can ingest it into EOTDL. In this case, create a `README.md` file and place it in the root of the catalog.
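Before ingesting an existing catalog, it can be useful to confirm that both required files are in place. The sketch below is an assumption about what such a pre-flight check could look like; `check_catalog_folder` is a hypothetical helper and is not part of the `eotdl` library.

```python
import json
import tempfile
from pathlib import Path


def check_catalog_folder(folder):
    """Return the catalog id, raising if catalog.json or README.md is missing."""
    root = Path(folder)
    required = ("catalog.json", "README.md")
    missing = [name for name in required if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {root}: {', '.join(missing)}")
    catalog = json.loads((root / "catalog.json").read_text())
    if catalog.get("type") != "Catalog":
        raise ValueError("catalog.json is not a STAC Catalog")
    return catalog["id"]


# Usage on a throwaway folder: first without, then with a README.md
first_error = None
with tempfile.TemporaryDirectory() as tmp:
    minimal = {"type": "Catalog", "id": "demo", "stac_version": "1.0.0",
               "description": "STAC catalog", "links": []}
    (Path(tmp) / "catalog.json").write_text(json.dumps(minimal))
    try:
        check_catalog_folder(tmp)
    except FileNotFoundError as error:
        first_error = str(error)  # README.md has not been created yet
    (Path(tmp) / "README.md").write_text("# demo\n")
    catalog_id = check_catalog_folder(tmp)

print(first_error)
print(catalog_id)
```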

In [48]:
!cp -r data/EuroSAT-RGB-small data/EuroSAT-RGB-small-stac

In [50]:
path = 'data/EuroSAT-RGB-small-stac'

files = os.listdir(path)
assert 'catalog.json' in files, "catalog.json not found"

!cat data/EuroSAT-RGB-small-stac/catalog.json

{
  "type": "Catalog",
  "id": "EuroSAT-RGB-small-prototype",
  "stac_version": "1.0.0",
  "description": "STAC catalog",
  "links": [
    {
      "rel": "root",
      "href": "./catalog.json",
      "type": "application/json",
      "title": "EuroSAT-RGB-small-prototype"
    },
    {
      "rel": "child",
      "href": "./collection/collection.json",
      "type": "application/json",
      "title": "collection"
    }
  ],
  "eotdl": {
    "name": "EuroSAT-RGB-small-prototype",
    "license": "free",
    "source": "https://github.com/earthpulse/eotdl/blob/develop/tutorials/workshops/philab24/02_prototype_ingesting.ipynb",
    "thumbnail": "",
    "authors": [
      "Juan B. Pedro"
    ],
    "description": "# EuroSAT-RGB-small-prototype\n\nThis is a prototype of the EuroSAT dataset."
  },
  "title": "EuroSAT-RGB-small-prototype"
}

In [57]:
# create README.md

text = """---
name: EuroSAT-RGB-small-catalog-prototype
authors: 
  - Juan B. Pedro
license: free
source: https://github.com/earthpulse/eotdl/blob/develop/tutorials/workshops/philab24/02_prototype_ingesting.ipynb
---

# EuroSAT-RGB-small-catalog-prototype

This is a prototype of the EuroSAT dataset.
"""

with open(f"{path}/README.md", "w") as outfile:
    outfile.write(text)

In [64]:
ingest_dataset_prototype(path, replicate=False)

Using EOTDL API URL: http://localhost:8001/
Using EOTDL API URL: http://localhost:8001/
Loading STAC catalog...
Using EOTDL API URL: http://localhost:8001/
New version created, version: 1


100%|██████████| 101/101 [00:03<00:00, 29.18it/s]

Ingesting STAC catalog...
Done



