This repository houses data used to define a VEDA dataset to load into the VEDA catalog. Inclusion in the VEDA catalog is a prerequisite for displaying the dataset in the VEDA Dashboard.
The data provided here is processed by the ingestion system veda-data-airflow, to which this repository is directly linked (as a Git submodule).
The VEDA user docs explain the full dataset submission process.
Ultimately, submission to the VEDA catalog requires that you open an issue with the "new dataset" template. This template will require, at minimum:
- a description of the dataset
- the location of the data (in S3, CMR, etc.), and
- a point of contact for the VEDA team to collaborate with.
One or more notebooks showing how the data should be processed would be appreciated.
To submit STAC records for ingestion, open a pull request with the data structured as described below.
The ingestion-data/collections/ directory holds JSON files containing VEDA collection metadata (STAC).
Each file should follow this format:
{
"id": "<collection-id>",
"type": "Collection",
"links":[
],
"title":"<collection-title>",
"description": "<collection-description>",
"extent":{
"spatial":{
"bbox":[
[
"<min-longitude>",
"<min-latitude>",
"<max-longitude>",
"<max-latitude>"
]
]
},
"temporal":{
"interval":[
[
"<start-date>",
"<end-date>"
]
]
}
},
"stac_extensions": [
"https://stac-extensions.github.io/render/v1.0.0/schema.json",
"https://stac-extensions.github.io/item-assets/v1.0.0/schema.json"
],
"stac_version": "1.0.0",
"license": "CC0-1.0",
"dashboard:is_periodic": "<true/false>",
"dashboard:time_density": "<month/day/year>",
"item_assets": {
"cog_default": {
"type": "image/tiff; application=geotiff; profile=cloud-optimized",
"roles": [
"data",
"layer"
],
"title": "Default COG Layer",
"description": "Cloud optimized default layer to display on map"
}
},
"providers": [
{
"name": "NASA VEDA",
"url": "https://www.earthdata.nasa.gov/dashboard/",
"roles": [
"host"
]
}
],
"renders": {
"dashboard": {
"colormap_name": "<colormap_name>",
"rescale": [
[
"<min_rescale>",
"<max_rescale>"
]
],
"nodata": "nan",
"assets": [
"cog_default"
],
"title": "VEDA Dashboard Render Parameters"
}
}
}
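Before opening a pull request, a quick local check of a collection file can catch structural slips such as a missing key or swapped latitudes. The following is a minimal sketch, not the repository's own validation (the pytest suite remains authoritative). The function name check_collection and the choice to treat every key in REQUIRED_KEYS as required are assumptions made for illustration.

```python
import json

# Keys copied from the template above. Treating all of them as required
# is an assumption of this sketch, not a rule stated by the repository.
REQUIRED_KEYS = {
    "id", "type", "links", "title", "description",
    "extent", "license", "stac_version",
}


def check_collection(collection):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    missing = sorted(REQUIRED_KEYS - set(collection))
    if missing:
        problems.append("missing keys: " + ", ".join(missing))
    if collection.get("type") != "Collection":
        problems.append('"type" must be "Collection"')
    bboxes = collection.get("extent", {}).get("spatial", {}).get("bbox", [])
    for bbox in bboxes:
        if len(bbox) != 4:
            problems.append("bbox needs 4 values, got %d" % len(bbox))
            continue
        min_lon, min_lat, max_lon, max_lat = bbox
        # Longitudes are only range-checked: min > max is legal when a
        # bounding box crosses the antimeridian.
        if not all(-180 <= lon <= 180 for lon in (min_lon, max_lon)):
            problems.append("longitude outside [-180, 180]")
        if not -90 <= min_lat <= max_lat <= 90:
            problems.append("latitudes must satisfy -90 <= min <= max <= 90")
    return problems


# Hypothetical collection used only to exercise the check.
example = json.loads("""
{
  "id": "example-collection",
  "type": "Collection",
  "links": [],
  "title": "Example",
  "description": "Example collection",
  "extent": {
    "spatial": {"bbox": [[-180, -90, 180, 90]]},
    "temporal": {"interval": [["2020-01-01T00:00:00Z", null]]}
  },
  "license": "CC0-1.0",
  "stac_version": "1.0.0"
}
""")
print(check_collection(example))
```

To check a real file, load it with json.load and pass the resulting dictionary to check_collection.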
The ingestion-data/discovery-items/ directory holds JSON files representing the step function inputs for initiating the discovery, ingest and publication workflows.
Each file can contain either a single input event or a list of input events.
Each event should follow this format:
{
"collection": "<collection-id>",
"discovery": "<s3/cmr>",
## for s3 discovery
"prefix": "<s3-key-prefix>",
"bucket": "<s3-bucket>",
"filename_regex": "<filename-regex>",
"datetime_range": "<month/day/year>",
## for cmr discovery
"version": "<collection-version>",
"temporal": ["<start-date>", "<end-date>"],
"bounding_box": ["<bounding-box-as-comma-separated-LBRT>"],
"include": "<filename-pattern>",
### misc
"cogify": "<true/false>",
"upload": "<true/false>",
"dry_run": "<true/false>"
}
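Because a discovery-items file may hold either one event or a list of events, any tooling that reads these files has to handle both shapes. The sketch below shows one way to normalize the input and run a basic per-event check. The helper names and the per-discovery-type key requirements in REQUIRED_BY_DISCOVERY are assumptions for illustration; the ingestion system may enforce different rules.

```python
# Assumed minimum keys for each discovery type. This mapping is a guess
# based on the template above, not something the ingestion system confirms.
REQUIRED_BY_DISCOVERY = {
    "s3": {"bucket", "prefix"},
    "cmr": set(),
}


def normalize_events(payload):
    """Wrap a single event in a list so callers can always iterate."""
    return payload if isinstance(payload, list) else [payload]


def check_event(event):
    """Return a list of problems found in one discovery event."""
    problems = []
    if "collection" not in event:
        problems.append("missing key: collection")
    discovery = event.get("discovery")
    if discovery not in REQUIRED_BY_DISCOVERY:
        problems.append("unknown discovery type: %r" % (discovery,))
        return problems
    missing = sorted(REQUIRED_BY_DISCOVERY[discovery] - set(event))
    if missing:
        problems.append(
            "missing keys for %s: %s" % (discovery, ", ".join(missing))
        )
    return problems


# Hypothetical single event; bucket and prefix are placeholders.
single = {
    "collection": "example-collection",
    "discovery": "s3",
    "bucket": "example-bucket",
    "prefix": "example/",
}
for event in normalize_events(single):
    print(check_event(event))
```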
The ingestion-data/dataset-config/ directory holds JSON files that can be used with the dataset/publish STAC ingestor endpoint. Each file combines collection metadata and discovery items. For an example of this ingestion workflow, see this Jupyter notebook.
{
"collection": "<collection-id>",
"title": "<collection-title>",
"description": "<collection-description>",
"type": "cog",
"spatial_extent": {
"xmin": -180,
"ymin": -90,
"xmax": 180,
"ymax": 90
},
"temporal_extent": {
"startdate": "<start-date>",
"enddate": "<end-date>"
},
"license": "CC0-1.0",
"is_periodic": false,
"time_density": null,
"stac_version": "1.0.0",
"discovery_items": [
{
"prefix": "<prefix>",
"bucket": "<bucket>",
"filename_regex": "<regex>",
"discovery": "s3",
"upload": false
}
]
}
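The dataset-config format expresses extents with named fields (xmin, ymin, startdate, and so on), while the STAC collection format uses positional bbox and interval arrays. The sketch below shows how the two layouts correspond. The function to_stac_extent is hypothetical and written only to illustrate the mapping; how the ingestor endpoint performs this conversion is not shown in this repository's documentation.

```python
def to_stac_extent(config):
    """Map dataset-config extents onto the STAC "extent" layout.

    Hypothetical helper for illustration; it only rearranges fields and
    does not validate their values.
    """
    spatial = config["spatial_extent"]
    temporal = config["temporal_extent"]
    return {
        "spatial": {
            # STAC bbox order: min longitude, min latitude,
            # max longitude, max latitude.
            "bbox": [[
                spatial["xmin"],
                spatial["ymin"],
                spatial["xmax"],
                spatial["ymax"],
            ]]
        },
        "temporal": {
            "interval": [[temporal["startdate"], temporal["enddate"]]]
        },
    }


# Hypothetical config fragment covering the whole globe.
config = {
    "spatial_extent": {"xmin": -180, "ymin": -90, "xmax": 180, "ymax": 90},
    "temporal_extent": {
        "startdate": "2020-01-01T00:00:00Z",
        "enddate": "2020-12-31T23:59:59Z",
    },
}
print(to_stac_extent(config))
```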
This repository provides a script for validating all collections. First, install the requirements (preferably in a virtual environment):
pip install -r requirements.txt
Then:
pytest
We use pre-commit hooks to keep our notebooks and Python scripts consistently formatted. To contribute, first install the requirements, then install the pre-commit hooks:
pip install -r requirements.txt # recommend a virtual environment
pre-commit install
The hooks will run automatically on any changed files when you commit. To run the hooks on the entire repository (which is what happens in CI):
pre-commit run --all-files
If you need to add a Python dependency, add it to requirements.in, then run:
pip-compile
This will update requirements.txt with a complete, fully resolved and pinned set of Python dependencies.