Ghgc 269/fix workflows api #179

Merged
merged 54 commits into from
Jul 23, 2024

Conversation

sanzog03
Contributor

Summary of changes

Addresses GHGC-269: Fix workflows api for GHGC

Changes

  • Add support for multiple item_assets.
  • Make the DiscoveryItemAssets schema optional, and add a default item_assets when none is provided.
  • Add missing attributes, as optional, that GHGC needs for STAC collections and discovery_items: renders and stac_extensions for STAC collections; id_regex, id_template, and assets for discovery_items.
  • Fix the response of the list DAGs endpoint.
  • Other minor changes and fixes.
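
The default item_assets behaviour described in the second bullet can be sketched roughly as follows. This is a plain-Python illustration, not the PR's pydantic schema: the helper name `with_default_item_assets` is hypothetical, and the `cog_default` default is an assumption modelled on the entry of the same name in the test payloads in this thread.

```python
from copy import deepcopy

# Assumed default, modelled on the "cog_default" entry used in the test payloads.
DEFAULT_ITEM_ASSETS = {
    "cog_default": {
        "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        "roles": ["data", "layer"],
        "title": "Default COG Layer",
        "description": "Cloud optimized default layer to display on map",
    }
}


def with_default_item_assets(dataset: dict) -> dict:
    """Return a copy of the dataset payload, filling in item_assets if absent."""
    result = deepcopy(dataset)
    # A missing key, None, or an empty dict all count as "not provided".
    if not result.get("item_assets"):
        result["item_assets"] = deepcopy(DEFAULT_ITEM_ASSETS)
    return result
```

A payload that already carries item_assets passes through unchanged; only payloads without it receive the assumed default.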

PR Checklist

  • Update CHANGELOG
  • Unit tests
  • Ad-hoc testing - Deploy changes and test manually
  • Integration tests

Jennifer Tran and others added 30 commits June 17, 2024 10:52
1. removed CMR input and its usages
2. added in missing attributes into s3Input which helps in item discovery. The s3Input is later used in dataset schema and then cogDataset Schema
3. also added in pydantic validators for the new attributes
…eda-data-airflow into GHGC-269/fix-workflows_api
…eda-data-airflow into GHGC-269/fix-workflows_api
sanzog03 and others added 5 commits July 2, 2024 11:07
2. filter out unnecessary attributes that do not align with the STAC spec
3. user provided attribute has greater precedence than the created one
…_payload

json prepared for ingest api now allows arbitrary key value pairs
workflows_api/runtime/src/main.py (outdated, resolved)
workflows_api/runtime/src/utils/events.py (resolved)
infrastructure/main.tf (outdated, resolved)
@smohiudd smohiudd self-requested a review July 15, 2024 15:02
@anayeaye anayeaye self-requested a review July 15, 2024 16:19
@slesaad
Member

slesaad commented Jul 16, 2024

Testing

Case 1: No discovery_items.assets, but item_assets provided

omi-trno2-item-assets-only
{
  "collection": "omi-trno2-item-assets-only",
  "data_type": "cog",
  "spatial_extent": {
    "xmin": -127,
    "ymin": 29,
    "xmax": -103,
    "ymax": 52
  },
  "temporal_extent": {
    "startdate": "1995-01-01T00:00:00Z",
    "enddate": "2095-03-31T00:00:00Z"
  },
  "description": "OMI_trno2 - 0.10 x 0.10 Annual as Cloud-Optimized GeoTIFFs (COGs)",
  "discovery_items": [
    {
      "bucket": "veda-data-store-staging",
      "datetime_range": "year",
      "discovery": "s3",
      "filename_regex": "^(.*).tif$",
      "prefix": "OMI_trno2-COG/"
    }
  ],
  "is_periodic": true,
  "license": "MIT",
  "sample_files": ["s3://veda-data-store-staging/OMI_trno2-COG/OMI_trno2_0.10x0.10_2005_Col3_V4.tif"],
  "providers": [
    {
      "name": "NASA VEDA",
      "roles": [
        "host"
      ],
      "url": "https://www.earthdata.nasa.gov/dashboard/"
    }
  ],
  "renders": {
    "dashboard": {
      "assets": [
        "cog_default"
      ],
      "colormap_name": "reds",
      "rescale": [
        [
          0,
          3000000000000000
        ]
      ],
      "title": "VEDA Dashboard Render Parameters"
    }
  },
  "item_assets": {
    "no2": {
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "roles": [
        "data",
        "layer"
      ],
      "title": "NO2 values",
      "description": "description",
      "other_attr": "lets see"
    }
  },
  "assets": {
    "thumbnail": {
      "description": "Photo by [Mick Truyts](https://unsplash.com/photos/x6WQeNYJC1w) (Power plant shooting steam at the sky)",
      "href": "https://thumbnails.openveda.cloud/no2--dataset-cover.jpg",
      "roles": [
        "thumbnail"
      ],
      "title": "Thumbnail",
      "type": "image/jpeg"
    }
  },
  "time_density": "year",
  "title": "DELETE ME OMI_trno2"
}

Output:
https://dev.openveda.cloud/api/stac/collections/omi-trno2-item-assets-only

Case 2: Both item_assets and discovery_items.assets provided

omi-trno2-custom-assets
{
  "collection": "omi-trno2-custom-assets",
  "data_type": "cog",
  "spatial_extent": {
    "xmin": -127,
    "ymin": 29,
    "xmax": -103,
    "ymax": 52
  },
  "temporal_extent": {
    "startdate": "1995-01-01T00:00:00Z",
    "enddate": "2095-03-31T00:00:00Z"
  },
  "description": "OMI_trno2 - 0.10 x 0.10 Annual as Cloud-Optimized GeoTIFFs (COGs)",
  "discovery_items": [
    {
      "bucket": "veda-data-store-staging",
      "datetime_range": "year",
      "discovery": "s3",
      "filename_regex": "^(.*).tif$",
      "prefix": "OMI_trno2-COG/",
      "assets": {
        "no2": {
          "type": "image/tiff; application=geotiff; profile=cloud-optimized",
          "roles": [
            "data",
            "layer"
          ],
          "title": "NO2 values",
          "description": "description",
          "other_attr": "lets see",
          "regex": ".*"
        }
      }
    }
  ],
  "is_periodic": true,
  "license": "MIT",
  "sample_files": ["s3://veda-data-store-staging/OMI_trno2-COG/OMI_trno2_0.10x0.10_2005_Col3_V4.tif"],
  "providers": [
    {
      "name": "NASA VEDA",
      "roles": [
        "host"
      ],
      "url": "https://www.earthdata.nasa.gov/dashboard/"
    }
  ],
  "renders": {
    "dashboard": {
      "assets": [
        "cog_default"
      ],
      "colormap_name": "reds",
      "rescale": [
        [
          0,
          3000000000000000
        ]
      ],
      "title": "VEDA Dashboard Render Parameters"
    }
  },
  "assets": {
    "thumbnail": {
      "description": "Photo by [Mick Truyts](https://unsplash.com/photos/x6WQeNYJC1w) (Power plant shooting steam at the sky)",
      "href": "https://thumbnails.openveda.cloud/no2--dataset-cover.jpg",
      "roles": [
        "thumbnail"
      ],
      "title": "Thumbnail",
      "type": "image/jpeg"
    }
  },
  "item_assets": {
    "no2": {
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "roles": [
        "data",
        "layer"
      ],
      "title": "NO2 values",
      "description": "description",
      "other_attr": "lets see"
    }
  },
  "time_density": "year",
  "title": "DELETE ME OMI_trno2"
}

Output:
https://dev.openveda.cloud/api/stac/collections/omi-trno2-custom-assets

Case 3: No assets anywhere

omi-trno2-no-assets
{
  "collection": "omi-trno2-no-assets",
  "data_type": "cog",
  "spatial_extent": {
    "xmin": -127,
    "ymin": 29,
    "xmax": -103,
    "ymax": 52
  },
  "temporal_extent": {
    "startdate": "1995-01-01T00:00:00Z",
    "enddate": "2095-03-31T00:00:00Z"
  },
  "description": "OMI_trno2 - 0.10 x 0.10 Annual as Cloud-Optimized GeoTIFFs (COGs)",
  "discovery_items": [
    {
      "bucket": "veda-data-store-staging",
      "datetime_range": "year",
      "discovery": "s3",
      "filename_regex": "^(.*).tif$",
      "prefix": "OMI_trno2-COG/"
    }
  ],
  "is_periodic": true,
  "license": "MIT",
  "sample_files": ["s3://veda-data-store-staging/OMI_trno2-COG/OMI_trno2_0.10x0.10_2005_Col3_V4.tif"],
  "providers": [
    {
      "name": "NASA VEDA",
      "roles": [
        "host"
      ],
      "url": "https://www.earthdata.nasa.gov/dashboard/"
    }
  ],
  "renders": {
    "dashboard": {
      "assets": [
        "cog_default"
      ],
      "colormap_name": "reds",
      "rescale": [
        [
          0,
          3000000000000000
        ]
      ],
      "title": "VEDA Dashboard Render Parameters"
    }
  },
  "assets": {
    "thumbnail": {
      "description": "Photo by [Mick Truyts](https://unsplash.com/photos/x6WQeNYJC1w) (Power plant shooting steam at the sky)",
      "href": "https://thumbnails.openveda.cloud/no2--dataset-cover.jpg",
      "roles": [
        "thumbnail"
      ],
      "title": "Thumbnail",
      "type": "image/jpeg"
    }
  },
  "time_density": "year",
  "title": "DELETE ME OMI_trno2"
}

Output:
https://dev.openveda.cloud/api/stac/collections/omi-trno2-no-assets

Case 4: Multiple assets

climdex-tmaxxf-access-cm2-ssp126-multi-asset
{ 
  "collection": "climdex-tmaxxf-access-cm2-ssp126-multi-asset",
  "data_type": "cog",
  "spatial_extent": {
    "xmin": -180,
    "ymin": -90,
    "xmax": 180,
    "ymax": 90
  },
  "temporal_extent": {
    "startdate": "2015-01-01T00:00:00Z",
    "enddate": "2101-12-31T23:59:59Z"
  },
  "description": "CLIMDEX ACCESS CM2 SSP125 - variable tmaxXF",
  "is_periodic": true,
  "license": "MIT",
  "item_assets": {
    "cog_default": {
        "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        "roles": [
            "data",
            "layer"
        ],
        "title": "Default COG Layer",
        "description": "Cloud optimized default layer to display on map"
    },
    "tmax_above_86": {
        "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        "roles": [
            "data",
            "layer"
        ],
        "title": "Tmax Above 86",
        "description": "Tmax Above 86"
    },
    "tmax_above_90": {
        "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        "roles": [
            "data",
            "layer"
        ],
        "title": "Tmax Above 90",
        "description": "Tmax Above 90"
    },
    "tmax_above_100": {
        "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        "roles": [
            "data",
            "layer"
        ],
        "title": "Tmax Above 100",
        "description": "Tmax Above 100"
    },
    "tmax_above_110": {
        "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        "roles": [
            "data",
            "layer"
        ],
        "title": "Tmax Above 110",
        "description": "Tmax Above 110"
    },
    "tmax_above_115": {
        "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        "roles": [
            "data",
            "layer"
        ],
        "title": "Tmax Above 115",
        "description": "Tmax Above 115"
    }
  },
  "sample_files": ["s3://veda-data-store-staging/climdex-tmaxxf-access-cm2-ssp126/tmaxXF-ACCESS-CM2-ssp126_2099_tmax_above_86.tif"],
  "providers": [
        {
            "name": "NASA VEDA",
            "url": "https://www.earthdata.nasa.gov/dashboard/",
            "roles": [
                "host"
            ]
        }
    ],
  "renders": {
    "dashboard": {
      "assets": [
        "cog_default"
      ],
      "colormap_name": "reds",
      "rescale": [
        [
          0,
          3000000000000000
        ]
      ],
      "title": "VEDA Dashboard Render Parameters"
    }
  },
  "assets": {
    "thumbnail": {
      "title": "Thumbnail",
      "description": "Photo by NASA (CMIP6 Climdex TmaxXF Screenshot)",
      "href": "https://thumbnails.openveda.cloud/cmip6-climdex-tmaxxf-access-cm2.png",
      "type": "image/png",
      "roles": ["thumbnail"]
    }
  },
  "time_density": "year",
  "title": "DELETE ME CLIMDEX",
  "discovery_items": [
    {
      "collection": "climdex-tmaxxf-access-cm2-ssp126-deleteme",
      "bucket": "veda-data-store-staging",
      "prefix": "climdex-tmaxxf-access-cm2-ssp126/",
      "filename_regex": ".*-ssp126_209(.*)_tmax.*.tif$",
      "id_regex": ".*-ssp126_(.*)_tmax.*.tif$",
      "id_template": "climdex-tmaxxf-access-cm2-ssp126-{}",
      "datetime_range": "year",
      "assets": {
          "tmax_above_86": {
            "title": "Tmax Above 86",
            "description": "Tmax Above 86",
            "regex": ".*-ssp126_(.*)_tmax_above_86.tif"
          },
          "tmax_above_90": {
            "title": "Tmax Above 90",
            "description": "Tmax Above 90",
            "regex": ".*-ssp126_(.*)_tmax_above_90.tif"
          },
          "tmax_above_100": {
            "title": "Tmax Above 100",
            "description": "Tmax Above 100",
            "regex": ".*-ssp126_(.*)_tmax_above_100.tif"
          },
          "tmax_above_110": {
            "title": "Tmax Above 110",
            "description": "Tmax Above 110",
            "regex": ".*-ssp126_(.*)_tmax_above_110.tif"
          },
          "tmax_above_115": {
            "title": "Tmax Above 115",
            "description": "Tmax Above 115",
            "regex": ".*-ssp126_(.*)_tmax_above_115.tif"
          }
        },
      "discovery": "s3",
      "upload": false
    }
  ]
}

Output:
https://dev.openveda.cloud/api/stac/collections/climdex-tmaxxf-access-cm2-ssp126-multi-asset
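
To make the roles of id_regex, id_template, and the per-asset regex in the Case 4 payload concrete, here is a rough sketch of how discovered files could be grouped into multi-asset items. It is an assumption about the grouping logic, not the actual discovery code; `group_files_into_items` is a hypothetical helper, and only two of the five asset definitions are reproduced to keep it short.

```python
import re
from collections import defaultdict

# Values copied from the Case 4 discovery_items entry.
ID_REGEX = r".*-ssp126_(.*)_tmax.*.tif$"
ID_TEMPLATE = "climdex-tmaxxf-access-cm2-ssp126-{}"
ASSETS = {
    "tmax_above_86": {"regex": r".*-ssp126_(.*)_tmax_above_86.tif"},
    "tmax_above_90": {"regex": r".*-ssp126_(.*)_tmax_above_90.tif"},
}


def group_files_into_items(filenames, id_regex, id_template, assets):
    """Group files by item id, then assign each file to the first matching asset."""
    items = defaultdict(dict)
    for name in filenames:
        id_match = re.match(id_regex, name)
        if id_match is None:
            continue  # the file does not belong to any item
        # The first capture group (here, the year) fills the id template.
        item_id = id_template.format(id_match.group(1))
        for asset_key, asset_def in assets.items():
            if re.match(asset_def["regex"], name):
                items[item_id][asset_key] = name
                break
    return dict(items)
```

Under these assumptions, files that share the captured year end up in one item, with one asset per threshold; a file that matches id_regex but no asset regex is simply not attached to any asset.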

@slesaad slesaad requested a review from smohiudd July 16, 2024 19:29
}
}
response = await start_discovery_workflow_execution(discovery)
if (dataset.item_assets):
Member

Use item_assets if no assets are provided; if that is also not provided, send nothing to Airflow, which handles adding the cog_default asset there.
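
The fallback order described in this comment can be sketched as below. This is a minimal illustration under stated assumptions, not the code in the PR: `choose_discovery_assets` and `build_discovery_payload` are hypothetical helpers operating on plain dicts.

```python
from typing import Optional


def choose_discovery_assets(
    discovery_assets: Optional[dict], item_assets: Optional[dict]
) -> Optional[dict]:
    """Prefer discovery-level assets, then item_assets, else nothing."""
    if discovery_assets:
        return discovery_assets
    if item_assets:
        return item_assets
    return None


def build_discovery_payload(discovery_item: dict, item_assets: Optional[dict]) -> dict:
    """Build the payload for the discovery workflow, omitting 'assets' if none apply."""
    payload = {key: value for key, value in discovery_item.items() if key != "assets"}
    assets = choose_discovery_assets(discovery_item.get("assets"), item_assets)
    if assets is not None:
        payload["assets"] = assets
    # With no 'assets' key, the downstream workflow is expected to add cog_default.
    return payload
```

Leaving the key out entirely, rather than sending an empty value, keeps the decision about the default asset in one place downstream.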

@anayeaye anayeaye mentioned this pull request Jul 17, 2024
4 tasks
@anayeaye
Contributor

anayeaye commented Jul 17, 2024

I think we need an exclude_unset=True somewhere in the dataset path to publishing a collection record to pgstac, maybe as the last step before posting to the ingest API? I see some published keywords: null, stac_extensions: null, ...

EDIT: this also means we don't have all the validations we thought we had in place for the ingest-api. I think we will probably be addressing that in the transactions work in a way that can be applied in the ingest api as well.

@anayeaye
Contributor

I think we need an exclude_unset=True somewhere in the dataset path to publishing a collection record to pgstac, maybe as the last step before posting to the ingest API? I see some published keywords: null, stac_extensions: null, ...

I think it is just a switch to the stac_pydantic to_dict method, which defaults to exclude_unset=True:
collection.json(by_alias=True) --> collection.to_dict(by_alias=True)
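
The effect being asked for, keeping keys such as keywords: null out of the published record, can be illustrated with the standard library alone. This sketch does not use stac_pydantic; `drop_null_fields` is a hypothetical helper, and dropping None-valued keys only approximates exclude_unset (it behaves more like an exclude-none filter, since a plain dict cannot tell "unset" from "explicitly null").

```python
import json


def drop_null_fields(record: dict) -> dict:
    """Return a copy of the record without top-level keys whose value is None."""
    return {key: value for key, value in record.items() if value is not None}


# Example collection record with fields that were never set by the user.
collection = {
    "id": "omi-trno2-no-assets",
    "license": "MIT",
    "keywords": None,
    "stac_extensions": None,
}

# Serialize only the populated fields before posting to an ingest endpoint.
body = json.dumps(drop_null_fields(collection))
```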

    print("Success:", response.json())
else:
    print("Error:", response.status_code, response.text)

Member

There were apparently two ingest methods defined - that tripped me up for a while 😓

Contributor

oh no! 🙃

@@ -57,7 +57,6 @@ class DashboardCollection(Collection):
    assets: Optional[Dict]
    extent: SpatioTemporalExtent
    renders: Optional[Dict]
    stac_extensions: Optional[List[str]]
Member

stac_pydantic.Collection already defines this

Contributor

@anayeaye anayeaye left a comment

One change to id_template default value + I am still looking at whether these fields should be required or if the preference was to default them when not provided (from datasets/publish). Either way I think we can open a new issue to address this error.

  {'loc': ('body', 'COGDataset', 'spatial_extent'), 'msg': 'field required', 'type': 'value_error.missing'}
  {'loc': ('body', 'COGDataset', 'temporal_extent'), 'msg': 'field required', 'type': 'value_error.missing'}, 
  {'loc': ('body', 'COGDataset', 'sample_files'), 'msg': 'field required', 'type': 'value_error.missing'}

workflows_api/runtime/src/schemas.py (resolved)
@slesaad slesaad merged commit ff77a44 into dev Jul 23, 2024
3 checks passed
@slesaad slesaad deleted the GHGC-269/fix-workflows_api branch July 23, 2024 16:02