feat(metadata): persist dpp_spatial_extent for CSV lat/lon resources #298
Merged
For Shapefile / GeoJSON inputs, `FormatConverterStage` already writes `dpp_spatial_extent` on the simplified resource it uploads (`format_converter.py:240`). Plain CSV resources that happen to carry lat/lon columns never get the same field: the bbox was only computed at jinja2 render time in `spatial_extent_wkt`, so external consumers (gazetteer widgets, third-party extensions) couldn't read it without re-deriving it from stats.

`MetadataStage._maybe_write_csv_spatial_extent` now fills this gap. It runs in `_update_resource_metadata` after the CKAN re-fetch and before `dsu.update_resource`, so the new field rides along with the existing preview / `datastore_active` updates. The bbox uses the same BoundingBox shape that `FormatConverterStage` emits, keeping `spatial_extent_wkt` and `spatial_extent_feature_collection` unchanged.

The stage no-ops when:

- `ckanext.datapusher_plus.auto_csv_spatial_extent` is off (default on),
- `dpp_spatial_extent` is already present (shapefile path),
- stats aren't available, or
- lat/lon detection returns nothing / values are out-of-range.

To avoid duplicating the detection heuristic, the lat/lon scan in `FormulaProcessor.__init__` is extracted to a module-level `jinja2_helpers.detect_lat_lon_fields()`. `FormulaProcessor` still populates `dpp["LAT_FIELD"]` / `dpp["LON_FIELD"]` identically.

This re-lands the CSV slice of #253 against the Prefect-stage pipeline. Closes part of #253.

Co-Authored-By: Minhajuddin Mohammed <minhajuddin2510@gmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
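The commit above says the CSV bbox reuses the BoundingBox shape from `FormatConverterStage` and is skipped when values are out of range. A minimal sketch of that step follows, assuming the stats expose per-column `min` / `max` values as `{field: {"min": ..., "max": ...}}` (a hypothetical shape) and that the persisted dict carries a `"type": "BoundingBox"` key next to `coordinates` (also an assumption; only the nested `coordinates` form is confirmed in this PR). `build_csv_bbox` is a hypothetical name, not the PR's code.

```python
from typing import Any, Mapping, Optional

# Hypothetical stats shape: {field_name: {"min": <number>, "max": <number>}}.
Stats = Mapping[str, Mapping[str, Any]]


def build_csv_bbox(stats: Stats, lat_field: str, lon_field: str) -> Optional[dict]:
    """Return a BoundingBox-style dict, or None when the extent is unusable."""
    try:
        min_lat = float(stats[lat_field]["min"])
        max_lat = float(stats[lat_field]["max"])
        min_lon = float(stats[lon_field]["min"])
        max_lon = float(stats[lon_field]["max"])
    except (KeyError, TypeError, ValueError):
        # Missing column, missing min/max, or non-numeric values: skip.
        return None

    # Out-of-range values (e.g. projected coordinates) are rejected.
    # NaN also fails these comparisons, so it is rejected too.
    if not (-90.0 <= min_lat <= max_lat <= 90.0):
        return None
    if not (-180.0 <= min_lon <= max_lon <= 180.0):
        return None

    # Nested [[min_lon, min_lat], [max_lon, max_lat]] form.
    # The "type" key is an assumption for illustration.
    return {
        "type": "BoundingBox",
        "coordinates": [[min_lon, min_lat], [max_lon, max_lat]],
    }
```

For example, `build_csv_bbox({"lat": {"min": "40.5", "max": "40.9"}, "lon": {"min": -74.3, "max": -73.7}}, "lat", "lon")` yields `coordinates == [[-74.3, 40.5], [-73.7, 40.9]]`. Returning `None` rather than raising lets the caller treat "no usable extent" as a plain no-op.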
Pull request overview
Adds persistence of dpp_spatial_extent for CSV resources with detected lat/lon columns so external consumers can read a stored bbox instead of re-deriving it at template render time, while reusing the existing lat/lon detection heuristic used by the formula engine.
Changes:

- Add `MetadataStage._maybe_write_csv_spatial_extent()` and call it during resource metadata update to persist a BoundingBox on the resource.
- Extract lat/lon detection logic into `jinja2_helpers.detect_lat_lon_fields()` and refactor `FormulaProcessor` to use it.
- Introduce `ckanext.datapusher_plus.auto_csv_spatial_extent` (default `true`) and add unit tests covering detection + persistence behavior.
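The PR does not show the body of the shared detection helper, so the following is only an illustration of the pattern of "one module-level function, many callers". It assumes a simple case-insensitive column-name match; the candidate name lists and the function name `detect_lat_lon_fields_sketch` are hypothetical, and the real `detect_lat_lon_fields()` may take stats rather than bare field names and use a different heuristic.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical candidate names; the real heuristic may differ.
LAT_NAMES = ("latitude", "lat")
LON_NAMES = ("longitude", "lon", "lng", "long")


def detect_lat_lon_fields_sketch(
    field_names: Iterable[str],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (lat_field, lon_field) via case-insensitive name match, else (None, None)."""
    by_lower: Dict[str, str] = {}
    for name in field_names:
        # Remember the original spelling of the first column per lowercased name.
        by_lower.setdefault(name.strip().lower(), name)

    lat = next((by_lower[c] for c in LAT_NAMES if c in by_lower), None)
    lon = next((by_lower[c] for c in LON_NAMES if c in by_lower), None)

    # A lone lat or lon column cannot form an extent, so report a pair or nothing.
    if lat is None or lon is None:
        return None, None
    return lat, lon
```

With a helper like this at module level, both the formula engine (which fills `dpp["LAT_FIELD"]` / `dpp["LON_FIELD"]`) and the metadata stage can call the same function, so the two code paths cannot drift apart.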
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `tests/test_csv_spatial_extent.py` | New unit tests for lat/lon detection and CSV bbox persistence behavior. |
| `ckanext/datapusher_plus/jobs/stages/metadata.py` | Persist `dpp_spatial_extent` for CSV lat/lon resources during `_update_resource_metadata`. |
| `ckanext/datapusher_plus/jinja2_helpers.py` | Add shared `detect_lat_lon_fields()` and reuse it in `FormulaProcessor`. |
| `ckanext/datapusher_plus/config.py` | Add `AUTO_CSV_SPATIAL_EXTENT` config flag (default on). |
| `ckanext/datapusher_plus/config_declaration.yaml` | Declare the new operator-facing config option. |
…ature_collection

Address Copilot review on #298.

``spatial_extent_feature_collection`` was reading ``resource["dpp_spatial_extent"]["coordinates"]`` and using it as a flat ``[min_lon, min_lat, max_lon, max_lat]`` list, but both the existing shapefile/GeoJSON path (FormatConverterStage) and the new CSV path persist the GeoJSON-ish nested form ``[[min_lon, min_lat], [max_lon, max_lat]]``. The helper was silently producing invalid GeoJSON for any resource carrying a persisted ``dpp_spatial_extent``. Flatten the nested coordinates the same way ``spatial_extent_wkt`` already does.

Also tighten ``MetadataStage._maybe_write_csv_spatial_extent``: use ``logger.exception`` on the broad-Exception path from ``detect_lat_lon_fields`` so the traceback is preserved when an operator has to debug an unexpected stats shape. Pure-data helpers don't typically fail, but if one does we want to see why, not just the message.

Three new tests guard the round-trip: ``spatial_extent_wkt`` and ``spatial_extent_feature_collection`` both read from a persisted BoundingBox dict and emit valid GeoJSON / WKT. Tests now 116/116 (was 113; +3 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
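The fix above hinges on flattening the nested `[[min_lon, min_lat], [max_lon, max_lat]]` coordinates before building geometry. A minimal sketch of that flattening and of the two outputs follows; the function names are hypothetical, not the helpers in `jinja2_helpers.py`, and the counterclockwise ring order follows the RFC 7946 convention for exterior rings.

```python
from typing import Any, Dict, List, Sequence


def flatten_bbox(coordinates: Sequence[Sequence[float]]) -> List[float]:
    """Turn [[min_lon, min_lat], [max_lon, max_lat]] into a flat 4-item list."""
    # Unpacking the nested pairs fails loudly (ValueError) on a flat list.
    (min_lon, min_lat), (max_lon, max_lat) = coordinates
    return [min_lon, min_lat, max_lon, max_lat]


def bbox_ring(extent: Dict[str, Any]) -> List[List[float]]:
    """Closed, counterclockwise exterior ring for a persisted bbox dict."""
    min_lon, min_lat, max_lon, max_lat = flatten_bbox(extent["coordinates"])
    return [
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat],
        [min_lon, max_lat],
        [min_lon, min_lat],  # repeat the first point to close the ring
    ]


def bbox_to_feature_collection(extent: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap the bbox polygon in a single-feature GeoJSON FeatureCollection."""
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "properties": {},
                "geometry": {"type": "Polygon", "coordinates": [bbox_ring(extent)]},
            }
        ],
    }


def bbox_to_wkt(extent: Dict[str, Any]) -> str:
    """Render the same ring as a WKT POLYGON (x = lon, y = lat)."""
    points = ", ".join(f"{lon} {lat}" for lon, lat in bbox_ring(extent))
    return f"POLYGON(({points}))"
```

With `extent = {"coordinates": [[-74.3, 40.5], [-73.7, 40.9]]}`, `bbox_to_wkt(extent)` returns `POLYGON((-74.3 40.5, -73.7 40.5, -73.7 40.9, -74.3 40.9, -74.3 40.5))`. Deriving both outputs from one ring helper keeps the WKT and GeoJSON views from disagreeing about the coordinate layout.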
This was referenced May 17, 2026
Closed
Summary
Re-implements the CSV slice of #253 against the current Prefect-stage pipeline. For Shapefile / GeoJSON inputs, `FormatConverterStage` already writes `dpp_spatial_extent` on the simplified resource (see `format_converter.py:240`). Plain CSV resources with lat/lon columns never got the same field: the bbox was only computed at jinja2 render time in `spatial_extent_wkt`, so external consumers (gazetteer widgets, third-party extensions) had to re-derive it from stats.

`MetadataStage._maybe_write_csv_spatial_extent` fills this gap. It runs in `_update_resource_metadata` after the CKAN refetch and before `dsu.update_resource`, so the new field rides along with the existing preview / `datastore_active` updates. It uses the same BoundingBox dict shape as the `FormatConverterStage` path, so the existing `spatial_extent_wkt` / `spatial_extent_feature_collection` helpers keep working unchanged.

The stage no-ops when:

- `ckanext.datapusher_plus.auto_csv_spatial_extent` is off (default on),
- `dpp_spatial_extent` is already present (shapefile/GeoJSON path),
- stats aren't available, or
- lat/lon detection returns nothing / values are out-of-range.

To avoid duplicating the lat/lon detection heuristic, the scan in `FormulaProcessor.__init__` is extracted to a module-level `jinja2_helpers.detect_lat_lon_fields()`. `FormulaProcessor` still populates `dpp["LAT_FIELD"]` / `dpp["LON_FIELD"]` identically, so the refactor is behavior-preserving.
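The four no-op conditions in the summary can be read as an ordered chain of guards. A minimal sketch follows, assuming the detector and bbox builder are passed in as callables (a hypothetical signature chosen so the sketch is self-contained; the real method lives on `MetadataStage` and its parameters are not shown here). The `logger.exception` call mirrors the follow-up commit's choice to keep the traceback.

```python
import logging
from typing import Any, Callable, Dict, Optional, Tuple

logger = logging.getLogger(__name__)

# Hypothetical callable types for illustration.
Detector = Callable[[Dict[str, Any]], Tuple[Optional[str], Optional[str]]]
BBoxBuilder = Callable[[Dict[str, Any], str, str], Optional[Dict[str, Any]]]


def maybe_write_csv_spatial_extent(
    resource: Dict[str, Any],
    stats: Optional[Dict[str, Any]],
    *,
    enabled: bool,
    detect: Detector,
    build_bbox: BBoxBuilder,
) -> bool:
    """Set resource["dpp_spatial_extent"] in place; return True if it was written."""
    if not enabled:
        return False  # auto_csv_spatial_extent is off
    if resource.get("dpp_spatial_extent"):
        return False  # already set by the shapefile/GeoJSON path
    if not stats:
        return False  # no stats to derive a bbox from
    try:
        lat_field, lon_field = detect(stats)
    except Exception:
        # Keep the traceback so an unexpected stats shape can be debugged.
        logger.exception("lat/lon detection failed; skipping CSV spatial extent")
        return False
    if not lat_field or not lon_field:
        return False  # detection found nothing
    bbox = build_bbox(stats, lat_field, lon_field)
    if bbox is None:
        return False  # out-of-range or otherwise unusable values
    resource["dpp_spatial_extent"] = bbox
    return True
```

Because the function only mutates the resource dict, a caller positioned after the CKAN refetch can pass that same dict on to `dsu.update_resource`, and the extent travels with the other metadata updates in one call.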
Original idea + scaffolding from @minhajuddin2510 in #253; the diff here is a fresh implementation against the post-Prefect-migration codebase. Credited via `Co-Authored-By`.

Test plan
- `tests/test_csv_spatial_extent.py` (detection heuristic + MetadataStage helper): all pass in `dpp-test`.
- … regressions in `FormulaProcessor` / `MetadataStage` behavior.
- … `resource["dpp_spatial_extent"]` populated.
- … `spatial_extent_wkt()` with no args; confirm it reads from the persisted `dpp_spatial_extent`.

Config
New flag (default on): `ckanext.datapusher_plus.auto_csv_spatial_extent = true`

Operators who don't want auto-bbox for CSVs can disable it; the existing shapefile/GeoJSON path is unaffected.
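Because the flag defaults to on, any reader of it has to treat "unset" as `True`. A minimal sketch of that lookup follows, using a plain mapping in place of the CKAN config object; the accepted truthy/falsy spellings are an assumption modelled on common ini conventions, and inside a real CKAN extension CKAN's own `toolkit.asbool` (or the typed value from the config declaration) would be the more likely tool.

```python
from typing import Any, Mapping

FLAG = "ckanext.datapusher_plus.auto_csv_spatial_extent"

# Assumed spellings; CKAN's own parsing may accept a different set.
_TRUE = {"true", "yes", "on", "1"}
_FALSE = {"false", "no", "off", "0"}


def auto_csv_spatial_extent_enabled(config: Mapping[str, Any]) -> bool:
    """Read the flag from a config mapping, defaulting to on when unset."""
    raw = config.get(FLAG, True)  # default on
    if isinstance(raw, bool):
        return raw
    value = str(raw).strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    # Fail loudly on typos rather than silently picking a default.
    raise ValueError(f"{FLAG}: cannot interpret {raw!r} as a boolean")
```

Raising on an unrecognized value is a design choice: a typo such as `flase` would otherwise quietly leave auto-bbox enabled.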
🤖 Generated with Claude Code