Skip to content

feat(metadata): persist dpp_spatial_extent for CSV lat/lon resources#298

Merged
jqnatividad merged 2 commits into
mainfrom
csv-spatial-extent
May 16, 2026
Merged

feat(metadata): persist dpp_spatial_extent for CSV lat/lon resources#298
jqnatividad merged 2 commits into
mainfrom
csv-spatial-extent

Conversation

@jqnatividad
Copy link
Copy Markdown
Collaborator

Summary

Re-implements the CSV slice of #253 against the current Prefect-stage
pipeline. For Shapefile / GeoJSON inputs FormatConverterStage already
writes dpp_spatial_extent on the simplified resource (see
format_converter.py:240). Plain CSV resources with lat/lon columns
never got the same field — the bbox was only computed at jinja2 render
time in spatial_extent_wkt, so external consumers (gazetteer widgets,
third-party extensions) had to re-derive it from stats.

MetadataStage._maybe_write_csv_spatial_extent fills this gap. It
runs in _update_resource_metadata after the CKAN refetch and before
dsu.update_resource, so the new field rides along with the existing
preview / datastore_active updates. Same BoundingBox dict shape as
the FormatConverterStage path → existing spatial_extent_wkt /
spatial_extent_feature_collection helpers keep working unchanged.

The stage no-ops when:

  • ckanext.datapusher_plus.auto_csv_spatial_extent is off (default on),
  • dpp_spatial_extent is already present (shapefile/GeoJSON path),
  • stats aren't available, or
  • lat/lon detection returns nothing / values are out-of-range.

To avoid duplicating the lat/lon detection heuristic, the scan in
FormulaProcessor.__init__ is extracted to a module-level
jinja2_helpers.detect_lat_lon_fields(). FormulaProcessor still
populates dpp["LAT_FIELD"] / dpp["LON_FIELD"] identically — refactor
is behavior-preserving.

Original idea + scaffolding from @minhajuddin2510 in #253; the diff
here is a fresh implementation against the post-Prefect-migration
codebase. Credited via Co-Authored-By.

Test plan

  • 11 new unit tests in tests/test_csv_spatial_extent.py (detection
    heuristic + MetadataStage helper) — all pass in dpp-test.
  • Full unit suite: 113/113 passing (was 102; +11 new). No
    regressions in FormulaProcessor / MetadataStage behavior.
  • Integration CI on this PR.
  • Manual: push a CSV with lat/lon columns; confirm
    resource["dpp_spatial_extent"] populated.
  • Manual: push a CSV without lat/lon columns; confirm field absent.
  • Manual: push a shapefile; confirm existing behavior unchanged.
  • Manual: render a template calling spatial_extent_wkt() with no
    args; confirm it reads from the persisted dpp_spatial_extent.

Config

New flag (default on):

ckanext.datapusher_plus.auto_csv_spatial_extent = true

Operators who don't want auto-bbox for CSVs can disable it; the
existing shapefile/GeoJSON path is unaffected.

🤖 Generated with Claude Code

For Shapefile / GeoJSON inputs, FormatConverterStage already writes
`dpp_spatial_extent` on the simplified resource it uploads
(`format_converter.py:240`). Plain CSV resources that happen to carry
lat/lon columns never get the same field — the bbox was only computed
at jinja2 render time in `spatial_extent_wkt`, so external consumers
(gazetteer widgets, third-party extensions) couldn't read it without
re-deriving from stats.

`MetadataStage._maybe_write_csv_spatial_extent` now fills this gap.
It runs in `_update_resource_metadata` after the CKAN re-fetch and
before `dsu.update_resource`, so the new field rides along with the
existing preview / datastore_active updates. The bbox uses the same
BoundingBox shape FormatConverterStage emits, keeping
`spatial_extent_wkt` and `spatial_extent_feature_collection`
unchanged. The stage no-ops when:

- `ckanext.datapusher_plus.auto_csv_spatial_extent` is off (default on),
- `dpp_spatial_extent` is already present (shapefile path),
- stats aren't available, or
- lat/lon detection returns nothing / values are out-of-range.

To avoid duplicating the detection heuristic, the lat/lon scan in
`FormulaProcessor.__init__` is extracted to a module-level
`jinja2_helpers.detect_lat_lon_fields()`. FormulaProcessor still
populates `dpp["LAT_FIELD"]` / `dpp["LON_FIELD"]` identically.

This re-lands the CSV slice of #253 against the Prefect-stage pipeline.

Closes part of #253.

Co-Authored-By: Minhajuddin Mohammed <minhajuddin2510@gmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds persistence of dpp_spatial_extent for CSV resources with detected lat/lon columns so external consumers can read a stored bbox instead of re-deriving it at template render time, while reusing the existing lat/lon detection heuristic used by the formula engine.

Changes:

  • Add MetadataStage._maybe_write_csv_spatial_extent() and call it during resource metadata update to persist a BoundingBox on the resource.
  • Extract lat/lon detection logic into jinja2_helpers.detect_lat_lon_fields() and refactor FormulaProcessor to use it.
  • Introduce ckanext.datapusher_plus.auto_csv_spatial_extent (default true) and add unit tests covering detection + persistence behavior.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/test_csv_spatial_extent.py New unit tests for lat/lon detection and CSV bbox persistence behavior.
ckanext/datapusher_plus/jobs/stages/metadata.py Persist dpp_spatial_extent for CSV lat/lon resources during _update_resource_metadata.
ckanext/datapusher_plus/jinja2_helpers.py Add shared detect_lat_lon_fields() and reuse it in FormulaProcessor.
ckanext/datapusher_plus/config.py Add AUTO_CSV_SPATIAL_EXTENT config flag (default on).
ckanext/datapusher_plus/config_declaration.yaml Declare the new operator-facing config option.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ckanext/datapusher_plus/jobs/stages/metadata.py Outdated
Comment thread ckanext/datapusher_plus/jobs/stages/metadata.py Outdated
…ature_collection

Address Copilot review on #298.

``spatial_extent_feature_collection`` was reading
``resource["dpp_spatial_extent"]["coordinates"]`` and using it as a flat
``[min_lon, min_lat, max_lon, max_lat]`` list, but both the existing
shapefile/GeoJSON path (FormatConverterStage) and the new CSV path
persist the GeoJSON-ish nested form ``[[min_lon, min_lat], [max_lon,
max_lat]]``. The helper was silently producing invalid GeoJSON for
any resource carrying a persisted ``dpp_spatial_extent``. Flatten the
nested coordinates the same way ``spatial_extent_wkt`` already does.

Also tighten ``MetadataStage._maybe_write_csv_spatial_extent``: use
``logger.exception`` on the broad-Exception path from
``detect_lat_lon_fields`` so the traceback is preserved when an
operator has to debug an unexpected stats shape. Pure-data helpers
don't typically fail, but if one does we want to see why, not just
the message.

Three new tests guard the round-trip — ``spatial_extent_wkt`` and
``spatial_extent_feature_collection`` both read from a persisted
BoundingBox dict and emit valid GeoJSON / WKT. Tests now 116/116
(was 113; +3 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants