feat(location): add AI enrichment, geocode cache, and map drilldown #9
Merged
Introduce an end-to-end location enrichment pipeline so incidents can be verified and visualized at city level with less manual cleanup.

- Add migration `0003_location_enrichment.sql` with verified-city metadata, lat/lon fields, timestamps, and a `city_geocode_cache` table + indexes.
- Add `src/ai-location.js` to extract municipality from linked source text, score confidence, apply fallbacks, geocode Canadian cities via Nominatim, and cache geocode outcomes.
- Add admin APIs for single and bulk enrichment: `POST /admin/api/records/:id/enrich-location` and `POST /admin/api/records/enrich-location-all`, with `force`, `geocode`, and `min_confidence` controls.
- Extend admin UI with per-record and bulk location actions, plus display of verified city, confidence, coordinates, and location source.
- Update map data flow to use enriched fields and add Leaflet province drilldown markers with mapped-vs-total visibility.
- Document the new behavior/env vars and bump Wrangler to `^4.69.0` (lockfile refresh).

Behavior changes:

- `/map/canada` now prefers `city_verified` + `location_lat/location_lon` when available.
- Bulk enrichment in `only_missing && !force` mode processes unchecked missing rows first to avoid endlessly reprocessing unresolved records.
- Enrichment endpoints return `412` with a clear schema message if migration `0003` has not been applied.

Risks:

- City extraction/geocoding quality is probabilistic; some false positives/negatives are still possible.
- External geocoding adds network and provider rate-limit failure modes.
- Wrangler `4.69.0` raises local tooling expectations to Node 20+.

Follow-ups:

- Apply migration `0003_location_enrichment.sql` in staging/production before running backfill.
- Monitor unresolved/failure counts and geocode-cache hit rate.
- Consider scheduled retries for unresolved records and optional provider failover.
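As an illustration of the `/map/canada` preference described under behavior changes, here is a minimal sketch of how a record's map location might be chosen. The field names `city_verified`, `location_lat`, `location_lon`, and `city` come from this PR; the helper names `pickMapLocation` and `countMapped` and the returned object shape are hypothetical and are not taken from the actual implementation.

```javascript
// Hypothetical helper: prefer enriched fields when a verified city and both
// coordinates are present; otherwise report the record as unmapped.
function pickMapLocation(record) {
  const hasCoords =
    Number.isFinite(record.location_lat) && Number.isFinite(record.location_lon);

  if (record.city_verified && hasCoords) {
    return {
      city: record.city_verified,
      lat: record.location_lat,
      lon: record.location_lon,
      mapped: true,
    };
  }

  // Assumption: fall back to whatever city label exists, without a marker.
  return {
    city: record.city_verified || record.city || null,
    lat: null,
    lon: null,
    mapped: false,
  };
}

// Hypothetical helper for the mapped-vs-total figure shown in the drilldown.
function countMapped(records) {
  const mapped = records.filter((r) => pickMapLocation(r).mapped).length;
  return { mapped, total: records.length };
}
```

Under these assumptions, a record with a verified city but no coordinates still counts toward `total` but not toward `mapped`.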
Throttle `triggerBulkRecordLocationEnrichment` geocode calls to ~1 request per 1.1 seconds after the first record. This aligns bulk enrichment with Nominatim’s rate limit and reduces rate-limit-related failures during `geocode=true` runs.

Add `integrity` and `crossorigin` to Leaflet CSS/JS CDN includes in the Canada map template to harden third-party asset loading and improve frontend supply chain safety.

Behavior changes:

- Bulk geocoding now runs slower by design when geocoding is enabled.
- Leaflet assets now require hash match; mismatches block asset execution/load.

Risks:

- Longer enrichment jobs may increase chance of worker/request timeout on large batches.
- Future Leaflet CDN/version/hash changes can break map rendering until hashes are updated.

Follow-ups:

- Move throttling/retry into shared geocode rate-limit handling with adaptive backoff on 429 responses.
- Consider self-hosting pinned Leaflet assets to reduce CDN hash drift.
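The throttling described above (no wait before the first record, then roughly 1.1 seconds before each later geocode call) can be sketched as a sequential loop. This is an illustrative sketch, not the body of `triggerBulkRecordLocationEnrichment`: the names `geocodeSequentially`, `geocodeOne`, and the injectable `sleep` option are assumptions made so the example is self-contained and testable.

```javascript
// Default delay helper based on setTimeout.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: calls `geocodeOne` for each record in order, waiting
// `intervalMs` before every call except the first.
async function geocodeSequentially(records, geocodeOne, options = {}) {
  const { intervalMs = 1100, sleep = defaultSleep } = options;
  const results = [];

  for (let i = 0; i < records.length; i++) {
    if (i > 0) {
      // Space out requests to stay near one request per 1.1 seconds.
      await sleep(intervalMs);
    }
    results.push(await geocodeOne(records[i]));
  }

  return results;
}
```

Because the loop is strictly sequential, total run time grows linearly with batch size (about 1.1 seconds per record after the first), which is the timeout risk noted above.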
Normalize `city_verified` with `normalizeCityName` and treat it as the primary fallback when AI verification does not meet `minConfidence`. This keeps previously verified data from being replaced by weaker fallbacks (`record.city`) and preserves existing verification metadata.

Change the early skip condition to require both a verified city and coordinates before returning `skipped_already_enriched` (unless `force` is set). Records without coordinates now continue through enrichment even when geocoding is disabled, so city verification/metadata can still be refreshed.

Also carry forward prior verification notes/source when AI reasoning is empty, and compute fallback confidence from the max of AI confidence, existing stored confidence, and a 0.55 floor.

Behavior changes:

- Partially enriched records (verified city but missing coords) are no longer skipped.
- Existing verified cities are retained on low-confidence AI results.
- Verification notes/source are less likely to be blanked out.

Risks:

- Stale verified cities may persist longer if AI confidence stays below threshold.
- The 0.55 confidence floor can overstate certainty for legacy verified data.

Follow-ups:

- Add regression tests for skip gating (`force`, `geocode`, coords presence).
- Add tests for city selection precedence and note/source preservation paths.
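The skip gating and fallback precedence above might look roughly like the sketch below. It is a reading of the commit message and not the code in `src/ai-location.js`: `normalizeCityName` here is a stand-in with assumed behavior (trim and collapse whitespace), and `shouldSkipEnrichment`, `resolveCity`, and the fields `location_confidence` and `location_notes` are hypothetical names.

```javascript
// Stand-in for the PR's normalizeCityName; the real normalization rules are
// not shown in the source, so this only trims and collapses whitespace.
function normalizeCityName(value) {
  return typeof value === "string" ? value.trim().replace(/\s+/g, " ") : "";
}

// Skip only when the record has BOTH a verified city and coordinates,
// and `force` is not set.
function shouldSkipEnrichment(record, { force = false } = {}) {
  if (force) return false;
  const hasCity = Boolean(normalizeCityName(record.city_verified));
  const hasCoords = record.location_lat != null && record.location_lon != null;
  return hasCity && hasCoords;
}

// Choose the city: the AI result if it meets minConfidence, otherwise the
// existing verified city, otherwise the raw record city.
function resolveCity(record, ai, minConfidence) {
  const aiCity = normalizeCityName(ai.city);
  const aiConfidence = Number.isFinite(ai.confidence) ? ai.confidence : 0;
  // Carry prior notes forward when the AI reasoning is empty.
  const notes = ai.reasoning || record.location_notes || "";

  if (aiCity && aiConfidence >= minConfidence) {
    return { city: aiCity, confidence: aiConfidence, notes };
  }

  const stored = Number.isFinite(record.location_confidence)
    ? record.location_confidence
    : 0;
  const fallbackCity =
    normalizeCityName(record.city_verified) || normalizeCityName(record.city);

  return {
    city: fallbackCity || null,
    // Max of AI confidence, stored confidence, and the 0.55 floor.
    confidence: Math.max(aiConfidence, stored, 0.55),
    notes,
  };
}
```

The floor means a fallback result never reports confidence below 0.55, which is the overstatement risk called out above for legacy verified data.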