Isolate filter feature for clean rollback; flag lex-comparison pitfall #240
Merged
thodson-usgs merged 3 commits into DOI-USGS:main on Apr 28, 2026
Move all CQL filter and chunking logic out of api.py / utils.py into a dedicated dataretrieval/waterdata/filters.py module (with chunked as a decorator on the per-request fetch), and extract get_nearest_continuous into a sibling nearest.py — so the entire filter feature can be removed by deleting two source files, two test files, and two re-export lines. Adds a pre-flight check that raises on unquoted-numeric comparisons (value > 1000, parameter_code IN (60, 61), value BETWEEN 5 AND 10), since every Water Data API queryable is string-typed and the server either returns HTTP 500 or silently produces lexicographically-sorted wrong rows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
This PR reorganizes Water Data OGC filter support into dedicated modules to enable easy rollback, while adding a client-side preflight guard that raises on unquoted numeric literals in cql-text filters (to prevent server 500s and lexicographic-comparison footguns). It also extracts get_nearest_continuous into its own module without changing the public API surface.
Changes:
- Moved filter/chunking/URL-budget logic into `dataretrieval/waterdata/filters.py` and applied it via a `@chunked(...)` decorator around the single-request fetch path.
- Added `_check_numeric_filter_pitfall` to reject unquoted numeric comparisons in `cql-text` filters, with extensive new tests.
- Extracted `get_nearest_continuous` into `dataretrieval/waterdata/nearest.py` and updated imports/tests accordingly.
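
The decorator approach in the first bullet can be illustrated with a rough sketch. This is not the code in `filters.py`: the names `split_top_level_or`, `chunked_sketch`, and `fake_fetch` are hypothetical, a plain character count stands in for the URL-byte budget, and de-duplication of overlapping results is left out.

```python
import functools
from typing import Callable, List


def split_top_level_or(text: str) -> List[str]:
    """Split a cql-text filter on ' OR ' found outside quotes and parentheses."""
    parts: List[str] = []
    depth, in_quote, start, i = 0, False, 0, 0
    while i < len(text):
        ch = text[i]
        if ch == "'":
            # A doubled quote ('') toggles twice, so we stay inside the literal.
            in_quote = not in_quote
        elif not in_quote and ch == "(":
            depth += 1
        elif not in_quote and ch == ")":
            depth -= 1
        elif not in_quote and depth == 0 and text[i:i + 4].upper() == " OR ":
            parts.append(text[start:i].strip())
            start = i + 4
            i += 4
            continue
        i += 1
    parts.append(text[start:].strip())
    return parts


def chunked_sketch(max_chars: int) -> Callable:
    """Hypothetical decorator: split an over-long filter and merge the results."""
    def decorator(fetch_once: Callable[[str], list]) -> Callable[[str], list]:
        @functools.wraps(fetch_once)
        def wrapper(filter_text: str) -> list:
            if len(filter_text) <= max_chars:
                return fetch_once(filter_text)
            rows: list = []
            chunk: List[str] = []
            for clause in split_top_level_or(filter_text):
                candidate = " OR ".join(chunk + [clause])
                if chunk and len(candidate) > max_chars:
                    # Current chunk is full: send it and start a new one.
                    rows.extend(fetch_once(" OR ".join(chunk)))
                    chunk = [clause]
                else:
                    chunk.append(clause)
            rows.extend(fetch_once(" OR ".join(chunk)))
            return rows
        return wrapper
    return decorator


calls: List[str] = []


@chunked_sketch(max_chars=30)
def fake_fetch(filter_text: str) -> list:
    """Stand-in for the real single-request fetch; records what it was sent."""
    calls.append(filter_text)
    return [filter_text]


fake_fetch("id = 'A' OR id = 'B' OR id = 'C' OR id = 'D'")
print(calls)  # ["id = 'A' OR id = 'B'", "id = 'C' OR id = 'D'"]
```

Keeping the splitting logic in a decorator means the fetch function itself has no knowledge of chunking, which is what makes removing the feature a matter of deleting the decorator line.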
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| `dataretrieval/waterdata/filters.py` | New home for filter language type alias, chunking + URL-budget probe, and numeric-literal pitfall guard. |
| `dataretrieval/waterdata/utils.py` | Removes inline chunking pipeline; wraps `_fetch_once` with `@filters.chunked(...)` and returns metadata from the (possibly aggregated) response. |
| `dataretrieval/waterdata/nearest.py` | New module containing `get_nearest_continuous` extracted from `api.py`. |
| `dataretrieval/waterdata/api.py` | Updates `FILTER_LANG` import and docstrings; removes inlined `get_nearest_continuous`. |
| `dataretrieval/waterdata/types.py` | Removes `FILTER_LANG` type alias (now in `filters.py`). |
| `dataretrieval/waterdata/__init__.py` | Updates re-exports for `FILTER_LANG` and `get_nearest_continuous`. |
| `tests/waterdata_filters_test.py` | New test module covering filter passthrough, chunking behavior, and numeric-literal pitfall guard. |
| `tests/waterdata_utils_test.py` | Removes filter/chunking tests moved into `waterdata_filters_test.py`. |
| `tests/waterdata_nearest_test.py` | Updates imports/patch targets to follow the new `nearest.py` module. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Summary
Two changes, in one PR because the second motivates the first:
1. Isolate the CQL `filter` feature into its own module so the entire feature can be removed by deleting two source files, two test files, and two re-export lines. All filter and chunking logic moves out of `api.py` / `utils.py` into `dataretrieval/waterdata/filters.py`, and `get_nearest_continuous` moves into a sibling `dataretrieval/waterdata/nearest.py`. No public API change.
2. Add a pre-flight check that raises on unquoted-numeric comparisons in `cql-text` filters (`value > 1000`, `parameter_code IN (60, 61)`, `value BETWEEN 5 AND 10`). Every queryable on the Water Data OGC API is `type=string` server-side, so unquoted numeric literals either get rejected with HTTP 500 or silently produce lexicographic results.
Motivation
Concern raised on the R side (DOI-USGS/dataRetrieval#880):
That's a real failure mode for the Python side too. The pre-flight check addresses it; the module isolation makes the whole feature easy to roll back if it turns out to cause more confusion than it solves.
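
To make the failure mode concrete, here is a small standalone illustration in plain Python (no Water Data API calls; the sample values are made up). It shows how a "greater than 1000" comparison behaves when both sides are strings, which is how the server treats its queryables.

```python
# Values as a string-typed backend sees them (made-up sample data).
values = ["5", "20", "999", "1000", "15000"]

# String comparison proceeds character by character, so "999" sorts above "1000".
lex_matches = [v for v in values if v > "1000"]

# Numeric comparison is what the caller almost certainly intended.
num_matches = [v for v in values if float(v) > 1000]

print(lex_matches)  # ['5', '20', '999', '15000']
print(num_matches)  # ['15000']
```

The lexicographic result includes "5", "20", and "999", so a filter that looks like a numeric threshold can return rows well below it without any error.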
What changed
New modules
- `dataretrieval/waterdata/filters.py` (+331): owns `FILTER_LANG`, top-level-`OR` splitting, URL-byte-budget probing, the lex-pitfall guard, and a `chunked` decorator that wraps the single-request fetch with all of the above.
- `dataretrieval/waterdata/nearest.py` (+241): `get_nearest_continuous` extracted as-is from `api.py`.
- `tests/waterdata_filters_test.py` (+589): all filter/chunking tests, plus 35 new tests for the lex-pitfall guard (every op × both orderings, `IN`, `BETWEEN`, negation, quoted-string false-positive guards, end-to-end through `get_continuous`).
Slimmed
- `dataretrieval/waterdata/api.py` (−350 / +45): chunking is now applied via `@chunked(build_request=...)` on the per-request fetch.
- `dataretrieval/waterdata/utils.py` (−241 / +15): filter helpers removed.
- `tests/waterdata_utils_test.py` (−444): filter tests moved to `waterdata_filters_test.py`.
Re-exports in `dataretrieval/waterdata/__init__.py` updated; no public symbol added or removed.
Behavior of the lex-pitfall check
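
A guard of this kind could look roughly like the sketch below. It is an assumption-laden illustration, not the PR's `_check_numeric_filter_pitfall`: the names `is_flagged` and `check_numeric_pitfall` and the regular expressions are hypothetical, and the patterns cover only the simple cases named in this PR (comparison operators in both orderings, `IN`, `BETWEEN`).

```python
import re

# A single-quoted CQL string literal; '' inside a literal is an escaped quote.
_QUOTED = re.compile(r"'(?:[^']|'')*'")
_NUM = r"-?\d+(?:\.\d+)?"
_OP = r"(?:<=|>=|<>|=|<|>)"
_PATTERNS = [
    re.compile(rf"{_OP}\s*{_NUM}\b"),                         # value > 1000
    re.compile(rf"(?<![\w.]){_NUM}\s*{_OP}"),                 # 1000 < value
    re.compile(rf"\bIN\s*\([^)]*(?<![\w.]){_NUM}\b", re.I),   # IN (60, 61)
    re.compile(rf"\b(?:BETWEEN|AND)\s+{_NUM}\b", re.I),       # BETWEEN 5 AND 10
]


def is_flagged(filter_text: str) -> bool:
    """Return True if the filter compares against an unquoted numeric literal."""
    # Blank out quoted literals first so digits inside them are never matched.
    stripped = _QUOTED.sub("''", filter_text)
    return any(p.search(stripped) for p in _PATTERNS)


def check_numeric_pitfall(filter_text: str) -> None:
    """Raise before any request is sent if the filter looks numeric."""
    if is_flagged(filter_text):
        raise ValueError(
            "Unquoted numeric literal in cql-text filter; queryables are "
            "string-typed, so quote the value if sort-order semantics are intended."
        )


print(is_flagged("value > 1000"))                # True
print(is_flagged("parameter_code IN (60, 61)"))  # True
print(is_flagged("value BETWEEN 5 AND 10"))      # True
print(is_flagged("value >= '1000'"))             # False
```

Stripping quoted literals before matching is what keeps values such as `'2020-01-01'` or `'00060'` from triggering the guard.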
Quoted literals (`value >= '1000'`) are not flagged: the caller has signalled they want sort-order semantics.
Live-confirmed against `USGS-02238500/continuous`:
Rollback path
If the `filter` feature is rolled back, the change is mechanical:
- Delete `dataretrieval/waterdata/filters.py`.
- Delete `tests/waterdata_filters_test.py`.
- Remove `from .filters import FILTER_LANG` and the `FILTER_LANG` entry in `__all__` in `dataretrieval/waterdata/__init__.py`.
- Remove the `@chunked(build_request=...)` decorator and the `filter` / `filter_lang` kwargs from `api.py` / `utils.py`.
`get_nearest_continuous` (in `nearest.py`) is independent and stays.
Test plan
- `pytest tests/waterdata_filters_test.py tests/waterdata_utils_test.py tests/waterdata_nearest_test.py`: 111/111 pass.
- `USGS-02238500/continuous`: unquoted numeric RHS consistently returns 500; quoted literal returns 200 with lex-sorted results.
🤖 Generated with Claude Code