waterdata: tighten stats + OGC pagination (geometry, KeyError, non-200) #255
Draft
thodson-usgs wants to merge 4 commits into DOI-USGS:main from
…-200
Three correctness fixes to the two pagination loops in
`dataretrieval.waterdata.utils`.
* `get_stats_data` honored `GEOPANDAS` for the first page but
hard-coded `geopd=False` on every continuation page. With geopandas
installed, a multi-page stats response started as a `GeoDataFrame`
and pages 2..N came back as plain `DataFrame`s; `pd.concat` then
silently downgraded the result and the caller lost geometry / CRS.
Use `geopd=GEOPANDAS` on every page.
* `get_stats_data` indexed `body["next"]` directly, raising `KeyError`
on responses without that key (some terminal responses simply omit
it). Switch to `body.get("next")`, which produces `None` and exits
the loop cleanly.
* The in-loop requests in both `get_stats_data` and `_walk_pages`
  used the response without checking `status_code`. A 4xx or 5xx
  page whose body happened to JSON-decode could be appended as data,
  after which pagination quietly stopped, leaving the caller with a
  partial result and no warning. Add an explicit
  `if status_code != 200` raise inside each loop so the existing
  log-and-truncate handler fires deliberately rather than incidentally.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
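The loop shape that the three fixes above aim for can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in `dataretrieval.waterdata.utils`: `walk_pages`, `FakeResponse`, the `"rows"` body key, and the `fetch` callable (standing in for a `requests` call) are all hypothetical, and a first-page failure is assumed to propagate to the caller. It reads the continuation token with `body.get("next")`, checks `status_code` on every continuation page inside the loop, and lets a single handler log and truncate. The geometry fix (`geopd=GEOPANDAS` on every page) is not modeled here.

```python
import logging
from typing import Any, Callable, Dict, List, Optional

logger = logging.getLogger(__name__)


class FakeResponse:
    """Hypothetical stand-in for the two parts of requests.Response used here."""

    def __init__(self, status_code: int, body: Dict[str, Any]) -> None:
        self.status_code = status_code
        self._body = body

    def json(self) -> Dict[str, Any]:
        return self._body


def walk_pages(fetch: Callable[[Optional[str]], FakeResponse]) -> List[Any]:
    """Hypothetical pagination loop: collect "rows" until "next" is absent."""
    # First page: assumed to propagate any failure straight to the caller.
    resp = fetch(None)
    if resp.status_code != 200:
        raise RuntimeError(f"HTTP {resp.status_code} on first page")
    body = resp.json()
    rows: List[Any] = list(body.get("rows", []))
    token = body.get("next")  # None when the key is missing: no KeyError

    try:
        while token is not None:
            resp = fetch(token)
            # Check status before trusting the body, so an error page that
            # happens to decode as JSON is never appended as data.
            if resp.status_code != 200:
                raise RuntimeError(f"HTTP {resp.status_code} on page {token!r}")
            body = resp.json()
            rows.extend(body.get("rows", []))
            token = body.get("next")
    except Exception as exc:
        # Log-and-truncate: keep the pages already fetched, but say so.
        logger.warning("Pagination stopped early: %s", exc)
    return rows


# Usage: a two-page result whose terminal page omits "next".
pages = {
    None: FakeResponse(200, {"rows": [1, 2], "next": "p2"}),
    "p2": FakeResponse(200, {"rows": [3]}),
}
print(walk_pages(pages.__getitem__))  # [1, 2, 3]
```

Because the status check sits inside the `try`, a failed continuation page takes the same logged, truncating exit as any other error instead of being appended as data.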
The two pagination loops now have four call sites between them, each
repeating `if resp.status_code != 200: raise RuntimeError(_error_body(resp))`.
Move that check into a small helper, `_raise_if_not_ok`, alongside
`_error_body`, and call it from every site.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 4, 2026
Pull request overview
This PR tightens pagination behavior in `dataretrieval.waterdata.utils`, specifically for WaterData stats and OGC fetches, so paginated responses preserve their expected structure and stop more intentionally on continuation-page failures.
Changes:
- Refactors non-200 response checks into `_raise_if_not_ok()` and applies it to initial and continuation-page requests.
- Updates `get_stats_data()` to tolerate missing `"next"` keys and to preserve `GEOPANDAS` handling across all pages.
- Adds regression tests around missing `"next"` handling and non-200 continuation pages.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `dataretrieval/waterdata/utils.py` | Adjusts pagination/error handling in `_walk_pages()` and `get_stats_data()`, including stats continuation parsing. |
| `tests/waterdata_utils_test.py` | Adds regression tests for pagination edge cases in `_walk_pages()` and `get_stats_data()`. |
Comment on lines +342 to +345

    def _raise_if_not_ok(resp: requests.Response) -> None:
        """Raise ``RuntimeError(_error_body(resp))`` for any non-200 response."""
        if resp.status_code != 200:
            raise RuntimeError(_error_body(resp))

         body = resp.json()
    -    all_dfs.append(_handle_stats_nesting(body, geopd=False))
    -    next_token = body["next"]
    +    all_dfs.append(_handle_stats_nesting(body, geopd=GEOPANDAS))
Comment on lines +87 to +93

    def test_walk_pages_raises_on_non_200_in_loop():
        """`_walk_pages` must surface a non-200 mid-loop, not silently truncate.

        Regression: previously any non-200 page was appended (with whatever
        body it had) and pagination quietly stopped because `_get_resp_data`
        or `_next_req_url` raised inside the bare except. The user got a
        partial result with no warning.

Per Copilot review on PR DOI-USGS#255:

- `_error_body`: catch `JSONDecodeError` when a 4xx/5xx response returns plain text or HTML. Previously, `_raise_if_not_ok` -> `_error_body` -> `resp.json()` raised `JSONDecodeError` on non-JSON bodies, defeating the in-loop status check.
- Tests:
  - Rename `test_walk_pages_raises_on_non_200_in_loop` to `test_walk_pages_truncates_on_non_200_continuation`; the assertion verifies log-and-truncate, not raise.
  - New `test_get_stats_data_preserves_geometry_across_pages` exercises the `GEOPANDAS=True` continuation path so that a regression to `geopd=False` on pages 2..N is caught.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
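The `_error_body` change amounts to falling back to the raw body text when a 4xx/5xx response is not JSON. A minimal sketch, assuming a message format of `HTTP <status>: <detail>`; `error_body`, `raise_if_not_ok`, and `FakeResponse` are hypothetical stand-ins, not the library's functions, and the real `_error_body` may word its message differently. Catching `ValueError` covers both the standard library's `json.JSONDecodeError` and, in recent versions of `requests`, `requests.exceptions.JSONDecodeError`, since both subclass `ValueError`.

```python
import json


class FakeResponse:
    """Hypothetical stand-in for the parts of requests.Response used here."""

    def __init__(self, status_code: int, text: str) -> None:
        self.status_code = status_code
        self.text = text

    def json(self):
        # Like requests, raises a ValueError subclass on a non-JSON body.
        return json.loads(self.text)


def error_body(resp) -> str:
    """Build an error message without assuming the body is JSON."""
    try:
        detail = resp.json()
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        # Plain text or HTML error page: keep a short excerpt instead.
        detail = resp.text.strip()[:200]
    return f"HTTP {resp.status_code}: {detail}"


def raise_if_not_ok(resp) -> None:
    """Same shape as the PR's helper: raise RuntimeError on any non-200."""
    if resp.status_code != 200:
        raise RuntimeError(error_body(resp))


# Usage: an HTML 502 page now yields a RuntimeError, not a JSONDecodeError.
try:
    raise_if_not_ok(FakeResponse(502, "<html>Bad Gateway</html>"))
except RuntimeError as exc:
    print(exc)  # HTTP 502: <html>Bad Gateway</html>
```

Keeping the status check and the message construction separate means the caller's log-and-truncate handler always sees a `RuntimeError`, whatever the server put in the error body.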
Conflicts: `tests/waterdata_utils_test.py`
Summary
Three correctness fixes to the two pagination loops in `dataretrieval.waterdata.utils`.
Test plan
Related PRs
Other open PRs in this bug-review series that touch `dataretrieval/waterdata/utils.py` (different functions, no functional conflicts):

- `_format_api_dates`: accept ISO 8601.
- `_arrange_cols`: stop mutating the caller's list.
- `_handle_stats_nesting`: tolerate missing drop columns.