fix(waterdata): Raise on mid-pagination failures instead of silently truncating#279
Merged
Conversation
Contributor
Pull request overview
This PR changes waterdata pagination behavior to fail loudly (raise) on any mid-pagination error rather than returning a silently truncated DataFrame, and updates tests and release notes to reflect the new contract.

Changes:
- Introduce `_paginated_failure_message(...)` and wrap mid-pagination exceptions in a `RuntimeError` that chains the original failure via `__cause__`.
- Update `_walk_pages` and `get_stats_data` to raise on mid-pagination failures (429/5xx/network errors) instead of returning partial results.
- Replace the prior "best-effort" pagination tests with new assertions that validate the raised exception and chaining behavior; document the behavior change in `NEWS.md`.
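The chaining described above relies on Python's `raise ... from ...` syntax, which sets `__cause__` on the new exception. A minimal self-contained illustration of the pattern (the names here are illustrative stand-ins, not the PR's actual code):

```python
def fetch_page(n: int) -> dict:
    # Stand-in for one paginated HTTP request; page 3 fails mid-walk.
    if n == 3:
        raise ConnectionError("upstream reset")
    return {"page": n}

def walk(pages: int) -> list:
    collected = []
    for n in range(pages):
        try:
            collected.append(fetch_page(n))
        except Exception as exc:
            # "raise ... from exc" sets __cause__, so the original
            # failure survives for programmatic inspection.
            raise RuntimeError(
                f"pagination failed after {len(collected)} page(s)"
            ) from exc
    return collected
```

Calling `walk(5)` raises a `RuntimeError` whose `__cause__` is the original `ConnectionError`.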
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `dataretrieval/waterdata/utils.py` | Adds shared failure-message helper and switches mid-pagination handling to raise with chained cause. |
| `tests/waterdata_utils_test.py` | Updates pagination-failure tests to assert raising behavior instead of logging/partial returns. |
| `NEWS.md` | Documents the behavior change for paginated waterdata requests. |
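The raises-with-chained-cause assertion shape that the updated tests validate can be sketched with `pytest.raises` (the stub function here is a stand-in, not the repo's actual fixtures):

```python
import pytest

def walk_pages_stub():
    # Stand-in for `_walk_pages`: one page succeeds, the next dies.
    try:
        raise ConnectionError("reset by peer")
    except ConnectionError as exc:
        raise RuntimeError("pagination failed after 1 page(s)") from exc

def test_raises_on_connection_error_mid_pagination():
    with pytest.raises(RuntimeError) as excinfo:
        walk_pages_stub()
    # Both the wrapper and the chained cause are part of the contract.
    assert isinstance(excinfo.value.__cause__, ConnectionError)
    assert "page(s)" in str(excinfo.value)
```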
`_walk_pages` and `get_stats_data`'s pagination loops have, since PR DOI-USGS#273, logged failures correctly but preserved a "best effort" contract of returning whatever pages had been collected when a follow-up page failed. The waterdata API exposes no resume cursor once a paginated walk is interrupted, so the partial DataFrame couldn't reliably be extended; silently returning it handed callers truncated data they had no way to know was truncated.

Both loops now wrap any mid-pagination exception (429, 5xx, network error) in a `RuntimeError` carrying:

- the number of pages successfully collected,
- the upstream cause (as both the message text and `__cause__` for programmatic inspection),
- a short menu of recovery actions (wait and retry, reduce request size, or obtain an API token).

The shared helper `_paginated_failure_message` builds the user-facing string so both loops stay aligned.

Behavior change: callers that previously consumed partial DataFrames on transient upstream blips will now see an exception they need to handle (typically: retry, possibly with a smaller `limit` or narrower query). Called out in NEWS.

Tests:

- Replaced the prior best-effort-preserves-partial assertions with raises-with-cause-chain assertions for all three failure modes (connection error, 5xx, 429), in both `_walk_pages` and `get_stats_data` variants.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
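A user-facing message with those three ingredients could be built by a helper along these lines (a hypothetical sketch; the real `_paginated_failure_message` in `dataretrieval/waterdata/utils.py` may have a different signature and wording):

```python
def paginated_failure_message(pages_collected: int, cause: Exception) -> str:
    # Hypothetical stand-in for `_paginated_failure_message`: combine the
    # progress made, the chained cause, and a short recovery menu.
    return (
        f"Pagination failed after {pages_collected} page(s) were collected. "
        f"Original error: {cause!r}. "
        "Possible remedies: wait and retry, reduce the request size "
        "(e.g. a smaller limit or a narrower query), or obtain an API token."
    )

msg = paginated_failure_message(4, TimeoutError("read timed out"))
```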
This was referenced May 17, 2026
The paginated request loops in waterdata (`_walk_pages` and `get_stats_data`) have, since #273, logged page failures correctly but kept a "best effort" contract: when any follow-up page failed, log the failure and return whatever pages had been collected so far. That contract was the wrong default. The waterdata API exposes no resume cursor once a paginated walk is interrupted, so the partial DataFrame couldn't reliably be extended; callers silently received truncated data they had no way to detect.

Both loops now raise on any mid-pagination failure (429, 5xx, network error). The wrapping `RuntimeError` carries:

- the number of pages successfully collected,
- the upstream cause (as both the message text and `__cause__` for programmatic inspection),
- a short menu of recovery actions: wait and retry, reduce the request size (smaller `limit`), or obtain an API token.

A small shared helper, `_paginated_failure_message`, builds the user-facing string so both loops stay aligned.

Behavior change
Callers that previously consumed partial DataFrames on transient upstream blips (5xx, network errors) will now see an exception they need to handle, typically a retry, possibly with a smaller `limit` or a narrower query. Called out in NEWS.

This is intentional. With no resume API, "best effort partial" was just silent truncation with friendly framing. A loud error is strictly safer, and programmatic callers can `try/except RuntimeError` and use `e.__cause__` (`isinstance(..., requests.ConnectionError)`, status-code parsing, etc.) to branch on the failure mode.

Tests
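Caller-side branching on the chained cause can look like this (a sketch: the stub is a placeholder for the real paginated call, and the builtin `ConnectionError` stands in for `requests.ConnectionError` to keep the example dependency-free):

```python
def get_stats_data_stub():
    # Placeholder for the real paginated call; simulates a transient failure.
    try:
        raise ConnectionError("connection dropped on page 7")
    except ConnectionError as exc:
        raise RuntimeError("pagination failed after 6 page(s)") from exc

def fetch_with_retry(attempts: int = 3):
    for attempt in range(attempts):
        try:
            return get_stats_data_stub()
        except RuntimeError as err:
            # Real code would check requests.ConnectionError here; the
            # builtin ConnectionError keeps this sketch self-contained.
            transient = isinstance(err.__cause__, ConnectionError)
            if transient and attempt < attempts - 1:
                continue  # transient failure: retry (add backoff in practice)
            raise  # non-transient, or out of attempts: propagate loudly
```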
Three new tests cover the new contract in both `_walk_pages` and `get_stats_data`: `_raises_on_connection_error_mid_pagination`, `_raises_on_5xx_mid_pagination`, and `_raises_on_mid_pagination_429`. The two pre-existing best-effort tests were replaced (same scenarios, new assertion shape).

Ordering / relationship to other open PRs
This PR is independent of #276 (the multi-value GET chunker) and can land in either order.
This PR should land before #278 (the draft parallel chunker). Parallel sub-requests make the silent-truncation behavior easy to trigger empirically: one of two runs returned 330,174 rows instead of 337,808 with no exception. With this fix in place, the chunker can be enhanced separately to catch the `RuntimeError` and wrap it in `QuotaExhausted` with `partial_frame`, but the silent-data-loss bug is closed at the source.

Refs #273
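The envisioned follow-up for the chunker could catch and re-wrap along these lines (hypothetical: `QuotaExhausted` and `partial_frame` are names from the #278 draft, and the class definition here is only a sketch):

```python
class QuotaExhausted(Exception):
    # Hypothetical exception; the name comes from the draft chunker in #278.
    def __init__(self, message: str, partial_frame=None):
        super().__init__(message)
        self.partial_frame = partial_frame

def run_chunked(fetch, partial_so_far=None):
    # Sketch: re-wrap the loud pagination error while preserving
    # whatever the chunker had already assembled.
    try:
        return fetch()
    except RuntimeError as err:
        raise QuotaExhausted(str(err), partial_frame=partial_so_far) from err
```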