refactor(errors): unify request errors under a DataRetrievalError root#313
Draft
thodson-usgs wants to merge 1 commit into
Draft
refactor(errors): unify request errors under a DataRetrievalError root#313thodson-usgs wants to merge 1 commit into
thodson-usgs wants to merge 1 commit into
Conversation
…onomy
An HTTP failure used to surface as a different exception type depending on which
module made the request:
- legacy query() (wqp, nwis, ngwmn, nldi): ValueError, or NoSitesError(Exception)
- waterdata: RateLimited / ServiceUnavailable (RuntimeError),
RequestTooLarge (ValueError), ChunkInterrupted (RuntimeError)
- nadp / streamstats: bare httpx.HTTPStatusError
so no single `except` clause could catch "any dataretrieval request failure."
Add a `DataRetrievalError` base and root the existing exceptions on it, so a
caller can `except dataretrieval.DataRetrievalError` for any request failure.
Each subclass also keeps the built-in it has historically raised (BadRequestError
is a ValueError; RateLimited is a RuntimeError), so existing `except ValueError` /
`except RuntimeError` handlers keep working unchanged.
The taxonomy lives in a new, dependency-free `dataretrieval/exceptions.py` — the
single home for it (cf. requests.exceptions, botocore.exceptions) — rather than
buried in the pandas/httpx-heavy utils.py:
- exceptions.py: DataRetrievalError + the legacy query() types (BadRequestError,
NotFoundError, RequestTooLargeError, ServiceUnavailableError; all ValueError)
+ NoSitesError + the Water Data transport types (RateLimited, ServiceUnavailable,
RequestTooLarge). Explicit __all__, re-exported from dataretrieval/__init__.py
so the top-level export is intentional, not an accident of `import *`.
- utils.py keeps the behavior: query() and a consolidated _raise_for_status()
status->exception mapper; it imports the types it raises.
- chunking.py keeps the chunker-specific resumable types (ChunkInterrupted and
its subclasses, which carry a ChunkedCall resume handle) and imports the
transport types from exceptions.
The old import paths still resolve (utils.NoSitesError,
waterdata.chunking.RateLimited, ...) via re-import, so nothing downstream breaks.
Out of scope (follow-ups): nadp/streamstats still raise httpx.HTTPStatusError;
nldi's manual non-200 ValueError isn't rooted; waterdata.utils._raise_for_non_200's
catch-all for non-retryable 4xx stays a bare RuntimeError (a deliberate
fatal/non-resumable signal the chunker relies on).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2afd2d7 to
9801177
Compare
thodson-usgs
added a commit
to thodson-usgs/dataretrieval-python
that referenced
this pull request
Jun 2, 2026
…ils/ package
## Why
`dataretrieval/waterdata/utils.py` had grown to 2033 LOC spanning ~6 unrelated
domains -- request building, response parsing, result finalization,
pagination/async, stats post-processing, and validation -- plus constants and
the public engines. It was the package's one genuine god-module. (An
architecture review found the package's OO is otherwise appropriate, so this is
a modularization, not an OO-pattern refactor.)
## What
Convert `utils.py` into a `utils/` package: the public surface stays in
`utils/__init__.py` (a thin facade) and the implementation is split across six
cohesive submodules, moving every definition verbatim (no signature/logic
changes):
| submodule | holds |
|---|---|
| `utils/constants.py` | URLs, `_OUTPUT_ID_BY_SERVICE`, regexes, param sets (dependency-free) |
| `utils/http.py` | headers, `_error_body`, `_raise_for_non_200`, retry-after |
| `utils/validate.py` | arg normalization/validation (`_get_args`, `_check_*`) |
| `utils/requests.py` | request building (`_construct_api_requests`, CQL2, dates) |
| `utils/responses.py` | geometry-agnostic parsing / finalization / stats shaping |
| `utils/engine.py` | pagination/async driver (`_paginate`, `_run_sync`, ...) |
`utils/__init__.py` re-exports the internal API (explicit `__all__`, 56 names),
so every existing `from dataretrieval.waterdata.utils import ...` and
`mock.patch("dataretrieval.waterdata.utils.<name>")` keeps working -- no import
sites or tests were touched. `dataretrieval.waterdata.utils` resolves to the
package's `__init__`, so the import path is unchanged from when it was a module.
Seven functions remain physically defined in `utils/__init__.py`
(`get_ogc_data`, `_fetch_once`, `get_stats_data`, `_get_resp_data`,
`_ogc_parse_response`, `_walk_pages`, `_handle_stats_nesting`) because the test
suite monkeypatches them (or `gpd`) by their `dataretrieval.waterdata.utils.*`
name, and a function's global lookups resolve in its defining module. The
geopandas probe stays with them, and the pagination logger keeps the name
`dataretrieval.waterdata.utils` (a caplog test pins it). These could later move
to the `engine`/`responses` submodules -- which do not import the package, so
there is no cycle -- but that requires re-targeting the test patches; left as a
follow-up.
## Behavior-preserving
- 56 top-level definitions moved verbatim -- none lost, none duplicated.
- 469 tests pass, 2 skipped; ruff clean; submodules import without cycles
(`constants` <- `http`/`validate` <- `requests`/`responses` <- `engine` <-
`__init__`); `chunking.py` untouched.
## Note
Overlaps with the error-taxonomy (DOI-USGS#313) and namespace (DOI-USGS#315) PRs on `waterdata/`
imports -- sequence on merge.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
An HTTP failure surfaces as a different exception type depending on which module made the request, so there is no single
excepta caller can use to handle "anydataretrievalrequest failure":query()(wqp,nwis,ngwmn,nldi)ValueError, orNoSitesError(Exception)waterdataRateLimited/ServiceUnavailable(RuntimeError),RequestTooLarge(ValueError),ChunkInterrupted(RuntimeError)nadp,streamstatshttpx.HTTPStatusErrorChange
Introduce a single base class,
dataretrieval.utils.DataRetrievalError(re-exported asdataretrieval.DataRetrievalError), and root the existing exceptions on it. Unification is at the root only — each exception keeps the built-in type it has always raised, so this is backward compatible:query()path — new typed errorsBadRequestError(400),NotFoundError(404),RequestTooLargeError(414 / over-long URL),ServiceUnavailableError(5xx), each(DataRetrievalError, ValueError). Same messages, same statuses raised on; stillValueError.query()is consolidated into a reusable_raise_for_status()helper.NoSitesErrornow subclassesDataRetrievalError(wasException).waterdata—RateLimited/ServiceUnavailable/RequestTooLarge/ChunkInterruptedgainDataRetrievalErroras an additional base (stillRuntimeError/ValueError; the chunker'sisinstanceclassification is unaffected).Callers can now write:
while
except ValueError(legacy path) andexcept RuntimeError(waterdata retryable types) continue to work.Backward compatibility
No behavior change beyond the widened type hierarchy — all raised messages and the exact set of statuses
query()raises on are identical. 208 tests pass (200 existing + 8 new). The new tests assert both the unified hierarchy and the preserved built-in bases. Live-verified through the refactoredquery(): a real200(happy path) and a real over-long-URL414(nowRequestTooLargeError, still caught asValueError).Out of scope (proposed follow-ups)
This is a focused first step that establishes the root and migrates the two major subsystems. Deliberately left for follow-up PRs, flagged here for discussion:
nadp/streamstatsstill raisehttpx.HTTPStatusError— a one-line swap each to_raise_for_status, but it changes their public error type.nldi's manual non-200ValueErrorandnwis._parse_json_or_raise's HTML-detectionValueErroraren't rooted yet.waterdata._raise_for_non_200's catch-all for non-retryable 4xx stays a bareRuntimeError— it is load-bearing for the chunker's fatal-vs-resumable classifier, so rooting it needs a dedicatedClientError(DataRetrievalError, RuntimeError)plus a chunker-classification review.🤖 Generated with Claude Code