refactor(api)!: access services via submodules, not the flattened top-level namespace #315
Merged
thodson-usgs merged 1 commit on Jun 1, 2026
…namespace
`dataretrieval/__init__.py` star-imported every service module, flattening all
their public functions into the top-level package namespace. That:
- leaked one large, ambiguous namespace (`dataretrieval.get_dv`,
`dataretrieval.get_results`, ...), and
- silently collided on names defined by more than one service: `get_ratings`
(nwis vs waterdata) and `what_sites` (nwis vs wqp) resolved to whichever
module was star-imported last; the other was shadowed.
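The shadowing described above can be reproduced in isolation. This is a minimal sketch that assumes two hypothetical in-memory modules, `svc_a` and `svc_b` (stand-ins, not the real service modules), each defining `get_ratings`:

```python
import sys
import types


def make_module(name, source):
    """Create an in-memory module and register it so it can be imported."""
    mod = types.ModuleType(name)
    exec(source, mod.__dict__)
    sys.modules[name] = mod
    return mod


# Two hypothetical service modules that both define get_ratings.
make_module("svc_a", "def get_ratings():\n    return 'svc_a'\n")
make_module("svc_b", "def get_ratings():\n    return 'svc_b'\n")

# A flattened namespace: the second star-import silently overwrites the first.
flat = {}
exec("from svc_a import *\nfrom svc_b import *\n", flat)
winner = flat["get_ratings"]()

# Submodule access keeps both functions reachable and unambiguous.
by_module = (sys.modules["svc_a"].get_ratings(), sys.modules["svc_b"].get_ratings())
```

No warning or error is raised at the second star-import; the only sign of the collision is which function ends up bound to the name.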
Replace the star-imports with submodule imports, so callers reach each service
through its own module -- the pattern the README, the docs (per-module
autodoc), and the tests already use:
    from dataretrieval import waterdata
    df, meta = waterdata.get_ratings(...)

    from dataretrieval import nwis
    df, meta = nwis.get_ratings(...)
`dataretrieval.<name>` and `from dataretrieval import <name>` still work for
every service module, and `__version__` is unchanged. `nldi` remains
import-on-demand (it pulls in the optional geopandas dependency). The collision
is now impossible -- each `get_ratings` / `what_sites` lives only under its
module.
BREAKING CHANGE: top-level function access (e.g. `dataretrieval.get_dv`) is
removed; use the service module (`dataretrieval.nwis.get_dv`). Exception and
helper classes likewise move under their modules (e.g.
`dataretrieval.utils.NoSitesError`).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
f416956 to
ca5926c
thodson-usgs added a commit to thodson-usgs/dataretrieval-python that referenced this pull request on Jun 2, 2026
…ils/ package
## Why
`dataretrieval/waterdata/utils.py` had grown to 2033 LOC spanning ~6 unrelated
domains -- request building, response parsing, result finalization,
pagination/async, stats post-processing, and validation -- plus constants and
the public engines. It was the package's one genuine god-module. (An
architecture review found the package's OO is otherwise appropriate, so this is
a modularization, not an OO-pattern refactor.)
## What
Convert `utils.py` into a `utils/` package: the public surface stays in
`utils/__init__.py` (a thin facade) and the implementation is split across six
cohesive submodules, moving every definition verbatim (no signature/logic
changes):
| submodule | holds |
|---|---|
| `utils/constants.py` | URLs, `_OUTPUT_ID_BY_SERVICE`, regexes, param sets (dependency-free) |
| `utils/http.py` | headers, `_error_body`, `_raise_for_non_200`, retry-after |
| `utils/validate.py` | arg normalization/validation (`_get_args`, `_check_*`) |
| `utils/requests.py` | request building (`_construct_api_requests`, CQL2, dates) |
| `utils/responses.py` | geometry-agnostic parsing / finalization / stats shaping |
| `utils/engine.py` | pagination/async driver (`_paginate`, `_run_sync`, ...) |
`utils/__init__.py` re-exports the internal API (explicit `__all__`, 56 names),
so every existing `from dataretrieval.waterdata.utils import ...` and
`mock.patch("dataretrieval.waterdata.utils.<name>")` keeps working -- no import
sites or tests were touched. `dataretrieval.waterdata.utils` resolves to the
package's `__init__`, so the import path is unchanged from when it was a module.
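As a rough illustration of the facade idea, the sketch below builds a throwaway package on disk. The package name `demo_utils`, the URL, and the body of `_paginate` are all hypothetical; only the layout (a thin `__init__.py` re-exporting from submodules) mirrors what is described here.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from unittest import mock

# Build a throwaway package; every name below is illustrative only.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_utils"
pkg_dir.mkdir()
(pkg_dir / "constants.py").write_text('BASE_URL = "https://example.invalid/api"\n')
(pkg_dir / "engine.py").write_text(
    "from .constants import BASE_URL\n"
    "def _paginate(n):\n"
    '    return [f"{BASE_URL}?page={i}" for i in range(n)]\n'
)
# The facade: implementation lives in submodules, names are re-exported here.
(pkg_dir / "__init__.py").write_text(
    "from .constants import BASE_URL\n"
    "from .engine import _paginate\n"
    '__all__ = ["BASE_URL", "_paginate"]\n'
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_utils = importlib.import_module("demo_utils")

# The package-level path still resolves, as it did when utils was one module.
pages = demo_utils._paginate(2)

# Patching by the package-level dotted name still replaces the facade attribute.
with mock.patch("demo_utils._paginate", return_value=[]):
    patched_pages = demo_utils._paginate(2)
```

Callers that import or patch through the package-level name never see the split, which is why no import sites need to change.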
Seven functions remain physically defined in `utils/__init__.py`
(`get_ogc_data`, `_fetch_once`, `get_stats_data`, `_get_resp_data`,
`_ogc_parse_response`, `_walk_pages`, `_handle_stats_nesting`) because the test
suite monkeypatches them (or `gpd`) by their `dataretrieval.waterdata.utils.*`
name, and a function's global lookups resolve in its defining module. The
geopandas probe stays with them, and the pagination logger keeps the name
`dataretrieval.waterdata.utils` (a caplog test pins it). These could later move
to the `engine`/`responses` submodules -- which do not import the package, so
there is no cycle -- but that requires re-targeting the test patches; left as a
follow-up.
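The constraint that keeps those functions in `__init__.py` can be seen in a small, self-contained sketch. The `impl` and `facade` modules and the `helper`/`caller` functions are hypothetical stand-ins; the point is only that a function resolves globals in the module where it was defined, not in the module that re-exports it.

```python
import types
from unittest import mock

# Hypothetical implementation module: caller() looks up helper in impl's globals.
impl = types.ModuleType("impl")
exec(
    "def helper():\n"
    "    return 'real'\n"
    "def caller():\n"
    "    return helper()\n",
    impl.__dict__,
)

# Hypothetical facade that only re-exports the two names.
facade = types.ModuleType("facade")
facade.helper = impl.helper
facade.caller = impl.caller

# Patching the facade's name does not reach caller(): it still sees impl.helper.
with mock.patch.object(facade, "helper", return_value="patched"):
    via_facade_patch = facade.caller()

# Patching the defining module's global is what caller() actually observes.
with mock.patch.object(impl, "helper", return_value="patched"):
    via_impl_patch = facade.caller()
```

Moving a patched function into a submodule would therefore require the tests to patch the submodule's name instead, which is the re-targeting left as a follow-up.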
## Behavior-preserving
- 56 top-level definitions moved verbatim -- none lost, none duplicated.
- 469 tests pass, 2 skipped; ruff clean; submodules import without cycles
(`constants` <- `http`/`validate` <- `requests`/`responses` <- `engine` <-
`__init__`); `chunking.py` untouched.
## Note
Overlaps with the error-taxonomy (DOI-USGS#313) and namespace (DOI-USGS#315) PRs on `waterdata/`
imports -- sequence on merge.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
## Problem
`dataretrieval/__init__.py` star-imports every service module. This flattens all of their public functions into the top-level package namespace, so `dataretrieval.get_dv`, `dataretrieval.get_results`, etc. all resolve at the top level. Two problems:
- The top level becomes one large, ambiguous namespace.
- `get_ratings` is defined in both `nwis` and `waterdata`, and `what_sites` in both `nwis` and `wqp`. With `import *`, whichever module is imported last wins and the other is silently shadowed. (mypy flags this as an incompatible re-import; see "chore(typing): set up mypy and tighten the package to mypy --strict" #314, which only suppressed it.)
## Change
Expose the service submodules instead of flattening their contents. Callers go through the owning module, the pattern the README, docs, and tests already use.
- `dataretrieval.<module>` and `from dataretrieval import <module>` work for every service.
- `__version__` is unchanged.
- `nldi` stays import-on-demand (it requires the optional `geopandas`), exactly as before: `from dataretrieval import nldi`.
- The `get_ratings` / `what_sites` collision is now structurally impossible: each lives only under its module. This resolves at the source what #314 had to suppress with a `# type: ignore`.
## Breaking change
Top-level function access is removed:
| before | after |
|---|---|
| `dataretrieval.get_dv(...)` | `dataretrieval.nwis.get_dv(...)` |
| `dataretrieval.get_results(...)` | `dataretrieval.wqp.get_results(...)` |
| `dataretrieval.get_ratings(...)` | `dataretrieval.waterdata.get_ratings(...)` (or `nwis`) |
| `dataretrieval.NoSitesError` | `dataretrieval.utils.NoSitesError` |
If a softer migration is preferred, a module-level `__getattr__` shim could keep the old names working for one release with a `DeprecationWarning`; happy to add that instead of the hard removal.
## Verification
- Both `get_ratings` and both `what_sites` are reachable via their modules; `from dataretrieval import waterdata` and explicit `nldi` import work; `__version__` preserved.
- The docs (`automodule` directives) already use this pattern.

🤖 Generated with Claude Code
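For reference, the module-level `__getattr__` shim floated as a softer migration could look roughly like the sketch below. It is not part of this PR. The `pkg` and `nwis` objects are in-memory stand-ins, and the `_MOVED` mapping is a hypothetical name introduced only for illustration; the mechanism itself is the standard PEP 562 hook.

```python
import types
import warnings

# Hypothetical stand-ins for the package and one service module.
nwis = types.ModuleType("pkg.nwis")
nwis.get_dv = lambda: "dv"

pkg = types.ModuleType("pkg")
pkg.nwis = nwis

# Old top-level name -> owning submodule (illustrative mapping only).
_MOVED = {"get_dv": "nwis"}


def _module_getattr(name):
    """PEP 562 hook: called only when normal attribute lookup on pkg fails."""
    if name in _MOVED:
        owner = _MOVED[name]
        warnings.warn(
            f"pkg.{name} is deprecated; use pkg.{owner}.{name}",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(getattr(pkg, owner), name)
    raise AttributeError(f"module 'pkg' has no attribute {name!r}")


# In a real package this would be a function named __getattr__ in __init__.py.
pkg.__getattr__ = _module_getattr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = pkg.get_dv()
```

Names defined by more than one service (such as `get_ratings`) have no single owner, so a mapping like this would have to either pick one or leave them out.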