Skip to content

refactor(api)!: access services via submodules, not the flattened top-level namespace#315

Merged
thodson-usgs merged 1 commit into
DOI-USGS:mainfrom
thodson-usgs:refactor/api-access-via-submodules
Jun 1, 2026
Merged

refactor(api)!: access services via submodules, not the flattened top-level namespace#315
thodson-usgs merged 1 commit into
DOI-USGS:mainfrom
thodson-usgs:refactor/api-access-via-submodules

Conversation

@thodson-usgs
Copy link
Copy Markdown
Collaborator

Problem

dataretrieval/__init__.py star-imports every service module:

from dataretrieval.nwis import *
from dataretrieval.waterdata import *
from dataretrieval.wqp import *
...

This flattens all of their public functions into the top-level package namespace, so dataretrieval.get_dv, dataretrieval.get_results, etc. all resolve at the top level. Two problems:

  1. One large, ambiguous top-level namespace — every service's functions share a single flat namespace.
  2. Silent name collisionsget_ratings is defined in both nwis and waterdata, and what_sites in both nwis and wqp. With import *, whichever module is imported last wins and the other is silently shadowed. (mypy flags this as an incompatible re-import — see chore(typing): set up mypy and tighten the package to mypy --strict #314, which only suppressed it.)

Change

Expose the service submodules instead of flattening their contents:

from . import nadp, nwis, samples, streamstats, utils, waterdata, wqp

Callers go through the owning module — the pattern the README, docs, and tests already use:

from dataretrieval import waterdata
df, meta = waterdata.get_ratings(...)   # modern

from dataretrieval import nwis
df, meta = nwis.get_ratings(...)        # legacy — now unambiguous
  • dataretrieval.<module> and from dataretrieval import <module> work for every service.
  • __version__ is unchanged.
  • nldi stays import-on-demand (it requires the optional geopandas), exactly as before — from dataretrieval import nldi.
  • The get_ratings / what_sites collision is now structurally impossible — each lives only under its module. This resolves at the source what chore(typing): set up mypy and tighten the package to mypy --strict #314 had to suppress with a # type: ignore.

Breaking change

Top-level function access is removed:

Before After
dataretrieval.get_dv(...) dataretrieval.nwis.get_dv(...)
dataretrieval.get_results(...) dataretrieval.wqp.get_results(...)
dataretrieval.get_ratings(...) dataretrieval.waterdata.get_ratings(...) (or nwis)
dataretrieval.NoSitesError dataretrieval.utils.NoSitesError

If a softer migration is preferred, a module-level __getattr__ shim could keep the old names working for one release with a DeprecationWarning — happy to add that instead of the hard removal.

Verification

  • New access pattern asserted directly: top-level functions are gone; both get_ratings and both what_sites are reachable via their modules; from dataretrieval import waterdata and explicit nldi import work; __version__ preserved.
  • Full test suite: 469 passed, 2 skipped — the tests already use module-qualified access, so nothing needed changing.
  • README and docs (per-module automodule directives) already use this pattern.

🤖 Generated with Claude Code

…namespace

`dataretrieval/__init__.py` star-imported every service module, flattening all
their public functions into the top-level package namespace. That:
  - leaked one large, ambiguous namespace (`dataretrieval.get_dv`,
    `dataretrieval.get_results`, ...), and
  - silently collided on names defined by more than one service: `get_ratings`
    (nwis vs waterdata) and `what_sites` (nwis vs wqp) resolved to whichever
    module was star-imported last; the other was shadowed.

Replace the star-imports with submodule imports, so callers reach each service
through its own module -- the pattern the README, the docs (per-module
autodoc), and the tests already use:

    from dataretrieval import waterdata
    df, meta = waterdata.get_ratings(...)

    from dataretrieval import nwis
    df, meta = nwis.get_ratings(...)

`dataretrieval.<name>` and `from dataretrieval import <name>` still work for
every service module, and `__version__` is unchanged. `nldi` remains
import-on-demand (it pulls in the optional geopandas dependency). The collision
is now impossible -- each `get_ratings` / `what_sites` lives only under its
module.

BREAKING CHANGE: top-level function access (e.g. `dataretrieval.get_dv`) is
removed; use the service module (`dataretrieval.nwis.get_dv`). Exception and
helper classes likewise move under their modules (e.g.
`dataretrieval.utils.NoSitesError`).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@thodson-usgs thodson-usgs force-pushed the refactor/api-access-via-submodules branch from f416956 to ca5926c Compare June 1, 2026 02:30
@thodson-usgs thodson-usgs marked this pull request as ready for review June 1, 2026 03:02
@thodson-usgs thodson-usgs merged commit b375f19 into DOI-USGS:main Jun 1, 2026
8 checks passed
@thodson-usgs thodson-usgs deleted the refactor/api-access-via-submodules branch June 1, 2026 03:02
thodson-usgs added a commit to thodson-usgs/dataretrieval-python that referenced this pull request Jun 2, 2026
…ils/ package

## Why

`dataretrieval/waterdata/utils.py` had grown to 2033 LOC spanning ~6 unrelated
domains -- request building, response parsing, result finalization,
pagination/async, stats post-processing, and validation -- plus constants and
the public engines. It was the package's one genuine god-module. (An
architecture review found the package's OO is otherwise appropriate, so this is
a modularization, not an OO-pattern refactor.)

## What

Convert `utils.py` into a `utils/` package: the public surface stays in
`utils/__init__.py` (a thin facade) and the implementation is split across six
cohesive submodules, moving every definition verbatim (no signature/logic
changes):

| submodule | holds |
|---|---|
| `utils/constants.py` | URLs, `_OUTPUT_ID_BY_SERVICE`, regexes, param sets (dependency-free) |
| `utils/http.py` | headers, `_error_body`, `_raise_for_non_200`, retry-after |
| `utils/validate.py` | arg normalization/validation (`_get_args`, `_check_*`) |
| `utils/requests.py` | request building (`_construct_api_requests`, CQL2, dates) |
| `utils/responses.py` | geometry-agnostic parsing / finalization / stats shaping |
| `utils/engine.py` | pagination/async driver (`_paginate`, `_run_sync`, ...) |

`utils/__init__.py` re-exports the internal API (explicit `__all__`, 56 names),
so every existing `from dataretrieval.waterdata.utils import ...` and
`mock.patch("dataretrieval.waterdata.utils.<name>")` keeps working -- no import
sites or tests were touched. `dataretrieval.waterdata.utils` resolves to the
package's `__init__`, so the import path is unchanged from when it was a module.

Seven functions remain physically defined in `utils/__init__.py`
(`get_ogc_data`, `_fetch_once`, `get_stats_data`, `_get_resp_data`,
`_ogc_parse_response`, `_walk_pages`, `_handle_stats_nesting`) because the test
suite monkeypatches them (or `gpd`) by their `dataretrieval.waterdata.utils.*`
name, and a function's global lookups resolve in its defining module. The
geopandas probe stays with them, and the pagination logger keeps the name
`dataretrieval.waterdata.utils` (a caplog test pins it). These could later move
to the `engine`/`responses` submodules -- which do not import the package, so
there is no cycle -- but that requires re-targeting the test patches; left as a
follow-up.

## Behavior-preserving

- 56 top-level definitions moved verbatim -- none lost, none duplicated.
- 469 tests pass, 2 skipped; ruff clean; submodules import without cycles
  (`constants` <- `http`/`validate` <- `requests`/`responses` <- `engine` <-
  `__init__`); `chunking.py` untouched.

## Note

Overlaps with the error-taxonomy (DOI-USGS#313) and namespace (DOI-USGS#315) PRs on `waterdata/`
imports -- sequence on merge.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant