Make _handle_stats_nesting tolerant of missing drop columns #257

Merged
thodson-usgs merged 1 commit into DOI-USGS:main from thodson-usgs:fix/handle-stats-nesting-errors-ignore
May 4, 2026
@thodson-usgs
Collaborator

@thodson-usgs thodson-usgs commented May 3, 2026

Summary

_handle_stats_nesting has two .drop(columns=...) calls, one in the pandas branch and one in the geopandas branch, that hardcode literal column names: ["type", "properties.data"] and ["data"] respectively. If a stats response ever comes back in a slightly different shape (renamed key, missing optional key, edge-case feature without the expected nesting), drop() raises KeyError and aborts the helper.

The sibling pd.json_normalize(...) call later in the same function already passes errors="ignore", so this PR adds the same argument to the two drop() calls for parity.
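The difference between the two behaviours can be seen on a toy frame. This is a minimal sketch, not code from the PR: the frame and its column names (`id`, `value`) are made up for illustration, and only the `drop(columns=..., errors=...)` call mirrors the change.

```python
import pandas as pd

# Toy frame that lacks both drop targets (illustrative columns, not real stats output).
df = pd.DataFrame({"id": [1, 2], "value": [3.5, 4.0]})

# Default behaviour (errors="raise"): a missing label raises KeyError.
message = None
try:
    df.drop(columns=["type", "properties.data"])
except KeyError as exc:
    message = str(exc)

# With errors="ignore", missing labels are skipped and existing ones are still dropped.
tolerant = df.drop(columns=["type", "properties.data"], errors="ignore")
print(message)
print(list(tolerant.columns))
```

One trade-off of `errors="ignore"` is that a misspelled drop target also passes silently, so the tolerance covers typos in the hardcoded list as well as genuine schema drift.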

Diff

# dataretrieval/waterdata/utils.py:894-901
if not geopd:
    df = pd.json_normalize(body["features"]).drop(
        columns=["type", "properties.data"], errors="ignore"  # <-- added
    )
    df.columns = df.columns.str.split(".").str[-1]
else:
    df = gpd.GeoDataFrame.from_features(body["features"]).drop(
        columns=["data"], errors="ignore"  # <-- added
    )

Test plan

  • New test test_handle_stats_nesting_tolerates_missing_drop_columns constructs a stats body whose features lack the top-level type key and confirms the function returns a populated DataFrame.
  • Verified that the new test fails on main with KeyError: "['type'] not found in axis".
  • Full waterdata_utils test suite passes (5 tests).
  • Full test suite (excluding deprecated nwis_test.py): 197 passed.
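The shape of such a regression check might look like the sketch below. It is not the test added in this PR: `flatten_features` is a hypothetical stand-in that copies only the pandas branch from the diff, and the feature fields (`id`, `monitoring_location_id`) are assumed for illustration rather than taken from a real stats response.

```python
import pandas as pd


def flatten_features(body):
    """Hypothetical stand-in for the pandas branch shown in the diff."""
    df = pd.json_normalize(body["features"]).drop(
        columns=["type", "properties.data"], errors="ignore"
    )
    # Keep only the last dotted segment, e.g. "properties.x" -> "x".
    df.columns = df.columns.str.split(".").str[-1]
    return df


# A feature without the top-level "type" key (assumed field names).
body = {
    "features": [
        {
            "id": "a",
            "properties": {
                "monitoring_location_id": "USGS-01",
                "data": [{"value": 1.0}],
            },
        }
    ]
}

result = flatten_features(body)
print(result)
```

Without `errors="ignore"` the same input would raise `KeyError`, because `type` never becomes a column when the key is absent from every feature.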

🤖 Generated with Claude Code

Related PRs

Other open PRs in this bug-review series that touch dataretrieval/waterdata/utils.py (different functions, no functional conflicts):

Both `.drop()` calls in `_handle_stats_nesting` (for the pandas and
geopandas branches) hardcoded literal column names: `["type",
"properties.data"]` and `["data"]` respectively. If a stats response is
ever returned in a slightly different shape (or one of those keys is
renamed/removed), `drop()` raises `KeyError` and aborts the helper. The
sibling `pd.json_normalize(...)` call later in the same function already
passes `errors="ignore"`, so add the same to the two `drop()` calls for
parity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@thodson-usgs thodson-usgs changed the title from "Make tolerant of missing drop columns" to "Make _handle_stats_nesting tolerant of missing drop columns" May 3, 2026
@thodson-usgs thodson-usgs requested a review from Copilot May 4, 2026 14:48
Contributor

Copilot AI left a comment

Pull request overview

This PR hardens the dataretrieval.waterdata.utils._handle_stats_nesting helper against minor upstream schema variations by making its initial DataFrame/GeoDataFrame column drops tolerant of missing columns, preventing avoidable KeyErrors during stats response normalization.

Changes:

  • Add errors="ignore" to the pandas-branch drop(columns=["type", "properties.data"]) call.
  • Add errors="ignore" to the geopandas-branch drop(columns=["data"]) call.
  • Add a regression test ensuring _handle_stats_nesting succeeds when a drop-target column (e.g., type) is absent.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| dataretrieval/waterdata/utils.py | Makes _handle_stats_nesting resilient to missing columns during initial flattening by ignoring absent drop targets. |
| tests/waterdata_utils_test.py | Adds regression coverage for missing drop-target columns in stats feature normalization. |

@thodson-usgs thodson-usgs marked this pull request as ready for review May 4, 2026 14:55
@thodson-usgs thodson-usgs merged commit 353d379 into DOI-USGS:main May 4, 2026
12 checks passed
@thodson-usgs thodson-usgs deleted the fix/handle-stats-nesting-errors-ignore branch May 4, 2026 14:55