Summary
waterdata getters return GeoDataFrames whose .crs is None, even though the
data is published in EPSG:4326 (WGS84) and the docstrings say so. The GeoDataFrame
is built via gpd.GeoDataFrame.from_features(...) without a crs= argument, so the
coordinate reference system is never attached.
This affects the modern, primary Water Data API path — the one most users hit.
Where
dataretrieval/ogc/shaping.py:143 — _get_resp_data:
df = gpd.GeoDataFrame.from_features(
    [f if "geometry" in f else {**f, "geometry": None} for f in features]
)  # no crs= argument, so df.crs stays None
dataretrieval/waterdata/stats.py:113 — same pattern in the stats path.
Meanwhile the docstring for get_monitoring_locations states coordinates are
published in EPSG:4326 (dataretrieval/waterdata/api.py:638), so the returned
object contradicts the documentation.
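The effect of the pattern above can be seen without calling the API. The following is a minimal sketch using two made-up GeoJSON-like features (the property names and coordinates are illustrative assumptions, not real Water Data output); it only shows how from_features behaves with and without crs=.

```python
import geopandas as gpd

# Two synthetic GeoJSON-like features (made-up values, not real API output).
features = [
    {
        "type": "Feature",
        "properties": {"site": "A"},
        "geometry": {"type": "Point", "coordinates": [-76.6, 39.3]},
    },
    {"type": "Feature", "properties": {"site": "B"}},  # no "geometry" key
]

# Same normalization as in the snippet above: a missing geometry becomes None.
normalized = [f if "geometry" in f else {**f, "geometry": None} for f in features]

# Without crs=, the resulting frame carries no coordinate reference system.
naive = gpd.GeoDataFrame.from_features(normalized)

# With crs=, the CRS is attached at construction time.
tagged = gpd.GeoDataFrame.from_features(normalized, crs="EPSG:4326")

print(naive.crs)             # None
print(tagged.crs.to_epsg())  # 4326
```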
Expected vs. actual
- Expected: gdf.crs == "EPSG:4326".
- Actual: gdf.crs is None.
Reproduction
from dataretrieval import waterdata
df, md = waterdata.get_monitoring_locations(state="Maryland", site_type_code="ST")
print(type(df)) # geopandas.GeoDataFrame
print(df.crs) # -> None (expected: EPSG:4326)
# Any CRS-aware operation now fails or warns:
df.to_crs("EPSG:3857")
# ValueError: Cannot transform naive geometries.
# Please set a crs on the object first.
Impact
Because the CRS is missing, standard GeoPandas workflows break or emit warnings:
- .to_crs(...) raises ValueError: Cannot transform naive geometries.
- .explore() / folium mapping requires a manual .set_crs(4326) first; the
  repo's own map demo has to do this (acknowledged in the pyproject.toml
  comment on the doc extra, which notes .set_crs().explore() is needed).
- Spatial joins and distance computations against other layers are unreliable
  without a defined CRS.
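Until the CRS is attached upstream, callers can work around the problem on their side. This is a sketch under the assumption that the returned coordinates really are EPSG:4326, as the docstring states; the one-row frame below is a synthetic stand-in for a getter result, not real data.

```python
import geopandas as gpd
from shapely.geometry import Point

# Synthetic stand-in for a frame returned without a CRS.
df = gpd.GeoDataFrame({"site": ["A"]}, geometry=[Point(-76.6, 39.3)])

# Reprojecting a naive frame fails, as described in the first bullet above.
try:
    df.to_crs("EPSG:3857")
    naive_failed = False
except ValueError:
    naive_failed = True

# Workaround: declare the documented CRS first, then reproject.
# set_crs only labels the coordinates; it does not transform them.
fixed = df.set_crs(4326) if df.crs is None else df
web_mercator = fixed.to_crs("EPSG:3857")

print(naive_failed)                 # True
print(web_mercator.crs.to_epsg())   # 3857
```

The guard on df.crs keeps the workaround harmless once the library starts setting the CRS itself.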
Inconsistency across modules
The CRS policy is applied inconsistently across the package:
| Module | Sets CRS? | Value |
|---|---|---|
| nldi | ✅ | EPSG:4326 (dataretrieval/nldi.py:15,48) |
| nwis (legacy) | ✅ | EPSG:4269 / NAD83 (dataretrieval/nwis.py:43,162) |
| waterdata (primary) | ❌ | None |
The most-used module is the only one that omits the CRS.
Proposed fix
Attach the documented CRS where the GeoDataFrame is constructed, e.g. define a
module-level _CRS = "EPSG:4326" (mirroring nldi/nwis) and pass it:
_CRS = "EPSG:4326"  # module-level constant, as in nldi/nwis

df = gpd.GeoDataFrame.from_features(
    [f if "geometry" in f else {**f, "geometry": None} for f in features],
    crs=_CRS,
)
Apply the same change at dataretrieval/waterdata/stats.py:113. A regression test
should assert that gdf.crs equals EPSG:4326 for a getter that returns geometry,
and that the CRS stays attached through pagination (pd.concat of empty and
non-empty pages).
Note: when skip_geometry=True (or all geometries are null) the result is a plain
DataFrame with no geometry, so the CRS assertion should only apply to the
geometry-bearing case.
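One possible shape for such a regression test is sketched below. It does not call the real getters: build_frame is a hypothetical stand-in for the package's frame construction, and its fallback to a plain DataFrame when no feature has geometry is an assumption based on the note above, not a copy of the actual implementation. The pagination case is left out of the sketch.

```python
import geopandas as gpd
import pandas as pd

_CRS = "EPSG:4326"


def build_frame(features):
    """Hypothetical stand-in for the package's frame construction."""
    if not any(f.get("geometry") for f in features):
        # No usable geometry: return a plain DataFrame of the properties.
        return pd.DataFrame([f.get("properties") or {} for f in features])
    return gpd.GeoDataFrame.from_features(
        [f if "geometry" in f else {**f, "geometry": None} for f in features],
        crs=_CRS,
    )


def test_crs_attached_when_geometry_present():
    features = [
        {
            "type": "Feature",
            "properties": {"site": "A"},
            "geometry": {"type": "Point", "coordinates": [-76.6, 39.3]},
        }
    ]
    gdf = build_frame(features)
    assert isinstance(gdf, gpd.GeoDataFrame)
    assert gdf.crs == _CRS


def test_plain_dataframe_without_geometry():
    df = build_frame([{"type": "Feature", "properties": {"site": "B"}}])
    # The CRS assertion does not apply here: there is no geometry column.
    assert not isinstance(df, gpd.GeoDataFrame)
    assert list(df.columns) == ["site"]
```

In the real test suite the synthetic features would be replaced by a mocked API response passed through the actual getter.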