Add 8 snow-related variables for hydrology departure (v0.2.0)#55
Merged
Conversation
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 of #48. New scripts/backfill_edh_snow.py imports safeguards from scripts/_lib.py (preflight_single_instance, with_retry, write_geotiff, log, get_token) and produces both layers of snow variables from a single hourly EDH fetch per year: Monthly natives (12-band/year COG): snow_depth mm SWE monthly mean of daily sde * rsn snowfall mm monthly sum of daily sf (accum-handled) snowmelt mm monthly sum of daily smlt (accum-handled) snow_cover % monthly mean of daily snowc Annual derived (1-band/year COG): swe_max mm annual max of daily sde * rsn snowfall_fraction % 100 * annual_sum_sf / annual_sum_tp snowmelt_doy_50 day DOY when cumsum(smlt) >= 50% of annual snowmelt_rate_peak mm/wk annual max of 7-day rolling sum daily smlt New hourly_accum_to_daily() helper implements the 00:00 UTC reset trick: sf and smlt at valid_time = t (00:00 UTC) represent the accumulation from t-24h to t, so daily total for day D = value at D+1 00:00 UTC. Pattern was documented in the #36 EDH-migration archive but never implemented in code; this is the first call site. Smoke-tested on year 2020 (leap year): all 8 outputs written in 228s. Median DOY-50 = May 8 (freshet), median peak SWE = 276 mm, median snowfall_fraction = 27% — values span plausible BC ranges, alpine pixels reach plausible extremes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 4 of #48 - consumer-side registry, anomaly branch, tests. cd_variables() now ships 15 vars (7 existing + 8 snow): Monthly natives (pct_normal except snow_cover): swe, snowfall, snowmelt pct_normal % snow_cover pct_point_diff % Annual derived (mostly absolute): swe_max absolute mm snowfall_fraction pct_point_diff % snowmelt_doy_50 absolute day snowmelt_rate_peak absolute mm/wk Renamed snow_depth -> swe in script and registry: the value sde * rsn evaluates to kg/m^2 = mm of water (SWE), not vertical snow depth. Original name was misleading; caught during registry design. Running backfill uses the pre-rename script via its already-loaded code; outputs will be mv'd snow_depth_*.tif -> swe_*.tif after backfill completes (one-line shell op, no re-fetch). cd_anomaly() adds the pct_point_diff branch. Formula is identical to absolute (value - baseline_mean), combined via %in% - the distinct type is for downstream display semantics. cap_pct does NOT apply to pct_point_diff (covered by new test); only pct_normal is clamped. Tests: 166 PASS, 0 FAIL. New cases in test-cd_anomaly.R for pct_point_diff arithmetic and cap non-application; bumped test-cd_variables.R count 7 -> 15 plus two new membership tests; fixed the duplicate pct_normal assertion in test-cd_fetch.R. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2b of #48 - extend the Stage 3 R pipeline to aggregate the 4 monthly snow natives (swe, snowfall, snowmelt, snow_cover) into multi-year COGs alongside the existing 7 vars, and stack the 4 annual-derived snow scalars (swe_max, snowfall_fraction, snowmelt_doy_50, snowmelt_rate_peak) into 1-band-per-year multi-year COGs in a new Step 1b. agg_methods extended for the new monthly vars (sum for snowfall/snowmelt, mean for swe/snow_cover). Caught a substring-match bug in cd_stac_item() while dry-running the new pipeline: the parser used grepl(v, name_parts) which matches substrings, so swe_max_annual.tif was being mis-routed under the `swe` variable (because "swe" is a substring of "swe_max"). The bug was latent before because no existing variable name was a substring of another. Fixed with strict {var}_{period} exact-match against both registries (cd_variables() and cd_periods()). Catalog now has 59 items (35 existing + 24 new): 4 monthly snow x 5 periods (annual + winter/spring/summer/fall) 4 annual snow x 1 period (annual) Verified via dry-run: cd_catalog() correctly returns 5 periods for each monthly snow var and 1 period (annual) for each annual-derived scalar. Existing tmean catalog entries unchanged. Tests: 166 PASS, 0 FAIL (no regression from the parser change). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Full 76-year backfill landed (~2h52min main run + 3min retry for year 2022 ClientPayloadError). Renamed 75 snow_depth_*.tif outputs to swe_*.tif since sde * rsn evaluates to mm SWE, not vertical snow depth. Stage 3 R aggregation + S3 push completed. 24 new COGs uploaded (~114 MB): 4 monthly snow x 5 periods (annual + winter/spring/summer/fall) 4 annual derived x 1 period (annual) plus an updated catalog.json with 35+24 = 59 STAC items. cd_catalog() against the default S3 URL now returns the full set; verified all 8 new vars at expected periods. Updates planning files only - the script changes that produced this result already landed in 37a7510 (Stage 3 extension + cd_stac_item exact-match parser fix). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* main: Release v0.1.7 Snowpack-departure methodology lit review: rag-build + 11 papers + citation map (#54)
Phase 5 of #48. New "Snowpack" section in vignettes/peace-fwcp.Rmd between "Daytime Highs" and "Recent vs Pre-warming". Wires up the bibliography YAML field for the first time and consumes the citation map produced by #53 (snow methodology lit review). vignettes/references.bib (new, 22 KB, 11 entries): Generated by rbbt::bbt_write_bib() from BBT keys for the 11 papers in the NewGraphEnvironment/hydrology Zotero collection added during #53. Static bib committed so vignette renders on CI without needing Zotero/BBT. Vignette YAML: Adds bibliography: references.bib + link-citations: true. Snowpack section structure: - Intro: snowpack as hinge of BC hydrology, salmon-migration framing, broad context (Mote 2018, Pederson 2011, Najafi 2017 BC attribution, Kang 2016 Fraser freshet) - Methodology footnote: ERA5-Land bias caveat (Kouki 2023), raw MK is correct per Yue and Wang 2002 - Headline numbers paragraph: SWE annual -10 percent, summer SWE -75 percent, spring snowmelt +37 percent (freshet earlier), annual snowfall flat (-6 percent) so the SWE decline is warmth-driven not less-snow-driven - Seasonal-curve table: pivoted from monthly faceted plot to seasonal table because monthly aggregations aren't on S3 (COG schema is annual + 4 seasons). Seasonal level still tells the "when does snow accumulate / melt" story cleanly. - 4 annual time-series plots with Theil-Sen lines: swe_max, snowmelt_doy_50, snowmelt_rate_peak, snowfall_fraction - 3-finding interpretation: snow leaving earlier (not falling less), freshet shifting into spring, summers becoming snow-free. Each finding tied to a specific cited paper. Recent vs Pre-warming table extended: Added no_pct_vars list (snow_cover, snowfall_fraction, snowmelt_doy_50) to NA-out the Delta-percent column for vars where it's not meaningful. All 8 new vars appear in the table. data-raw/peace_fwcp_vignette_data.R: Extracted pct_normal_vars list to include swe, snowfall, snowmelt alongside prcp, soil_moisture for the regional and per-ecoregion cd_compare(method = "pct_change") calls. Re-ran precompute: peace_fwcp.rds is now 270 KB (was 160 KB) with all 15 vars. Render time 8.7 s. 166 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of #48. New data-raw/qa_snow_validation.R cross-checks ERA5-Land swe_max against BC ASWS automated snow-pillow daily SWE at 4 active stations in the FWCP Peace AOI. Outputs land in planning/active/ for the eventual archive. Site selection (via bcsnowdata::snow_auto_location() spatially intersected with the FWCP Peace AOI): Pine Pass 1400 m 37 yr (1989-2025) Mount Sheba 1490 m 7 yr (2019-2025) Ware Upper 1565 m 10 yr (2016-2025) Aiken Lake 1050 m 41 yr (1985-2025) Germansen Landing was in the original list but lacks usable SWE record at this location. Findings (95 paired site-years, pooled r = 0.51): 1. Bias is direction-variable, NOT uniformly high: - Pine Pass: ERA5 underestimates by 61% (1140 vs 439 mm) - Mount Sheba: ERA5 underestimates by 32% (936 vs 633 mm) - Ware Upper: ERA5 close match (+1%, 250 vs 254 mm) - Aiken Lake: ERA5 overestimates by 54% (265 vs 407 mm) This contradicts a naive read of Kouki et al. 2023 (which reported 150-200% NH-wide overestimate). At BC interior sites ERA5-Land overestimates as Kouki found; at high-snowpack Coast-Mountain-spillover sites it underestimates badly. 2. Bias is approximately stable over time at all 4 sites: regression of (ERA5 - ASWS) on year is non-significant everywhere (p > 0.2). This supports the vignette claim that "trends are still defensible" even if absolute values are biased. Vignette updated: replaced the earlier quote of Kouki's NH-wide 150-200% figure with our specific BC findings in the methodology footnote of the Snowpack section. Re-rendered: 9.2 s. DESCRIPTION: bcsnowdata added to Suggests. Manual-survey secondary cross-check skipped — the ASWS bias stability is clean enough at all 4 sites that the secondary check isn't needed to support the vignette claim. Documented as deferred in findings.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 6 of #48. The monthly GHA at .github/workflows/climate-update.yml runs scripts/pipeline_update_edh.R, so that's the file that needs to know about the snow vars — the workflow YAML stays unchanged. scripts/pipeline_update_edh.R changes: * agg_methods extended with the 4 monthly snow natives (swe / snowfall / snowmelt / snow_cover); same shape as the 7 existing 12-band-per-year vars, flow through cd_aggregate identically. * annual_vars list added for the 4 annual derived snow scalars (swe_max / snowfall_fraction / snowmelt_doy_50 / snowmelt_rate_peak); these are 1-band per year, stored in data/backfill/annual/, bypass cd_aggregate. * Step 3 now calls both backfill_edh_all.py AND backfill_edh_snow.py for each candidate year, and verifies all 15 outputs wrote (7 core monthly + 4 snow monthly + 4 snow annual). The Python scripts skip incomplete years individually, so a partial year on EDH cleanly defers all 15 to the next run. * Step 4 split into a monthly path (cd_aggregate from 12-band) and an annual path (1-band straight onto the existing COG). Refactored the per-(var, period) append logic into a small append_to_cog() helper to avoid duplicating the grid-alignment check between the two paths. * Header docstring updated to describe the new flow. README.md: variable inventory now lists all 15 vars grouped as core climate / snow monthly natives / snow annual derived, with a periods note clarifying that annual-derived vars only have an "annual" period (not the seasonal+annual schema of the monthly natives). parse() clean on the R script. 166 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round of vignette refinements after the first user review of the
Snowpack section:
* Methodology paragraph: rewritten in plain language. Two-paragraph
structure now distinguishes the two stacked error sources at any
point comparison: (1) scale mismatch between a 80 km^2 cell average
and a single point measurement, and (2) the Northern-Hemisphere-
scale cell-mean bias documented by Kouki 2023. Walks the reader
through what our QA at Pine Pass and Aiken Lake actually shows.
Dropped the file pointer to the QA results.
* Acronyms expanded inline. DJF/MAM/JJA/SON now reads "winter
(December-February)" etc.
* New "Snow per ecoregion" subsection inside Per-Ecoregion Variation.
Two facet plots: peak SWE anomaly per ecoregion + snowmelt DOY-50
anomaly per ecoregion. Brief intro paragraph above the plots
pointing to the WSG x ecoregion table for watershed-group mapping;
deeper interpretation lives in the Interpretation section.
* Interpretation section gains a snowpack finding: snow leaving the
region earlier (not falling less); freshet timing shifting
uniformly across all 5 ecoregions (~1 day/decade earlier, p < 0.01
in every ecoregion); peak SWE detectable only in the regional
aggregate because year-to-year variability dominates at the
ecoregion scale. Earlier draft asserted an elevation gradient that
is NOT present in the data; corrected.
* Salmon-migration framing removed - the FWCP Peace is non-anadromous
(Bennett Dam blocks salmon from the Williston watershed). Replaced
with the FWCP-accurate resident-salmonid framing (bull trout,
Arctic grayling, mountain whitefish, rainbow trout, kokanee) and
freshet relevance for channel morphology / spawning gravel
mobilization / off-channel rearing habitat.
* "## References" heading added at the end so pandoc places the
auto-generated bibliography under an explicit section.
Render time 9.4 s, 11 citations resolve to References, 166 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
NewGraphEnvironment
added a commit
that referenced
this pull request
May 5, 2026
The pkgdown CI on the squash-merge of #55 failed because bcsnowdata is GitHub-only (bcgov/bcsnowdata) and pak couldn't resolve it from CRAN/PPM. Convention in this repo (matching how bcdata and fresh are already handled in other data-raw scripts) is that data-raw scripts manage their own ad-hoc deps; the developer running the script installs what's needed. data-raw/qa_snow_validation.R header now includes the pak::pak() install instruction.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds 8 snow-related variables to the cd package — the v0.2.0 release. Closes #48.
swe,snowfall,snowmelt,snow_coverswe_max,snowfall_fraction,snowmelt_doy_50,snowmelt_rate_peakSingle hourly EDH fetch on the producer side covers both layers. Snow methodology citations (11 papers in
NewGraphEnvironment/hydrologyZotero collection) and the citation map come from the companion lit-review PR #54 (already merged as v0.1.7). ASWS QA at 4 BC sites validates the bias-stability assumption that anchors the trend-defensibility claim.What's in this PR
scripts/backfill_edh_snow.py— new producer script, imports safeguards fromscripts/_lib.py(Extract bulk-fetch safeguards into shared scripts/_lib.py #52). First place that ECMWF's 00:00-UTC-reset accumulation trick lands in code (the pattern was archived from Migrate cd_fetch() to DestinE Earth Data Hub Zarr #36 but never used).scripts/pipeline_stage3_edh.R+scripts/pipeline_update_edh.R— extended to handle both monthly natives (cd_aggregate path) and annual derived (1-band stack path). Refactored COG-append logic intoappend_to_cog()helper.R/cd_variables.R— 7 → 15 vars; newpct_point_diffanomaly type forsnow_coverandsnowfall_fraction.R/cd_anomaly.R—pct_point_diffbranch added tocase_when. Formula identical toabsolute(combined via%in%) — distinct type carries display semantics for downstream.R/cd_stac_catalog.R— bug fix incd_stac_item(): substring-matchgrepl(v, name_parts)was mis-routingswe_max_annual.tifunderswe. Replaced with strict{var}_{period}exact-match. Non-breaking for existing tests.stac-era5-land, 4 monthly × 5 periods + 4 annual × 1 period × 76 years bands = 24 multi-year files), catalog now serves 59 items.vignettes/peace-fwcp.Rmd— new "Snowpack" section between "Daytime Highs" and "Recent vs Pre-warming" with seasonal-curve table + 4 annual time-series plots;bibliography: references.bibwired in (first use in this vignette); 11 citations sourced from Snowpack-departure methodology lit review: rag-build + 11 papers + citation map #54's findings; "Snow per ecoregion" subsection in Per-Ecoregion Variation; snowpack interpretation paragraph in Interpretation section. References section header added at end.data-raw/qa_snow_validation.R— ASWS cross-check at 4 BC sites viabcgov/bcsnowdata. 95 paired station-years; bias direction varies (Pine Pass −61%, Aiken Lake +54%) but is stable over time at every site (p > 0.2 everywhere).Headline scientific findings (from the rendered vignette)
Methodology defensibility
cd_trend()'s raw Mann-Kendall + Theil-Sen (no pre-whitening) is methodologically correct for our 76-year series with strong climate trends per Yue and Wang (2002 WRR). Oursnowmelt_rate_peak(annual max of 7-day rolling daily smlt) is novel — closest precedent is streamflow-based freshet flashiness, ours is upstream of that on the snowmelt flux directly.Test plan
devtools::test()clean (166 PASS, 0 FAIL)parse()clean on all R scriptsOut of scope
cd_compare()p-value (cd_compare(): add p-value for window-vs-window comparison #43) — pairs naturally with snow but stays in its own PR.asnsnow albedo,tsnsnow temperature) — file follow-up if motivated.Fixes #48
Relates to NewGraphEnvironment/sred-2025-2026#23
🤖 Generated with Claude Code