Skip to content

Add 8 snow-related variables for hydrology departure (v0.2.0)#55

Merged
NewGraphEnvironment merged 10 commits into
mainfrom
48-snow-vars
May 5, 2026
Merged

Add 8 snow-related variables for hydrology departure (v0.2.0)#55
NewGraphEnvironment merged 10 commits into
mainfrom
48-snow-vars

Conversation

@NewGraphEnvironment
Copy link
Copy Markdown
Owner

Summary

Adds 8 snow-related variables to the cd package — the v0.2.0 release. Closes #48.

Layer Variables Schema What it shows
Monthly natives swe, snowfall, snowmelt, snow_cover 12-band/year COG seasonal curve — when snow accumulates and melts
Annual derived swe_max, snowfall_fraction, snowmelt_doy_50, snowmelt_rate_peak 1-band/year COG climate-departure signals — peak SWE, freshet timing, flashiness

Single hourly EDH fetch on the producer side covers both layers. Snow methodology citations (11 papers in NewGraphEnvironment/hydrology Zotero collection) and the citation map come from the companion lit-review PR #54 (already merged as v0.1.7). ASWS QA at 4 BC sites validates the bias-stability assumption that anchors the trend-defensibility claim.

What's in this PR

  • scripts/backfill_edh_snow.py — new producer script, imports safeguards from scripts/_lib.py (Extract bulk-fetch safeguards into shared scripts/_lib.py #52). First place that ECMWF's 00:00-UTC-reset accumulation trick lands in code (the pattern was archived from Migrate cd_fetch() to DestinE Earth Data Hub Zarr #36 but never used).
  • scripts/pipeline_stage3_edh.R + scripts/pipeline_update_edh.R — extended to handle both monthly natives (cd_aggregate path) and annual derived (1-band stack path). Refactored COG-append logic into append_to_cog() helper.
  • R/cd_variables.R — 7 → 15 vars; new pct_point_diff anomaly type for snow_cover and snowfall_fraction.
  • R/cd_anomaly.Rpct_point_diff branch added to case_when. Formula identical to absolute (combined via %in%) — distinct type carries display semantics for downstream.
  • R/cd_stac_catalog.R — bug fix in cd_stac_item(): substring-match grepl(v, name_parts) was mis-routing swe_max_annual.tif under swe. Replaced with strict {var}_{period} exact-match. Non-breaking for existing tests.
  • 608 new COGs on S3 (stac-era5-land, 4 monthly × 5 periods + 4 annual × 1 period × 76 years bands = 24 multi-year files), catalog now serves 59 items.
  • vignettes/peace-fwcp.Rmd — new "Snowpack" section between "Daytime Highs" and "Recent vs Pre-warming" with seasonal-curve table + 4 annual time-series plots; bibliography: references.bib wired in (first use in this vignette); 11 citations sourced from Snowpack-departure methodology lit review: rag-build + 11 papers + citation map #54's findings; "Snow per ecoregion" subsection in Per-Ecoregion Variation; snowpack interpretation paragraph in Interpretation section. References section header added at end.
  • data-raw/qa_snow_validation.R — ASWS cross-check at 4 BC sites via bcgov/bcsnowdata. 95 paired station-years; bias direction varies (Pine Pass −61%, Aiken Lake +54%) but is stable over time at every site (p > 0.2 everywhere).
  • README — variable inventory updated to list all 15 vars grouped as core / snow monthly natives / snow annual derived.

Headline scientific findings (from the rendered vignette)

  • Annual SWE down ~10% (135 → 122 mm); summer SWE collapsed by 75% (21.5 → 5.3 mm); spring snowmelt +37% (212 → 290 mm); annual snowfall roughly stable (−6%). The snowpack story is about timing, not quantity — warming is removing snow earlier, not preventing snow from falling.
  • Freshet-timing shift is uniform across all five ecoregions (~1 day/decade earlier melt, p < 0.01 in every ecoregion). Peak SWE shows up in the regional aggregate but washes out at the ecoregion scale due to inter-annual variability.
  • ERA5-Land snow values are biased at point measurements but the bias is stable over time → trends are interpretable. Spatial averaging across hundreds of cells in the regional aggregate also cancels random point-vs-cell scale mismatch.

Methodology defensibility

cd_trend()'s raw Mann-Kendall + Theil-Sen (no pre-whitening) is methodologically correct for our 76-year series with strong climate trends per Yue and Wang (2002 WRR). Our snowmelt_rate_peak (annual max of 7-day rolling daily smlt) is novel — closest precedent is streamflow-based freshet flashiness, ours is upstream of that on the snowmelt flux directly.

Test plan

  • devtools::test() clean (166 PASS, 0 FAIL)
  • parse() clean on all R scripts
  • Vignette renders in 9.4 s, 11 citations resolve to a References section
  • Spatial sanity: median DOY-50 = May 8, median peak SWE = 276 mm, median snowfall_fraction = 27%, alpine pixels reach plausible extremes
  • pgrep guard fires correctly on backfill_edh_snow.py
  • Post-merge: pkgdown CI green
  • Post-merge: monthly GHA picks up new pipeline (next scheduled run; latest year 2025 already current → no new fetch until Q1 2027 when 2026 becomes complete on EDH)

Out of scope

Fixes #48
Relates to NewGraphEnvironment/sred-2025-2026#23

🤖 Generated with Claude Code

NewGraphEnvironment and others added 10 commits May 3, 2026 22:18
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 of #48. New scripts/backfill_edh_snow.py imports safeguards
from scripts/_lib.py (preflight_single_instance, with_retry,
write_geotiff, log, get_token) and produces both layers of snow
variables from a single hourly EDH fetch per year:

  Monthly natives (12-band/year COG):
    snow_depth      mm SWE  monthly mean of daily sde * rsn
    snowfall        mm      monthly sum of daily sf (accum-handled)
    snowmelt        mm      monthly sum of daily smlt (accum-handled)
    snow_cover      %       monthly mean of daily snowc

  Annual derived (1-band/year COG):
    swe_max                 mm     annual max of daily sde * rsn
    snowfall_fraction       %      100 * annual_sum_sf / annual_sum_tp
    snowmelt_doy_50         day    DOY when cumsum(smlt) >= 50% of annual
    snowmelt_rate_peak      mm/wk  annual max of 7-day rolling sum daily smlt

New hourly_accum_to_daily() helper implements the 00:00 UTC reset
trick: sf and smlt at valid_time = t (00:00 UTC) represent the
accumulation from t-24h to t, so daily total for day D = value at
D+1 00:00 UTC. Pattern was documented in the #36 EDH-migration
archive but never implemented in code; this is the first call site.

Smoke-tested on year 2020 (leap year): all 8 outputs written in
228s. Median DOY-50 = May 8 (freshet), median peak SWE = 276 mm,
median snowfall_fraction = 27% — values span plausible BC ranges,
alpine pixels reach plausible extremes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 4 of #48 - consumer-side registry, anomaly branch, tests.

cd_variables() now ships 15 vars (7 existing + 8 snow):
  Monthly natives (pct_normal except snow_cover):
    swe, snowfall, snowmelt           pct_normal     %
    snow_cover                        pct_point_diff %
  Annual derived (mostly absolute):
    swe_max                           absolute       mm
    snowfall_fraction                 pct_point_diff %
    snowmelt_doy_50                   absolute       day
    snowmelt_rate_peak                absolute       mm/wk

Renamed snow_depth -> swe in script and registry: the value
sde * rsn evaluates to kg/m^2 = mm of water (SWE), not vertical
snow depth. Original name was misleading; caught during registry
design. Running backfill uses the pre-rename script via its
already-loaded code; outputs will be mv'd snow_depth_*.tif -> swe_*.tif
after backfill completes (one-line shell op, no re-fetch).

cd_anomaly() adds the pct_point_diff branch. Formula is identical to
absolute (value - baseline_mean), combined via %in% - the distinct
type is for downstream display semantics. cap_pct does NOT apply to
pct_point_diff (covered by new test); only pct_normal is clamped.

Tests: 166 PASS, 0 FAIL. New cases in test-cd_anomaly.R for
pct_point_diff arithmetic and cap non-application; bumped
test-cd_variables.R count 7 -> 15 plus two new membership tests;
fixed the duplicate pct_normal assertion in test-cd_fetch.R.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2b of #48 - extend the Stage 3 R pipeline to aggregate the
4 monthly snow natives (swe, snowfall, snowmelt, snow_cover) into
multi-year COGs alongside the existing 7 vars, and stack the 4
annual-derived snow scalars (swe_max, snowfall_fraction,
snowmelt_doy_50, snowmelt_rate_peak) into 1-band-per-year multi-year
COGs in a new Step 1b. agg_methods extended for the new monthly
vars (sum for snowfall/snowmelt, mean for swe/snow_cover).

Caught a substring-match bug in cd_stac_item() while dry-running the
new pipeline: the parser used grepl(v, name_parts) which matches
substrings, so swe_max_annual.tif was being mis-routed under the
`swe` variable (because "swe" is a substring of "swe_max"). The bug
was latent before because no existing variable name was a substring
of another. Fixed with strict {var}_{period} exact-match against
both registries (cd_variables() and cd_periods()).

Catalog now has 59 items (35 existing + 24 new):
  4 monthly snow x 5 periods (annual + winter/spring/summer/fall)
  4 annual snow x 1 period (annual)

Verified via dry-run: cd_catalog() correctly returns 5 periods for
each monthly snow var and 1 period (annual) for each annual-derived
scalar. Existing tmean catalog entries unchanged.

Tests: 166 PASS, 0 FAIL (no regression from the parser change).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Full 76-year backfill landed (~2h52min main run + 3min retry for year
2022 ClientPayloadError). Renamed 75 snow_depth_*.tif outputs to
swe_*.tif since sde * rsn evaluates to mm SWE, not vertical snow depth.

Stage 3 R aggregation + S3 push completed. 24 new COGs uploaded (~114 MB):
  4 monthly snow x 5 periods (annual + winter/spring/summer/fall)
  4 annual derived x 1 period (annual)
plus an updated catalog.json with 35+24 = 59 STAC items.

cd_catalog() against the default S3 URL now returns the full set;
verified all 8 new vars at expected periods.

Updates planning files only - the script changes that produced this
result already landed in 37a7510 (Stage 3 extension + cd_stac_item
exact-match parser fix).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* main:
  Release v0.1.7
  Snowpack-departure methodology lit review: rag-build + 11 papers + citation map (#54)
Phase 5 of #48. New "Snowpack" section in vignettes/peace-fwcp.Rmd
between "Daytime Highs" and "Recent vs Pre-warming". Wires up the
bibliography YAML field for the first time and consumes the
citation map produced by #53 (snow methodology lit review).

vignettes/references.bib (new, 22 KB, 11 entries):
  Generated by rbbt::bbt_write_bib() from BBT keys for the 11
  papers in the NewGraphEnvironment/hydrology Zotero collection
  added during #53. Static bib committed so vignette renders
  on CI without needing Zotero/BBT.

Vignette YAML:
  Adds bibliography: references.bib + link-citations: true.

Snowpack section structure:
  - Intro: snowpack as hinge of BC hydrology, salmon-migration
    framing, broad context (Mote 2018, Pederson 2011, Najafi 2017
    BC attribution, Kang 2016 Fraser freshet)
  - Methodology footnote: ERA5-Land bias caveat (Kouki 2023),
    raw MK is correct per Yue and Wang 2002
  - Headline numbers paragraph: SWE annual -10 percent, summer
    SWE -75 percent, spring snowmelt +37 percent (freshet
    earlier), annual snowfall flat (-6 percent) so the SWE
    decline is warmth-driven not less-snow-driven
  - Seasonal-curve table: pivoted from monthly faceted plot to
    seasonal table because monthly aggregations aren't on S3
    (COG schema is annual + 4 seasons). Seasonal level still
    tells the "when does snow accumulate / melt" story cleanly.
  - 4 annual time-series plots with Theil-Sen lines: swe_max,
    snowmelt_doy_50, snowmelt_rate_peak, snowfall_fraction
  - 3-finding interpretation: snow leaving earlier (not falling
    less), freshet shifting into spring, summers becoming
    snow-free. Each finding tied to a specific cited paper.

Recent vs Pre-warming table extended:
  Added no_pct_vars list (snow_cover, snowfall_fraction,
  snowmelt_doy_50) to NA-out the Delta-percent column for vars
  where it's not meaningful. All 8 new vars appear in the table.

data-raw/peace_fwcp_vignette_data.R:
  Extracted pct_normal_vars list to include swe, snowfall,
  snowmelt alongside prcp, soil_moisture for the regional and
  per-ecoregion cd_compare(method = "pct_change") calls. Re-ran
  precompute: peace_fwcp.rds is now 270 KB (was 160 KB) with
  all 15 vars.

Render time 8.7 s. 166 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of #48. New data-raw/qa_snow_validation.R cross-checks
ERA5-Land swe_max against BC ASWS automated snow-pillow daily
SWE at 4 active stations in the FWCP Peace AOI. Outputs land in
planning/active/ for the eventual archive.

Site selection (via bcsnowdata::snow_auto_location() spatially
intersected with the FWCP Peace AOI):
  Pine Pass    1400 m  37 yr (1989-2025)
  Mount Sheba  1490 m   7 yr (2019-2025)
  Ware Upper   1565 m  10 yr (2016-2025)
  Aiken Lake   1050 m  41 yr (1985-2025)
Germansen Landing was in the original list but lacks usable SWE
record at this location.

Findings (95 paired site-years, pooled r = 0.51):

1. Bias is direction-variable, NOT uniformly high:
   - Pine Pass:   ERA5 underestimates by 61% (1140 vs 439 mm)
   - Mount Sheba: ERA5 underestimates by 32% (936 vs 633 mm)
   - Ware Upper:  ERA5 close match (+1%, 250 vs 254 mm)
   - Aiken Lake:  ERA5 overestimates by 54% (265 vs 407 mm)
   This contradicts a naive read of Kouki et al. 2023 (which
   reported 150-200% NH-wide overestimate). At BC interior sites
   ERA5-Land overestimates as Kouki found; at high-snowpack
   Coast-Mountain-spillover sites it underestimates badly.

2. Bias is approximately stable over time at all 4 sites:
   regression of (ERA5 - ASWS) on year is non-significant
   everywhere (p > 0.2). This supports the vignette claim that
   "trends are still defensible" even if absolute values are
   biased.

Vignette updated: replaced the earlier quote of Kouki's
NH-wide 150-200% figure with our specific BC findings in the
methodology footnote of the Snowpack section. Re-rendered: 9.2 s.

DESCRIPTION: bcsnowdata added to Suggests.

Manual-survey secondary cross-check skipped — the ASWS bias
stability is clean enough at all 4 sites that the secondary
check isn't needed to support the vignette claim. Documented
as deferred in findings.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 6 of #48. The monthly GHA at .github/workflows/climate-update.yml
runs scripts/pipeline_update_edh.R, so that's the file that needs to
know about the snow vars — the workflow YAML stays unchanged.

scripts/pipeline_update_edh.R changes:

  * agg_methods extended with the 4 monthly snow natives
    (swe / snowfall / snowmelt / snow_cover); same shape as the
    7 existing 12-band-per-year vars, flow through cd_aggregate
    identically.

  * annual_vars list added for the 4 annual derived snow scalars
    (swe_max / snowfall_fraction / snowmelt_doy_50 /
    snowmelt_rate_peak); these are 1-band per year, stored in
    data/backfill/annual/, bypass cd_aggregate.

  * Step 3 now calls both backfill_edh_all.py AND backfill_edh_snow.py
    for each candidate year, and verifies all 15 outputs wrote
    (7 core monthly + 4 snow monthly + 4 snow annual). The Python
    scripts skip incomplete years individually, so a partial year
    on EDH cleanly defers all 15 to the next run.

  * Step 4 split into a monthly path (cd_aggregate from 12-band)
    and an annual path (1-band straight onto the existing COG).
    Refactored the per-(var, period) append logic into a small
    append_to_cog() helper to avoid duplicating the grid-alignment
    check between the two paths.

  * Header docstring updated to describe the new flow.

README.md: variable inventory now lists all 15 vars grouped as
core climate / snow monthly natives / snow annual derived, with a
periods note clarifying that annual-derived vars only have an
"annual" period (not the seasonal+annual schema of the monthly
natives).

parse() clean on the R script. 166 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round of vignette refinements after the first user review of the
Snowpack section:

  * Methodology paragraph: rewritten in plain language. Two-paragraph
    structure now distinguishes the two stacked error sources at any
    point comparison: (1) scale mismatch between a 80 km^2 cell average
    and a single point measurement, and (2) the Northern-Hemisphere-
    scale cell-mean bias documented by Kouki 2023. Walks the reader
    through what our QA at Pine Pass and Aiken Lake actually shows.
    Dropped the file pointer to the QA results.

  * Acronyms expanded inline. DJF/MAM/JJA/SON now reads "winter
    (December-February)" etc.

  * New "Snow per ecoregion" subsection inside Per-Ecoregion Variation.
    Two facet plots: peak SWE anomaly per ecoregion + snowmelt DOY-50
    anomaly per ecoregion. Brief intro paragraph above the plots
    pointing to the WSG x ecoregion table for watershed-group mapping;
    deeper interpretation lives in the Interpretation section.

  * Interpretation section gains a snowpack finding: snow leaving the
    region earlier (not falling less); freshet timing shifting
    uniformly across all 5 ecoregions (~1 day/decade earlier, p < 0.01
    in every ecoregion); peak SWE detectable only in the regional
    aggregate because year-to-year variability dominates at the
    ecoregion scale. Earlier draft asserted an elevation gradient that
    is NOT present in the data; corrected.

  * Salmon-migration framing removed - the FWCP Peace is non-anadromous
    (Bennett Dam blocks salmon from the Williston watershed). Replaced
    with the FWCP-accurate resident-salmonid framing (bull trout,
    Arctic grayling, mountain whitefish, rainbow trout, kokanee) and
    freshet relevance for channel morphology / spawning gravel
    mobilization / off-channel rearing habitat.

  * "## References" heading added at the end so pandoc places the
    auto-generated bibliography under an explicit section.

Render time 9.4 s, 11 citations resolve to References, 166 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@NewGraphEnvironment NewGraphEnvironment merged commit 81d96bf into main May 5, 2026
1 check failed
@NewGraphEnvironment NewGraphEnvironment deleted the 48-snow-vars branch May 5, 2026 04:23
NewGraphEnvironment added a commit that referenced this pull request May 5, 2026
The pkgdown CI on the squash-merge of #55 failed because bcsnowdata
is GitHub-only (bcgov/bcsnowdata) and pak couldn't resolve it from
CRAN/PPM. Convention in this repo (matching how bcdata and fresh are
already handled in other data-raw scripts) is that data-raw scripts
manage their own ad-hoc deps; the developer running the script installs
what's needed. data-raw/qa_snow_validation.R header now includes the
pak::pak() install instruction.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add snow-related variables (SWE, snowfall fraction, melt timing) for hydrology departure

1 participant