diff --git a/DESCRIPTION b/DESCRIPTION index c83dc05..701cb6c 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -21,13 +21,14 @@ Depends: Encoding: UTF-8 Roxygen: list(markdown = TRUE) RoxygenNote: 7.3.3 -Suggests: +Suggests: aws.s3, bookdown, ecmwfr, ggplot2, ggrepel, gt, + kableExtra, Kendall, knitr, rmarkdown, diff --git a/README.md b/README.md index 81c18ff..aaa9afe 100644 --- a/README.md +++ b/README.md @@ -17,12 +17,10 @@ pak::pak("NewGraphEnvironment/cd") ```r library(cd) -# Load a STAC catalog and an area of interest -catalog <- cd_catalog( - system.file("extdata", "example_catalog.json", package = "cd") -) +# Load the live STAC catalog and an example area of interest +catalog <- cd_catalog() aoi <- sf::st_read( - system.file("extdata", "example_aoi.gpkg", package = "cd"), + system.file("extdata", "example_aoi_kotl.gpkg", package = "cd"), quiet = TRUE ) @@ -46,4 +44,4 @@ cd_compare(ts, window_a = 1956:1960, window_b = 1951:1955) ## Links - [Function reference](https://newgraphenvironment.github.io/cd/reference/) -- [Vignette](https://newgraphenvironment.github.io/cd/articles/) (coming soon) +- [Vignette: Climate Departure Analysis for a Watershed](https://newgraphenvironment.github.io/cd/articles/climate-departure.html) diff --git a/planning/active/findings.md b/planning/active/findings.md index d5eed32..0ea8b69 100644 --- a/planning/active/findings.md +++ b/planning/active/findings.md @@ -1,108 +1,19 @@ -# Findings: EDH migration +# Findings: Vignette tmax/tmin + map clipping -## EDH API basics (from #35 research and tonight's benchmark) +_Populated during Phase 1 inspection and as work progresses._ -**Zarr URL:** `https://data.earthdatahub.destine.eu/era5/reanalysis-era5-land-no-antartica-v0.zarr` +## Vignette chunk inventory -**Auth:** Personal access token, passed inline as URL password: -``` -https://edh:@data.earthdatahub.destine.eu/... -``` -Alternative: `storage_options={"client_kwargs":{"trust_env":True}}` to read from `.netrc`. +_TBD — list each figure/table chunk, what it shows, what context layers it uses._ -**Quota:** 500,000 requests/month per user (confirmed in EDH getting-started docs). +## Watershed group lookup -**Coord conventions (from live Zarr inspection):** -- Dimensions: `valid_time`, `latitude`, `longitude` -- `latitude`: descending from ~90 to -60 (confirmed by working slice `slice(60, 48)`) -- `longitude`: **0 to 359.9** (0-360 convention, NOT -180 to 180) -- For BC bbox `c(60, -140, 48, -114)` → lon slice must be `220` to `246` -- `valid_time`: hourly, from 1950-01-01T00:00 to current month last day -- Variable names: CF-style short names (`t2m`, `tp`, `d2m`, `swvl1-4`, `u10`, `v10`, etc.) +_TBD — fwapg query or helper used to resolve the WSG polygon for the AOI._ -**50 variables available in single Zarr store:** -`asn, d2m, e, es, evabs, evaow, evatc, evavt, fal, lai_hv, lai_lv, lblt, licd, lict, lmld, lmlt, lshf, ltlt, pev, ro, rsn, sd, sde, sf, skt, slhf, smlt, snowc, sp, src, sro, sshf, ssr, ssrd, ssro, stl1, stl2, stl3, stl4, str, strd, swvl1, swvl2, swvl3, swvl4, t2m, tp, tsn, u10, v10` +## Ecological angles — which 2-3 to ship -**Relevant variable mapping for cd package:** -| cd variable | EDH variable(s) | Units | Notes | -|---|---|---|---| -| tmean | `t2m` | K | Hourly; monthly mean computed client-side | -| tmax | `t2m` | K | Hourly; daily max → monthly mean of daily max | -| tmin | `t2m` | K | Hourly; daily min → monthly mean of daily min | -| prcp | `tp` | m | Hourly total precipitation; sum over month, × 1000 for mm | -| dewpoint | `d2m` | K | Hourly 2m dewpoint | -| soil_moisture | `swvl1`, `swvl2`, `swvl3`, `swvl4` | m³/m³ | 4 vertical layers | -| vpd | derived | hPa | From t2m + d2m (Tetens) | -| rh | derived | % | From t2m + d2m | +_TBD — decide after inspecting what the existing narrative already covers._ -## Performance (single month BC bbox) +## Map cleanup notes -- Open Zarr store: 2.6-3.0s (metadata only) -- Subset + materialize 1 month × BC bbox × hourly t2m: 14.9-15.9s -- In-memory size: 92.9 MB for 744 hours × 120 lat × 260 lon float64 - -## Implications for pipeline - -- No rate limits means we can pull **all months in parallel** via asyncio/dask if wanted -- Zarr chunking is independent of month boundaries, so small time-slice pulls are efficient -- Can pull all 7 cd variables from the **same dataset open** — no need for separate fetches per variable - -## R/Zarr tooling options - -- **Reticulate + Python xarray** — proven; what `test_edh_era5_land.py` uses. Simple, works. -- **stars::read_mdim()** — GDAL zarr driver. Requires GDAL built with zarr, auth via URL password likely works. -- **pizzarr** — young R-native zarr package, may not support remote HTTPS stores. - -Decision deferred to Phase 3 after Phase 2 unblocks the data. - -## Known limitation: UTC-day aggregation for tmax/tmin - -Both the existing R pipeline (`pipeline_tmax_tmin_hourly.R`) and the new EDH -Python script compute daily max/min over UTC-day windows, not local-time days. -For BC (UTC-7/-8) the local-afternoon tmax peak straddles UTC day boundaries, -introducing a small systematic bias vs. a local-time daily aggregation. - -- CDS's `derived-era5-land-daily-statistics` product accepts a `time_zone` - parameter and would fix this — but we abandoned it in #33 due to its - ~2-hour-per-job queue time. -- Fixing properly requires shifting hourly timestamps by the local UTC offset - before `.resample("1D")`, or doing per-pixel local-time aggregation - (computationally expensive and overkill for a regional dataset). -- For now: the EDH script replicates the R pipeline's existing behavior so - outputs are directly comparable. Addressing this bias is a follow-up issue - (cd package methodology). - -## Precipitation accumulation (ERA5-Land tp) - -The hourly `tp` variable has `GRIB_stepType=accum` — it is a **running -accumulation** from 01:00 UTC reset to 00:00 UTC next day. Naive -`.sum()` over hourly values gives a monthly precip ~8× too high. - -**Solution:** use EDH's daily product -`https://data.earthdatahub.destine.eu/era5/era5-land-daily-utc-v1.zarr` -which pre-computes daily totals correctly. Monthly precip = sum of -daily totals. - -**Why not fix the hourly sum:** writing accumulation logic ourselves -is bug-prone. The daily product is what ECMWF designed for this -exact case. - -**Caveat:** daily product only has 14 variables (t2m, d2m, swvl1-2, tp, -wind, radiation, etc.) — no swvl3 or swvl4. So soil moisture stays on -the hourly product. tmean and dewpoint could use either; we use hourly -for consistency and because the 0.99 correlation (vs CDS) was already -validated. - -## EDH product catalogue (what's available for cd package) - -| Product | Zarr URL | Variables | Use | -|---|---|---|---| -| Hourly ERA5-Land | `reanalysis-era5-land-no-antartica-v0.zarr` | 50 | tmax/tmin, tmean, dewpoint, soil_moisture | -| Daily ERA5-Land (UTC) | `era5-land-daily-utc-v1.zarr` | 14 | prcp only | -| Monthly ERA5 (not Land) | — | - | NOT useful (ERA5 parent, 31 km) | - -## Out-of-scope confirmations - -- GEE has ERA5-Land but commercial license blocks us -- AWS / ARCO / WeatherBench all serve ERA5 (31 km), not ERA5-Land -- CDS-beta is same backend, same rate limits +_TBD — per-figure cleanup decisions (clip-to-WSG, mapgl fitBounds, etc.)_ diff --git a/planning/active/progress.md b/planning/active/progress.md index 5d0c20e..6d7f7da 100644 --- a/planning/active/progress.md +++ b/planning/active/progress.md @@ -1,100 +1,15 @@ -# Progress: EDH migration +# Progress: Vignette tmax/tmin + map clipping -## Session 2026-04-12 (evening) - -**Context from prior day:** Got rate-limited twice by CDS. Researched alternatives (#35), benchmarked EDH Zarr at 5× faster with no rate limits and same data. Filed #36 for migration. Pivoting now. +## Session 2026-04-14 (branch + PWF baseline) **Completed:** -- Branched off main: `36-edh-migration` -- Stashed and restored `scripts/test_edh_era5_land.py` on new branch -- Set up PWF files (this document, task_plan.md, findings.md) +- Branched `39-vignette-tmax-tmin-maps` off main +- Archived #36 PWF to `planning/archive/2026-04-issue-36-edh-migration/` with README +- Set up #39 PWF (this file, task_plan.md, findings.md) **Next:** -- Update climate-update.yml GHA to use EDH (replace CDS fetcher with - scripts/backfill_edh_all.py + stage 3) -- Write pipeline_update_edh (EDH incremental update, replaces - CDS-based pipeline_update.R) -- Land the PR - -## Session 2026-04-12 (Stage 3 to S3) - -**Completed:** -- Wrote `scripts/pipeline_stage3_edh.R` — slim Stage-3-only against - `data/backfill/monthly/` (avoids CDS fetch in existing pipeline_backfill.R) -- Built 35 COGs locally: 7 variables × (annual + 4 seasons) × 76 years -- STAC catalog built, 35 items -- S3 sync pushed all 35 COGs to s3://stac-era5-land (178 MB total, - replacing the April 6-7 CDS-era COGs) -- Catalog.json unchanged (byte-identical) because filenames + extents - + metadata didn't change — legitimate sync behavior, not a bug -- Verified end-to-end via /vsicurl: tmean_annual reads back with correct - 76 years, CRS, extent. BC annual mean 1950 = -1.42 C, 2024 = +1.85 C - (3.3 C warming in 74 years — the climate departure signal the package - is built to show, now live on S3 from EDH data) - -## Session 2026-04-12 (unified backfill run) - -**Completed:** -- Full 1950-2025 unified backfill of 5 non-tmax/tmin variables via EDH in ~4 hours - (10:19 - 14:20 wall time) -- 532 monthly TIFs total (7 vars × 76 years), all on matching EPSG:4326 120×260 grid -- CDS-era files safely preserved in `data/backfill/monthly/_cds_backup/` (375 files) -- Two transient errors handled well: - - 1989: TimeoutError → retry-with-backoff caught it, completed on attempt 2 - - 2008: ClientPayloadError → retry catch didn't include aiohttp class; outer handler - skipped the year and continued. Filled later via `--year 2008` rerun. -- QA confirmed: all grids aligned, tmin<=tmean<=tmax with zero violations, - physically plausible values, soil_moisture composite correct. - -## Session 2026-04-12 (overnight run) - -**Completed:** -- Full 1950-2025 backfill via EDH, ~1h 53min unattended (02:16 - 04:09) - - 75 year-files written (1951-2025) plus 1950 from earlier test = 76 years - - Avg ~90s per year, no failures -- Regenerated 2024 from EDH (was a CDS leftover from April 7 with different methodology) -- Filed #37 for the UTC-day aggregation bias (not blocking #36) -- All 76 years × 2 variables × 12 months × BC bbox GeoTIFFs in data/backfill/monthly/ - -**Validation (R/terra spot check):** -- 1950 Jul tmax up to 29.7 °C — historically realistic -- 2000 Jul tmax up to 28.9 °C -- 2024 Jul tmax up to 34.6 °C — matches heat dome era -- 2025 Jul tmax up to 31.9 °C -- All years: 12 layers, Jan..Dec names, EPSG:4326, BC bbox extent - -## Session 2026-04-12 (morning QA) - -**Completed:** -- Wrote `scripts/qa_monthly.R` — checks grid alignment, time coverage, - physical sanity across all variables -- Wrote `scripts/probe_edh_vars.py` — pulls one month of each EDH - variable, compares to CDS, validates aggregation semantics -- **Discovered grid mismatch:** CDS-era vars on 121×261 grid with no CRS - tag, vs EDH-era tmax/tmin on 120×260 EPSG:4326. Blocks release. -- **Validated EDH semantics:** - - tmean (hourly t2m mean): 0.99 correlation with CDS, 0.43 °C mean abs diff - - dewpoint: plausible BC winter values - - soil_moisture: all 4 depths match CDS composite closely - - prcp: hourly `tp` is accumulation (`GRIB_stepType=accum`) — need - EDH daily product for correct monthly precip -- **Decision:** two-Zarr approach, extend backfill to all variables so - the whole dataset is internally grid-aligned - -**Next:** -- Commit QA + probe scripts on the branch (durable tooling) -- Extend backfill script to all 7 cd variables -- Regenerate `data/backfill/monthly/` entirely -- Re-run QA → R Stage 3 → PR - -**Commits this session:** -- `0014bf2` — Add planning docs for EDH migration -- `7ef03cb` — Add EDH Zarr benchmark test script -- _pending_ — Add EDH-based tmax/tmin backfill script +- Phase 1: inspect vignette + list every figure chunk +- Decide the 2-3 tmax/tmin angles +- Pull watershed group polygon -**Verified 2026-04-12 02:11:** -- EDH Zarr benchmark: 15.9s / month BC t2m (5× faster than CDS) -- EDH full-year backfill: 114.3s / year BC tmax/tmin as GeoTIFF -- Output drop-in replacement for existing R Stage 2: 12 layers Jan..Dec, - EPSG:4326, °C, BC bbox — `terra::rast()` reads cleanly with expected names -- Realistic BC values for 1950: Jan tmax -30.5 to 0.7°C, Jul tmax -1.2 to 29.7°C +**Commits this session:** _to be appended_ diff --git a/planning/active/task_plan.md b/planning/active/task_plan.md index f69bfc6..20ffbcd 100644 --- a/planning/active/task_plan.md +++ b/planning/active/task_plan.md @@ -1,88 +1,77 @@ -# Task Plan: Migrate cd_fetch() to DestinE Earth Data Hub Zarr +# Task Plan: Vignette — tmax/tmin ecological content + map clipping -Issue: NewGraphEnvironment/cd#36 -Branch: `36-edh-migration` +Issue: NewGraphEnvironment/cd#39 +Branch: `39-vignette-tmax-tmin-maps` ## Context -CDS (`ecmwfr`) rate limiting makes the tmax/tmin backfill take ~3 days of babysitting. Benchmark in #35 showed EDH Zarr delivers the same ERA5-Land data at ~15s/month (5× faster, no rate limits, 500K requests/month quota). Pivot the producer pipeline to EDH. +`vignettes/climate-departure.Rmd` currently narrates tmean + soil_moisture +anomalies for a BC watershed AOI (Kootenay Lake). Post-#36 the package also +ships tmax/tmin on STAC, and the existing maps show context layers that +spread beyond the watershed group and dilute the story. -## Phase 1: Scaffold and benchmark (mostly done from #35) +Goal: land two improvements in one focused PR without restructuring the vignette. -- [x] Confirm EDH carries ERA5-Land at 9 km native (validated #35) -- [x] Confirm temporal coverage 1950 to present (validated #35) -- [x] Confirm commercial license (CC-BY 4.0, validated #35) -- [x] Write portable PEP 723 test script (`scripts/test_edh_era5_land.py`) -- [x] Bench one month BC t2m — 15.9s vs CDS 80s -- [x] Commit test script on the branch +## Phase 1: Inspect + plan specific figures -## Phase 2: Pragmatic Python backfill (unblock #33) +- [x] Read current vignette end-to-end; list every figure chunk +- [x] Confirm AOI watershed group — `example_aoi_kotl.gpkg` IS the WSG, no fwapg needed +- [x] Decide on the 2 tmax/tmin angles to ship: tmax/tmin trend asymmetry + diurnal range +- [x] Existing `data-raw/example_context_kotl.R` already builds context_kotl.gpkg via fwapg -The quickest path to finishing the tmax/tmin data: a Python script that uses EDH directly. -Runs outside R, produces monthly GRIB or NetCDF that downstream R stages already consume. +## Phase 2: Watershed group helper -- [x] Write `scripts/backfill_edh_tmax_tmin.py` — tmax/tmin only for now (unblocks #33) -- [x] Output matches existing R pipeline's Stage 2 format (yearly `.tif` × 12 month bands, °C, EPSG:4326) -- [x] Band descriptions Jan..Dec so `cd_aggregate()` seasonal grouping works -- [x] Idempotent — skip years where both output files exist -- [x] Test on one year (1950) — 114s, realistic values, terra reads correctly -- [x] Run full backfill 1950-2025 — completed in ~1h 53min (76 years × tmax + tmin) -- [x] Regenerate 2024 from EDH (was a CDS leftover) for homogeneous methodology -- [x] QA — discovered CDS-era vars on a different grid (ext shifted 0.1°, CRS missing, 121×261 vs EDH 120×260) -- [x] Probe EDH products — confirmed two-Zarr approach: hourly for state vars, daily for prcp +- [x] Not needed — AOI = WSG, vignette-side `st_intersection` is the pragmatic clip -## Phase 2b: Unified backfill (all variables on EDH grid) +## Phase 3: Clip existing maps to watershed group -Grid mismatch between CDS-era vars and EDH-era tmax/tmin blocks release. -Regenerating everything from EDH gives one internally-consistent dataset. +- [x] `load-context` chunk: `st_intersection(layer, aoi)` for towns/lakes/rivers/streams/highways +- [x] `spatial-tmean` chunk: `terra::mask(departure, aoi)` so non-WSG cells are NA +- [x] `spatial-sm` chunk: same masking treatment +- [x] Render verified end-to-end -- [x] Extend backfill to all 7 cd variables (scripts/backfill_edh_all.py) -- [x] Regenerate all `data/backfill/monthly/*_YYYY.tif` with consistent grid + CRS -- [x] Re-run QA — all grids aligned, all CRS tagged, tmin<=tmean<=tmax sanity passes -- [x] VPD/RH derived in Python (Tetens), no R cd_derive re-run needed +## Phase 4: tmax/tmin narrative content -**QA summary 2026-04-12:** -- 7 variables × 76 years = 532 monthly TIFs -- All on 120×260 EPSG:4326 grid, extent [-139.95, -113.95, 47.95, 59.95] -- Zero tmin>tmax or tmean inversion violations (163,888 cell-checks) -- (tmax+tmin)/2 vs tmean mean diff = 0.57°C (classical climatology shortcut bias, normal) -- 2008 had a ClientPayloadError that my retry didn't catch (wrong exception class); - re-ran --year 2008 to fill the gap. Filed as future improvement in #38. +Shipped 2 of the 4 candidate angles: +- [x] **tmax + tmin annual anomaly** — two `cd_plot_timeseries` chunks side-by-side showing tmin warming faster +- [x] **Diurnal range trend** — DTR = tmax − tmin annual with linear-trend overlay +- [x] Ecological framing in 3 bullets: ET intensification, cold-air pooling weakening, stream thermal regime +- [ ] (Skipped, not blocking) Frost days, heat stress envelope, VPD × tmax — out of scope for this PR -## Phase 2c: R Stage 3 +## Phase 5: Build + verify -- [x] Run R stage 3 (COG + STAC + S3) against the unified EDH-generated TIFs - (scripts/pipeline_stage3_edh.R; 35 COGs live on s3://stac-era5-land, - verified via /vsicurl read) +- [x] Render locally; vignette builds clean (HTML 2.7 MB) +- [x] All figures have captions, cross-references work +- [x] devtools::check ran clean (no-tests run during dev) -## Phase 3: R integration for cd_fetch() +## Phase 5b: User review fixes -- [ ] Decide: reticulate + Python xarray, OR stars::read_mdim() via GDAL zarr driver -- [ ] Prototype both on one month, compare simplicity and performance -- [ ] Refactor `cd_fetch()` with `source = c("edh", "cds")` parameter, default `"edh"` -- [ ] Keep `cd_fetch()` CDS path for fallback -- [ ] Update tests to exercise both paths -- [ ] Update examples and docs +- [x] Critical look at trend data — narrative now matches actual KOTL signal +- [x] "Day-Night Asymmetry" reframed as "Daytime Highs and Overnight Lows" — diurnal range is flat at KOTL, summer max is the dominant signal +- [x] "Salmonid" not "salmon" (anadromous runs blocked by lower-Columbia dams; FN reintroduction acknowledged) +- [x] Interpretation rewritten: precip down ~10% significant, soils dry from BOTH precip decline AND ET increase +- [x] cd_compare with method="pct_change" added for prcp/soil_moisture +- [x] kableExtra scroll_box on tables 1 (catalog), 3 (baseline), 6 (compare), and the all-trends table +- [x] Disabled bookdown section numbering (number_sections: false) +- [x] kableExtra added to DESCRIPTION Suggests +- [x] README fixed (wrong example_aoi filename + stale "vignette coming soon") -## Phase 4: Pipeline and docs +## Phase 6: PR -- [ ] Update `scripts/pipeline_backfill.R` to use EDH source -- [ ] Update monthly GitHub Action to use EDH -- [ ] Add `EDH_TOKEN` to repo secrets (rotate current token first) -- [ ] Update CLAUDE.md — CDS section → EDH primary, CDS fallback -- [ ] Update README / pkgdown auth setup instructions +- [ ] Branch per issue convention, PR to main with `Fixes #39` +- [ ] SRED cross-ref in PR body (`Relates to NewGraphEnvironment/sred-2025-2026#23`) +- [ ] Archive this PWF dir to `planning/archive/` on merge -## Phase 5: Close out +## Success criteria -- [ ] Run full backfill from scratch via integrated R path — validate output matches phase 2 -- [ ] Close #33 (tmax/tmin saga resolved by EDH) -- [ ] Close #35 (evaluation → migration done) -- [ ] Merge PR closing #36 -- [ ] Archive planning to `planning/completed/` +- Two clean changes, one PR: tmax/tmin ecological narrative + tightened maps +- Every figure renders correctly in both gitbook and PDF output +- Keeps existing tmean / soil_moisture content intact +- Doesn't change any `R/` function signatures -## Success criteria +## Out of scope (explicit) -- `cd_fetch()` default path uses EDH; CDS path works as fallback -- Full 1950-2025 backfill completes in under 8 hours (single session, unattended) -- Vignette and tests pass with EDH as source -- CDS API knowledge preserved in CLAUDE.md but marked as fallback +- No new cd package functions +- No restructuring of existing tmean/soil_moisture content +- No change to the example AOI (Kootenay Lake stays) +- No bulk-rename of `planning/completed/` to `planning/archive/` (separate concern) diff --git a/planning/archive/2026-04-issue-36-edh-migration/README.md b/planning/archive/2026-04-issue-36-edh-migration/README.md new file mode 100644 index 0000000..942c4e0 --- /dev/null +++ b/planning/archive/2026-04-issue-36-edh-migration/README.md @@ -0,0 +1,28 @@ +# Archive: EDH migration (#36) + +## Outcome + +Migrated the cd producer pipeline from Copernicus CDS (`ecmwfr`) to +DestinE Earth Data Hub (Zarr). Full 1950-2025 backfill regenerated for +all 7 cd variables on a single internally-consistent EPSG:4326 BC grid, +live on `s3://stac-era5-land`. Monthly GitHub Action rewired to use EDH +and validated via `workflow_dispatch`. Consumer API unchanged. + +Released as **v0.1.0** (2026-04-14). + +## Key findings worth remembering + +- EDH hourly `tp` has `GRIB_stepType=accum` — naive sum gives 8× wrong + precip. Use the EDH daily product for prcp; hourly is correct for + state variables (t2m, d2m, swvl1-4). +- Both R and Python EDH paths need atomic writes — a killed process + that leaves a truncated `.tif` fools the idempotency check. +- ERA5-Land has ~3 month latency, so the monthly GHA will no-op most + of the year and only do real work once per year when a complete + new year becomes available on EDH. + +## Closing ref + +- Issue: NewGraphEnvironment/cd#36 +- PR: NewGraphEnvironment/cd#40 (merged 2026-04-14) +- Follow-ups: #37 (tz bias), #38 (soul convention), #39 (vignette) diff --git a/planning/archive/2026-04-issue-36-edh-migration/findings.md b/planning/archive/2026-04-issue-36-edh-migration/findings.md new file mode 100644 index 0000000..d5eed32 --- /dev/null +++ b/planning/archive/2026-04-issue-36-edh-migration/findings.md @@ -0,0 +1,108 @@ +# Findings: EDH migration + +## EDH API basics (from #35 research and tonight's benchmark) + +**Zarr URL:** `https://data.earthdatahub.destine.eu/era5/reanalysis-era5-land-no-antartica-v0.zarr` + +**Auth:** Personal access token, passed inline as URL password: +``` +https://edh:@data.earthdatahub.destine.eu/... +``` +Alternative: `storage_options={"client_kwargs":{"trust_env":True}}` to read from `.netrc`. + +**Quota:** 500,000 requests/month per user (confirmed in EDH getting-started docs). + +**Coord conventions (from live Zarr inspection):** +- Dimensions: `valid_time`, `latitude`, `longitude` +- `latitude`: descending from ~90 to -60 (confirmed by working slice `slice(60, 48)`) +- `longitude`: **0 to 359.9** (0-360 convention, NOT -180 to 180) +- For BC bbox `c(60, -140, 48, -114)` → lon slice must be `220` to `246` +- `valid_time`: hourly, from 1950-01-01T00:00 to current month last day +- Variable names: CF-style short names (`t2m`, `tp`, `d2m`, `swvl1-4`, `u10`, `v10`, etc.) + +**50 variables available in single Zarr store:** +`asn, d2m, e, es, evabs, evaow, evatc, evavt, fal, lai_hv, lai_lv, lblt, licd, lict, lmld, lmlt, lshf, ltlt, pev, ro, rsn, sd, sde, sf, skt, slhf, smlt, snowc, sp, src, sro, sshf, ssr, ssrd, ssro, stl1, stl2, stl3, stl4, str, strd, swvl1, swvl2, swvl3, swvl4, t2m, tp, tsn, u10, v10` + +**Relevant variable mapping for cd package:** +| cd variable | EDH variable(s) | Units | Notes | +|---|---|---|---| +| tmean | `t2m` | K | Hourly; monthly mean computed client-side | +| tmax | `t2m` | K | Hourly; daily max → monthly mean of daily max | +| tmin | `t2m` | K | Hourly; daily min → monthly mean of daily min | +| prcp | `tp` | m | Hourly total precipitation; sum over month, × 1000 for mm | +| dewpoint | `d2m` | K | Hourly 2m dewpoint | +| soil_moisture | `swvl1`, `swvl2`, `swvl3`, `swvl4` | m³/m³ | 4 vertical layers | +| vpd | derived | hPa | From t2m + d2m (Tetens) | +| rh | derived | % | From t2m + d2m | + +## Performance (single month BC bbox) + +- Open Zarr store: 2.6-3.0s (metadata only) +- Subset + materialize 1 month × BC bbox × hourly t2m: 14.9-15.9s +- In-memory size: 92.9 MB for 744 hours × 120 lat × 260 lon float64 + +## Implications for pipeline + +- No rate limits means we can pull **all months in parallel** via asyncio/dask if wanted +- Zarr chunking is independent of month boundaries, so small time-slice pulls are efficient +- Can pull all 7 cd variables from the **same dataset open** — no need for separate fetches per variable + +## R/Zarr tooling options + +- **Reticulate + Python xarray** — proven; what `test_edh_era5_land.py` uses. Simple, works. +- **stars::read_mdim()** — GDAL zarr driver. Requires GDAL built with zarr, auth via URL password likely works. +- **pizzarr** — young R-native zarr package, may not support remote HTTPS stores. + +Decision deferred to Phase 3 after Phase 2 unblocks the data. + +## Known limitation: UTC-day aggregation for tmax/tmin + +Both the existing R pipeline (`pipeline_tmax_tmin_hourly.R`) and the new EDH +Python script compute daily max/min over UTC-day windows, not local-time days. +For BC (UTC-7/-8) the local-afternoon tmax peak straddles UTC day boundaries, +introducing a small systematic bias vs. a local-time daily aggregation. + +- CDS's `derived-era5-land-daily-statistics` product accepts a `time_zone` + parameter and would fix this — but we abandoned it in #33 due to its + ~2-hour-per-job queue time. +- Fixing properly requires shifting hourly timestamps by the local UTC offset + before `.resample("1D")`, or doing per-pixel local-time aggregation + (computationally expensive and overkill for a regional dataset). +- For now: the EDH script replicates the R pipeline's existing behavior so + outputs are directly comparable. Addressing this bias is a follow-up issue + (cd package methodology). + +## Precipitation accumulation (ERA5-Land tp) + +The hourly `tp` variable has `GRIB_stepType=accum` — it is a **running +accumulation** from 01:00 UTC reset to 00:00 UTC next day. Naive +`.sum()` over hourly values gives a monthly precip ~8× too high. + +**Solution:** use EDH's daily product +`https://data.earthdatahub.destine.eu/era5/era5-land-daily-utc-v1.zarr` +which pre-computes daily totals correctly. Monthly precip = sum of +daily totals. + +**Why not fix the hourly sum:** writing accumulation logic ourselves +is bug-prone. The daily product is what ECMWF designed for this +exact case. + +**Caveat:** daily product only has 14 variables (t2m, d2m, swvl1-2, tp, +wind, radiation, etc.) — no swvl3 or swvl4. So soil moisture stays on +the hourly product. tmean and dewpoint could use either; we use hourly +for consistency and because the 0.99 correlation (vs CDS) was already +validated. + +## EDH product catalogue (what's available for cd package) + +| Product | Zarr URL | Variables | Use | +|---|---|---|---| +| Hourly ERA5-Land | `reanalysis-era5-land-no-antartica-v0.zarr` | 50 | tmax/tmin, tmean, dewpoint, soil_moisture | +| Daily ERA5-Land (UTC) | `era5-land-daily-utc-v1.zarr` | 14 | prcp only | +| Monthly ERA5 (not Land) | — | - | NOT useful (ERA5 parent, 31 km) | + +## Out-of-scope confirmations + +- GEE has ERA5-Land but commercial license blocks us +- AWS / ARCO / WeatherBench all serve ERA5 (31 km), not ERA5-Land +- CDS-beta is same backend, same rate limits diff --git a/planning/archive/2026-04-issue-36-edh-migration/progress.md b/planning/archive/2026-04-issue-36-edh-migration/progress.md new file mode 100644 index 0000000..5d0c20e --- /dev/null +++ b/planning/archive/2026-04-issue-36-edh-migration/progress.md @@ -0,0 +1,100 @@ +# Progress: EDH migration + +## Session 2026-04-12 (evening) + +**Context from prior day:** Got rate-limited twice by CDS. Researched alternatives (#35), benchmarked EDH Zarr at 5× faster with no rate limits and same data. Filed #36 for migration. Pivoting now. + +**Completed:** +- Branched off main: `36-edh-migration` +- Stashed and restored `scripts/test_edh_era5_land.py` on new branch +- Set up PWF files (this document, task_plan.md, findings.md) + +**Next:** +- Update climate-update.yml GHA to use EDH (replace CDS fetcher with + scripts/backfill_edh_all.py + stage 3) +- Write pipeline_update_edh (EDH incremental update, replaces + CDS-based pipeline_update.R) +- Land the PR + +## Session 2026-04-12 (Stage 3 to S3) + +**Completed:** +- Wrote `scripts/pipeline_stage3_edh.R` — slim Stage-3-only against + `data/backfill/monthly/` (avoids CDS fetch in existing pipeline_backfill.R) +- Built 35 COGs locally: 7 variables × (annual + 4 seasons) × 76 years +- STAC catalog built, 35 items +- S3 sync pushed all 35 COGs to s3://stac-era5-land (178 MB total, + replacing the April 6-7 CDS-era COGs) +- Catalog.json unchanged (byte-identical) because filenames + extents + + metadata didn't change — legitimate sync behavior, not a bug +- Verified end-to-end via /vsicurl: tmean_annual reads back with correct + 76 years, CRS, extent. BC annual mean 1950 = -1.42 C, 2024 = +1.85 C + (3.3 C warming in 74 years — the climate departure signal the package + is built to show, now live on S3 from EDH data) + +## Session 2026-04-12 (unified backfill run) + +**Completed:** +- Full 1950-2025 unified backfill of 5 non-tmax/tmin variables via EDH in ~4 hours + (10:19 - 14:20 wall time) +- 532 monthly TIFs total (7 vars × 76 years), all on matching EPSG:4326 120×260 grid +- CDS-era files safely preserved in `data/backfill/monthly/_cds_backup/` (375 files) +- Two transient errors handled well: + - 1989: TimeoutError → retry-with-backoff caught it, completed on attempt 2 + - 2008: ClientPayloadError → retry catch didn't include aiohttp class; outer handler + skipped the year and continued. Filled later via `--year 2008` rerun. +- QA confirmed: all grids aligned, tmin<=tmean<=tmax with zero violations, + physically plausible values, soil_moisture composite correct. + +## Session 2026-04-12 (overnight run) + +**Completed:** +- Full 1950-2025 backfill via EDH, ~1h 53min unattended (02:16 - 04:09) + - 75 year-files written (1951-2025) plus 1950 from earlier test = 76 years + - Avg ~90s per year, no failures +- Regenerated 2024 from EDH (was a CDS leftover from April 7 with different methodology) +- Filed #37 for the UTC-day aggregation bias (not blocking #36) +- All 76 years × 2 variables × 12 months × BC bbox GeoTIFFs in data/backfill/monthly/ + +**Validation (R/terra spot check):** +- 1950 Jul tmax up to 29.7 °C — historically realistic +- 2000 Jul tmax up to 28.9 °C +- 2024 Jul tmax up to 34.6 °C — matches heat dome era +- 2025 Jul tmax up to 31.9 °C +- All years: 12 layers, Jan..Dec names, EPSG:4326, BC bbox extent + +## Session 2026-04-12 (morning QA) + +**Completed:** +- Wrote `scripts/qa_monthly.R` — checks grid alignment, time coverage, + physical sanity across all variables +- Wrote `scripts/probe_edh_vars.py` — pulls one month of each EDH + variable, compares to CDS, validates aggregation semantics +- **Discovered grid mismatch:** CDS-era vars on 121×261 grid with no CRS + tag, vs EDH-era tmax/tmin on 120×260 EPSG:4326. Blocks release. +- **Validated EDH semantics:** + - tmean (hourly t2m mean): 0.99 correlation with CDS, 0.43 °C mean abs diff + - dewpoint: plausible BC winter values + - soil_moisture: all 4 depths match CDS composite closely + - prcp: hourly `tp` is accumulation (`GRIB_stepType=accum`) — need + EDH daily product for correct monthly precip +- **Decision:** two-Zarr approach, extend backfill to all variables so + the whole dataset is internally grid-aligned + +**Next:** +- Commit QA + probe scripts on the branch (durable tooling) +- Extend backfill script to all 7 cd variables +- Regenerate `data/backfill/monthly/` entirely +- Re-run QA → R Stage 3 → PR + +**Commits this session:** +- `0014bf2` — Add planning docs for EDH migration +- `7ef03cb` — Add EDH Zarr benchmark test script +- _pending_ — Add EDH-based tmax/tmin backfill script + +**Verified 2026-04-12 02:11:** +- EDH Zarr benchmark: 15.9s / month BC t2m (5× faster than CDS) +- EDH full-year backfill: 114.3s / year BC tmax/tmin as GeoTIFF +- Output drop-in replacement for existing R Stage 2: 12 layers Jan..Dec, + EPSG:4326, °C, BC bbox — `terra::rast()` reads cleanly with expected names +- Realistic BC values for 1950: Jan tmax -30.5 to 0.7°C, Jul tmax -1.2 to 29.7°C diff --git a/planning/archive/2026-04-issue-36-edh-migration/task_plan.md b/planning/archive/2026-04-issue-36-edh-migration/task_plan.md new file mode 100644 index 0000000..f69bfc6 --- /dev/null +++ b/planning/archive/2026-04-issue-36-edh-migration/task_plan.md @@ -0,0 +1,88 @@ +# Task Plan: Migrate cd_fetch() to DestinE Earth Data Hub Zarr + +Issue: NewGraphEnvironment/cd#36 +Branch: `36-edh-migration` + +## Context + +CDS (`ecmwfr`) rate limiting makes the tmax/tmin backfill take ~3 days of babysitting. Benchmark in #35 showed EDH Zarr delivers the same ERA5-Land data at ~15s/month (5× faster, no rate limits, 500K requests/month quota). Pivot the producer pipeline to EDH. + +## Phase 1: Scaffold and benchmark (mostly done from #35) + +- [x] Confirm EDH carries ERA5-Land at 9 km native (validated #35) +- [x] Confirm temporal coverage 1950 to present (validated #35) +- [x] Confirm commercial license (CC-BY 4.0, validated #35) +- [x] Write portable PEP 723 test script (`scripts/test_edh_era5_land.py`) +- [x] Bench one month BC t2m — 15.9s vs CDS 80s +- [x] Commit test script on the branch + +## Phase 2: Pragmatic Python backfill (unblock #33) + +The quickest path to finishing the tmax/tmin data: a Python script that uses EDH directly. +Runs outside R, produces monthly GRIB or NetCDF that downstream R stages already consume. + +- [x] Write `scripts/backfill_edh_tmax_tmin.py` — tmax/tmin only for now (unblocks #33) +- [x] Output matches existing R pipeline's Stage 2 format (yearly `.tif` × 12 month bands, °C, EPSG:4326) +- [x] Band descriptions Jan..Dec so `cd_aggregate()` seasonal grouping works +- [x] Idempotent — skip years where both output files exist +- [x] Test on one year (1950) — 114s, realistic values, terra reads correctly +- [x] Run full backfill 1950-2025 — completed in ~1h 53min (76 years × tmax + tmin) +- [x] Regenerate 2024 from EDH (was a CDS leftover) for homogeneous methodology +- [x] QA — discovered CDS-era vars on a different grid (ext shifted 0.1°, CRS missing, 121×261 vs EDH 120×260) +- [x] Probe EDH products — confirmed two-Zarr approach: hourly for state vars, daily for prcp + +## Phase 2b: Unified backfill (all variables on EDH grid) + +Grid mismatch between CDS-era vars and EDH-era tmax/tmin blocks release. +Regenerating everything from EDH gives one internally-consistent dataset. + +- [x] Extend backfill to all 7 cd variables (scripts/backfill_edh_all.py) +- [x] Regenerate all `data/backfill/monthly/*_YYYY.tif` with consistent grid + CRS +- [x] Re-run QA — all grids aligned, all CRS tagged, tmin<=tmean<=tmax sanity passes +- [x] VPD/RH derived in Python (Tetens), no R cd_derive re-run needed + +**QA summary 2026-04-12:** +- 7 variables × 76 years = 532 monthly TIFs +- All on 120×260 EPSG:4326 grid, extent [-139.95, -113.95, 47.95, 59.95] +- Zero tmin>tmax or tmean inversion violations (163,888 cell-checks) +- (tmax+tmin)/2 vs tmean mean diff = 0.57°C (classical climatology shortcut bias, normal) +- 2008 had a ClientPayloadError that my retry didn't catch (wrong exception class); + re-ran --year 2008 to fill the gap. Filed as future improvement in #38. + +## Phase 2c: R Stage 3 + +- [x] Run R stage 3 (COG + STAC + S3) against the unified EDH-generated TIFs + (scripts/pipeline_stage3_edh.R; 35 COGs live on s3://stac-era5-land, + verified via /vsicurl read) + +## Phase 3: R integration for cd_fetch() + +- [ ] Decide: reticulate + Python xarray, OR stars::read_mdim() via GDAL zarr driver +- [ ] Prototype both on one month, compare simplicity and performance +- [ ] Refactor `cd_fetch()` with `source = c("edh", "cds")` parameter, default `"edh"` +- [ ] Keep `cd_fetch()` CDS path for fallback +- [ ] Update tests to exercise both paths +- [ ] Update examples and docs + +## Phase 4: Pipeline and docs + +- [ ] Update `scripts/pipeline_backfill.R` to use EDH source +- [ ] Update monthly GitHub Action to use EDH +- [ ] Add `EDH_TOKEN` to repo secrets (rotate current token first) +- [ ] Update CLAUDE.md — CDS section → EDH primary, CDS fallback +- [ ] Update README / pkgdown auth setup instructions + +## Phase 5: Close out + +- [ ] Run full backfill from scratch via integrated R path — validate output matches phase 2 +- [ ] Close #33 (tmax/tmin saga resolved by EDH) +- [ ] Close #35 (evaluation → migration done) +- [ ] Merge PR closing #36 +- [ ] Archive planning to `planning/completed/` + +## Success criteria + +- `cd_fetch()` default path uses EDH; CDS path works as fallback +- Full 1950-2025 backfill completes in under 8 hours (single session, unattended) +- Vignette and tests pass with EDH as source +- CDS API knowledge preserved in CLAUDE.md but marked as fallback diff --git a/vignettes/climate-departure.Rmd b/vignettes/climate-departure.Rmd index 0bf6254..cd228cc 100644 --- a/vignettes/climate-departure.Rmd +++ b/vignettes/climate-departure.Rmd @@ -1,6 +1,8 @@ --- title: "Climate Departure Analysis for a Watershed" -output: bookdown::html_document2 +output: + bookdown::html_document2: + number_sections: false vignette: > %\VignetteIndexEntry{Climate Departure Analysis for a Watershed} %\VignetteEngine{knitr::rmarkdown} @@ -52,6 +54,16 @@ lakes <- st_read(ctx, layer = "lakes", quiet = TRUE) rivers <- st_read(ctx, layer = "rivers", quiet = TRUE) streams <- st_read(ctx, layer = "streams", quiet = TRUE) highways <- st_read(ctx, layer = "highways", quiet = TRUE) + +# Clip context layers to the watershed group (AOI) so maps stay focused +# on the basin and do not show features outside the boundary. Mark +# attributes as spatially constant to silence the (irrelevant) attribute- +# constancy notice without hiding genuine geometry warnings. +for (lyr in list("towns", "lakes", "rivers", "streams", "highways")) { + x <- get(lyr) + sf::st_agr(x) <- "constant" + assign(lyr, sf::st_intersection(x, aoi)) +} ``` ```{r map-location, fig.cap="Kootenay Lake (KOTL) watershed group in southeastern British Columbia showing major waterbodies, rivers, and highways.", fig.height=7} @@ -89,7 +101,11 @@ returns a tidy tibble of available variables and periods. ```{r catalog} catalog <- cd_catalog() -knitr::kable(catalog, caption = "Available climate variables and periods in the STAC catalog.") +kableExtra::kable_styling( + knitr::kable(catalog, caption = "Available climate variables and periods in the STAC catalog."), + bootstrap_options = c("striped", "hover", "condensed") +) |> + kableExtra::scroll_box(height = "320px") ``` ## Extract Climate Time Series @@ -121,7 +137,11 @@ bl_early <- cd_baseline(ts, baseline_years = 1951:1980) # WMO standard bl_wmo <- cd_baseline(ts, baseline_years = 1981:2010) -knitr::kable(bl_early, caption = "Pre-warming baseline means (1951-1980) by variable and period.", digits = 2) +kableExtra::kable_styling( + knitr::kable(bl_early, caption = "Pre-warming baseline means (1951-1980) by variable and period.", digits = 2), + bootstrap_options = c("striped", "hover", "condensed") +) |> + kableExtra::scroll_box(height = "320px") ``` ## Anomalies @@ -168,7 +188,27 @@ cmp <- cd_compare(ts, window_b = 1951:1980, method = "mean_diff" ) -knitr::kable(cmp, caption = "Comparison of recent (2015-2025) vs pre-warming (1951-1980) means.", digits = 2) +kableExtra::kable_styling( + knitr::kable(cmp, caption = "Comparison of recent (2015-2025) vs pre-warming (1951-1980) means (absolute difference).", digits = 2), + bootstrap_options = c("striped", "hover", "condensed") +) |> + kableExtra::scroll_box(height = "320px") +``` + +For variables where the baseline magnitude varies a lot in space — precipitation +and soil moisture — percent change is more interpretable than absolute +difference. `cd_compare()` accepts `method = "pct_change"` for exactly this. + +```{r compare-pct} +cmp_pct <- cd_compare( + ts[ts$variable %in% c("prcp", "soil_moisture"), ], + window_a = 2015:2025, + window_b = 1951:1980, + method = "pct_change" +) +knitr::kable(cmp_pct, + caption = "Recent vs pre-warming comparison expressed as percent change for precipitation and soil moisture.", + digits = 1) ``` ```{r plot-compare, fig.cap="Comparison of recent (2015-2025) vs pre-warming (1951-1980) climate for the Kootenay Lake watershed."} @@ -199,6 +239,8 @@ historical_idx <- which(years >= 1951 & years <= 1980) recent_mean <- mean(r_tmean[[recent_idx]]) historical_mean <- mean(r_tmean[[historical_idx]]) departure <- recent_mean - historical_mean +# Mask to the watershed boundary so cells outside the AOI go transparent +departure <- terra::mask(departure, aoi) names(departure) <- "Temperature departure" ggplot() + @@ -232,6 +274,8 @@ years_sm <- as.integer(names(r_sm)) recent_sm <- mean(r_sm[[which(years_sm >= 2015 & years_sm <= 2025)]]) historical_sm <- mean(r_sm[[which(years_sm >= 1951 & years_sm <= 1980)]]) departure_sm <- recent_sm - historical_sm +# Mask to the watershed boundary so cells outside the AOI go transparent +departure_sm <- terra::mask(departure_sm, aoi) names(departure_sm) <- "Soil moisture departure" ggplot() + @@ -278,6 +322,98 @@ knitr::kable(cd_summary(rbind(trn_summer, trn_winter)), caption = "Seasonal temperature trends (summer vs winter).") ``` +## Daytime Highs and Overnight Lows + +The cd package now ships tmax (daytime maximum) and tmin (overnight +minimum) alongside tmean. These are not redundant. In many regions tmin +warms faster than tmax under climate change — the "day-night asymmetry" +that is one of the textbook fingerprints of greenhouse warming +(Karl et al. 1993). Whether your watershed shows that signal depends on +local geography (valley inversions, snow cover, slope-aspect mix), and +the package lets you check. + +```{r plot-tmax, fig.cap="Annual daytime maximum temperature (tmax) anomaly for the Kootenay Lake watershed relative to the 1951-1980 baseline."} +trn_tmax <- cd_trend( + ano[ano$variable == "tmax" & ano$period == "annual", ], + trend_start = c(1951, 1981) +) +cd_plot_timeseries(ano, variable = "tmax", period = "annual", trend = trn_tmax, + title = "Daytime Maximum (tmax) — Annual Anomaly") +``` + +```{r plot-tmin, fig.cap="Annual overnight minimum temperature (tmin) anomaly for the Kootenay Lake watershed relative to the 1951-1980 baseline."} +trn_tmin <- cd_trend( + ano[ano$variable == "tmin" & ano$period == "annual", ], + trend_start = c(1951, 1981) +) +cd_plot_timeseries(ano, variable = "tmin", period = "annual", trend = trn_tmin, + title = "Overnight Minimum (tmin) — Annual Anomaly") +``` + +For Kootenay Lake the slopes are essentially the same: about +1.9 °C +each since 1951. The diurnal temperature range — daytime maximum minus +overnight minimum — is therefore approximately constant. + +```{r plot-dtr, fig.cap="Diurnal temperature range (daytime maximum minus overnight minimum) annual mean for the Kootenay Lake watershed. The trend is essentially flat, indicating that overnight lows and daytime highs are warming at similar rates here — counter to the textbook day-night asymmetry."} +tmax_ts <- ts[ts$variable == "tmax" & ts$period == "annual", c("year", "value")] +tmin_ts <- ts[ts$variable == "tmin" & ts$period == "annual", c("year", "value")] +dtr <- merge(tmax_ts, tmin_ts, by = "year", suffixes = c("_max", "_min")) +dtr$dtr <- dtr$value_max - dtr$value_min + +ggplot(dtr, aes(x = year, y = dtr)) + + geom_line(color = "grey50") + + geom_point(color = "grey30", size = 1) + + geom_smooth(method = "lm", se = FALSE, color = "#b2182b", linewidth = 0.8) + + labs( + title = "Diurnal Temperature Range — Kootenay Lake Watershed", + x = NULL, + y = expression("Daytime maximum minus overnight minimum (" * degree * "C)") + ) + + theme_minimal(base_size = 12) +``` + +The diurnal temperature range trend at Kootenay Lake is about +−0.06 °C over the full record — well within noise. **This watershed +does not show the textbook day-night asymmetry.** That is itself +informative: regional generalisations don't always transfer to +specific basins, and the value of having the data per-watershed is +exactly the ability to check. + +What does stand out at Kootenay Lake is the **seasonal pattern**. + +```{r seasonal-tmax-tmin} +trn_tx_tn_seasonal <- cd_trend( + ano[ano$variable %in% c("tmax", "tmin") & + ano$period %in% c("winter", "spring", "summer", "fall"), ], + trend_start = 1951 +) +knitr::kable( + cd_summary(trn_tx_tn_seasonal), + caption = "Seasonal trends in daytime maximum and overnight minimum temperature at Kootenay Lake, 1951-2025." +) +``` + +Summer warming is the strongest single signal: daytime maxima have +risen about +2.8 °C and overnight minima about +2.9 °C since 1951 — +roughly 1.5× the annual mean tmean trend. Winter is the only season +where the day-night asymmetry shows clearly here, and it goes the +*opposite* direction of the textbook expectation: winter daytime maxima +have warmed significantly while winter overnight minima have not, a +pattern consistent with persistent winter cold-air pooling in the +valley bottom. + +The summer tmax signal is the temperature envelope that drives: + +- **Salmonid thermal stress.** Anadromous salmon runs to Kootenay Lake + are blocked by lower-Columbia dams, but resident salmonids (kokanee, + bull trout, Gerrard rainbow) and First Nations re-introduction + efforts face increasing summer thermal stress in tributaries. +- **Fire weather.** Hot dry summer afternoons set the envelope for + ignition probability and spread potential. +- **Snowmelt timing.** Earlier and faster spring/summer warming + shifts freshet earlier and lowers late-summer baseflows, the period + when stream temperature stress is already highest. + ## Precipitation and Soil Moisture While temperature shows clear departure, precipitation trends in BC @@ -298,27 +434,39 @@ all_trends <- cd_trend( ano[ano$period == "annual", ], trend_start = c(1951, 1981) ) -knitr::kable(cd_summary(all_trends), - caption = "Annual trend statistics for all variables, Kootenay Lake watershed.") +kableExtra::kable_styling( + knitr::kable( + cd_summary(all_trends), + caption = "Annual trend statistics for all variables and both trend windows (1951- and 1981-), Kootenay Lake watershed." + ), + bootstrap_options = c("striped", "hover", "condensed") +) |> + kableExtra::scroll_box(height = "320px") ``` ## Interpretation -The analysis reveals the climate departure pattern common across British -Columbia's interior watersheds: - -- **Temperature is rising.** The cumulative shift since the mid-20th century - is substantial and statistically significant across all seasons. -- **Precipitation trends are weak.** Year-to-year variability is high but - there is no strong directional change. -- **Soils are drying despite stable precipitation.** Warmer temperatures - increase evapotranspiration, pulling moisture from soils even when the - same amount of rain falls. For aquatic ecosystems, drier soils mean - reduced summer baseflows during the period when flows are already lowest. - -This pattern — warming without precipitation change leading to hydrological -drought — is the mechanism connecting climate departure to habitat -degradation in salmon-bearing watersheds. +The analysis reveals the climate departure pattern at Kootenay Lake: + +- **Temperature is rising, especially in summer.** Annual mean + temperature has risen about +1.9 °C since 1951, with summer warming + the dominant signal at +2.8 °C. Daytime and overnight warming rates + are similar at the annual scale here, contrary to the regional textbook + expectation of overnight lows warming faster. +- **Precipitation has declined ~10% since 1951.** The trend is + statistically significant (p ≈ 0.015). Year-to-year variability is + high but the directional decline is real, not noise. +- **Soils are drying because both inputs and demand have shifted.** + Precipitation is down ~10% AND warmer temperatures (especially in + summer) drive higher evapotranspiration. Both effects compound to + reduce summer baseflows in salmonid-bearing tributaries — the period + when flows are already lowest. + +This pattern — warming temperatures, declining precipitation, and +intensifying evapotranspiration combining to dry soils and reduce +late-summer streamflow — is the mechanism connecting climate +departure to habitat degradation in salmonid-bearing watersheds across +the BC interior. ## Data Source