Problem
cd_fetch() currently pulls all climate variables from the CDS (ecmwfr) API. Today's experience (#33) and benchmark (#35) showed:
- CDS has aggressive rate limits — sustained ~10 files/hour after the initial ~70-file allowance
- Every request queues server-side and counts against quota, even when polling fails
- The tmax/tmin backfill will take ~3 days of babysitting to complete via CDS
- CDS returns GRIB per-month per-variable, requiring 912 requests per variable for the 1950-2025 backfill
Solution
Migrate cd_fetch() to DestinE Earth Data Hub (EDH) Zarr as the primary data source.
Validated in #35 (benchmark):
- Same product (ERA5-Land, 9 km native, 1950-present)
- Same license (CC-BY 4.0, commercial OK)
- One Zarr store contains all 50 ERA5-Land variables
- 15.9 seconds per month for the BC bounding box vs ~80 s via CDS
- 500K requests/month quota — effectively unlimited for our use
- No queueing, no polling, no rate-limit babysitting
- Full backfill ~4 hours unattended vs ~3 days via CDS
Zarr URL: https://data.earthdatahub.destine.eu/era5/reanalysis-era5-land-no-antartica-v0.zarr
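The backfill estimates above can be roughly cross-checked with a few lines of arithmetic. This is only a sketch built on the figures quoted in this issue: it assumes 15.9 s is the cost of one monthly pull for one variable, and that the initial ~70-file CDS allowance takes negligible time. The function names are illustrative and not part of the package.

```python
# Figures quoted in this issue; helper names are hypothetical.
START_YEAR, END_YEAR = 1950, 2025
MONTHS = (END_YEAR - START_YEAR + 1) * 12  # monthly pulls per variable

EDH_SECONDS_PER_MONTH = 15.9  # benchmark from #35, BC bounding box
CDS_INITIAL_ALLOWANCE = 70    # files served before throttling starts
CDS_FILES_PER_HOUR = 10       # sustained rate after the allowance


def edh_backfill_hours(months=MONTHS):
    """Estimated unattended EDH backfill time for one variable, in hours."""
    return months * EDH_SECONDS_PER_MONTH / 3600


def cds_backfill_hours(months=MONTHS):
    """Estimated CDS backfill time for one variable, counting only throttled files."""
    throttled = max(months - CDS_INITIAL_ALLOWANCE, 0)
    return throttled / CDS_FILES_PER_HOUR


if __name__ == "__main__":
    print(f"months per variable: {MONTHS}")
    print(f"EDH: {edh_backfill_hours():.1f} h")
    print(f"CDS: {cds_backfill_hours():.1f} h ({cds_backfill_hours() / 24:.1f} days)")
```

Under these assumptions the EDH figure lands close to the ~4 hours quoted above, and the CDS figure lands in the same multi-day range as the estimate in the Problem section.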
Scope
Phase 1: New EDH fetcher alongside CDS
- Add cd_fetch_edh() or refactor cd_fetch() with a source parameter
- Use xarray + zarr via reticulate, OR pure R via stars::read_mdim() (GDAL zarr driver); evaluate both
- Token via EDH_TOKEN env var (already in ~/.Renviron)
- Variable mapping: EDH t2m → our tmean/tmax/tmin inputs, tp → prcp, d2m → dewpoint, swvl1-4 → soil_moisture, etc.
- Maintain existing output format (monthly COG or intermediate NetCDF) so downstream stages are untouched
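If the reticulate + xarray route is chosen, the variable mapping and token lookup from the list above might look something like the sketch below. The dictionary and function names are hypothetical. "swvl1-4" is read here as the four layers swvl1 to swvl4, and the entries hidden behind "etc." are deliberately left out. The sketch also assumes EDH_TOKEN is visible in the Python process environment, and it does not open the store.

```python
import os

# Output name -> EDH variable(s), as listed in this issue.
# Hypothetical structure; anything covered by "etc." is omitted.
EDH_VARIABLES = {
    "tmean": ["t2m"],
    "tmax": ["t2m"],
    "tmin": ["t2m"],
    "prcp": ["tp"],
    "dewpoint": ["d2m"],
    "soil_moisture": ["swvl1", "swvl2", "swvl3", "swvl4"],
}


def edh_inputs(outputs):
    """Return the de-duplicated EDH variables needed for the requested outputs."""
    needed = []
    for name in outputs:
        if name not in EDH_VARIABLES:
            raise KeyError(f"no EDH mapping for {name!r}")
        for var in EDH_VARIABLES[name]:
            if var not in needed:
                needed.append(var)
    return needed


def edh_token(env=None):
    """Read EDH_TOKEN from the environment and fail early if it is missing."""
    env = os.environ if env is None else env
    token = env.get("EDH_TOKEN", "")
    if not token:
        raise RuntimeError("EDH_TOKEN is not set")
    return token
```

For example, `edh_inputs(["tmax", "tmin", "prcp"])` asks for t2m only once, which matters because tmean, tmax, and tmin all derive from the same EDH variable.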
Phase 2: Finish the backfill via EDH
Phase 3: Decide on CDS role
- Option A: drop CDS entirely
- Option B: keep CDS as fallback for operational redundancy (both serve the same ERA5-Land data)
- Option C: keep CDS only for near-real-time updates if EDH has more lag than CDS
Phase 4: Update docs and pipeline
- CLAUDE.md — update CDS API section, mention EDH primary
- README + pkgdown — EDH auth setup instructions
- Monthly GitHub Action — switch to EDH
- Secrets: add EDH_TOKEN to repo secrets (rotate the one we've been using in chat)
Out of scope
- Derived variables (vpd, rh): still computed locally, no source change needed
- Downstream (cd_derive, cd_aggregate, cd_cog_write, cd_stac_catalog, cd_s3_push): unaffected if intermediate format stays the same
Risks
- EDH reliability / uptime: we're adopting a single provider. Can be mitigated by keeping CDS as a fallback (Option B above).
- Zarr chunking may not align with month boundaries, so a "one month pull" could fetch slightly more bytes than needed. In practice fine — quota is generous.
- R zarr tooling is less mature than Python. Reticulate + Python xarray is the pragmatic path; pure R via stars/GDAL is cleaner if it works.
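The chunk-alignment risk above can be illustrated with a small calculation along the time axis only. The sketch assumes hourly time steps starting at 1950-01-01 00:00 and uses a made-up chunk length of 720 hours. The real EDH chunk layout has not been checked here, and spatial chunking is ignored.

```python
from datetime import datetime

ORIGIN = datetime(1950, 1, 1)  # assumed first time step of the store
CHUNK_HOURS = 24 * 30          # hypothetical time-chunk length, not the real EDH layout


def month_bounds(year, month):
    """Return [start, end) datetimes for a calendar month."""
    start = datetime(year, month, 1)
    end = datetime(year + (month == 12), month % 12 + 1, 1)
    return start, end


def hours_since_origin(t):
    """Index of the hourly time step at datetime t."""
    return int((t - ORIGIN).total_seconds() // 3600)


def chunk_overfetch(year, month, chunk_hours=CHUNK_HOURS):
    """Return (hours needed, hours fetched) when whole time chunks are read."""
    start, end = month_bounds(year, month)
    first = hours_since_origin(start)
    last = hours_since_origin(end) - 1  # last hourly step inside the month
    first_chunk = first // chunk_hours
    last_chunk = last // chunk_hours
    fetched = (last_chunk - first_chunk + 1) * chunk_hours
    return last - first + 1, fetched
```

With these made-up numbers, a month that straddles a chunk boundary reads two chunks instead of one, so the worst case is about double the bytes for that month.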
Tracking
Relates to #33 (tmax/tmin operational backfill)
Relates to #35 (alternative source evaluation — SUPERSEDED by this migration)
Relates to NewGraphEnvironment/sred-2025-2026#23