
feat: ERA5 + GLDAS NetCDF forcing support #79

Merged
DankerMu merged 1 commit into master from feat/era5-gldas-forcing on Feb 14, 2026

@DankerMu
Owner

Summary

Extend NetcdfForcingProvider to support three forcing products (CMFD2, ERA5, GLDAS), all regression-verified against CSV baselines.

ERA5 changes

  • Fix end-boundary file enumeration: add lookahead day for accumulated field forward-differencing (tp, ssr)
  • Implement forward-difference conversion for accumulated fields (tp → mm/day, ssr → W/m²) with reset handling and small-negative tolerance
  • Apply CSV quantization rules (precip 4dp, temp 2dp, RH 4dp, wind 2dp + 0.05 floor, radiation integer) to ERA5 branch
  • Set RADIATION_KIND=SWNET for ERA5 (ssr is net shortwave)
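
The forward-difference step for accumulated fields can be sketched roughly as follows. This is a minimal illustration, not the code in NetcdfForcingProvider.cpp: the function names, the tolerance values, and the reset rule (a large negative delta means the accumulation restarted, so the next value is itself the increment) are assumptions. It also assumes tp is stored as accumulated metres of water and ssr as accumulated J/m².

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not from the PR): increment between two consecutive
// accumulated records.
//  - delta >= 0            : normal case, use the difference
//  - -tol <= delta < 0     : treated as round-off noise, clamped to 0
//  - delta < -tol          : treated as an accumulation reset; the value
//                            accumulated since the reset is acc_next itself
inline double accumulatedIncrement(double acc_now, double acc_next, double tol)
{
    const double delta = acc_next - acc_now;
    if (delta >= 0.0) return delta;
    if (delta >= -tol) return 0.0;
    return std::max(acc_next, 0.0);
}

// tp: accumulated metres of water -> mean rate in mm/day over the interval.
// The 1e-9 m tolerance is an assumed value.
inline double tpToMmPerDay(double tp_now_m, double tp_next_m, double dt_sec)
{
    assert(dt_sec > 0.0);
    const double inc_m = accumulatedIncrement(tp_now_m, tp_next_m, 1e-9);
    return inc_m * 1000.0 * (86400.0 / dt_sec);
}

// ssr: accumulated J/m^2 -> mean flux in W/m^2 over the interval.
// The 1e-3 J/m^2 tolerance is an assumed value.
inline double ssrToWattsPerM2(double ssr_now, double ssr_next, double dt_sec)
{
    assert(dt_sec > 0.0);
    const double inc = accumulatedIncrement(ssr_now, ssr_next, 1e-3);
    return inc / dt_sec;
}
```

Because each rate needs the record after the current one, the last simulation timestep only has a valid increment if one extra record past the end is available, which is what the lookahead day provides.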

GLDAS (new, PRODUCT=GLDAS)

  • Per-timestep file layout: {year}/{doy}/GLDAS_NOAH025_3H.A{yyyymmdd}.{hhmm}.021.nc4
  • Efficient timestep enumeration via direct path construction (no directory scan)
  • NcVarPointMeta: lightweight shared-handle reader to avoid re-opening same .nc4 for each variable
  • _FillValue water-mask handling: stations on missing cells auto-remap to nearest valid grid cell
  • Variable mapping: Rainf_f_tavg→mm/day, Tair_f_inst→°C, Qair_f_inst+Psurf_f_inst→RH, Wind_f_inst→m/s, SWdown_f_tavg→W/m²
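
As an illustration of the direct path construction, the sketch below builds the relative path for one 3-hourly timestep from its calendar fields. The function names are hypothetical, and zero-padding the day of year to three digits is an assumption about the archive layout; the filename pattern itself is the one listed above.

```cpp
#include <cstdio>
#include <string>

// Gregorian leap-year rule.
inline bool isLeapYear(int y)
{
    return (y % 4 == 0 && y % 100 != 0) || (y % 400 == 0);
}

// Day of year, 1-based (Jan 1 -> 1).
inline int dayOfYear(int y, int m, int d)
{
    static const int daysBeforeMonth[12] =
        {0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334};
    return daysBeforeMonth[m - 1] + d + ((m > 2 && isLeapYear(y)) ? 1 : 0);
}

// Hypothetical helper: {year}/{doy}/GLDAS_NOAH025_3H.A{yyyymmdd}.{hhmm}.021.nc4
// The three-digit, zero-padded {doy} is an assumption.
inline std::string gldasRelativePath(int y, int m, int d, int hh, int mm)
{
    char buf[96];
    std::snprintf(buf, sizeof(buf),
                  "%04d/%03d/GLDAS_NOAH025_3H.A%04d%02d%02d.%02d%02d.021.nc4",
                  y, dayOfYear(y, m, d), y, m, d, hh, mm);
    return std::string(buf);
}
```

Since every path is computed from the timestamp alone, enumerating a simulation window is a loop over 3-hour steps with no directory listing.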

NLDAS assessment

  • GRIB format (.grb) incompatible with current NetCDF architecture
  • North America only coverage — does not cover QHH test domain
  • Marked as out-of-scope for this phase

Verification (QHH 10-day)

| Product | NetCDF Run | CSV Baseline | Forcing max_abs_diff | Legacy .dat |
|---|---|---|---|---|
| CMFD2 | | | 0 | SHA256 identical |
| ERA5 | | | 0 | SHA256 identical |
| GLDAS | | | 0 | SHA256 identical |

Files changed

  • src/classes/NetcdfForcingProvider.cpp: +837/-113 lines (ERA5 fixes + GLDAS implementation)

Test plan

  • make shud NETCDF=1 compiles successfully
  • QHH 10-day run with ERA5 forcing completes (The successful end.)
  • QHH 10-day run with GLDAS forcing completes (The successful end.)
  • ERA5 forcing regression: max_abs_diff = 0 vs CSV baseline
  • GLDAS forcing regression: max_abs_diff = 0 vs CSV baseline
  • ERA5 legacy output .dat SHA256-identical to CSV baseline
  • GLDAS legacy output .dat SHA256-identical to CSV baseline
  • CMFD2 baseline run unbroken (re-verified)

🤖 Generated with Claude Code

…d parity

Extend NetcdfForcingProvider to support three forcing products (CMFD2, ERA5, GLDAS),
all verified to produce identical forcing values and legacy output compared to CSV baselines.

ERA5 changes:
- Fix end-boundary file enumeration: add lookahead day for accumulated field
  forward-differencing (tp, ssr) so the last simulation timestep has a valid increment
- Implement forward-difference conversion for accumulated fields (tp → mm/day,
  ssr → W/m²) with reset handling and small-negative tolerance
- Apply CSV quantization rules (precip 4dp, temp 2dp, RH 4dp, wind 2dp + 0.05
  floor, radiation integer) to ERA5 branch for baseline parity
- Set RADIATION_KIND=SWNET for ERA5 (ssr is net shortwave, not downward)
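
A minimal sketch of the CSV quantization rules listed above, for illustration only. The struct, the function names, the round-half-away-from-zero mode of std::round, and applying the 0.05 wind floor after rounding are all assumptions; the PR only states the decimal places and the floor value.

```cpp
#include <algorithm>
#include <cmath>

// Round to a fixed number of decimal places (std::round rounds halves away
// from zero; the rounding mode used by the CSV writer is an assumption).
inline double roundTo(double v, int decimals)
{
    const double scale = std::pow(10.0, decimals);
    return std::round(v * scale) / scale;
}

// Hypothetical container for one forcing record.
struct ForcingRecord {
    double precip_mm_day;
    double temp_c;
    double rh;
    double wind_m_s;
    double rad_w_m2;
};

// Apply the rules from the commit message:
// precip 4dp, temp 2dp, RH 4dp, wind 2dp with a 0.05 floor, radiation integer.
inline ForcingRecord quantizeLikeCsv(ForcingRecord r)
{
    r.precip_mm_day = roundTo(r.precip_mm_day, 4);
    r.temp_c        = roundTo(r.temp_c, 2);
    r.rh            = roundTo(r.rh, 4);
    r.wind_m_s      = std::max(roundTo(r.wind_m_s, 2), 0.05);
    r.rad_w_m2      = std::round(r.rad_w_m2);
    return r;
}
```

Quantizing the NetCDF-derived values the same way the CSV files were written is what makes a max_abs_diff of exactly 0 achievable.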

GLDAS (new, PRODUCT=GLDAS):
- Per-timestep file layout: {year}/{doy}/GLDAS_NOAH025_3H.A{yyyymmdd}.{hhmm}.021.nc4
- Efficient timestep enumeration via direct path construction (no directory scan)
- NcVarPointMeta: lightweight shared-handle reader to avoid re-opening the same
  .nc4 file for each of 6 variables per timestep
- _FillValue water-mask handling: stations mapping to missing cells are automatically
  remapped to nearest valid grid cell (logged)
- Variable mapping: Rainf_f_tavg(kg/m²/s)→mm/day, Tair_f_inst(K)→°C,
  Qair_f_inst+Psurf_f_inst→RH (same formula as CMFD2), Wind_f_inst→m/s,
  SWdown_f_tavg→W/m²
- CSV quantization and threshold rules applied consistently
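
The GLDAS unit conversions can be sketched as below. The precipitation and temperature conversions follow directly from the units listed above (1 kg/m² of water is 1 mm depth). The RH formula is NOT taken from the PR, which only says it matches CMFD2: the version here is a commonly used one (vapour pressure from specific humidity, saturation vapour pressure from a Magnus-type fit) and is an assumption, as are the function names and the choice to return RH as a fraction clamped to [0, 1].

```cpp
#include <algorithm>
#include <cmath>

// Rainf_f_tavg: kg/m^2/s -> mm/day (1 kg/m^2 of water == 1 mm depth).
inline double rainfToMmPerDay(double rainf_kg_m2_s)
{
    return rainf_kg_m2_s * 86400.0;
}

// Tair_f_inst: K -> degrees C.
inline double kelvinToCelsius(double t_k)
{
    return t_k - 273.15;
}

// Qair_f_inst (kg/kg) + Psurf_f_inst (Pa) + Tair_f_inst (K) -> RH fraction.
// Assumed formulation, not confirmed by the PR:
//   e  = q * p / (0.622 + 0.378 * q)            vapour pressure, Pa
//   es = 611.2 * exp(17.67 * Tc / (Tc + 243.5)) saturation vapour pressure, Pa
inline double relativeHumidity(double q, double p_pa, double t_k)
{
    const double tc = kelvinToCelsius(t_k);
    const double e  = q * p_pa / (0.622 + 0.378 * q);
    const double es = 611.2 * std::exp(17.67 * tc / (tc + 243.5));
    return std::min(1.0, std::max(0.0, e / es));
}
```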

Verification (QHH 10-day, all three products):
- Forcing: max_abs_diff = 0 (CSV baseline vs NetCDF, all 5 variables)
- Legacy output: .dat files SHA256-identical between CSV and NetCDF runs
- Baseline (CMFD2 CSV) unbroken

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@DankerMu
Owner Author

Code Review: PR #79 — ERA5 + GLDAS NetCDF Forcing Support

Score: 4/5 — Nitpicks 🤓
Decision: PASS

Verification Evidence

  • QHH 10-day: ERA5/GLDAS/CMFD2 all pass (The successful end.)
  • Forcing regression: max_abs_diff = 0 (all 3 products, CSV vs NetCDF)
  • Legacy output: .dat SHA256-identical between CSV baseline and NetCDF runs
  • CI: Build (baseline) ✅, Build (NETCDF=1) ✅

Issues Found

🟡 MEDIUM (tracked for follow-up, not blocking given verification pass)

🟡 MEDIUM | Correctness | src/classes/NetcdfForcingProvider.cpp:1666

GLDAS timestep enumeration / time-axis tail semantics: initGldasTimesteps() uses floor(sim_end_min / 180) which may include an extra boundary step or leave nextTimeMin() returning NA for the last record. Current QHH 10-day run is unaffected (verified), but edge cases at exact 3-hour boundaries could be brittle. Recommend: document "N intervals require N+1 boundary file" or compute nextTimeMin/maxTimeMin from known dt=180 when next is missing.
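
One way to make the tail semantics explicit is sketched below. It is not the logic of initGldasTimesteps(): the names, the choice to round the end time up to the next 3-hour boundary, and the fallback in nextTimeMin are assumptions that illustrate the two recommendations (N intervals need N+1 boundary files; derive the next time from dt=180 when no next record exists).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr long kGldasStepMin = 180;  // 3-hourly product

// Boundary times (minutes since simulation origin) needed to cover
// [start_min, end_min]. Covering N intervals requires N + 1 boundaries.
inline std::vector<long> gldasBoundaryTimes(long start_min, long end_min)
{
    assert(start_min >= 0 && end_min >= start_min);
    // Round the start down and the end up to 3-hour boundaries.
    const long first = (start_min / kGldasStepMin) * kGldasStepMin;
    const long last  = ((end_min + kGldasStepMin - 1) / kGldasStepMin) * kGldasStepMin;
    std::vector<long> times;
    for (long m = first; m <= last; m += kGldasStepMin) times.push_back(m);
    // Always provide at least one full interval.
    if (times.size() < 2) times.push_back(first + kGldasStepMin);
    return times;
}

// Next boundary time for record i; falls back to the fixed step when the
// record is the last one instead of returning a missing value.
inline long nextTimeMin(const std::vector<long>& times, std::size_t i)
{
    assert(i < times.size());
    return (i + 1 < times.size()) ? times[i + 1] : times[i] + kGldasStepMin;
}
```

With this convention an end time that falls exactly on a 3-hour boundary adds no extra step, while an end time inside an interval pulls in the closing boundary file.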

🔵 LOW (non-blocking)

🔵 LOW | Quality | src/classes/NetcdfForcingProvider.cpp:2087

NC_VAR_SP is opened but never used for ERA5. Either remove or wire into a calculation.

🔵 LOW | Correctness | src/classes/NetcdfForcingProvider.cpp:1840

_FillValue remap checks validity only via TEMP; if masks differ across variables, could fail in readPointFromOpenFile. Consider validating against all required vars during remap.
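
The suggestion could look roughly like the sketch below, which picks the nearest cell that is valid in every required variable rather than in TEMP alone. All types and names are hypothetical; it assumes every variable shares one grid shape, that missing cells hold exactly the _FillValue (a NaN fill would need std::isnan instead), and it measures distance in index space rather than geographic distance.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical in-memory view of one variable on the shared grid.
struct Grid2D {
    std::size_t ny;
    std::size_t nx;
    double fill;               // the variable's _FillValue
    std::vector<double> data;  // row-major, ny * nx values
    bool valid(std::size_t j, std::size_t i) const
    {
        return data[j * nx + i] != fill;
    }
};

struct CellIndex {
    std::size_t j;
    std::size_t i;
    bool found;
};

// Nearest cell to (j0, i0) that is valid in ALL supplied variables.
// Brute-force scan; fine for a one-off remap at initialisation.
inline CellIndex nearestCellValidInAll(const std::vector<Grid2D>& vars,
                                       std::size_t j0, std::size_t i0)
{
    CellIndex best{0, 0, false};
    if (vars.empty()) return best;
    double bestD2 = std::numeric_limits<double>::max();
    for (std::size_t j = 0; j < vars[0].ny; ++j) {
        for (std::size_t i = 0; i < vars[0].nx; ++i) {
            bool ok = true;
            for (const Grid2D& v : vars) {
                if (!v.valid(j, i)) { ok = false; break; }
            }
            if (!ok) continue;
            const double dj = static_cast<double>(j) - static_cast<double>(j0);
            const double di = static_cast<double>(i) - static_cast<double>(i0);
            const double d2 = dj * dj + di * di;
            if (d2 < bestD2) {
                bestD2 = d2;
                best = CellIndex{j, i, true};
            }
        }
    }
    return best;
}
```

If `found` is false, no cell is valid for all variables and the caller would need to report an error rather than remap.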

🔵 LOW | Quality | src/classes/NetcdfForcingProvider.cpp:881

nc_close() return codes ignored (existing pattern). Consider checking/logging in debug builds.

🔵 LOW | Documentation | src/classes/NetcdfForcingProvider.cpp:2451

Provider is stateful (open handles + mutable caches). Worth documenting as "not thread-safe" (fine under current single-thread movePointer() execution model).

Positive Aspects

  • GLDAS single nc_open() per timestep + NcVarPointMeta reuse avoids per-variable file-open overhead
  • ERA5 accumulated tp/ssr conversion with reset handling + negative-delta tolerance using actual dt_sec
  • _FillValue-aware station remap for GLDAS water-mask cells is a practical robustness improvement
  • CSV quantization/clamping rules aligned across CMFD2/ERA5/GLDAS branches
  • Error messages include file/var/index/station context for debugging

@DankerMu DankerMu merged commit 752d27a into master Feb 14, 2026
5 checks passed