Extract bulk-fetch safeguards into shared scripts/_lib.py #52
Merged
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ckfill scripts

Closes the gaps in #38. `preflight_single_instance`, `with_retry`, atomic `write_geotiff`, `log`, `get_token`, and `MONTH_NAMES` move from `backfill_edh_all.py` into a new shared `scripts/_lib.py`.

A new `backup_before_delete()` helper codifies the on-disk pattern from `data/backfill/monthly/_cds_backup/`. It has no call sites yet and is ready for the snow-vars script (#48) if a regen is needed.

`backfill_edh_tmax_tmin.py` was missing all three safeguards. It now imports the same helpers: `with_retry` wraps the zarr open and the `.compute()` calls, and `write_geotiff` replaces the inline non-atomic `to_geotiff_raster`.

Smoke-tested both scripts with `--year 1950` (idempotent-skip path) and verified that the pgrep guard rejects a concurrent second instance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 4, 2026
Summary
Closes Layer 1 of #38: bulk-fetch safeguards. Extracts the existing pgrep guard, retry-with-backoff, and atomic-write helpers from `scripts/backfill_edh_all.py` into a new shared `scripts/_lib.py`, then applies them to the sibling `scripts/backfill_edh_tmax_tmin.py`, which previously had none of the three safeguards. Also adds a new `backup_before_delete()` helper that codifies the on-disk pattern from `data/backfill/monthly/_cds_backup/`.

Net result: -156/+277 LOC, mostly relocation. Both production bulk-fetch scripts now share one source of truth for the safeguards, and the snow-vars backfill script we'll write for #48 inherits them for free via `from _lib import …`.

Surprise finding during exploration
The issue checklist named three "missing" safeguards in `backfill_edh_all.py`. Plan-mode exploration found that two of them, the pgrep guard and `with_retry`, were already in place via commits 5bf1b34 and 6f66a01, so the issue was partially stale. The third (`backup_before_delete`) was in de facto operational use on disk (375 files in `data/backfill/monthly/_cds_backup/`, hand-moved during the EDH migration) but had never been codified. The sibling script `backfill_edh_tmax_tmin.py` was missing all three.

So the actual scope landed as: extract the existing helpers, codify `backup_before_delete()`, and port everything to `backfill_edh_tmax_tmin.py`.

Changes
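Two of the helpers listed below carry most of the safeguard logic: `with_retry` and the atomic write behind `write_geotiff`. The following is a minimal sketch of both patterns, not the code in `scripts/_lib.py`. Several details are assumptions: the retry count, the base delay, the injectable `sleep` parameter, and the `atomic_write_bytes` name. Raw bytes also stand in for the GeoTIFF writer.

```python
import os
import time
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")

# Transient errors worth retrying, as listed in the PR. ConnectionError and
# TimeoutError are OSError subclasses in Python 3; spelled out for clarity.
RETRYABLE = (OSError, ConnectionError, TimeoutError)


def with_retry(
    fn: Callable[[], T],
    attempts: int = 5,  # assumed default, not taken from _lib.py
    base_delay: float = 2.0,  # seconds; doubles after each failure
    sleep: Callable[[float], None] = time.sleep,  # injectable for tests
) -> T:
    """Call fn(), retrying transient errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except RETRYABLE as exc:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            delay = base_delay * 2 ** attempt  # 2 s, 4 s, 8 s, ...
            print(f"retry {attempt + 1}/{attempts - 1} after {exc!r}; sleeping {delay:g}s", flush=True)
            sleep(delay)
    raise ValueError("attempts must be >= 1")


def atomic_write_bytes(data: bytes, out_path: Path) -> None:
    """Write to <out_path>.tmp, then rename over out_path in one step.

    os.replace is atomic when source and destination share a filesystem
    (the .tmp sits in the same directory), so readers see either the old
    file or the complete new one. A process killed mid-write leaves at
    most a stray .tmp file; out_path itself is never half-written.
    """
    tmp = out_path.with_name(out_path.name + ".tmp")
    try:
        with open(tmp, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # bytes on disk before the rename
        os.replace(tmp, out_path)
    except BaseException:
        tmp.unlink(missing_ok=True)  # clean up the partial file, then re-raise
        raise
```

Call sites pass a zero-argument callable, for example `with_retry(lambda: da.compute())`, so each attempt re-runs the work from scratch. `FileNotFoundError` and `PermissionError` are also `OSError` subclasses, so this sketch retries them too. A stricter helper might exclude them.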
New: `scripts/_lib.py`

- `preflight_single_instance(name)`: parameterized pgrep guard (each script passes its own basename)
- `with_retry(fn, ...)`: exponential backoff on `OSError`/`ConnectionError`/`TimeoutError`
- `write_geotiff(da, out_path, band_names=MONTH_NAMES)`: atomic `.tmp` + `os.replace`
- `log(msg)`: timestamped, flushed
- `get_token()`: EDH token from env or `~/.Renviron`
- `backup_before_delete(files, backup_subdir="_backup")`: new helper, no overwrites
- `MONTH_NAMES` constant

Refactored: `scripts/backfill_edh_all.py`

- `from _lib import …`; drop the now-redundant local copies
- `preflight_single_instance("backfill_edh_all")` (parameterized)

Ported: `scripts/backfill_edh_tmax_tmin.py`

- `preflight_single_instance("backfill_edh_tmax_tmin")` at the top of `main()`
- `with_retry` around `xr.open_dataset` and each `.compute()` call
- `write_geotiff` replaces the inline non-atomic `to_geotiff_raster`
- `log(...)` replaces `print(f"[{time.strftime(...)}] ...")` calls

Test plan
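The concurrent-instance check in this plan exercises the pgrep guard. What follows is a minimal sketch of such a guard, with several assumptions. It shells out to `pgrep -f <name>` and ignores both its own pid and its parent's, because a `uv run` launcher's command line may also contain the script name. The `list_pids` parameter is a hypothetical hook for checking the logic without real processes. It is not part of the helper's signature in `_lib.py`.

```python
import os
import subprocess
import sys
from typing import Callable


def pgrep_pids(pattern: str) -> list[int]:
    """PIDs whose full command line matches pattern (pgrep -f).

    pgrep exits 1 when nothing matches, so the exit status is not checked;
    empty stdout simply yields an empty list. pgrep never reports itself.
    """
    result = subprocess.run(["pgrep", "-f", pattern], capture_output=True, text=True)
    return [int(tok) for tok in result.stdout.split()]


def preflight_single_instance(
    name: str,
    list_pids: Callable[[str], list[int]] = pgrep_pids,  # hypothetical test hook
) -> None:
    """Abort if another process whose command line matches `name` is alive."""
    me, parent = os.getpid(), os.getppid()
    # Ignore ourselves and our launcher (e.g. a `uv run` wrapper whose argv
    # also contains the script name).
    others = [pid for pid in list_pids(name) if pid not in (me, parent)]
    if others:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(f"ABORT: {name} already running as pid(s) {others}; this instance is pid {me}")
```

Matching on the full command line (`-f`) is what lets each script pass its own basename. A pattern that is too generic could also match unrelated processes, such as an editor with the script open, so the name each script passes should be specific.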
- `python3 -c "import ast; ast.parse(...)"` clean on all three files
- `uv run scripts/backfill_edh_all.py --year 1950`: opens both Zarrs under `with_retry`, hits the idempotent-skip path, exits clean
- `uv run scripts/backfill_edh_tmax_tmin.py --year 1950`: same, single hourly Zarr
- `backfill_edh_tmax_tmin --year 2026` in the background; a second concurrent instance ABORTed with both pids reported in the message
- No `os.`/`subprocess.`/`sys.`/`rasterio.`/`rioxarray` references remain in either bulk-fetch script after the imports were trimmed
- `bulk-fetch-safeguards.md` convention with `scripts/_lib.py` as the worked example

Out of scope
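`backup_before_delete()` ships without a call site, as the `--regen` item in this section notes. Below is a minimal sketch of the behaviour its description implies. It assumes the helper moves each file into a `backup_subdir` next to it, mirroring how the `_cds_backup/` files were hand-moved. It also assumes the helper refuses to overwrite an existing backup. The PR does not state whether the real helper moves or copies.

```python
import shutil
from pathlib import Path
from typing import Iterable


def backup_before_delete(files: Iterable[Path], backup_subdir: str = "_backup") -> list[Path]:
    """Move each file into <its parent>/<backup_subdir>/, never overwriting.

    Every source and destination is checked before anything moves, so a
    missing file or a name clash aborts the whole batch instead of leaving
    it half backed up.
    """
    plan = []
    for src in map(Path, files):
        if not src.is_file():
            raise FileNotFoundError(f"nothing to back up: {src}")
        dest = src.parent / backup_subdir / src.name
        if dest.exists():
            raise FileExistsError(f"backup already exists, refusing to overwrite: {dest}")
        plan.append((src, dest))
    for src, dest in plan:
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))
    return [dest for _, dest in plan]
```

Validating the whole batch first is a design choice of this sketch. It keeps a failed regen from scattering some year files into the backup directory while leaving others in place.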
- `--regen` CLI flag on either backfill script that wires `backup_before_delete` into a real call site. The helper ships unused; its first real call site lands with #48 ("Add snow-related variables (SWE, snowfall fraction, melt timing) for hydrology departure") if that issue's aggregation method requires re-running existing year files.
- `cd` is R-first; Python scripts are utility-tier with PEP 723 inline deps. Adding lint/test infra is a separate decision.
- `probe_edh_vars.py` and `test_edh_era5_land.py`. Both are one-shot validation scripts; the safeguards target production bulk-fetch.

Notes
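The #48 methodology noted in this section reduces daily `smlt` to a 7-day rolling sum and keeps the peak as `snowmelt_rate_peak`. Below is a minimal pure-Python sketch of that reduction for one grid cell. The trailing (rather than centred) window and the function names are assumptions, not the pinned method.

```python
from typing import List, Optional


def rolling_sum(values: List[float], window: int = 7) -> List[float]:
    """Trailing-window sums; the first full window ends at index window - 1."""
    return [sum(values[i - window + 1 : i + 1]) for i in range(window - 1, len(values))]


def snowmelt_rate_peak(daily_smlt: List[float], window: int = 7) -> Optional[float]:
    """Largest 7-day melt total in the series, or None if shorter than one window."""
    sums = rolling_sum(daily_smlt, window)
    return max(sums) if sums else None
```

On an xarray `DataArray` of daily values, the equivalent would be `daily.rolling(time=7).sum().max("time")`. That form leaves the first six steps as NaN and skips them when taking the max.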
#48 (snow variables) will be the first downstream caller of `_lib.py`. Methodology pinned in an issue comment: daily resolution sourced from `era5-land-daily-utc-v1.zarr`, a 7-day rolling sum of daily `smlt` for `snowmelt_rate_peak`, and the daily product preferred over hourly to dodge the `stepType=accum` trap that bit `tp` in #36.

Fixes #38
Relates to NewGraphEnvironment/sred-2025-2026#23
🤖 Generated with Claude Code