Full disclosure: this is a Claude-only review — no human second pass. Bugs and blockers should be real, but the nitpicks may be overzealous in places. Take the polish-level items with a grain of salt.
Overview
Restructures the pipeline for O(100k)-image runs: per-exposure work dirs, a new run_job_sp_canfar_v2.0.bash driver dispatching bit-coded jobs across tile- and exposure-level runners, and a new exp_utils.get_exp_output_files helper that lets tile-level modules discover files produced by per-exposure runners. Also: read_ext_cat module (ASCII SExtractor → FITS-LDAC, enables using Stephen Gwyn's external UNIONS catalogue for tile detection); Dockerfile rewrite (base swapped to images.canfar.net/skaha/astroml:latest, cdsclient → astroquery, new Dockerfile.jupyter); Vizier retry loop with server fallback.
Scope: 54 files, +26.5k/−306. Stripping the 23k-line r-band tile list leaves ~3400 lines of actual code change.
Blocking
scripts/sh/init_run_v2.0.sh:108 — syntax error, script will not parse. The line echo " ├── exp/ is missing its closing quote; bash -n init_run_v2.0.sh reports syntax error near unexpected token '(' on line 111. Since this is the first script users run for v2.0, it's a hard block.
Dockerfile.jupyter:4 — FROM shapepipe-base references an image that isn't built by anything in the repo or CI. Commit history shows Dockerfile.base was intentionally removed without updating this. Clean builds will fail. Either restore Dockerfile.base and wire it into the workflow, or change to FROM ghcr.io/cosmostat/shapepipe:<tag>.
pyproject.toml:33 — "setuptools<81" with no rationale. Almost certainly a workaround for a transitive build regression (skyproj / pyccl / similar). Add an inline # comment naming the cause, so it can be unpinned later when upstream catches up.
pyproject.toml:13 — stray #"shear_psf_leakage", dangling above the dependencies = [...] list. Delete or move inside with a reason.
Bugs & risks
src/shapepipe/modules/mask_package/mask.py:512 — Vizier.SERVER = server mutates class-level state. Fine in a single process, but under SMP parallelism two workers can stomp on each other mid-query. Either construct a fresh Vizier(server=...) per call (if supported by the installed astroquery), or serialize Vizier access. Same pattern in scripts/python/create_star_cat.py.
src/shapepipe/modules/read_ext_cat_package/read_ext_cat.py:223–224 — silent ID overflow. tile_id = int(parts[0]) * 1000 + int(parts[1]) assumes parts[1] < 1000. CFIS dec indices are 3-digit today, but nothing enforces that: a future parts[1] == 1000 would produce the same ID as the tile with parts[0] + 1 and parts[1] == 0. Add an assertion or widen the multiplier.
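A minimal sketch of the assertion variant, assuming the two numeric parts are already parsed to integers. The function name make_tile_id and its multiplier parameter are hypothetical and not taken from the PR:

```python
def make_tile_id(ra_idx: int, dec_idx: int, multiplier: int = 1000) -> int:
    """Pack two tile indices into one integer ID, refusing values that would collide.

    Hypothetical helper for illustration; the PR computes the ID inline.
    """
    # The packing is only unique while dec_idx stays strictly below the multiplier.
    if not 0 <= dec_idx < multiplier:
        raise ValueError(
            f"dec index {dec_idx} does not fit below multiplier {multiplier}"
        )
    return ra_idx * multiplier + dec_idx


# Without the guard, (123, 1000) and (124, 0) would map to the same ID.
assert 123 * 1000 + 1000 == 124 * 1000 + 0
```

Raising instead of silently packing turns a future index-range change into an immediate, attributable failure rather than a duplicate ID downstream.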
src/shapepipe/pipeline/dependency_handler.py:30 — def __init__(..., exe_to_module={}) mutable default arg. The existing dependencies=[], executables=[] also have this, but new code shouldn't propagate. Use None and normalize inside.
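The None-and-normalize pattern could look like the sketch below. DependencyHandlerSketch is a stand-in class, not the real DependencyHandler, and the argument order is an assumption since the review only quotes part of the signature:

```python
class DependencyHandlerSketch:
    """Stand-in class showing None defaults normalized inside __init__."""

    def __init__(self, dependencies=None, executables=None, exe_to_module=None):
        # Build a fresh container per instance; copying also protects the
        # caller's objects from later mutation through this instance.
        self.dependencies = list(dependencies) if dependencies is not None else []
        self.executables = list(executables) if executables is not None else []
        self.exe_to_module = dict(exe_to_module) if exe_to_module is not None else {}


first = DependencyHandlerSketch()
second = DependencyHandlerSketch()
first.exe_to_module["sex"] = "sextractor_runner"
# With a mutable default, 'second' would see the entry added to 'first'.
assert second.exe_to_module == {}
```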
scripts/sh/job_sp_canfar_v2.0.bash:170 — fragile path walk. export SP_EXP=$(realpath "$SP_RUN/../../../exp") assumes SP_RUN sits exactly three levels below the v2.0 root. If invoked from a scratch copy or test tree, SP_EXP silently points elsewhere. Pass it explicitly via env var or argument.
Duplicated Vizier retry logic. create_star_cat.py and mask.py carry near-identical server lists, timeouts, and backoff loops. Factor into one helper (cs_util or shapepipe.utilities.vizier) before they drift.
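One possible shape for such a shared helper, kept independent of astroquery so both call sites can wrap their own query in a callable. The name query_with_fallback, its parameters, and the exponential backoff schedule are assumptions for illustration, not what the PR implements:

```python
import time


def query_with_fallback(query, servers, retries_per_server=2, backoff_s=1.0,
                        sleep=time.sleep):
    """Call query(server) on each server in turn, retrying with backoff.

    Hypothetical helper: 'query' is any callable that takes a server name and
    either returns a result or raises. 'sleep' is injectable for testing.
    """
    last_error = None
    for server in servers:
        for attempt in range(retries_per_server):
            try:
                return query(server)
            except Exception as err:  # broad on purpose: network errors vary
                last_error = err
                # Exponential backoff: backoff_s, 2 * backoff_s, ...
                sleep(backoff_s * 2 ** attempt)
    raise RuntimeError(f"all servers failed: {list(servers)}") from last_error
```

Because the server is passed to the callable rather than written to shared class state, a helper like this would also sidestep the Vizier.SERVER mutation concern raised above, provided each call site builds its own query object.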
Code quality
scripts/sh/job_sp_canfar_v2.0.bash:206–226 — the command function's else branch reads $4, $5, $6 from the caller. But all call sites pass 2 args (via command_sp). The branch appears dead. Either remove or document what it was for.
scripts/sh/job_sp_canfar_v2.0.bash:250–255 — command_sp is a pure passthrough to command. Delete the wrapper.
src/shapepipe/modules/read_ext_cat_runner.py:33–34 — docstring says "runs multi-epoch post-processing to add per-exposure HDUs", but make_post_process is only called when MAKE_POST_PROCESS = True in the config. Note the optionality.
Module name read_ext_cat is vague — this is specifically an ASCII-SExtractor → FITS-LDAC converter. Something like read_ext_sexcat_runner would signal scope.
scripts/sh/init_run_v2.0.sh:61 — sed 's/CFIS\.\([0-9]*\)\..*/\1/' silently emits the original line for non-matches. A grep -oE pipeline would fail loudly.
scripts/sh/run_job_sp_canfar_v2.0.bash:427–428 — hardcoded CONDA_PREFIX=$HOME/.conda/envs/shapepipe is fine for CANFAR, but elsewhere the path will not exist and nothing checks for that.
Performance
read_ext_cat.py:232 loads each tile image fully into RAM via hdul[0].data.astype(np.float32). CFIS tiles are ~320 MB, so OK, but the .astype forces a full copy. memmap=True + per-vignet slicing would halve peak memory per worker.
get_exp_output_files does a glob per exposure per tile-level invocation. For O(100k) images this is O(N_exp) globs per tile per job. Fine on fast storage; worth watching on slow shared mounts.
Tests
Zero new tests. exp_utils.get_exp_output_files is trivial to test with tmpdirs; read_ext_cat.make_ldac_from_ascii can round-trip a synthetic catalog. Same gap we've flagged on #702 and #699; worth naming as a pattern and resolving.
Positives worth naming
The exp_utils abstraction is clean and well-documented.
_check_executable now including the module name in the error message is a nice UX win.
merge_headers_runner's dual-mode (tile-level via EXP_BASE_DIR vs per-exposure) is a clean extension rather than a fork.
Dockerfile base switch to images.canfar.net/skaha/astroml is sensible — avoids rebuilding the scientific stack and cuts build time significantly.
The Vizier retry logic itself (servers + backoff) is a correct fix for flaky astroquery behaviour, pending the concurrency caveat above.
Recommendation
Request changes on the two blockers (init_run_v2.0.sh syntax, Dockerfile.jupyter base) and the two pyproject.toml hygiene items. The rest is worth filing but not gating.