V2.0 by martinkilbinger · Pull Request #706 · CosmoStat/shapepipe

martinkilbinger · 2026-04-07T14:08:00Z

Summary

New directory structure to better handle runs with O(100,000) images.
Updated job scripts to run tile-based jobs including required exposures.
Dockerfile.jupyter to deploy a Jupyter notebook session to the CANFAR Science Portal.

Reviewer Checklist

Reviewers should tick the following boxes before approving and merging the PR.

cailmdaley · 2026-04-21T06:19:30Z

Full disclosure: this is a Claude-only review — no human second pass. Bugs and blockers should be real, but the nitpicks may be overzealous in places. Take the polish-level items with a grain of salt.

Overview

Restructures the pipeline for O(100k)-image runs: per-exposure work dirs, a new run_job_sp_canfar_v2.0.bash driver dispatching bit-coded jobs across tile- and exposure-level runners, and a new exp_utils.get_exp_output_files helper that lets tile-level modules discover files produced by per-exposure runners. Also: read_ext_cat module (ASCII SExtractor → FITS-LDAC, enables using Stephen Gwyn's external UNIONS catalogue for tile detection); Dockerfile rewrite (base swapped to images.canfar.net/skaha/astroml:latest, cdsclient → astroquery, new Dockerfile.jupyter); Vizier retry loop with server fallback.

Scope: 54 files, +26.5k/−306. Stripping the 23k-line r-band tile list leaves ~3400 lines of actual code change.
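
To make the bit-coded dispatch mentioned in the overview concrete, here is a minimal sketch of how a combined job number could be split into individual jobs. It assumes that each job is identified by a distinct power of two and that the driver receives their sum; the function name decode_job_mask is hypothetical and not taken from the PR.

```python
def decode_job_mask(job_mask: int) -> list[int]:
    """Split a bit-coded job number into its individual power-of-two jobs.

    Assumption: every job corresponds to exactly one bit, so a request
    for jobs 32 and 256 is passed as 32 + 256 = 288.
    """
    if job_mask < 0:
        raise ValueError("job mask must be non-negative")
    # Test each bit position up to the highest one set in the mask.
    return [1 << i for i in range(job_mask.bit_length()) if job_mask & (1 << i)]


if __name__ == "__main__":
    print(decode_job_mask(3))    # [1, 2]
    print(decode_job_mask(288))  # [32, 256]
```

Under that assumption, "running jobs 1 2" in the commit log would correspond to a job number of 3.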

Blocking

scripts/sh/init_run_v2.0.sh:108 — syntax error, script will not parse. echo " ├── exp/ missing the closing quote; bash -n init_run_v2.0.sh reports syntax error near unexpected token '(' on line 111. Since this is the first script users run for v2.0, it's a hard block.
Dockerfile.jupyter:4 — FROM shapepipe-base references an image that isn't built by anything in the repo or CI. Commit history shows Dockerfile.base was intentionally removed without updating this. Clean builds will fail. Either restore Dockerfile.base and wire it into the workflow, or change to FROM ghcr.io/cosmostat/shapepipe:<tag>.
pyproject.toml:33 — "setuptools<81" with no rationale. Almost certainly a workaround for a transitive build regression (skyproj / pyccl / similar). Add an inline # comment naming the cause, so it can be unpinned later when upstream catches up.
pyproject.toml:13 — stray #"shear_psf_leakage", dangling above the dependencies = [...] list. Delete or move inside with a reason.
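
The first blocker is the kind of error a parse-only check can catch before merge. Below is a minimal sketch of such a check, wrapping the same bash -n call quoted above. The helper name shell_syntax_errors is hypothetical, and wiring it into the repository's CI is left open.

```python
import subprocess
from pathlib import Path


def shell_syntax_errors(script: Path) -> str:
    """Return bash's parse-error output for `script`, or '' if it parses.

    `bash -n` reads the script and checks its syntax without executing it.
    """
    result = subprocess.run(
        ["bash", "-n", str(script)],
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return ""
    # Fall back to the exit code if bash printed nothing to stderr.
    return result.stderr.strip() or f"bash -n exited with {result.returncode}"


if __name__ == "__main__":
    import sys

    # Usage: python check_shell.py scripts/sh/*.sh scripts/sh/*.bash
    failures = {p: shell_syntax_errors(Path(p)) for p in sys.argv[1:]}
    for path, message in failures.items():
        if message:
            print(f"{path}: {message}")
    sys.exit(1 if any(failures.values()) else 0)
```

Run over scripts/sh/, a check like this would report an unterminated quote without executing any pipeline step.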

Bugs & risks

src/shapepipe/modules/mask_package/mask.py:512 — Vizier.SERVER = server mutates class-level state. Fine in a single process, but under SMP parallelism two workers can stomp on each other mid-query. Either construct a fresh Vizier(server=...) per call (if supported by the installed astroquery), or serialize Vizier access. Same pattern in scripts/python/create_star_cat.py.
src/shapepipe/modules/read_ext_cat_package/read_ext_cat.py:223–224 — silent ID overflow. tile_id = int(parts[0]) * 1000 + int(parts[1]) assumes parts[1] < 1000. CFIS dec indices are 3-digit today, but nothing enforces that: a future parts[1] == 1000 would produce the same ID as (parts[0] + 1, 0). For example, (123, 1000) and (124, 0) both map to 124000. Add an assertion or widen the multiplier.
src/shapepipe/pipeline/dependency_handler.py:30 — def __init__(..., exe_to_module={}) mutable default arg. The existing dependencies=[], executables=[] also have this, but new code shouldn't propagate. Use None and normalize inside.
scripts/sh/job_sp_canfar_v2.0.bash:170 — fragile path walk. export SP_EXP=$(realpath "$SP_RUN/../../../exp") assumes exactly three directories between SP_RUN and the v2.0 root. If invoked from a scratch copy or test tree, SP_EXP silently points elsewhere. Pass explicitly via env var or argument.
Duplicated Vizier retry logic. create_star_cat.py and mask.py carry near-identical server lists, timeouts, and backoff loops. Factor into one helper (cs_util or shapepipe.utilities.vizier) before they drift.
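
The shared helper suggested in the last item could look roughly like the sketch below. It is deliberately independent of astroquery: the caller passes a function that takes a server name and performs the query, so the helper never touches class-level state such as Vizier.SERVER. The names query_with_fallback, retries_per_server and base_delay are hypothetical, and the retry counts are placeholders rather than the values used in the PR.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def query_with_fallback(
    query: Callable[[str], T],
    servers: Sequence[str],
    retries_per_server: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try `query(server)` on each server in turn, with exponential backoff.

    `query` is responsible for building its own client for the given
    server, so concurrent workers do not share mutable state.
    """
    last_error = None
    for server in servers:
        for attempt in range(retries_per_server):
            try:
                return query(server)
            except Exception as err:  # a real helper should catch narrower errors
                last_error = err
                # Back off before retrying the same server; switch
                # servers immediately once its retries are exhausted.
                if attempt < retries_per_server - 1:
                    sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"all servers failed: {list(servers)}") from last_error
```

Both create_star_cat.py and mask.py could then call one function with their own query callable. Whether that callable can construct a per-call Vizier instance for a specific server depends on the installed astroquery, as noted above.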

Code quality

scripts/sh/job_sp_canfar_v2.0.bash:206–226 — the command function's else branch reads $4, $5, $6 from the caller. But all call sites pass 2 args (via command_sp). The branch appears dead. Either remove or document what it was for.
scripts/sh/job_sp_canfar_v2.0.bash:250–255 — command_sp is a pure passthrough to command. Delete the wrapper.
src/shapepipe/modules/read_ext_cat_runner.py:33–34 — docstring says "runs multi-epoch post-processing to add per-exposure HDUs", but make_post_process is only called when MAKE_POST_PROCESS = True in the config. Note the optionality.
Module name read_ext_cat is vague — this is specifically an ASCII-SExtractor → FITS-LDAC converter. Something like read_ext_sexcat_runner would signal scope.
scripts/sh/init_run_v2.0.sh:61 — sed 's/CFIS\.\([0-9]*\)\..*/\1/' silently emits the original line for non-matches. A grep -oE pipeline would fail loudly.
scripts/sh/run_job_sp_canfar_v2.0.bash:427–428 — hardcoded CONDA_PREFIX=$HOME/.conda/envs/shapepipe is fine for CANFAR, but on other systems it silently points to a path that may not exist.
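
The "fail loudly" behaviour asked for in the init_run_v2.0.sh item can be sketched in Python as follows. It assumes the sed expression is meant to extract the first numeric field after "CFIS."; the function name and the example file name are hypothetical.

```python
import re

# First run of digits directly after "CFIS." and followed by a dot.
_TILE_RE = re.compile(r"CFIS\.([0-9]+)\.")


def first_tile_index(name: str) -> str:
    """Return the first numeric field of a CFIS tile name.

    Unlike a sed substitution, which passes non-matching lines through
    unchanged, this raises as soon as the input does not match.
    """
    match = _TILE_RE.search(name)
    if match is None:
        raise ValueError(f"not a CFIS tile name: {name!r}")
    return match.group(1)


if __name__ == "__main__":
    print(first_tile_index("CFIS.123.456.r.fits"))  # 123
```

The same effect is available in the shell by checking the exit status of grep -oE, which is non-zero when nothing matches.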

Performance

read_ext_cat.py:232 loads each tile image fully into RAM via hdul[0].data.astype(np.float32). CFIS tiles are ~320 MB, so OK, but the .astype forces a full copy. memmap=True + per-vignet slicing would halve peak memory per worker.
get_exp_output_files does a glob per exposure per tile-level invocation. For O(100k) images this is O(N_exp) globs per tile per job. Fine on fast storage; worth watching on slow shared mounts.
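
The memory-mapping suggestion in the first item can be illustrated with a small sketch. It uses numpy.memmap on a raw array file as a stand-in for a FITS tile. With real tiles the relevant call would be astropy.io.fits.open(..., memmap=True), and whether a given file can be mapped lazily depends on how its pixel data are stored, so treat this as an illustration of the slicing pattern only. The function name read_vignet and its parameters are hypothetical.

```python
import numpy as np


def read_vignet(path, shape, center, half_size, dtype=np.float64):
    """Read one square cutout from an on-disk array without loading it all.

    `path` points to a raw binary file holding an array of `shape` and
    `dtype`; `center` is (row, column) of the cutout centre.
    """
    tile = np.memmap(path, dtype=dtype, mode="r", shape=shape)
    y, x = center
    # Clip the cutout to the array bounds.
    y0, y1 = max(y - half_size, 0), min(y + half_size + 1, shape[0])
    x0, x1 = max(x - half_size, 0), min(x + half_size + 1, shape[1])
    # Only the cutout is copied and converted to float32; the rest of
    # the file stays on disk.
    return np.array(tile[y0:y1, x0:x1], dtype=np.float32)
```

The point of the pattern is that the float32 conversion is applied per cutout rather than to the full tile, so the full-size copy made by .astype on the whole array is never allocated.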

Tests

Zero new tests. exp_utils.get_exp_output_files is trivial to test with tmpdirs; read_ext_cat.make_ldac_from_ascii can round-trip a synthetic catalog. Same gap we've flagged on #702 and #699 — worth naming as a pattern and resolving.
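
As an indication of how small such a test can be, here is a sketch of a temporary-directory test for exposure-file discovery. The real signature of exp_utils.get_exp_output_files is not shown in this review, so find_exp_outputs below is a hypothetical stand-in with an assumed directory layout (one sub-directory per exposure). The test body would need adapting to the actual helper.

```python
import tempfile
from pathlib import Path


def find_exp_outputs(exp_base_dir, pattern):
    """Hypothetical stand-in: files matching `pattern` under each exposure dir."""
    return sorted(Path(exp_base_dir).glob(f"*/{pattern}"))


def test_find_exp_outputs():
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        # Build two fake per-exposure work directories.
        for exp in ("2000001", "2000002"):
            out = base / exp / "output"
            out.mkdir(parents=True)
            (out / f"headers-{exp}.npy").touch()
        # A file that must not be picked up.
        (base / "2000001" / "output" / "unrelated.log").touch()

        found = find_exp_outputs(base, "output/headers-*.npy")
        assert [p.name for p in found] == [
            "headers-2000001.npy",
            "headers-2000002.npy",
        ]


if __name__ == "__main__":
    test_find_exp_outputs()
```

Under pytest, the built-in tmp_path fixture would replace the explicit TemporaryDirectory.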

Positives worth naming

The exp_utils abstraction is clean and well-documented.
_check_executable now including the module name in the error message is a nice UX win.
merge_headers_runner's dual-mode (tile-level via EXP_BASE_DIR vs per-exposure) is a clean extension rather than a fork.
Dockerfile base switch to images.canfar.net/skaha/astroml is sensible — avoids rebuilding the scientific stack and cuts build time significantly.
The Vizier retry logic itself (servers + backoff) is a correct fix for flaky astroquery behaviour, pending the concurrency caveat above.

Recommendation

Request changes on the two blockers (init_run_v2.0.sh syntax, Dockerfile.jupyter base) and the two pyproject.toml hygiene items. The rest is worth filing but not gating.

martinkilbinger added 17 commits March 30, 2026 08:48

added r-band tile list

a7e8cdd

added init v2.0 script

a74d290

v2.0 scripts

333b677

addd proj libraries for skyproj (Dockerfile)

f4c4f95

Added 202604 tiles

c9a3655

pipeline canfar updates

3595368

v2.0: running jobs 1 2

30bb5f6

Merge remote-tracking branch 'upstream/develop' into v2.0

609b99d

running Fe

1fe4410

running v2.0 to split exp merge headers

10d39fa

remove cdsclient from Dockerfile, now using astroquery

696d48a

Merge branch 'v2.0' of github.com:martinkilbinger/shapepipe-1 into v2.0

96cfcc0

v2.0 masking updated and fixed

84bbd7d

Merge remote-tracking branch 'origin/v2.0' into v2.0

8a4affc

Created common Dockerfile.base for both standard and jupyter Dockerfiles

8e7cd1a

Removed Dockerfile.base

9261488

Fixing Dockerfile

57972e5

martinkilbinger self-assigned this Apr 7, 2026

martinkilbinger added the enhancement New feature or request label Apr 7, 2026

martinkilbinger added this to ShapePipe Dev Apr 7, 2026

github-project-automation bot moved this to To do in ShapePipe Dev Apr 7, 2026

martinkilbinger requested a review from cailmdaley April 7, 2026 14:08

martinkilbinger added 4 commits April 8, 2026 13:32

v2.0 running until -j 32

f7e0334

Merge remote-tracking branch 'origin/v2.0' into v2.0

3bff6b3

towards 128

cd3e8b4

v2.0 running until job 256, UNIONS (external) cat

be58050

martinkilbinger added 4 commits April 21, 2026 14:50

Fixed missing packages in pyproject, and proper use of base Dockerfile

d7049ec

Merge remote-tracking branch 'origin/v2.0' into v2.0

6815ee1

trying to fix CD

1f6c0b5

Merge remote-tracking branch 'upstream/develop' into v2.0

2a31621

martinkilbinger wants to merge 25 commits into CosmoStat:develop from martinkilbinger:v2.0
