Add chunk_mode for on-the-fly grid chunking + wiki updates #35

Merged
oksuzian merged 1 commit into Mu2e:main from oksuzian:field-off-option
Apr 22, 2026

@oksuzian
Collaborator

json2jobdef: new input_data shape {<cvmfs-path>: {"chunk_lines": N}}
counts lines and sets njobs at submit time; stores tbs.chunk_mode so
each grid worker sed-extracts its own slice from the cvmfs source at
runtime — full N-way parallelism with zero chunk staging.
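
The submit-time arithmetic described above can be sketched as follows. This is a minimal illustration, not the code in json2jobdef: the helper names (`count_lines`, `njobs_for_chunking`, `resolve_chunk_mode`) and the layout of the returned dictionary are assumptions made for the example, and it assumes a single chunked source per `input_data` entry.

```python
import math


def count_lines(path):
    """Count the lines in a text file (hypothetical helper)."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def njobs_for_chunking(total_lines, chunk_lines):
    """Number of jobs needed so that every line falls in exactly one chunk."""
    if chunk_lines <= 0:
        raise ValueError("chunk_lines must be positive")
    # Round up: a final partial chunk still needs its own job.
    return math.ceil(total_lines / chunk_lines)


def resolve_chunk_mode(input_data):
    """Turn {<path>: {"chunk_lines": N}} into njobs plus a chunk_mode record.

    The returned keys are illustrative; the real jobdef layout may differ.
    """
    if len(input_data) != 1:
        raise ValueError("this sketch expects exactly one chunked source")
    source, options = next(iter(input_data.items()))
    chunk_lines = int(options["chunk_lines"])
    njobs = njobs_for_chunking(count_lines(source), chunk_lines)
    return {
        "njobs": njobs,
        "chunk_mode": {"source": source, "chunk_lines": chunk_lines},
    }
```

Under these assumptions, a 10-line source with `chunk_lines` of 4 yields three jobs, the last of which reads only two lines.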

prod_utils: runtime chunk extraction with shlex-quoted sed; file://
protocol for dir: inloc; push_data takes track_parents:bool so dir:
parents (not SAM-registered) don't fail pushOutput's printJson
--parents.
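
A shlex-quoted sed extraction of one worker's slice might look like the sketch below. The function names, the zero-based job index, and the output redirection are assumptions for illustration; only `shlex.quote` and the standard `sed -n 'M,Np'` address syntax are taken as given.

```python
import shlex


def chunk_line_range(job_index, chunk_lines):
    """Return the 1-based inclusive (first, last) lines for a zero-based job index."""
    first = job_index * chunk_lines + 1
    last = first + chunk_lines - 1
    return first, last


def build_sed_command(source, job_index, chunk_lines, dest):
    """Build a shell command that writes this job's slice of `source` to `dest`.

    Hypothetical helper: the real prod_utils code may assemble the command differently.
    """
    first, last = chunk_line_range(job_index, chunk_lines)
    # Print only lines first..last, then quit at `last` so sed does not
    # read the rest of a large source file.
    script = f"{first},{last}p;{last}q"
    # Quote every interpolated value so paths with spaces or shell
    # metacharacters cannot alter the command.
    return (
        f"sed -n {shlex.quote(script)} {shlex.quote(source)}"
        f" > {shlex.quote(dest)}"
    )
```

If the last chunk extends past the end of the file, sed simply prints the lines that exist, so the final job needs no special case in this sketch.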

jobdef: PBISequence source type now requires either inputs+merge_factor
or chunk_mode at jobdef time, preventing the cryptic fileNames:@nil
failure that would otherwise surface inside mu2e.
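
The jobdef-time check can be pictured roughly as below. The configuration keys (`source_type`, `inputs`, `merge_factor`, `chunk_mode`) and the function name are hypothetical stand-ins for whatever the jobdef code actually reads; the point is only that a misconfigured PBISequence job fails early with a readable message.

```python
def validate_pbi_sequence(config):
    """Raise ValueError if a PBISequence job has no usable input definition.

    Hypothetical sketch: key names are assumptions, not the real jobdef schema.
    """
    if config.get("source_type") != "PBISequence":
        return  # Other source types are not covered by this check.

    has_inputs = bool(config.get("inputs")) and config.get("merge_factor") is not None
    has_chunk_mode = bool(config.get("chunk_mode"))

    if not (has_inputs or has_chunk_mode):
        raise ValueError(
            "PBISequence requires either inputs + merge_factor or chunk_mode; "
            "without one of them the job would start with no input file names"
        )
```

Failing here costs nothing, whereas the same mistake found inside mu2e costs a grid slot and produces a far less obvious error.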

Tests: 21 new tests (181 total) covering chunk_mode configuration,
source.runNumber sequencing, and event_id_per_index evaluation.

Wiki: chunk_mode documented as canonical PBI shape;
input-data-chunk-mode page added; overview refreshed with scale-test
open question; lint-2026-04-22 filed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@oksuzian oksuzian merged commit dcd25fd into Mu2e:main Apr 22, 2026