Add chunk_mode for on-the-fly grid chunking + wiki updates #35
Merged
json2jobdef: the new input_data shape {<cvmfs-path>: {"chunk_lines": N}}
counts the source's lines and sets njobs at submit time. It stores
tbs.chunk_mode so each grid worker sed-extracts its own slice from the
cvmfs source at runtime, giving full N-way parallelism with zero chunk
staging.
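
The submit-time arithmetic described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the function names `count_jobs` and `chunk_range` are hypothetical, and it assumes 0-based job indices mapped onto 1-based, inclusive line ranges (the convention sed uses).

```python
def count_jobs(total_lines: int, chunk_lines: int) -> int:
    """Number of jobs needed to cover total_lines in chunks of chunk_lines."""
    if chunk_lines <= 0:
        raise ValueError("chunk_lines must be positive")
    # Integer ceiling division: a final partial chunk still needs a job.
    return (total_lines + chunk_lines - 1) // chunk_lines


def chunk_range(index: int, chunk_lines: int, total_lines: int) -> tuple[int, int]:
    """1-based inclusive (first, last) line numbers for job `index` (0-based)."""
    first = index * chunk_lines + 1
    if index < 0 or first > total_lines:
        raise IndexError("job index is outside the source file")
    # The last chunk is clipped to the end of the file.
    last = min(first + chunk_lines - 1, total_lines)
    return first, last
```

With a 10-line source and `chunk_lines` of 4, this yields three jobs, the last of which covers only lines 9 to 10.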
prod_utils: runtime chunk extraction with shlex-quoted sed; file://
protocol for dir: inloc; push_data takes track_parents:bool so dir:
parents (not SAM-registered) don't fail pushOutput's printJson
--parents.
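
One way the shlex-quoted sed extraction might look is sketched below. The helper name `build_sed_command`, the output redirection, and the example paths are assumptions made for illustration; only `shlex.quote` and the standard `sed -n 'M,Np'` idiom are taken as given.

```python
import shlex


def build_sed_command(source_path: str, first: int, last: int, out_path: str) -> str:
    """Build a shell command that copies lines first..last of source_path to out_path.

    Hypothetical helper: every interpolated value goes through shlex.quote so
    that paths containing spaces or shell metacharacters stay single arguments.
    """
    # 'M,Np' prints the range; 'Nq' quits after the last wanted line so sed
    # does not keep reading the rest of a large source file.
    script = f"{first},{last}p;{last}q"
    return (
        f"sed -n {shlex.quote(script)} "
        f"{shlex.quote(source_path)} > {shlex.quote(out_path)}"
    )
```

Quoting the sed script as well as the paths matters here because the `;` separator would otherwise be interpreted by the shell.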
jobdef: PBISequence source type now requires either inputs+merge_factor
or chunk_mode at jobdef time, preventing the cryptic fileNames:@nil
failure that would otherwise surface inside mu2e.
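
The jobdef-time check could be sketched roughly as below. The function name `validate_pbi_source` and the flat dict layout are hypothetical; only the key names `inputs`, `merge_factor`, and `chunk_mode` come from the description above. Whether the two shapes are mutually exclusive is not stated, so this sketch only requires that at least one is present.

```python
def validate_pbi_source(config: dict) -> None:
    """Raise early if a PBISequence job has no usable input specification.

    Hypothetical sketch: accepts either inputs plus merge_factor, or chunk_mode.
    """
    has_inputs = bool(config.get("inputs")) and config.get("merge_factor") is not None
    has_chunk_mode = bool(config.get("chunk_mode"))
    if not (has_inputs or has_chunk_mode):
        raise ValueError(
            "PBISequence source requires either inputs+merge_factor or chunk_mode"
        )
```

Failing at this point gives a message that names the missing configuration, instead of an empty file list surfacing later at run time.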
Tests: 21 new tests (181 total) covering chunk_mode configuration,
source.runNumber sequencing, and event_id_per_index evaluation.
Wiki: chunk_mode documented as canonical PBI shape;
input-data-chunk-mode page added; overview refreshed with scale-test
open question; lint-2026-04-22 filed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>