Use safe concatenation. #758

Merged
riga merged 2 commits into master from fix/safe_concaat on Nov 13, 2025

@riga riga commented Nov 13, 2025

In awkward, applying a mask to a larger array does not produce a copy. The result appears shorter along the masked dimension, but internally it still references the original, unmodified data. This is visible in array.nbytes, which still reports the size of the original array.

Passing such arrays to ak.concatenate concatenates the original data as well. This is undesirable in most columnflow (cf) use cases because it needlessly inflates the memory footprint.

Wrapping these masked arrays in ak.Array or ak.copy does not apply the mask. It merely returns a view or a full copy of the array, which still references the unmodified data. Only ak.to_packed fully resolves masks (in addition to not-yet-materialized columns, which it was originally intended for). Fortunately, it does not copy arrays that are already fully materialized.

This PR adds a utility, ak_concatenate_safe, that behaves like ak.concatenate but first wraps each input array in ak.to_packed, avoiding the issues described above.

@riga riga requested a review from mafrahm November 13, 2025 14:03
@riga riga self-assigned this Nov 13, 2025
@riga riga added priority-high High priority stuff fix Fixes a bug labels Nov 13, 2025
@riga riga moved this to In Progress in columnflow v0.3.1 Nov 13, 2025

@mafrahm mafrahm left a comment

LGTM, just one question for my own understanding

@riga riga merged commit ffa1494 into master Nov 13, 2025
10 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in columnflow v0.3.1 Nov 13, 2025
@riga riga deleted the fix/safe_concaat branch November 13, 2025 16:09
Labels

fix Fixes a bug priority-high High priority stuff

Projects

Status: Done

2 participants