Per-datatype rechunking control #272

Merged
merged 2 commits into from Jun 15, 2020

JelleAalbers
Member

This allows a multi-output plugin to specify which of its output dtypes should be rechunked. You can set the rechunk_on_save attribute to:

  • True: rechunk all outputs
  • False: do not rechunk anything
  • A dict or immutabledict mapping each output dtype to True/False: per-datatype control.

The main use case is PulseProcessing in straxen, which outputs small monitoring datatypes (pulse_counts, veto_regions) besides the big records dtype. We can't rechunk records since that would make their expensive saving operation non-parallelizable, but we should rechunk pulse_counts and veto_regions to avoid saving many 3 kB files that give rucio a headache.
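
To make the three accepted forms of rechunk_on_save concrete, here is a minimal sketch of how such a setting could be resolved for a single output data type. The helper `should_rechunk` and the variable `pulse_processing_setting` are hypothetical names for illustration and are not taken from the strax code; `MappingProxyType` from the standard library stands in for an immutabledict, and raising on a missing key is an assumption about the desired behaviour.

```python
from collections.abc import Mapping
from types import MappingProxyType


def should_rechunk(rechunk_on_save, data_type):
    """Hypothetical helper: decide whether one output data type is rechunked.

    rechunk_on_save may be True, False, or a mapping of
    data type name -> True/False.
    """
    if isinstance(rechunk_on_save, bool):
        # A single flag applies to every output of the plugin.
        return rechunk_on_save
    if isinstance(rechunk_on_save, Mapping):
        # Per-datatype control. Assumption: a missing entry is an error
        # (KeyError) rather than silently falling back to a default.
        return bool(rechunk_on_save[data_type])
    raise TypeError(
        "rechunk_on_save must be a bool or a mapping of data type -> bool, "
        f"not {type(rechunk_on_save).__name__}"
    )


# Illustrative setting for the use case described above: leave the large
# records output alone, but rechunk the small monitoring outputs.
# MappingProxyType gives a read-only view, standing in for an immutabledict.
pulse_processing_setting = MappingProxyType({
    "records": False,
    "pulse_counts": True,
    "veto_regions": True,
})

for name in ("records", "pulse_counts", "veto_regions"):
    print(name, should_rechunk(pulse_processing_setting, name))
```

Using a read-only mapping as the class-level value avoids one plugin instance accidentally mutating a setting shared by all instances, which is presumably why an immutable dict is accepted here.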
