Fix define runs and allow storing of superruns #472
Please do not update with the master branch for the moment.
… into fix_define_runs
Thanks Daniel, I look forward to using this functionality, I have quite some questions and suggestions as per below.
Co-authored-by: Joran Angevaare <jorana@nikhef.nl>
@WenzDaniel, one thing I did not think of before: would superruns allow for post-combining nv + tpc runs? I think / hope so, it would be extremely useful!
Uff, hard question. I have to admit I do not know. I never planned on creating superruns from runs that overlap in time. But there are some additional challenges besides the technical ones, e.g. the time alignment between the different detectors, which is different for each subrun.
… into fix_define_runs
I addressed all outstanding comments. I will have a last look tomorrow morning with a fresh pair of eyes. After that we can merge.
Okay, I am happy to merge if there are no other comments.
Nice! Thanks Daniel. For bookkeeping, can you mention what happens if we query in between subruns?
Thanks Daniel for the changes
# Make subruns if they do not exist, since we do not
# want to store data twice in case we store the superrun
# we have to deactivate the storage converter mode.
stc_mode = self.context_config['storage_converter']
self.context_config['storage_converter'] = False
self.make(list(sub_run_spec.keys()), d)
self.context_config['storage_converter'] = stc_mode
Did you not want to remove this?
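For reference, the save-and-restore pattern in the snippet under review can be made exception-safe with a context manager. This is only a minimal sketch, assuming the context config behaves like a plain dict; `temporary_config` is a hypothetical helper and not part of strax.

```python
from contextlib import contextmanager


@contextmanager
def temporary_config(config, **overrides):
    """Temporarily override keys in a dict-like config.

    Hypothetical helper for illustration only, not a strax function.
    """
    missing = object()  # sentinel for keys that did not exist before
    previous = {key: config.get(key, missing) for key in overrides}
    config.update(overrides)
    try:
        yield config
    finally:
        # Restore the old values even if the body raised.
        for key, value in previous.items():
            if value is missing:
                config.pop(key, None)
            else:
                config[key] = value


# Stand-in for self.context_config (assumed to be a plain dict here).
context_config = {'storage_converter': True}

with temporary_config(context_config, storage_converter=False):
    # This is where the subruns would be made.
    inside = context_config['storage_converter']

after = context_config['storage_converter']
```

With this pattern the original setting is restored even when making the subruns raises, which the manual save-and-restore version does not guarantee.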
@@ -747,10 +759,15 @@ def concat_loader(*args, **kwargs):
to_compute[d] = p
for dep_d in p.depends_on:
    check_cache(dep_d)
@@ -51,7 +51,8 @@ def __init__(self,
 allow_lazy=True,
 max_workers=None,
 max_messages=4,
-timeout=60):
+timeout=60,
+is_superrun=False,):
Suggested change:
-is_superrun=False,):
+is_superrun=False,
+):
Feel free to merge 👍
What is the problem / what does the code in this PR do
Superruns are composed of many smaller subruns. They can be used to group runs according to some logical structure. So far we have not used them because, due to some changes in the past, the function define_run in run_selection.py broke. With this PR I would like to achieve two things, as stated in the title: fix define_run and allow storing of superruns.
Can you briefly describe how it works?
Regarding point 2, I had to add some changes: I extended our chunks module such that we can concatenate chunks of different run_ids. For this purpose I added the following changes (plus some other changes for run definition and selection):
Chunks:
transform_chunk_to_superrun_chunk, which converts a regular chunk into a superrun chunk
Context:
Storage common:
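To illustrate the idea of concatenating chunks from different run_ids, here is a toy sketch. It is not the strax implementation: `ToyChunk`, `to_superrun_chunk` and `concatenate` are hypothetical stand-ins, and the assumption that a superrun chunk records the time range of each contributing subrun is made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyChunk:
    """Toy stand-in for a chunk (hypothetical, not strax.Chunk)."""
    run_id: str
    start: int
    end: int
    data: List[int]
    # For superrun chunks: subrun id -> (start, end). Assumed layout.
    subruns: Dict[str, Tuple[int, int]] = field(default_factory=dict)


def to_superrun_chunk(superrun_id: str, chunk: ToyChunk) -> ToyChunk:
    """Relabel a regular chunk with the superrun id and remember its origin."""
    return ToyChunk(
        run_id=superrun_id,
        start=chunk.start,
        end=chunk.end,
        data=list(chunk.data),
        subruns={chunk.run_id: (chunk.start, chunk.end)},
    )


def concatenate(chunks: List[ToyChunk]) -> ToyChunk:
    """Concatenate time-ordered, non-overlapping chunks sharing one run_id."""
    if not chunks:
        raise ValueError("Need at least one chunk")
    chunks = sorted(chunks, key=lambda c: c.start)
    if len({c.run_id for c in chunks}) != 1:
        raise ValueError("Chunks must share a run_id; convert them first")
    for prev, nxt in zip(chunks, chunks[1:]):
        if nxt.start < prev.end:
            raise ValueError("Chunks overlap in time")
    subruns: Dict[str, Tuple[int, int]] = {}
    for c in chunks:
        for rid, (s, e) in c.subruns.items():
            old = subruns.get(rid)
            subruns[rid] = (s, e) if old is None else (min(old[0], s), max(old[1], e))
    data = [x for c in chunks for x in c.data]
    return ToyChunk(chunks[0].run_id, chunks[0].start, chunks[-1].end, data, subruns)


a = ToyChunk('0001', 0, 10, [1, 2])
b = ToyChunk('0002', 20, 30, [3])
merged = concatenate([to_superrun_chunk('_super', c) for c in (a, b)])
```

In this toy model the gap between the two subruns (times 10 to 20) is covered by the merged chunk's range but contains no data, and chunks that overlap in time are rejected.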
Can you give a minimal working example (or illustrate with a figure)?
I made a notebook in which I compare the performance of 400 test subruns with that of a single superrun. I also checked the performance impact when applying cuts via cut-plugins. On average we get a speed boost of about a factor of 5-20.
Please also see the corresponding straxen PR: https://github.com/XENONnT/straxen/pull/554/files which includes an additional example notebook.