Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix define runs and allow storing of superruns #472

Merged
merged 68 commits into from Jul 15, 2021
Merged

Conversation

WenzDaniel
Copy link
Collaborator

@WenzDaniel WenzDaniel commented Jun 17, 2021

What is the problem / what does the code in this PR do
Superruns are composed of many smaller sub-runs. They can be used to group runs for some given logical structure. So far we have not used them since due to some changes in the past the function define_run in run_selection.py broke. With this PR I would like to achieve two things

  1. Fix define runs such that we can define superruns again.
  2. Add the possibility not only to create superruns on the run-metadata level, but also to "create" superruns by loading and rechunking the data of the subruns and storing them to disk.

Can you briefly describe how it works?
Regarding point 2. I had to add some changes. I extended our chunks module such that we can concatenate chunks of different run_ids. For this purpose I added the following changes (+ some other changes for run definition and selection):

Chunks:

  • Chunks of superruns have an additional field called subruns with min/max start/end of the all subrun chunks included in the corresponding superrun chunk
  • A new function transform_chunk_to_superrun_chunk which converts a regular chunk into a superrun chunk
  • concatanate chunks was also updated accordingly.

Context:

  • New option if true allow storing of new rechunked superrun chunks.
  • Omit lineage check for individual subruns this may collide with Fix lineage if per run default is not allowed #483
  • Fixed define_run according to latest changes. I left the code to make superruns from data although it does not fully work yet. If needed I will make the required changes in a later PR.
  • scan_runs added superrun support

Storage common:

  • Support for subrun chunk superrun chunk transformation

Can you give a minimal working example (or illustrate with a figure)?
I made a notebook in which I compare the performance of 400 test subruns compared to a single superrun. I also checked the performance impact when applying cuts via cut-plugins. In average we get a speed boost of about a factor of ~5-20.

Please also see the corresponding straxen PR: https://github.com/XENONnT/straxen/pull/554/files which includes an additional example notebook.

@WenzDaniel
Copy link
Collaborator Author

Please do not update with the master branch for the moment.

@WenzDaniel WenzDaniel marked this pull request as draft June 17, 2021 12:43
Copy link
Member

@JoranAngevaare JoranAngevaare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Daniel, I look forward to using this functionality, I have quite some questions and suggestions as per below.

docs/source/advanced/superrun.rst Show resolved Hide resolved
strax/chunk.py Outdated Show resolved Hide resolved
strax/chunk.py Outdated Show resolved Hide resolved
strax/chunk.py Outdated Show resolved Hide resolved
strax/chunk.py Outdated Show resolved Hide resolved
tests/test_superruns.py Show resolved Hide resolved
tests/test_superruns.py Show resolved Hide resolved
tests/test_superruns.py Outdated Show resolved Hide resolved
tests/test_superruns.py Outdated Show resolved Hide resolved
tests/test_superruns.py Outdated Show resolved Hide resolved
WenzDaniel and others added 6 commits July 13, 2021 14:06
Co-authored-by: Joran Angevaare <jorana@nikhef.nl>
Co-authored-by: Joran Angevaare <jorana@nikhef.nl>
Co-authored-by: Joran Angevaare <jorana@nikhef.nl>
Co-authored-by: Joran Angevaare <jorana@nikhef.nl>
Co-authored-by: Joran Angevaare <jorana@nikhef.nl>
Co-authored-by: Joran Angevaare <jorana@nikhef.nl>
@JoranAngevaare
Copy link
Member

@WenzDaniel maybe one thing I did not think of before, but would superruns allow for post-combining nv + tpc runs? I think / hope so, it would be extremely useful!

@WenzDaniel
Copy link
Collaborator Author

Uff hard question. I have to admit I do know. I never planed on creating superruns based on in time overlapping runs. But there are some additional challenges beside the technical once. E.g. the time alignment between the different detectors which is different for each subrun.

@WenzDaniel
Copy link
Collaborator Author

I addressed all outstanding comments. I will have a last look tomorrow morning with a fresh pair of eyes. After that we can merge.

@WenzDaniel
Copy link
Collaborator Author

Okay I am happy to merge if there are no other comments.

@JoranAngevaare
Copy link
Member

Nice! Thanks Daniel, for bookkeeping, can you mention what happens if we query in between subruns?

Copy link
Member

@JoranAngevaare JoranAngevaare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Daniel for the changes

Comment on lines +700 to +706
# Make subruns if they do not exist, since we do not
# want to store data twice in case we store the superrun
# we have to deactivate the storage converter mode.
stc_mode = self.context_config['storage_converter']
self.context_config['storage_converter'] = False
self.make(list(sub_run_spec.keys()), d)
self.context_config['storage_converter'] = stc_mode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you not wanted to remove this?

@@ -747,10 +759,15 @@ def concat_loader(*args, **kwargs):
to_compute[d] = p
for dep_d in p.depends_on:
check_cache(dep_d)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

@@ -51,7 +51,8 @@ def __init__(self,
allow_lazy=True,
max_workers=None,
max_messages=4,
timeout=60):
timeout=60,
is_superrun=False,):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
is_superrun=False,):
is_superrun=False,
):

@JoranAngevaare
Copy link
Member

Feel free to merge 👍

@WenzDaniel WenzDaniel merged commit a43c55a into master Jul 15, 2021
@WenzDaniel WenzDaniel deleted the fix_define_runs branch July 15, 2021 09:02
JoranAngevaare added a commit that referenced this pull request Jul 15, 2021
@JoranAngevaare JoranAngevaare mentioned this pull request Jul 15, 2021
JoranAngevaare added a commit that referenced this pull request Jul 15, 2021
* Fix #472

* Update run_selection.py

* Update run_selection.py
@JoranAngevaare JoranAngevaare changed the title Fix define runs and allow storing of superuns Fix define runs and allow storing of superruns Jul 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants