Fix define runs and allow storing of superruns #472

WenzDaniel · 2021-06-17T12:38:30Z

What is the problem / what does the code in this PR do
Superruns are composed of many smaller subruns and can be used to group runs according to some logical structure. So far we have not used them, because past changes broke the function define_run in run_selection.py. With this PR I would like to achieve two things:

1. Fix define_run such that we can define superruns again.
2. Add the possibility to create superruns not only at the run-metadata level, but also by loading and rechunking the data of the subruns and storing the result to disk.
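To illustrate what a superrun at the run-metadata level could look like, here is a minimal sketch. It does not use strax itself: the helper build_superrun_metadata and the record layout (name, start, end) are assumptions made for illustration, and only the sub_run_spec key is taken from the diff discussed in this PR. The value "all" is likewise an assumed marker for "include the whole subrun".

```python
from datetime import datetime, timezone


def build_superrun_metadata(name, subrun_metadata):
    """Combine per-run metadata into one superrun record (hypothetical layout)."""
    if not subrun_metadata:
        raise ValueError("A superrun needs at least one subrun")
    # Sort by run_id so that the subrun order is reproducible.
    subruns = dict(sorted(subrun_metadata.items()))
    return {
        "name": name,
        # The superrun spans from the earliest start to the latest end.
        "start": min(md["start"] for md in subruns.values()),
        "end": max(md["end"] for md in subruns.values()),
        # Assumption: "all" marks that the full subrun is included.
        "sub_run_spec": {run_id: "all" for run_id in subruns},
    }


runs = {
    "000002": {"start": datetime(2021, 6, 1, 1, tzinfo=timezone.utc),
               "end": datetime(2021, 6, 1, 2, tzinfo=timezone.utc)},
    "000001": {"start": datetime(2021, 6, 1, 0, tzinfo=timezone.utc),
               "end": datetime(2021, 6, 1, 1, tzinfo=timezone.utc)},
}
superrun = build_superrun_metadata("_superrun_test", runs)
```

The sketch keeps start and end as datetime objects and sorts the subrun ids, in line with the commits "changed run metadata from unix ns to datetime" and "Sort subrun_ids in superrun" listed in this PR.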

Can you briefly describe how it works?
Regarding point 2, I had to make some changes. I extended our chunks module such that we can concatenate chunks of different run_ids. For this purpose I made the following changes (plus some other changes for run definition and selection):

Chunks:

Chunks of superruns have an additional field called subruns with the min/max start/end of all subrun chunks included in the corresponding superrun chunk.
A new function transform_chunk_to_superrun_chunk, which converts a regular chunk into a superrun chunk.
Concatenating chunks was also updated accordingly.
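The chunk changes listed above can be sketched with a toy model. This is not the strax implementation: ToyChunk, toy_transform_to_superrun_chunk and toy_concatenate are hypothetical names, and the assumption that subruns maps each subrun id to its own start/end is mine. Only the subruns field and the is_superrun property are named in this PR.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToyChunk:
    run_id: str
    start: int
    end: int
    data: List[int]
    # Only set for superrun chunks: subrun_id -> {"start": ..., "end": ...}
    subruns: Optional[Dict[str, Dict[str, int]]] = None

    @property
    def is_superrun(self) -> bool:
        return self.subruns is not None


def toy_transform_to_superrun_chunk(superrun_id: str, chunk: ToyChunk) -> ToyChunk:
    """Turn a regular chunk into a superrun chunk that remembers its origin."""
    if chunk.is_superrun:
        raise ValueError("Chunk is already a superrun chunk")
    return ToyChunk(
        run_id=superrun_id,
        start=chunk.start,
        end=chunk.end,
        data=list(chunk.data),
        subruns={chunk.run_id: {"start": chunk.start, "end": chunk.end}},
    )


def toy_concatenate(chunks: List[ToyChunk]) -> ToyChunk:
    """Concatenate time-ordered superrun chunks of the same superrun."""
    if not chunks:
        raise ValueError("Need at least one chunk")
    if len({c.run_id for c in chunks}) != 1 or not all(c.is_superrun for c in chunks):
        raise ValueError("Can only concatenate superrun chunks of one superrun")
    subruns: Dict[str, Dict[str, int]] = {}
    data: List[int] = []
    for c in chunks:
        data.extend(c.data)
        for subrun_id, span in c.subruns.items():
            if subrun_id in subruns:
                # Widen the recorded range: minimum start, maximum end.
                subruns[subrun_id]["start"] = min(subruns[subrun_id]["start"], span["start"])
                subruns[subrun_id]["end"] = max(subruns[subrun_id]["end"], span["end"])
            else:
                subruns[subrun_id] = dict(span)
    return ToyChunk(chunks[0].run_id, chunks[0].start, chunks[-1].end, data, subruns)


regular = [
    ToyChunk("001", 0, 10, [1, 2]),
    ToyChunk("001", 10, 20, [3]),
    ToyChunk("002", 30, 40, [4, 5]),
]
combined = toy_concatenate(
    [toy_transform_to_superrun_chunk("_super", c) for c in regular])
```

Note that the combined chunk covers the gap between subrun 001 and subrun 002, while the subruns field still records where each subrun actually had data.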

Context:

New option which, if true, allows storing newly rechunked superrun chunks.
Omit the lineage check for individual subruns; this may collide with "Fix lineage if per run default is not allowed" #483.
Fixed define_run according to the latest changes. I left the code to make superruns from data in place, although it does not fully work yet. If needed, I will make the required changes in a later PR.
scan_runs: added superrun support.

Storage common:

Support for transforming subrun chunks into superrun chunks.

Can you give a minimal working example (or illustrate with a figure)?
I made a notebook in which I compare the performance of 400 test subruns with that of a single superrun. I also checked the performance impact when applying cuts via cut plugins. On average we get a speed-up of roughly a factor of 5-20.

Please also see the corresponding straxen PR: https://github.com/XENONnT/straxen/pull/554/files which includes an additional example notebook.

WenzDaniel · 2021-06-17T12:39:07Z

Please do not update with the master branch for the moment.

JoranAngevaare

Thanks Daniel, I look forward to using this functionality. I have quite a few questions and suggestions, see below.

docs/source/advanced/superrun.rst

strax/chunk.py

tests/test_superruns.py

Co-authored-by: Joran Angevaare <jorana@nikhef.nl>

JoranAngevaare · 2021-07-13T12:57:18Z

@WenzDaniel maybe one thing I did not think of before, but would superruns allow for post-combining nv + tpc runs? I think / hope so, it would be extremely useful!

WenzDaniel · 2021-07-13T13:16:57Z

Uff, hard question. I have to admit I do not know. I never planned on creating superruns from runs that overlap in time. But there are some additional challenges besides the technical ones, e.g. the time alignment between the different detectors, which is different for each subrun.

WenzDaniel · 2021-07-14T17:41:44Z

I addressed all outstanding comments. I will have a last look tomorrow morning with a fresh pair of eyes. After that we can merge.

WenzDaniel · 2021-07-15T05:49:22Z

Okay I am happy to merge if there are no other comments.

JoranAngevaare · 2021-07-15T08:24:19Z

Nice! Thanks Daniel, for bookkeeping, can you mention what happens if we query in between subruns?

JoranAngevaare

Thanks Daniel for the changes

JoranAngevaare · 2021-07-15T08:25:43Z

strax/context.py

+                # Make subruns if they do not exist, since we do not 
+                # want to store data twice in case we store the superrun
+                # we have to deactivate the storage converter mode.
+                stc_mode = self.context_config['storage_converter']
+                self.context_config['storage_converter'] = False
                self.make(list(sub_run_spec.keys()), d)
+                self.context_config['storage_converter'] = stc_mode


Did you not want to remove this?
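As a side note on the snippet under review: it restores storage_converter only if self.make returns normally. A minimal sketch of an exception-safe variant is shown below. The helper temporary_option is hypothetical and not part of strax, and a plain dict stands in for context_config.

```python
from contextlib import contextmanager


@contextmanager
def temporary_option(config, key, value):
    """Temporarily set config[key] = value and restore the old value afterwards."""
    old_value = config[key]
    config[key] = value
    try:
        yield config
    finally:
        # Runs even if the body raises, so the option is always restored.
        config[key] = old_value


# Stand-in for a context configuration (assumption: a plain dict).
context_config = {"storage_converter": True}
with temporary_option(context_config, "storage_converter", False):
    inside = context_config["storage_converter"]  # storage converter mode is off here
after = context_config["storage_converter"]
```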

JoranAngevaare · 2021-07-15T08:25:52Z

strax/context.py

@@ -747,10 +759,15 @@ def concat_loader(*args, **kwargs):
                to_compute[d] = p
                for dep_d in p.depends_on:
                    check_cache(dep_d)
-
+            


Suggested change

JoranAngevaare · 2021-07-15T08:26:19Z

strax/processor.py

@@ -51,7 +51,8 @@ def __init__(self,
                 allow_lazy=True,
                 max_workers=None,
                 max_messages=4,
-                 timeout=60):
+                 timeout=60,
+                 is_superrun=False,):


Suggested change

-is_superrun=False,):
+is_superrun=False,
+):

JoranAngevaare · 2021-07-15T08:28:03Z

Feel free to merge 👍

JoranAngevaare and others added 14 commits February 23, 2021 00:39

Merge remote-tracking branch 'origin/master' into stable

37ddc4e

Merge remote-tracking branch 'origin/master' into stable

151986d

Merge branch 'master' into stable

6ed8ab9

Merge tag 'v0.15.0' into stable

9b1fc4c

Merge branch 'master' into stable

749c4e7

Merge tag 'v0.15.3' into stable

083bf96

Some further modifications

dba8f04

Add super runid to chunks and allow rechunking for superruns

343aabe

Update chunk boundaries in saver for superruns

60a38c8

TEST HACK NEEDS TO BE REMOVED

d9e7fc8

Renamed superrun_id

8d42167

Allow rechunking when storing superruns

adabbee

Changed superrun check

846eed6

Allow to merge superruns of the same data_kind

7655eb4

WenzDaniel mentioned this pull request Jun 17, 2021

Superruns (documentation) XENONnT/straxen#554

Merged

WenzDaniel marked this pull request as draft June 17, 2021 12:43

WenzDaniel and others added 13 commits June 18, 2021 01:22

Fixed merge-only

42b7539

Disable lineage check for subruns in superruns

43653d4

Skip plugin and lineage for subruns

1c782da

Fix subrun_id

0d905c1

Merge branch 'fix_define_runs' of https://github.com/AxFoundation/strax into fix_define_runs

0cd10fa

Revert

2485820

Inherit plugin of subruns from superrun

65c1122

Set lineage check per subrun to true

7287eeb

Fixed run selection for superruns changed run metadata from unix ns to datetime

17e3b59

Write run_id as run name if does not exist (needed for select runs with superruns)

c4007cd

Added some basic tests for superruns

89de311

Fixed is_store for multi_run and added skip lineage check for superruns

1ec9c24

Sort subrun_ids in superrun

d12daec

WenzDaniel and others added 2 commits July 13, 2021 12:27

Merge branch 'master' into fix_define_runs

a0b1dcf

Change to lineage chaching system

d547a71

JoranAngevaare reviewed Jul 13, 2021

WenzDaniel and others added 6 commits July 13, 2021 14:06

Update strax/chunk.py

f08dbd1

Co-authored-by: Joran Angevaare <jorana@nikhef.nl>

Update strax/context.py

78dcc4c

Co-authored-by: Joran Angevaare <jorana@nikhef.nl>

Update tests/test_superruns.py

16904c6

Co-authored-by: Joran Angevaare <jorana@nikhef.nl>

Update strax/context.py

9b23e67

Co-authored-by: Joran Angevaare <jorana@nikhef.nl>

Update strax/run_selection.py

ab17142

Co-authored-by: Joran Angevaare <jorana@nikhef.nl>

Update strax/storage/common.py

e865296

Co-authored-by: Joran Angevaare <jorana@nikhef.nl>

WenzDaniel and others added 7 commits July 14, 2021 10:37

Addressed simple comments

991d4ff

Merge branch 'fix_define_runs' of https://github.com/AxFoundation/strax into fix_define_runs

58a20f6

Remove sorting from concatenate

429a9dd

Add chunk property is_superrun. Refactor subruns update

cd930ff

Add missing test that subruns are sorted in run-meta

cac3ed4

Changed continuity check for superruns

4fa3c22

Added test for chunk properties. Added chunk properties to doc

3e55622

Disable rechunking of superruns if set in context

6c86a0d

JoranAngevaare approved these changes Jul 15, 2021

WenzDaniel merged commit a43c55a into master Jul 15, 2021

WenzDaniel deleted the fix_define_runs branch July 15, 2021 09:02

JoranAngevaare added a commit that referenced this pull request Jul 15, 2021

Fix #472

dc430bc

JoranAngevaare mentioned this pull request Jul 15, 2021

Fix #472 #488

Merged

JoranAngevaare added a commit that referenced this pull request Jul 15, 2021

Fix #472 (#488)

929e1ed

* Fix #472 * Update run_selection.py * Update run_selection.py

JoranAngevaare changed the title ~~Fix define runs and allow storing of superuns~~ Fix define runs and allow storing of superruns Jul 16, 2021
