Wrapper for get_iter to get accumulated results. #253

WenzDaniel · 2020-04-08T08:11:45Z

This function allows one to get an array with accumulated results for the specified field names. All other, redundant fields are returned only once. A typical use case is the hitfinder acceptance, for which one is interested in how many hits were found per threshold, per PMT, per calibration run, rather than per chunk.

Note: The docstring still has to be finished. I saw the {get_docs} statement in the docstrings of get_iter and get_array, but I have no idea how it works.


JelleAalbers

Hi Daniel, looks like a good addition, thanks! I added a few comments with suggestions and possible issues, but happy to merge afterwards.

strax/context.py

WenzDaniel · 2020-04-20T14:56:08Z

strax/context.py

+            chunk_res = function(data, fields)
+            for key, value in chunk_res.items():
+                if not nchunks:
+                    res[key] = value
+                else:
+                    res[key] += value


I think I have now implemented your suggestions. However, I did not see how to implement the user-specified function argument with which one could compute, e.g., the std or mean of a parameter over the entire run. I think for that you still have to use get_iter.

JelleAalbers · 2020-04-27T16:46:02Z

Thanks Daniel! I also made a few changes, see below. I'm realizing this is quite a versatile analysis tool... surprised we never added something like this to strax earlier.

- Support more function types. They can now return either:
  - A dict or record array (fields will be accumulated as before);
  - A flat array or scalar (the result will be accumulated under 'result');
  - None (nothing is accumulated; the function is just run over the data).
- function and fields are now optional: if you do not pass them, we will use the identity function and accumulate all fields.
- Added the n_rows field to the output.
- By default, the function no longer receives fields as an argument, though you can pass a flag to re-enable that behaviour.
- Storing the first entry of non-accumulated fields is now also optional, and off by default.
- Minor changes for robustness, e.g. if fields is a string we interpret it as a 1-tuple, and if the first chunk is empty we won't try to get the first row from it.
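The return-type handling described in the list above can be illustrated with a minimal plain-Python sketch. It is not the strax implementation: `accumulate_chunks` is a hypothetical name, and each chunk is assumed to be a dict of lists standing in for a record array. Lists returned by the function are assumed to be summed per chunk before accumulation (the two-pass standard deviation example further down relies on flat arrays being summed like this).

```python
from numbers import Number


def accumulate_chunks(chunks, function=None):
    """Hypothetical sketch of chunk-wise accumulation (not strax's code).

    Each chunk is a dict mapping field name -> list of values, standing in
    for a record array. ``function`` may return:
      - a dict: each value is summed and accumulated under its key;
      - a list or scalar: summed and accumulated under 'result';
      - None: nothing is accumulated (the function runs for side effects).
    """
    if function is None:
        # Identity function: accumulate every field of the chunk
        def function(chunk):
            return chunk

    result = {'n_chunks': 0, 'n_rows': 0}
    for chunk in chunks:
        # Number of rows = length of any field (0 for an empty chunk)
        n_rows = len(next(iter(chunk.values()), []))
        result['n_chunks'] += 1
        result['n_rows'] += n_rows

        out = function(chunk)
        if out is None:
            continue
        if not isinstance(out, dict):
            out = {'result': out}
        for key, value in out.items():
            # Reduce lists to a per-chunk sum; scalars are used as-is
            total = value if isinstance(value, Number) else sum(value)
            result[key] = result.get(key, 0) + total
    return result
```

For example, with chunks {'area': [1.0, 2.0]} and {'area': [3.0]}, the identity call accumulates an area of 6.0 over 3 rows, so the mean area is 2.0, analogous to acc['area'] / acc['n_rows'].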

Here are a few examples, assuming this setup:

import multihist
import numpy as np
import straxen
import matplotlib.pyplot as plt

st = straxen.contexts.xenon1t_dali()
run_id = '180215_1029'

Count pulses in a channel

# Count pulses in a channel
acc = st.accumulate(
    run_id, 
    'records',
    selection_str='(channel == 126) & (record_i == 0)')
acc['n_rows']

gives 7355. Note the record_i == 0 selection to make sure multi-record pulses are counted only once.
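Why the record_i == 0 cut counts each pulse only once can be shown with toy data. This is a plain-Python sketch with made-up records; it only assumes that a pulse spanning several records has its pieces numbered by record_i starting from 0.

```python
# Toy records: a pulse longer than one record is split over several records
# in the same channel; record_i numbers the pieces 0, 1, 2, ...
records = [
    {'channel': 126, 'record_i': 0},  # pulse A, first record
    {'channel': 126, 'record_i': 1},  # pulse A, second record
    {'channel': 126, 'record_i': 0},  # pulse B, single record
    {'channel': 5, 'record_i': 0},    # pulse in another channel
]

# Counting records double-counts pulse A; counting first records does not
n_records = sum(r['channel'] == 126 for r in records)
n_pulses = sum(r['channel'] == 126 and r['record_i'] == 0 for r in records)
print(n_records, n_pulses)  # 3 2
```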

Very rough pulse shape

# obtain acc as before
plt.plot(acc['data'] / acc['n_rows'])

gives [figure: average waveform of the first records in channel 126]

The rise at late times should be due to records containing more than a single PE, e.g. in S1s or S2s, or photons with afterpulses. A (slightly) better pulse shape analysis is here; a much better one is here.
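acc['data'] / acc['n_rows'] works as an average because summing is associative: adding up the per-chunk, per-sample sums of the waveforms and dividing by the total number of rows gives the same per-sample mean as averaging all records at once. A minimal plain-Python sketch with made-up three-sample waveforms (real records are longer):

```python
def chunk_sum(waveforms):
    """Per-sample sum of the equal-length waveforms in one chunk."""
    return [sum(samples) for samples in zip(*waveforms)]


chunks = [
    [[0, 4, 2], [0, 2, 0]],  # chunk 1: two records
    [[0, 6, 4]],             # chunk 2: one record
]

total = None
n_rows = 0
for waveforms in chunks:
    s = chunk_sum(waveforms)
    # Accumulate the per-sample sums across chunks
    total = s if total is None else [a + b for a, b in zip(total, s)]
    n_rows += len(waveforms)

mean_waveform = [x / n_rows for x in total]
print(mean_waveform)  # [0.0, 4.0, 2.0]
```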

Make a (channel, height) 2D histogram for lone hits

mh = multihist.Histdd(dimensions=(
    ('channel', np.arange(249) -0.5),
    ('height', np.linspace(0, 200, 100))))

st.accumulate(run_id, 
              'lone_hits', 
              function=mh.add)
mh.plot()

This shows the support for functions that don't return anything.
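A function that returns None is only useful for its side effects, e.g. filling a histogram that lives outside the call, as mh.add does above. A minimal plain-Python sketch of that pattern; `Hist1D` is a hypothetical stand-in for multihist with fixed bin edges and half-open bins, not its actual implementation:

```python
import bisect


class Hist1D:
    """Minimal fixed-bin histogram (hypothetical stand-in for multihist)."""

    def __init__(self, edges):
        self.edges = list(edges)
        self.counts = [0] * (len(self.edges) - 1)

    def add(self, values):
        for v in values:
            # Bin i satisfies edges[i] <= v < edges[i + 1]; values outside
            # [edges[0], edges[-1]) are dropped
            i = bisect.bisect_right(self.edges, v) - 1
            if 0 <= i < len(self.counts):
                self.counts[i] += 1
        # Implicitly returns None, so nothing would be accumulated for it


h = Hist1D(edges=[0, 50, 100, 150, 200])
for chunk in ([10, 60, 60], [120, 199, 250]):
    h.add(chunk)  # called once per chunk, purely for its side effect
print(h.counts)  # [1, 2, 1, 1]
```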

Plot the lone hit rate per channel

h = multihist.Hist1d(bins=np.arange(249))
acc = st.accumulate(run_id, 
                    'lone_hits', 
                    function=lambda x : h.add(x['channel']))
h /= (acc['end'] - acc['start']) / 1e9
straxen.plot_pmts(h.histogram, log_scale=True, vmin=20,
                  xenon1t=True, show_tpc=False, r=52, 
                  label='Lone hit rate [Hz]')
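The rate normalisation divides the counts by the time spanned by the data. Assuming, as the division by 1e9 suggests, that start and end are in nanoseconds, the arithmetic looks like this (plain Python, made-up numbers). It treats the full span from start to end as live time, so any gaps in the data would bias the rate low.

```python
# Hypothetical lone-hit counts per channel and data boundaries in ns
counts = {0: 1200, 1: 300, 2: 0}
start_ns = 1_000_000_000
end_ns = 61_000_000_000  # 60 s after start

livetime_s = (end_ns - start_ns) / 1e9  # ns -> s
rates_hz = {ch: n / livetime_s for ch, n in counts.items()}
print(rates_hz)  # {0: 20.0, 1: 5.0, 2: 0.0}
```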

Standard deviation of peak areas

Approximately (average of stdevs per chunk):

acc = st.accumulate(run_id, 'peaks', function=lambda x: x['area'].std())
acc['result'] / acc['n_chunks']

gives 32338.845703125.

Or exactly in two passes:

acc = st.accumulate(run_id, 'peaks')
mean_area = acc['area'] / acc['n_rows']

sum_sq = st.accumulate(run_id, 'peaks', function=lambda x: (x['area'] - mean_area)**2)['result']
(sum_sq/acc['n_rows'])**0.5

gives 32793.147536563476.

The last two examples show the support for functions that return a scalar or a flat array.
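The difference between the two standard deviation recipes can be checked on toy data. The average of per-chunk standard deviations ignores the spread between the chunk means, while the two-pass approach (global mean first, then summed squared deviations) is exact. A plain-Python sketch with two made-up chunks; statistics.pstdev is the population standard deviation, matching numpy's default ddof=0:

```python
import math
import statistics

chunks = [[1.0, 3.0], [10.0, 14.0]]  # made-up "areas", split over two chunks

# Approximate: average of the per-chunk standard deviations
approx = sum(statistics.pstdev(c) for c in chunks) / len(chunks)

# Exact, pass 1: accumulate the sum and row count to get the global mean
n_rows = sum(len(c) for c in chunks)
mean = sum(sum(c) for c in chunks) / n_rows

# Exact, pass 2: accumulate squared deviations from the global mean
sum_sq = sum(sum((x - mean) ** 2 for x in c) for c in chunks)
exact = math.sqrt(sum_sq / n_rows)

print(approx, exact)  # 1.5 vs. sqrt(27.5), about 5.244
```

The price of the exact method is reading the data twice. A single pass accumulating both the sum and the sum of squares would also work in principle, but it is prone to floating-point cancellation when the mean is large compared to the spread.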

JelleAalbers

If you're OK with my changes, Daniel, I'm ready to merge.

JelleAalbers · 2020-04-27T16:57:57Z

I keep changing my mind about the name... ;-) How about just accumulate instead of the somewhat awkward get_accumulated? I know some other methods start with get_, but accumulate is an actual verb while array and df are not.

WenzDaniel · 2020-04-28T06:52:24Z

Wow, I am impressed. I like the changes, good job. :-) It is fine with me and we can merge, as I see you have already tested the function extensively ;-)

About the name: I completely see your point and thought the same. I think it is fine to just go with accumulate, but then we should advertise it a bit; otherwise I can imagine people will simply overlook it.

JelleAalbers · 2020-04-28T07:24:12Z

Thanks Daniel! Yes, we would need to make sure others know about this, maybe by making a new straxen tutorial or extending one of the old ones.

WenzDaniel and others added 6 commits March 11, 2020 13:51

Last push before changing to new strax version. This version can still be used with the old nVETO DAQreader.

4351bb4

Merge remote-tracking branch 'upstream/master'

2b0a89b

Merge remote-tracking branch 'upstream/master'

c5aa0c6

Merge remote-tracking branch 'upstream/master'

7667df6

Merge branch 'master' of https://github.com/WenzDaniel/strax

2042023

Added new get_accum_array which allows to get an accumulated result.

26f7ca6

JelleAalbers reviewed Apr 13, 2020


strax/context.py (four review comments, outdated and resolved)

WenzDaniel added 3 commits April 20, 2020 12:42

Extended docstring, changed return to dict

3694784

Changed the way the function is called.

2b2c6d3

Updated the way the result is accumulated

3dfab51

WenzDaniel commented Apr 20, 2020


JelleAalbers added 4 commits April 27, 2020 15:52

Defaults, handle empty data

a8e9990

Merge branch 'master' into update_get_accumulate

cd93e9a

Support more function types

7696731

Remove unnecessary pass to make codefactor happy again

26cb15e

JelleAalbers approved these changes Apr 27, 2020


Changed name

37f4047

JelleAalbers merged commit f842fea into AxFoundation:master Apr 28, 2020
