Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrapper for get_iter to get accumulated results. #253

Merged
merged 14 commits into from Apr 28, 2020

Conversation

WenzDaniel
Copy link
Collaborator

This function allows to get an array with accumulated results for the specified field names. All other redundant fields are returned only a single time. A typical use case of this feature is e.g. the hitfinder acceptance, for which one is rather interested in how many hits did I find per threshold per pmt per calibration run rather than per chunk.

Note: The doc string has to be finished. I saw in the docstring of get_iter and get_array the {get_docs}-statement, but I do not have any clue how this works.

Copy link
Member

@JelleAalbers JelleAalbers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Daniel, looks like a good addition, thanks! I added a few comments with suggestions and possible issues, but happy to merge afterwards.

strax/context.py Outdated Show resolved Hide resolved
strax/context.py Outdated Show resolved Hide resolved
strax/context.py Outdated Show resolved Hide resolved
strax/context.py Outdated Show resolved Hide resolved
strax/context.py Outdated
Comment on lines 952 to 957
chunk_res = function(data, fields)
for key, value in chunk_res.items():
if not nchunks:
res[key] = value
else:
res[key] += value
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I implemented now your suggestions. Though, I did not see how I could implement the user specific function argument that one could compute e.g. the std or mean of a parameter over the entire run. I think here you still have to use get_iter.

@JelleAalbers
Copy link
Member

JelleAalbers commented Apr 27, 2020

Thanks Daniel! I also made a few changes, see below. I'm realizing this is quite a versatile analysis tool... surprised we never added something like this to strax earlier.

  • Support more function types. They can now either return:
    • A dict or record array (fields will be accumulated as before);
    • A flat array or scalar (result will be accumulated under 'result');
    • None (nothing is accumulated, the function will just be run over the data).
  • function and fields are now optional: if you do not pass them, we will use the identity function and accumulate all fields
  • Added the n_rows field to the output
  • By default the function does not take fields anymore; though you can pass a flag to re-enable that behaviour.
  • Storing the first entry of non-accumulated fields is now also optional and off by default.
  • Minor changes for robustness, e.g. if fields is a string we will interpret it as a 1-tuple string, if the first chunk is empty we won't try to get the first row from it, etc.

Here are a few examples, assuming this setup:

import multihist
import straxen
import matplotlib.pyplot as plt

st = straxen.contexts.xenon1t_dali()
run_id = '180215_1029'

Count pulses in a channel

# Count pulses in a channel
acc = st.accumulate(
    run_id, 
    'records',
    selection_str='(channel == 126) & (record_i == 0)')
acc['n_rows']

gives 7355. Note the record_i == 0 selection to make sure multi-record pulses are counted only once.

Very rough pulse shape

# obtain acc as before
plt.plot(acc['data'] / acc['n_rows'])

gives

index

The rise at late times should due to records that are more than just one single PE, e.g. in S1s or S2s, or photons with afterpulses. A (slightly) better pulse shape analysis is here, a much better once is here.

Make a (channel, height) 2D histogram for lone hits

mh = multihist.Histdd(dimensions=(
    ('channel', np.arange(249) -0.5),
    ('height', np.linspace(0, 200, 100))))

st.accumulate(run_id, 
              'lone_hits', 
              function=mh.add)
mh.plot()

index_1

This shows the use of supporting functions that don't return anything.

Plot the lone hit rate per channel

h = multihist.Hist1d(bins=np.arange(249))
acc = st.accumulate(run_id, 
                    'lone_hits', 
                    function=lambda x : h.add(x['channel']))
h /= (acc['end'] - acc['start']) / 1e9
straxen.plot_pmts(h.histogram, log_scale=True, vmin=20,
                  xenon1t=True, show_tpc=False, r=52, 
                  label='Lone hit rate [Hz]')

index_2

Standard deviation of peak areas

Approximately (average of stdevs per chunk):

acc = st.accumulate(run_id, 'peaks', function=lambda x: x['area'].std())
acc['result'] / acc['n_chunks']

gives 32338.845703125.

Or exactly in two passes:

acc = st.accumulate(run_id, 'peaks')
mean_area = acc['area'] / acc['n_rows']

sum_sq = st.accumulate(run_id, 'peaks', function=lambda x: (x['area'] - mean_area)**2)['result']
(sum_sq/acc['n_rows'])**0.5

gives 32793.147536563476.

The last two examples show the use of supporting functions that return a flat array.

Copy link
Member

@JelleAalbers JelleAalbers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're OK with my changes Daniel, I'm ready to merge

@JelleAalbers
Copy link
Member

I keep changing my mind about the name... ;-) How about just accumulate instead of the somewhat awkward get_accumulated? I know some other methods start with get_, but accumulate is an actual verb while array and df are not.

@WenzDaniel
Copy link
Collaborator Author

Wow I am impressed. I like the changes, good job. :-) For me it is fine and we can merge, as I see you already extensively tested the function ;-)

About the name, I totally see your points and thought the same. I think it is fine if we just go with accumulate but we should then advertise it a bit. Otherwise I could imagine people will simply overlook it.

@JelleAalbers
Copy link
Member

Thanks Daniel! Yes, we would need to make sure others know about this, maybe by making a new straxen tutorial or extending one of the old ones.

@JelleAalbers JelleAalbers merged commit f842fea into AxFoundation:master Apr 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants