New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrapper for get_iter to get accumulated results. #253
Conversation
…l be used with the old nVETO DAQreader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Daniel, looks like a good addition, thanks! I added a few comments with suggestions and possible issues, but happy to merge afterwards.
strax/context.py
Outdated
chunk_res = function(data, fields) | ||
for key, value in chunk_res.items(): | ||
if not nchunks: | ||
res[key] = value | ||
else: | ||
res[key] += value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I implemented now your suggestions. Though, I did not see how I could implement the user specific function argument that one could compute e.g. the std or mean of a parameter over the entire run. I think here you still have to use get_iter.
Thanks Daniel! I also made a few changes, see below. I'm realizing this is quite a versatile analysis tool... surprised we never added something like this to strax earlier.
Here are a few examples, assuming this setup: import multihist
import straxen
import matplotlib.pyplot as plt
st = straxen.contexts.xenon1t_dali()
run_id = '180215_1029' Count pulses in a channel # Count pulses in a channel
acc = st.accumulate(
run_id,
'records',
selection_str='(channel == 126) & (record_i == 0)')
acc['n_rows'] gives 7355. Note the record_i == 0 selection to make sure multi-record pulses are counted only once. Very rough pulse shape # obtain acc as before
plt.plot(acc['data'] / acc['n_rows']) gives The rise at late times should due to records that are more than just one single PE, e.g. in S1s or S2s, or photons with afterpulses. A (slightly) better pulse shape analysis is here, a much better once is here. Make a (channel, height) 2D histogram for lone hits mh = multihist.Histdd(dimensions=(
('channel', np.arange(249) -0.5),
('height', np.linspace(0, 200, 100))))
st.accumulate(run_id,
'lone_hits',
function=mh.add)
mh.plot() This shows the use of supporting functions that don't return anything. Plot the lone hit rate per channel h = multihist.Hist1d(bins=np.arange(249))
acc = st.accumulate(run_id,
'lone_hits',
function=lambda x : h.add(x['channel']))
h /= (acc['end'] - acc['start']) / 1e9
straxen.plot_pmts(h.histogram, log_scale=True, vmin=20,
xenon1t=True, show_tpc=False, r=52,
label='Lone hit rate [Hz]') Standard deviation of peak areas Approximately (average of stdevs per chunk): acc = st.accumulate(run_id, 'peaks', function=lambda x: x['area'].std())
acc['result'] / acc['n_chunks'] gives 32338.845703125. Or exactly in two passes:
gives 32793.147536563476. The last two examples show the use of supporting functions that return a flat array. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're OK with my changes Daniel, I'm ready to merge
I keep changing my mind about the name... ;-) How about just |
Wow I am impressed. I like the changes, good job. :-) For me it is fine and we can merge, as I see you already extensively tested the function ;-) About the name, I totally see your points and thought the same. I think it is fine if we just go with accumulate but we should then advertise it a bit. Otherwise I could imagine people will simply overlook it. |
Thanks Daniel! Yes, we would need to make sure others know about this, maybe by making a new straxen tutorial or extending one of the old ones. |
This function allows to get an array with accumulated results for the specified field names. All other redundant fields are returned only a single time. A typical use case of this feature is e.g. the hitfinder acceptance, for which one is rather interested in how many hits did I find per threshold per pmt per calibration run rather than per chunk.
Note: The doc string has to be finished. I saw in the docstring of get_iter and get_array the {get_docs}-statement, but I do not have any clue how this works.