New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor concat and get data #430
Refactor concat and get data #430
Conversation
strax/processing/hitlets.py
Outdated
if np.all(data == 0): | ||
res[:] = np.nan | ||
else: | ||
inter, amps = strax.highest_density_region(data, | ||
fractions_desired, _buffer_size=_buffer_size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I know one should not add more than one feature into a PR, but I am currently optimizing some splitting parameters and I run into this issue. strax.highest_density_region
behaves as expected and raises an ValueError. However, it is not desired to stop the entire processing because of that. Hence, the function returns now np.nan in case the HDR is not defined.
# Check that negative data does not raise: | ||
res = strax.processing.hitlets.highest_density_region_width(np.array([0, -1, -2]), | ||
np.array([0.5]), | ||
fractionl_edges=True) | ||
assert np.all(np.isnan(res)), 'For empty data HDR is not defined, should return np.nan!' | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and added the corresponding test.
def test_not_defined_get_fhwm(): | ||
# This is a specific unity test for some edge-cases in which the full | ||
# width half maximum is not defined. | ||
odd_hitlets = np.zeros(4, dtype=strax.hitlet_with_data_dtype(10)) | ||
odd_hitlets[0]['data'][:5] = [2, 2, 3, 2, 2] | ||
odd_hitlets[0]['length'] = 5 | ||
odd_hitlets[1]['data'][:2] = [5, 5] | ||
odd_hitlets[1]['length'] = 2 | ||
odd_hitlets[2]['length'] = 3 | ||
odd_hitlets[3]['data'][:3] = [-1, -2, 0] | ||
odd_hitlets[3]['length'] = 3 | ||
|
||
for oh in odd_hitlets: | ||
res = strax.get_fwxm(oh) | ||
mes = (f'get_fxhm returned {res} for {oh["data"][:oh["length"]]}!' | ||
'However, the FWHM is not defined and the return should be nan!' | ||
) | ||
assert np.all(np.isnan(res)), mes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, Idea how this slipped through the last review. I removed now the indentation level and also added a test for negative waveforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Daniel, I think it looks very good indeed. I like these changes a lot.
Perhaps now is a good time to get rid of the _split_hitlets
as it is the same as the _split_peaks
function
Co-authored-by: Joran Angevaare <joranangevaare@gmail.com>
…zDaniel/strax into refactor_concat_and_get_data
Okay, I addressed your comments, besides the ones which will require a larger refactoring. |
What is the problem / what does the code in this PR do
In this PR Tianyu and I (mostly Tianyu) refactored the get_hitlet_data function. Tianyus proposal makes the function much more flexible as we search no longer for the hitlet data via the
record_i
field but using time intervals instead. This change allows a much more general usage of the function. For this reason I added a "header" part which checks for different possibilities in calling this function. The corresponding tests have been added too.Beside get_hitlet_data I refactored a bit concat_overlaping_hits as it was also part of the proposal. However, I did not manage to modify the function such, that it mimics the full behavior of the old method. But since the new method would lead to a performance drop of about 30% I decided to keep the old one.
A more thorough discussion can be found in this note.
Open points: