Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor concat and get data #430

Merged

Conversation

WenzDaniel
Copy link
Collaborator

@WenzDaniel WenzDaniel commented Apr 26, 2021

What is the problem / what does the code in this PR do
In this PR Tianyu and I (mostly Tianyu) refactored the get_hitlet_data function. Tianyus proposal makes the function much more flexible as we search no longer for the hitlet data via the record_i field but using time intervals instead. This change allows a much more general usage of the function. For this reason I added a "header" part which checks for different possibilities in calling this function. The corresponding tests have been added too.

Beside get_hitlet_data I refactored a bit concat_overlaping_hits as it was also part of the proposal. However, I did not manage to modify the function such, that it mimics the full behavior of the old method. But since the new method would lead to a performance drop of about 30% I decided to keep the old one.

A more thorough discussion can be found in this note.

Open points:

  • Updated plugins, set correct channel offset for mv and nv.
  • Updated hitlet splitting
  • Add some test for hitlet splitting.
  • Test process some data

@WenzDaniel WenzDaniel marked this pull request as ready for review April 27, 2021 08:22
Comment on lines 506 to 510
if np.all(data == 0):
res[:] = np.nan
else:
inter, amps = strax.highest_density_region(data,
fractions_desired, _buffer_size=_buffer_size)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I know one should not add more than one feature into a PR, but I am currently optimizing some splitting parameters and I run into this issue. strax.highest_density_region behaves as expected and raises an ValueError. However, it is not desired to stop the entire processing because of that. Hence, the function returns now np.nan in case the HDR is not defined.

Comment on lines +257 to +262
# Check that negative data does not raise:
res = strax.processing.hitlets.highest_density_region_width(np.array([0, -1, -2]),
np.array([0.5]),
fractionl_edges=True)
assert np.all(np.isnan(res)), 'For empty data HDR is not defined, should return np.nan!'

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and added the corresponding test.

Comment on lines +370 to +387
def test_not_defined_get_fhwm():
# This is a specific unity test for some edge-cases in which the full
# width half maximum is not defined.
odd_hitlets = np.zeros(4, dtype=strax.hitlet_with_data_dtype(10))
odd_hitlets[0]['data'][:5] = [2, 2, 3, 2, 2]
odd_hitlets[0]['length'] = 5
odd_hitlets[1]['data'][:2] = [5, 5]
odd_hitlets[1]['length'] = 2
odd_hitlets[2]['length'] = 3
odd_hitlets[3]['data'][:3] = [-1, -2, 0]
odd_hitlets[3]['length'] = 3

for oh in odd_hitlets:
res = strax.get_fwxm(oh)
mes = (f'get_fxhm returned {res} for {oh["data"][:oh["length"]]}!'
'However, the FWHM is not defined and the return should be nan!'
)
assert np.all(np.isnan(res)), mes
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, Idea how this slipped through the last review. I removed now the indentation level and also added a test for negative waveforms.

Copy link
Member

@JoranAngevaare JoranAngevaare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Daniel, I think it looks very good indeed. I like these changes a lot.

Perhaps now is a good time to get rid of the _split_hitlets as it is the same as the _split_peaks function

strax/dtypes.py Show resolved Hide resolved
strax/dtypes.py Show resolved Hide resolved
strax/processing/hitlets.py Outdated Show resolved Hide resolved
strax/processing/hitlets.py Show resolved Hide resolved
strax/processing/hitlets.py Show resolved Hide resolved
strax/processing/hitlets.py Show resolved Hide resolved
strax/processing/hitlets.py Show resolved Hide resolved
strax/processing/peak_splitting.py Show resolved Hide resolved
strax/processing/peak_splitting.py Show resolved Hide resolved
strax/processing/peak_splitting.py Show resolved Hide resolved
@WenzDaniel
Copy link
Collaborator Author

Okay, I addressed your comments, besides the ones which will require a larger refactoring.

@WenzDaniel WenzDaniel merged commit e5b0b42 into AxFoundation:master Apr 30, 2021
@JoranAngevaare JoranAngevaare mentioned this pull request May 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants