-
Notifications
You must be signed in to change notification settings - Fork 228
Add error messaging around use of get data in templates #3501
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
zm711
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix my own typo haha.
h-mayorquin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
JoeZiminski
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! am currently updating some old teaching code to sorting analyzer so appreciating the utility of such additions 🙏
| AnalyzerExtension that select some random spikes. | ||
| AnalyzerExtension that select somes random spikes. | ||
| This allows for a subsampling of spikes for further calculations and is important | ||
| for managing that amount of memory and speed of computation in the analyzer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very nice addition, could 'for further calculations' and the below ' This will be used by the waveforms/templates extensions.' be combined and emphasise that this choice could potentially have important consequences for results. e.g. 'The samples spikes will be used for calculating waveforms and templates and as such determine many downstream parameters (e.g. quality metrics). Therefore it is important that spikes a sufficient number of spikes are sampled and that these are distributed evenly through the dataset'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm torn. I was just trying to clarify. I prefer
one line short summary
params
return
examples
notes
And do that big explanation in the notes. The statement you're making I would put into notes, but I didn't want to change stuff too much although since I made the diff it's my fault you looked.
The real point of this PR is just to improve error messaging. I think we could improve further the docstrings etc in a separate PR (and this is a reminder for me to focus on my task so we don't get bogged down on other stuff for a PR with a specific purpose :) ).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good, happy to leave this for another day. It's easier said than done to stick religiously to one change in a PR, there are always some appealing changes to be made that inevitably catch your eye!
| error_msg = ( | ||
| f"You have entered an operator {operator} in your `operators` argument which is " | ||
| f"not supported. Please use any of ['average', 'std', 'median', 'mad'] instead." | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh nice, in that case discard above comment 😆
| ] | ||
| if len(bad_operator_list) > 0: | ||
| raise ValueError( | ||
| f"Computing templates with operators {bad_operator_list} requires the 'waveforms' extension" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting I assumed waveforms always needed to be computed for templates, what other ways ways of doing it are there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should dig into the templates code. Alessio + Sam ( I think it was just them but maybe Heberto too) developed an accumulator method that uses the waveforms to make an average without saving them so you would
analyzer.compute(['random_spikes', 'templates']) and it will read the waveforms from random spikes while making the templates then discard them to save on memory. I think the only think it breaks would be doing PCA later, but it is way less storage intensive since you don't save all the extra waveforms. It is limited in the types of operators it can do though as you can see from the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep this accumulator was a great stuf during the analyzer dev.
the only drawback is that MAD cannot be computed that way. but saving ram and disk space is cool.
you can see the idea in waveform_tools.py
every worker accumulate snippet in parralel and the sum + divide is done at teh end.
this is quite fast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool! will check that out
| bad_operator_list = [ | ||
| operator for operator in self.params["operators"] if operator not in ("average", "std") | ||
| ] | ||
| if len(bad_operator_list) > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe if any(bad_operator_list) is slightly more readable (?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is stylistic. I don't use any that often in code I think checking the length of the list is just my default. I don't know what others think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's clear enough :)
Co-authored-by: Joe Ziminski <55797454+JoeZiminski@users.noreply.github.com>
|
ok for me. |
Joe's comments have been addressed
This is a light changing of the error messaging in the core extension file along with some docstring touch ups for numpydoc.
Related to #3495