[ENH, MRG] Add post-hoc modifications #270
Conversation
what does this fix about #268? :-) Did I miss something?

Oops sorry! Fixed now!
@@ -1392,3 +1417,66 @@ def plot_epochs(self, epochs, scalings=None, title=''):
            epochs=epochs,
            epoch_colors=epoch_colors, scalings=scalings,
            title='')

    def drop_epochs_with_adjacent_channel_interpolation(
            self, ch_adjacency, ch_names):
I would make this a function, not a method.
Why not operate on the reject log? This seems cleaner.
hold on ... I am not done with the review. I think I accidentally forgot to "start a review", so you'll be getting comments before I am finished. Still forming my thoughts :)
Looks good. I think the `drop_bad_xxx` methods begin to venture into the territory of new science. I'm worried about recommending algorithms to users that have not been properly tested on various datasets. The idea of overriding the `reject_log` and IO for that class is great, though.
            np.nansum(bads, axis=0) > self.labels.shape[0] * interp_thresh)[0]
        self.labels[:, interp] = 2

    def drop_epochs_with_bads(self):
this seems very aggressive ... I propose leaving this out of the API for now. We can revisit it later if there is more demand
I don't see the harm in making it an option. In a use case where you have lots of noisy data (as I do), I think it's better to be on the safe side. I think the false positives make more sense with MEG, where the ratio of channels to epochs is usually much greater. In EEG, it seems to make more sense to me to reject any channel that's a bit off.
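To make the semantics being debated concrete, here is a minimal stand-alone sketch of what this kind of drop amounts to. It assumes a simplified `labels` array in the convention used elsewhere in this PR (0 = good, 1 = bad, 2 = bad but interpolated); the helper name mirrors the PR's method but is a hypothetical plain function, not autoreject API:

```python
import numpy as np

def drop_epochs_with_bads(labels):
    """Mask of epochs containing at least one bad (label 1) channel
    that was not interpolated.

    labels : (n_epochs, n_channels) array; 0 = good, 1 = bad,
    2 = interpolated.
    """
    return (labels == 1).any(axis=1)

labels = np.array([[0, 1, 0],   # one bad channel left in -> drop
                   [0, 2, 0],   # only interpolated -> keep
                   [0, 0, 0]])  # clean -> keep
print(drop_epochs_with_bads(labels))  # [ True False False]
```

As the example shows, only epochs with an un-interpolated bad entry are dropped, which is why the method is aggressive on noisy data where such entries are common.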
# %%
# As a last, optional step, we can do post-hoc corrections to be even
# more stringent in order to favor rejecting epochs over keeping channels
# marked bad by ``autoreject`` that, based on cross-validation, are
# considered false positive bad channels.

reject_log = ar.get_reject_log(epochs)

# %%
# We can observe that, for a few sensors,
# there are very few green squares and mostly red and blue squares.
# The autoreject algorithm is keeping the data represented by a
# red square, considering it a false positive based on the
# cross-validation scores. For consistency, we can choose to do a
# post-hoc modification to mark the entire channel for interpolation.
# We would do this for consistency because whether a square is
# blue or red depends on the other sensors in that epoch: if there
# are worse sensors in that epoch, the sensor will be marked red as
# a false positive; if not, it will be marked blue. In some cases
# it may be preferable to interpolate this channel to avoid this effect.

# if more than half bad (red or blue), interpolate
reject_log.interpolate_bads(interp_thresh=0.5)
reject_log.plot('horizontal')

# %%
# Additionally, we can choose to drop, rather than interpolate, any epoch
# with adjacent bad channels, as adjacency affects the accuracy of the
# interpolation. Similarly, this should be done on a case-by-case basis
# based on the montage of sensors and the analysis.

ch_adjacency, ch_names = mne.channels.find_ch_adjacency(epochs.info, 'eeg')
reject_log.drop_epochs_with_adjacent_channel_interpolation(
    ch_adjacency, ch_names)
reject_log.plot('horizontal')

# %%
# Lastly, if we want to avoid any bad sensors being left in our data,
# even in a single epoch, we can drop all epochs with a red square
# for a bad sensor.

reject_log.drop_epochs_with_bads()
reject_log.plot('horizontal')

# %%
# Finally, we can apply the modified reject log to our epoch data. Note
# that since this example data is not very clean, we have lost most of
# our epochs; these post-hoc corrections should only be applied
# in cases where keeping potentially bad data is not desirable.

epochs_ar = ar.transform(epochs, reject_log=reject_log)
plt.figure()
plt.plot(evoked_bad.times, evoked_bad.data.T * 1e6, 'r', zorder=-1)
epochs_ar.average().plot(axes=plt.gca())
The message in the rest of the tutorial seemed to suggest that you need to conserve data, but now you are proposing methods that are more aggressively dropping data, no?
For the tutorial, I would suggest something simple like:

```python
reject_log.labels['EEG 001', :] = 2
```

then pass:

```python
ar.transform(epochs, reject_log=reject_log)
```

I feel this is more broadly applicable and would cover your use cases as well as any future ones. Not super convinced about `drop_epochs_xxx` because they drop so many trials.
> The message in the rest of the tutorial seemed to suggest that you need to conserve data, but now you are proposing methods that are more aggressively dropping data, no?

I think this is the big point that justifies the post-hoc corrections: they are only more stringent, so you'll never decrease the reliability of the results so long as you compensate by collecting more trials.

I think there is also some nuance: we were conserving data with blinks, which, in my opinion and definitely in published papers, can be removed effectively by ICA, but differences in peak-to-peak amplitude have a much less clear and removable cause. My preference is to remove independent components that I can confidently say are non-neural but, depending on the data and use, reject data from a likely non-neural source that can't be removed easily.
> I feel this is more broadly applicable and would cover your use cases as well as any future ones. Not super convinced about `drop_epochs_xxx` because they drop so many trials.

There are cases where you might want 20 good trials rather than 50 good-and-okay trials. Also, an even more rigorous method would be to compare stricter vs. less strict post-hoc corrections/autoreject settings. I think this should be facilitated to validate the method.
But autoreject is all about automatic interpolation and saving data ... advocating post-hoc methods to drop trials seems a bit like defeating its purpose. I'm not against adding it for future validation, but maybe the tutorial should focus on the API rather than new, untested methods.
Just logically, being more conservative doesn't need validation: you're using a proper subset of the trials you were accepting before, so if they were good enough for the first validation, they logically must be good enough in this case. Furthermore, validated against what? RANSAC and FASTER? There's no ground truth there either; it's all a house of cards built off whatever algorithm happened to be picked first. I made this PR because, for the dataset in the example, there were glaring internal inconsistencies mentioned in the issue. Yes, the validation means something, but first and foremost should always be inspecting the dataset and thinking through what makes sense; keeping 50 epochs marked as bad while interpolating 50 of 110 epochs for one channel just plain doesn't make sense.
> Just logically being more conservative doesn't need validation

If you drop more trials, your average will be more noisy. You need to improve the quality of the data, which may or may not improve when bad data is removed. This requires defining metrics of quality and comparing them. It's not as simple as saying "dropping more trials is better".

> Furthermore validated against what? RANSAC and FASTER? There's no ground truth there either; it's all a house of cards built off whatever algorithm happened to be picked first.

This is a question the person validating needs to answer :) The additions must improve upon the state of the art in some measurable way.
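For what it's worth, the "more noisy average" point is the standard error of the mean: with independent trial noise of standard deviation sigma, the residual noise in an N-trial average scales as sigma/sqrt(N). A quick simulation with toy numbers (not tutorial data) illustrates the trade-off both sides are arguing about:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 5.0                        # single-trial noise SD (arbitrary units)

def noise_sd_of_average(n_trials, n_sims=2000):
    # simulate n_sims experiments, each averaging n_trials noisy trials
    trials = sigma * rng.standard_normal((n_sims, n_trials))
    return trials.mean(axis=1).std()

sd_100 = noise_sd_of_average(100)  # expect ~ sigma / 10 = 0.5
sd_25 = noise_sd_of_average(25)    # expect ~ sigma / 5  = 1.0
print(sd_25 > sd_100)  # True: fewer trials -> noisier average
```

Dropping 75 of 100 trials roughly doubles the noise floor of the average, which is the cost that any gain from removing bad trials has to beat.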
If you drop more trials you will have fewer epochs unless you collect more data. I think that's the key point: you don't know how much data the user has, and `autoreject` is really agnostic to that (i.e., you don't require 40 trials for it to run), so really what `autoreject` does is trial-by-trial validation. If you look at it that way, dropping more trials is better because there are potentially fewer false positives, and you just have to collect more data to make up for the missing trials.

Improving on the state of the art makes sense in a machine-learning context like classifying images, but it's really not very convincing when it comes to M/EEG trial rejection. There is no benchmark: you can say a picture has a cat in it and almost everyone will agree, but there is no real way to say what a bad trial is other than relative to other channels, and there is no ground truth for how different it has to be. I think the main point is that dressing it up as a theoretically grounded approach, when it's really just a best-guess approximation, is misleading, and the consequence of that (suggesting users take the output of `autoreject` as the truth rather than tweaking outputs and comparing) is counterproductive in my opinion.
                if ch_adjacency[bad_idx, bad_idx2]:
                    self.bad_epochs[i] = True

    def interpolate_bads(self, interp_thresh=0.5):
I think the default should be 0.8 or so.
closes #217 ?
If you feel strongly about this I'll change it, but it seems odd to me to keep a channel that you are interpolating for 79% of trials. The other 21% of trials aren't going to match, so I wouldn't recommend doing that.
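For context, the thresholding being debated (visible in the `np.nansum` line of this PR's diff) counts, per channel, the fraction of epochs in which it was flagged, and marks the whole channel for interpolation when that fraction exceeds `interp_thresh`. A stand-alone sketch under a simplified `labels` convention (0 = good, nonzero = flagged, 2 = interpolate; not the actual autoreject implementation):

```python
import numpy as np

def interpolate_bads(labels, interp_thresh=0.5):
    """Mark whole channels (label 2) that are flagged in more than
    ``interp_thresh`` of epochs; mirrors the np.nansum comparison
    in the diff."""
    bads = labels != 0
    interp = np.where(np.nansum(bads, axis=0)
                      > labels.shape[0] * interp_thresh)[0]
    labels[:, interp] = 2
    return labels

# 4 epochs x 3 channels; channel 1 is flagged in 3 of 4 epochs
labels = np.array([[0., 1., 0.],
                   [0., 2., 0.],
                   [0., 1., 0.],
                   [0., 0., 1.]])
interpolate_bads(labels, interp_thresh=0.5)
print(labels[:, 1])  # [2. 2. 2. 2.] -- whole channel marked
```

With `interp_thresh=0.8`, the same channel (flagged in 75% of epochs) would be left alone, which is the practical difference between the two proposed defaults.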
autoreject/autoreject.py
Outdated
        # adjust to adjacency matrix
        bad_idx = ch_names.index(bad_ch)
        bad_idx2 = ch_names.index(bad_ch2)
        if ch_adjacency[bad_idx, bad_idx2]:
I feel this should be something like: "if more than x% of neighboring channels are bad, then drop" ... remember that the bad channels still left are actually not that bad, because we sort by amplitude before interpolating.
They're not that bad but they are over the peak-to-peak threshold for that channel...
I don't think there is a use case where you want to keep data when one neighbor is bad but not when two neighbors are bad; it's very uncommon for two neighbors to be bad, and I think at that point you need to toss the recording.
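For reference, the check in this PR's diff hunks boils down to: within each epoch, if any two flagged channels are neighbors, drop the epoch. A stand-alone mirror of that logic (hypothetical helper; the PR implements it as a `RejectLog` method operating on channel names):

```python
import numpy as np

def drop_epochs_with_adjacent_bads(labels, ch_adjacency):
    """Mask of epochs in which two flagged channels are neighbors."""
    bad_epochs = np.zeros(labels.shape[0], dtype=bool)
    for i in range(labels.shape[0]):
        bad_chs = np.where(labels[i] != 0)[0]
        for j, idx in enumerate(bad_chs):
            for idx2 in bad_chs[j + 1:]:
                if ch_adjacency[idx, idx2]:
                    bad_epochs[i] = True
    return bad_epochs

# 3 channels in a chain: 0-1 and 1-2 are neighbors, 0-2 are not
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=bool)
labels = np.array([[1, 1, 0],   # chs 0 and 1 flagged, adjacent -> drop
                   [1, 0, 1],   # chs 0 and 2 flagged, not adjacent -> keep
                   [0, 0, 0]])  # clean -> keep
print(drop_epochs_with_adjacent_bads(labels, adj))  # [ True False False]
```

This triggers on the very first adjacent pair, which is exactly the all-or-nothing behavior being debated above.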
MEG/EEG data is highly correlated; the rank is often much lower than the dimension. You'd be surprised what you can recover by interpolation :) I think the proper approach is to design something based on explained variance.
-1. There's no ground truth: since the channel is bad, you don't know whether what you're recovering is biological, even if it is plausible and explains a lot of the variance. The assumption in that statement is that 16 channels are just as good as 32 or 64, which I disagree with in the general case (although it is true for specific use cases).
You can do outlier detection from the distribution of explained variance for each channel. Again, I think you're missing the point. The goal is to improve the quality of the data ... this may come from interpolation, from not dropping a bad epoch (because you drop good channels along with it), or from doing nanmeans, robust regression, etc. Unless you quantify somehow, it's anyone's guess what an ad hoc method does.
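One way the explained-variance idea mentioned here could be prototyped (purely illustrative, not an autoreject API): regress each channel on all the others, compute R-squared per channel, and flag channels that are outliers on the low side of that distribution.

```python
import numpy as np

def low_explained_variance_channels(data, n_sd=2.0):
    """Flag channels poorly predicted by the remaining channels.

    data : (n_channels, n_times) array. For each channel, fit a
    least-squares regression on all other channels, compute R^2, and
    flag channels whose R^2 falls more than ``n_sd`` SDs below the mean.
    """
    n_channels = data.shape[0]
    r2 = np.empty(n_channels)
    for ch in range(n_channels):
        y = data[ch]
        X = np.delete(data, ch, axis=0).T        # predictors: other channels
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        r2[ch] = 1.0 - resid.var() / y.var()
    return np.where(r2 < r2.mean() - n_sd * r2.std())[0]

# toy data: five copies of one source (highly correlated) plus pure noise
rng = np.random.default_rng(42)
source = rng.standard_normal(1000)
data = np.vstack([source + 0.1 * rng.standard_normal(1000)
                  for _ in range(5)]
                 + [rng.standard_normal(1000)])  # channel 5: unrelated
print(low_explained_variance_channels(data))  # [5]
```

Because real M/EEG data is highly correlated, a genuinely broken channel should stand out with low R-squared, which is the quantification the comment is asking for.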
Now we're running into issues because it seems like you don't agree with the premise that any post-hoc modification is acceptable, which is what we discussed in the issue, so it's a bit frustrating to bring it up now that the code is written. Since validation using the datasets in the NeuroImage paper is not accessible (or at least would be a lot of work), that presents a rather large barrier to any substantial changes to `autoreject`.
I think post-hoc modifications might be necessary in some circumstances, but I'm suggesting that you expose a more general API that allows users to inspect the data, modify the `reject_log` labels, and save them. This was part of your PR, which I think is useful, and I'd be happy to accept it into autoreject. Our point of disagreement relates to the two algorithms you proposed. My first instinct was that they might be helpful, but after some thinking I realized that they might open a can of worms, as they drop so many trials. You can still use them on your dataset if you simply do:

```python
from autoreject import RejectLog

class FancyRejectLog(RejectLog):
    def __init__(self, reject_log):
        # instantiate your FancyRejectLog using reject_log
        ...

    def new_method1(self, epochs):
        # implement the method here
        pass

    def new_method2(self, epochs):
        pass

fancy_reject_log = FancyRejectLog(ar.reject_log)
fancy_reject_log.new_method1(epochs)
ar.transform(epochs, reject_log=fancy_reject_log)
```

having implemented the new methods.
Fixes #269 and #266.