
[ENH, MRG] Add post-hoc modifications #270

Closed
wants to merge 8 commits

Conversation

@alexrockhill (Contributor) commented Apr 7, 2022

Fixes #269 and #266.

@sappelhoff (Collaborator)

what does this fix about #268? :-) Did I miss something?

@alexrockhill (Contributor Author)

> what does this fix about #268? :-) Did I miss something?

Oops sorry! Fixed now!

@@ -1392,3 +1417,66 @@ def plot_epochs(self, epochs, scalings=None, title=''):
epochs=epochs,
epoch_colors=epoch_colors, scalings=scalings,
title='')

def drop_epochs_with_adjacent_channel_interpolation(
self, ch_adjacency, ch_names):
Member

I would make this a function, not method.

Contributor Author

Why not operate on the reject log? This seems cleaner.

Member

hold on ... I am not done with the review. I think I accidentally forgot to "start a review", so you'll keep getting comments until I am finished. Still forming my thoughts :)

autoreject/autoreject.py (outdated review thread, resolved)
Member

@jasmainak left a comment

Looks good. I think the drop_bad_xxx methods begin to venture into the territory of new science. I'm worried about recommending algorithms to users that have not been properly tested on various datasets. The idea of overriding the reject_log and the IO for that class is great, though.

np.nansum(bads, axis=0) > self.labels.shape[0] * interp_thresh)[0]
self.labels[:, interp] = 2

def drop_epochs_with_bads(self):
Member

this seems very aggressive ... I propose leaving this out of the API for now. We can revisit it later if there is more demand

Contributor Author

I don't see the harm in making it an option. In a use case where you have lots of noisy data (as I do), I think it's better to be on the safe side. I think the false positives make more sense with MEG, where the ratio of channels to epochs is usually much greater. In EEG, it seems to make more sense to me to reject any channel that's a bit off.

Comment on lines +200 to +253
# %%
# As a last, optional step, we can do post-hoc corrections to be even
# more stringent in order to favor rejecting epochs over keeping channels
# marked bad by ``autoreject`` that, based on cross-validation, are
# considered false positive bad channels.

reject_log = ar.get_reject_log(epochs)

# %%
# We can observe that, for a few sensors,
# there are very few green squares and mostly red and blue squares.
# The autoreject algorithm is keeping the data represented by a
# red square and considering it a false positive based on the
# cross-validation scores. For consistency, we can choose to do a
# post-hoc modification to mark the entire channel for interpolation.
# We would do this for consistency because whether the square is
# blue or red depends on the other sensors in that epoch; if there
# are worse sensors in that epoch, the sensor will be marked red as
# a false positive and, if not, it will be marked blue. In some cases
# it may be preferable to interpolate this channel to avoid this effect.

# if more than half bad (red or blue), interpolate
reject_log.interpolate_bads(interp_thresh=0.5)
reject_log.plot('horizontal')

# %%
# Additionally, we can choose to drop, instead of interpolating, any epoch
# with adjacent bad channels, as this will affect the accuracy of the
# interpolation. Similarly, this should be done on a case-by-case basis
# based on the montage of sensors and the analysis.

ch_adjacency, ch_names = mne.channels.find_ch_adjacency(epochs.info, 'eeg')
reject_log.drop_epochs_with_adjacent_channel_interpolation(
ch_adjacency, ch_names)
reject_log.plot('horizontal')

# %%
# Lastly, if we want to avoid any bad sensors being left in our data,
# even on a single epoch, we can drop all epochs with a red square
# for a bad sensor.

reject_log.drop_epochs_with_bads()
reject_log.plot('horizontal')

# %%
# Finally, we can apply the modified reject log to our epoch data. Note
# that since this example data is not very clean, we have lost most of
# our epochs; applying these post-hoc corrections should be done as
# needed in cases where keeping potentially bad data is not desirable.

epochs_ar = ar.transform(epochs, reject_log=reject_log)
plt.figure()
plt.plot(evoked_bad.times, evoked_bad.data.T * 1e6, 'r', zorder=-1)
epochs_ar.average().plot(axes=plt.gca())
Member

The message in the rest of the tutorial seemed to suggest that you need to conserve data, but now you are proposing methods that are more aggressively dropping data, no?

Member

For the tutorial, I would suggest something simple like:

reject_log.labels['EEG 001', :] = 2

then pass:

ar.transform(epochs, reject_log=reject_log)

I feel this is more broadly applicable and would cover your use cases as well as any future ones. Not super convinced about drop_epochs_xxx because they drop so many trials.
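
A minimal sketch of that manual relabeling, assuming `reject_log.labels` is an `(n_epochs, n_channels)` array aligned with `reject_log.ch_names` and that the label 2 means "interpolate"; 'EEG 001' is just the example channel from the suggestion above:

ch_idx = list(reject_log.ch_names).index('EEG 001')
reject_log.labels[:, ch_idx] = 2  # interpolate this channel in every epoch
epochs_clean = ar.transform(epochs, reject_log=reject_log)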

Contributor Author

> The message in the rest of the tutorial seemed to suggest that you need to conserve data, but now you are proposing methods that are more aggressively dropping data, no?

I think this is the big point that justifies the post-hoc corrections; they are only more stringent, so you'll never decrease the reliability of the results as long as you compensate by collecting more trials.

I think there is also some nuance; we were conserving data with blinks, which, in my opinion and definitely in published papers, can be removed effectively by ICA, but differences in peak-to-peak amplitude have a much less clear and removable cause. My preference is to remove independent components that I can confidently say are non-neural but, depending on the data and use, to reject data from a likely non-neural source that can't be removed easily.

Contributor Author

> I feel this is more broadly applicable and would cover your use cases as well as any future ones. Not super convinced about drop_epochs_xxx because they drop so many trials.

There are cases where you might want 20 good trials rather than 50 good-and-okay trials. Also, an even more rigorous approach would be to compare stricter vs. less strict post-hoc corrections/autoreject settings. I think this should be facilitated to validate the method.

Member

But autoreject is all about automatic interpolation and saving data ... advocating post-hoc methods to drop trials seems a bit like defeating its purpose. I'm not against adding it for future validation, but maybe the tutorial should focus on the API rather than new, untested methods.

Contributor Author

Just logically, being more conservative doesn't need validation; you're using a proper subset of the trials you were accepting before, so if they were good enough for the first validation, they must logically be good enough in this case. Furthermore, validated against what? RANSAC and FASTER? There's no ground truth there either; it's all a house of cards built off whatever algorithm happened to be picked first. I made this PR because, for the dataset in the example, there were the glaring internal inconsistencies mentioned in the issue. Yes, the validation means something, but first and foremost one should always inspect the dataset and think through what makes sense; keeping 50 epochs marked as bad while interpolating another 50 out of 110 epochs for one channel just plain doesn't make sense.

Member

> Just logically being more conservative doesn't need validation;

If you drop more trials, your average will be more noisy. You need to improve the quality of data, which may or may not improve when bad data is removed. This requires defining metrics of quality and comparing them. It's not as simple as saying "dropping more trials is better".

> Furthermore validated against what? RANSAC and FASTER? There's no ground truth there either, it's all a house of cards built off whatever algorithm happened to be picked first.

This is a question the person validating needs to answer :) The additions must improve upon the state of the art in some measurable way.

Contributor Author

If you drop more trials you will have fewer epochs unless you collect more data. I think that's the key point: you don't know how much data the user has, and autoreject is really agnostic to that (i.e., it doesn't require 40 trials to run), so what autoreject really does is trial-by-trial validation. If you look at it that way, dropping more trials is better because there are potentially fewer false positives, and you just have to collect more data to make up for the missing trials.

Improving on the state of the art makes sense in a machine learning context like classifying images, but it's really not very convincing when it comes to M/EEG trial rejection. There is no benchmark: you can say a picture has a cat in it and almost everyone will agree, but there is no real way to say what a bad trial is other than relative to other channels, and there is no ground truth for how different it has to be. I think the main point is that dressing it up as a theoretically grounded approach when it's really just a best-guess approximation is misleading, and the consequence of that (suggesting users take the output of autoreject as the truth rather than tweaking outputs and comparing) is counterproductive in my opinion.

if ch_adjacency[bad_idx, bad_idx2]:
self.bad_epochs[i] = True

def interpolate_bads(self, interp_thresh=0.5):
Member

I think the default should be 0.8 or so.

Member

closes #217 ?

Contributor Author

If you feel strongly about this I'll change it, but it seems odd to me to keep a channel that you are interpolating for 79% of trials. The other 21% of trials aren't going to match, so I wouldn't recommend doing that.
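
For reference, a rough sketch of the per-channel thresholding being debated here, not the PR's exact implementation; it assumes `reject_log.labels` is `(n_epochs, n_channels)` with 0 meaning good and NaN meaning unlabeled:

import numpy as np

# fraction of epochs in which each channel is labeled bad or interpolated
bads = np.where(np.isnan(reject_log.labels), np.nan, reject_log.labels != 0)
bad_frac = np.nansum(bads, axis=0) / reject_log.labels.shape[0]

interp_thresh = 0.8  # the reviewer's suggested default; the PR uses 0.5
always_interp = np.where(bad_frac > interp_thresh)[0]
reject_log.labels[:, always_interp] = 2  # interpolate these channels everywhere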

autoreject/tests/test_autoreject.py (outdated review thread, resolved)
autoreject/autoreject.py (outdated review thread, resolved)
# adjust to adjacency matrix
bad_idx = ch_names.index(bad_ch)
bad_idx2 = ch_names.index(bad_ch2)
if ch_adjacency[bad_idx, bad_idx2]:
Member

I feel this should be something like:

"if more than x% of neighboring channels are bad, then drop" ... remember that bad channels still left are actually not that bad because we sort by amplitude before interpolating

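A hedged sketch of the "more than x% of neighbors bad" criterion floated above; `drop_epochs_with_bad_neighbors` and `neighbor_frac_thresh` are hypothetical names, not part of autoreject. It assumes `reject_log.labels` is `(n_epochs, n_channels)` aligned with `reject_log.ch_names`, and that `ch_adjacency, ch_names` come from `mne.channels.find_ch_adjacency(epochs.info, 'eeg')` as in the tutorial:

import numpy as np

def drop_epochs_with_bad_neighbors(reject_log, ch_adjacency, ch_names,
                                   neighbor_frac_thresh=0.5):
    """Mark an epoch bad if any bad channel in it has too many bad neighbors."""
    adjacency = np.asarray(ch_adjacency.todense()).astype(bool)
    np.fill_diagonal(adjacency, False)  # a channel is not its own neighbor
    # map the adjacency channel order onto the reject log's channel order
    cols = [list(reject_log.ch_names).index(ch) for ch in ch_names]
    bad = np.nan_to_num(reject_log.labels[:, cols], nan=0) != 0
    for ei in range(bad.shape[0]):
        for ci in np.where(bad[ei])[0]:
            n_neighbors = adjacency[ci].sum()
            n_bad_neighbors = np.sum(bad[ei] & adjacency[ci])
            if n_neighbors and n_bad_neighbors / n_neighbors > neighbor_frac_thresh:
                reject_log.bad_epochs[ei] = True
                break
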
Contributor Author

They're not that bad but they are over the peak-to-peak threshold for that channel...

Contributor Author

I don't think there is a use case where you want to keep data when one neighbor is bad but not when two neighbors are bad; it's very uncommon that two neighbors are bad, and I think at that point you need to toss the recording.

Member

MEG/EEG data is highly correlated. The rank is often much lower than the dimension. You'd be surprised what you can recover by interpolation :) I think the proper approach is to design something based on explained variance
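
As a quick, informal check of that claim, one could interpolate a channel that is actually fine and compare the result with the original; a sketch assuming the tutorial's `epochs`, with 'EEG 001' used only as an example channel:

import numpy as np

picked = epochs.ch_names.index('EEG 001')
original = epochs.get_data()[:, picked]

epochs_interp = epochs.copy()
epochs_interp.info['bads'] = ['EEG 001']  # pretend the channel is bad
epochs_interp.interpolate_bads(reset_bads=True)
recovered = epochs_interp.get_data()[:, picked]

r = np.corrcoef(original.ravel(), recovered.ravel())[0, 1]
print(f'correlation between original and interpolated signal: {r:.2f}')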

Contributor Author

-1. There's no ground truth since the channel is bad; you don't know if what you're recovering is biological, even if it is plausible and explains a lot of the variance. The assumption in that statement is that 16 channels are just as good as 32 or 64, which I disagree with in the general case (although it's true for specific use cases).

Member

you can do outlier detection from the distribution of explained variance for each channel. Again, I think you're missing the point. The goal is to improve the quality of data ... this may come from interpolation, from not dropping a bad epoch (because you drop good channels along with it), or from doing nanmeans, robust regression, etc. Unless you quantify it somehow, it's anyone's guess what an ad hoc method does.
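
A rough sketch of what such an explained-variance check might look like (not an existing autoreject API): predict each channel from all the others by least squares, look at the distribution of R^2 across channels, and flag low outliers as candidate bad channels; it assumes the tutorial's `epochs`:

import numpy as np

data = epochs.get_data()           # (n_epochs, n_channels, n_times)
X = np.concatenate(data, axis=-1)  # (n_channels, n_epochs * n_times)
n_channels = X.shape[0]

r2 = np.empty(n_channels)
for ci in range(n_channels):
    others = np.delete(X, ci, axis=0).T  # predictors: all other channels
    target = X[ci]
    coef, _, _, _ = np.linalg.lstsq(others, target, rcond=None)
    resid = target - others @ coef
    r2[ci] = 1 - resid.var() / target.var()

# flag channels whose R^2 is a low outlier (median absolute deviation rule)
med = np.median(r2)
mad = np.median(np.abs(r2 - med))
suspect = [epochs.ch_names[ci] for ci in np.where(r2 < med - 5 * mad)[0]]
print('channels poorly explained by the others:', suspect)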

autoreject/tests/test_autoreject.py (outdated review thread, resolved)
@alexrockhill (Contributor Author)

Now we're running into issues because it seems like you don't agree with the premise that any post-hoc modification is acceptable, which is what we discussed in the issue, so it's a bit frustrating to bring it up now that the code is written. Since validation using the datasets in the NeuroImage paper is not accessible (or at least would be a lot of work), that presents a rather large barrier to any substantial changes to autoreject. On the dataset I was working with, autoreject seemed to have some performance issues as far as internal consistency, as discussed, so I thought I would try to add the option to address that for other people, but it's not essential for my analysis to work. I think a good way for autoreject to continue to improve would be to allow people to test modifications while maintaining access to the published, validated version, but it works well enough as is, so it's fine with me to just leave it be.

@jasmainak (Member)

I think post-hoc modifications might be necessary in some circumstances, but I'm suggesting that you expose a more general API that allows users to inspect the data, modify the reject_log labels, and save them. This was part of your PR, which I think is useful, and I'd be happy to accept it into autoreject.

Our point of disagreement relates to the two algorithms you proposed. My first instinct was that they might be helpful, but after some thinking I realized that they might open a can of worms, as they drop so many trials. You can still use them on your dataset if you simply do:

from autoreject import RejectLog

class FancyRejectLog(RejectLog):

    def __init__(self, reject_log):
        # instantiate your FancyRejectLog from an existing reject_log,
        # e.g. by passing its attributes on to RejectLog
        super().__init__(reject_log.bad_epochs, reject_log.labels,
                         reject_log.ch_names)

    def new_method1(self, epochs):
        # implement the method here
        pass

    def new_method2(self, epochs):
        pass

fancy_reject_log = FancyRejectLog(ar.reject_log)
fancy_reject_log.new_method1(epochs)
ar.transform(epochs, reject_log=fancy_reject_log)

having implemented ar.transform(..., reject_log=reject_log) in this PR. This way, more sophisticated methods can be tested and allowed to mature a little before deciding to incorporate them into autoreject. Decidedly, the threshold for accepting new algorithms into the codebase is high, and this has been the case for many years. I have multiple ideas to improve the algorithm but have resisted adding new changes due to @agramfort's insistence on quantitative validation, which is a fair ask IMHO. We're not dressing up autoreject as a theoretically grounded approach ... it's something that's based on intuitions and works empirically because it's been tested quantitatively across many datasets :)

Successfully merging this pull request may close these issues.

[ENH] Add transform to RejectLog
3 participants