
[ENH, MRG] Add post-hoc modifications #270

Closed
wants to merge 8 commits

Conversation

@alexrockhill (Contributor) commented Apr 7, 2022

Fixes #269 and #266.

@sappelhoff (Collaborator)

what does this fix about #268? :-) Did I miss something?

@alexrockhill (Contributor Author)

> what does this fix about #268? :-) Did I miss something?

Oops sorry! Fixed now!

@@ -1392,3 +1417,66 @@ def plot_epochs(self, epochs, scalings=None, title=''):
epochs=epochs,
epoch_colors=epoch_colors, scalings=scalings,
title='')

def drop_epochs_with_adjacent_channel_interpolation(
self, ch_adjacency, ch_names):
Member

I would make this a function, not method.

Contributor Author

Why not operate on the reject log? This seems cleaner.

Member

hold on ... I am not done with the review. I think I accidentally forgot to "start a review", so you'll keep getting comments until I am finished. Still forming my thoughts :)

autoreject/autoreject.py (outdated review thread, resolved)
Member

@jasmainak left a comment

Looks good. I think the drop_bad_xxx methods begin to venture into the territory of new science. I'm worried about recommending algorithms to users that have not been properly tested on various datasets. The idea of overriding the reject_log and the IO for that class is great, though.

np.nansum(bads, axis=0) > self.labels.shape[0] * interp_thresh)[0]
self.labels[:, interp] = 2

def drop_epochs_with_bads(self):
Member

this seems very aggressive ... I propose leaving this out of the API for now. We can revisit it later if there is more demand

Contributor Author

I don't see the harm in making it an option. In a use case where you have lots of noisy data (as I do), I think it's better to be on the safe side. I think the false positives make more sense with MEG, where the ratio of channels to epochs is usually much greater. In EEG, it seems to make more sense to me to reject any channel that's a bit off.

Comment on lines +200 to +253
# %%
# As a last, optional step, we can do post-hoc corrections to be even
# more stringent in order to favor rejecting epochs over keeping channels
# marked bad by ``autoreject`` that, based on cross-validation, are
# considered false positive bad channels.

reject_log = ar.get_reject_log(epochs)

# %%
# We can observe that, for a few sensors,
# there are very few green squares and mostly red and blue squares.
# The autoreject algorithm is keeping the data represented by a
# red square and considering it a false positive based on the
# cross-validation scores. For consistency, we can choose to do a
# post-hoc modification to mark the entire channel for interpolation.
# We would do this for consistency because whether the square is
# blue or red depends on the other sensors in that epoch; if there
# are worse sensors in that epoch, the sensor will be marked red as
# a false positive and, if not, it will be marked blue. In some cases
# it may be preferable to interpolate this channel to avoid this effect.

# if more than half bad (red or blue), interpolate
reject_log.interpolate_bads(interp_thresh=0.5)
reject_log.plot('horizontal')

# %%
# Additionally, we can choose to drop, instead of interpolating, any epoch
# with adjacent bad channels, as this will affect the accuracy of the
# interpolation. Similarly, this should be done on a case-by-case basis
# based on the montage of sensors and the analysis.

ch_adjacency, ch_names = mne.channels.find_ch_adjacency(epochs.info, 'eeg')
reject_log.drop_epochs_with_adjacent_channel_interpolation(
ch_adjacency, ch_names)
reject_log.plot('horizontal')

# %%
# Lastly, if we want to avoid any bad sensors being left in our data,
# even on a single epoch, we can drop all epochs with a red square
# for a bad sensor.

reject_log.drop_epochs_with_bads()
reject_log.plot('horizontal')

# %%
# Finally, we can apply the modified reject log to our epoch data. Note
# that since this example data is not very clean, we have lost most of
# our epochs; applying these post-hoc corrections should be done as
# needed in cases where keeping potentially bad data is not desirable.

epochs_ar = ar.transform(epochs, reject_log=reject_log)
plt.figure()
plt.plot(evoked_bad.times, evoked_bad.data.T * 1e6, 'r', zorder=-1)
epochs_ar.average().plot(axes=plt.gca())
Member

The message in the rest of the tutorial seemed to suggest that you need to conserve data, but now you are proposing methods that are more aggressively dropping data, no?

Member

For the tutorial, I would suggest something simple like:

reject_log.labels['EEG 001', :] = 2

then pass:

ar.transform(epochs, reject_log=reject_log)

I feel this is more broadly applicable and would cover your use cases as well as any future ones. Not super convinced about drop_epochs_xxx because they drop so many trials.
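
A minimal sketch of that manual relabeling, assuming `reject_log.labels` is an `(n_epochs, n_channels)` array aligned with `reject_log.ch_names` and that the label 2 means "interpolate"; 'EEG 001' is just the example channel from the suggestion above:

ch_idx = list(reject_log.ch_names).index('EEG 001')
reject_log.labels[:, ch_idx] = 2  # interpolate this channel in every epoch
epochs_clean = ar.transform(epochs, reject_log=reject_log)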

Contributor Author

> The message in the rest of the tutorial seemed to suggest that you need to conserve data, but now you are proposing methods that are more aggressively dropping data, no?

I think this is the big point that justifies the post-hoc corrections; they are only more stringent, so you'll never decrease the reliability of the results as long as you compensate by collecting more trials.

I think there is also some nuance; we were conserving data with blinks, which, in my opinion and definitely in published papers, can be removed effectively by ICA, but differences in peak-to-peak amplitude have a much less clear and removable cause. My preference is to remove independent components that I can confidently say are non-neural but, depending on the data and use, to reject data from a likely non-neural source that can't be removed easily.

Contributor Author

> I feel this is more broadly applicable and would cover your use cases as well as any future ones. Not super convinced about drop_epochs_xxx because they drop so many trials.

There are cases where you might want 20 good trials rather than 50 good-and-okay trials. Also, an even more rigorous approach would be to compare stricter vs. less strict post-hoc corrections/autoreject settings. I think this should be facilitated to validate the method.

Member

But autoreject is all about automatic interpolation and saving data ... advocating post-hoc methods to drop trials seems a bit like defeating its purpose. I'm not against adding it for future validation, but maybe the tutorial should focus on the API rather than new, untested methods.

Contributor Author

Just logically, being more conservative doesn't need validation; you're using a proper subset of the trials you were accepting before, so if they were good enough for the first validation, they must logically be good enough in this case. Furthermore, validated against what? RANSAC and FASTER? There's no ground truth there either; it's all a house of cards built off whatever algorithm happened to be picked first. I made this PR because, for the dataset in the example, there were the glaring internal inconsistencies mentioned in the issue. Yes, the validation means something, but first and foremost one should always inspect the dataset and think through what makes sense; keeping 50 epochs marked as bad while interpolating another 50 out of 110 epochs for one channel just plain doesn't make sense.

Member

> Just logically being more conservative doesn't need validation;

If you drop more trials, your average will be more noisy. You need to improve the quality of data, which may or may not improve when bad data is removed. This requires defining metrics of quality and comparing them. It's not as simple as saying "dropping more trials is better".

> Furthermore validated against what? RANSAC and FASTER? There's no ground truth there either, it's all a house of cards built off whatever algorithm happened to be picked first.

This is a question the person validating needs to answer :) The additions must improve upon the state of the art in some measurable way.

Contributor Author

If you drop more trials you will have fewer epochs unless you collect more data. I think that's the key point: you don't know how much data the user has, and autoreject is really agnostic to that (i.e., it doesn't require 40 trials to run), so what autoreject really does is trial-by-trial validation. If you look at it that way, dropping more trials is better because there are potentially fewer false positives, and you just have to collect more data to make up for the missing trials.

Improving on the state of the art makes sense in a machine learning context like classifying images, but it's really not very convincing when it comes to M/EEG trial rejection. There is no benchmark: you can say a picture has a cat in it and almost everyone will agree, but there is no real way to say what a bad trial is other than relative to other channels, and there is no ground truth for how different it has to be. I think the main point is that dressing it up as a theoretically grounded approach when it's really just a best-guess approximation is misleading, and the consequence of that (suggesting users take the output of autoreject as the truth rather than tweaking outputs and comparing) is counterproductive in my opinion.

if ch_adjacency[bad_idx, bad_idx2]:
self.bad_epochs[i] = True

def interpolate_bads(self, interp_thresh=0.5):
Member

I think the default should be 0.8 or so.

Member

closes #217 ?

Contributor Author

If you feel strongly about this I'll change it, but it seems odd to me to keep a channel that you are interpolating for 79% of trials. The other 21% of trials aren't going to match, so I wouldn't recommend doing that.
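
For reference, a rough sketch of the per-channel thresholding being debated here, not the PR's exact implementation; it assumes `reject_log.labels` is `(n_epochs, n_channels)` with 0 meaning good and NaN meaning unlabeled:

import numpy as np

# fraction of epochs in which each channel is labeled bad or interpolated
bads = np.where(np.isnan(reject_log.labels), np.nan, reject_log.labels != 0)
bad_frac = np.nansum(bads, axis=0) / reject_log.labels.shape[0]

interp_thresh = 0.8  # the reviewer's suggested default; the PR uses 0.5
always_interp = np.where(bad_frac > interp_thresh)[0]
reject_log.labels[:, always_interp] = 2  # interpolate these channels everywhere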

autoreject/tests/test_autoreject.py (outdated review thread, resolved)
autoreject/autoreject.py (outdated review thread, resolved)
# adjust to adjacency matrix
bad_idx = ch_names.index(bad_ch)
bad_idx2 = ch_names.index(bad_ch2)
if ch_adjacency[bad_idx, bad_idx2]:
Member

I feel this should be something like:

"if more than x% of neighboring channels are bad, then drop" ... remember that bad channels still left are actually not that bad because we sort by amplitude before interpolating

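A hedged sketch of the "more than x% of neighbors bad" criterion floated above; `drop_epochs_with_bad_neighbors` and `neighbor_frac_thresh` are hypothetical names, not part of autoreject. It assumes `reject_log.labels` is `(n_epochs, n_channels)` aligned with `reject_log.ch_names`, and that `ch_adjacency, ch_names` come from `mne.channels.find_ch_adjacency(epochs.info, 'eeg')` as in the tutorial:

import numpy as np

def drop_epochs_with_bad_neighbors(reject_log, ch_adjacency, ch_names,
                                   neighbor_frac_thresh=0.5):
    """Mark an epoch bad if any bad channel in it has too many bad neighbors."""
    adjacency = np.asarray(ch_adjacency.todense()).astype(bool)
    np.fill_diagonal(adjacency, False)  # a channel is not its own neighbor
    # map the adjacency channel order onto the reject log's channel order
    cols = [list(reject_log.ch_names).index(ch) for ch in ch_names]
    bad = np.nan_to_num(reject_log.labels[:, cols], nan=0) != 0
    for ei in range(bad.shape[0]):
        for ci in np.where(bad[ei])[0]:
            n_neighbors = adjacency[ci].sum()
            n_bad_neighbors = np.sum(bad[ei] & adjacency[ci])
            if n_neighbors and n_bad_neighbors / n_neighbors > neighbor_frac_thresh:
                reject_log.bad_epochs[ei] = True
                break
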
Contributor Author

They're not that bad but they are over the peak-to-peak threshold for that channel...

Contributor Author

I don't think there is a use case where you want to keep data when one neighbor is bad but not when two neighbors are bad; it's very uncommon that two neighbors are bad, and I think at that point you need to toss the recording.

Member

MEG/EEG data is highly correlated. The rank is often much lower than the dimension. You'd be surprised what you can recover by interpolation :) I think the proper approach is to design something based on explained variance
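
As a quick, informal check of that claim, one could interpolate a channel that is actually fine and compare the result with the original; a sketch assuming the tutorial's `epochs`, with 'EEG 001' used only as an example channel:

import numpy as np

picked = epochs.ch_names.index('EEG 001')
original = epochs.get_data()[:, picked]

epochs_interp = epochs.copy()
epochs_interp.info['bads'] = ['EEG 001']  # pretend the channel is bad
epochs_interp.interpolate_bads(reset_bads=True)
recovered = epochs_interp.get_data()[:, picked]

r = np.corrcoef(original.ravel(), recovered.ravel())[0, 1]
print(f'correlation between original and interpolated signal: {r:.2f}')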

Contributor Author

-1. There's no ground truth since the channel is bad; you don't know if what you're recovering is biological, even if it is plausible and explains a lot of the variance. The assumption in that statement is that 16 channels are just as good as 32 or 64, which I disagree with in the general case (although it's true for specific use cases).

Member

you can do outlier detection from the distribution of explained variance for each channel. Again, I think you're missing the point. The goal is to improve the quality of data ... this may come from interpolation, from not dropping a bad epoch (because you drop good channels along with it), or from doing nanmeans, robust regression, etc. Unless you quantify it somehow, it's anyone's guess what an ad hoc method does.
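
A rough sketch of what such an explained-variance check might look like (not an existing autoreject API): predict each channel from all the others by least squares, look at the distribution of R^2 across channels, and flag low outliers as candidate bad channels; it assumes the tutorial's `epochs`:

import numpy as np

data = epochs.get_data()           # (n_epochs, n_channels, n_times)
X = np.concatenate(data, axis=-1)  # (n_channels, n_epochs * n_times)
n_channels = X.shape[0]

r2 = np.empty(n_channels)
for ci in range(n_channels):
    others = np.delete(X, ci, axis=0).T  # predictors: all other channels
    target = X[ci]
    coef, _, _, _ = np.linalg.lstsq(others, target, rcond=None)
    resid = target - others @ coef
    r2[ci] = 1 - resid.var() / target.var()

# flag channels whose R^2 is a low outlier (median absolute deviation rule)
med = np.median(r2)
mad = np.median(np.abs(r2 - med))
suspect = [epochs.ch_names[ci] for ci in np.where(r2 < med - 5 * mad)[0]]
print('channels poorly explained by the others:', suspect)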

autoreject/tests/test_autoreject.py (outdated review thread, resolved)
@alexrockhill (Contributor Author)

Now we're running into issues because it seems like you don't agree with the premise that any post-hoc modification is acceptable, which is what we discussed in the issue, so it's a bit frustrating to bring it up now that the code is written. Since validation using the datasets in the NeuroImage paper is not accessible (or at least would be a lot of work), that presents a rather large barrier to any substantial changes to autoreject. On the dataset I was working with, autoreject seemed to have some performance issues as far as internal consistency, as discussed, so I thought I would try to add the option to address that for other people, but it's not essential for my analysis to work. I think a good way for autoreject to continue to improve would be to allow people to test modifications while maintaining access to the published, validated version, but it works well enough as is, so it's fine with me to just leave it be.

@jasmainak (Member)

I think post-hoc modifications might be necessary in some circumstances, but I'm suggesting that you expose a more general API that allows users to inspect the data, modify the reject_log labels, and save them. This was part of your PR, which I think is useful, and I'd be happy to accept it into autoreject.

Our point of disagreement relates to the two algorithms you proposed. My first instinct was that they might be helpful, but after some thinking I realized that they might open a can of worms, as they drop so many trials. You can still use them on your dataset if you simply do:

from autoreject import RejectLog

class FancyRejectLog(RejectLog):

    def __init__(self, reject_log):
        # instantiate your FancyRejectLog from an existing reject_log,
        # e.g. by passing its attributes on to RejectLog
        super().__init__(reject_log.bad_epochs, reject_log.labels,
                         reject_log.ch_names)

    def new_method1(self, epochs):
        # implement the method here
        pass

    def new_method2(self, epochs):
        pass

fancy_reject_log = FancyRejectLog(ar.reject_log)
fancy_reject_log.new_method1(epochs)
ar.transform(epochs, reject_log=fancy_reject_log)

having implemented ar.transform(..., reject_log=reject_log) in this PR. This way, more sophisticated methods can be tested and allowed to mature a little before deciding to incorporate them into autoreject. Decidedly, the threshold for accepting new algorithms into the codebase is high, and this has been the case for many years. I have multiple ideas to improve the algorithm but have resisted adding new changes due to @agramfort's insistence on quantitative validation, which is a fair ask IMHO. We're not dressing up autoreject as a theoretically grounded approach ... it's something that's based on intuitions and works empirically because it's been tested quantitatively across many datasets :)

Successfully merging this pull request may close these issues.

[ENH] Add transform to RejectLog
3 participants