[ENH] PSD of Ragged Epochs #12315

Draft · wants to merge 3 commits into main

Conversation

alexrockhill
Contributor

It would be nice to be able to compute a power spectral density across epochs of different lengths, using the duration stored in the annotations attribute of a raw object. Here is one solution that tries to reuse as much existing code as possible. However, it seems a bit inelegant: I didn't handle rejecting epochs by annotation properly with the drop_log, and I took the whole raw data array when it would be more memory-efficient to take just the slice of interest if tmin and/or tmax are provided (handling that would have made the solution even clunkier). Thoughts? Is this the best way to accomplish this, and should I just rework it a bit more, or would a separate function (e.g. raw.compute_ragged_epochs_spectrum) be better?

@larsoner
Member

Thoughts? Is this the best way to accomplish this, and should I just rework it a bit more, or would a separate function (e.g. raw.compute_ragged_epochs_spectrum) be better?

To me, ideally we would handle different-length Epochs gracefully first; then downstream methods like PSD/TFR etc. can follow suit. If we instead start from the other end with a raw.compute_psd(..., ragged_epochs=True), it seems like we'll eventually need to deprecate it in favor of Epochs(raw, ..., allow_ragged=True).compute_psd() or so.

See also:

Coming back to the idea of starting from Epochs, it seems like keeping our underlying representation as an ndarray of shape (n_epochs, n_channels, n_time_points_max), for example, where the data array is a np.ma.masked_array, might not be too terrible to implement. If we end up wanting to go this route, it might be good to have a dev meeting to figure out some details.
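A minimal sketch of what that masked-array representation could look like (the shapes, epoch lengths, and random data here are all made up for illustration):

```python
import numpy as np
import numpy.ma as ma

rng = np.random.default_rng(0)
lengths = [80, 100, 60]  # hypothetical per-epoch sample counts (ragged)
n_epochs, n_channels, n_max = len(lengths), 2, max(lengths)

# Start fully masked, then unmask only each epoch's valid span
data = ma.masked_all((n_epochs, n_channels, n_max))
for i, n in enumerate(lengths):
    data[i, :, :n] = rng.standard_normal((n_channels, n))

# Reductions skip masked samples, so e.g. averaging "just works":
evoked = data.mean(axis=0)      # (n_channels, n_max)
n_contrib = data.count(axis=0)  # how many epochs contribute at each sample
```

One nice property: per-sample bookkeeping (how many epochs are valid at each time point) comes for free via `count`, which a hand-rolled ragged representation would have to track itself.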

If this approach does potentially appeal to you, could you first try using #12311 with Epochs(raw, tmin=min(tmins), tmax=max(tmaxs)) or so, and then think about whether you could limit compute_psd to just the correct tmin/tmax window?

@alexrockhill
Contributor Author

alexrockhill commented Dec 19, 2023

Hmmm, I read #3533 and see the plethora of issues and maintenance burden that accompany supporting variable-duration epochs. I can only really think of one use case for variable epochs where they are comparable: some kind of steady-state task, like cuing the participant to do a repetitive grasping motion that they actually perform for a variable period of time, analyzed with an FFT-based method. For that analysis, you could limit your data to the shortest period and take a smaller slice out of all the longer-duration trials, but this seems very wasteful of data since, as I understand it, FFT methods should be slightly noisier for shorter-duration trials but generally comparable across trials of different duration (this should be checked in an analysis, clearly). That seems like a clear use case, whereas time-frequency, ERP, and everything else I can think of seems like it would fit within the existing epoch framework (i.e. you'd want to crop to equal lengths or interpolate before comparing anyway so that the dimensions match up). So to me, this seems like much more of a one-off kind of thing.
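A rough illustration of why Welch-style FFT estimates stay comparable across durations: with a fixed segment length, trials of different durations land on the same frequency grid, and the longer trial simply averages more segments (lower estimator variance). A hedged sketch with made-up signals:

```python
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(0)
sfreq = 200.0
# Two hypothetical trials of the same process, but different durations:
short = rng.standard_normal(int(2 * sfreq))  # 2 s
long_ = rng.standard_normal(int(5 * sfreq))  # 5 s

# Fixed nperseg -> identical frequency grids for both trials; the longer
# trial averages more overlapping segments, reducing variance.
f_short, p_short = welch(short, fs=sfreq, nperseg=256)
f_long, p_long = welch(long_, fs=sfreq, nperseg=256)
```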

I drafted the PR instead of continuing to work on it because it seemed so inelegant and I thought it would be nice to get feedback so thanks for that. I'm -0.5 on more general ragged epoch support though, I think this could be kept small in scope. Does that sound reasonable or is it important to support more general cases?

@larsoner
Member

There are API/implementation issues with ragged epochs. Here you'll have at least some of those... plus a reimplementation of epoching itself, which will bring its own set of issues. So I'm not sure in the end whether or not the current smaller-scope idea will have a more favorable pain-of-maintenance-to-added-functionality ratio :)

@alexrockhill
Contributor Author

That's a good point. I agree that re-epoching is a huge pain.

Perhaps all of it can be kept at a very high level: each event can be temporarily pulled from the raw into an evoked object, the PSD of that computed, and then aggregated into an EpochsSpectrum. That seems potentially inefficient, but I think you have to loop anyway. You can definitely pre-allocate memory by doing a fencepost pass, as this PR currently does, so maybe there isn't much to gain efficiency-wise by modifying existing functions as done here. I can draft that as an alternative to the current changes in this PR if that doesn't sound too hacky.
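A rough sketch of that high-level loop, using plain NumPy arrays and scipy.signal.welch as stand-ins for the raw data and MNE's PSD machinery (the onset/duration pairs are hypothetical):

```python
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(0)
sfreq = 100.0
raw_data = rng.standard_normal((3, 10_000))  # (n_channels, n_times) stand-in

# Hypothetical (onset, duration) pairs in samples, e.g. from annotations:
spans = [(100, 400), (1_000, 700), (5_000, 550)]

psds = []
for onset, dur in spans:
    seg = raw_data[:, onset:onset + dur]  # variable-length "epoch"
    # Fixed nperseg keeps the frequency grid identical across epochs
    freqs, psd = welch(seg, fs=sfreq, nperseg=256, axis=-1)
    psds.append(psd)

spectra = np.stack(psds)  # (n_epochs, n_channels, n_freqs)
```

Because the frequency grid is the same for every epoch, the per-epoch results stack into a regular array even though the epochs themselves are ragged in time.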

@larsoner
Member

Perhaps all of it can be kept at a very high level: each event can be temporarily pulled from the raw into an evoked object, the PSD of that computed, and then aggregated into an EpochsSpectrum. That seems potentially inefficient, but I think you have to loop anyway. You can definitely pre-allocate memory by doing a fencepost pass, as this PR currently does, so maybe there isn't much to gain efficiency-wise by modifying existing functions as done here. I can draft that as an alternative to the current changes in this PR if that doesn't sound too hacky.

Basically you will end up needing to reimplement the logic and operations of Epochs without using Epochs to do it. So not only is there code duplication, but it'll be incomplete: it won't support things like reject and flat and reject_by_annotations, or worry about whether the whole epoch can fit at the start and end, and so on (Epochs does a lot of stuff!). And if you try to add more and more of these, you'll eventually end up where you could have added variable-length support to Epochs in the first place. So to me this will end up being a bit hacky I think :(

@drammock
Member

I'm -0.5 on more general ragged epoch support though, I think this could be kept small in scope. Does that sound reasonable or is it important to support more general cases?

I think if we're going to support any analyses of variable-length trials/epochs in module code, then we should do so in a way that addresses the general case.

I can only really think of one use case for variable epochs where they are comparable

Anecdotally: #3533 describes a use case that I've personally needed, and off the top of my head I can recall at least two separate questions about it on the forum (plus @jona-sassenhagen used to do such analyses IIRC). Namely, stimuli are (variable-length) spoken sentences, and the analysis target is a keyword that occurs (at varying latency) somewhere other than at sentence onset, but the baseline period is pre-sentence-onset. It's not an unusual design in the language-processing world. Even if the approach in #3533 doesn't get you all the way to doing statistics on that data (the temporal alignment won't be right), it lets you do epoching/baselining and apply the inverse, so you can visualize your data much more easily for exploration / sanity checks.

@alexrockhill
Contributor Author

Ok, that all seems very convincing for general variable-length epochs support. I can give @larsoner's suggestion about masked arrays a go; it doesn't seem too daunting.

@alexrockhill
Contributor Author

I'm going to wait until #12311 is merged to attempt this, because I think the natural constructor would be to use the annotations with allow_ragged=True to signal that the annotations.duration should actually be considered. I looked into masked arrays; I haven't really used them, but I am (probably overly) optimistic that it actually wouldn't be that bad.

@alexrockhill
Contributor Author

```python
import numpy as np
import numpy.ma as ma
import mne
from mne.datasets import sample

data_path = sample.data_path()
meg_path = data_path / "MEG" / "sample"
fname_raw = meg_path / "sample_audvis_raw.fif"
raw = mne.io.read_raw_fif(fname_raw)
raw.load_data()
events = mne.find_events(raw)  # find events before dropping the stim channel
raw.pick(raw.ch_names[:2])  # keep only two channels for a quick experiment

epochs = mne.Epochs(raw, events[:3], tmin=-0.1, tmax=0.1)
epochs.load_data()

# Build a boolean mask hiding different spans per epoch, to fake raggedness
mask = np.zeros(epochs.get_data().shape, dtype=bool)
mask[0, 0, :50] = True
mask[1, 0, 50:] = True
mask[:, 0, 100:] = True
epochs._data = ma.masked_array(epochs._data, mask=mask, fill_value=np.nan)
epochs.plot()

evoked = epochs.average()
evoked.plot()

epochs.save('tmp-epo.fif')
epochs2 = mne.read_epochs('tmp-epo.fif')

from scipy.signal import spectrogram
# spectrogram returns (freqs, times, Sxx); the mask is silently ignored
psd = spectrogram(epochs._data[0, 0], epochs.info['sfreq'])[2]
print(psd)
```

Hmmm, so it looks like the upstream libraries (e.g. scipy) don't properly handle masked arrays. This might be more of a PR to scipy, since otherwise some things work. It looks like evoked.plot just works right away.
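For reference, since scipy treats a masked array as a plain ndarray (the mask is silently ignored), one workaround is to drop the masked samples explicitly before estimating the PSD; a hedged sketch with made-up data:

```python
import numpy as np
import numpy.ma as ma
from scipy.signal import welch

rng = np.random.default_rng(0)
sfreq = 100.0
x = ma.masked_array(rng.standard_normal(500))
x[300:] = ma.masked  # pretend this epoch is only 300 samples long

# scipy would process all 500 samples, mask included; compress to the
# valid samples first so only real data enters the estimate:
valid = x.compressed()  # 1-D ndarray of the 300 unmasked samples
freqs, psd = welch(valid, fs=sfreq, nperseg=256)
```

This only works when the valid samples form one contiguous span per epoch, which is the case for ragged epochs (masked tail) but not for arbitrary masks.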

So I think the things that will have to be checked to work are:

  • psd_array_welch
  • psd_array_multitaper
  • evoked.plot
  • evoked.plot_image
  • epochs.plot
  • epochs.plot_image
  • epochs.plot_topo_image
  • IO (maybe not possible? at least to formats other than h5)

That seems like the majority of things that would need to work, and doesn't seem bad apart from the upstream PRs, which might be a big headache. It's a bit odd which things do and don't work: the `matplotlib`-based plotting seems to work great, but the `qt`-based stuff may have more issues.

@agramfort
Member

agramfort commented Dec 20, 2023 via email

@alexrockhill
Contributor Author

Hmm, good point; this is probably better suited to a discussion than a reply thread.
