Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with tedpca #949

Closed
BahmanTahayori opened this issue Jun 6, 2023 · 6 comments · Fixed by #950
Closed

Problem with tedpca #949

BahmanTahayori opened this issue Jun 6, 2023 · 6 comments · Fixed by #950
Labels
bug issues describing a bug or error found in the project

Comments

@BahmanTahayori
Copy link
Contributor

BahmanTahayori commented Jun 6, 2023

Summary

When tedpca is set to a number (float/integer) in tedana 23 an error is reported and the process stops. Here is an example when tedpca is set to 0.99.

tedana: error: argument --tedpca: invalid choice: 0.99 (choose from 'mdl', 'kic', 'aic')

@handwerkerd handwerkerd added the bug issues describing a bug or error found in the project label Jun 6, 2023
@handwerkerd
Copy link
Member

Thank you for noticing this. I think I see what changed. We added the following line to the command line argument parser to limit string inputs, but it prevented users from giving numbers as inputs. Our test used the API, which doesn't have this issue. I'll try to get this fixed soon.

https://github.com/ME-ICA/tedana/blob/9a3b83f21a245e4c0fbf1c8f97ceb9b0c46b52c0/tedana/workflows/tedana.py#LL153C8-L153C8

@handwerkerd
Copy link
Member

Please check and confirm #950 addresses this issue.

I also added in code to skip the dimensionality reduction step, but I realized it caused additional problems so I removed it. My added code would just give the maximum number of components, which would likely make ICA fail since many components would just be gaussian noise. I think your existing method of setting --tedpca=0.99 (or 0.999?) might actually be the best option. You'll effectively model the full dataset, but remove dimensions that are just gaussian. As an added benefit the reduced number of components would effectively be your loss in degrees of freedom from an earlier MP-PCA or NORDIC process. That is, if the full time series has 300 volumes and 99.9% of the variance is in 100 components, that means the PCA-based denoising process effectively removed 200 degrees of freedom. tedana would not keep track of that because it add that noise back in, but you'd be able to track and use that information as appropriate.

@BahmanTahayori
Copy link
Contributor Author

Thanks Dan,

when we wanted to apply changes, I tried to skip PCA but I ran into some issues, therefore, I decided to set the tedpca to a number close to 1 (I used 0.9999 but the results are similar to 0.99). A good number of components have empty maps and will not be considered. I agree with you that manipulating the pca with --tedpca is a good option. Thanks for your help.

@Lestropie
Copy link
Contributor

@handwerkerd Can you provide any additional detail into the additional problems caused by skipping the dimensionality reduction step? I'd still like to pursue that prospect, even if it's a little tricky code-wise.

I'm a little hesitant about setting a variance threshold "close to 100%" and hoping that it does a reasonable job. If there's a lot of low-variance components due to a prior PCA, then a small change in that variance threshold eg. 99% -> 99.9% -> 99.99% might actually change the number of components considerably, and I don't have a feel for what influence that might have on the ICA. So having a full bypass and dealing with classification exclusively at the ICA is still appealing.

@handwerkerd
Copy link
Member

@Lestropie I think setting the threshold to 99% only makes sense if estimated Gaussian noise was removed in an earlier preprocessing step. I think you are aware of that context, but I'm realizing it might not be obvious to others reading this thread.

Ideally, the PCA + dimensionality estimation step should identify the minimum number of components/dimensions required to model the parts of the signal that are not Gaussian noise. That would be my ultimate goal. From a practical perspective, I'm not sure how much that matters. The practical requirement is to end up with a manageable number of components (i.e. <1/3 of the total number of volumes in my typical experience with fMRI) and where ICA reliably converges. Within tedana, any component that is not modeled by PCA is retained in the signal so there is no risk of removing task-locked information in excluded low-variance PCA components. Additionally, both of the current decision tree option skew towards keeping low variance ICA components rather than losing statistical degrees of freedom.

That means that getting dimensionality estimation right means the ICA might be more accurate and opens ways to more aggressively remove undesired noise, but the worst case of getting it slightly wrong is not remove some noise.

FWIW, The kundu method for pca dimensionality estimation in tedana takes advantage of multi-echo information to remove PCA components that are likely to be thermal noise. I think taking advantage of multi-echo information in this step has a lot of potental, but that option is implemented in a confusing and slightly brittle manner so I tend not to use it. Extending that concept, one of my projects is to see if there's a way to use a generalization of ICA to create components that are more purely T2* or S0 weighted and then making noise removal easier and more effective. Some early work in that direction is here: https://fim.nimh.nih.gov/sites/default/files/handwerker_multiechodenoising_ohbm2018_small.pdf

Does this answer your question?

@Lestropie
Copy link
Contributor

There's a lot of discussions to potentially branch out from here, but it all diverges further and further from the original purpose of this issue listing. What I might do is let this thread die naturally, explore some data more closely with @BahmanTahayori, and if I still think there's merit in the prospect of a full bypass (or maybe something else, eg. a tailored PCA rank estimation heuristic for our use case) then I'll re-raise in its own thread.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug issues describing a bug or error found in the project
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants