Generate metrics based on decision tree #969
Conversation
I still need to update a bunch of tests and the reclassification workflow, since many things depend on the old class.
One thing I'm realizing here is that the current approach of passing the selector into decision functions and modifying it within them is unfortunately fragile. Given that the decision functions primarily modify the component table, I think they could be refactored to just accept the comptable and modify that.
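For concreteness, a minimal sketch of that refactor: a decision function that takes and returns the component table instead of mutating a selector. The function name, arguments, and threshold logic here are illustrative, not tedana's actual API.

```python
import pandas as pd

# Hypothetical decision node: it operates on the component table directly,
# so its side effects are contained and it is trivially unit-testable.
def classify_by_metric(comptable: pd.DataFrame, metric: str, threshold: float,
                       if_true: str = "rejected",
                       if_false: str = "accepted") -> pd.DataFrame:
    """Return a copy of the component table with updated classifications."""
    out = comptable.copy()
    above = out[metric] > threshold
    out.loc[above, "classification"] = if_true
    out.loc[~above, "classification"] = if_false
    return out
```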
@tsalo I'm realizing that to specify the external metrics in the decision tree json, like you requested in tsalo#14, it will be much easier if we merge this first. I've identified a few of the breaking test issues and I think I can get this working, but it's going to intersect with a bunch of the recent updates to main. Do you want to align with …
I think some of the …
@eurunuela Do you have time to review this in the near-ish future? Both Taylor & I worked on this, so it could use one more pair of eyes.
Absolutely! I'll have a look at it next week.
@eurunuela Flagging again in case you have a chance to look this over. The dev call is next week and I'd like to know if there's anything in this PR that needs discussion there.
Yes, sorry. Will definitely review this week.
LGTM!
@tsalo The last round of edits was more me than you. You should probably do the last round of checks to make sure you're happy with this & then merge.
Noticed a couple of small typos and one step that seems extraneous.
```python
# Create a re-initialized selector object if rerunning
selector = ComponentSelector(tree)
```
This shouldn't be necessary. If it is, then we should open an issue to fix it in a future PR.
This was necessary because `selector.select` looks for the last node in the decision tree that was run and then continues to run the rest of the nodes. This is how `ica_reclassify` works.
This is already happening in `main`, but it is within `tedica.automatic_selection`:

tedana/tedana/selection/tedica.py, lines 69 to 70 in a21d65e:

```python
selector = ComponentSelector(tree, component_table, cross_component_metrics=xcomp)
selector.select()
```
By removing the component table from the `selector` initialization and now initializing it in `tedana.py`, the fact that this was happening is more up-front.
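A rough sketch of the resume behavior described above, under the assumption that each executed node records an `outputs` entry; this is illustrative pseudologic, not the actual `ComponentSelector.select` implementation:

```python
# Illustrative resume logic for select(): skip nodes that already carry
# recorded outputs (from a previous run), then execute the remainder.
def select(self):
    nodes = self.tree["nodes"]  # assumed tree structure for this sketch
    start = next(
        (idx for idx, node in enumerate(nodes) if "outputs" not in node),
        len(nodes),  # every node was already run; nothing left to do
    )
    for node in nodes[start:]:
        self._run_node(node)  # hypothetical per-node dispatcher
```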
If the goal is to log all failed attempts, we would probably want to save both the ICA mixing matrices and the outputs. This is a bit tricky because we currently save most files at the end and not as the data in those files are generated. We could also add some way to tack a full tree onto the end of the executed tree, but that might just add confusion. Either way, this does feel like a separate PR.
Ah, thank you for explaining. This is different from scikit-learn classes, which have `fit`, `fit_transform`, and `transform` methods that distinguish between fitting the estimator to training data and transforming new data using the parameters from the fitting data. I know we don't need to have one-to-one correspondence, but I guess I didn't intuitively grasp it.
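For contrast, the scikit-learn convention referenced here looks roughly like this (a self-contained example with synthetic data):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
train_data = rng.normal(size=(100, 5))
new_data = rng.normal(size=(20, 5))

scaler = StandardScaler()               # __init__ takes hyperparameters only
scaler.fit(train_data)                  # fit estimates mean/std from training data
train_z = scaler.transform(train_data)  # transform applies the learned parameters
new_z = scaler.transform(new_data)      # ...including to new, unseen data
# fit_transform(train_data) fuses the two steps on the same dataset
```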
I'd say this is the equivalent of running `fit_transform` a second time on the same dataset using slightly different parameters (a different seed in our case). We also have a second use case that's the equivalent of running a second `transform` on a dataset that's already been transformed. I'm not sure that's something that is or should be done with the methods in `scikit-learn`.
Co-authored-by: Taylor Salo <salot@pennmedicine.upenn.edu>
Can we merge this?
@tsalo You're effectively the second reviewer since I made the last round of changes. If you're happy with this, then merge.
Closes #921.

Changes proposed in this pull request:

- `component_table`, `cross_component_metrics`, and `status_table` are passed as parameters to `select`, rather than `__init__`. This better fits the scikit-learn organization, in which `__init__` takes hyperparameters and `transform` (which is equivalent to `select`) takes actual data (see the sketch after this list).
- `select`.
- `ComponentSelector` initialization is done before fMRI data are loaded, to save time if the user-provided tree has an error.
- `n_echoes` and `n_vols` are stored in `selector.cross_component_metrics` instead of as separate parameters.
- `ica_reclassify` now starts at the first node in the input `tedtree` that doesn't have an `outputs` key, rather than the first node that isn't defined in an un-run tree. This potentially opens up additional functionality, since `ica_reclassify` could build on multiple previous executions.
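A rough sketch of the call pattern these bullets describe (hyperparameters at `__init__`, data at `select`). The import path and exact keyword names are best-effort guesses from the discussion above, and the input variables (`tree`, `component_table`, `n_echoes`, `n_vols`, `status_table`) are assumed to already exist:

```python
from tedana.selection.component_selector import ComponentSelector

# Tree validation happens at __init__, before fMRI data are loaded,
# so a malformed user-provided tree fails fast.
selector = ComponentSelector(tree)

# Data and data-derived cross-component metrics (now including
# n_echoes/n_vols) are passed to select, analogous to sklearn's transform.
selector.select(
    component_table,
    cross_component_metrics={"n_echoes": n_echoes, "n_vols": n_vols},
    status_table=status_table,
)
```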