Generate metrics based on decision tree #969

Merged: 39 commits into ME-ICA:main, Mar 20, 2024

tsalo
Member

@tsalo tsalo commented Aug 11, 2023

Closes #921.

Changes proposed in this pull request:

  • This helps ensure that metrics requested in decision trees are reflected in the component table. This will allow users to develop decision trees that use metrics other than the ones currently hardcoded into the tedana workflow code, as long as the metrics are actually defined in the tedana metric collection function.
  • Restructure the ComponentSelector class to take component_table, cross_component_metrics, status_table as parameters to select, rather than __init__. This better fits the scikit-learn organization, in which __init__ takes hyperparameters and transform (which is equivalent to select) takes actual data.
    • Also add a trailing underscore to any attributes that are set in select.
  • Initialize the ComponentSelector in the main tedana function, then grab the necessary metrics from the object. We also have to ensure that a few extra metrics are calculated, since they're used for the reports.
    • ComponentSelector initialization is done before fMRI data are loaded, to save time if the user-provided tree has an error.
  • When defined, n_echoes and n_vols are stored in selector.cross_component_metrics instead of as separate parameters.
  • ica_reclassify now starts at the first node in the inputted tree that doesn't have an outputs key, rather than the first node that isn't defined in an un-run tree. This potentially opens up additional functionality, since ica_reclassify could build on multiple previous executions.
  • Updates to docstrings and documentation to explain the added functionality.
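
A minimal sketch of the restructuring described in the list above, assuming a dictionary-based tree with a necessary_metrics key and a component table held as a list of row dictionaries. SketchSelector and KNOWN_METRICS are hypothetical names for illustration only; this is not the tedana implementation.

```python
# Hypothetical registry standing in for the metrics the collection function can compute.
KNOWN_METRICS = {"kappa", "rho", "variance explained"}


class SketchSelector:
    """Illustrative stand-in for ComponentSelector (not the tedana class)."""

    def __init__(self, tree):
        # Only configuration is handled here, so a malformed tree fails
        # before any fMRI data would need to be loaded.
        self.tree = tree
        self.necessary_metrics = set(tree.get("necessary_metrics", []))
        unknown = self.necessary_metrics - KNOWN_METRICS
        if unknown:
            raise ValueError(f"Tree requests undefined metrics: {sorted(unknown)}")

    def select(self, component_table, cross_component_metrics=None, status_table=None):
        # Attributes that depend on data get a trailing underscore,
        # following the scikit-learn naming convention.
        self.component_table_ = [dict(row) for row in component_table]
        self.cross_component_metrics_ = dict(cross_component_metrics or {})
        self.status_table_ = status_table
        for row in self.component_table_:
            missing = self.necessary_metrics - set(row)
            if missing:
                raise KeyError(f"Component table lacks metrics: {sorted(missing)}")
        return self


# The workflow can ask the selector which metrics to calculate before loading data.
selector = SketchSelector({"necessary_metrics": ["kappa", "rho"]})
metrics_to_compute = sorted(selector.necessary_metrics)
selector.select([{"kappa": 50.0, "rho": 10.0}], {"n_echoes": 3, "n_vols": 200})
```

In this sketch the constructor never touches data, so the set of required metrics is available to the calling workflow as soon as the tree has been parsed.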

@tsalo added the "bug" label (issues describing a bug or error found in the project) on Aug 15, 2023
@tsalo
Member Author

tsalo commented Aug 23, 2023

I still need to update a bunch of tests and the reclassification workflow, since many things depend on the old class.

@tsalo
Member Author

tsalo commented Feb 16, 2024

One thing I'm realizing here is that the current approach of passing the selector into decision functions and modifying it within them is unfortunately fragile. Given that the decision functions primarily modify the component table, I think they could be refactored to just accept the comptable and modify that.
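
As a rough illustration of the refactor suggested in this comment, a decision function could take the component table directly instead of the whole selector. The function name, the classification column, and the list-of-dictionaries table are assumptions made for this sketch, not the existing tedana API.

```python
def classify_by_threshold(component_table, metric, threshold, if_true="rejected"):
    """Return a relabeled copy of the table plus a summary of what changed.

    Hypothetical helper: rows whose `metric` exceeds `threshold` get the
    `if_true` classification; all other rows are left as they were.
    """
    updated = [dict(row) for row in component_table]  # leave the input untouched
    n_true = 0
    for row in updated:
        if row[metric] > threshold:
            row["classification"] = if_true
            n_true += 1
    outputs = {"n_true": n_true, "n_false": len(updated) - n_true}
    return updated, outputs


table = [
    {"Component": 0, "rho": 5.0, "classification": "unclassified"},
    {"Component": 1, "rho": 20.0, "classification": "unclassified"},
]
new_table, outputs = classify_by_threshold(table, "rho", 10.0)
```

Returning a copy rather than mutating in place is one possible design choice here; it keeps the caller in charge of when the selector's state actually changes.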

@handwerkerd
Member

@tsalo I'm realizing that to specify the external metrics in the decision tree JSON, like you requested in tsalo#14, it will be much easier if we merge this first. I've identified a few of the breaking test issues and I think I can get this working, but it's going to intersect with a bunch of the recent updates to main. Do you want to align with main, or should I just do that and work from there?

@handwerkerd
Member

> One thing I'm realizing here is that the current approach of passing the selector into decision functions and modifying it within them is unfortunately fragile. Given that the decision functions primarily modify the component table, I think they could be refactored to just accept the comptable and modify that.

I think some of the selection_util functions just interact with comptable, but the functions in selection_nodes.py interact with multiple fields in selector. Which aspect of fragility is bothering you? Having a better sense of that might help me figure out how to resolve that issue.

@handwerkerd handwerkerd marked this pull request as ready for review February 28, 2024 18:56
@handwerkerd handwerkerd self-requested a review February 28, 2024 19:00
handwerkerd
handwerkerd previously approved these changes Feb 28, 2024
@handwerkerd added the "enhancement" (issues describing possible enhancements to the project) and "refactoring" (issues proposing/requesting changes to the code which do not impact behavior) labels on Feb 28, 2024
@tsalo removed the "bug" label (issues describing a bug or error found in the project) on Feb 28, 2024
@handwerkerd
Member

@eurunuela Do you have time to review this in the near-ish future? Both Taylor & I worked on this, so it could use one more pair of eyes.

@eurunuela
Collaborator

> @eurunuela Do you have time to review this in the near-ish future? Both Taylor & I worked on this, so it could use one more pair of eyes.

Absolutely! I'll have a look at it next week.

@handwerkerd
Member

@eurunuela Flagging again in case you have a chance to look this over. The dev call is next week and I'd like to know if there's anything in this PR that needs discussion there.

@eurunuela
Collaborator

Yes, sorry. Will definitely review this week.

eurunuela
eurunuela previously approved these changes Mar 13, 2024
Collaborator

@eurunuela eurunuela left a comment

LGTM!

@handwerkerd
Member

@tsalo The last round of edits were more me than you. You should probably do the last round of checks to make sure you're happy with this & then merge.

Member Author

@tsalo tsalo left a comment

Noticed a couple of small typos and one step that seems extraneous.

docs/building_decision_trees.rst (outdated, resolved)
tedana/selection/selection_utils.py (outdated, resolved)
Comment on lines +709 to +710
# Create a re-initialized selector object if rerunning
selector = ComponentSelector(tree)
Member Author

This shouldn't be necessary. If it is, then we should open an issue to fix it in a future PR.

Member

This was necessary because selector.select looks for the last node in the decision tree that was run and then continues to run the rest of the nodes. This is how ica_reclassify works.

This is already happening in main, but it is within tedica.automatic_selection:

selector = ComponentSelector(tree, component_table, cross_component_metrics=xcomp)
selector.select()

By removing the component table from the selector initialization and now initializing in tedana.py, the fact that this was happening is more up-front.

If the goal is to log all failed attempts, we would probably want to save both the ica mixing matrices and the outputs. This is a bit tricky because we currently save most files at the end and not as the data in those files are generated. We could also add some way to tack on a full tree to the end of the executed tree, but that might just add confusion. Either way, this does feel like a separate PR.
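
The resume behavior described in this comment can be sketched as follows, assuming the tree's nodes are a list of dictionaries in which an executed node carries an outputs key. The function names and the functionname key used in the example are illustrative assumptions, not tedana's actual code.

```python
def first_unrun_node(nodes):
    """Return the index of the first node that lacks an "outputs" key."""
    for idx, node in enumerate(nodes):
        if "outputs" not in node:
            return idx
    return len(nodes)  # every node has already been run


def run_remaining(nodes, run_node):
    """Run every node from the first unrun one onward and record its outputs."""
    start = first_unrun_node(nodes)
    for node in nodes[start:]:
        node["outputs"] = run_node(node)
    return start


# A tree where the first node was executed in an earlier run.
nodes = [
    {"functionname": "step_a", "outputs": {"n_true": 2}},
    {"functionname": "step_b"},
    {"functionname": "step_c"},
]
start = run_remaining(nodes, lambda node: {"ran": node["functionname"]})
```

Under this logic, a selector rebuilt from the original tree has no outputs keys at all, so selection starts again from the first node, which is why the re-initialization matters when rerunning.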

Member Author

Ah, thank you for explaining. This is different from scikit-learn classes, which have fit, fit_transform, and transform methods that distinguish between fitting the estimator to training data and transforming new data using the parameters from the fitting data. I know we don't need to have one-to-one correspondence, but I guess I didn't intuitively grasp it.
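
For readers less familiar with the convention mentioned here, a toy estimator in the scikit-learn style might look like the sketch below. MeanCenterer is a hypothetical class written with the standard library only; it is not part of scikit-learn or tedana.

```python
class MeanCenterer:
    """Toy estimator following the fit / transform / fit_transform convention."""

    def fit(self, values):
        # Learn a parameter from the training data; learned attributes
        # get a trailing underscore.
        self.mean_ = sum(values) / len(values)
        return self

    def transform(self, values):
        # Apply the already-learned parameter to data, which may be new.
        return [v - self.mean_ for v in values]

    def fit_transform(self, values):
        # Convenience method: fit on the data, then transform the same data.
        return self.fit(values).transform(values)


centerer = MeanCenterer()
centered_train = centerer.fit_transform([1.0, 2.0, 3.0])
centered_new = centerer.transform([4.0])
```

The distinction is that transform reuses what fit learned, whereas the selector's select method, as described in this thread, continues from whatever nodes have already been run.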

Member

I'd say this is the equivalent of running fit_transform a second time on the same dataset using slightly different parameters (a different seed in our case). We also have a second use case that's the equivalent of running a second transform on a dataset that's already been transformed. I'm not sure that's something that is or should be done with the methods in scikit-learn.

Co-authored-by: Taylor Salo <salot@pennmedicine.upenn.edu>
@handwerkerd handwerkerd dismissed stale reviews from eurunuela and themself via 8037dbf March 13, 2024 15:28
@tsalo
Member Author

tsalo commented Mar 20, 2024

Can we merge this?

@handwerkerd
Member

@tsalo You're effectively the second review since I made the last round of changes. If you're happy with this, then merge.

@tsalo tsalo merged commit 9cbd484 into ME-ICA:main Mar 20, 2024
16 checks passed
Labels
enhancement (issues describing possible enhancements to the project)
refactoring (issues proposing/requesting changes to the code which do not impact behavior)
Development

Successfully merging this pull request may close these issues.

Check metrics exist before running tree. Possibly calculate metrics from tree
3 participants