Generate metrics based on decision tree #969

tsalo · 2023-08-11T18:15:08Z

Closes #921.

Changes proposed in this pull request:

This helps ensure that metrics requested in decision trees are reflected in the component table. This will allow users to develop decision trees that use metrics other than the ones currently hardcoded into the tedana workflow code, as long as the metrics are actually defined in the tedana metric collection function.
Restructure the ComponentSelector class to take component_table, cross_component_metrics, status_table as parameters to select, rather than __init__. This better fits the scikit-learn organization, in which __init__ takes hyperparameters and transform (which is equivalent to select) takes actual data.
- Also add a trailing underscore to any attributes that are set in select.
Initialize the ComponentSelector in main tedana function, then grab the necessary metrics from the object. We also have to ensure that a few extra metrics are calculated, since they're used for the reports.
- ComponentSelector initialization is done before fMRI data are loaded to save time if the user-provided tree as an error
When defined, n_echoes and n_vols are stored in selector.cross_component_metrics instead of as separate parameters
ica_reclassify now starts at the first node in the inputted tree that doesn't have an outputs key rather than the first node that isn't defined in an un-run tree. This potentially opens up additional functionality since ica_reclassify could build on multiple previous executions
Updates to docstrings and documentation to explain the added functionality.

tsalo · 2023-08-23T19:27:14Z

I still need to update a bunch of tests and the reclassification workflow, since many things depend on the old class.

tsalo · 2024-02-16T21:08:49Z

One thing I'm realizing here is that the current approach of passing the selector into decision functions and modifying it within them is unfortunately fragile. Given that the decision functions primarily modify the component table, I think they could be refactored to just accept the comptable and modify that.

handwerkerd · 2024-02-26T17:42:45Z

@tsalo I'm realizing that to specify the external metrics in the decision tree json, like you requested in tsalo#14 it will be much easier if we merge this first. I've identified a few of the breaking test issues and I think I can get this working, but it's going to intersect with a bunch of the recent updates to main. Do you want to align with main or should I just do that and work from there?

handwerkerd · 2024-02-26T17:45:44Z

One thing I'm realizing here is that the current approach of passing the selector into decision functions and modifying it within them is unfortunately fragile. Given that the decision functions primarily modify the component table, I think they could be refactored to just accept the comptable and modify that.

I think some of the selection_util functions just interact with comptable, but the functions in selection_nodes.py interact with multiple fields in selector. Which aspect of fragility is bothering you? Having a better sense of that might help me figure out how to resolve that issue.

handwerkerd · 2024-02-29T15:28:45Z

@eurunuela Do you have time to review this in the near-ish future? Both Taylor & I worked on this, so it could use one more pair of eyes.

eurunuela · 2024-03-01T09:57:48Z

@eurunuela Do you have time to review this in the near-ish future? Both Taylor & I worked on this, so it could use one more pair of eyes.

Absolutely! I'll have a look at it next week.

handwerkerd · 2024-03-12T18:02:00Z

@eurunuela Flagging again in case you have a chance to look this over. The dev call is next week and I'd like to know if there's anything in this PR that needs discussion there.

eurunuela · 2024-03-12T18:08:22Z

Yes, sorry. Will definitely review this week.

eurunuela

LGTM!

handwerkerd · 2024-03-13T14:11:52Z

@tsalo The last round of edits were more me than you. You should probably do the last round of checks to make sure you're happy with this & then merge.

tsalo

Noticed a couple of small typos and one step that seems extraneous.

docs/building_decision_trees.rst

tedana/selection/selection_utils.py

tsalo · 2024-03-13T15:15:12Z

tedana/workflows/tedana.py

+                # Create a re-initialized selector object if rerunning
+                selector = ComponentSelector(tree)


This shouldn't be necessary. If it is, then we should open an issue to fix it in a future PR.

This was necessary because selector.select looks for the last node in the decision tree that was run and then continues to run the rest of the nodes. This is how ica_reclassify works.

This is already happening in main but it is within tedica.automatic_selection

tedana/tedana/selection/tedica.py

Lines 69 to 70 in a21d65e

selector = ComponentSelector(tree, component_table, cross_component_metrics=xcomp)

selector.select()

By removing the component table from the selector initialization and now initializing in tedana.py the fact that this was happening is more up-front.

If the goal is to log all failed attempts, we would probably want to save both the ica mixing matrices and the outputs. This is a bit tricky because we currently save most files at the end and not as the data in those files are generated. We could also add some way to tack on a full tree to the end of the executed tree, but that might just add confusion. Either way, this does feel like a separate PR.

Ah, thank you for explaining. This is different from scikit-learn classes, which have fit, fit_transform, and transform methods that distinguish between fitting the estimator to training data and transforming new data using the parameters from the fitting data. I know we don't need to have one-to-one correspondence, but I guess I didn't intuitively grasp it.

I'd say this is the equivalent of running fit_transform a second time on the same dataset using the slightly different parameters (different seed in our cause). We also have a second use case that's the equivalent of running a second transform on a dataset that's already been transformed. I'm not sure that's something that is or should be time with the methods in scikit-learn

Co-authored-by: Taylor Salo <salot@pennmedicine.upenn.edu>

tedana/selection/selection_utils.py

Co-authored-by: Taylor Salo <salot@pennmedicine.upenn.edu>

tedana/selection/selection_utils.py

tsalo · 2024-03-20T21:33:33Z

Can we merge this?

handwerkerd · 2024-03-20T21:49:18Z

@tsalo You're effectively the second review since I made the last round of changes. If you're happy with this, then merge.

tsalo added 12 commits August 7, 2023 09:38

Get required metrics from decision tree.

d03ea9a

Continue changes.

c7577c7

More updates.

b866c1d

Merge remote-tracking branch 'upstream/main' into gen-req-metrics

f87c97f

Store necessary_metrics as a list.

a1cc401

Update selection_nodes.py

adb83a9

Update selection_utils.py

610ea5a

Update across the package.

10f7b37

Keep updating.

dd9cd25

Update tedana.py

1236509

Merge remote-tracking branch 'upstream/main' into gen-req-metrics

f83b4cd

Add extra metrics to list.

2a7460c

tsalo added the bug issues describing a bug or error found in the project label Aug 15, 2023

Update ica_reclassify.py

dd0f8d7

tsalo added 5 commits February 16, 2024 15:22

Merge remote-tracking branch 'upstream/main' into gen-req-metrics

bf90c80

Run black.

ab967aa

Address style issues.

cc4118e

Try fixing test bugs.

cd38577

Update test_component_selector.py

c4de5be

tsalo added 6 commits February 16, 2024 16:15

Update component_selector.py

79de1f3

Use component table directly in selectcomps2use.

41ab340

Merge remote-tracking branch 'upstream/main' into gen-req-metrics

f72fad7

Fix.

adf6f3a

Include generated metrics in necessary metrics.

11c03fb

Update component_selector.py

9e22791

handwerkerd mentioned this pull request Feb 26, 2024

Added F statistics and options for different types of tests tsalo/tedana#14

Open

4 tasks

handwerkerd marked this pull request as ready for review February 28, 2024 18:56

handwerkerd self-requested a review February 28, 2024 19:00

handwerkerd previously approved these changes Feb 28, 2024

View reviewed changes

handwerkerd requested a review from eurunuela February 28, 2024 19:03

handwerkerd added enhancement issues describing possible enhancements to the project refactoring issues proposing/requesting changes to the code which do not impact behavior labels Feb 28, 2024

tsalo removed the bug issues describing a bug or error found in the project label Feb 28, 2024

tsalo mentioned this pull request Feb 29, 2024

Rename ComponentSelector to DecisionTree #1051

Open

Merge remote-tracking branch 'upstream/main' into gen-req-metrics

49ee821

eurunuela previously approved these changes Mar 13, 2024

View reviewed changes

tsalo commented Mar 13, 2024

View reviewed changes

Update docs/building_decision_trees.rst

8037dbf

Co-authored-by: Taylor Salo <salot@pennmedicine.upenn.edu>

handwerkerd dismissed stale reviews from eurunuela and themself via 8037dbf March 13, 2024 15:28

Update tedana/selection/selection_utils.py

232f8ed

Co-authored-by: Taylor Salo <salot@pennmedicine.upenn.edu>

tsalo commented Mar 13, 2024

View reviewed changes

tedana/selection/selection_utils.py Show resolved Hide resolved

Update tedana/selection/selection_utils.py

b726e3c

Co-authored-by: Taylor Salo <salot@pennmedicine.upenn.edu>

tsalo commented Mar 17, 2024

View reviewed changes

tedana/selection/selection_utils.py Outdated Show resolved Hide resolved

Update tedana/selection/selection_utils.py

72da98e

handwerkerd approved these changes Mar 17, 2024

View reviewed changes

eurunuela approved these changes Mar 20, 2024

View reviewed changes

handwerkerd mentioned this pull request Mar 20, 2024

Generate metrics from external regressors using F stats #1064

Open

1 task

tsalo merged commit 9cbd484 into ME-ICA:main Mar 20, 2024
16 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generate metrics based on decision tree #969

Generate metrics based on decision tree #969

tsalo commented Aug 11, 2023 •

edited by handwerkerd

tsalo commented Aug 23, 2023

tsalo commented Feb 16, 2024

handwerkerd commented Feb 26, 2024

handwerkerd commented Feb 26, 2024

handwerkerd commented Feb 29, 2024

eurunuela commented Mar 1, 2024

handwerkerd commented Mar 12, 2024

eurunuela commented Mar 12, 2024

eurunuela left a comment

handwerkerd commented Mar 13, 2024

tsalo left a comment

tsalo Mar 13, 2024

handwerkerd Mar 13, 2024

tsalo Mar 17, 2024

handwerkerd Mar 17, 2024

tsalo commented Mar 20, 2024

handwerkerd commented Mar 20, 2024

		# Create a re-initialized selector object if rerunning
		selector = ComponentSelector(tree)

	selector = ComponentSelector(tree, component_table, cross_component_metrics=xcomp)
	selector.select()

Generate metrics based on decision tree #969

Generate metrics based on decision tree #969

Conversation

tsalo commented Aug 11, 2023 • edited by handwerkerd

tsalo commented Aug 23, 2023

tsalo commented Feb 16, 2024

handwerkerd commented Feb 26, 2024

handwerkerd commented Feb 26, 2024

handwerkerd commented Feb 29, 2024

eurunuela commented Mar 1, 2024

handwerkerd commented Mar 12, 2024

eurunuela commented Mar 12, 2024

eurunuela left a comment

Choose a reason for hiding this comment

handwerkerd commented Mar 13, 2024

tsalo left a comment

Choose a reason for hiding this comment

tsalo Mar 13, 2024

Choose a reason for hiding this comment

handwerkerd Mar 13, 2024

Choose a reason for hiding this comment

tsalo Mar 17, 2024

Choose a reason for hiding this comment

handwerkerd Mar 17, 2024

Choose a reason for hiding this comment

tsalo commented Mar 20, 2024

handwerkerd commented Mar 20, 2024

tsalo commented Aug 11, 2023 •

edited by handwerkerd