Generate metrics from external regressors using F stats #1064

handwerkerd · 2024-03-20T21:26:58Z

Closes #1009. This is an alternative approach to #1008 and #1021.

If a user provides external regressors, this will calculate a fit to those regressors to include as metrics in the component table and for use in decision trees.

Changes proposed in this pull request:

Create a new module, metrics.external, for calculating metrics from external regressors
- An F statistic is calculated against a polynomial detrending baseline, and the p, F, and R2 values are saved as metrics in the component table.
- These stats are saved for the full F model using all regressors as well as for partial F models. This makes it possible both to see when the full model fits and to log whether it is the motion, ROI-based, or cardiac/respiration-based regressors that fit a specific component time series.
- There is a special class of regressors called task_keep. If regressors under this label are included, they are excluded from the full F model, and a separate full F model is calculated using just these regressors. This was a suggestion from @dowdlelt so that it would be possible to identify and conservatively retain components that fit the overall task design.
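
For reviewers who want the idea in a few lines, here is a minimal NumPy sketch of a full and a partial F model against a polynomial baseline. It is not the code in metrics.external: the function names, the simulated data, the Legendre basis, and the R2 definition (variance explained beyond the reduced model) are all assumptions made for illustration. The p value would be the survival function of the F distribution evaluated at the returned F; it is left out to keep the sketch NumPy-only.

```python
import numpy as np


def fit_sse(design, data):
    """Sum of squared residuals per column of data after a least-squares fit."""
    betas, _, _, _ = np.linalg.lstsq(design, data, rcond=None)
    residuals = data - design @ betas
    return np.sum(residuals**2, axis=0)


def f_and_r2(data, reduced, full):
    """Compare a full design matrix against a nested reduced one.

    data is (n_vols, n_components); reduced and full are (n_vols, n_columns).
    """
    sse_reduced = fit_sse(reduced, data)
    sse_full = fit_sse(full, data)
    df_num = full.shape[1] - reduced.shape[1]  # number of regressors being tested
    df_den = data.shape[0] - full.shape[1]  # residual degrees of freedom
    f_stat = ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)
    r2 = 1 - sse_full / sse_reduced
    return f_stat, r2


# Simulated inputs (hypothetical): 100 volumes, 3 motion regressors, 1 CSF regressor.
rng = np.random.default_rng(0)
n_vols = 100
motion = rng.standard_normal((n_vols, 3))
csf = rng.standard_normal((n_vols, 1))
# Component 0 follows the motion regressors; component 1 is pure noise.
mixing = np.column_stack(
    [
        motion @ np.array([1.0, 2.0, 3.0]) + 0.1 * rng.standard_normal(n_vols),
        rng.standard_normal(n_vols),
    ]
)

# Polynomial detrending baseline: Legendre polynomials of degree 0 to 2.
baseline = np.polynomial.legendre.legvander(np.linspace(-1, 1, n_vols), 2)

# Full model: baseline alone vs. baseline plus every external regressor.
full_design = np.hstack([baseline, motion, csf])
f_full, r2_full = f_and_r2(mixing, baseline, full_design)

# Partial "Motion" model: what motion adds beyond baseline plus CSF.
no_motion = np.hstack([baseline, csf])
f_motion, r2_motion = f_and_r2(mixing, no_motion, full_design)
```

In this sketch a partial model is just the same comparison with a larger reduced design, which is why one helper can serve both the full and the partial F models.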
Add a new parameter to the tedana CLI, --external, to pass in a TSV with external regressors.
Draft decision trees that incorporate external regressor correlation metrics.
- In the decision tree there is a new field external_regressor_config which is a dictionary with the following keys:
  - info: A saved description of what the config does
  - report: A description to add to report.txt
  - calc_stats: Currently the only option is "F", but this leaves open the possibility of additional options.
  - detrend: If true, the number of polynomial detrending regressors is calculated automatically. It can also be a number specified by the user. If false, it is set to 0 (mean removal only) and a warning is logged.
  - f_states_partial_models: [optional] A list of titles for the partial models (e.g., ["Motion", "CSF"])
  - If f_states_partial_models has model names, each model name is its own key and contains either a list of column labels or a regular expression wildcard. For example, "Motion": ["^mot_.*$"] means the Motion partial model will include any external regressor column label that begins with mot_, and "Motion": ["mot_x", "mot_y", "mot_z"] specifies 3 specific column labels.
  - task_keep: [optional] A regex wildcard or specific names that define task regressors that will not be included in the full model
- demo_minimal_external_regressors_motion_task_models.json uses all of the above options. It uses the partial models to add a classification_tag without changing results, and it retains components that correlate with the task (R2>0.5) and have kappa>elbow, regardless of rho.
- demo_minimal_external_regressors_single_model.json uses the minimum number of options to run with external regressors.
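
To make the config keys concrete, here is a rough Python sketch of such a dictionary and of how regex entries could be expanded into column labels. The key names are copied from the list above; every value, the column labels, and the helper expand_labels are hypothetical and are not taken from the shipped decision trees or from the code in this PR.

```python
import re
from typing import Dict, List

# Hypothetical config using the keys listed above; all values are invented.
external_regressor_config = {
    "info": "Fits components to motion and CSF regressors",
    "report": "Components were fit to external motion and CSF regressors.",
    "calc_stats": "F",
    "detrend": True,
    "f_states_partial_models": ["Motion", "CSF"],
    "Motion": ["^mot_.*$"],
    "CSF": ["^csf.*$"],
    "task_keep": ["^signal.*$"],
}


def expand_labels(patterns: List[str], column_labels: List[str]) -> List[str]:
    """Return the column labels fully matched by any pattern, in column order."""
    return [
        label
        for label in column_labels
        if any(re.fullmatch(pattern, label) for pattern in patterns)
    ]


# Hypothetical column labels from an external regressors TSV.
column_labels = ["mot_x", "mot_y", "mot_z", "csf_mean", "signal_task"]

# Resolve each partial model name to the columns it covers.
partial_models: Dict[str, List[str]] = {
    name: expand_labels(external_regressor_config[name], column_labels)
    for name in external_regressor_config["f_states_partial_models"]
}

# Regressors under task_keep are left out of the full model.
task_keep = expand_labels(external_regressor_config["task_keep"], column_labels)
full_model = [label for label in column_labels if label not in task_keep]
```

Using re.fullmatch lets one helper handle both forms described above: a plain name such as "mot_x" matches only itself, while "^mot_.*$" matches every label that begins with mot_.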
Includes tests that should cover all new code

To do:

Update documentation to explain all this new functionality and the new outputs (waiting on this until enough of us agree that the approach is generally good and we're not making more substantive changes to the structure, parameters, etc.).

tsalo

I went through and added comments. The method looks solid to me. I have a few thoughts on style though:

There's a lot of commented vestigial code.
The docstrings should have type annotations, per #704 (Incorporate type hints when possible). I would also recommend requiring named parameters in function calls (i.e., by putting a leading * before the parameters).
Some variable names do not match the rest of the codebase (e.g., you use n_time instead of n_vols).
It would be great if strings and lines in docstrings were broken on punctuation. This will result in cleaner diffs in the future.
We're going to need extensive tests to cover the new code.
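
As a rough illustration of the type-hint, keyword-only, and naming points, something along these lines is what I have in mind. The function residual_dof is hypothetical and is not part of this PR; it only shows the leading *, the annotations, docstring types, and the n_vols naming.

```python
def residual_dof(*, n_vols: int, n_regressors: int, n_detrend: int) -> int:
    """Return the residual degrees of freedom of a regression model.

    Parameters
    ----------
    n_vols : int
        Number of volumes (time points) in each component time series.
    n_regressors : int
        Number of external regressors in the model.
    n_detrend : int
        Number of polynomial detrending regressors in the baseline.

    Returns
    -------
    int
        Number of volumes minus the number of fitted columns.
    """
    return n_vols - n_regressors - n_detrend


# Callers must name every argument.
dof = residual_dof(n_vols=100, n_regressors=12, n_detrend=3)

try:
    residual_dof(100, 12, 3)  # positional arguments are rejected
except TypeError as err:
    positional_error = str(err)
```

The leading * makes call sites self-documenting and lets parameters be reordered later without silently breaking callers.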

Co-authored-by: Taylor Salo <salot@pennmedicine.upenn.edu>

handwerkerd · 2024-05-02T21:18:22Z

@tsalo I just added type hints to external.py. I'm still a bit of a novice with type hints, and it looks like the syntax changes non-trivially across Python versions. I tried to follow guidance that's valid for Python 3.8, since that's our oldest permitted version. I'll keep working on other files and on your remaining review comments, but let me know whether I'm adding type hints appropriately so far.

Update as I was writing this comment: Lots of tests just failed, so the answer to that question is "no". It looks like numpy v1.24.3, which is currently in my Python 3.8 environment, includes numpy.typing and not numpy._typing, while numpy v1.26.3 in my Python 3.11 environment includes numpy._typing and not numpy.typing. I suspect there's a way to set up backwards compatibility for this, but I wanted to share what I'm seeing before I switch to childcare stuff. I figured out the issue and all tests are passing now. It seems like using import numpy.typing as npt works, but directly referencing np.typing does not work across all versions.
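
For reference, the import pattern that ended up working looks roughly like the sketch below. The function demean_columns and its arguments are made up for illustration and are not from external.py. I'm assuming a numpy version that provides npt.NDArray, and I'm using typing.List rather than list[str] so the annotations stay valid on Python 3.8.

```python
from typing import List

import numpy as np
import numpy.typing as npt  # import the submodule explicitly instead of using np.typing


def demean_columns(data: npt.NDArray, column_labels: List[str]) -> npt.NDArray:
    """Remove the mean from each column of a (n_vols, n_columns) array."""
    if data.shape[1] != len(column_labels):
        raise ValueError("data must have one column per label")
    return data - data.mean(axis=0, keepdims=True)


regressors = np.array([[1.0, 10.0], [3.0, 30.0]])
demeaned = demean_columns(regressors, ["mot_x", "mot_y"])
```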

handwerkerd · 2024-05-03T03:26:48Z

@tsalo Can you explain more what you mean by "It would be great if strings and lines in docstrings were broken on punctuation. This will result in cleaner diffs in the future."?

tsalo · 2024-05-03T13:39:05Z

Absolutely. Take a paragraph in the text, like the one below:

This is sentence one. This is
sentence two. This is sentence
three- but it's still going.

Since sentences are a unit within the paragraph, we're more likely to change them than random words.
Plus, since almost everything we write is in markdown or rst, the newlines don't impact the rendered text- just the code.

-This is sentence one. This is
+This is sentence one. This is a 
-sentence two. This is sentence
+new sentence two with other
-three- but it's still going.
+changes. This is sentence three-
+but it's still going.

The diff on that, where the only consideration is line length, is much more extensive (and harder to interpret for a reviewer) than the diff if we broke up the text on punctuation:

 This is sentence one.
-This is sentence two.
+This is a new sentence 
+two with other changes.
 This is sentence three-
 but it's still going.
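
If you want to check this yourself, here is a small sketch using Python's difflib on the same example text. The helper changed_lines is just for illustration; it counts the added and removed lines in a unified diff.

```python
import difflib
from typing import List


def changed_lines(old: List[str], new: List[str]) -> int:
    """Count added and removed lines in a unified diff, skipping the file headers."""
    diff = difflib.unified_diff(old, new, lineterm="")
    return sum(
        1
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


# The paragraph wrapped on line length, before and after the edit.
old_wrapped = [
    "This is sentence one. This is",
    "sentence two. This is sentence",
    "three- but it's still going.",
]
new_wrapped = [
    "This is sentence one. This is a",
    "new sentence two with other",
    "changes. This is sentence three-",
    "but it's still going.",
]

# The same paragraph broken on punctuation, before and after the same edit.
old_semantic = [
    "This is sentence one.",
    "This is sentence two.",
    "This is sentence three-",
    "but it's still going.",
]
new_semantic = [
    "This is sentence one.",
    "This is a new sentence",
    "two with other changes.",
    "This is sentence three-",
    "but it's still going.",
]

wrapped_count = changed_lines(old_wrapped, new_wrapped)
semantic_count = changed_lines(old_semantic, new_semantic)
print(f"wrapped on length: {wrapped_count} changed lines")
print(f"broken on punctuation: {semantic_count} changed lines")
```

The same one-sentence edit touches every line of the length-wrapped paragraph, but only the edited sentence when the text is broken on punctuation.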

tsalo and others added 30 commits August 7, 2023 09:38

Get required metrics from decision tree.

d03ea9a

Continue changes.

c7577c7

More updates.

b866c1d

Merge remote-tracking branch 'upstream/main' into gen-req-metrics

f87c97f

Store necessary_metrics as a list.

a1cc401

Update selection_nodes.py

adb83a9

Update selection_utils.py

610ea5a

Update across the package.

10f7b37

Keep updating.

dd9cd25

Update tedana.py

1236509

Merge remote-tracking branch 'upstream/main' into gen-req-metrics

f83b4cd

Add extra metrics to list.

2a7460c

Update ica_reclassify.py

dd0f8d7

Draft metric-based regressor correlations.

9a7afa9

Fix typo.

44cb047

Work on trees.

1b3681b

Expand regular expressions in trees.

4c7a9ce

Fix up the expansion.

4da6730

Really fix it though.

bed401e

Fix style issue.

3a03d54

Added external regress integration test

30da783

Got intregration test with external regressors working

564886b

Added F tests and options

2d3c1da

added corr_no_detrend.json

a574f71

updated names and reporting

7b0b348

Merge remote-tracking branch 'upstream/main' into gen-req-metrics

bf90c80

Run black.

ab967aa

Address style issues.

cc4118e

Try fixing test bugs.

cd38577

Update test_component_selector.py

c4de5be

tsalo reviewed Apr 8, 2024

tedana/resources/external_regressor_configs/Mot12_CSF.json

tsalo reviewed Apr 8, 2024

tedana/resources/decision_trees/demo_minimal_external_regressors_motion_task_models.json

tsalo reviewed Apr 8, 2024

tedana/resources/external_regressor_configs/Mot12_CSF.json

handwerkerd added 3 commits April 9, 2024 11:23

removed mot12_csf.json changed task to signal

4562273

Merge branch 'main' into external-regressors

6789511

fixed tests with task_keep signal

5404d65

tsalo reviewed Apr 10, 2024

tedana/metrics/external.py

tsalo mentioned this pull request Apr 28, 2024

Using tedana in conjunction with rapidtide #1071

Open

tsalo reviewed May 1, 2024

handwerkerd and others added 9 commits May 2, 2024 15:29

Merge branch 'main' into external-regressors

daa172a

Update tedana/metrics/external.py

3ff9c2e

Co-authored-by: Taylor Salo <salot@pennmedicine.upenn.edu>

Update tedana/metrics/_utils.py

f44f385

Co-authored-by: Taylor Salo <salot@pennmedicine.upenn.edu>

Update tedana/metrics/collect.py

79de3d4

Co-authored-by: Taylor Salo <salot@pennmedicine.upenn.edu>

Update tedana/metrics/external.py

fe09e68

Co-authored-by: Taylor Salo <salot@pennmedicine.upenn.edu>

Update tedana/metrics/external.py

3b0a792

Co-authored-by: Taylor Salo <salot@pennmedicine.upenn.edu>

Responding to review comments

7043f10

reworded docstring

1ba5f27

Added type hints to external.py

33569ab

handwerkerd added 2 commits May 2, 2024 22:22

fixed external.py type hints

e81017a

type hints to _utils collect and component_selector

8625c9c

handwerkerd commented May 3, 2024

tedana/selection/selection_utils.py

type hints and doc improvements in selection_utils

15e543f

handwerkerd commented May 3, 2024

tedana/selection/selection_utils.py

handwerkerd commented May 3, 2024

tedana/selection/selection_utils.py

handwerkerd added 3 commits May 6, 2024 12:07

no expand_node recursion

c26b86d

removed expand_nodes expand_node expand_dict

3ffc62d

Merge branch 'main' into external-regressors

3c407a9
