
Document decision tree steps in report #959

Merged — 30 commits into ME-ICA:doc-tree on Feb 21, 2024

Conversation

@tsalo (Member) commented Aug 3, 2023

Closes None.

Changes proposed in this pull request:

  • Start adding comments explaining each of the steps in the minimal decision tree. There are still some things I don't understand.
    • I'd like to add more layperson's descriptions in the _comment fields (see the sketch below).
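For example, a commented node might end up looking something like this (a hypothetical, abridged node: the field names are the ones discussed in the review threads below, but the values are illustrative):

```json
{
    "tag_if_true": "Unlikely BOLD",
    "log_extra_info": "",
    "log_extra_report": "Any components with rho greater than kappa are rejected.",
    "_comment": "Developer-facing note: rho > kappa means TE-independence dominates TE-dependence, so the component is unlikely to reflect BOLD signal."
}
```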

@tsalo added the refactoring label (issues proposing/requesting changes to the code which do not impact behavior) on Aug 3, 2023
@tsalo (Member Author) commented Aug 3, 2023

@handwerkerd do you know why the paths to the artifacts in the CircleCI config were wrong? Is it just that we changed the actual output folders in the DTM PR and forgot to update the artifact paths, or did we choose to stop uploading artifacts because of free plan limits or something similar?

@tsalo (Member Author) commented Aug 4, 2023

I didn't get through all of the Kundu decision tree steps, because some are quite complicated, so the report will be incomplete for that one.

@handwerkerd (Member) commented

> Is it just that we changed the actual output folders in the DTM PR and forgot to update the artifact paths

Looking at your change, I think that's what happened. I changed the location of the outputs for testing, but didn't realize I should have also made this change.

@codecov (bot) commented Aug 11, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (b139386) 89.54% compared to head (3875ea7) 89.37%.
Report is 11 commits behind head on main.

❗ Current head 3875ea7 differs from pull request most recent head 2e4f290. Consider uploading reports for the commit 2e4f290 to get more accurate results

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #959      +/-   ##
==========================================
- Coverage   89.54%   89.37%   -0.18%
==========================================
  Files          26       26
  Lines        3396     3397       +1
  Branches      619      619
==========================================
- Hits         3041     3036       -5
- Misses        207      211       +4
- Partials      148      150       +2
```


@tsalo changed the title from "Add comments to minimal decision tree" to "Document decision tree steps in report" on Aug 12, 2023
@tsalo (Member Author) commented Aug 12, 2023

I think I've successfully documented the remaining nodes of the kundu tree.

@tsalo requested a review from @handwerkerd on August 14, 2023
@tsalo requested a review from @eurunuela on August 23, 2023
@eurunuela (Collaborator) left a comment

There are a couple of comments that I find impossible to understand. Could you please make them more understandable to the regular user/contributor? Thanks!

Review thread on tedana/resources/decision_trees/kundu.json:

```diff
-"tag_if_true": "Unlikely BOLD"
+"tag_if_true": "Unlikely BOLD",
+"log_extra_info": "",
+"log_extra_report": "Any components with negative t-statistic comparing distribution of T2* model F-statistics from voxels in clusters to those of voxels not in clusters and variance explained greater than median are rejected."
```
@eurunuela (Collaborator): Also not easy to understand.

@tsalo (Member Author) commented Aug 24, 2023

What about this?

Suggested change:

```diff
-"log_extra_report": "Any components with negative t-statistic comparing distribution of T2* model F-statistics from voxels in clusters to those of voxels not in clusters and variance explained greater than median are rejected."
+"log_extra_report": "Any components with (1) greater than median variance explained and (2) a t-statistic less than zero, determined from a t-test of T2* model F-statistics from voxels in clusters (i.e., \"signal\" voxels) to voxels not in clusters (i.e., \"noise\" voxels), are rejected."
```

"log_extra_info": "",
"log_extra_report": "If any components are still labeled as unclassified or unclassified high-variance, then a revised decision table score is calculated from the provisionally accepted, unclassified, and unclassified high-variance components."
},
"_comment": "Calculate a new d_table_score. Why can't we use the original d_table_score function? Because it's a metric function?"
@eurunuela (Collaborator): Is this a comment for us?

@tsalo (Member Author): Yeah, the _comments are, I believe, for developers.

@handwerkerd (Member): log_extra_report is for text that will be output in the reports. _comment lives just in this file and can be used to provide additional information to anyone who decides to read the json file. In this case, the _comment should be changed.

Suggested change:

```diff
-"_comment": "Calculate a new d_table_score. Why can't we use the original d_table_score function? Because it's a metric function?"
+"_comment": "Calculate a new d_table_score. Five metrics are sorted and ranked across components, and this score is their average. A new score is calculated on a subset of components because the rankings change depending on which components were removed."
```

@eurunuela (Collaborator) commented
@tsalo I've been told there's a typo in the docs:

--manrej should be component indices to reject instead of to accept.

Could you please fix it in this PR? Just asking because this is docs-related.

@tsalo (Member Author) commented Sep 8, 2023

> Currently, that text says "This is based on the criteria of the MEICA v2.5 decision tree" and someone can look at several places to see what happens in that decision tree.

I don't think they can. Previous papers that I've read that describe the decision tree don't cover everything in sufficient detail. The main paper we cite for the v2.5 decision tree is Kundu et al. (2013), but that paper (and its supplement) seems to only describe Kappa- and Rho-based decisions. Olafsson et al. (2015) describes each of the steps in the decision tree in some detail, but it's completely missing the additional constraint of greater-than-median variance explained in several of the nodes.

Plus, that only applies to the v2.5 decision tree. The minimal decision tree doesn't have a paper that describes each step AFAIK, nor would custom decision trees.

> Do we want a line of text in the report explaining what happens in every node?

I think that is the best way to handle it.

> If we're expecting people to copy the report into manuscripts (with light editing), then that's too much detail.

I think all of this information is necessary for manuscripts, but I can also see people moving some contents into their supplementary materials. I've seen folks do this with fMRIPrep methods sections as well.

@handwerkerd (Member) commented

When I wrote "someone can look at several places to see what happens in that decision tree," I meant that there are several places in the existing outputs to understand what happens, but I agree these aren't detailed in any previous publication.

I'm going to disagree that all this information should be included in a manuscript. I've included the report output that would be generated by this PR for the kundu tree below. Not only is this long, but it's confusing and a poor way for anyone to understand what is happening in this process. We can edit various sentences, but I don't think anyone who reads any block of text like that would leave with a solid understanding of the process. If I reviewed a manuscript with a paragraph like that, I'd ask for revisions. The flow chart is a much better way of sharing and understanding this process.

This is an example of the limits of manuscripts as a format and the original purpose for things like BIDS. If a manuscript included every step at this level of detail, it would be very long and unreadable. For example, there are many critical MRI acquisition parameters that are never reported in manuscripts, but I worked with the original BIDS group to make sure most of this acquisition parameter list was in the BIDS jsons. I think we should take a similar approach here. We should have very clear output files that make it easy for people to know what happened, and we should expect those files to be shared with data.

Right now the relevant data are in several places: (1) the flow chart, (2) the info file tedana_[date].tsv, which has descriptors of each step, and (3) desc-ICA_decision_tree.json, which has additional descriptors.

Additional things we could do include generating a run-specific flow chart (#888) or pulling the info lines relating to the decision tree into a stand-alone file. I think much of the added text in this PR is useful for helping someone understand the process, but I just don't see it as useful text to include in a manuscript.

Here's what the report text currently looks like:

TE-dependence analysis was performed on input data using the tedana workflow \citep{dupre2021te}. An initial mask was generated from the first echo using nilearn's compute_epi_mask function. An adaptive mask was then generated, in which each voxel's value reflects the number of echoes with 'good' data. A two-stage masking procedure was applied, in which a liberal mask (including voxels with good data in at least the first echo) was used for optimal combination, T2*/S0 estimation, and denoising, while a more conservative mask (restricted to voxels with good data in at least the first three echoes) was used for the component classification procedure. A monoexponential model was fit to the data at each voxel using log-linear regression in order to estimate T2* and S0 maps. For each voxel, the value from the adaptive mask was used to determine which echoes would be used to estimate T2* and S0. Multi-echo data were then optimally combined using the T2* combination method \citep{posse1999enhancement}. Global signal regression was applied to the multi-echo and optimally combined datasets.

The following metrics were calculated: kappa, rho, countnoise, countsigFT2, countsigFS0, dice_FT2, dice_FS0, signal-noise_t, variance explained, normalized variance explained, d_table_score. Kappa (kappa) and Rho (rho) were calculated as measures of TE-dependence and TE-independence, respectively. A t-test was performed between the distributions of T2*-model F-statistics associated with clusters (i.e., signal) and non-cluster voxels (i.e., noise) to generate a t-statistic (metric signal-noise_z) and p-value (metric signal-noise_p) measuring relative association of the component to signal over noise. The number of significant voxels not from clusters was calculated for each component.

Next, component selection was performed to identify BOLD (TE-dependent) and non-BOLD (TE-independent) components using a decision tree. This is based on the criteria of the MEICA v2.5 decision tree \citep{kundu2013integrated}. For a detailed description of the decision tree steps, with the rationale for each step, see \citet{olafsson2015enhanced}. All components are initially labeled as 'unclassified'. Any components with rho greater than kappa are rejected. Any components with more voxels that are significant based on the S0 model's F-statistics than the T2* model's are rejected, as long as there is at least one significant voxel for the T2 model. The median variance explained is calculated across all components, for later steps. Any components with higher S0 model beta map-F-statistic map Dice similarity index than T2 model beta map-F-statistic map Dice similarity index and greater than median variance explained are rejected. Any components with negative t-statistic comparing distribution of T2* model F-statistics from voxels in clusters to those of voxels not in clusters and variance explained greater than median are rejected. The kappa elbow is calculated from all components, for later steps. Unclassified components exhibiting a large step down in variance explained are classified as 'unclassified high-variance'. The 'kundu' rho elbow is calculated from all and unclassified components, for later steps. Any unclassified components with kappa greater than or equal to the kappa elbow are provisionally accepted. Any provisionally accepted components with rho greater than the rho elbow are reset to 'unclassified'. The variance explained upper threshold is calculated as the 90th percentile of variance explained from provisionally accepted components. The variance explained lower threshold is calculated as the 25th percentile of variance explained from provisionally accepted components. The 'extend factor' is determined based on the number of volumes in the run. The 'max_good_meanmetricrank' is calculated as the number of provisionally accepted components times the extend factor. The variance explained-kappa ratio is calculated from the provisionally accepted components, as the maximum kappa minus the minimum kappa, divided by the maximum variance explained minus the minimum variance explained. Any provisionally accepted, unclassified, or unclassified high-variance components with a decision table score greater than 'max_good_meanmetricrank' and variance explained greater than the variance explained upper threshold multiplied by the extend factor are rejected. Any provisionally accepted, unclassified, or unclassified high-variance components with a decision table score greater than 'max_good_meanmetricrank', variance explained less than or equal to the variance explained lower threshold, and kappa less than or equal to the kappa elbow will be accepted and labeled as 'low variance'. If no components are still labeled as unclassified or unclassified high-variance, then all remaining provisionally accepted components are accepted. If any components are still labeled as unclassified or unclassified high-variance, then a revised decision table score is calculated from the provisionally accepted, unclassified, and unclassified high-variance components. 
Any provisionally accepted, unclassified, or unclassified high-variance components with a revised decision tree score greater than the 'conservative_guess', variance explained-kappa ratio greater than the extend factor times two, and variance explained greater than the variance explained upper threshold times the extend factor are rejected. Any provisionally accepted, unclassified, or unclassified high-variance components with a revised decision table score greater than 'num_acc_guess' times 0.9 and variance explained greater than variance explained lower threshold times the extend factor are rejected. An updated variance explained lower threshold (25th percentile) is calculated from the 'num_acc_guess' highest variance explained components among the remaining provisionally accepted, unclassified, and unclassified high-variance components. Any provisionally accepted, unclassified, or unclassified high-variance components with a revised decision table score greater than 'num_acc_guess' and variance explained greater than the new variance explained lower threshold are accepted and labeled as 'borderline'. Any provisionally accepted, unclassified, or unclassified high-variance components with kappa less than or equal to the kappa elbow and variance explained greater than the new variance explained lower threshold are accepted and labeled as 'borderline'. All remaining unclassified, unclassified high-variance, or provisionally accepted components are accepted. Minimum image regression was then applied to the data in order to remove spatially diffuse noise \citep{kundu2013integrated}.

This workflow used numpy \citep{van2011numpy}, scipy \citep{virtanen2020scipy}, pandas \citep{mckinney2010data,reback2020pandas}, scikit-learn \citep{pedregosa2011scikit}, nilearn, bokeh \citep{bokehmanual}, matplotlib \citep{Hunter:2007}, and nibabel \citep{brett_matthew_2019_3233118}. This workflow also used the Dice similarity index \citep{dice1945measures,sorensen1948method}.

@handwerkerd (Member) commented Oct 12, 2023

The plan for this was to create a DOI for the decision trees so that the json file could reference the tree and include that in the report. In the last meeting we suggested doing this through zenodo or figshare, but I realized we could also get DOIs from osf.io, and that's already where tedana is sharing other data. My plan would be to create a new project, "tedana decision trees", or put the files into an existing project like https://osf.io/bpe8h/ and create a DOI. Then the reference would say "...used the kundu decision tree specified in doi:xyz". Sounds good to others?

That said, one issue I have is that I'd like to store the json file and the flow chart visualization at the DOI link. The trouble is that I'd need to create a DOI and then upload the json file with the newly created DOI into the project. I'm not sure if that newly created DOI would then reference a version of the project before the json was uploaded. Anyone have experience with this?

Update: I'm realizing that the DOI would go into references.bib, so I just need to include a stable citep label in the decision tree json file. This means we'll need to keep a running list of unique labels in references.bib, which might get annoying if we end up creating a lot of tree variations, but this should otherwise be fine.
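Concretely, the pairing could look something like this (the citation key, title, year, and DOI below are invented for illustration):

```json
"report": "Component selection used the minimal decision tree \\citep{tedana_minimal_tree_v1}."
```

with a matching entry in references.bib:

```bibtex
@misc{tedana_minimal_tree_v1,
    author = {{tedana community}},
    title  = {tedana minimal decision tree, version 1},
    year   = {2024},
    doi    = {10.xxxx/placeholder}
}
```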

Update 2: https://help.osf.io/article/220-create-dois says "DOIs point to the current version of the project or registration. OSF does not support DOI versioning at this time." Since we specifically want to generate updated DOIs with content updates, we cannot use OSF. I'm fairly sure both zenodo and figshare will let me create one project with multiple files where I can get a new DOI for each version. Anyone have preferences for figshare vs. zenodo?

@tsalo (Member Author) commented Oct 18, 2023

I feel like FigShare might be a better fit for this, but I don't have a strong preference.

@eurunuela (Collaborator) commented

I have never used FigShare so I have no opinion on this.

@handwerkerd (Member) commented

@tsalo Could you try to merge main into this PR? I tried to do it on my branch and I'm running into problems. If you get it working, that might make it easier for me to figure out what's going wrong on my end. See: #970 (comment)

@tsalo (Member Author) commented Oct 31, 2023

@handwerkerd I updated the branch yesterday, but forgot to comment to that effect until now. Sorry about that!

@handwerkerd (Member) commented

Planned changes from past developers' calls:
Work in progress: https://github.com/handwerkerd/tedana/tree/doc-tree-dh

  • Put more text in _comments rather than making a hyper-detailed report
  • The report log will link to a DOI where each version of the tree is detailed as a json file and a flow chart
  • Use figshare for the DOI since it allows different DOIs for multiple versions of an object
  • Remove log_extra_report fields from the json files but save removing the code that interacts with that field for a separate PR

@handwerkerd (Member) commented Feb 20, 2024

Not sure if anyone but @tsalo can see this yet, but I uploaded the decision trees to Figshare in a new tedana project and created DOI 10.6084/m9.figshare.25251433. The updated data are at: https://figshare.com/account/articles/25251433

This is still private because I think there are limits to what I can change once I publish, but I can probably give individuals access so they can let me know whether it includes what you think it should include.

@tsalo changed the base branch from main to doc-tree on February 20, 2024
@tsalo (Member Author) commented Feb 20, 2024

The current approach of PRs to my branch is confusing me, so I've changed the target branch to a new one under ME-ICA (doc-tree). @handwerkerd, you can merge into the new branch if you'd like and just modify it directly.

@handwerkerd (Member) commented

This is either my git skills or a permissions issue, but I'm not sure how I can push directly to https://github.com/tsalo/tedana/tree/doc-tree. My current updates are now in https://github.com/handwerkerd/tedana/tree/doc-tree
I think this is very close to ready to review/merge. I've fully removed log_extra_report from the code. I needed to change https://osf.io/f6g45 (data for testing reclassify) to remove that key so that tests would pass, but the change was otherwise fairly easy.
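That cleanup amounts to something like the following (a minimal sketch, assuming the tree stores its steps under a top-level "nodes" list; the file name is illustrative):

```python
import json

with open("decision_tree_testing_data.json") as f:  # illustrative file name
    tree = json.load(f)

# Drop the retired key from every node; pop() with a default is a no-op
# wherever the key is absent.
for node in tree.get("nodes", []):
    node.pop("log_extra_report", None)

with open("decision_tree_testing_data.json", "w") as f:
    json.dump(tree, f, indent=4)
```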

I also fixed the five-echo test failure above, but there's a new error with five-echo that I'm having trouble solving. Five-echo is the one that uses the minimal decision tree with this line for reports:
https://github.com/handwerkerd/tedana/blob/09b9e2ef7f79262fad9ab7a32207012b51304ad6/tedana/resources/decision_trees/minimal.json#L4
For some reason, it's now unhappy with the comma-separated citations \\citep{kundu2013integrated, dupre2021te}. Not quite sure why this is happening now.

@tsalo (Member Author) commented Feb 21, 2024

@handwerkerd I have "allow edits by maintainers" selected, so you should be able to push to my branch if you add it as a remote, but I was originally proposing that you merge this PR into the ME-ICA/tedana@doc-tree branch (the new target branch for this PR) and make your changes there.

@handwerkerd merged commit fb4c290 into ME-ICA:doc-tree on Feb 21, 2024
12 of 13 checks passed
Labels: refactoring (issues proposing/requesting changes to the code which do not impact behavior)
Projects: none yet
Linked issues: none yet
Participants: 3