Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New module: Truvari #1751

Merged
merged 21 commits into from
Oct 12, 2023
Merged

New module: Truvari #1751

merged 21 commits into from
Oct 12, 2023

Conversation

pontushojer
Copy link
Contributor

@pontushojer pontushojer commented Aug 30, 2022

This PR adds a new module for the tool Truvari (https://github.com/ACEnglish/truvari). Specifically the output from the command truvari bench is supported. Below is a screenshot of the results when run on the test data.

image

PR for test data have been submitted here (MultiQC/test-data#241)

  • This comment contains a description of changes (with reason)
  • CHANGELOG.md has been updated
  • There is example tool output for tools in the https://github.com/ewels/MultiQC_TestData repository or attached to this PR
  • Code is tested and works locally (including with --lint flag)
  • docs/README.md is updated with link to below
  • docs/modulename.md is created
  • Everything that can be represented with a plot instead of a table is a plot
  • Report sections have a description and help text (with self.add_section)
  • There aren't any huge tables with > 6 columns (explain reasoning if so)
  • Each table column has a different colour scale to its neighbour, which relates to the data (eg. if high numbers are bad, they're red)
  • Module does not do any significant computational work

@vladsavelyev
Copy link
Member

@multiqc-bot changelog

@vladsavelyev vladsavelyev added module: new waiting: response Waiting for more information from user labels Sep 27, 2023
@vladsavelyev vladsavelyev added awaits-review Awaiting final review and merge. and removed waiting: response Waiting for more information from user labels Sep 27, 2023
Copy link
Member

@vladsavelyev vladsavelyev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, looks perfect to me now!

@ewels ewels added this to the MultiQC v1.17 milestone Oct 8, 2023
@ewels
Copy link
Member

ewels commented Oct 10, 2023

Thanks @pontushojer! Just had a look at the report, quick naive question:

  • From the header tooltips, it kind of sounds like TP (base) + FP = Base calls? But that's not the case. Is that expected?
    • Number of the base VCF calls matching the comp VCF + Number of the comp VCF calls non-matching the base VCF = Number of calls in the base VCF
  • If those numbers are meant to sum (😅) would a stacked bar plot work / be better?

Copy link
Member

@ewels ewels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! A few minor comments but generally the code looks great 👍🏻

It would be nice to add some helptext strings above the sections, explaining what the numbers mean and how to interpret them. It's ok to copy this from the source tool docs, if available.

multiqc/modules/truvari/bench.py Outdated Show resolved Hide resolved
multiqc/modules/truvari/bench.py Outdated Show resolved Hide resolved
multiqc/modules/truvari/bench.py Outdated Show resolved Hide resolved
multiqc/modules/truvari/bench.py Outdated Show resolved Hide resolved
multiqc/modules/truvari/bench.py Outdated Show resolved Hide resolved
multiqc/modules/truvari/bench.py Outdated Show resolved Hide resolved
multiqc/modules/truvari/bench.py Outdated Show resolved Hide resolved
@pontushojer
Copy link
Contributor Author

Thanks @pontushojer! Just had a look at the report, quick naive question:

  • From the header tooltips, it kind of sounds like TP (base) + FP = Base calls? But that's not the case. Is that expected?

    • Number of the base VCF calls matching the comp VCF + Number of the comp VCF calls non-matching the base VCF = Number of calls in the base VCF
  • If those numbers are meant to sum (😅) would a stacked bar plot work / be better?

Yeah, its actually TP (base) + FN = Base calls and similarly TP (comp) + FP = Comp calls. I have not update the descriptions and added helptext to clarify this.

The bar plot was a nice suggestion so I also added this. This allows one to see a breakdown of classification for call from the perspective of Comp (the comparison VCF)

image

and perspective of Base (the baseline "truth" VCF).

image

I also updated several other things since there were slight changes to the truvari output from version v4.0.0. Some statistics had been renamed and version parsing also needed an update. I added a new log for this version in this PR MultiQC/test-data#284

@vladsavelyev
Copy link
Member

The bar plot looks great!

Though it looks like it now fully replicates the information shown in the table. And perhaps with another tab for the GT-matched version of the statistics, it can also cover the hidden columns as well, so the table could be dropped in favour of the bar plot? 🤔

@pontushojer
Copy link
Contributor Author

The bar plot looks great!

Though it looks like it now fully replicates the information shown in the table. And perhaps with another tab for the GT-matched version of the statistics, it can also cover the hidden columns as well, so the table could be dropped in favour of the bar plot? 🤔

Yes, I was actually starting to think the same when I implemented this. Now I have added the genotype data to the bar graph as well. I also moved the "GT concordance" stat to the general stats table. Then I remove the table as its all all represented in the other plots/tables.

Here is the current report sections

General stats
"GT concordance" is hidden by default.
image

Bargraph
image

Scatter plot
I remove the colouring per sample
image

@vladsavelyev
Copy link
Member

@pontushojer, it looks great and ready to merge!

I actually agree on your point that the scatter plot looks a bit less helpful in a static version without some way to tell samples apart. Though colors probably won't help much in this case, we'd need to show labels. But that's for the core plotting code to implement, the modules shouldn't worry about this, in my opinion. Something to take a look. I created an issue for that: #2111

@vladsavelyev vladsavelyev merged commit 961bc62 into MultiQC:master Oct 12, 2023
4 checks passed
@pontushojer pontushojer deleted the module-truvari branch December 1, 2023 17:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaits-review Awaiting final review and merge. module: new
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants