
Add support for Sentieon dedup report (similar to Picard MarkDuplicates) #2144

Closed
1 task done
mathiasbio opened this issue Oct 20, 2023 · 3 comments


@mathiasbio

Name of the tool

Sentieon Dedup

Tool homepage

https://support.sentieon.com/manual/usages/general/#dedup-algorithm

Tool description

Works similarly to Picard MarkDuplicates, marking or removing read and optical duplicates.

Tool output

normal.dedup.metrics.tsv.zip

If I add:

## METRICS CLASS picard.sam.DuplicationMetrics

Immediately above this line in the report:

LIBRARY UNPAIRED_READS_EXAMINED READ_PAIRS_EXAMINED SECONDARY_OR_SUPPLEMENTARY_RDS UNMAPPED_READS UNPAIRED_READ_DUPLICATES READ_PAIR_DUPLICATES READ_PAIR_OPTICAL_DUPLICATES PERCENT_DUPLICATION ESTIMATED_LIBRARY_SIZE

Then MultiQC detects the file and adds it to the JSON report. However, for traceability and reproducibility it would be better not to label the data with the wrong tool name (Picard).
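The workaround described above can be scripted. Below is a minimal sketch, assuming the Sentieon Dedup metrics file is tab-separated and contains a header row whose first column is LIBRARY; the function name and script usage are hypothetical. The class line is copied from the issue text (Picard itself writes a tab after "METRICS CLASS"). Note that it keeps the drawback mentioned above: the data ends up labelled as Picard output.

```python
import sys
from pathlib import Path

# Class line quoted in the issue; MultiQC's Picard module picked the file up
# once this line preceded the column header row.
PICARD_CLASS_LINE = "## METRICS CLASS picard.sam.DuplicationMetrics"


def add_picard_header(text: str) -> str:
    """Insert the Picard METRICS CLASS line directly above the LIBRARY header row.

    Returns the text unchanged if the class line is already present
    or if no LIBRARY header row is found.
    """
    lines = text.splitlines(keepends=True)
    if any(line.startswith(PICARD_CLASS_LINE) for line in lines):
        return text  # already patched: keep the operation idempotent
    for i, line in enumerate(lines):
        if line.startswith("LIBRARY"):
            lines.insert(i, PICARD_CLASS_LINE + "\n")
            break
    return "".join(lines)


if __name__ == "__main__":
    # Hypothetical usage: python add_picard_header.py normal.dedup.metrics.tsv
    for name in sys.argv[1:]:
        path = Path(name)
        path.write_text(add_picard_header(path.read_text()))
```

Making the function idempotent means re-running it on an already patched file does not stack duplicate class lines.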

Log filename pattern

No response

Data suitable for MultiQC plot(s)

As it contains the same summary stats as Picard MarkDuplicates, it would be nice if the report from this tool could mirror the one from Picard.

Currently, when I add the line ## METRICS CLASS picard.sam.DuplicationMetrics, I get the following in multiqc_picard_dups.json. It would be great if the same could be produced from the Sentieon Dedup report!

{
"normal": {
"LIBRARY": "Unknown Library",
"UNPAIRED_READS_EXAMINED": 400299.0,
"READ_PAIRS_EXAMINED": 440331782.0,
"SECONDARY_OR_SUPPLEMENTARY_RDS": 3872761.0,
"UNMAPPED_READS": 1264137.0,
"UNPAIRED_READ_DUPLICATES": 146877.0,
"READ_PAIR_DUPLICATES": 39145351.0,
"READ_PAIR_OPTICAL_DUPLICATES": 4048209.0,
"PERCENT_DUPLICATION": 0.089026,
"ESTIMATED_LIBRARY_SIZE": 2564198488.0
},
"tumor": {
"LIBRARY": "Unknown Library",
"UNPAIRED_READS_EXAMINED": 193170.0,
"READ_PAIRS_EXAMINED": 325237198.0,
"SECONDARY_OR_SUPPLEMENTARY_RDS": 3163125.0,
"UNMAPPED_READS": 747098.0,
"UNPAIRED_READ_DUPLICATES": 71991.0,
"READ_PAIR_DUPLICATES": 39186710.0,
"READ_PAIR_OPTICAL_DUPLICATES": 3746726.0,
"PERCENT_DUPLICATION": 0.120561,
"ESTIMATED_LIBRARY_SIZE": 1348927846.0
}
}
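To produce records shaped like the JSON above directly from the Sentieon file, a parser only needs the header row and the row under it. This is a sketch under the same assumptions as before (tab-separated, header row starting with LIBRARY, one library per file); it is not MultiQC's parser, and the sample-naming rule in collect() is a hypothetical choice.

```python
import json
import sys
from pathlib import Path


def parse_dedup_metrics(text: str) -> dict:
    """Parse the first data row under the LIBRARY header into a dict.

    Numeric fields become floats (as in multiqc_picard_dups.json);
    fields that do not parse as numbers, such as LIBRARY, stay strings.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("LIBRARY") and i + 1 < len(lines):
            header = line.split("\t")
            values = lines[i + 1].split("\t")
            break
    else:
        raise ValueError("no LIBRARY header row followed by a data row")

    record = {}
    for key, raw in zip(header, values):
        try:
            record[key] = float(raw)
        except ValueError:
            record[key] = raw  # e.g. "Unknown Library"
    return record


def collect(paths) -> dict:
    """Key each parsed file by a sample name derived from its file name.

    Hypothetical rule: the text before the first ".", so
    normal.dedup.metrics.tsv -> "normal".
    """
    return {
        Path(p).name.split(".")[0]: parse_dedup_metrics(Path(p).read_text())
        for p in paths
    }


if __name__ == "__main__":
    print(json.dumps(collect(sys.argv[1:]), indent=4))
```

Only the first data row is read, which matches the single-library example; a multi-library file would need a loop over the following rows.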

Most interesting data for the General Stats table

READ_PAIR_DUPLICATES (perhaps represented as % instead)
READ_PAIR_OPTICAL_DUPLICATES (perhaps represented as % instead)
PERCENT_DUPLICATION (primarily this, as for Picard MarkDuplicates)
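The General Stats candidates above can be derived from the raw counts. The PERCENT_DUPLICATION formula below is Picard's definition (each duplicate pair counts as two reads), and it reproduces the values in the JSON above for both samples. The two percentage columns are hypothetical, one way to express "represented as %", here taken relative to READ_PAIRS_EXAMINED as an assumption.

```python
def general_stats(m: dict) -> dict:
    """Derive General Stats candidates from one sample's dedup metrics."""
    # Picard's definition: every duplicate pair contributes two reads.
    dup_reads = m["UNPAIRED_READ_DUPLICATES"] + 2 * m["READ_PAIR_DUPLICATES"]
    examined_reads = m["UNPAIRED_READS_EXAMINED"] + 2 * m["READ_PAIRS_EXAMINED"]
    pairs = m["READ_PAIRS_EXAMINED"]
    return {
        "PERCENT_DUPLICATION": dup_reads / examined_reads,  # a fraction, 0-1
        # Hypothetical % columns, relative to read pairs examined (assumption)
        "PCT_READ_PAIR_DUPLICATES": 100 * m["READ_PAIR_DUPLICATES"] / pairs,
        "PCT_READ_PAIR_OPTICAL_DUPLICATES": 100 * m["READ_PAIR_OPTICAL_DUPLICATES"] / pairs,
    }


# Values copied from the "normal" sample in the JSON above.
normal = {
    "UNPAIRED_READS_EXAMINED": 400299.0,
    "READ_PAIRS_EXAMINED": 440331782.0,
    "UNPAIRED_READ_DUPLICATES": 146877.0,
    "READ_PAIR_DUPLICATES": 39145351.0,
    "READ_PAIR_OPTICAL_DUPLICATES": 4048209.0,
}

for key, value in general_stats(normal).items():
    print(f"{key}: {value:.4f}")
```

For the normal sample this gives a PERCENT_DUPLICATION of about 0.0890, consistent with the reported 0.089026, with roughly 8.89% of read pairs marked as duplicates and 0.92% as optical duplicates.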

Before submitting

  • I have included example data (zipped, not pasted) that can be used to write the module.
@vladsavelyev
Member

vladsavelyev commented Oct 22, 2023

Thank you for creating the issue, and attaching an example output!

This will be resolved by this PR once it is merged :) #2110

@mathiasbio
Author

Oh great! Thanks! I'll keep an eye open for the new release : )

@vladsavelyev
Member

Alright, this is now addressed by supporting Sentieon directly in the Picard module (so we can more easily extend it to other Sentieon QC tools that match corresponding Picard tools): #2110. Hope that works!

A new MultiQC version will be released on Friday :)

vladsavelyev added this to the MultiQC v1.18 milestone on Nov 15, 2023

2 participants