New module: Sentieon Dedup Metrics #1943
num_lines: 3
shared: true
If these logs can be shared, then setting `num_lines` doesn't really make sense: if the output was appended halfway down an existing file, then it won't be in the first 3 lines.
If Sentieon always creates these files, then remove `shared` and keep `num_lines` (maybe also for the other blocks above and below?). If a likely use case is that the output from multiple samples / tools could be appended to a single file, remove `num_lines`.
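To make the trade-off concrete, here is a minimal toy model of a content search limited to the first few lines of a file. It is only a sketch, not MultiQC's actual file-search code: the `matches` helper, the marker string and both sample files are made up for illustration.

```python
from typing import Optional


def matches(text: str, contents: str, num_lines: Optional[int] = None) -> bool:
    """Return True if `contents` occurs in `text`.

    If `num_lines` is given, only the first `num_lines` lines are searched,
    mimicking a search pattern that sets `num_lines`.
    """
    lines = text.splitlines()
    if num_lines is not None:
        lines = lines[:num_lines]
    return any(contents in line for line in lines)


# Hypothetical marker string and file contents, for illustration only.
MARKER = "DuplicationMetrics"

# Case 1: the tool always writes its own file, so the marker is near the top.
own_file = "# command line\n## METRICS CLASS DuplicationMetrics\nLIBRARY\tREADS\n"

# Case 2: the same output appended after ten lines from another tool.
shared_log = "\n".join(f"other tool line {i}" for i in range(10)) + "\n" + own_file

found_in_own_file = matches(own_file, MARKER, num_lines=3)
found_in_shared_log = matches(shared_log, MARKER, num_lines=3)
found_without_limit = matches(shared_log, MARKER)
```

With the line limit, the marker is found in the dedicated file but missed in the appended log; dropping the limit finds it in both. That is why combining `shared: true` with `num_lines: 3` is contradictory.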
Thanks for your prompt reply, @ewels. I probably didn't (don't) completely understand the purpose of `shared` and `num_lines` 😬 The new Sentieon-Dedup MultiQC-module only uses the first three lines of the sentieon-dedup-metrics file, which is why I set `num_lines` to three.
Concerning the option `shared`, I think I looked at this:
and I figured that if some other MultiQC-module wanted to use the same log-file, then fine. Moreover, the other Sentieon MultiQC-modules had `shared` set to `true`, so I just did the same.
On re-reading your explanation and the docs, I get the impression that `shared` is actually more a way of having MultiQC handle the case where several pipeline tools write to the same log/metrics/summary file. If that is indeed the case, then I guess `shared` should be removed from all the Sentieon MultiQC-modules in `search_patterns.yaml` 🤔
@ewels: So I should just remove the fields `num_lines` and `shared` for `sentieon`?
Thank you a lot @asp8200 for raising the issue and for the contribution! We thought more about this, as well as about other PRs that request adding Sentieon modules (e.g. #1180), and decided instead to refactor the Picard module directly to support Sentieon: #2110. Now one can run MultiQC on Sentieon outputs (like the one you shared) and they will be interpreted as the corresponding Picard outputs. We restructured the test data as well: whenever we have a test example for a Sentieon tool that corresponds to an existing Picard tool, we added that example into a subfolder, e.g.
Let me know if that works for you, and if you have other ideas on what Sentieon outputs to add and test 😌
Adding support for Sentieon Dedup Metrics (corresponding to picard MarkDuplicates)
Issue #1936
As far as I could tell, we can't just call the MarkDuplicates MultiQC-module for Sentieon Dedup (as is done in the biobambam2/bamsormadup MultiQC-module), since, for instance, the Sentieon command is parsed differently from the MarkDuplicates command.
This is how the Sentieon Dedup section looks if MultiQC finds two DedupMetrics files:
![image](https://private-user-images.githubusercontent.com/37172585/248550750-7c4b4684-2e21-4ba5-85d3-c0a1bd23e94d.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTgyODY2MzEsIm5iZiI6MTcxODI4NjMzMSwicGF0aCI6Ii8zNzE3MjU4NS8yNDg1NTA3NTAtN2M0YjQ2ODQtMmUyMS00YmE1LTg1ZDMtYzBhMWJkMjNlOTRkLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MTMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjEzVDEzNDUzMVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWI5MjQ5ZjQ5NDFhZmQ4NjVkMDI4ZGZhMjM5NWVmODY2NDU1OGM4ZDFhMzYxYWM1M2EyNWYzYzFiMmRkY2MwMjImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.PuicoPrzcpaYUkvqd7Kly6ohLA4lBby85Hw5BSMHMhI)
The Sentieon Dedup Metrics are displayed just like the corresponding metrics from Picard MarkDuplicates, and the module code was copied and adjusted from the MarkDuplicates MultiQC-module.
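As a rough illustration of the kind of parsing involved, the sketch below pairs a tab-separated header row with the value row that follows it. It is not the module's actual code. It assumes that the metrics table starts with a `LIBRARY` column, as Picard's duplication metrics do, and the `parse_dedup_metrics` helper and the example lines are synthetic.

```python
from typing import Dict, List


def parse_dedup_metrics(lines: List[str]) -> Dict[str, str]:
    """Pair the first header row starting with LIBRARY with the row after it.

    Returns an empty dict if no such header row (with a following row) exists.
    """
    for i, line in enumerate(lines[:-1]):
        if line.startswith("LIBRARY"):
            keys = line.rstrip("\n").split("\t")
            values = lines[i + 1].rstrip("\n").split("\t")
            return dict(zip(keys, values))
    return {}


# Synthetic three-line example: a comment line, a header row, one value row.
# The numbers are made up and do not come from any real run.
example = [
    "# synthetic comment line standing in for the tool's command line",
    "LIBRARY\tREAD_PAIRS_EXAMINED\tREAD_PAIR_DUPLICATES\tPERCENT_DUPLICATION",
    "lib1\t1000\t50\t0.05",
]

metrics = parse_dedup_metrics(example)
```

Values are kept as strings here; a real module would still need to convert them to numbers and handle the tool-specific command line separately.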
- This comment contains a description of changes (with reason)
- `CHANGELOG.md` has been updated
- There is example tool output for tools in the https://github.com/ewels/MultiQC_TestData repository: data/modules/sentieon/issue_1180/P150_PE.st.metrics.DeDup.txt
- Code is tested and works locally (including with `--lint` flag)
- `docs/README.md` is updated with link to below
- `docs/modules/sentieon.md` was already present but is now updated.
- Everything that can be represented with a plot instead of a table is a plot
- Report sections have a description and help text (with `self.add_section`)
- There aren't any huge tables with > 6 columns (explain reasoning if so)
- Each table column has a different colour scale to its neighbour, which relates to the data (e.g. if high numbers are bad, they're red)
- Module does not do any significant computational work