Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New module: Sentieon Dedup Metrics #1943

Closed
wants to merge 3 commits into from

Conversation

asp8200
Copy link

@asp8200 asp8200 commented Jun 25, 2023

Adding support for Sentieon Dedup Metrics (corresponding to picard MarkDuplicates)

Issue #1936

As far as I could tell, we can't just call the MarkDuplicates MultiQC-module for the Sentieon-Dedup (as is done in the
biobambam2/bamsormadup MultiQC-module ), since, for instance, parsing of the Sentieon-command is different from the parsing of the MarkDuplicates command.

image

This is how the Sentieon Dedup section looks if MultiQC findes to DedupMetrics-files:
image

The Sentieon Dedup Metrics are displayed just like the corresponding metrics from Picard MarkDuplicates, and the module code was copied and adjusted from the Markduplicates MultiQC-module.

  • This comment contains a description of changes (with reason)

  • CHANGELOG.md has been updated

  • There is example tool output for tools in the https://github.com/ewels/MultiQC_TestData repository : data/modules/sentieon/issue_1180/P150_PE.st.metrics.DeDup.txt

  • Code is tested and works locally (including with --lint flag)

  • docs/README.md is updated with link to below

  • docs/modules/sentieon.md was already present but is now updated.

  • Everything that can be represented with a plot instead of a table is a plot

  • Report sections have a description and help text (with self.add_section)

  • There aren't any huge tables with > 6 columns (explain reasoning if so)

  • Each table column has a different colour scale to its neighbour, which relates to the data (eg. if high numbers are bad, they're red)

  • Module does not do any significant computational work

@asp8200 asp8200 changed the title module for Sentieon Dedup Metrics DRAFT: module for Sentieon Dedup Metrics Jun 25, 2023
@asp8200 asp8200 changed the title DRAFT: module for Sentieon Dedup Metrics MultiQC module for Sentieon Dedup Metrics Jun 25, 2023
Comment on lines +735 to +736
num_lines: 3
shared: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these logs can be shared, then setting num_lines doesn't really make sense - if the output was appended half way down an existing file then it won't be in the first 3 lines.

If Sentieon always creates these files then remove shared and keep num_lines (maybe also for the other blocks above and below?). If a likely use case is that the output from multiple samples / tools could be appended to a single file, remove num_lines

Copy link
Author

@asp8200 asp8200 Jun 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for you prompt reply, @ewels. I probably didn't (don't) understand the purpose of shared and num_lines completely 😬 The new Sentieon-Dedup MultiQC-module only uses the first three lines from the sentieon-dedup-metrics file, that is why I set num_lines to three.

Concerning the option shared, I think I looked at this:

https://github.com/ewels/MultiQC/blob/7129bd10420149c032e380bef57f9c9ca0044d33/docs/core/development/modules.md?plain=1#L381-L382

and I figured if some other MultiQC-module wanted to use the same log-file, then fine. Moreover, the other Sentieon-MultiQC-modules had shared set to true so I also just did that.

On re-reading your explanation and the docs, I get the impression that shared is actually more a way of having MultiQC handle the case where several pipeline-tools write to the same log/metrics/summary-file. If that is indeed the case, then I guess shared should be removed from all the Sentieon MultiQC-modules in the search_patterns.yaml 🤔

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ewels : So I should just the field num_lines and shared for sentieon?

@vladsavelyev vladsavelyev changed the title MultiQC module for Sentieon Dedup Metrics New module: Sentieon Dedup Metrics Aug 25, 2023
@vladsavelyev
Copy link
Member

Thank you a lot @asp8200 for raising the issue and for the contribution! We thought more about this as well as other PRs that request adding Sentieon modules (e.g. #1180), and decided to instead refactor the Picard module directly to support Sentieon: #2110

Now one can run MultiQC on Sentieon outputs (like the one you shared) and they will be interpreted as corresponding Picard outputs.

We restructured the test data as well, and whenever we have a test example for Sentieon tool corresponding to an existing Picard tool, we added that example into a subfolder, e.g. sentieon_issue_1180 here: https://github.com/ewels/MultiQC_TestData/tree/master/data/modules/picard/MarkDuplicates

Let me know if that works for you, and if you have other ideas on what Sentieon outputs to add and test 😌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants