Error parsing parabricks bammetrics output files #2122

Closed
4 tasks done
tamuanand opened this issue Oct 16, 2023 · 3 comments · Fixed by #2127

Comments

@tamuanand

Description of bug

For reference - posted this on MultiQC slack - https://multiqc.slack.com/archives/C04QMP84K5L/p1697356986698759

I had a question on MultiQC parsing Parabricks bammetrics files - https://docs.nvidia.com/clara/parabricks/4.1.0/documentation/tooldocs/man_bammetrics.html

  • This tool applies an accelerated version of the GATK CollectWGSMetrics for assessing coverage and quality of an aligned whole-genome BAM file. This includes metrics such as the fraction of reads that pass the base and mapping quality filters, and the coverage levels (read-depth) across the genome

MultiQC docs seem to indicate that CollectWGSMetrics is one of the stats from picard - https://multiqc.info/modules/picard/

However, with MultiQC 1.16 the picard module does not pick up the Parabricks bammetrics output: the log below reports "No samples found: picard". Something about these files appears to prevent them from being recognised.

File that triggers the error

200K_HA2WPADXX_2.bammetrics.metrics.txt
800K_HA2WPADXX_2.bammetrics.metrics.txt

MultiQC Error log

/// MultiQC 🔍 | v1.16

[2023-10-16 00:59:34] multiqc                                            [DEBUG  ]  This is MultiQC v1.16
[2023-10-16 00:59:34] multiqc                                            [DEBUG  ]  Command used: /usr/local/bin/multiqc . --verbose
[2023-10-16 00:59:35] multiqc                                            [DEBUG  ]  Latest MultiQC version is v1.16
[2023-10-16 00:59:35] multiqc                                            [DEBUG  ]  Working dir : /temp_dir/BAMMETRICS/picard/wgs_metrics
[2023-10-16 00:59:35] multiqc                                            [DEBUG  ]  Template    : default
[2023-10-16 00:59:35] multiqc                                            [DEBUG  ]  Running Python 3.11.5 | packaged by conda-forge | (main, Aug 27 2023, 03:34:09) [GCC 12.3.0]
[2023-10-16 00:59:35] multiqc                                            [DEBUG  ]  Analysing modules: custom_content, ccs, ngsderive, purple, conpair, lima, peddy, somalier, methylQA, mosdepth, phantompeakqualtools, qualimap, preseq, hifiasm, quast, qorts, rna_seqc, rockhopper, rsem, rseqc, busco, bustools, goleft_indexcov, gffcompare, disambiguate, supernova, deeptools, sargasso, verifybamid, mirtrace, happy, mirtop, sambamba, gopeaks, homer, hops, macs2, theta2, snpeff, gatk, htseq, bcftools, featureCounts, fgbio, dragen, dragen_fastqc, dedup, pbmarkdup, damageprofiler, mapdamage, biobambam2, jcvi, mtnucratio, picard, vep, sentieon, bakta, prokka, qc3C, nanostat, samblaster, samtools, sexdeterrmine, eigenstratdatabasetools, bamtools, jellyfish, vcftools, longranger, stacks, varscan2, snippy, umitools, bbmap, bismark, biscuit, diamond, hicexplorer, hicup, hicpro, salmon, kallisto, slamdunk, star, hisat2, tophat, bowtie2, bowtie1, cellranger, snpsplit, odgi, pangolin, nextclade, freyja, humid, kat, leehom, librarian, adapterRemoval, bbduk, clipandmerge, cutadapt, flexbar, sourmash, kaiju, kraken, malt, motus, trimmomatic, sickle, skewer, sortmerna, biobloomtools, fastq_screen, afterqc, fastp, fastqc, filtlong, prinseqplusplus, pychopper, porechop, pycoqc, minionqc, anglerfish, multivcfanalyzer, clusterflow, checkqc, bcl2fastq, bclconvert, interop, ivar, flash, seqyclean, optitype, whatshap
[2023-10-16 00:59:35] multiqc                                            [DEBUG  ]  Using temporary directory for creating report: /tmp/tmp195eg92u
[2023-10-16 00:59:35] multiqc                                            [INFO   ]  Search path : /temp_dir/BAMMETRICS/picard/wgs_metrics
|         searching | ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 2/2
[2023-10-16 00:59:35] multiqc                                            [DEBUG  ]  Summary of files that were skipped by the search: [skipped_module_specific_max_filesize: 6]
[2023-10-16 00:59:36] multiqc.plots.bargraph                             [DEBUG  ]  Using matplotlib version 3.8.0
[2023-10-16 00:59:36] multiqc.plots.linegraph                            [DEBUG  ]  Using matplotlib version 3.8.0
[2023-10-16 00:59:36] multiqc                                            [DEBUG  ]  No samples found: custom_content
[2023-10-16 00:59:36] multiqc                                            [DEBUG  ]  No samples found: picard
[2023-10-16 00:59:36] multiqc.utils.software_versions                    [DEBUG  ]  Reading software versions from config.software_versions
[2023-10-16 00:59:36] multiqc                                            [WARNING]  No analysis results found. Cleaning up..
[2023-10-16 00:59:36] multiqc                                            [INFO   ]  MultiQC complete

Before submitting

  • I have read the troubleshooting documentation.
  • I am using the latest release of MultiQC.
  • I have included a full MultiQC log, not truncated.
  • I have attached an input file (.zip if necessary) that triggers the error.
@vladsavelyev
Member

Thank you for reporting this issue! The format of the Parabricks output is slightly different from the original Picard output, so I made some adjustments to the search pattern and the module code to make it work: #2127

It's the same story as with Sentieon, which uses Picard internally but writes different headers. More work is needed to make all submodules flexible enough to handle such variants: #2110
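To illustrate the kind of header-tolerant parsing discussed above, here is a minimal sketch. It is not the code from #2127. It assumes that the metrics file, whichever tool wrote it, still contains a tab-separated header row whose first column is `GENOME_TERRITORY` (the first column of Picard's WgsMetrics table), followed by one row of values. The function name `parse_wgs_metrics` and the sample values are hypothetical.

```python
import io


def parse_wgs_metrics(handle, first_column="GENOME_TERRITORY"):
    """Return the first metrics row as a dict, or None if no table is found.

    The table is located by its column header rather than by a tool-specific
    comment line, so differing '#' header lines do not matter.
    """
    header = None
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if header is None:
            # Skip everything until the column header row appears.
            if fields[0] == first_column:
                header = fields
            continue
        if not line.strip():
            return None  # header row was not followed by a values row
        return dict(zip(header, fields))
    return None


# Made-up example content, only to show the expected layout (assumption).
example = (
    "## METRICS CLASS\tWgsMetrics\n"
    "GENOME_TERRITORY\tMEAN_COVERAGE\tSD_COVERAGE\n"
    "1000\t30.5\t4.2\n"
    "\n"
    "## HISTOGRAM\tjava.lang.Integer\n"
)
row = parse_wgs_metrics(io.StringIO(example))
print(row)
```

Keying on the column header instead of a tool name is one possible design choice; it trades strictness for tolerance of wrappers that reuse the Picard table layout.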

@tamuanand
Author

Hi @vladsavelyev

Thanks for looking into this. I am providing 2 more example files here in this comment.

  • the 2 files provided earlier were from subsampled reads (200K and 800K) as part of the initial tests with parabricks.

This time around, a 25M WGS sample was run in two different ways (with and without bwakit postalt), which is closer to a real-world run.

run2_nopostalt_HA2WPADXX_2.bammetrics_metrics.txt
run1_postalt_HA2WPADXX_2.bammetrics_metrics.txt

Hope this helps.

@tamuanand
Author

tamuanand commented Oct 19, 2023

Hi @vladsavelyev and @ewels

Is this now fixed in MultiQC 1.17?

Edit: I see it in the release notes under Module Updates: https://github.com/ewels/MultiQC/releases/tag/v1.17

Thanks,
