NanoStats.txt generated by NanoPlot Not Recognized #1995
Comments
Hi @joeellis1331, thanks for reporting the bug. It looks like NanoPlot no longer uses NanoStat directly to write the text report, and NanoStat seems to be discontinued. NanoPlot now calls the nanomath library directly. It's also mentioned that there is a Rust replacement for NanoStat, called Cramino, though it seems to output a different format, as far as I can see. As a temporary solution, it looks like nanomath can produce the legacy report if you run NanoPlot with the Longer term, I'll look into implementing support for the new format. There was a suggestion to generalise this module to support different tools from the Nano* family, as long as they use the same |
I think the route forward would be to create a new MultiQC module called Happy to do that, however, I'm missing more test data: currently, the NanoStat test data has a bunch of outputs of different types, and for the new format, we only have one file (kudos to you for providing it). Would you be able to generate more test data of different types by any chance, so that we have a structure similar to the one we have for NanoStat in the MultiQC_TestData repo? |
Hi @vladsavelyev, (apologies for messing with the closed status, my bad) It would have probably been useful to provide the command I used to run NanoPlot! NanoPlot -t 4 --verbose --store -o $nameplot --tsv_stats --info_in_report --fastq $fq where Can you clarify what you mean by test data of different types? I ran NanoPlot on ~90 different fastq files so if you mean more samples I can happily provide more? Additionally as a note about Cramino, it also inputs only BAM/CRAM files and does not take FASTQ files so it isn't quite a replacement as it requires alignment. |
What I meant is that for NanoStat, we have example outputs for fastq, fasta, alignment, etc: https://github.com/ewels/MultiQC_TestData/tree/master/data/modules/nanostat I'm not very familiar with NanoStat and NanoPack, do you know if NanoPlot only works with FastQ or takes BAMs/Fasta as well? Is it only Cramino that supports BAMs? The reason I'm asking is that I don't want to implement a parser for just one specific file, because the tool might have other use cases that expect a different input and generate other variations of outputs that we want to support as well. |
Yep, NanoPlot can take various input: I'll start with your example, but it would be good to extend the example data with more. |
Got it, right now I only have fastq files processed. I could potentially generate output from summary file (albacore I believe), bam file, and pickle file formats as well. If those file inputs all produce the same NanoStats.txt, would that be helpful to know? Overall it looks like a number of tools in NanoPack are focused on BAM/CRAM formats, with a handful focusing on raw read (i.e. FASTA/FASTQ) formats. NanoPlot seems to be the most accommodating, as it accepts fasta, fastq_rich, fastq_minimal, summary (albacore or guppy), bam, ubam, cram, pickle, or feather. The pickle/feather is from NanoPlot itself. |
That would be great! The more - the better. Ideally, we want to replicate all the old-format examples we have here https://github.com/ewels/MultiQC_TestData/tree/master/data/modules/nanostat but in a new format. So our tests cover all possible scenarios. |
Alright, it appeared to be a pretty straightforward change, as the new format has a one-to-one match with the old format. See the pull request #1997 I'm keeping the module named |
@vladsavelyev Does this mean that it should work now for NanoPlot? Additionally, it would take me a few days, but I could still generate the output for summary, bam, and fastq with and without the --tsv_stats flag as well if that's still desired? |
@joeellis1331, yep, it should work for NanoPlot now! |
Hi @vladsavelyev, apologies for the simplicity of this question but what is the best way to ensure these changes take effect on my install? I tried to use |
@joeellis1331 the pull request is not yet merged, you can see it here: #1997 Even once merged you'd still need to install the development version, until those changes go out in a stable release. See docs for installing the development version (boils down to this command):
pip install git+https://github.com/ewels/MultiQC.git
And to try out the as-yet-unmerged pull request, I'd recommend first cloning the MultiQC repository and then using the GitHub CLI to check out the pull request, e.g.:
git clone https://github.com/ewels/MultiQC.git
cd MultiQC
gh pr checkout 1997
pip install . |
* NanoStat: support new output format. Fixes #1995 * Docs: more mention of alternative tool names. Means if anyone searches on the modules listing page for these, it'll show up. * Split legacy parsing into its own function --------- Co-authored-by: Phil Ewels <phil.ewels@seqera.io>
Hi, after installing the development version I was still unable to get MultiQC to generate a report from my NanoPlot output (i.e. it doesn't recognize the NanoStat.txt files). Are there additional resources/information I can provide? It is worth noting that I run MultiQC within a parent directory where subdirectories contain the output from NanoPlot. |
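For anyone debugging a similar layout, it can help to first confirm which stats files actually sit below the parent directory before looking at MultiQC itself. The following is only a minimal sketch in plain Python and does not reproduce MultiQC's own file search; the helper name `find_stats_files` and the default file name are assumptions made for illustration.

```python
from pathlib import Path


def find_stats_files(root, name="NanoStats.txt"):
    """Recursively collect regular files called `name` below `root`.

    Hypothetical helper for a quick sanity check; it is not part of MultiQC.
    """
    return sorted(p for p in Path(root).rglob(name) if p.is_file())


if __name__ == "__main__":
    # List every candidate file under the current working directory.
    for path in find_stats_files("."):
        print(path)
```

If this prints nothing, the files are not where MultiQC is being run; if it prints the expected paths, the problem is more likely in how the file contents are recognised.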
I had this problem as well. You need to add the search terms in the config file to find the NanoPlot output stats: Create
Since there's also no prefix to the file, you have to add a Then run multiqc with: |
@joeellis1331 and @knacko, would you be able to share the Note that at the moment, MultiQC identifies the nanostat files by checking that they contain either a "General summary:" or a "Metrics dataset" line inside: https://github.com/ewels/MultiQC/blob/master/multiqc/utils/search_patterns.yaml#L444-L451 |
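The content check described in this comment can be approximated with a few lines of Python. This is a hedged sketch, not MultiQC's implementation: the two marker strings are the ones quoted above, while the function name `looks_like_nanostat` and the plain substring test are assumptions for illustration.

```python
# Marker strings quoted in the comment above; the rest is illustrative.
MARKERS = ("General summary:", "Metrics dataset")


def looks_like_nanostat(text):
    """Return True if any line of `text` contains one of the marker strings."""
    return any(marker in line for line in text.splitlines() for marker in MARKERS)


if __name__ == "__main__":
    import sys

    # Usage: python check_markers.py path/to/NanoStats.txt
    with open(sys.argv[1], encoding="utf-8") as handle:
        print(looks_like_nanostat(handle.read()))
```

Note that a literal substring test like this only matches "Metrics dataset" when the two words are separated by a single space.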
@vladsavelyev Here is an example file, it does follow the "Metrics dataset" header naming convention. The file itself is named as such ("NanoStats.txt") there is no prefix added to the file name. |
Thank you so much @joeellis1331! The search patterns didn't account for tab characters as a column separator, that's why it was missing the file. I added a fix for that: #2155 |
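To illustrate the kind of mismatch described here, the sketch below matches the header with either spaces or tabs between the two words. It is an assumption-laden example and not the change made in #2155: the regular expression, the name `has_metrics_header`, and the sample strings are all hypothetical.

```python
import re

# Hypothetical pattern: accept any run of spaces or tabs between the two words.
HEADER_RE = re.compile(r"^Metrics[ \t]+dataset", re.MULTILINE)


def has_metrics_header(text):
    """Return True if a line starts with 'Metrics' and 'dataset' split by blanks."""
    return HEADER_RE.search(text) is not None


if __name__ == "__main__":
    # Assumed sample contents, made up for this example.
    print(has_metrics_header("Metrics\tdataset\nnumber_of_reads\t10\n"))
    print(has_metrics_header("Metrics dataset\nnumber_of_reads 10\n"))
```

Both calls should report a match, whereas a literal search for "Metrics dataset" would miss the tab-separated variant.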
Description of bug
Hi,
I have recently used NanoPlot 1.41.6 and tried to ingest the NanoStat.txt summary files into the MultiQC report using v1.15, but kept receiving
[WARNING] No analysis results found. Cleaning up..
I saw this issue post, and compared to the original NanoStats.txt file, mine is slightly different. Maybe a recent update of NanoPlot adjusted this format, unsure!
File that triggers the error
NanoStats_mine.txt
NanoStats_original.txt
MultiQC Error log
Before submitting