Qualimap RNAseq locale parsing error #1870

Closed
4 tasks done
fspecque opened this issue Feb 22, 2023 · 3 comments · Fixed by #2282
Labels
bug: module Bug in a MultiQC module

Comments

@fspecque

Description of bug

Hello,

Thanks for this great tool. I've noticed a bug which seems to arise from a parsing error of the qualimap RNAseq output. If I understood correctly, MultiQC parses the rnaseq_qc_results.txt file from the qualimap rnaseq analysis. However, qualimap writes the counts with a space as the thousands separator, so the counts displayed in MultiQC's report based on qualimap RNAseq output are incorrect. I guess the regex that is being used stops at blanks.

Example of qualimap RNAseq output:

>>>>>>> Reads genomic origin

    exonic =  1 233 088 (38,7%)
    intronic = 1 119 695 (35,14%)
    intergenic = 833 548 (26,16%)
    overlapping exon = 61 165 (1,92%)

Numbers reported by MultiQC:
[screenshot not available]

The MultiQC version I use is 1.14, installed with miniconda.

Note that the genome_results.txt from the qualimap bamQC module separates thousands with commas.

Best,
Florian

File that triggers the error

rnaseq_qc_results.txt

MultiQC Error log

/// MultiQC 🔍 | v1.14

|           multiqc | Report title: MultiQC report for 10 samples (qualimap rnaseq outputs)
|           multiqc | Prepending directory to sample names
|           multiqc | Search path : /home/flospe/Documents/projects/EMTSens/test_10_samples/qualimap_outputs/star_out
|         searching | ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1058/1058  
|          qualimap | Found 30 RNASeq reports
|           multiqc | Compressing plot data
|           multiqc | Report      : MultiQC-report-for-10-samples-qualimap-rnaseq-outputs_multiqc_report.html
|           multiqc | Data        : MultiQC-report-for-10-samples-qualimap-rnaseq-outputs_multiqc_report_data
|           multiqc | MultiQC complete

Before submitting

  • I have read the troubleshooting documentation.
  • I am using the latest release of MultiQC.
  • I have included a full MultiQC log, not truncated.
  • I have attached an input file (.zip if necessary) that triggers the error.
@fspecque
Author

After further inspection, it is not a regular space; it is a no-break space (Unicode: U+00A0).

qualimap version: 2.2.1
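
To illustrate the observation above, here is a minimal Python sketch. The line is modelled on the example output in this issue with U+00A0 as the separator, and the pattern is an illustrative one, not copied from the MultiQC source. It shows a digits-and-commas character class stopping at the first no-break space, and that Python's `\s` does match U+00A0 in `str` patterns.

```python
import re

# A line modelled on the example above, with U+00A0 as the thousands separator.
line = "exonic =  1\u00a0233\u00a0088 (38,7%)"

# Illustrative pattern (an assumption, not the actual MultiQC regex):
# a class of digits and commas stops at the first no-break space.
narrow = re.search(r"exonic\s*=\s*([\d,]+)", line)
print(narrow.group(1))  # prints "1": only the leading digit is captured

# U+00A0 is not the ASCII space, but it counts as whitespace for \s.
print("\u00a0" == " ")                             # False
print(re.fullmatch(r"\s", "\u00a0") is not None)   # True
```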

@ewels added the "bug: module" (Bug in a MultiQC module) label Feb 28, 2023
@ewels changed the title from "Parsing error of qualimap RNAseq output" to "Qualimap RNAseq locale parsing error" Feb 28, 2023
@ewels
Member

ewels commented Feb 28, 2023

Oof, I guess that this is due to a Java locale issue and not really the authors of Qualimap.

Regexes need updating to allow spaces:

https://github.com/ewels/MultiQC/blob/bad886aaf2783fa753a48764000b63457282d4c3/multiqc/modules/qualimap/QM_RNASeq.py#L20-L30

Instead of [\d,] need [\d, ] (not sure if anything special is needed for the non-breaking space). Then need to strip the spaces afterwards for proper Python float handling.
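
A rough sketch of that idea, assuming integer read counts that are followed by a parenthesised percentage as in the example output above. The names `COUNT_RE` and `parse_count` are hypothetical and this is not the fix that was merged; it only shows a character class that also accepts spaces and U+00A0, with the separators stripped before conversion.

```python
import re

# Hypothetical pattern: a digit, then digits or any of the separators seen so far
# (comma, dot, regular space, no-break space U+00A0), up to the opening parenthesis.
COUNT_RE = re.compile(r"=\s*(\d[\d,.\u00a0 ]*)\s*\(")

def parse_count(line):
    """Return the integer count on a 'name = 1 233 088 (38,7%)' style line, or None."""
    match = COUNT_RE.search(line)
    if match is None:
        return None
    # Drop every non-digit character. This is only safe because the values
    # here are whole-number counts, so a dot or comma is never a decimal mark.
    digits = re.sub(r"\D", "", match.group(1))
    return int(digits)

print(parse_count("    exonic =  1\u00a0233\u00a0088 (38,7%)"))  # 1233088
print(parse_count("    intergenic = 833 548 (26,16%)"))          # 833548
print(parse_count("    exonic = 1,233,088 (38.7%)"))             # 1233088
```

Stripping all non-digits would be wrong for fields that can carry a decimal part, so those would need locale-aware handling instead.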

Looks like we already have support for sorting out . thousands separators:

https://github.com/ewels/MultiQC/blob/bad886aaf2783fa753a48764000b63457282d4c3/multiqc/modules/qualimap/QM_RNASeq.py#L44-L49

These Java locale issues are some of my least favourite in the MultiQC code base!

@vladsavelyev
Member

Fixed in #2282
