unable to set x axis of custom linegraph using tsv file #1242

Closed
anoronh4 opened this issue Jul 2, 2020 · 5 comments

anoronh4 commented Jul 2, 2020

Description of unexpected behavior:
In custom content we are able to add data from several file types (JSON, TSV, YAML). I am attempting to add data for several samples using TSV, and it seems to work with the following:

SampleA	700650	7969	4991
SampleB	683285	7357	4688
SampleC	674329	7412	4632

and my config file says:

custom_data:
    customtab:
        file_format: 'tsv'
        section_name: 'Custom Data'
        plot_type: 'linegraph'
        pconfig:
            id: 'custom_linegraph'
            title: 'Custom Data Linegraph'
sp:
    customtab:
        fn: 'testfile.tsv'

MultiQC infers that the x-axis values for the data points are 1, 2, 3. However, in my case they should be 0, 1, 2. With a multi-sample TSV file, it is not obvious how I'm supposed to add that information. I tried a header line where the first field is blank, but it is treated as another row in the table and is visualized as Series0 on the line graph (the JSON file says {"name":"","data": [[1,0.0],[2,1.0],[3,2.0]]}). This is the table from that test:

	0	1	2
SampleA	700650	7969	4991
SampleB	683285	7357	4688
SampleC	674329	7412	4632

The only way to customize the x-axis that I can see is to use another file format like json or yaml, but this is an unexpected limitation, since tsv files are the most common and readily available input type.

File that triggers the unexpected behavior:
testfiles.zip

MultiQC run details:

  • Command used to run MultiQC: multiqc .
  • MultiQC Version: MultiQC v1.9
  • Operating System: linux
  • Python Version: Python 3.8.1
  • Method of MultiQC installation: singularity container from docker://ewels/multiqc

anoronh4 commented Oct 16, 2020

Update: I seem to be having a different, but possibly related, problem with *_mqc.json files.
demo.zip

Once again MultiQC seems to be ignoring the x-labels that are entered into the custom content files, just like when I was using TSV. Strangely, the peaks in the plot are spaced at intervals of eleven.
[Screenshot: Screen Shot 2020-10-16 at 11 28 58 AM]

I think the linegraph plot may be expecting non-string keys, but I can't manage that in JSON format. Even the linegraph from your example here does not show the expected x-axis labeling when I run MultiQC on it.

Any guidance at this point would be appreciated.


anoronh4 commented Oct 16, 2020

Update: the YAML format gives better results for the intended x-axis labels, I'm guessing because we are free to use either string or int keys in the data section. I'll leave this issue open because I still believe it to be a bug or an area for improvement.
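
A minimal Python sketch of why JSON input behaves differently from YAML here, using only the standard library. It shows that JSON object keys always come back as strings, so numeric x-values have to be converted by whatever consumes the data. YAML loaders such as PyYAML typically load unquoted numeric keys as numbers, which would fit the behaviour described above. The sample values are taken from the TSV earlier in this issue. This is an illustration only, not MultiQC's own code.

```python
import json

# JSON object keys are always strings, so numeric x-values arrive as text.
raw = '{"SampleA": {"0": 700650, "1": 7969, "2": 4991}}'
parsed = json.loads(raw)
key_types = {type(key).__name__ for key in parsed["SampleA"]}
print(key_types)  # {'str'}

# Serialising a dict with int keys converts those keys to strings.
round_tripped = json.loads(json.dumps({0: 700650, 1: 7969}))
print(list(round_tripped))  # ['0', '1']
```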


zxl124 commented Dec 18, 2021

I have run into the same thing. Basically, the X-axis values are meaningless when using the custom-content module to make line graphs in JSON format. The example code in the MultiQC documentation has the X-axis values 1, 2, 3, 4, 5:

"data": {
    "sample_1": { "1": 12, "2": 14, "3": 10, "4": 7, "5": 16 },
    "sample_2": { "1": 9, "2": 11, "3": 15, "4": 18, "5": 21 }
  }

but the plot comes out with X-axis values 0, 1, 2, 3, 4.
[Image: screenshot of the plot]
I've tried changing the X-axis values to other numbers, but the plot remained the same. I suspect converting the keys to float before feeding them into HighCharts might fix this bug.


ewels commented Sep 9, 2022

Hi both,

Apologies for my slow response here. I was able to reproduce your error, funny how I had never noticed this in my own docs examples 🤦🏻

Line graphs and scatter graphs now attempt to convert the axis labels into floats where possible, which seems to solve that part of the problem.

Phil
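
For readers following along, the float conversion described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code that was committed to MultiQC: `coerce_x` and `to_series` are hypothetical helper names, and sorting numeric x-values is an extra choice made for this sketch.

```python
def coerce_x(value):
    """Return value as a float when possible, otherwise leave it unchanged."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return value


def to_series(points):
    """Turn {x: y} into [x, y] pairs, sorted by x when every x is numeric."""
    pairs = [[coerce_x(x), y] for x, y in points.items()]
    if all(isinstance(x, float) for x, _ in pairs):
        pairs.sort(key=lambda pair: pair[0])
    return pairs


# Data from the documentation example quoted earlier in this thread.
sample_1 = {"1": 12, "2": 14, "3": 10, "4": 7, "5": 16}
series = to_series(sample_1)
print(series)  # [[1.0, 12], [2.0, 14], [3.0, 10], [4.0, 7], [5.0, 16]]
```

Labels that cannot be parsed as numbers are passed through unchanged, so categorical x-axes would be unaffected by this kind of conversion.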


ewels commented Sep 9, 2022

Ok, and second part also done. I've added some logic so that if the first sample has no sample name, then the values are used for x-axis labels.

This means that your original example with the TSV file now works as you expect 👍🏻

Thanks for pointing this out and suggesting it!

Phil
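
The rule described in this comment (a first row with no sample name supplies the x-axis labels) can be sketched in Python as below. This is a rough illustration, not MultiQC's actual implementation: `parse_linegraph_tsv` is a hypothetical name, the sketch assumes numeric x labels, and it falls back to 1, 2, 3, ... when there is no header row, matching the inferred values reported at the top of this issue.

```python
import csv
import io


def parse_linegraph_tsv(text):
    """Parse a multi-sample TSV into {sample: {x: y}} (hypothetical helper)."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    x_labels = None
    # A first row whose sample-name cell is blank is treated as x-axis labels.
    if rows and rows[0][0].strip() == "":
        x_labels = rows[0][1:]
        rows = rows[1:]
    data = {}
    for name, *values in rows:
        # Without a header row, fall back to 1, 2, 3, ... as x-values.
        xs = x_labels if x_labels is not None else range(1, len(values) + 1)
        data[name] = {float(x): float(v) for x, v in zip(xs, values)}
    return data


tsv_text = (
    "\t0\t1\t2\n"
    "SampleA\t700650\t7969\t4991\n"
    "SampleB\t683285\t7357\t4688\n"
)
parsed_tsv = parse_linegraph_tsv(tsv_text)
print(parsed_tsv["SampleA"])  # {0.0: 700650.0, 1.0: 7969.0, 2.0: 4991.0}
```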

ewels added a commit to MultiQC/test-data that referenced this issue Sep 9, 2022
bnbowman pushed a commit to bnbowman/MultiQC that referenced this issue Dec 20, 2022
* Try prefixing analysis dirs

* Update CHANGELOG.md

* tried to add conditional execution to actions

* Update CHANGELOG.md

* Add dependency statement

* only overwrite id when not set

* changelog

* always set c_id

* Revert setting force_interactive flag for rich with --no-ansi

* Don't force terminal escape codes for the progress bar

* Extend kallisto module regex to recognize newer output

I've noticed multiqc (v 1.12) didn't recognise some kallisto output (I'm using kb_python 0.26.4).
Having dug a bit, it seems not to work with more recent kallisto output, e.g. this snippet taken from here:
https://github.com/pachterlab/GRNP_2020/blob/daed9c2f204f1c3f6ee0e864c3db93b0baadfc8a/notebooks/FASTQ_processing/ProcessPBMC_NG.ipynb
```
[index] k-mer length: 31
[index] number of targets: 187,626
[index] number of k-mers: 108,619,921
tcmalloc: large alloc 3221225472 bytes == 0x556459b7e000 @  0x7feac4ab5887 0x556458814ad2 0x55645880d061 0x5564587e1372 0x7feac3935bf7 0x5564587e60da
[index] number of equivalence classes: 752,021
[quant] will process sample 1: A_R1.gz
                               A_R2.gz
[quant] will process sample 2: B_R1.gz
                               B_R2.gz
[quant] finding pseudoalignments for the reads ... done
[quant] processed 170,526,037 reads, 98,632,205 reads pseudoaligned
```
It turns out MultiQC only looks for `pair|file` but not `sample`. Replacing `sample` with `file` did the trick, so I suggest adding `sample` to the regex pattern.
I haven't tested this, but it should work now.

Here is the code which generates this output:
https://github.com/pachterlab/kallisto/blob/83bde908c403ea4014b5092a243e5c7240f48dd5/src/ProcessReads.cpp#L235

This is the commit which introduced it (already in 2018, so not sure why this hasn't been caught yet)
pachterlab/kallisto@62e9464

* Replace logger.hasHandlers() with logger.handlers

There are cases where configuring logging results in logger.handlers being empty but logger.hasHandlers() returns True: MultiQC#1643

Since the block modified removes based on logger.handlers, the condition to enter the block should check logger.handlers rather than logger.hasHandlers()

* Added description of changes for pull request

* Document 'no_version_check' config option

* Docs tweak

* Fix kwargs for MultiQC plugins

* New config option 'custom_table_header_config'

* Run black

* Update adapterRemoval.py

Returns actual proportion of reads that were collapsed and discarded

* Black format

* Black format

* BlackPython

* Fix chart labels and titles

* Fix chart labels and titles

* Add columns to stats table

Add columns with proportion of collapsed/discarded reads to the general stats table

* Add Columns - Fix format

* Changelog

* Fixed bug when other fields also have a "-" instead of an integer.

* Updated CHANGELOG

* Fixed typos

* Fixed format typo

* Fixed format typo

* Nanostat: Remove HTML escaping

Jinja2 `escape()` function was removed in Jinja2 v3.1.0

I don't think that this escaping should be required. I can't see any effect in the report when I remove it anyway.

* Changelog

* Changing 0 to None

* Skip fields with `-`

* Pangolin 4.0 compatibility

Recently pangolin has been updated to version 4.0 and this changes the output CSV file - see: https://github.com/cov-lineages/pangolin/releases/tag/v4.0

This causes the module to fail in its current state as row['qc_status'] already exists and the current replacement triggers a key error by searching for row['status'] which no longer exists. Thanks to @alexomics for tracking down the issue.

* Don't duplicate custom-content section descriptions.

Fixed edge-case bug in custom content where a `description` that doesn't terminate in `.` gave duplicate section descriptions.

* Changelog

* Tidied the verbose log, added summaries for skipped search files to debug log

* Allow sorting of table columns with text contents

* update changelog

* optimize linegraph category comparison

* Somalier: division by zero in sex ploidy plot

* Changelog

* Add time zone

* Update changelog

* Fix typo in bcl2fastq.py

* Handle too long and low complexity

* update changelog

* fix zero division error in sambamba markdup module

* black formatting

* update CHANGELOG.md to address MultiQC#1654

* bclconvert checks RunInfo xml if reads are singleend or pairedend and sets clusterlength appropriately. resolves MultiQC#1697

* Added CITATION.cff file for standardized citations

* fixed formatting of url

* fixed citation formatting

* Run prettier

* Fix module crashing due to missing field in report

* Fix bug where module wouldn't run if all content was within a MultiQC config file

Fixes MultiQC#1686

* nanostat: add check for quality scores

* update CHANGELOG.md

* update CHANGELOG.md

* Custom content: Fix crash when 'info' isn't set

Closes MultiQC#1688

* Added nix flake support

* Update docs/installation.md

Co-authored-by: Phil Ewels <phil@seqera.io>

* Fix zero division error

* Update fastqc.py

* Update fastqc.py

* fix format

* add change log

* fix doc ref

* Don't need Prettier _and_ markdown/yamllint CI

* Just capture the ValueError

* Rich-codex screenshot in the readme

* Corrected 'outdir' flag

Missing a dash for the flag to work.

* Clean up clean_img_paths

* Generate new screengrabs with rich-codex

* Add samtools flagstat column '% Read Mapped'

* update samtools flagstat changelog

* Added try,except for divisions to avoid division by 0 errors

* added the fixing of malt in the change log

* report median read length for fastqc

* add after filtering total reads to general stats table

* GitHub Actions: Tweet about new releases

* Bump to v1.13 for release

* rich-codex screenshots: Manual only, skip git checks

* Generate new screengrabs with rich-codex

* Fix changelog date

* Bump to v1.14dev

* Custom content: Render report even if there's only general stats there

See MultiQC#1756

* Bugfix: Make `config.data_format` work again

* Bump minimum version of Jinja2 to `>=3.0.0`

Closes MultiQC#1642

* Disable search progress bar if running with `--quiet` or `--no-ansi`

Closes MultiQC#1638

* Attempt to coerce line / scatter x-axes into floats so as not to lose labels

See MultiQC#1242

* Use row 1 as x-axis labels if no sample name.

Closes MultiQC#1242

* Malt: Move changelog up to new version

* Merge changelog up

* Use OrderedDict instead of 'placement'

* Add code comment

* Add CI testing for Python 3.10 and 3.11

* Fix typo

* Quotes so it's 3.10 and not 3.1

* 3.11-dev

* Remove 3.11-dev for Windows

* Move merge markers GHA into lint workflow file

* Shorter job name

* Be more selective about when slow MultiQC test runs fire

- Master only for push event
- Don't run if only docs / markdown

* Run isort

* Remove py2 'from __future__ import print_function'

* Add GitHub actions CI for isort

* Changelog

* Remove all py2 'from __future__ imports'

* Tweak some imports

* Changelog

* added setuptools to flake

* rm empty bcftools stats variant depths plot

* moved changelog comment

* adjusted PR num

* fix duplicate heatmap for kraken

* changelog

* use None instead

* First commit of pre-commit

* Comment out all the tests that don't yet work

* Update gene_body_coverage.py

Use a normalized coverage to make the gene body coverage plot (similar to the method used by RSeQC). Use the formula 'norm_cov = (cov - min(cov)) / (max(cov) - min(cov))' to compute the normalized coverage.

* Update gene_body_coverage.py

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

* Move changelog entry

* Test for Python 3.11 now that the official release is out

* CI: Use new version of actions/checkout to avoid Node.js deprecation warning

* Remove sample and chromosome before converting to int

This fixes issue-1793

* Remove filtered samples from general stats table

This fixes MultiQC#1780

* Update changelog

* Add additional entries for qualimap when region stats present

* fastp: use passed filter reads instead of after filter total reads

Signed-off-by: Josh Chorlton <jchorlton@gmail.com>

* bclconvert now handles different r1 and r2 lengths instead of assuming they are the same

* updated CHANGELOG.md

* update bustools

* Update CHANGELOG.md

* Remove changelog entry

* Move changelog to entry to correct place

* Fix changelog

* Kraken: Improve heatmap config

* Apply suggestions from code review

Co-authored-by: Phil Ewels <phil.ewels@seqera.io>

* handle singleindex data

* cleanup

* CHANGELOG.md bclconvert fix issue link typo and note single-index paired-end data handled

* Qualimap BamQC: Refactor to parse regexes per section

Also: Fix randomly aggressive Snippy module parsing bug

* HsMetrics: Allow custom columns in General Stats too

* Replace nested loop with list comprehension when parsing output file headers

* CHANGELOG

* Output headers order preserved and Sample is first value

* Fix ubuntu version in GitHub CI to preserve Py3.6 testing.

Python 3.6, I think your days are numbered..

* add back original avg field

Signed-off-by: Josh Chorlton <jchorlton@gmail.com>

* fixes

Signed-off-by: Josh Chorlton <jchorlton@gmail.com>

* update busco colors

Signed-off-by: Josh Chorlton <jchorlton@gmail.com>

* fix: frontmatter yaml formatting issue

* Update docs to use --cl-config instead of --cl_config

Closes MultiQC#1825

* Update multiqc/modules/fastqc/fastqc.py

Co-authored-by: Phil Ewels <phil.ewels@seqera.io>

* Update multiqc/modules/fastqc/fastqc.py

Co-authored-by: Phil Ewels <phil.ewels@seqera.io>

* Update multiqc/modules/fastqc/fastqc.py

Co-authored-by: Phil Ewels <phil.ewels@seqera.io>

* suggestion

Signed-off-by: Josh Chorlton <jchorlton@gmail.com>
Co-authored-by: Erik Danielsson <danielsson.erik.0@gmail.com>
Co-authored-by: Phil Ewels <phil.ewels@scilifelab.se>
Co-authored-by: Ido Tamir <ido.tamir@vbcf.ac.at>
Co-authored-by: seb-mueller <sebm@posteo.de>
Co-authored-by: Jonathan Oribello <Jonathan.d.oribello@gmail.com>
Co-authored-by: NiemannJ <69033839+NiemannJ@users.noreply.github.com>
Co-authored-by: fgvieira <fgarrettvieira@gmail.com>
Co-authored-by: mattloose <matt.loose@nottingham.ac.uk>
Co-authored-by: Josh Chorlton <jchorlton@gmail.com>
Co-authored-by: vladsaveliev <vladislav.savelyev@populationgenomics.org.au>
Co-authored-by: Sam Chorlton <>
Co-authored-by: jethror1 <45037268+jethror1@users.noreply.github.com>
Co-authored-by: Garth Kong <kongga2017@gmail.com>
Co-authored-by: Andrei Seleznev <aseleznev@illumina.com>
Co-authored-by: lew2mz <david.lewis@cchmc.org>
Co-authored-by: Phil Ewels <phil@seqera.io>
Co-authored-by: phue <patrick.huether@imp.ac.at>
Co-authored-by: David Lewis <60514384+IllustratedMan-code@users.noreply.github.com>
Co-authored-by: Chang Y <yech1990@gmail.com>
Co-authored-by: beausoleilmo <beausoleilmo@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jean Mainguy <jean.mainguy@outlook.fr>
Co-authored-by: aidaanva <aida.andrades@gmail.com>
Co-authored-by: SusiJo <susanne.jodoin@gmx.de>
Co-authored-by: Phil Ewels <phil.ewels@seqera.io>
Co-authored-by: TNalpat <thomas.nalpathamkalam@gmail.com>
Co-authored-by: Redmar van den Berg <RedmarvandenBerg@lumc.nl>
Co-authored-by: James Fellows Yates <jfy133@gmail.com>
Co-authored-by: Maarten-vd-Sande <maartenvandersande@hotmail.com>
Co-authored-by: Adam Talbot <adam.talbot@nonacus.com>
Co-authored-by: Oleh Pratsko <olehpratsko@gmail.com>
Co-authored-by: Josh Chorlton <jchorl@users.noreply.github.com>