Add library information to Salmon report #733

Closed
jaclyn-taroni opened this issue Apr 17, 2018 · 9 comments · Fixed by #1485

Comments

@jaclyn-taroni

Hi @ewels,

Thanks for this tool! We at Alex's Lemonade are interested in using MultiQC to generate quality control reports for our refine.bio project (https://github.com/AlexsLemonade/refinebio).

We will be using Salmon for our RNA-seq pipeline and we would like to report additional QC information to users (related: AlexsLemonade/refinebio#131 (comment) and AlexsLemonade/refinebio#187 (comment)).

Would it be possible to include/display the following information from a (quasi-mapping mode) Salmon run in the form of a table:

  • library_types from aux_info/meta_info.json
  • compatible_fragments_ratios and strand_mapping_bias from lib_format_counts.json
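
As a rough illustration of the request, the sketch below gathers these fields from one Salmon output directory into a single table row. The function name `read_salmon_library_info` is hypothetical, and the exact key spellings in `lfc_keys` are an assumption that should be checked against the files your Salmon version writes; this is not MultiQC's implementation.

```python
import json
from pathlib import Path


def read_salmon_library_info(
    quant_dir,
    # Assumed key spellings; adjust to match your lib_format_counts.json.
    lfc_keys=("compatible_fragment_ratio", "strand_mapping_bias"),
):
    """Collect library information from one Salmon output directory.

    Returns a dict with whichever of the requested fields were found.
    Missing files or keys are skipped rather than raising.
    """
    quant_dir = Path(quant_dir)
    row = {}

    meta_path = quant_dir / "aux_info" / "meta_info.json"
    if meta_path.is_file():
        meta = json.loads(meta_path.read_text())
        # The key may be absent in some outputs, so only copy it if present.
        if "library_types" in meta:
            row["library_types"] = meta["library_types"]

    lfc_path = quant_dir / "lib_format_counts.json"
    if lfc_path.is_file():
        lfc = json.loads(lfc_path.read_text())
        for key in lfc_keys:
            if key in lfc:
                row[key] = lfc[key]

    return row
```

Calling the function once per sample directory and keying the results by directory name would give the rows of the requested table.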

Thanks again!

@ewels
Member

ewels commented Apr 18, 2018

Cool! Yes I can certainly have a look. Everything in the meta_info.json file is parsed already, so that’s just a case of displaying it in the report. The other stuff shouldn’t be too difficult as long as the file isn’t huge.

It looks like my example test data may be out of date as that file is a .txt instead of .json. Do you guys have an example Salmon output that I could add to develop around please? Just dropping a zip file on this issue is fine (or PR to the test data repo).

We have a MultiQC hackathon event next week, so this would be a great beginner project for someone there.

@jaclyn-taroni
Author

Great, thank you! Here is an example Salmon output: DRR016125_quant.zip. It was run using the latest release of Salmon (v0.9.1).

Please let me know if you need anything else!

ewels added a commit to MultiQC/test-data that referenced this issue Apr 18, 2018
@ewels
Member

ewels commented Apr 18, 2018

That's perfect, thank you!

@rspreafico-vir
Contributor

If this module gets revisited, would it be possible also to provide individual files from Salmon's output (such as aux_info/meta_info.json) to MultiQC and still figure out the sample name? Right now the full output is required by MultiQC, whereas for other tools typically a single log file is sufficient.

@ewels
Member

ewels commented Jun 16, 2021

@rspreafico-vir - apologies for the ridiculously long 2-year delay in responding to your request!

The Salmon module needs the full output because the meta_info.json files do not contain the input filename. As such, using the full file path is the only method I could think of to get a sample name for each sample.

Looking now, I guess we could parse cmd_info.json for the input file paths. But that would still require passing the entire directory to MultiQC in order to be able to maintain that association, so I guess it would not really bring any benefit.
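
To make the path-based approach concrete, here is a minimal sketch of deriving a sample name from the location of a meta_info.json file. The helper name `sample_name_from_meta_path` is hypothetical, and the sketch assumes each Salmon run was written to a directory named after its sample, with the file sitting in an `aux_info` subdirectory; it is not the module's actual code.

```python
from pathlib import Path


def sample_name_from_meta_path(meta_info_path):
    """Return the run directory name for .../<sample>/aux_info/meta_info.json.

    Returns None when the file is not inside a named run directory, which is
    the situation where a lone meta_info.json carries no sample identity.
    """
    aux_dir = Path(meta_info_path).parent
    if aux_dir.name != "aux_info":
        return None
    # The directory that contains aux_info is assumed to be named per sample.
    sample = aux_dir.parent.name
    return sample or None
```

A file passed on its own, without its parent directories, returns None here, which is why the whole output directory has to be supplied.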

@megumi-mori
Contributor

Hi @ewels!
I found this ticket on the difficulty project board!
I think I can figure out how to make the requested tables from the lib_format_counts.json file.
But I don't know how to display the library_types, which seems to require a table of strings. If I could get a pointer I can get working on this ticket.

@ewels
Member

ewels commented Jun 19, 2021

Just had a quick look - it's an array, but I guess that most of the time it's a single string per sample? So I think we can just join with a comma and add as a column to the general stats table (maybe hidden by default?). Can try to have a look when I'm back at work, but feel free to have a play 👌🏻
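
A minimal sketch of the join described above, assuming `meta_info` is the dict parsed from meta_info.json. The function name `format_library_types` is hypothetical, and the handling of a missing key is an added safeguard rather than something stated in this comment.

```python
def format_library_types(meta_info):
    """Turn the library_types array into one comma-separated string.

    Returns None if the key is missing or empty so the caller can skip
    the column for that sample.
    """
    library_types = meta_info.get("library_types")
    if not library_types:
        return None
    if isinstance(library_types, str):
        # Tolerate a bare string in case the value is not wrapped in a list.
        return library_types
    return ", ".join(str(lib_type) for lib_type in library_types)
```

With a single-element array the result is just that element, so the common case reads cleanly in a table cell.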

@megumi-mori
Contributor

Ok, thanks! I just didn't know the table can handle strings.
I started making a subsection under Salmon to display a table - should I instead include the info in the general stats?
Thanks!

@ewels
Member

ewels commented Jun 21, 2021

Yeah, just use "scale": False for the column config if using strings. Then MultiQC doesn't try to add a bar underneath it and the formatting is simpler / better.

I think as it's just a single column, it's better to add it to the General Stats table 👍🏻 You can perhaps hide it by default with "hidden": True (or better still, hide it if all values are the same across all samples and show it if not).
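
The column config suggested here could look roughly like the sketch below. The function name `library_types_header`, the column key, and the title and description text are all hypothetical; only "scale": False and "hidden": True come from this comment. The dict is plain Python and does not import MultiQC.

```python
def library_types_header(values_by_sample):
    """Build a General Stats style column config for a string column.

    values_by_sample maps sample name -> joined library type string (or None).
    The column is hidden when every sample reports the same value.
    """
    distinct = {v for v in values_by_sample.values() if v is not None}
    return {
        "library_types": {
            "title": "Library types",  # hypothetical display title
            "description": "Library types detected by Salmon",  # hypothetical
            "scale": False,  # string values: no coloured bar under the cell
            "hidden": len(distinct) <= 1,  # show only if samples differ
        }
    }
```

Computing "hidden" from the data keeps the table compact for uniform batches while still surfacing a mismatched library type when one sample differs.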

megumi-mori added a commit to megumi-mori/MultiQC that referenced this issue Jul 7, 2021
@ewels ewels linked a pull request Feb 3, 2022 that will close this issue
@vladsavelyev vladsavelyev added this to the MultiQC v1.19 milestone Dec 15, 2023
vladsavelyev added a commit that referenced this issue Dec 17, 2023
* Salmon reports library info to General Stats table #733

* Fixed Markdownlint and Prettier errors

* Catches if 'library_types' key does not exist. Added log print for library format counts found.

* Move changelog up

* More specific search

* Add note of the requirement to be in a named directory

* Missed one

* Update multiqc/modules/salmon/salmon.py

Co-authored-by: Phil Ewels <phil.ewels@seqera.io>

---------

Co-authored-by: Phil Ewels <phil.ewels@scilifelab.se>
Co-authored-by: vladsaveliev <vladislav.savelyev@populationgenomics.org.au>
Co-authored-by: Phil Ewels <phil.ewels@seqera.io>
Co-authored-by: Vlad Savelyev <vladislav.sav@gmail.com>