RSeQC TIN module - sample name cleaning #1484

Closed
ewels opened this issue Jul 6, 2021 · 2 comments
Labels
bug: module (Bug in a MultiQC module), priority: high

Comments

@ewels
Member

ewels commented Jul 6, 2021

Originally posted by @guidohooiveld in #737 (comment)


Thanks Phil and Erik for creating the MultiQC RSeQC TIN submodule; much appreciated!

Earlier today I updated MultiQC to the latest development version, and ran it again on a directory containing various QC output files, including TIN.
Et voila, the 2 columns (of which one is hidden) were indeed added to the General Statistics table. Nice & thanks!

One comment/question, though, regarding the sample names used for the TIN values in the General Statistics table: these are not the same as used for the other RSeQC modules. This makes the table 'less nice' and more difficult to read. See 1st screenshot below.

I think this is due to the fact that within the TIN "summary" file (the txt file *out.summary.txt) the full name of the BAM file is returned (used) by RSeQC (see its copied content below), which is then extracted (parsed) by the MultiQC TIN module, and subsequently used in the General Statistics table.

Therefore: would you have any suggestion to prevent this from happening, so that only the 'base name' is used in the table? Maybe by somehow applying the function fn_clean_sample_names on the fly?
Note that I am not an expert on how to do this and it may be too naive a thought... but since the 'other' files seem to be correctly recognized and name cleaned (see 2nd screenshot), this may be feasible.

Thus, in summary: the General Statistics table uses the full name present in the TIN summary file (*out.summary.txt), e.g. "P26-1-6h_Aligned.sortedByCoord.out.bam", whereas only the sample ID (base name) "P26-1-6h" would be preferred.

Content TIN summary file (P26-1-6h_Aligned.sortedByCoord.out.summary.txt):

Bam_file	TIN(mean)	TIN(median)	TIN(stdev)
P26-1-6h_Aligned.sortedByCoord.out.bam	53.72327495737302	53.34221052273402	18.530355596890026
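
To illustrate the request, here is a minimal sketch of how such a summary file could be parsed so that the `Bam_file` value is trimmed to the base name before it is used as a sample name. This is not the MultiQC implementation: the helper names `clean_sample_name` and `parse_tin_summary` and the `SUFFIXES` list are hypothetical and chosen only for this example. MultiQC's own name cleaning is configurable and may behave differently.

```python
import csv
import io

# Hypothetical suffix list, for illustration only; not MultiQC's real defaults.
SUFFIXES = ["_Aligned.sortedByCoord.out.bam", ".sorted.bam", ".bam"]


def clean_sample_name(name, suffixes=SUFFIXES):
    """Strip the first suffix in `suffixes` that the name ends with."""
    for suffix in suffixes:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def parse_tin_summary(text):
    """Parse the tab-separated TIN summary, keyed by cleaned sample name."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    results = {}
    for row in reader:
        s_name = clean_sample_name(row["Bam_file"])
        results[s_name] = {
            "TIN(mean)": float(row["TIN(mean)"]),
            "TIN(median)": float(row["TIN(median)"]),
            "TIN(stdev)": float(row["TIN(stdev)"]),
        }
    return results


# Content of the summary file shown above.
example = (
    "Bam_file\tTIN(mean)\tTIN(median)\tTIN(stdev)\n"
    "P26-1-6h_Aligned.sortedByCoord.out.bam\t"
    "53.72327495737302\t53.34221052273402\t18.530355596890026\n"
)

tin = parse_tin_summary(example)
print(list(tin))  # sample names after cleaning
```

With the suffix list assumed here, the entry is stored under "P26-1-6h" instead of the full BAM file name, which is the behaviour asked for in the General Statistics table.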

An example file is present in my previous post in this thread (#737 (comment)).

image

Below is a screenshot of a directory containing, for one sample, the output of STAR as well as RSeQC and Picard. All relevant files are nicely recognized by MultiQC, and their names are properly 'cleaned' when used in the MultiQC report. Hence my (naive) thought above...

image

@ewels
Member Author

ewels commented Jul 6, 2021

@guidohooiveld - fixed in v1.12dev. If you install the dev version it should now work for you as you expect.

Sorry for the inconvenience!

Phil

@guidohooiveld

guidohooiveld commented Jul 6, 2021

Thanks Phil for the prompt action taken! It is indeed working now as expected.

image
