Why are novel transcripts with most reads assigned not reported? #127
Comments
Dear @SarahNadeau
In some cases there is not enough support for a transcript to be reported as reliable. E.g., if a novel transcript is supported by only a few reads, it will be discarded, and these reads will not be used.
This is indeed surprising. Could you share the non-grouped counts? Best |
Hi @andrewprzh, Thanks for the reply! The non-grouped counts also only include the one reported transcript. In case it helps, I've also included a copy of the log file. |
Just fyi I redacted some of the paths etc. in the log file, I hope that's not a problem. |
Dear @andrewprzh, I think I found the reason for these weird results. I played around with the source code, commenting out the lines that filter out similar and poorly supported novel transcripts and uncommenting the lines that log why they would be filtered out.
I can now see that there is an unreported transcript (number 9) that is discarded for being very similar to the transcript with the highest read support. Because reads shared with transcript 9 are not counted as unique, the unique read count for the transcript with the highest read support is quite low, and so its count values are also low compared to the transcript with the third-highest number of supporting reads.
Based on these observations, would it make sense to recalculate the unique reads assigned to a transcript after filtering out similar and poorly supported transcripts? That way the counts would be representative of the read support.
Attached are some supporting materials so you can see what I mean. I apologize that I don't have a small reproducible example, and that these transcript and read/count numbers differ from my original example, because I started using only a subset of the reads in order to iterate faster.
Command line:
Selected lines from isoquant.log:
|
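To illustrate the proposal above, here is a minimal sketch of how unique read counts change when they are recalculated after a near-duplicate transcript is filtered out. This is not IsoQuant's code: the function `unique_read_counts`, the read IDs, and the transcript names are hypothetical, and it assumes a read counts as "unique" when it is compatible with exactly one of the transcripts still under consideration.

```python
from collections import defaultdict


def unique_read_counts(read_to_transcripts, kept_transcripts):
    """Count reads compatible with exactly one of the kept transcripts."""
    counts = defaultdict(int)
    for transcripts in read_to_transcripts.values():
        # Ignore transcripts that were filtered out before counting.
        remaining = [t for t in transcripts if t in kept_transcripts]
        if len(remaining) == 1:
            counts[remaining[0]] += 1
    return dict(counts)


# Hypothetical toy data: r1-r3 match both transcript1 and its
# near-duplicate transcript9, so neither gets them as unique reads.
read_to_transcripts = {
    "r1": {"transcript1", "transcript9"},
    "r2": {"transcript1", "transcript9"},
    "r3": {"transcript1", "transcript9"},
    "r4": {"transcript1"},
    "r5": {"transcript3"},
    "r6": {"transcript3"},
}
all_transcripts = {"transcript1", "transcript3", "transcript9"}

# Counting before the similarity filter: transcript1 looks weakly supported.
before = unique_read_counts(read_to_transcripts, all_transcripts)
# Counting after transcript9 is discarded: its shared reads become unique.
after = unique_read_counts(read_to_transcripts, all_transcripts - {"transcript9"})

print("before filtering:", before)
print("after filtering: ", after)
```

In this toy case transcript1 has more compatible reads than transcript3 but fewer unique reads until the discarded transcript9 is excluded from the count, which mirrors the pattern described in the comment.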
Dear @SarahNadeau Wow, thanks a lot for sharing those insights! It's really cool that you went down to look at the code, especially taking into account that the transcript discovery part is not that well structured. I totally agree about recalculating the counts! Could you provide the exact lines you were modifying? I probably have a clue which ones you are talking about, but there are a few filtering steps, so I'd like to be sure :) Best |
Hi @andrewprzh, The changes are in the Here is the git diff:
|
Dear @SarahNadeau Thanks a lot! Will get my hands on it! Best |
Should be now fixed in IsoQuant 3.4. |
Hello,
I am a bit confused by my IsoQuant results. I analyzed three BAM files with a combined 109932 mappings (109931 primary, 1 supplementary). The log shows that all 109931 reads were classified as intergenic and have polyAs (this is what I expect, since we added polyAs to the alignment).
Questions
- I was surprised there are only 107234 reads in transcript_model_reads.tsv. Why would some reads not be in this output?
- I was also surprised that a pivot table of OUT.transcript_model_reads.tsv shows reads are assigned to many different transcripts, but only one is reported in OUT.transcript_model_grouped_counts.tsv. The reported transcript is not even the one with the most read support (see screenshots below).
Supporting files
isoquant_transcript_model_reads_counts.zip
Pivot table of the 107234 reads in transcript_model_reads.tsv:
OUT.transcript_model_grouped_counts.tsv:
Command line (run as part of a snakemake workflow):
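For readers who want to reproduce a per-transcript pivot table like the one mentioned above, the following is a minimal sketch. It assumes that the reads file has two tab-separated columns (read ID, then transcript ID) with an optional header line starting with `#`; this layout is an assumption and should be checked against the actual file. The function name `reads_per_transcript` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def reads_per_transcript(handle):
    """Count how many rows (reads) are assigned to each transcript ID."""
    counts = Counter()
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, header/comment lines, and malformed rows.
        if len(row) < 2 or row[0].startswith("#"):
            continue
        counts[row[1]] += 1
    return counts


# Hypothetical sample content standing in for OUT.transcript_model_reads.tsv.
example = io.StringIO(
    "#read_id\ttranscript_id\n"
    "read_1\ttranscript1\n"
    "read_2\ttranscript1\n"
    "read_3\ttranscript3\n"
    "read_4\ttranscript1\n"
    "read_5\ttranscript9\n"
)

counts = reads_per_transcript(example)

# Print transcripts ordered by the number of assigned reads.
for transcript_id, n_reads in counts.most_common():
    print(transcript_id, n_reads)
```

To run this on a real file, replace the `io.StringIO` object with `open(path, newline="")`, where `path` points to the reads file.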