TAMA Collapse assigns truncated reads to wrong transcript #104

Closed
cathycoutu opened this issue May 18, 2023 · 3 comments

Comments

@cathycoutu

Thanks for providing TAMA, I'm enjoying the flexibility, filtering options, and ability to track what's happening. This is my first time working with IsoSeq data. My goal is to improve the accuracy of our transcriptome by merging the IsoSeq-derived transcriptome with our existing short-read derived transcriptome.

I'm starting with FLNC reads from IsoSeq3.

When I collapse reads into transcripts with the no_cap option and partially 5' degraded reads are present, the shorter, partially degraded reads are assigned to transcript models supported by a single read whenever the variation lies 5' of the start of the degraded read.

python ~/bin/tama/tama_collapse.py -s tama_split_20_1.sam -f genome.fa -p tama_split_20_1 -i 99 -x no_cap -a 100 -z 100 -sj sj_priority -lde 1 -sjt 20

Here's an alignment to show you the problem. All the reads prefixed with "2" (at the top of the alignment) were assigned to model 2. All the reads prefixed with "3" were assigned to model 3. Model 2 is represented by the majority of the reads. In model 3, the first intron has not been spliced out. The shorter reads could have been assigned to either model with equal confidence.

[Image: problem alignment]

Model 3 is actually only supported by a single read (m64128_230204_024757/124062102/ccs).
Assigning the partially degraded reads to model 3 makes it very difficult to automatically remove the model. It is not removed by remove_single_read_models.py, because it appears to be supported by 17 reads. This problem occurs for many high read depth genes in my dataset (note that model 2 was supported by over 1000 reads).

Can you recommend a way to remove partially-degraded reads prior to collapsing, or a setting in TAMA_collapse which would assign ambiguous reads (which could map to more than one model) to the model with the most reads?
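To make the requested behaviour concrete, here is a minimal Python sketch of the "assign ambiguous reads to the best-supported model" idea. It is not TAMA's algorithm or any TAMA API. The function names (`compatible`, `assign_reads`), the model labels, and all coordinates are hypothetical and chosen only for illustration. Reads and introns are treated as half-open intervals on a single strand.

```python
from collections import Counter

def compatible(read, model_introns):
    """Return True if the read's introns match the model's introns within the read's span.

    read: (start, end, introns), where introns is a list of (start, end) tuples.
    model_introns: list of (start, end) tuples for the transcript model.
    """
    start, end, read_introns = read
    expected = []
    for s, e in sorted(model_introns):
        if e <= start or s >= end:
            continue  # intron lies entirely outside the read, so it is uninformative
        if s < start or e > end:
            return False  # read boundary falls inside a model intron
        expected.append((s, e))
    return sorted(read_introns) == expected

def assign_reads(reads, models):
    """Assign each read to a model, sending ambiguous reads to the best-supported model."""
    candidates = {
        rid: [m for m, introns in models.items() if compatible(read, introns)]
        for rid, read in reads.items()
    }
    # Support counts only reads that fit exactly one model.
    support = Counter(c[0] for c in candidates.values() if len(c) == 1)
    return {
        rid: max(c, key=lambda m: support[m]) if c else None
        for rid, c in candidates.items()
    }

# Hypothetical coordinates: model_3 retains the first intron of model_2.
models = {
    "model_2": [(200, 300), (500, 600)],
    "model_3": [(500, 600)],
}
reads = {
    "full_a": (100, 800, [(200, 300), (500, 600)]),
    "full_b": (100, 800, [(200, 300), (500, 600)]),
    "retained": (100, 800, [(500, 600)]),
    "trunc_1": (350, 800, [(500, 600)]),  # starts 3' of the first intron
}
assignment = assign_reads(reads, models)
print(assignment)
```

In this toy data the truncated read fits both models, because it starts downstream of the only feature that distinguishes them. The sketch therefore places it with the model that has more unambiguous support, which leaves the retained-intron model with a single supporting read.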

I've attached fasta files containing the reads in the alignment, in case they would be useful.

Thanks again,
Cathy
G4714.2 reads subset.txt
G4714.3 reads.txt

@GenomeRIK
Owner

Hi Cathy,

Using the capped mode for collapsing (-x capped rather than -x no_cap) would solve this issue.

Sorry for the late reply!

Thank you,
Richard

@cathycoutu
Author

cathycoutu commented Nov 2, 2023 via email

@GenomeRIK
Owner

Hi Cathy,

That's great that you were able to get some answers! And thank you so much for using TAMA! I'll try to be quicker with my responses next time but feel free to email me if it is urgent.

Thank you,
Richard
