TAMA Collapse assigns truncated reads to wrong transcript #104

cathycoutu · 2023-05-18T21:35:23Z

Thanks for providing TAMA. I'm enjoying the flexibility, the filtering options, and the ability to track what's happening. This is my first time working with IsoSeq data. My goal is to improve the accuracy of our transcriptome by merging the IsoSeq-derived transcriptome with our existing short-read-derived transcriptome.

I'm starting with FLNC reads from IsoSeq3.

When collapsing reads into transcripts with the no_cap option in the presence of partially 5'-degraded reads, the shorter, partially degraded reads are being assigned to transcript models supported by a single read whenever the difference between models lies 5' of the start of the degraded read.

python ~/bin/tama/tama_collapse.py -s tama_split_20_1.sam -f genome.fa -p tama_split_20_1 -i 99 -x no_cap -a 100 -z 100 -sj sj_priority -lde 1 -sjt 20

Here's an alignment to show you the problem. All the reads prefixed with "2" (at the top of the alignment) were assigned to model 2. All the reads prefixed with "3" were assigned to model 3. Model 2 is represented by the majority of the reads. In model 3, the first intron has not been spliced out. The shorter reads could have been assigned to either model with equal confidence.

Model 3 is actually only supported by a single read (m64128_230204_024757/124062102/ccs).
Assigning the partially degraded reads to model 3 makes it very difficult to automatically remove the model. It is not removed by remove_single_read_models.py, as it appears to be supported by 17 reads. This problem occurs for many high read depth genes in my dataset (note that model 2 was supported by over 1000 reads).

Can you recommend a way to remove partially-degraded reads prior to collapsing, or a setting in TAMA_collapse which would assign ambiguous reads (which could map to more than one model) to the model with the most reads?

I've attached fasta files containing the reads in the alignment, in case they would be useful.

Thanks again,
Cathy
G4714.2 reads subset.txt
G4714.3 reads.txt

GenomeRIK · 2023-11-02T01:00:24Z

Hi Cathy,

Using the capped mode for collapsing would solve this issue.
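
A minimal sketch of what that might look like, assuming the command from the original post with only the `-x` value switched from `no_cap` to `capped`. Every other flag is copied unchanged from that post; the `-a` and `-z` thresholds may need retuning for capped mode, and the input and output names are reused here purely for illustration:

```shell
# Same invocation as in the original post; only -x differs (no_cap -> capped).
# All other values are carried over as-is and are not tuned recommendations.
python ~/bin/tama/tama_collapse.py \
  -s tama_split_20_1.sam \
  -f genome.fa \
  -p tama_split_20_1 \
  -i 99 \
  -x capped \
  -a 100 \
  -z 100 \
  -sj sj_priority \
  -lde 1 \
  -sjt 20
```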

Sorry for the late reply!

Thank you,
Richard

cathycoutu · 2023-11-02T01:32:33Z

No worries, I asked in 2 different places and got the same answer months ago. In fact, I just presented using TAMA to all Agriculture and Agrifood Canada bioinformaticians today! Great software, thank you! Cathy

GenomeRIK · 2023-11-02T02:42:51Z

Hi Cathy,

That's great that you were able to get some answers! And thank you so much for using TAMA! I'll try to be quicker with my responses next time but feel free to email me if it is urgent.

Thank you,
Richard

GenomeRIK closed this as completed Nov 2, 2023
