Duplicates of transcript in "OUT.read_assignments.tsv" #168

Closed
P10911004-NPUST opened this issue Mar 14, 2024 · 6 comments
Labels
bug: Something isn't working
fixed in release: Issue resolved and the fix is released, waiting for approval
supposedly fixed: Issue supposed to be resolved, but not approved yet
weird results: Something looks odd in the resulting files

Comments

@P10911004-NPUST

Hi,

Some transcripts have two identical records in the output file ("OUT.read_assignments.tsv"). What do these duplicates indicate?

#read_id chr strand isoform_id gene_id assignment_type assignment_events exons additional_info
transcript/170970 Chr1 - AT1G01020.1 AT1G01020 unique fsm,tes_match_precise:-3 8571-8713 PolyA=False;
transcript/170619 Chr1 - AT1G01020.5 AT1G01020 inconsistent intron_retention:7836-7941 6833-7069 PolyA=False;
transcript/170619 Chr1 - AT1G01020.5 AT1G01020 inconsistent intron_retention:7836-7941 6833-7069 PolyA=False;
transcript/2476 Chr1 + AT1G01040.1 AT1G01040 inconsistent tss_match:22 28432-28707 PolyA=False;
transcript/2476 Chr1 + AT1G01040.1 AT1G01040 inconsistent tss_match:22 30902-31197 PolyA=False;
transcript/5023 Chr1 + AT1G01040.1 AT1G01040 unique_minor_difference ism_3 30410-30715 PolyA=False;
transcript/2503 Chr1 + AT1G01040.1 AT1G01040 inconsistent tes_match_precise:0 27099-27281 PolyA=False;

isoquant.log

Thanks for your great effort on this amazing project. This is really helpful. Thank you.

@andrewprzh
Collaborator

Hi @P10911004-NPUST

For transcript/2476 the output looks fine, since the two records have different exon coordinates. It seems two different alignments are reported for this read in the BAM file.
For transcript/170619 it looks odd, because both records are identical. Could you check how many alignments correspond to transcript/170619?

You can try running IsoQuant with the --no_secondary flag.
For transcript classification you can also try SQANTI3.

Best
Andrey
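The check suggested above (how many alignments a given read has) can be sketched in a few lines of Python. This is a minimal illustration, not part of IsoQuant: the function name `count_alignments` is hypothetical, and it assumes the BAM has first been converted to SAM text (for example with `samtools view -h`). Only the standard SAM FLAG bits are used to tell primary, secondary, and supplementary alignments apart.

```python
from collections import Counter, defaultdict

# Standard SAM FLAG bits (from the SAM specification).
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_alignments(sam_lines):
    """Count alignment records per read ID, grouped by alignment kind.

    `sam_lines` is any iterable of SAM text lines; header lines
    (starting with '@') and blank lines are skipped.
    """
    counts = defaultdict(Counter)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        read_id, flag = fields[0], int(fields[1])
        if flag & UNMAPPED:
            kind = "unmapped"
        elif flag & SECONDARY:
            kind = "secondary"
        elif flag & SUPPLEMENTARY:
            kind = "supplementary"
        else:
            kind = "primary"
        counts[read_id][kind] += 1
    return counts
```

Calling `count_alignments(open("alignments.sam"))` (the file name is a placeholder) and looking up `counts["transcript/170619"]` would show whether the read has a single alignment, in which case two identical output records are unexpected, or several.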

@andrewprzh added the "weird results" label on Mar 21, 2024
@P10911004-NPUST
Author

Hi @andrewprzh

I have rerun the whole pipeline with the --no_secondary flag, and the duplicates are still there. Is it safe to simply ignore them?
isoquant.log

Some of the transcripts are shown below (rows are sorted by read_id).
exp.txt

Thank you!

@andrewprzh
Collaborator

@P10911004-NPUST

Now it looks even more odd. For example, transcript/0 is an FSM (full splice match), but it is also reported as ambiguous, which is self-contradictory.
Could you maybe share your initial FASTA or BAM file? Is it possible some sequences have identical ids?

Best
Andrey
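One way to answer the question about identical sequence IDs is to scan the FASTA headers directly. The sketch below is only an illustration under simple assumptions: the function name `duplicated_fasta_ids` is hypothetical, and the sequence ID is taken to be the first whitespace-delimited token after `>`, which is the usual convention but may not match how every tool parses headers.

```python
from collections import Counter


def duplicated_fasta_ids(fasta_lines):
    """Return {sequence_id: occurrences} for IDs that appear more than once.

    The ID is the first whitespace-delimited token of each '>' header line.
    Headers with no ID at all are ignored.
    """
    ids = Counter(
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].strip()
    )
    return {seq_id: n for seq_id, n in ids.items() if n > 1}
```

An empty result means every header ID is unique, so repeated IDs in the input could be ruled out as the cause of the contradictory assignments.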

@andrewprzh added the "bug" and "supposedly fixed" labels on Apr 22, 2024
@andrewprzh
Collaborator

@P10911004-NPUST

I think I found and fixed the issue. I need some more time to double check. It will be out in the next release.

Best
Andrey

@andrewprzh
Collaborator

And by the way, it's safe to ignore duplicated records.

Best
Andrey
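For output produced by an affected version, the duplicated records can also be dropped in post-processing. The following is a rough sketch rather than an IsoQuant feature: the function name `drop_exact_duplicates` is hypothetical, and it removes only rows that are identical in every column. Rows that share a read_id but differ elsewhere (such as the two transcript/2476 records with different exon coordinates) are kept, since those reflect distinct alignments.

```python
def drop_exact_duplicates(lines):
    """Return the records of a read-assignment TSV without exact duplicates.

    Lines starting with '#' (the header) are always kept. A data row is
    dropped only if an identical row has already been seen; the first
    occurrence and the original order are preserved.
    """
    seen = set()
    kept = []
    for line in lines:
        record = line.rstrip("\n")
        if record.startswith("#"):
            kept.append(record)
            continue
        if record in seen:
            continue
        seen.add(record)
        kept.append(record)
    return kept
```

The returned list can be written back out with `"\n".join(...)`; comparing its length with the input gives the number of duplicated rows that were removed.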

@andrewprzh
Collaborator

This should be fixed in version 3.4.

@andrewprzh added the "fixed in release" label on May 9, 2024