
reference gene lost #140

Closed
zpliu1126 opened this issue Jan 7, 2024 · 4 comments
Labels
bug: Something isn't working
fixed in release: Issue resolved and the fix is released, waiting for approval
weird results: Something looks odd in the resulting files

Comments

@zpliu1126

Hi Andrey,

The README describes this output file as follows:

SAMPLE_ID.extended_annotation.gtf - GTF file with the entire reference annotation plus all discovered novel transcripts;

After a successful IsoQuant run, I checked SAMPLE_ID.extended_annotation.gtf and found that it contains fewer genes than the reference GFF3 annotation (38,958 vs 40,281). Are the 1,323 reference genes missing because the long reads did not detect them?

# Count reference genes (gene records that are not novel genes)
awk '$3=="gene"' A2.extended_annotation.gtf | grep -v novel_gene | wc -l
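
A count alone does not show which genes were dropped. The following is a minimal sketch, not part of IsoQuant, that lists the reference gene IDs absent from the extended annotation. It assumes that gene IDs are stored in the `ID=` attribute of the GFF3 and in `gene_id "..."` in the GTF, and that both files use the same IDs; the function names `gene_ids` and `missing_genes` are hypothetical.

```python
import re

# Assumed attribute conventions: GTF uses gene_id "X"; GFF3 uses ID=X.
GTF_GENE_ID = re.compile(r'gene_id "([^"]+)"')
GFF3_GENE_ID = re.compile(r'(?:^|;)ID=([^;]+)')


def gene_ids(lines, fmt):
    """Collect IDs of all 'gene' records from GTF ('gtf') or GFF3 ('gff3') lines."""
    pattern = GTF_GENE_ID if fmt == "gtf" else GFF3_GENE_ID
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep gene records only
        match = pattern.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def missing_genes(reference_lines, reference_fmt, extended_lines):
    """Return sorted reference gene IDs that are absent from the extended GTF."""
    reference = gene_ids(reference_lines, reference_fmt)
    extended = gene_ids(extended_lines, "gtf")
    return sorted(reference - extended)
```

Both arguments can be open file handles, for example the reference GFF3 and A2.extended_annotation.gtf. Novel genes need no special filtering here because the set difference only keeps IDs that occur in the reference.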

Best
zpliu

@andrewprzh
Collaborator

Dear @zpliu1126

This looks odd, but I feel like I might have a clue where the bug is...
I'm out of the office for a while, so I'll try to fix it as soon as I'm back.

Best
Andrey

@andrewprzh added the bug and weird results labels on Jan 11, 2024
@biochristmas

Hi,
I have encountered a similar problem. The reference annotation contains 100,919 genes and 107,233 mRNAs. The two input files contain 63,077 and 4,922 sequences, respectively. The command I used was:

isoquant.py --reference reference.fa --genedb reference.gtf --fastq mixture.polish.fasta genome.polish.fasta --data_type pacbio_ccs -o output

The resulting file, OUT.extended_annotation.gtf, contains 29,734 genes and 35,349 transcripts. As I understand it, this file should include the complete reference annotation plus any novel transcripts found. Why does it contain far fewer genes and transcripts than the reference GTF file? There were also many 'no exons' warnings during the run. Could these warnings be the cause of the missing genes and transcripts?
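
Counts like these can be reproduced for any GTF with a short script. The sketch below is not IsoQuant code: it tallies records per feature type and lists transcripts that have no exon records. Whether such transcripts are what trigger the 'no exons' warnings is an assumption that the thread does not confirm, and the function name `summarize_gtf` is hypothetical.

```python
import re
from collections import Counter

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def summarize_gtf(lines):
    """Return (records per feature type, sorted transcript IDs without exon records)."""
    counts = Counter()
    transcripts = set()
    with_exons = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        feature = fields[2]
        counts[feature] += 1
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue  # e.g. gene records carry no transcript_id
        if feature in ("transcript", "mRNA"):
            transcripts.add(match.group(1))
        elif feature == "exon":
            with_exons.add(match.group(1))
    return counts, sorted(transcripts - with_exons)
```

Running it on both the reference GTF and the extended annotation gives gene and transcript counts that can be compared directly, along with any reference transcripts that lack exon lines.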

@andrewprzh
Collaborator

Dear @biochristmas @zpliu1126

Yes, there is a flaw in the construction function. This problem will be fixed in the next release.

Best
Andrey

@andrewprzh added the fixed in release label on May 9, 2024
@andrewprzh
Collaborator

Version 3.4 has finally been released, and it fixes this issue.


3 participants