should extended_annotation.gtf be a superset of the input gtf? #175
Comments
Yes, this is a known flaw in the current version. It is now fixed, and the fix will be out in 3.4 (hopefully soon). Best
Should be fixed now in IsoQuant 3.4
I thought this was fixed, but I'm seeing some instances where the exon information for a gene was not copied over. I wonder if this is related to whether or not reads were assigned to the gene.
I noticed this initially in an unprocessed pseudogene (WASH7P) just because it happens to be very close to the beginning of
There should not be any additional filtering, so that sounds odd. What kind of information is missing? Is it the exon records? Thanks
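One way to answer that question is to tally the record types (gene, transcript, exon, ...) per gene in both files and list the genes where the output has fewer records of some type than the reference. The following is only a minimal sketch under stated assumptions: it is not part of IsoQuant, the function names `feature_counts` and `genes_with_fewer_records` are made up for illustration, and it assumes standard 9-column, tab-separated GTF with a quoted `gene_id` attribute.

```python
import re
from collections import Counter, defaultdict

# Matches gene_id "..." but not e.g. ref_gene_id "..." (hypothetical helper pattern).
GENE_ID_RE = re.compile(r'\bgene_id "([^"]+)"')


def feature_counts(gtf_lines):
    """Map gene_id -> Counter of feature types (column 3) seen for that gene."""
    counts = defaultdict(Counter)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GTF record
        match = GENE_ID_RE.search(fields[8])
        if match:
            counts[match.group(1)][fields[2]] += 1
    return counts


def genes_with_fewer_records(reference_lines, output_lines):
    """Return {gene_id: (reference_counts, output_counts)} for every gene
    where the output has fewer records of any feature type than the reference."""
    ref = feature_counts(reference_lines)
    out = feature_counts(output_lines)
    result = {}
    for gene, ref_counter in ref.items():
        out_counter = out.get(gene, Counter())
        if any(out_counter[ftype] < n for ftype, n in ref_counter.items()):
            result[gene] = (ref_counter, out_counter)
    return result
```

Both functions accept any iterable of lines, so open file handles for the reference GTF and for extended_annotation.gtf can be passed directly. A gene that is absent from the output altogether is reported with an empty counter.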
Ah! This is probably a false alarm: it looks like the transcript name was not copied over, but the exons themselves are present. I was looking for the gene name and didn't see the exons. For example the first exon in both files:
Yes, additional information such as gene names etc. is only copied for gene and transcript records.
The reason I noticed this is because I was looking at IGV, and it wasn't displaying the exons for WASH7P, only the gene body. I think this is really a bug in how IGV is parsing the GTF (it should be matching on transcript_id), but you will probably update sooner. 😂
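For output files in which exon records carry transcript_id but not the name attributes, a possible workaround is to copy gene_name and transcript_name from each transcript record onto its child records, matched on transcript_id. The sketch below is an illustration only: it is not how IsoQuant copies attributes, the helper names `get_attr` and `copy_names_to_children` are hypothetical, and whether this changes what a viewer displays is an assumption that the thread does not confirm.

```python
import re


def get_attr(attributes, key):
    """Return the quoted value of `key` in a GTF attribute column, or None."""
    match = re.search(r'\b' + re.escape(key) + r' "([^"]*)"', attributes)
    return match.group(1) if match else None


def copy_names_to_children(lines, keys=("gene_name", "transcript_name")):
    """Copy the given attributes from transcript records onto exon/CDS/etc.
    records that share the same transcript_id and do not already have them."""
    lines = list(lines)  # two passes are needed, so materialise the input

    # Pass 1: remember the name attributes of every transcript record.
    names = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9 and fields[2] == "transcript":
            tid = get_attr(fields[8], "transcript_id")
            if tid:
                names[tid] = {k: get_attr(fields[8], k) for k in keys}

    # Pass 2: append missing attributes to the child records.
    result = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9 and fields[2] not in ("gene", "transcript"):
            tid = get_attr(fields[8], "transcript_id")
            for key, value in names.get(tid, {}).items():
                if value is not None and get_attr(fields[8], key) is None:
                    base = fields[8].rstrip().rstrip(";")
                    fields[8] = f'{base}; {key} "{value}";'
            line = "\t".join(fields) + "\n"
        result.append(line)
    return result
```

Comment lines and gene or transcript records pass through unchanged, and attributes already present on a child record are never overwritten.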
Yeah, I thought transcript_id would be enough. Maybe converting to GFF3 and having Anyway, will fix exon information.
Exon attributes should now be copied from the reference in IsoQuant 3.6.1.
This is what I assumed should happen, but it doesn't appear to be the case: my reference GTF has ~61k genes (GRCh38, GENCODE v39), but the output extended_annotation.gtf does not include all the known genes and transcripts (by a large margin: 23k genes). Is there some filtering going on here?
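A quick way to quantify a gap like this is to collect the gene_id values from both files and take the set difference. This is a minimal sketch, assuming standard tab-separated GTF with a quoted `gene_id` attribute in the ninth column; the function names `gene_ids` and `missing_genes` and the file paths in the comments are hypothetical.

```python
import re

# Matches gene_id "..." but not e.g. ref_gene_id "...".
GENE_ID_RE = re.compile(r'\bgene_id "([^"]+)"')


def gene_ids(gtf_lines):
    """Collect the distinct gene_id values found in an iterable of GTF lines."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a full GTF record
        match = GENE_ID_RE.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def missing_genes(reference_lines, output_lines):
    """Gene IDs present in the reference but absent from the output."""
    return gene_ids(reference_lines) - gene_ids(output_lines)


# Example usage with hypothetical paths:
#   with open("reference.gtf") as ref, open("extended_annotation.gtf") as out:
#       print(len(missing_genes(ref, out)))
```

If the output is a true superset of the input, `missing_genes` returns an empty set. The same approach works for transcript_id by swapping the attribute name in the pattern.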