Extracting exons result is different from manual extraction #60
Comments
Hi, Short answer: Gene Detailed answer: A quick grep on the GTF file shows that gene
Using GenomicFeatures:
Extract all the distinct exons for gene
As you can see, the 8th and 9th exons share the same genomic location but Ensembl treats them as distinct exons. Hope this helps, sessionInfo()
Wow, that's a really impressive answer. Thanks so much for taking the time for this. This really helps :). Just to clarify: if I now took only the non-overlapping parts of these exons, this effect should no longer be there?
That's correct:
So 11 non-overlapping exonic parts for this gene. The 8th part ( H. |
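To illustrate what "non-overlapping exonic parts" means, here is a minimal Python sketch. It is not the GenomicFeatures implementation; the `disjoin` helper and the coordinates are made up for illustration. It cuts a set of closed, 1-based intervals at every boundary and keeps the pieces covered by at least one exon, so two exons with identical coordinates can no longer produce duplicate parts.

```python
def disjoin(intervals):
    """Split closed integer intervals (start, end) into non-overlapping parts.

    Hypothetical helper for illustration only.
    """
    # Every start opens a piece; every end + 1 opens the piece after it.
    breaks = sorted({s for s, _ in intervals} | {e + 1 for _, e in intervals})
    parts = []
    for left, nxt in zip(breaks, breaks[1:]):
        right = nxt - 1
        # Keep the piece only if some original interval fully covers it.
        if any(s <= left and right <= e for s, e in intervals):
            parts.append((left, right))
    return parts


# Toy exons (invented coordinates): two overlap, two are exact duplicates.
exons = [(100, 199), (150, 249), (150, 249), (400, 450)]
parts = disjoin(exons)
print(parts)                             # [(100, 149), (150, 199), (200, 249), (400, 450)]
print([e - s + 1 for s, e in parts])     # [50, 50, 50, 51]
```

Because the parts are disjoint by construction, their count and widths do not depend on how many annotated exons share a location.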
Ok, I managed to write the code so that it produces the exact same results (because I wanted to check what your code is doing :)). However, I am trying to figure out which GENCODE annotations would be better for this purpose. I compared
Yeah so this kind of question is really best asked on the Bioconductor support site. I suppose people will need a little bit more context, e.g. what are you trying to achieve exactly, in order to give advice about which file is best. Thanks!
If I try to extract the length of all exons (also those overlapping) using your package with this code and this gencode file
the result is
[1] 75 84 99 108 135 189 189 199 312 2878 2879
(11 exons). If I do it manually (after first cleaning the file using
gzcat gencode.v43.basic.annotation.gff3.gz | grep -v '#' > gencode.v43.basic.annotation.clean.gff
The result is
[1] 75 84 99 108 135 189 199 312 2878 2879
(10 exons). Without the unique command I retain 15 exons. So both are different from yours. I can't figure out where you got that additional exon from. The unique command is necessary if two exons from HAVANA and ENSEMBL are exactly identical. I am not able to reproduce the results from your package manually. This is just an example, but I get slightly different numbers for most genes.
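The three counts above (all exon rows, rows made unique by coordinates, and distinct exons) can be compared with a small Python sketch. Everything here is an assumption for illustration: the toy GFF3 lines, the gene/exon IDs, the `exon_rows` and `widths` helpers, and the reliance on `gene_id` and `exon_id` attributes in column 9. Widths are computed as `end - start + 1` because GFF coordinates are 1-based and inclusive.

```python
import io

# Toy GFF3 excerpt: gene IDs, exon IDs and coordinates are invented.
TOY_GFF3 = (
    "##gff-version 3\n"
    "chr1\tHAVANA\texon\t100\t199\t.\t+\t.\tgene_id=G1;transcript_id=T1;exon_id=E1\n"
    "chr1\tHAVANA\texon\t100\t199\t.\t+\t.\tgene_id=G1;transcript_id=T2;exon_id=E1\n"
    "chr1\tHAVANA\texon\t300\t349\t.\t+\t.\tgene_id=G1;transcript_id=T1;exon_id=E2\n"
    "chr1\tENSEMBL\texon\t300\t349\t.\t+\t.\tgene_id=G1;transcript_id=T3;exon_id=E3\n"
    "chr1\tHAVANA\texon\t500\t574\t.\t+\t.\tgene_id=G1;transcript_id=T2;exon_id=E4\n"
    "chr1\tHAVANA\texon\t900\t950\t.\t+\t.\tgene_id=G2;transcript_id=T9;exon_id=E9\n"
)


def exon_rows(handle, gene_id):
    """Return (exon_id, chrom, start, end, strand) for each exon row of one gene."""
    rows = []
    for line in handle:
        if line.startswith("#"):  # skip header/comment lines, no pre-cleaning needed
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        if attrs.get("gene_id") == gene_id:
            rows.append((attrs.get("exon_id"), cols[0],
                         int(cols[3]), int(cols[4]), cols[6]))
    return rows


def widths(ranges):
    """Sorted widths of (chrom, start, end, strand) tuples."""
    return sorted(end - start + 1 for _, start, end, _ in ranges)


rows = exon_rows(io.StringIO(TOY_GFF3), "G1")
by_coords = {row[1:] for row in rows}        # unique genomic locations
by_id = {row[0]: row[1:] for row in rows}    # unique exon IDs

print(len(rows), len(by_id), len(by_coords))  # 5 4 3
print(widths(by_id.values()))                 # [50, 50, 75, 100]
print(widths(by_coords))                      # [50, 75, 100]
```

In this toy data, deduplicating by coordinates merges two exons that carry different IDs, which is the same kind of one-exon difference as the 11 versus 10 reported above.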