Entrez and Ensembl mappings #39

Closed
stuppie opened this issue Jul 3, 2018 · 6 comments

stuppie commented Jul 3, 2018

Looking at Entrez gene 10168, it has Ensembl mappings to both ENSG00000186448 and ENSG00000281709. However, looking at the Entrez and Ensembl records for these genes:
https://www.ncbi.nlm.nih.gov/gene?cmd=retrieve&dopt=default&list_uids=10168
https://uswest.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000186448
https://uswest.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000281709

I only see a single one-to-one mapping, between 10168 and ENSG00000186448. Is this right?

See: SuLab/scheduled-bots#19

sirloon commented Aug 14, 2018

According to Ensembl (as of release 93), the file gene_ensembl__xref_entrezgene__dm.txt contains both associations:

(venv) mygene@su05:/opt/genedoc-hub/ensembl/93$ grep ENSG00000281709 gene_ensembl__xref_entrezgene__dm.txt
9606    ENSG00000281709 110354863
9606    ENSG00000281709 10168
(venv) mygene@su05:/opt/genedoc-hub/ensembl/93$ grep ENSG00000186448 gene_ensembl__xref_entrezgene__dm.txt
9606    ENSG00000186448 110354863
9606    ENSG00000186448 10168
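To spot this kind of many-to-many link across the whole file rather than one grep at a time, a small awk pass can group Ensembl genes by Entrez ID. This is a minimal sketch, assuming the three whitespace-separated columns seen in the output above (taxid, Ensembl gene ID, Entrez ID); the function name `multi_ensembl_per_entrez` is hypothetical and not part of the mygene.info code.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of mygene.info): report Entrez IDs that are
# linked to more than one Ensembl gene in a whitespace-separated xref dump with
# the columns <taxid> <Ensembl gene ID> <Entrez ID>, like the Ensembl 93 file
# gene_ensembl__xref_entrezgene__dm.txt grepped above.
multi_ensembl_per_entrez() {
  awk '
    # Count each distinct (Entrez, Ensembl) pair once, skipping short lines.
    NF >= 3 && !seen[$3, $2]++ {
      count[$3]++
      genes[$3] = (count[$3] == 1) ? $2 : genes[$3] "," $2
    }
    END {
      # Print only the Entrez IDs with two or more Ensembl genes.
      for (id in count)
        if (count[id] > 1) print id "\t" genes[id]
    }
  ' "$1" | sort
}
```

Run as `multi_ensembl_per_entrez gene_ensembl__xref_entrezgene__dm.txt`. On the four lines above, it would list both 10168 and 110354863, each paired with ENSG00000281709 and ENSG00000186448.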

sirloon closed this as completed Aug 14, 2018
sirloon reopened this Aug 14, 2018

sirloon commented Aug 14, 2018

After regenerating the extra mapping file, we still have both:

(venv) mygene@su05:~/mygene.info/src$ grep ENSG00000186448 /opt/genedoc-hub/ensembl/93/gene_ensembl__gene__extra.txt
ENSG00000186448 10168
(venv) mygene@su05:~/mygene.info/src$ grep ENSG00000281709 /opt/genedoc-hub/ensembl/93/gene_ensembl__gene__extra.txt
ENSG00000281709 10168

stuppie commented Aug 14, 2018

I guess my question is "where do the mappings come from?", because I don't see this one on Entrez's site. So the answer is that the Entrez-to-Ensembl cross-references come from Ensembl, not from Entrez... Right?

sirloon commented Aug 14, 2018

Cross-references come from both directions: Ensembl -> Entrez and Entrez -> Ensembl. I'm still investigating this 10168 case; I'll get back to you soon.
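Because pairs can come from either side, a pair reported by only one source can still end up in the merged mapping. The sketch below illustrates that idea only; it is not how mygene.info actually builds its mapping. It assumes two hypothetical input files, each with whitespace-separated `<Ensembl gene ID> <Entrez ID>` columns, and tags every pair with the source(s) that report it.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the mygene.info build code: merge cross-references
# from both directions and tag each (Ensembl, Entrez) pair with its source(s).
# Both inputs are assumed to be whitespace-separated with the columns
# <Ensembl gene ID> <Entrez ID>; the first file holds Ensembl -> Entrez pairs,
# the second holds Entrez -> Ensembl pairs.
merge_xrefs() {
  awk '
    NF >= 2 {
      src = (FILENAME == ARGV[1]) ? "ensembl" : "entrez"
      pair = $1 "\t" $2
      if (!(pair in tag)) {
        tag[pair] = src                 # first time this pair is seen
        order[++n] = pair               # remember input order for stable output
      } else if (index(tag[pair], src) == 0) {
        tag[pair] = tag[pair] "+" src   # reported by both sources
      }
    }
    END {
      for (i = 1; i <= n; i++) print order[i] "\t" tag[order[i]]
    }
  ' "$1" "$2"
}
```

If the Ensembl-side file lists 10168 for both genes while the Entrez-side file lists only ENSG00000186448, the ENSG00000281709 pair comes out tagged `ensembl` alone. That is the kind of one-sided link discussed in this issue.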

sirloon commented Aug 15, 2018

I'll write a documentation page with details about how we build that mapping.

For this particular issue, the problem ultimately comes from BioMart (we query Ensembl BioMart to get most Ensembl data), which reports an association between ENSG00000281709 and 10168, whereas the Ensembl website doesn't, consistent with what you found. We'll contact Ensembl about this issue.

http://uswest.ensembl.org/biomart/martview/f433127fefb5a20fea7d6602ebfb2862?VIRTUALSCHEMANAME=default&ATTRIBUTES=hsapiens_gene_ensembl.default.feature_page.ensembl_gene_id|hsapiens_gene_ensembl.default.feature_page.entrezgene&FILTERS=hsapiens_gene_ensembl.default.filters.ensembl_gene_id."ENSG00000281709,ENSG00000186448"&VISIBLEPANEL=resultspanel
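The martview link above can also be expressed as a BioMart XML query sent to the martservice endpoint, which is easier to rerun from a script. This is a rough sketch, assuming the dataset, filter, and attribute names taken from that URL (`hsapiens_gene_ensembl`, `ensembl_gene_id`, `entrezgene`, as of Ensembl 93; other releases may name the Entrez attribute differently) and assuming `http://www.ensembl.org` as the default host. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build a BioMart XML query equivalent to the martview URL above.
# Dataset/filter/attribute names follow that URL (Ensembl 93) and are an
# assumption for any other release.
biomart_xml() {
  local ids="$1"   # comma-separated Ensembl gene IDs
  cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="1" datasetConfigVersion="0.6">
  <Dataset name="hsapiens_gene_ensembl" interface="default">
    <Filter name="ensembl_gene_id" value="${ids}"/>
    <Attribute name="ensembl_gene_id"/>
    <Attribute name="entrezgene"/>
  </Dataset>
</Query>
EOF
}

# Send the query to a martservice endpoint (needs curl and network access).
# The default host is an assumption; pass an archive host to pin a release.
biomart_fetch() {
  local host="${2:-http://www.ensembl.org}"
  curl -s --get --data-urlencode "query=$(biomart_xml "$1")" "${host}/biomart/martservice"
}
```

For example, `biomart_fetch ENSG00000281709,ENSG00000186448` should print one tab-separated Ensembl/Entrez pair per line, depending on the release served by the host.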

sirloon closed this as completed Aug 15, 2018

newgene commented Aug 16, 2018

FYI, I asked the Ensembl helpdesk about this. Basically, they need to fix the Ensembl gene page so that it is consistent with the underlying data (i.e., what we retrieved from BioMart).
Reply from the Ensembl helpdesk: ensembl helpdesk.txt
