missing OMIM info #1833

Closed
4WGH opened this issue Mar 25, 2020 · 11 comments
@4WGH

4WGH commented Mar 25, 2020

https://scout.scilifelab.se/cust003/15041/sv/variants/a74106f85d532d2a7aa97225c1fdbab1

No OMIM info is shown in Scout for this variant:
[image]

The gene is in OMIM:
https://www.omim.org/entry/606810

@dnil dnil added the bug label Mar 25, 2020
@dnil
Collaborator

dnil commented Mar 25, 2020

Thank you! I can confirm it is missing on this variant and a couple of others in the same case. Several others are ok, however, both in this case and in other recent cases. Note that this is a fairly old case with a recent rerun, so re-upload issues may be relevant.

@northwestwitch northwestwitch self-assigned this Mar 30, 2020
@northwestwitch
Member

I'm trying to figure out at which stage, and why, the OMIM info gets lost for some genes.

For instance, we know that correct OMIM data gets parsed for genes such as CDC45 (https://scout.scilifelab.se/genes/1739) but not for PRODH (https://scout.scilifelab.se/genes/9453)

The gene parsing process looks like this:
scout.load_hgnc_genes calls the link_genes function:

genes = link_genes(

That function is responsible for annotating the OMIM phenotypes, specifically here:

add_omim_info(genes, symbol_to_id, genemap_lines, mim2gene_lines)

I've been printing debug messages, and it looks like the gene contains a "phenotype" key with the correct values inside this latter function. But once control returns to scout.load_hgnc_genes and it loops through the progress bar, the info is somehow lost and never reaches build_hgnc_gene (line 154):

gene_obj = build_hgnc_gene(gene_data, build=build)

I've spent quite some time on this, and now OMIM has locked me out for making too many requests. I could use some help here!

@northwestwitch
Member

@moonso when you have some time, could you take a look at this? This weird behavior of the PRODH gene is still a mystery to me.

@dnil
Collaborator

dnil commented Apr 2, 2020

Meh, weird! Mystery is the right word! So far:

  • all looks ok in the sv vcf; PRODH is annotated for the variant PRODH|ENSG00000100033|Transcript|ENST00000357068
  • db entry for the variant is ok as far as I can tell; gene is there
  • db entry for the disorder is ok - gene is there
  • behaviour in stage is the same as prod

There might be something weird with the OMIM entries, esp some confusion with PRODH2; let me check that.

One worrying detail is that the gene has two phenotype entries in OMIM: one is the one we are looking for, and the other is an association, a `{pheno}`, which was not correctly parsed in some waaaay old version. The latter is correctly ditched and not loaded to the db, but I can't shake the feeling that it may also be connected.

@dnil
Collaborator

dnil commented Apr 2, 2020

No, can't reproduce the PRODH2/PRODH OMIM number thing. Leaving that.

@dnil
Collaborator

dnil commented Apr 2, 2020

Just for the record, it is indeed a thing:
Screenshot 2020-04-02 at 11 19 24
Screenshot 2020-04-02 at 11 18 58
Screenshot 2020-04-02 at 11 16 11
We need to sharpen the routines that deal with mapping disorder to hgnc_id to gene_symbol when old aliases point to new genes.

@northwestwitch
Member

When the dictionary of aliases is created, it contains these items:

'HSPOX2': {'true': None, 'ids': {9453}},
'PIG6': {'true': None, 'ids': {14524, 9453}},
'PRODH1': {'true': None, 'ids': {9453}},
'PRODH': {'true': 9453, 'ids': {9453}},
'PRODH2': {'true': None, 'ids': {9453, 17325}, 'true_id': 17325},
'TP53I6': {'true': None, 'ids': {9453}},
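A quick way to spot the odd entries in a dump like this is to scan it for symbols that map to several HGNC ids without a resolved 'true' id, and for entries carrying unexpected keys. The sketch below is only an illustration on the six items quoted above; the helper names `ambiguous_aliases` and `stray_keys` are hypothetical and not part of Scout.

```python
# The six alias entries quoted in the comment above, copied verbatim.
alias_genes = {
    "HSPOX2": {"true": None, "ids": {9453}},
    "PIG6": {"true": None, "ids": {14524, 9453}},
    "PRODH1": {"true": None, "ids": {9453}},
    "PRODH": {"true": 9453, "ids": {9453}},
    "PRODH2": {"true": None, "ids": {9453, 17325}, "true_id": 17325},
    "TP53I6": {"true": None, "ids": {9453}},
}


def ambiguous_aliases(aliases):
    """Symbols that point at more than one HGNC id and have no 'true' id set."""
    return sorted(
        symbol
        for symbol, info in aliases.items()
        if len(info["ids"]) > 1 and info.get("true") is None
    )


def stray_keys(aliases, expected=frozenset({"true", "ids"})):
    """Entries that carry keys other than the expected 'true' and 'ids'."""
    return {
        symbol: sorted(set(info) - expected)
        for symbol, info in aliases.items()
        if set(info) - expected
    }


print(ambiguous_aliases(alias_genes))
print(stray_keys(alias_genes))
```

On this data, PRODH2 is the only entry with an extra key: its primary HGNC id is stored under 'true_id' while 'true' stays None.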

@dnil
Collaborator

dnil commented Apr 2, 2020

Shouldn't

alias_genes[alias.upper()]["true_id"] = hgnc_id

assign 'true' rather than 'true_id' then?
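To make the suspected key mismatch concrete, here is a minimal sketch of an alias-dictionary builder. It is not Scout's actual code: the function `build_alias_dict`, its `true_key` parameter, and the two simplified gene records are assumptions for illustration. It only reproduces the pattern where a symbol is first seen as an alias of one gene and later as the primary symbol of another.

```python
def build_alias_dict(genes, true_key="true"):
    """Map each upper-cased alias to the HGNC ids that use it.

    `true_key` is the key written when an already-seen alias turns out to be
    the primary symbol of the current gene. Passing "true_id" mimics the
    behaviour questioned above (hypothetical parameter, illustration only).
    """
    alias_genes = {}
    for gene in genes:
        hgnc_id = gene["hgnc_id"]
        primary = gene["hgnc_symbol"].upper()
        for alias in gene["aliases"]:
            key = alias.upper()
            is_primary = key == primary
            if key in alias_genes:
                alias_genes[key]["ids"].add(hgnc_id)
                if is_primary:
                    alias_genes[key][true_key] = hgnc_id
            else:
                alias_genes[key] = {
                    "true": hgnc_id if is_primary else None,
                    "ids": {hgnc_id},
                }
    return alias_genes


# Simplified, assumed records: PRODH lists PRODH2 as an old alias,
# and PRODH2 is also the primary symbol of a different gene.
genes = [
    {"hgnc_id": 9453, "hgnc_symbol": "PRODH", "aliases": ["PRODH", "PRODH2"]},
    {"hgnc_id": 17325, "hgnc_symbol": "PRODH2", "aliases": ["PRODH2"]},
]

buggy = build_alias_dict(genes, true_key="true_id")
fixed = build_alias_dict(genes, true_key="true")

print(buggy["PRODH2"])
print(fixed["PRODH2"])
```

With `true_key="true_id"` the PRODH2 entry keeps 'true' as None, so any consumer that reads 'true' cannot tell which gene really owns the symbol. With `true_key="true"` the entry resolves to 17325.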

@dnil
Collaborator

dnil commented Apr 2, 2020

(testing that)

@dnil
Collaborator

dnil commented Apr 2, 2020

😅
Screenshot 2020-04-02 at 13 40 35
Ahhh, the relief!

@dnil
Collaborator

dnil commented Apr 2, 2020

PR upcoming..
