
Accessing gene name consistently via PIR ID #126

Open
nataled opened this issue Mar 6, 2017 · 7 comments

@nataled

commented Mar 6, 2017

Hello,

We would like to be able to access the pombe gene name consistently given the PR:ID.

We have noticed that the name is usually included, but not in a specific field. Is there any way to make this possible?

Thanks

Val

Reported by: ValWood

@nataled


commented Mar 6, 2017

Hi Val,

Can you provide a few examples of PRO terms that have the gene name somewhere? I can easily find them myself because I know which terms should have them, but it is possible they are also found in terms that won't necessarily have them. Well, put more pragmatically, do you see them in an inconsistent way in terms NOT marked with Category=gene or Category=organism-gene? If not then I have an idea of what we can do to mark them. I'm guessing you see the genes indicated in the synonym field, is that correct?

Best regards,
Darren

Original comment by: nataled

@nataled


commented Apr 3, 2017

Hi Darren,

Apologies for the very delayed response. This fell off my radar, but Midori has looked out some examples to illustrate our problem.

Best,

Val


What I noticed is that terms marked Category=gene or Category=organism-gene have the gene name somewhere, but not always in exactly the same tag.

For Category=gene, sometimes the pombe gene name is present as the EXACT PRO-short-label synonym, but for others a gene name from a different species (often S. cerevisiae) is the EXACT synonym and the pombe name is a RELATED synonym. Maybe that's not such a big deal if we should be concentrating on Category=organism-gene terms, but then we may have to review to see whether we have more to request.

examples:
[Term]
id: PR:000027499
name: mitogen-activated protein kinase HOG1
def: "A p38-like stress-activated mitogen-activated protein kinase that is a translation product of the yeast HOG1 gene or a 1:1 ortholog thereof." [PMID:10207620, PRO:CNA]
comment: Category=gene. The gene HOG1 in S. pombe is named sty1.
synonym: "HOG1" EXACT PRO-short-label [PRO:DNx]
synonym: "MAP kinase spc1" EXACT []
synonym: "STY1" RELATED []
is_a: PR:000000001 ! protein

[Term]
id: PR:000027605
name: mediator of replication checkpoint protein 1
def: "A protein that is a translation product of the Schizosaccharomyces pombe 972h- mrc1 gene or a 1:1 ortholog thereof." [PRO:CNA]
comment: Category=gene. Requested by=PomBase.
synonym: "DNA replication checkpoint mediator mrc1" EXACT []
synonym: "mrc1" EXACT PRO-short-label [PRO:DNx]
is_a: PR:000000001 ! protein
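
To make the "not always in exactly the same tag" point concrete, here is a minimal Python sketch that pulls the synonyms out of a stanza together with their scope (EXACT, RELATED) and optional type (such as PRO-short-label). The function names and the regular expression are illustrative assumptions, not part of any PRO or PomBase tooling; the stanza is abridged from the HOG1 example above.

```python
import re
from typing import List, Optional

# Matches OBO synonym lines such as:
#   synonym: "HOG1" EXACT PRO-short-label [PRO:DNx]
#   synonym: "STY1" RELATED []
# (hypothetical helper pattern; covers only the line shapes shown in this thread)
SYNONYM_RE = re.compile(
    r'^synonym:\s+"(?P<text>[^"]*)"\s+(?P<scope>[A-Z]+)'
    r'(?:\s+(?P<type>[^\s\[]+))?\s*\['
)

def parse_synonyms(stanza: str) -> List[dict]:
    """Return one dict per synonym line: text, scope, and type (None if absent)."""
    synonyms = []
    for line in stanza.splitlines():
        match = SYNONYM_RE.match(line.strip())
        if match:
            synonyms.append(match.groupdict())
    return synonyms

def short_label(stanza: str) -> Optional[str]:
    """Return the text of the PRO-short-label synonym, if the stanza has one."""
    for syn in parse_synonyms(stanza):
        if syn["type"] == "PRO-short-label":
            return syn["text"]
    return None

HOG1_STANZA = '''[Term]
id: PR:000027499
name: mitogen-activated protein kinase HOG1
comment: Category=gene. The gene HOG1 in S. pombe is named sty1.
synonym: "HOG1" EXACT PRO-short-label [PRO:DNx]
synonym: "MAP kinase spc1" EXACT []
synonym: "STY1" RELATED []
is_a: PR:000000001 ! protein
'''

if __name__ == "__main__":
    print(short_label(HOG1_STANZA))
    for syn in parse_synonyms(HOG1_STANZA):
        print(syn["scope"], syn["type"], syn["text"])
```

For this stanza the short label is the S. cerevisiae name, and the pombe name only shows up as an untyped RELATED synonym, which is exactly the inconsistency described above.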

For Category=organism-gene, most of the pombe terms have an EXACT PRO-short-label synonym consisting of the pombe gene name with the prefix "Spom-". I haven't yet spotted any pombe Category=organism-gene terms that don't have a PRO-short-label synonym, but I have found a few that don't use the standard pombe gene name. That would throw us off.

example - OK:
[Term]
id: PR:000027596
name: histone H3.3 (Schizosaccharomyces pombe)
def: "A fungal histone H3.3 that is encoded in the genome of Schizosaccharomyces pombe." [PMID:11242054, PMID:20929775, PomBase:MAH]
comment: Category=organism-gene. Requested by=PomBase.
synonym: "Spom-hht3" EXACT PRO-short-label [PRO:DNx]
is_a: PR:000027595 ! fungal histone H3.3
is_a: PR:000041293 ! core histone (Schizosaccharomyces pombe)

2 examples - standard gene name in PRO term name but not the short-label synonym:
[Term]
id: PR:000029999
name: DNA repair protein Crb2 (Schizosaccharomyces pombe)
def: "A tumor suppressor p53-binding protein 1 that is encoded in the genome of Schizosaccharomyces pombe." [PRO:DAN]
comment: Category=organism-gene.
synonym: "DNA repair protein rhp9 (Schizosaccharomyces pombe)" EXACT [PRO:DNx]
synonym: "Spom-TP53BP1" EXACT PRO-short-label [PRO:DNx]
is_a: PR:000000001 ! protein

[Term]
id: PR:000030002
name: serine/threonine-protein kinase cds1 (Schizosaccharomyces pombe)
def: "A serine/threonine-protein kinase Chk2 that is encoded in the genome of Schizosaccharomyces pombe." [PRO:DAN]
comment: Category=organism-gene.
synonym: "Spom-CHEK2" EXACT PRO-short-label [PRO:DAN]
is_a: PR:000000001 ! protein

example - the term name, definition, and PRO-short-label use the S.c. name, and the correct pombe name is only in another synonym:
[Term]
id: PR:O14216
name: DNA replication regulator sld2 (Schizosaccharomyces pombe 972h-)
alt_id: PR:000027524
def: "A DNA replication regulator sld2 that is encoded in the genome of Schizosaccharomyces pombe 972h-." [PMID:11937031, PomBase:MAH]
comment: Category=organism-gene. Requested by=PomBase.
synonym: "DNA replication regulator drc1" EXACT []
synonym: "SPAC6B12.11" RELATED []
synonym: "Spom972h-SLD2" EXACT PRO-short-label [PRO:DNx]
xref: UniProtKB:O14216
is_a: PR:000027523 ! DNA replication regulator sld2
is_a: PR:000029043 ! Schizosaccharomyces pombe 972h- protein

We also use quite a few PRO terms that are marked Category=organism-modification, and it would be really nice to be able to retrieve gene names for those as well. I think we could parse away the "Spom-" prefix and the "/[modification]" suffixes easily enough if that Spom-[gene-name]/[modification] syntax is consistent, and the PRO-short-label synonyms use standard gene names. But at the moment not all do.

OK:
[Term]
id: PR:000027516
name: transcriptional regulator prz1 unmodified form (Schizosaccharomyces pombe)
def: "A transcriptional regulator prz1 unmodified form in Schizosaccharomyces pombe." [PMID:12637524, PomBase:MAH]
comment: Category=organism-modification. Requested by=PomBase.
synonym: "Spom-prz1/UnMod" EXACT PRO-short-label [PRO:DAN]
is_a: PR:000000001 ! protein

correct name only in a synonym other than the PRO-short-label (related to example above):
[Term]
id: PR:000027526
name: DNA replication regulator sld2 unmodified form (Schizosaccharomyces pombe)
def: "A DNA replication regulator sld2 unmodified form in Schizosaccharomyces pombe." [PMID:11937031, PomBase:MAH]
comment: Category=organism-modification. Requested by=PomBase.
synonym: "DNA replication regulator drc1 unmodified form (Schizosaccharomyces pombe)" EXACT []
synonym: "Spom-SLD2/UnMod" EXACT PRO-short-label [PRO:DAN]
is_a: PR:000000001 ! protein
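
The prefix/suffix stripping mentioned above could look roughly like the following Python sketch. It assumes the label shape "Spom-[gene-name]" with an optional "/[modification]" suffix, and also accepts the strain-qualified "Spom972h-" prefix seen in one example; the function name and pattern are hypothetical.

```python
import re
from typing import Optional, Tuple

# ASSUMPTION: short labels look like "Spom-<gene>" or "Spom-<gene>/<modification>".
# The "Spom972h-" variant from the PR:O14216 example is accepted as well.
SHORT_LABEL_RE = re.compile(r'^Spom(?:972h)?-(?P<gene>[^/]+)(?:/(?P<mod>.+))?$')

def split_short_label(label: str) -> Optional[Tuple[str, Optional[str]]]:
    """Split a pombe PRO-short-label into (gene part, modification part or None).

    Returns None when the label does not carry a pombe prefix at all.
    """
    match = SHORT_LABEL_RE.match(label)
    if match is None:
        return None
    return match.group("gene"), match.group("mod")

if __name__ == "__main__":
    for label in ("Spom-prz1/UnMod", "Spom-hht3", "Spom-SLD2/UnMod", "HOG1"):
        print(label, "->", split_short_label(label))
```

Note that the stripping is purely syntactic: "Spom-SLD2/UnMod" yields "SLD2", not the standard pombe name drc1, so this only helps once the short labels themselves use pombe gene names.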


Original comment by: ValWood

@nataled


commented Apr 4, 2017

Hi Val, Midori,
The PRO-short-label is designed to give some indication of orthology, so whenever a pombe term is orthologous to a previously-existing term in PRO, the label reflects that (with, as you noted, Spom- prepended to it). The exception is when a term is defined based only on the encoding gene, in which case the actual gene name in that organism is used. Thus, it is not useful for your purpose. As things stand right now, you should be able to reliably grab organism-gene terms (the ones roughly equivalent to UniProtKB entries but made specifically to the species level). All of these have a line in the stanza with has_gene_template and the official PomBase identifier and gene name. I'm not sure why, but the stanza examples you provided above seem to lack this line. I verified that our downloads do have it. Can you tell me where you get your downloads from so I can track down the issue?

The only way to get the desired gene name from an organism-modification term at the moment would be to use our SPARQL interface. I think it should be possible to ask the question "what is the name of the gene given in the organism-gene ancestor of term X?" Are you familiar with the interface?
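
For anyone working from a local OBO download rather than the SPARQL interface, the same question ("what gene is given in the organism-gene ancestor of term X?") can be approximated by walking is_a links until a stanza with has_gene_template is found. The Python sketch below is only an illustration: the exact layout of the has_gene_template line is an assumption, and the two stanzas, their identifiers, and the gene "abc1" are made up for the example.

```python
import re
from typing import Dict, Optional, Tuple

# ASSUMPTION: the gene line has the form
#   relationship: has_gene_template PomBase:<systematic ID> ! <gene name>
GENE_RE = re.compile(
    r'^relationship:\s+has_gene_template\s+(?P<gene_id>\S+)\s*!\s*(?P<gene_name>.+)$'
)
IS_A_RE = re.compile(r'^is_a:\s+(?P<parent>\S+)')
ID_RE = re.compile(r'^id:\s+(?P<id>\S+)')

def parse_terms(obo_text: str) -> Dict[str, dict]:
    """Index stanzas by id, keeping only is_a parents and the gene line (if any)."""
    terms = {}
    for stanza in obo_text.split("[Term]")[1:]:
        term_id, parents, gene = None, [], None
        for raw in stanza.splitlines():
            line = raw.strip()
            id_match = ID_RE.match(line)
            parent_match = IS_A_RE.match(line)
            gene_match = GENE_RE.match(line)
            if id_match:
                term_id = id_match.group("id")
            elif parent_match:
                parents.append(parent_match.group("parent"))
            elif gene_match:
                gene = (gene_match.group("gene_id"),
                        gene_match.group("gene_name").strip())
        if term_id:
            terms[term_id] = {"parents": parents, "gene": gene}
    return terms

def gene_for(term_id: str, terms: Dict[str, dict]) -> Optional[Tuple[str, str]]:
    """Breadth-first walk up is_a links; return the first (gene ID, gene name) found."""
    queue, seen = [term_id], set()
    while queue:
        current = queue.pop(0)
        if current in seen or current not in terms:
            continue
        seen.add(current)
        if terms[current]["gene"]:
            return terms[current]["gene"]
        queue.extend(terms[current]["parents"])
    return None

# Hypothetical stanzas: identifiers and gene are invented for illustration only.
EXAMPLE_OBO = '''[Term]
id: PR:EX_gene
comment: Category=organism-gene.
relationship: has_gene_template PomBase:SPXX0000.00 ! abc1
is_a: PR:000000001 ! protein

[Term]
id: PR:EX_unmod
comment: Category=organism-modification.
is_a: PR:EX_gene ! hypothetical organism-gene parent
'''

if __name__ == "__main__":
    print(gene_for("PR:EX_unmod", parse_terms(EXAMPLE_OBO)))
```

This relies on each organism-modification term having an is_a path to its organism-gene term in the download being parsed; where that path is missing, the function returns None.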

I've actually been considering making the labels based on the name of the gene in the given organism. I will raise the discussion with the consortium members.

Original comment by: nataled

@nataled


commented May 1, 2017

Hi Val, Midori,
I taught myself how to do SPARQL queries, at least enough to create the ones you need. If you go to the page http://pir.georgetown.edu/pro/pro_sparql.shtml there are a number of example queries. Each is designed to do some common database retrieval. The two at the bottom (#12 and #13) will be of most interest to you. #12 will return all PRO terms that represent proteins encoded by a particular gene of interest (which must be entered as a model organism database identifier). #13 is the one you most directly requested above: given a PRO identifier, what gene encoded that protein? Directions for use are provided within the query (click "show query").

I hope this will be useful to you. It was fun making them! I know...I'm weird.

Original comment by: nataled

@nataled


commented May 5, 2017

That's OK, we are all weird here too ;)

I'm going to link to this ticket on the relevant tickets on our tracker. Our developer (Kim Rutherford) will then be able to assess whether this will do the trick.

Basically we are looking for a way to automate sensible display of PRO names on our Gene pages.

More later when we get to these tickets.

Thanks for looking into this for us.

Val

Original comment by: ValWood

@nataled


commented May 5, 2017

FYI
pombase/website#67

Original comment by: ValWood

@nataled


commented May 5, 2017

And, just to attack on two fronts, the change-over from the current orthology-based PRO-short-labels to the organism-specific gene-based labels has been approved. These should go live in our next release (which, just so you don't get too excited, has not yet been scheduled--figure about a month).

Original comment by: nataled
