Relating PRO to UniProt #165
This will be possible once we find out just what the UniProt PURLs are intended to mean. I recall @JervenBolleman saying he considers them to mean the same as PRO when he gives talks, but I'm not sure there's agreement on that (several people on the previous thread, myself included, indicated that they consider them as referring to database entries). In PRO we consider them exactly that: database entries that are about some protein class (for example, http://purl.uniprot.org/uniprot/P05067 is_about http://purl.obofoundry.org/obo/PR_P05067).
My main concern is that the UniProt PURLs might be overloaded in meaning. That is, some people consider them to refer to classes of proteins, some say they refer to database entries, and others might consider them as referring to sequences. If they are database entries, fine, but for PRO purposes we'll need a way to refer to the sequence. If they are protein classes, fine, we'll provide the appropriate equivalency statements, but we'll still need a way to refer to the sequence. If they are sequences, fine, we'll make the appropriate connection. I recall @cmungall suggesting that for the sequences we use a URL such as https://www.uniprot.org/uniprot/P05067.fasta?version=1. That would be fine, but there are also these things: http://purl.uniprot.org/isoforms/P05067-1. I asked if that PURL is intended to represent the (current) sequence, or intended to represent the class of proteins derived from that isoform. I did not get an answer.
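To make the candidate referents concrete, here is a minimal sketch that builds the URL forms mentioned above from one accession and emits the is_about statement as an N-Triples line. The function names are hypothetical, and two things are assumptions rather than anything settled in this thread: the OBO base `http://purl.obolibrary.org/obo/` for the PRO class, and `IAO_0000136` as the IRI for is_about.

```python
# Sketch only: URL patterns are copied from the comment above; the OBO base
# and the IAO is_about IRI are assumptions, not something the thread settles.
UNIPROT_ENTRY = "http://purl.uniprot.org/uniprot/"
UNIPROT_ISOFORM = "http://purl.uniprot.org/isoforms/"
OBO = "http://purl.obolibrary.org/obo/"   # assumed base for PRO classes
IS_ABOUT = OBO + "IAO_0000136"            # assumed IRI for is_about


def candidate_referents(accession, isoform=1, version=1):
    """Return the distinct things a UniProt accession might be taken to denote."""
    return {
        "database_entry": UNIPROT_ENTRY + accession,
        "isoform": f"{UNIPROT_ISOFORM}{accession}-{isoform}",
        "sequence": (
            f"https://www.uniprot.org/uniprot/{accession}.fasta?version={version}"
        ),
        "protein_class": f"{OBO}PR_{accession}",
    }


def is_about_triple(accession):
    """Render 'database entry is_about protein class' as one N-Triples line."""
    refs = candidate_referents(accession)
    return f"<{refs['database_entry']}> <{IS_ABOUT}> <{refs['protein_class']}> ."


if __name__ == "__main__":
    for kind, url in candidate_referents("P05067").items():
        print(f"{kind}: {url}")
    print(is_about_triple("P05067"))
```

Keeping the four URLs separate is the point of the sketch: whichever reading UniProt settles on for its PURLs, the other referents still need their own identifiers.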
Our IDs can do double duty, representing both database entities and things in nature. There is no need to get meta and introduce an extra layer of indirection. Or at least I am not aware of such a use case, where someone really needs to track both these things and keep them distinct.
I think the sequence vs protein molecule aspect is a bit more nuanced.
@cmungall asked "What are the semantics of a non-GCRP trembl ID according to PRO?"
TrEMBL entries fall into the following types:
A) If there already exists a Swiss-Prot entry describing the products of some gene G (SP_of_G), then the TrEMBL entry describing a product of the same gene (Tr_of_G) can be:
B) If no Swiss-Prot entry describes the products of the TrEMBL gene, then the TrEMBL entry describing a product of that gene (Tr_of_G) can be:
C) If no gene is indicated in the TrEMBL entry (call it TrX), then...
Technically speaking, TrEMBL entries (like some Swiss-Prot) can also describe fragments.
I'm going to post a strawman proposal:
PRO gene-level protein classes and UniProt canonical/GCRP entries are to be considered equivalent in the strict OWL sense. (Ergo, the URIs could be collapsed with no loss of logical entailment and no introduction of inconsistency. This would be a win, as the community would not have to make an arbitrary selection between two distinct PURLs/CURIEs.)
Ontologically these are protein classes, which are material entity classes (as is currently the case in PRO)
(The UniProt docs talk about these as sequences, which is perfectly valid as the main use case for these involves treating them as sequences, but in the ontological treatment, the sequence would be a property of the material entity.)
They are the superclasses of isoform classes (as they are now, in PRO)
The isoform-level classes in PRO would be equivalent to the UniProt isoform entries (e.g. P12345-1)
There could be some kind of has-canonical-form relationship between the main class and isoform-1 (see http://purl.obolibrary.org/obo/RO_0002214)
Note that at the database level, the canonical entry will have annotations for things such as protein domains, functions, etc. At the ontological level this will not be taken to mean that all instances of that protein have those properties. Otherwise we end up with logical inconsistencies. Instead it will be a some-some relation (some instances of the protein have the property).
Note that neither resource needs to make any changes to implement this. It would be a semantic MOU about ontological commitment of PURLs. And both would agree not to publish logical axioms that introduce logical inconsistencies.
However, if both parties agree, then there is a strong case for PRO switching from PRO PURLs for gene-level classes to UniProt PURLs instead.
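As a rough illustration of what the strawman would commit both sides to, the sketch below writes out the equivalence and subclass axioms for one accession as N-Triples. The function names are hypothetical, and the PRO IRI patterns (`PR_<accession>` and `PR_<accession>-<n>` under `http://purl.obolibrary.org/obo/`) are assumptions. The has-canonical-form link is left out because the proposal leaves that relation open.

```python
# Sketch only: the PRO IRI patterns are assumed; OWL/RDFS IRIs are the
# standard W3C ones. No axioms beyond those named in the proposal are emitted.
OBO = "http://purl.obolibrary.org/obo/"
UNIPROT_ENTRY = "http://purl.uniprot.org/uniprot/"
UNIPROT_ISOFORM = "http://purl.uniprot.org/isoforms/"
OWL_EQUIVALENT_CLASS = "http://www.w3.org/2002/07/owl#equivalentClass"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def strawman_axioms(accession, isoform_numbers):
    """Return (subject, predicate, object) triples for the proposed mapping."""
    pro_gene_level = f"{OBO}PR_{accession}"
    triples = [
        # Gene-level PRO class == UniProt canonical entry.
        (pro_gene_level, OWL_EQUIVALENT_CLASS, UNIPROT_ENTRY + accession),
    ]
    for n in isoform_numbers:
        pro_isoform = f"{OBO}PR_{accession}-{n}"
        uniprot_isoform = f"{UNIPROT_ISOFORM}{accession}-{n}"
        # Isoform-level PRO class == UniProt isoform entry.
        triples.append((pro_isoform, OWL_EQUIVALENT_CLASS, uniprot_isoform))
        # The gene-level class is the superclass of each isoform class.
        triples.append((pro_isoform, RDFS_SUBCLASS_OF, pro_gene_level))
    return triples


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples text."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(strawman_axioms("P12345", [1, 2])))
```

Because the gene-level classes are declared equivalent, a reasoner would also infer that each isoform class is a subclass of the UniProt entry, so the subclass axiom only needs to be stated once.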