Representation of isoforms in biolink-model #230
See also https://proconsortium.org/PRO_QA.pdf
Do we need a new type for isoforms? I think that the real distinction, if we need one, is between a GeneProduct at the level of a particular sequence or chemical structure, and a GeneProduct at the level of a family or group of structures. The Swiss-Prot IDs are at some kind of group level (usually, but not always, gene). The -1, -2 identifiers are at the sequence level. For chemicals, ChEBI for instance doesn't distinguish between different levels (family vs specific structure). If there's a SMILES or InChI, you can say what level it is. I'd be tempted to make canonical sequence a property b/c it comes down to a somewhat arbitrary choice, and therefore different groups will probably come up with different answers. Seems easier to manage with different properties?
With respect to the canonical sequence as property suggestion, this is precisely what will be done in PRO. Specifically, every Swiss-Prot entry (lacking an isoform identifier) is considered group level (with the caveats mentioned above), and each of these will have a specific isoform tagged as canonical. At the moment, the proposed property is labeled 'has_canonical_sequence'. For example: PR:P12345 has_canonical_sequence UniProtKB:P12345-1.
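A minimal sketch of that statement as a subject/predicate/object triple, using the example identifiers from the comment above. The `Triple` type and the `canonical_sequence_triple` helper are hypothetical names for illustration only; they are not part of PRO or biolink-model. The isoform number is passed in explicitly rather than assumed to be 1.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical container for one statement (not a PRO or biolink type)."""
    subject: str
    predicate: str
    object: str


def canonical_sequence_triple(accession: str, isoform_number: int) -> Triple:
    """Link a group-level PRO term to the isoform tagged as canonical.

    Assumes the PR: and UniProtKB: CURIE shapes shown in the thread,
    and the proposed (not final) property label 'has_canonical_sequence'.
    """
    if isoform_number < 1:
        raise ValueError("isoform numbers start at 1")
    return Triple(
        subject=f"PR:{accession}",
        predicate="has_canonical_sequence",
        object=f"UniProtKB:{accession}-{isoform_number}",
    )


example = canonical_sequence_triple("P12345", 1)
```

Keeping the canonical choice as data on the group-level term means a different group can assert a different canonical isoform without changing any identifiers.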
@cmungall Perhaps this is relevant in light of UniProt and their cleavage products for SARS-CoV-2.
@cmungall - closing for now, as we will have a conflation group to hold gene+product+transcript and will wait to add isoforms.
Should we reopen in light of the UI discussion happening now on the relay (filtering ontological classes from display)? We need a clear way to distinguish protein representations at the level users expect vs groupings.
See current:
https://w3id.org/biolink/vocab/ProteinIsoform
We should probably have a sibling for canonical/reference. This is very useful if you want to have constraints that say "I expect a UniProtKB:xxx-N here" vs "I expect a UniProtKB:xxx".
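A rough sketch of how such a constraint could be checked on identifiers alone. The function name `uniprot_curie_level` and the labels "unsuffixed" and "suffixed" are placeholders, not biolink terms; the accession pattern follows the regular expression given in the UniProt help pages and should be treated as an assumption here.

```python
import re
from typing import Optional

# Accession pattern as documented by UniProt (assumption: still current).
_ACCESSION = (
    r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
)
# Optional "-N" suffix, N >= 1, marks a specific isoform sequence.
_CURIE = re.compile(rf"^UniProtKB:{_ACCESSION}(?:-(?P<n>[1-9][0-9]*))?$")


def uniprot_curie_level(curie: str) -> Optional[str]:
    """Return 'unsuffixed', 'suffixed', or None if not a UniProtKB CURIE."""
    match = _CURIE.match(curie)
    if match is None:
        return None
    return "suffixed" if match.group("n") else "unsuffixed"


levels = [uniprot_curie_level(c) for c in ("UniProtKB:P12345", "UniProtKB:P12345-1")]
```

A slot that expects "UniProtKB:xxx-N" would accept only values for which this returns "suffixed"; a slot that expects "UniProtKB:xxx" would accept only "unsuffixed".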
I want to make sure we use the right language here, re isoform, proteoform, variant, canonical, reference.
Particularly: is the -1 considered an isoform? If so, we need consistent terminology to distinguish the unsuffixed form, the -1, and the -N where N>1. Is 'canonical isoform' the right terminology? We also want to be consistent in how we name different sequence forms vs PTMs. We should align with the PRO categories here.
Refs:
cc @nataled @JervenBolleman
Note to OBO folks: blmod is a schema rather than an upper ontology. In an OBO ontology, we would just have protein, with subclasses being the kinds of things in PRO. In OBO terms, it may help to think of blmod as incorporating metaclasses, i.e. the instances of https://w3id.org/biolink/vocab/ProteinIsoform are the UniProt Pnnn-N entries.