Representation of isoforms in biolink-model #230

cmungall · 2019-06-27T19:49:26Z

See current:

https://w3id.org/biolink/vocab/ProteinIsoform

GeneProduct
- GeneProductIsoform

We should probably have a sibling for canonical/reference. This is very useful if you want to have constraints that say "I expect a UniProtKB:xxx-N here" vs "I expect a UniProtKB:xxx".
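As an illustration of the kind of constraint described above, here is a minimal Python sketch that tells an unsuffixed `UniProtKB:xxx` identifier apart from a suffixed `UniProtKB:xxx-N` one. The function name `classify_uniprot_curie` and the labels `"entry"`, `"isoform"`, and `"invalid"` are hypothetical and not part of biolink-model. The accession pattern follows the regular expression UniProt documents for its accession numbers.

```python
import re

# Accession pattern as documented by UniProt for accession numbers.
_ACCESSION = (
    r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
)

# An optional "-N" suffix (N >= 1) marks an isoform-level identifier.
_CURIE = re.compile(rf"^UniProtKB:(?P<acc>{_ACCESSION})(?:-(?P<iso>[1-9][0-9]*))?$")


def classify_uniprot_curie(curie: str) -> str:
    """Return "entry", "isoform", or "invalid" for a UniProtKB CURIE.

    The three labels are illustrative only (an assumption of this sketch).
    """
    match = _CURIE.match(curie)
    if match is None:
        return "invalid"
    return "isoform" if match.group("iso") else "entry"
```

A slot that expects an isoform could then reject any value for which this check returns `"entry"`, and vice versa. Note that the sketch treats `-1` like any other suffix, so it does not settle the open question of whether `-1` should be named differently.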

I want to make sure we use the right language here, re isoform, proteoform, variant, canonical, reference.

Particularly: is the -1 considered an isoform? If so we need consistent terminology to distinguish the unsuffixed form, the -1, and the -N where N>1. Is 'canonical isoform' the right terminology?

We also want to be consistent in how we name different sequence forms vs PTMs. We should align with the PRO categories here.

Refs:

cc @nataled @JervenBolleman .

Note to OBO folks: blmod is a schema rather than an upper ontology. In an OBO ontology, we would just have protein, with subclasses being the kinds of things in PRO. In OBO terms it may help to think of blmod as incorporating metaclasses, i.e. the instances of https://w3id.org/biolink/vocab/ProteinIsoform are the UniProt Pnnn-N entries.

cmungall · 2019-07-03T00:38:47Z

See also https://proconsortium.org/PRO_QA.pdf

cbizon · 2019-08-19T15:04:10Z

Do we need a new type for isoforms? I think that the real distinction, if we need one, is between a GeneProduct at the level of a particular sequence or chemical structure and a GeneProduct at the level of a family or group of structures. The Swiss-Prot IDs are at some kind of group level (usually, but not always, gene). The -1, -2 identifiers are at the sequence level. For chemicals, CHEBI for instance doesn't distinguish between different levels (family vs specific structure). If there's a SMILES or InChI, you can say what level it is.

I'd be tempted to make canonical sequence a property, because which sequence counts as canonical is a somewhat arbitrary choice, and different groups will probably come up with different answers. Seems easier to manage with different properties?

nataled · 2019-08-19T15:51:35Z

With respect to the canonical sequence as property suggestion, this is precisely what will be done in PRO. Specifically, every Swiss-Prot entry (lacking an isoform identifier) is considered a group level (with the caveats mentioned above), and these will each have a specific isoform tagged as canonical. At the moment, the proposed property is labeled 'has_canonical_sequence'. It would be PR:P12345 has_canonical_sequence UniProtKB:P12345-1.
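To make the property-based approach concrete, here is a minimal Python sketch that stores the example statement above as a triple and looks up the canonical isoform for a group-level entry. The `Triple` type and the `canonical_isoform` helper are hypothetical names used only for illustration. The predicate label is the proposed one quoted above, and `P12345` is the placeholder accession from the example.

```python
from typing import Iterable, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# The example statement from the comment above; identifiers are placeholders.
TRIPLES = [
    Triple("PR:P12345", "has_canonical_sequence", "UniProtKB:P12345-1"),
]


def canonical_isoform(entry: str, triples: Iterable[Triple]) -> Optional[str]:
    """Return the isoform tagged as canonical for a group-level entry, if any."""
    for triple in triples:
        if triple.subject == entry and triple.predicate == "has_canonical_sequence":
            return triple.object
    return None
```

Because canonical status is held in a statement rather than in the type of the isoform, a group that makes a different choice could assert a different triple without changing the class hierarchy.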

deepakunni3 · 2020-07-06T22:00:12Z

@cmungall Perhaps this is relevant in light of UniProt and their cleavage products for SARS-CoV-2

sierra-moxon · 2021-08-31T21:44:30Z

@cbizon @cmungall - do our conversations about conflation change the status of this ticket at all? Should it still be a priority?

sierra-moxon · 2021-09-14T23:21:38Z

@cmungall - closing for now, as we will have a conflation group to hold gene+product+transcript and will wait to add isoforms.

cmungall · 2022-09-16T15:25:41Z

Should we reopen in light of the UI discussion happening now on the relay (filtering ontological classes from display)? We need a clear way to distinguish protein representations at the level users expect from groupings.

This was referenced Jun 27, 2019

Use a standard vocabulary and annotation property for indicating metaclass PROconsortium/PRoteinOntology#150

Open

Change metadata to improve text on annotation downloads page geneontology/go-site#1018

Closed

cmungall added a commit that referenced this issue Jun 29, 2019

isoforms #230

6594da9

deepakunni3 added the discussion label Jul 13, 2020

cmungall mentioned this issue Sep 23, 2020

Document modeling patterns for species-generic proteins #458

Closed

cmungall mentioned this issue Mar 24, 2021

Add an attribute to add specificity in ontology hierarchy, including distinguishing 'entities' from 'groupings' #691

Open

sierra-moxon closed this as completed Sep 14, 2021
