Representation of isoforms in biolink-model #230

Closed
cmungall opened this issue Jun 27, 2019 · 7 comments

Comments

@cmungall
Collaborator

See current:

https://w3id.org/biolink/vocab/ProteinIsoform

  • GeneProduct
    • GeneProductIsoform

We should probably have a sibling for canonical/reference. This is very useful if you want to have constraints that say "I expect a UniProtKB:xxx-N here" vs "I expect a UniProtKB:xxx".

I want to make sure we use the right language here, re isoform, proteoform, variant, canonical, reference.

Particularly: is the -1 considered an isoform? If so, we need consistent terminology to distinguish the unsuffixed form, the -1, and the -N where N>1. Is 'canonical isoform' the right terminology?
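To make the three cases concrete, here is a minimal Python sketch that sorts a UniProtKB CURIE into "unsuffixed", "-1", or "-N with N>1". The accession pattern is the one I believe UniProt publishes in its help pages; the function name and the three return labels are hypothetical, chosen only for illustration, and say nothing about which terminology the model should adopt.

```python
import re

# Accession pattern assumed to match the one in the UniProt help pages.
# The optional "-N" isoform suffix is captured separately as group "n".
_ACCESSION = (
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
_CURIE = re.compile(
    rf"^UniProtKB:(?P<acc>{_ACCESSION})(?:-(?P<n>[1-9][0-9]*))?$"
)


def classify_uniprot_curie(curie: str) -> str:
    """Return 'unsuffixed', 'suffix-1', or 'suffix-N' (hypothetical labels)."""
    m = _CURIE.match(curie)
    if m is None:
        raise ValueError(f"not a UniProtKB CURIE: {curie!r}")
    n = m.group("n")
    if n is None:
        return "unsuffixed"  # e.g. UniProtKB:P12345
    return "suffix-1" if n == "1" else "suffix-N"


for curie in ("UniProtKB:P12345", "UniProtKB:P12345-1", "UniProtKB:P12345-2"):
    print(curie, "->", classify_uniprot_curie(curie))
```

A schema constraint of the form "I expect a UniProtKB:xxx-N here" would then amount to rejecting anything classified as unsuffixed. Whether the -1 belongs with the other suffixed forms is exactly the open question.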

We also want to be consistent in how we name different sequence forms vs PTMs. We should align with the PRO categories here.

Refs:

cc @nataled @JervenBolleman .

Note to OBO folks: blmod is a schema rather than an upper ontology. In an OBO ontology, we would just have protein, with subclasses being the kinds of things in PRO. In OBO terms it may help to think of blmod as incorporating metaclasses, i.e. the instances of https://w3id.org/biolink/vocab/ProteinIsoform are the UniProt Pnnn-N entries.

@cmungall
Collaborator Author

cmungall commented Jul 3, 2019

See also https://proconsortium.org/PRO_QA.pdf

@cbizon
Collaborator

cbizon commented Aug 19, 2019

Do we need a new type for isoforms? I think that the real distinction, if we need one, is between a GeneProduct at the level of a particular sequence or chemical structure and a GeneProduct at the level of a family or group of structures. The Swiss-Prot IDs are at some kind of group level (usually, but not always, gene). The -1, -2 identifiers are at the sequence level. For chemicals, ChEBI for instance doesn't distinguish between different levels (family vs specific structure). If there's a SMILES or InChI, you can say what level it is.

I'd be tempted to make canonical sequence a property because the choice is somewhat arbitrary, and therefore different groups will probably come up with different answers. Seems easier to manage with different properties?

@nataled

nataled commented Aug 19, 2019

With respect to the canonical sequence as property suggestion, this is precisely what will be done in PRO. Specifically, every Swiss-Prot entry (lacking an isoform identifier) is considered a group level (with the caveats mentioned above), and these will each have a specific isoform tagged as canonical. At the moment, the proposed property is labeled 'has_canonical_sequence'. It would be PR:P12345 has_canonical_sequence UniProtKB:P12345-1.
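As a rough illustration of treating "canonical" as a property rather than a type, the sketch below stores the one statement given above as a triple and looks it up. Only the `has_canonical_sequence` label and the `PR:P12345` / `UniProtKB:P12345-1` pair come from this thread; the `Triple` type and the helper names are hypothetical, and this is not meant to reflect how PRO or biolink-model actually store the assertion.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# The single statement quoted in the thread; a real store would hold many.
TRIPLES: List[Triple] = [
    Triple("PR:P12345", "has_canonical_sequence", "UniProtKB:P12345-1"),
]


def canonical_isoform(group_id: str, triples: List[Triple] = TRIPLES) -> Optional[str]:
    """Return the isoform tagged canonical for a group-level entry, if any."""
    for t in triples:
        if t.subject == group_id and t.predicate == "has_canonical_sequence":
            return t.object
    return None


def is_canonical(isoform_id: str, triples: List[Triple] = TRIPLES) -> bool:
    """True if some group-level entry names this isoform as canonical."""
    return any(
        t.predicate == "has_canonical_sequence" and t.object == isoform_id
        for t in triples
    )


print(canonical_isoform("PR:P12345"))
print(is_canonical("UniProtKB:P12345-2"))
```

Because the tag lives on an edge rather than in the class hierarchy, a group that disagrees about which isoform is canonical could assert a different triple without changing any types.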

@deepakunni3
Member

@cmungall Perhaps this is relevant in light of UniProt and their cleavage products for SARS-CoV-2.

@sierra-moxon
Member

@cbizon @cmungall - do our conversations about conflation change the status of this ticket at all? Should it still be a priority?

@sierra-moxon
Member

@cmungall - closing for now, as we will have a conflation group to hold gene+product+transcript and will wait to add isoforms.

@cmungall
Collaborator Author

Should we reopen in light of the UI discussion happening now on the relay (filtering ontological classes from display)? We need a clear way to distinguish protein representations at the level users expect vs groupings.
