Add prefix uniprot.mnemonic #1110

Open
bgyori opened this issue May 5, 2024 · 4 comments
Labels
New (used in combination with prefix, metaprefix, or collection for new entries), Prefix

Comments

@bgyori
Contributor

bgyori commented May 5, 2024

Prefix

uniprot.mnemonic

Name

UniProt

Homepage

https://www.uniprot.org/

Source Code Repository

No response

Description

The UniProt Knowledgebase (UniProtKB) is a comprehensive resource for protein sequence and functional information with extensive cross-references to more than 120 external databases. Besides amino acid sequence and a description, it also provides taxonomic data and citation information.

This entry represents UniProt mnemonics, which combine an alphanumeric representation of the protein name with a species identification code representing the biological source of the protein.

License

No response

Publications

pubmed:16381842

Example Local Unique Identifier

BRAF_HUMAN

Regular Expression Pattern for Local Unique Identifier

^[A-Z0-9]{1,10}_[A-Z0-9]{1,5}$

URI Format String

https://www.uniprot.org/uniprotkb/$1
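
For illustration, here is a minimal sketch of how the pattern and URI format string proposed in this request could be applied. The function names `is_valid_mnemonic` and `to_uri` are hypothetical and not part of the Bioregistry API; only the pattern, the format string, and the example identifier come from the request itself.

```python
import re

# Values copied from the request above.
PATTERN = re.compile(r"^[A-Z0-9]{1,10}_[A-Z0-9]{1,5}$")
URI_FORMAT = "https://www.uniprot.org/uniprotkb/$1"


def is_valid_mnemonic(local_id: str) -> bool:
    """Return True if local_id matches the proposed pattern (hypothetical helper)."""
    return PATTERN.fullmatch(local_id) is not None


def to_uri(local_id: str) -> str:
    """Substitute the local identifier for $1 in the URI format string."""
    if not is_valid_mnemonic(local_id):
        raise ValueError(f"not a valid UniProt mnemonic: {local_id!r}")
    return URI_FORMAT.replace("$1", local_id)


print(to_uri("BRAF_HUMAN"))  # https://www.uniprot.org/uniprotkb/BRAF_HUMAN
```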

Wikidata Property

No response

Contributor Name

Benjamin M. Gyori

Contributor GitHub

bgyori

Contributor ORCiD

0000-0001-9439-5346

Contributor Email

b.gyori@northeastern.edu

Contact Name

No response

Contact ORCiD

No response

Contact GitHub

No response

Contact Email

No response

Additional Comments

No response

@bgyori added the Prefix and New labels May 5, 2024
@bgyori
Contributor Author

bgyori commented May 5, 2024

Some comments on this entry: UniProt mnemonics are often used to refer to proteins in various places (e.g., in proteomics data sets). Mnemonics can also be resolved on uniprot.org, so their validation and resolution are of broad interest. The existing uniprot prefix represents what UniProt's own terminology calls accession numbers.

In this sense, the relationship between uniprot and uniprot.mnemonic is similar to that between hgnc identifiers and hgnc.symbol symbols, which have two separate entries in the Bioregistry.

One issue this raises: because the resolver URL for both accession numbers and mnemonics is https://www.uniprot.org/uniprotkb/$1, reverse-mapping URLs to prefixes can be ambiguous.
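
To make the reverse-mapping concern concrete, the sketch below shows one possible way to tell the two prefixes apart when they share a URI prefix: test the local identifier against each prefix's pattern. This is an illustration only, not something proposed in this thread. The function `url_to_curie` is hypothetical, and the accession pattern is assumed to be the one published in UniProt's documentation.

```python
import re
from typing import Optional

URI_PREFIX = "https://www.uniprot.org/uniprotkb/"

# Both prefixes share URI_PREFIX, so the URL alone cannot identify the prefix.
PATTERNS = {
    # Accession pattern as published by UniProt (assumed to be current).
    "uniprot": re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
    ),
    # Pattern proposed in this issue.
    "uniprot.mnemonic": re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}"),
}


def url_to_curie(url: str) -> Optional[str]:
    """Map a UniProtKB URL to a CURIE by matching the local identifier."""
    if not url.startswith(URI_PREFIX):
        return None
    local_id = url[len(URI_PREFIX):]
    for prefix, pattern in PATTERNS.items():
        if pattern.fullmatch(local_id):
            return f"{prefix}:{local_id}"
    return None


print(url_to_curie(URI_PREFIX + "P15056"))      # uniprot:P15056
print(url_to_curie(URI_PREFIX + "BRAF_HUMAN"))  # uniprot.mnemonic:BRAF_HUMAN
```

This works only because the two patterns are disjoint: mnemonics always contain an underscore and accessions never do.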

@JervenBolleman

I just want to note that these are not stable, nor guaranteed to point to the same entity over time.

@cmungall
Contributor

There was a previous discussion on resolvers for expressions

The thinking at that time was that this was only 2/5 in scope - it seems resolving on things like gene symbols (or symbol-species tuples) is even less in scope? At least expression symbols have a fixed semantics, but as @JervenBolleman says, label/name/symbol lookup is inherently unstable.

I think if we do add these we need a clear mechanism to distinguish identifier prefixes from expression prefixes from label lookup prefixes, communicating which are stable.

But IMO this would be scope creep. There is room for a general lookup or name resolver service (this is something we do in NCATS Translator - https://name-resolution-sri.renci.org/docs) but I think this should be separate from identifier standardization and resolution.

@bgyori
Contributor Author

bgyori commented May 31, 2024

I am also on the fence about this one. Let me explain why I opened this: I was trying to demo how you can take a data set and annotate its various experimental factors (proteins, drugs, cell lines, etc.) with identifiers. In this data set, proteins are identified using UniProt mnemonics, which may be bad practice but is widespread.

I think UniProt mnemonics occupy a special place in that they are somewhat like identifiers: they have well-defined syntax patterns and can be independently resolved through a URI pattern. If you do use UniProt mnemonics in some context, it is far more useful to be able to use a CURIE like uniprot.mnemonic:BRAF_HUMAN (and leverage validation, resolving, etc.) than just BRAF_HUMAN without any additional context.

A second argument is that UniProt mnemonics vs UniProt IDs (accession numbers) are somewhat analogous to HGNC symbols vs HGNC IDs, and HGNC symbols have their own entry and resolver (see e.g., https://bioregistry.io/registry/hgnc.symbol, which also harmonizes across all the external registries that have entries for it).

[image]
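
As a rough sketch of the annotation use case described above, the snippet below turns bare values from a data set column into CURIEs under the proposed prefix and flags values that fail the pattern. The `annotate` function and the sample column are hypothetical, and the prefix is the one proposed in this issue, which is not yet registered.

```python
import re

PREFIX = "uniprot.mnemonic"  # prefix proposed in this issue, not yet registered
PATTERN = re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}")


def annotate(values):
    """Map each raw value to a CURIE, or to None if it fails the pattern."""
    return {
        value: f"{PREFIX}:{value}" if PATTERN.fullmatch(value) else None
        for value in values
    }


# Hypothetical protein column from a proteomics data set.
column = ["BRAF_HUMAN", "MK01_HUMAN", "not a protein"]
for raw, curie in annotate(column).items():
    print(raw, "->", curie)
```

A pattern match only checks syntax. It does not confirm that the mnemonic currently exists in UniProt or that it still points to the same entry, which is the stability concern raised earlier in the thread.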


3 participants