Records for languages like SMILES, HGVS, and SPDI #460

@sierra-moxon

Background:

SPDI (Sequence Position Deletion Insertion) and HGVS (Human Genome Variation Society) nomenclature are two standards that, when used correctly, can uniquely identify sequence variants. Both provide a shorthand notation for capturing the genome, assembly, position, and sequence change of a sequence variant. In this way, they are a kind of identifier.
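To make the "kind of identifier" point concrete, here is a minimal sketch of splitting an SPDI expression into its four colon-separated fields (sequence, position, deletion, insertion). The `Spdi` class, the `parse_spdi` helper, and the example expression are hypothetical illustrations, not part of any existing library, and the validation is deliberately loose:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spdi:
    """Hypothetical container for the four SPDI fields."""
    sequence: str   # sequence accession, e.g. a RefSeq accession.version
    position: int   # 0-based position on the sequence
    deletion: str   # deleted sequence (may be empty)
    insertion: str  # inserted sequence (may be empty)


def parse_spdi(expression: str) -> Spdi:
    """Split 'sequence:position:deletion:insertion' into its parts."""
    parts = expression.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected 4 colon-separated fields, got {len(parts)}")
    sequence, position, deletion, insertion = parts
    if not sequence:
        raise ValueError("sequence field is empty")
    if not position.isdigit():
        raise ValueError(f"position is not a non-negative integer: {position!r}")
    return Spdi(sequence, int(position), deletion, insertion)


# Illustrative expression only; not taken from a real data set.
variant = parse_spdi("NC_000001.11:12345:A:G")
print(variant)
```

Note that the expression itself contains colons, so any CURIE built from it (for example with a hypothetical `SPDI` prefix) would have to be split on the first colon only.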

Motivation for Prefixes:

We have a group of users that would like to identify a prefix for either (or both) of HGVS and SPDI.

Use Cases:

Continuing from the biolink-model issue biolink/biolink-model#1042:

  1. Biolink Model cares about identifier prefixes and in particular, uses them to document data sources that provide each class via the "id_prefixes" construct: https://linkml.io/linkml-model/docs/id_prefixes/
  2. Software built to find identifier equivalences between data sources for the NCATS Data Translator project (the Node Normalizer) uses Biolink Model prefix lists to:
    • return the preferred CURIE for this entity
    • return all other known equivalent identifiers for the entity
    • return semantic types for the entity as defined by the Biolink Model
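As a rough sketch of the three behaviours listed above, the following shows how an ordered prefix list could drive the choice of a preferred CURIE. This is not the Node Normalizer's actual code or API; the `normalize` function, the `EXAMPLE` prefix, the prefix ordering, and the identifiers are all assumptions made for illustration:

```python
from typing import Dict, List, Optional, Tuple


def prefix_of(curie: str) -> str:
    """Return the CURIE prefix, splitting on the first colon only."""
    return curie.split(":", 1)[0]


def normalize(
    curie: str,
    cliques: List[Tuple[str, List[str]]],
    prefix_priority: Dict[str, List[str]],
) -> Optional[dict]:
    """Look up a CURIE in known equivalence sets (cliques).

    Each clique is (semantic_type, [equivalent CURIEs]). The preferred CURIE
    is the member whose prefix comes first in the type's ordered prefix list.
    """
    for semantic_type, members in cliques:
        if curie not in members:
            continue
        order = prefix_priority.get(semantic_type, [])

        def rank(member: str) -> int:
            prefix = prefix_of(member)
            # Unknown prefixes sort after all listed ones.
            return order.index(prefix) if prefix in order else len(order)

        ranked = sorted(members, key=rank)  # stable sort keeps input order on ties
        return {
            "preferred": ranked[0],
            "equivalent": ranked[1:],
            "types": [semantic_type],
        }
    return None


# Hypothetical prefix ordering and identifiers, for illustration only.
priority = {"biolink:SequenceVariant": ["EXAMPLE", "HGVS", "SPDI"]}
cliques = [
    (
        "biolink:SequenceVariant",
        ["SPDI:NC_000001.11:12345:A:G", "EXAMPLE:0001", "HGVS:NC_000001.11:g.12346A>G"],
    )
]
result = normalize("SPDI:NC_000001.11:12345:A:G", cliques, priority)
print(result)
```

Under this design, registering HGVS and SPDI prefixes would only require adding them to the ordered prefix list for the relevant class; the lookup logic itself stays unchanged.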

It would be helpful to reuse the existing architecture above to place HGVS and SPDI "identifiers" in their appropriate Biolink Model classes and normalize them alongside the other sequence variant identifiers found in disparate data sets across NCATS Data Translator.

Challenges:

  1. There is no service or site that resolves these identifiers to an informational page about the sequence variant. However, SPDI does have an API that returns a JSON data structure reflecting the content of the SPDI nomenclature for a particular variant.
  2. Other groups also mint identifiers for the variant described by this syntax. For example, the Rat Genome Database has identifiers for rat variants that resolve to detail pages about the variant: https://rgd.mcw.edu/rgdweb/report/rgdvariant/main.html?id=RGD:14349032. That same rat variant can also have HGVS nomenclature: https://www.alliancegenome.org/allele/RGD:1600311#genomic-variant-information
  3. For HGVS nomenclature, there are three sites (perhaps more) that describe the standard:

Questions:

Does this group have any opinions on registering a prefix for an identifier that isn't resolvable (and isn't a parked prefix, meaning that no service plans to support the expansion of the registered prefix)?
