A general solution to the databases as ontologies problem in bioregistry #1104

Open
cmungall opened this issue Apr 29, 2024 · 1 comment

Comments

@cmungall
Contributor

There is frequently a need to represent entities from a database as an ontology.

See:

There are a lot of factors to condense here, but some key points:

  • idspace overloading between a database and its ontology representation can cause issues
  • ideally we would like the upstream database to own the ontology representation; in practice this is unlikely ever to happen, necessitating competing-cooperating alternative translations with no agreement on axioms
  • bioregistry conflates URLs for humans with semantic URIs

I propose that the bioregistry datamodel be extended to include inlined sub-records for ontology or KG translations of databases. These sub-records would carry additional metadata to indicate the source (3rd party vs official vs quasi-official).

One case would be 3rd party ontology rendering with reminted prefixed IDs:

ncbitaxon:
  url: <official NCBI URL>
  renderings:
    - provider: obo
      type: ontology
      documentation: ...
      subset: COMPLETE
      download_url: <OBO ontology PURL>
      prefixmap:
        NCBITaxon: <OBO PURL>
    - provider: umls
      ...
ncit:
  url: <official NCIT URL>
  renderings:
    - provider: obo
      type: ontology
      documentation: ...
      subset: COMPLETE
      download_url: <OBO ontology PURL>
      prefixmap:
        NCIT: <OBO PURL>

These renderings could even be first-class entries as far as the bioregistry UI is concerned, e.g. obo$NCBITaxon (but obviously this wouldn't be used as a prefix).
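To make the shape of these sub-records concrete, here is a minimal Python sketch of how the extended datamodel might look. It is not the actual bioregistry schema: the class names `Record`, `Rendering` and `Subset`, and the helper `rendering_key`, are all hypothetical, and the choice to build the UI key from the provider plus the first reminted prefix is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Subset(Enum):
    """How much of the source database a rendering covers (values from the examples)."""
    COMPLETE = "COMPLETE"
    OVERLAP = "OVERLAP"


@dataclass
class Rendering:
    """Hypothetical inlined sub-record for one ontology/KG translation of a database."""
    provider: str                       # e.g. "obo", "umls"
    type: str                           # e.g. "ontology"
    subset: Subset
    documentation: Optional[str] = None
    download_url: Optional[str] = None
    prefixmap: Dict[str, str] = field(default_factory=dict)  # reminted prefixes, if any
    bioregistry_entry: Optional[str] = None                  # link to an existing entry


@dataclass
class Record:
    """Hypothetical top-level registry entry for a database."""
    prefix: str
    url: str
    renderings: List[Rendering] = field(default_factory=list)


def rendering_key(record: Record, rendering: Rendering) -> str:
    """Build a display key such as obo$NCBITaxon (for the UI only, never a prefix).

    Assumption: use the first reminted prefix if there is one,
    otherwise fall back to the database's own prefix.
    """
    local = next(iter(rendering.prefixmap), record.prefix)
    return f"{rendering.provider}${local}"


# Placeholders are kept exactly as in the YAML above.
ncbitaxon = Record(
    prefix="ncbitaxon",
    url="<official NCBI URL>",
    renderings=[
        Rendering(
            provider="obo",
            type="ontology",
            subset=Subset.COMPLETE,
            download_url="<OBO ontology PURL>",
            prefixmap={"NCBITaxon": "<OBO PURL>"},
        )
    ],
)
print(rendering_key(ncbitaxon, ncbitaxon.renderings[0]))  # obo$NCBITaxon
```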

Another would be 3rd party ontology renderings where the same prefixes and URL expansions are used:

rhea:
  url: <official RHEA URL>
  renderings:
    - provider: biopragmatics
      type: ontology
      documentation: currently this includes all annotations but this is under discussion https://github.com/biopragmatics/pyobo/issues/170
      subset: COMPLETE

Here there is no bespoke prefixmap, so the standard RHEA ones would be used.
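The fallback could be sketched roughly as follows. The function name `effective_prefixmap` is hypothetical, and the rule "a bespoke map replaces the standard one rather than being merged with it" is an assumption about how reminted IDs would be handled, not something settled in this proposal.

```python
from typing import Dict, Optional


def effective_prefixmap(
    standard: Dict[str, str],
    bespoke: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return the prefix map a rendering should be read with.

    If the rendering declares its own (reminted) prefixmap, use that;
    otherwise fall back to the database's standard prefix map.
    """
    if bespoke:
        return dict(bespoke)
    return dict(standard)


# Placeholder values stand in for the real URI prefixes.
rhea_standard = {"rhea": "<standard RHEA URI prefix>"}
print(effective_prefixmap(rhea_standard))  # no bespoke map: standard RHEA one
print(effective_prefixmap({"ncbitaxon": "<official NCBI URL>"}, {"NCBITaxon": "<OBO PURL>"}))
```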

perhaps controversially:

uniprotkb:
  url: <official uniprot URL>
  renderings:
    - provider: pr
      type: ontology
      documentation: PRO classes at "species-gene" level generally use same local ID as uniprotkb
      subset: OVERLAP
      bioregistry_entry: pr

Here this would be a link between 2 existing, overlapping bioregistry entries.
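Since such a link only makes sense if both entries exist, a curation check might look something like the sketch below. It treats the registry as a plain dict keyed by prefix; the function name `dangling_links` is hypothetical and this is not an existing bioregistry check.

```python
from typing import Dict, List, Tuple


def dangling_links(registry: Dict[str, dict]) -> List[Tuple[str, str]]:
    """List (source prefix, target prefix) pairs whose target entry does not exist."""
    missing = []
    for prefix, record in registry.items():
        for rendering in record.get("renderings", []):
            target = rendering.get("bioregistry_entry")
            if target is not None and target not in registry:
                missing.append((prefix, target))
    return missing


registry = {
    "uniprotkb": {
        "renderings": [
            {
                "provider": "pr",
                "type": "ontology",
                "subset": "OVERLAP",
                "bioregistry_entry": "pr",
            }
        ]
    },
    "pr": {},  # the linked entry exists, so nothing is reported
}
print(dangling_links(registry))  # []
```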

This scheme could also be used for KG renderings of databases in formats that are more suited than OWL (e.g. kgx, rdfstar with owlstar semantics)

Note that for entries that are "born" ontologies we would not curate this info; this would be considered a reflexive relation.
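One way to read the reflexive case is that consumers derive an implicit self-rendering instead of curators writing it down. The sketch below assumes a hypothetical `born_ontology` flag on the record and a hypothetical helper name; neither exists in bioregistry today.

```python
from typing import Dict, List


def ontology_renderings(registry: Dict[str, dict], prefix: str) -> List[dict]:
    """Return curated renderings, or an implicit reflexive one for "born" ontologies."""
    record = registry[prefix]
    curated = record.get("renderings", [])
    if curated:
        return list(curated)
    if record.get("born_ontology"):  # hypothetical flag, not a real bioregistry field
        # Reflexive relation: the entry is its own complete ontology rendering.
        return [
            {
                "provider": prefix,
                "type": "ontology",
                "subset": "COMPLETE",
                "bioregistry_entry": prefix,
            }
        ]
    return []


example_registry = {
    "pr": {"born_ontology": True},
    "rhea": {"renderings": [{"provider": "biopragmatics", "type": "ontology", "subset": "COMPLETE"}]},
    "somedb": {},  # hypothetical database with no known rendering
}
print(ontology_renderings(example_registry, "pr")[0]["bioregistry_entry"])  # pr
```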

@matentzn
Collaborator

I have not absorbed your proposal quite yet, but

bioregistry conflates URLs for humans with semantic URIs

While this is mostly true, it's not quite true conceptually:

  "goche": {
    "contributor": {
      "email": "cthoyt@gmail.com",
      "github": "cthoyt",
      "name": "Charles Tapley Hoyt",
      "orcid": "0000-0003-4423-4370"
    },
    "description": "Represent chemical entities having particular CHEBI roles",
    "download_owl": "https://raw.githubusercontent.com/geneontology/go-ontology/master/src/ontology/imports/chebi_roles.owl",
    "example": "25512",
    "homepage": "https://github.com/geneontology/go-ontology",
    "name": "GO Chemicals",
    "pattern": "^\\d+$",
    "preferred_prefix": "GOCHE",
    "rdf_uri_format": "http://purl.obolibrary.org/obo/GOCHE_$1",
    "references": [
      "https://obo-communitygroup.slack.com/archives/C023P0Z304T/p1638472847049400",
      "https://github.com/geneontology/go-ontology/issues/19535"
    ],
    "repository": "https://github.com/geneontology/go-ontology",
    "synonyms": [
      "go.chebi",
      "go.chemical",
      "go.chemicals"
    ],
    "uri_format": "https://biopragmatics.github.io/providers/goche/$1"
  },

Check rdf_uri_format.

This does not entirely change the issue; it just adds an additional layer.
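To illustrate the distinction, here is a small Python sketch that expands a local ID against each of the two fields in the goche record above. It works on a plain dict copied from that record rather than calling the bioregistry library, and the helper name `expand` is made up for this example.

```python
# Fields copied from the goche record quoted above.
goche = {
    "example": "25512",
    "rdf_uri_format": "http://purl.obolibrary.org/obo/GOCHE_$1",
    "uri_format": "https://biopragmatics.github.io/providers/goche/$1",
}


def expand(record: dict, local_id: str, semantic: bool) -> str:
    """Expand a local ID to a semantic URI or to a URL meant for humans."""
    key = "rdf_uri_format" if semantic else "uri_format"
    return record[key].replace("$1", local_id)


print(expand(goche, goche["example"], semantic=True))
# http://purl.obolibrary.org/obo/GOCHE_25512
print(expand(goche, goche["example"], semantic=False))
# https://biopragmatics.github.io/providers/goche/25512
```

So a record can already keep the two apart when rdf_uri_format is filled in; the open question from the proposal is what happens for third-party renderings that remint IDs.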
