handling ID-prefixes in BTE in a consistent, straightforward way #591

Open
colleenXu opened this issue Mar 22, 2023 · 4 comments
Labels: documentation, enhancement, needs discussion, x-bte

Comments

@colleenXu
Collaborator

colleenXu commented Mar 22, 2023

BTE currently:

  • in most cases, represents entity IDs (a specific disease, gene, etc.) in CURIE format (prefix:ID), using the Translator/biolink-model capitalization and spelling for the prefix + ID
  • but in a few cases, including sub-query generation, removes the prefix for specific ID-namespaces while keeping it for others
    • we're not sure we've found all the areas where this happens, or all the areas where the ID must be in a certain format for things to function
    • sub-query generation and api-response-transform are two known important areas, because the APIs we're querying may not use ID-prefixes, or may not use the capitalization/spelling that Translator does

This is confusing and can easily lead to bugs. With the current big code update, it's not clear whether there are bugs or unexpected behavior with ID-prefixes, which prompted the discussion and this issue.


2023-03-22 discussions in group meeting and afterwards:

  • getting the shared understanding written above
  • I propose that in the case of sub-query generation (specifically the format of string values in queryInputs), BTE should remove the prefix from all ID-namespaces (rather than keeping prefixes for some)
    • It'll be easier to adjust the x-bte annotation because I can then search repos for the prefixes in the linked list and adjust only those operations
  • @tokebe said we'll still need to review what's going on with ID-prefix handling before making decisions / code-changes.
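
A minimal sketch of the proposal above (remove the prefix from every ID-namespace when building queryInputs), assuming a CURIE is split at its first colon. The name `stripPrefix` and the sample IDs are hypothetical; this is not BTE's actual sub-query code.

```typescript
// Hypothetical helper, not part of BTE: splits a CURIE at the first colon
// and returns only the local ID, whichever namespace it belongs to.
function stripPrefix(curie: string): string {
  const idx = curie.indexOf(":");
  // No colon means the value is already un-prefixed; return it unchanged.
  return idx === -1 ? curie : curie.slice(idx + 1);
}

// Uniform treatment: every namespace loses its prefix before templating.
const queryInputs = ["NCBIGene:1017", "HP:0000360", "CHEBI:1234"].map(stripPrefix);
console.log(queryInputs.join(", ")); // 1017, 0000360, 1234
```

Under this scheme, an operation whose API expects a prefixed ID would have to add the prefix back explicitly in its x-bte annotation, which is the adjustment described in the proposal.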

Seems like a design decision between @tokebe and me, although @andrewsu and @newgene can weigh in.


Notes:

@colleenXu
Collaborator Author

Also a consideration: do pending APIs (and Multiomics / Text-mining ones) need to keep both an ontology-specific field (where the value is probably not prefixed) and a general "ID" field for nodes?

Like here (the id vs HP field):

  "object": {
    "HP": "0000360",
    "id": "HP:0000360",
    "name": "Tinnitus",
    "type": "biolink:PhenotypicFeature"
  }
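
For illustration, the ontology-specific field in a record like this can be derived from the general `id` field, which is one way to frame whether both need to be stored. A sketch only, assuming `id` is a CURIE split at its first colon; `NodeRecord` and `withNamespaceField` are hypothetical names, not part of any existing API or parser.

```typescript
// Hypothetical shape for a node in an API record (field names follow the example above).
interface NodeRecord {
  id: string;
  name: string;
  type: string;
  [namespaceField: string]: string;
}

// Adds a field named after the CURIE prefix, holding the un-prefixed local ID.
function withNamespaceField(node: NodeRecord): NodeRecord {
  const idx = node.id.indexOf(":");
  if (idx === -1) return node; // not a CURIE; nothing to derive
  const prefix = node.id.slice(0, idx);
  const localId = node.id.slice(idx + 1);
  return { ...node, [prefix]: localId };
}

const tinnitus = withNamespaceField({
  id: "HP:0000360",
  name: "Tinnitus",
  type: "biolink:PhenotypicFeature",
});
console.log(tinnitus["HP"]); // 0000360
```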

@colleenXu
Collaborator Author

CC @tokebe @andrewsu @newgene

BTE previously automatically added the prefix to RHEA IDs, and at some point, stopped (this is the confusing part!)

I only noticed today while checking on another issue (primary knowledge source). I tested and made this commit to add the needed prefix for operations using RHEA IDs as input: NCATS-Tangerine/translator-api-registry@d78fd51

I don't know the scope of this issue without doing a review of all the x-bte operations...my own musings below:

  • the comments I left on (almost all?) yamls should help me find these cases where we relied on BTE automatically adding the prefix....or I could go through all the operations using https://github.com/biothings/biomedical_id_resolver.js/blob/main/src/config.ts#L4 ID-namespaces as input (dunno if output is an issue too?)
  • is there any issue with using addPrefix()? using replPrefix would be safer (avoid cases of adding prefix twice like "CHEBI:CHEBI:1234")
    • this is an easy find-replace. looks like only two multiomics yamls are currently using addPrefix: drug-response and ehr risk
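
The double-prefix concern can be sketched as follows. Both functions are hypothetical stand-ins written for illustration; they are not the implementations of the `addPrefix` / `replPrefix` filters used in the yamls, and the assumption that a prefix is everything before the first colon is mine.

```typescript
// Naive version: always prepends, so an already-prefixed ID gets doubled.
function addPrefixNaive(id: string, prefix: string): string {
  return `${prefix}:${id}`;
}

// Safer version: drop any existing prefix (text before the first colon),
// then add the wanted one, so applying it twice changes nothing.
function replacePrefix(id: string, prefix: string): string {
  const idx = id.indexOf(":");
  const localId = idx === -1 ? id : id.slice(idx + 1);
  return `${prefix}:${localId}`;
}

console.log(addPrefixNaive("CHEBI:1234", "CHEBI")); // CHEBI:CHEBI:1234
console.log(replacePrefix("CHEBI:1234", "CHEBI")); // CHEBI:1234
console.log(replacePrefix("1234", "CHEBI")); // CHEBI:1234
```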

@colleenXu
Collaborator Author

Noticed an issue with GO prefixes and fixed it by ensuring the prefix is added to sub-queries: NCATS-Tangerine/translator-api-registry@3703119

@colleenXu
Collaborator Author

Noticed an issue with the MP prefix and fixed it by ensuring the prefix is added to sub-queries: NCATS-Tangerine/translator-api-registry@72748a1

tokebe added the enhancement label Aug 1, 2023
colleenXu added the bug and documentation labels Aug 10, 2023
andrewsu removed the bug label Sep 6, 2023
tokebe self-assigned this Sep 6, 2023