Using incompatible synonym sets #62

Open
patrickkwang opened this issue Aug 21, 2021 · 1 comment
Labels
bug Something isn't working

Comments

@patrickkwang
Contributor

We have a shared service that provides CURIE synonyms/equivalences: https://nodenormalization-sri-dev.renci.org/1.1/docs

However, we do not insist that all Translator components use it. There are at least a few components that use synonym sets that are inconsistent, in places, with those provided by the SRI normalizer.

One common but minor consequence is that an ARA cannot verify the results it receives from a KP when the two components use different synonym sets. If the ARA simply trusts the KP, it can end up in odd knowledge states in which entities are conflated or de-conflated(?) in unexpected ways. That's a bit troubling, but not a deal-breaker.

Recently, however, we've run into a more jarring issue arising from inconsistent synonymization. For computational convenience, TRAPI allows batching queries, typically by providing a list of CURIEs in the "ids" field of a query-graph node. When an ARA sends a batched query to a KP, it must afterward un-batch the response by identifying which results correspond to which sub-queries. This may be impossible if the KP and ARA use different synonymization schemes.

Example:

  1. ARA asks for genes associated with CHEBI:24996 or CHEBI:6801 (a batch query with two sub-queries)
  2. KP maps CHEBI:24996 to CHEMBL.COMPOUND:CHEMBL330546 and returns results including the latter CURIE
  3. ARA tries to map CHEMBL.COMPOUND:CHEMBL330546 to one of the two input CURIEs and fails - it knows what CHEMBL.COMPOUND:CHEMBL330546 is, and just doesn't believe it to be synonymous with CHEBI:24996
  4. ARA has no choice but to drop all of the CHEMBL.COMPOUND:CHEMBL330546 results

In this case, potential results were lost because the ARA and KP did not agree on synonym sets.
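
The un-batching failure above can be sketched in a few lines of Python. This is only an illustration, not how any particular ARA is implemented: the function names (`find_input_curie`, `unbatch`) and the hard-coded synonym sets are hypothetical, and a real component would get its synonym sets from whatever normalizer it uses.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical synonym sets as the ARA sees them (illustration only).
# The ARA knows CHEMBL.COMPOUND:CHEMBL330546 but keeps it in its own set.
ARA_SYNONYMS: List[Set[str]] = [
    {"CHEBI:24996"},
    {"CHEBI:6801"},
    {"CHEMBL.COMPOUND:CHEMBL330546"},
]


def find_input_curie(
    result_curie: str,
    input_curies: List[str],
    synonym_sets: List[Set[str]],
) -> Optional[str]:
    """Return the input CURIE that result_curie is synonymous with, if any."""
    for input_curie in input_curies:
        if result_curie == input_curie:
            return input_curie
        # Two CURIEs are synonymous if some synonym set contains both.
        if any(result_curie in s and input_curie in s for s in synonym_sets):
            return input_curie
    return None


def unbatch(
    input_curies: List[str],
    result_curies: List[str],
    synonym_sets: List[Set[str]],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Assign each returned CURIE to a sub-query; collect the ones that cannot be assigned."""
    grouped: Dict[str, List[str]] = {curie: [] for curie in input_curies}
    dropped: List[str] = []
    for result_curie in result_curies:
        match = find_input_curie(result_curie, input_curies, synonym_sets)
        if match is None:
            dropped.append(result_curie)
        else:
            grouped[match].append(result_curie)
    return grouped, dropped


batch = ["CHEBI:24996", "CHEBI:6801"]
kp_results = ["CHEMBL.COMPOUND:CHEMBL330546", "CHEBI:6801"]
grouped, dropped = unbatch(batch, kp_results, ARA_SYNONYMS)
```

With the ARA's synonym sets, `dropped` ends up holding `CHEMBL.COMPOUND:CHEMBL330546`, matching step 4. If the ARA's sets instead placed that CURIE in the same set as `CHEBI:24996` (as the KP's do), the same call would attribute it to the first sub-query.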

patrickkwang added the bug label Aug 21, 2021
@cbizon
Collaborator

cbizon commented Aug 23, 2021

After some discussion with Patrick, it seems like there are three independent ways to fix this:

  1. Require that KP responses don't change the input CURIE.
  2. Require that all KPs use the same normalization scheme.
  3. Modify how TRAPI handles batch requests, e.g. move from a single message with a list of CURIEs to a list of messages each with a single CURIE.
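
As a rough sketch of option 3, an ARA could do the split on its own side before sending. Apart from the "ids" field mentioned above, everything here is an assumption for illustration: the helper name `split_batched_query`, the node keys `n0`/`n1`, and the overall message shape are not taken from any specific TRAPI version.

```python
import copy
from typing import Any, Dict, List


def split_batched_query(message: Dict[str, Any], node_key: str) -> List[Dict[str, Any]]:
    """Turn one message with N ids on a node into N messages with one id each."""
    node = message["query_graph"]["nodes"][node_key]
    singles: List[Dict[str, Any]] = []
    for curie in node.get("ids") or []:
        single = copy.deepcopy(message)  # leave the original message untouched
        single["query_graph"]["nodes"][node_key]["ids"] = [curie]
        singles.append(single)
    return singles


# Hypothetical batched message (shape assumed for illustration).
batched = {
    "query_graph": {
        "nodes": {
            "n0": {"ids": ["CHEBI:24996", "CHEBI:6801"]},
            "n1": {},
        },
        "edges": {"e0": {"subject": "n0", "object": "n1"}},
    }
}

single_messages = split_batched_query(batched, "n0")
```

Because each resulting message carries exactly one input CURIE, every result in its response belongs to that CURIE regardless of which identifier the KP returns, so no synonym lookup is needed to un-batch. The cost is giving up the computational convenience that motivated batching.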
