Especially for batched CURIEs, we need to return the requested CURIE? #1622

edeutsch · 2021-08-22T04:57:24Z

In the current system for both KG2 and ARAX, I think when a CURIE is requested, we often return our canonical CURIE instead of the requested one. This seemed fine because it was mapped to the query node. BUT, in cases of batched CURIEs, this causes a big problem because the requestor finds it very difficult to match the returned CURIEs with the list of requested CURIEs. Patrick and I had a Slack conversation about this, and here's his resulting post:
https://togithub.com/NCATSTranslator/TranslatorArchitecture/issues/62

It seems to me that the solution is this: if we synonymize an input CURIE (especially within a batch) to something else and process it, we should probably transform it back to what the requestor asked for. Seems not too hard, unless there's subclassing/superclassing involved, in which case that is not possible.

amykglen · 2021-08-22T17:05:22Z

seems doable (except for subclassing/superclassing, as you mentioned), although it would be a major change to how Expand works.

I saw that issue come in, but I don't understand why the ARA needs to know which returned curie corresponds to which curie in the batch list? they're all mapped to n0 (or whichever qnode). so what's the problem? ARAX functions fine without needing to know that correspondence.

in other words - I don't understand why step 3 in Patrick's example is happening: we never worry about mapping the returned curies to the input curies. we do canonicalize the returned curies, but if our synonymizer doesn't recognize a curie, it just leaves it as it is. (no comparison to the QG's curie list is made.)

amykglen · 2021-08-22T20:06:40Z

ah, think I'm seeing the problem now, specifically for batch queries that are the second+ hop in a larger query. the edges from the prior hop may not be able to connect to edges in the second+ hop appropriately due to the different equivalent curies used. I think we haven't run into this really because we're still only using KG2 for second+ hops for speed reasons, so synonymization stays consistent.

edeutsch · 2021-08-22T20:17:45Z

yeah, it seems like it would be kind to the client if our system (KG2 for sure and maybe ARAX too) takes the list of input CURIEs, translates them to our internal best_curies, and retains a dict of that transformation. Then, when all the results are ready, it performs the reverse transformation so that the final results use the original CURIEs in the request.
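
A minimal sketch of that round trip, assuming results are held as a dict keyed by canonical CURIE. The function names, the `canonicalize` callable, and the `EXAMPLE:` identifiers are all hypothetical stand-ins for illustration, not the actual Expand or synonymizer code:

```python
from typing import Callable, Dict, List, Tuple


def build_curie_maps(
    input_curies: List[str], canonicalize: Callable[[str], str]
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Return (input -> canonical, canonical -> [inputs]) for a batch."""
    forward: Dict[str, str] = {}
    reverse: Dict[str, List[str]] = {}
    for curie in input_curies:
        canonical = canonicalize(curie)
        forward[curie] = canonical
        # Several input CURIEs may collapse onto one canonical CURIE.
        reverse.setdefault(canonical, []).append(curie)
    return forward, reverse


def restore_input_curies(
    result_nodes: Dict[str, dict], reverse: Dict[str, List[str]]
) -> Dict[str, dict]:
    """Re-key result nodes by the CURIEs the requestor originally sent."""
    restored: Dict[str, dict] = {}
    for canonical, node in result_nodes.items():
        # Nodes that were never requested keep their canonical key.
        for original in reverse.get(canonical, [canonical]):
            restored[original] = node
    return restored


# Toy canonicalizer (made-up mapping, for illustration only).
toy_table = {"MESH:D010300": "EXAMPLE:A", "CHEBI:46195": "EXAMPLE:B"}
forward, reverse = build_curie_maps(
    ["MESH:D010300", "CHEBI:46195"], lambda c: toy_table.get(c, c)
)
results = {"EXAMPLE:A": {"name": "node A"}, "EXAMPLE:Z": {"name": "node Z"}}
restored = restore_input_curies(results, reverse)
```

Edges would need the same treatment on their subject and object, and a canonical CURIE that came from more than one input CURIE makes that edge remapping ambiguous, so this sketch only covers nodes.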

amykglen · 2021-08-23T00:26:36Z

nice, yeah, was thinking of using the exact same technique - that way the internals of Expand actually don't need to be messed with at all really...

saramsey · 2021-09-08T20:28:10Z

see also NCATSTranslator/minihackathons#231

amykglen · 2021-09-09T19:58:21Z

@edeutsch - can you point me to the synonymizer method that would work best for getting the names that correspond to a list of (non-canonical) curies (e.g., ["MESH:D010300", "CHEBI:46195"])?

ultimately I want to load the info into a dictionary like this:

{
  "MESH:D010300": "Parkinson Disease",
  "CHEBI:46195": "paracetamol"
}

I realize that a synonymizer method that targeted probably doesn't exist, but I'm just not sure which higher-level method would work best to get my hands on this info.

edeutsch · 2021-09-09T20:29:26Z

yeah, there isn't a method that will do exactly what you want, although there could be I suppose. I think the easiest way to get what you want with existing methods is to call the main "give me everything" method:

equivalence = synonymizer.get_normalizer_results(entities)

You can test on the command line with:

python3 node_synonymizer.py --lookup MESH:D010300,CHEBI:46195

The resulting structure is a dict with the curies that you asked for; below that, you would need to iterate through "nodes" to find the query curie, and your information is in an entry like:

      {
        "category": "biolink:NamedThing",
        "identifier": "MESH:D010300",
        "label": "Parkinson Disease",
        "original_label": "Parkinson Disease"
      },

Not elegant, but workable?

edeutsch · 2021-09-10T00:11:10Z

(it's possible that you might not find it after iterating through nodes, in which case you'd need a backup plan)
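
A rough sketch of that lookup, including the not-found case. It assumes each requested curie maps to a dict holding a "nodes" list whose entries have "identifier" and "label" keys, as in the excerpt above; the exact nesting is an assumption, and `names_for_curies` is a hypothetical helper, not an existing synonymizer method:

```python
from typing import Dict, List, Optional


def names_for_curies(
    equivalence: Dict[str, Optional[dict]], curies: List[str]
) -> Dict[str, Optional[str]]:
    """Map each query curie to the label of its own node entry, if any."""
    names: Dict[str, Optional[str]] = {}
    for curie in curies:
        # An unrecognized curie may be missing or map to None (assumption).
        entry = equivalence.get(curie) or {}
        label = None
        for node in entry.get("nodes", []):
            if node.get("identifier") == curie:
                label = node.get("label")
                break
        # None means the curie was not found among its nodes,
        # so the caller needs a backup plan for that curie.
        names[curie] = label
    return names


# Hand-built input in the assumed shape (not real synonymizer output).
sample = {
    "MESH:D010300": {
        "nodes": [
            {
                "category": "biolink:NamedThing",
                "identifier": "MESH:D010300",
                "label": "Parkinson Disease",
                "original_label": "Parkinson Disease",
            }
        ]
    },
    "CHEBI:46195": {"nodes": []},
}
names = names_for_curies(sample, ["MESH:D010300", "CHEBI:46195"])
```

In real use, `sample` would be replaced by the return value of `synonymizer.get_normalizer_results(entities)`, after checking that its structure matches the shape assumed here.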

amykglen · 2021-09-10T21:49:14Z

yep, that works! thanks, @edeutsch.

changes for this are complete in master - also confirmed that the issue reported in https://togithub.com/NCATSTranslator/minihackathons/issues/231 is fixed.

will verify it looks good on production when master is next rolled out.

edeutsch · 2021-09-10T22:41:36Z

master is now deployed everywhere

amykglen · 2021-09-11T02:22:11Z

thanks! discovered that remapped nodes weren't being decorated with attributes properly - now fixed in master.

strangely the full query in https://togithub.com/NCATSTranslator/minihackathons/issues/231 is now encountering an error during the Overlay FET step, written up in #1643.

edeutsch · 2021-09-11T03:26:40Z

master rolled out to production and everywhere for testing

amykglen · 2021-09-11T16:53:10Z

confirmed it looks good on production - an example with the same KEGG.COMPOUND reported in the minihackathon issue is here: https://arax.ncats.io/?r=24582

think this issue is safe to close, though the query for https://togithub.com/NCATSTranslator/minihackathons/issues/231 currently isn't working due to #1643, which seems to be an unrelated problem as far as I can tell.

edeutsch · 2021-09-12T17:37:31Z

outstanding, thank you! closing.

amykglen added the expand label Aug 23, 2021

amykglen self-assigned this Sep 2, 2021

This was referenced Sep 8, 2021

may need to have Expand reassign KG node names to the names associated with query CURIEs #1637

Closed

ARAX results "(S)-2-amino-4-(2-formamidophenyl)-4-oxobutanoic acid" when "kynurenine" is expected in D.4 NCATSTranslator/minihackathons#231

Closed

edeutsch mentioned this issue Sep 9, 2021

Do we need to conform to SRI Node Normalizer more? #1641

Closed

amykglen added a commit that referenced this issue Sep 10, 2021

Make KG nodes use input (QG) curies where applicable #1622

df1275b

amykglen added verify in next deployment and removed verify in next deployment labels Sep 10, 2021

amykglen added a commit that referenced this issue Sep 10, 2021

Handle KEGG vs. KEGG.COMPOUND discrepancy #1622

993da4b

amykglen added the verify in next deployment label Sep 10, 2021

amykglen mentioned this issue Sep 11, 2021

FET error with previously working Translator mini-hackathon query #1643

Closed

amykglen added a commit that referenced this issue Sep 11, 2021

Map back to input curies AFTER node decoration #1622

618a58f

amykglen removed the verify in next deployment label Sep 11, 2021

edeutsch closed this as completed Sep 12, 2021
