Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Especially for batched CURIEs, we need to return the requested CURIE? #1622

Closed
edeutsch opened this issue Aug 22, 2021 · 14 comments
Closed

Especially for batched CURIEs, we need to return the requested CURIE? #1622

edeutsch opened this issue Aug 22, 2021 · 14 comments
Assignees
Labels

Comments

@edeutsch
Copy link
Collaborator

In the current system for both KG2 and ARAX, I think when a CURIE is requested, we often return our canonical CURIE instead of the requested one. This seemed fine because it was mapped to the query node. BUT, in cases of batched CURIEs, this causes a big problem because the requestor which finds it very difficult to match the returned CURIEs with the list of requested CURIEs. Patrick and I has some Slack convo about this. And here's his resulting post:
https://togithub.com/NCATSTranslator/TranslatorArchitecture/issues/62

It seems to me that the solution is that if we synonymize an input CURIE (especially within a batch) to something else and process it, it seems that we probably better transform it back to what the requestor asked. Seems not too hard. Unless there's subclassing/superclassing involved, in which case that is not possible.

@amykglen
Copy link
Member

amykglen commented Aug 22, 2021

seems doable (except for subclassing/superclassing, as you mentioned), although would be a major change to how Expand works.

I saw that issue come in, but I don't understand why the ARA needs to know which returned curie corresponds to which curie in the batch list? they're all mapped to n0 (or whichever qnode). so what's the problem? ARAX functions fine without needing to know that correspondence.

in other words - I don't understand why step 3 in Patrick's example is happening: we never worry about mapping the returned curies to the input curies. we do canonicalize the returned curies, but if our synonymizer doesn't recognize a curie, it just leaves it as it is. (no comparison to the QG's curie list is made.)

@amykglen
Copy link
Member

ah, think I'm seeing the problem now, specifically for batch queries that are the second+ hop in a larger query. the edges from the prior hop may not be able to connect to edges in the second+ hop appropriately due to the different equivalent curies used. I think we haven't run into this really because we're still only using KG2 for second+ hops for speed reasons, so synonymization stays consistent.

@edeutsch
Copy link
Collaborator Author

yeah, it seems like it would be kind to the client if our system (KG2 for sure and maybe ARAX too) takes the list of input CURIEs, translates them to our internal best_curies and retains a dict of that transformation, and then when all the results are ready, perform the reverse transformation so that the final results are using the original CURIEs in the request.

@amykglen
Copy link
Member

nice, yeah, was thinking of using the exact same technique - that way the internals of Expand actually don't need to be messed with at all really...

@amykglen amykglen self-assigned this Sep 2, 2021
@saramsey
Copy link
Member

saramsey commented Sep 8, 2021

see also NCATSTranslator/minihackathons#231

@amykglen
Copy link
Member

amykglen commented Sep 9, 2021

@edeutsch - can you point me to the synonymizer method that would work best for getting the names that correspond to a list of (non-canonical) curies (e.g., ["MESH:D010300", "CHEBI:46195"])?

ultimately I want to load the info into a dictionary like this:

{
  "MESH:D010300": "Parkinson Disease",
  "CHEBI:46195": "paracetamol"
}

I realize that that targeted of a synonymizer method probably doesn't exist, but just not sure which higher-level method would work best to get my hands on this info.

@edeutsch
Copy link
Collaborator Author

edeutsch commented Sep 9, 2021

yeah, there isn't a method that will do exactly what you want, although there could be I suppose. I think the easiest way to get what you want with existing methods to call the main "give be everything" method:

equivalence = synonymizer.get_normalizer_results(entities)

You can test on the command line with:

python3 node_synonymizer.py --lookup MESH:D010300,CHEBI:46195

The resulting structure is a dict with the curies that you asked for and then below that you would need to iterate through "nodes" to find the query curie and find your information in:

      {
        "category": "biolink:NamedThing",
        "identifier": "MESH:D010300",
        "label": "Parkinson Disease",
        "original_label": "Parkinson Disease"
      },

Not elegant, but workable?

@edeutsch
Copy link
Collaborator Author

(it's possible that you might not find it after iterating through nodes, in which case you'd need a backup plan)

@amykglen
Copy link
Member

amykglen commented Sep 10, 2021

yep, that works! thanks, @edeutsch.

changes for this are complete in master - also confirmed that the issue reported in https://togithub.com/NCATSTranslator/minihackathons/issues/231 is fixed.

will verify it looks good on production when master is next rolled out.

@edeutsch
Copy link
Collaborator Author

master is now deployed everywhere

@amykglen
Copy link
Member

thanks! discovered that remapped nodes weren't being decorated with attributes properly - now fixed in master.

strangely the full query in https://togithub.com/NCATSTranslator/minihackathons/issues/231 is now encountering an error during the Overlay FET step, written up in #1643.

@edeutsch
Copy link
Collaborator Author

edeutsch commented Sep 11, 2021

master rolled out to production and everywhere for testing

@amykglen
Copy link
Member

confirmed it looks good on production - an example with the same KEGG.COMPOUND reported in the minihackathon issue is here: https://arax.ncats.io/?r=24582

think this issue is safe to close, though the query for https://togithub.com/NCATSTranslator/minihackathons/issues/231 currently isn't working due to #1643, which seems to be an unrelated problem as far as I can tell.

@edeutsch
Copy link
Collaborator Author

outstanding, thank you! closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants