-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Especially for batched CURIEs, we need to return the requested CURIE? #1622
Comments
seems doable (except for subclassing/superclassing, as you mentioned), although would be a major change to how Expand works. I saw that issue come in, but I don't understand why the ARA needs to know which returned curie corresponds to which curie in the batch list? they're all mapped to in other words - I don't understand why step 3 in Patrick's example is happening: we never worry about mapping the returned curies to the input curies. we do canonicalize the returned curies, but if our synonymizer doesn't recognize a curie, it just leaves it as it is. (no comparison to the QG's curie list is made.) |
ah, think I'm seeing the problem now, specifically for batch queries that are the second+ hop in a larger query. the edges from the prior hop may not be able to connect to edges in the second+ hop appropriately due to the different equivalent curies used. I think we haven't run into this really because we're still only using KG2 for second+ hops for speed reasons, so synonymization stays consistent. |
yeah, it seems like it would be kind to the client if our system (KG2 for sure and maybe ARAX too) takes the list of input CURIEs, translates them to our internal best_curies and retains a dict of that transformation, and then when all the results are ready, perform the reverse transformation so that the final results are using the original CURIEs in the request. |
nice, yeah, was thinking of using the exact same technique - that way the internals of Expand actually don't need to be messed with at all really... |
see also NCATSTranslator/minihackathons#231 |
@edeutsch - can you point me to the synonymizer method that would work best for getting the names that correspond to a list of (non-canonical) curies (e.g., ultimately I want to load the info into a dictionary like this:
I realize that that targeted of a synonymizer method probably doesn't exist, but just not sure which higher-level method would work best to get my hands on this info. |
yeah, there isn't a method that will do exactly what you want, although there could be I suppose. I think the easiest way to get what you want with existing methods to call the main "give be everything" method:
You can test on the command line with:
The resulting structure is a dict with the curies that you asked for and then below that you would need to iterate through "nodes" to find the query curie and find your information in:
Not elegant, but workable? |
(it's possible that you might not find it after iterating through nodes, in which case you'd need a backup plan) |
yep, that works! thanks, @edeutsch. changes for this are complete in will verify it looks good on production when |
|
thanks! discovered that remapped nodes weren't being decorated with attributes properly - now fixed in strangely the full query in https://togithub.com/NCATSTranslator/minihackathons/issues/231 is now encountering an error during the Overlay FET step, written up in #1643. |
|
confirmed it looks good on production - an example with the same KEGG.COMPOUND reported in the minihackathon issue is here: https://arax.ncats.io/?r=24582 think this issue is safe to close, though the query for https://togithub.com/NCATSTranslator/minihackathons/issues/231 currently isn't working due to #1643, which seems to be an unrelated problem as far as I can tell. |
outstanding, thank you! closing. |
In the current system for both KG2 and ARAX, I think when a CURIE is requested, we often return our canonical CURIE instead of the requested one. This seemed fine because it was mapped to the query node. BUT, in cases of batched CURIEs, this causes a big problem because the requestor which finds it very difficult to match the returned CURIEs with the list of requested CURIEs. Patrick and I has some Slack convo about this. And here's his resulting post:
https://togithub.com/NCATSTranslator/TranslatorArchitecture/issues/62
It seems to me that the solution is that if we synonymize an input CURIE (especially within a batch) to something else and process it, it seems that we probably better transform it back to what the requestor asked. Seems not too hard. Unless there's subclassing/superclassing involved, in which case that is not possible.
The text was updated successfully, but these errors were encountered: