[5] Allow batch mode in the Pages API #40
Comments
This feature would help me a lot as I need a mapping of a huge list of NCBI taxids to EOL page ids.
Yes, sorry, I forgot to add that the API for 'Search By Provider' should be batch-ified too, if possible. That would help both my use case and that of https://github.com/iimog
Let's make separate tickets for each, so we can manage the tasks. I'm going to modify this one to be the pages API only, as an initial test case.
Thanks. Easiest batch-ification might be for 'Search by provider', as it only returns a very simple integer value (well, actually 2 values, but one is redundant, and can probably be scrapped).
For the pages API, I think we need to discuss 2 points:
For point 1) I imagine we would return a JSON array, each element of which corresponds to one of the IDs passed in, in which case an 'incorrect' ID would need to return a blank/null value in the appropriate slot in the array. Alternatively we could return an associative array with {ID1=>{data1}, ID2=>{data2}}, in which case we could simply ignore any wrong IDs.
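To make the two options for point 1) concrete, here is a minimal Python sketch of both response shapes. The function names, the sample records, and the nonexistent ID 999 are all made up for illustration; nothing here is the actual EOL implementation.

```python
import json


def batch_as_list(requested_ids, found):
    """Positional form: one slot per requested ID, None (JSON null) for unknown IDs."""
    return [found.get(page_id) for page_id in requested_ids]


def batch_as_mapping(requested_ids, found):
    """Keyed form: only IDs that resolved appear; unknown IDs are simply omitted."""
    return {str(page_id): found[page_id] for page_id in requested_ids if page_id in found}


# Hypothetical lookup results; 999 stands in for an 'incorrect' ID.
found = {1: {"name": "page one"}, 2: {"name": "page two"}}
requested = [1, 999, 2]

print(json.dumps(batch_as_list(requested, found)))
print(json.dumps(batch_as_mapping(requested, found)))
```

With the positional form the caller matches results to IDs by index and has to check for null; with the keyed form the caller looks up each ID and treats a missing key as "not found".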
For point 2), that's what I meant in the original opening post when I said that it should be possible to turn off the "taxonConcepts" part of the result when calling the pages or data objects API. In general, I think we would want to use exactly the same format as the normal (non-batch) API, which saves having to create extra documentation, etc. So I think the first step is to add some extra params to the normal API which allow the user to slim down the returned response. It's not too bad at the moment, since we can set "details: False", "iucn: False", etc. But the following changes to the normal API would help:
While we are at it, could we also add an option to the vettedStatus parameter to return only unreviewed content (perhaps value = 3)? This is useful for checking content to review (also, come to think of it, something to return untrusted content might be useful for checking on EoL accuracy). Should I open new git issues for each of these 3 proposed changes to the normal API, or can they all be followed here? For the searchbyprovider API it is much simpler, since only a single value needs to be returned.
Thanks for your reply.
Also we need a way to pass a number of IDs into the API. At the moment, the ID is hard coded into the URL, as in
I guess we don't want to encode this in the URL name. E.g. separating with a vertical bar (pipe character, %7C) in the file name looks bad to me:
perhaps we need something like
I use the pipe character (%7C) to separate numbers, as that is what is done in the licenses field. But some other separator could be used, or we could even repeat the parameter:
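As a rough illustration of how the two query-string styles compare, here is a small Python sketch using the standard library. The parameter name `id` and the three ID values are placeholders chosen for the example, not anything the API is known to use.

```python
from urllib.parse import urlencode, parse_qs

# Arbitrary placeholder IDs.
ids = [101, 102, 103]

# Option A: one parameter, values joined with a pipe (percent-encoded as %7C).
piped = urlencode({"id": "|".join(str(i) for i in ids)})

# Option B: repeat the parameter once per ID.
repeated = urlencode({"id": ids}, doseq=True)

print(piped)     # id=101%7C102%7C103
print(repeated)  # id=101&id=102&id=103

# Either form decodes back to the same list of ID strings on the receiving side.
from_piped = parse_qs(piped)["id"][0].split("|")
from_repeated = parse_qs(repeated)["id"]
print(from_piped == from_repeated)  # True
```

The piped form is shorter for long lists; the repeated form needs no agreed separator but grows by a few characters per ID.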
What's the status of this now? Has it been coded up, but not gone live, for instance? I can't see where the new code/documentation might be, and I think I'm not quite understanding the process of how issues move through the EoL machine.
Actually, it is waiting for the next deploy. It is already committed to the master code branch.
Cool, thanks. What calling format did you choose eventually? Oh, and when is the next deploy likely to be? Oh, and finally (sorry) does this also apply to the Search_by_provider API?
For the format, we now have 2 extra flags: one for batch mode and one for whether taxonomy is included in the result. When you choose batch mode, you can enter multiple page IDs separated by ",".
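For anyone with a huge list of IDs, a client-side sketch of how the comma-separated format might be used is given below. The flag names `batch` and `taxonomy`, the parameter name `id`, and the chunk size are assumptions made for illustration; the thread does not state the real names or any limit on IDs per call.

```python
from urllib.parse import urlencode


def batch_queries(page_ids, batch_size=50):
    """Yield one query string per chunk of at most batch_size IDs."""
    for start in range(0, len(page_ids), batch_size):
        chunk = page_ids[start:start + batch_size]
        yield urlencode(
            {
                "batch": "true",      # assumed name of the batch-mode flag
                "taxonomy": "false",  # assumed name of the taxonomy on/off flag
                "id": ",".join(str(i) for i in chunk),
            },
            safe=",",  # keep the comma separator readable instead of %2C
        )


# Seven placeholder IDs split into chunks of three.
queries = list(batch_queries(list(range(1, 8)), batch_size=3))
for q in queries:
    print(q)
```

Each yielded string would be appended to the pages API endpoint, so seven IDs cost three calls here instead of seven.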
Thanks. Perfect. Should I open another issue for getting an identical thing coded for the data_objects and search_by_provider APIs?
Yes, please
Just done so (see above). What is the ETA for the next deploy, by the way?
Thanks.
The newest code will be released tomorrow (2016-01-28), after the downtime.
On Wed, Jan 27, 2016 at 9:01 AM, AmrMMorad notifications@github.com wrote:
For querying a large number of pages or data objects, it is slow and costly to make multiple API calls. It would be good if the pages and data_objects APIs could take an array of page IDs or data_object IDs and return a list of results.
To help with this, it would be sensible to be able to minimise the amount of data returned from a pages or data_object API query. At the moment, both types of query always return an (often large) "taxonConcepts" array. It would be sensible to have a parameter to set to true/false which can be used to turn this off, to save bandwidth / EoL effort.