[5] Allow batch mode in the Pages API #40

hyanwong · 2015-11-04T16:17:19Z

For querying a large number of pages or data objects, it is slow and costly to make multiple API calls. It would be good if the pages and data_objects APIs could take an array of page IDs or data_object IDs and return a list of results.

To help with this, it would be sensible to be able to minimise the amount of data returned from a pages or data_object API query. At the moment, both types of query always return an (often large) "taxonConcepts" array. It would be sensible to have a parameter to set to true/false which can be used to turn this off, to save bandwidth / EoL effort.

iimog · 2015-11-10T10:08:11Z

This feature would help me a lot as I need a mapping of a huge list of NCBI taxids to EOL page ids.
Using the API with one ID at a time is not feasible for over 700,000 IDs.

hyanwong · 2015-11-10T14:27:36Z

Yes, sorry, I forgot to add that the API for 'Search By Provider' should be batch-ified too, if possible. That would help both my use case and that of https://github.com/iimog

JRice · 2015-11-10T14:31:42Z

Let's make separate tickets for each, so we can manage the tasks. I'm going to modify this one to be the pages API only, as an initial test case.

hyanwong · 2015-11-10T14:34:17Z

Thanks. Easiest batch-ification might be for 'Search by provider', as it only returns a very simple integer value (well, actually 2 values, but one is redundant, and can probably be scrapped)

AmrMMorad · 2015-12-02T07:54:03Z

For pages API, I think we need to discuss 2 points:
1- If one of the page IDs is wrong, do you want an error (i.e. an error message and no results returned), or should we return the correct ones and "ignore" the wrong one?
2- Which values can be omitted from the response (i.e. the less important ones) to reduce the overhead?
Thanks!

hyanwong · 2015-12-02T09:48:37Z

For point 1) I imagine we would return a JSON array, each element of which corresponds to the IDs passed in, in which case an 'incorrect' ID would need to return a blank/null value in the appropriate slot in the array. Alternatively we could return an associative array with {ID1=>{data1}, ID2=>{data2}}, in which case we could simply ignore any wrong IDs.
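The two response shapes described above can be compared with a minimal sketch. It is only an illustration, not the EoL implementation: the function names and the `found` lookup table are hypothetical, and the page data is a placeholder.

```python
from typing import Any, Dict, List, Optional

PageData = Dict[str, Any]


def as_positional(ids: List[int], found: Dict[int, PageData]) -> List[Optional[PageData]]:
    """Option A: one slot per requested ID, in request order.

    An ID with no matching page yields None (null in JSON), so the
    caller can still line results up with the IDs it sent.
    """
    return [found.get(page_id) for page_id in ids]


def as_mapping(ids: List[int], found: Dict[int, PageData]) -> Dict[int, PageData]:
    """Option B: results keyed by ID; unknown IDs are simply left out."""
    return {page_id: found[page_id] for page_id in ids if page_id in found}


if __name__ == "__main__":
    # Placeholder data: 999 stands in for a wrong ID.
    found = {1045608: {"identifier": 1045608}, 328023: {"identifier": 328023}}
    requested = [1045608, 999, 328023]
    print(as_positional(requested, found))
    print(as_mapping(requested, found))
```

With the positional form the caller has to check each slot for null; with the keyed form the caller has to check which IDs are missing from the result.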

hyanwong · 2015-12-02T10:04:58Z

For point 2), that's what I meant in the original opening post when I said that it should be possible to turn off the "taxonConcepts" part of the result when calling the pages or data objects API. In general, I think we would want to use exactly the same format as the normal (non-batch) API, which saves having to create extra documentation, etc. So I think the first step is to add some extra params to the normal API which allow the user to slim down the returned response. It's not too bad at the moment, since we can set "details: False", "iucn: False", etc. But the following changes to the normal API would help:

taxonomy: true,false (default true) - removes taxonConcepts from the returned result (NB: the default on batch calls could be false)
~~subjects: all,... should also add the possibility of 'none', which perhaps should be the default for batch.~~ (sorry, my mistake, see [5] Implement true/false 'taxonomy' API parameter to reduce size of returned data in the API #60)

While we are at it, could we also add an option to the vettedStatus parameter to return only unreviewed content (perhaps value = 3)? This is useful for checking content to review (also, come to think of it, something to return untrusted content might be useful for checking on EoL accuracy). Should I open new git issues for each of these 3 proposed changes to the normal API, or can they all be followed here?

For the searchbyprovider API it is much simpler, since only a single value needs to be returned.

AmrMMorad · 2015-12-02T11:57:21Z

Thanks for your reply.
I think we need to create issues for these changes as, you know, these changes will be prerequisites for enabling batch mode in pages API. After we finish these, we could continue working on that.
Thanks!

hyanwong · 2015-12-02T11:59:06Z

@JRice OK to open 2 more issues to improve the pages & data_objects API, as a precursor to batch mode?

Edit: now done: #60 and #61. Also see #62

hyanwong · 2015-12-03T11:01:00Z

Also we need a way to pass a number of IDs into the API. At the moment, the ID is hard coded into the url, as in

http://eol.org/api/pages/1.0/1045608.json?images=2

I guess we don't want to encode this in the URL name. E.g. separating with a vertical bar (pipe character, %7C) in the file name looks bad to me:

http://eol.org/api/pages/1.0/1045608%7C328023%7C591753.json?images=2

perhaps we need something like

http://eol.org/api/pages/1.0/batch.json?images=2&pageIDs=1045608%7C328023%7C591753

I use the pipe character (%7C) to separate numbers, as that is what is done in the licenses field. But some other separator could be used, or we could even repeat the parameter:

http://eol.org/api/pages/1.0/batch.json?images=2&ID=1045608&ID=328023&ID=591753
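For reference, the two proposed query-string styles can be built with Python's standard library. This is a sketch of the proposals in this comment only: the `batch.json` endpoint and the `pageIDs` / `ID` parameter names are the suggestions made above, not a confirmed API, and the helper function names are made up for the example.

```python
from typing import Iterable
from urllib.parse import urlencode

# Proposed (not confirmed) batch endpoint from the comment above.
BASE = "http://eol.org/api/pages/1.0/batch.json"


def pipe_separated_url(page_ids: Iterable[int], images: int = 2) -> str:
    """All IDs in one parameter, joined with '|' (percent-encoded as %7C)."""
    joined = "|".join(str(page_id) for page_id in page_ids)
    return f"{BASE}?{urlencode({'images': images, 'pageIDs': joined})}"


def repeated_param_url(page_ids: Iterable[int], images: int = 2) -> str:
    """The same parameter name repeated once per ID."""
    pairs = [("images", images)] + [("ID", page_id) for page_id in page_ids]
    return f"{BASE}?{urlencode(pairs)}"


if __name__ == "__main__":
    ids = [1045608, 328023, 591753]
    print(pipe_separated_url(ids))
    print(repeated_param_url(ids))
```

Either way, a long ID list can run into URL length limits, which is one more reason to cap the number of IDs per call.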

hyanwong · 2016-01-27T10:58:55Z

What's the status of this now? Has it been coded up, but not gone live, for instance? I can't see where the new code/documentation might be, and I think I'm not quite understanding the process of how issues move through the EoL machine.

AmrMMorad · 2016-01-27T11:23:47Z

Actually, it is waiting for the next deploy. It is already committed to the master code branch.
Thanks

hyanwong · 2016-01-27T11:25:38Z

Cool, thanks. What calling format did you choose eventually? Oh, and when is the next deploy likely to be? Oh, and finally (sorry) does this also apply to the Search_by_provider API?

AmrMMorad · 2016-01-27T12:13:58Z

For the format, we now have 2 extra flags: one to turn on batch mode and one to control whether taxonomy is included in the result. When you choose batch mode, you can enter multiple page IDs separated by ",".
For now, this applies only to the pages API.
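A client with a very large ID list (such as the 700,000 IDs mentioned earlier in this thread) would presumably split it into chunks and send one comma-separated batch per call. The sketch below shows that client-side step only. The parameter names `batch`, `taxonomy` and `id`, and the chunk size, are assumptions for illustration; the comment above does not give the exact flag names or any size limit.

```python
from typing import Dict, Iterator, List, Sequence


def chunked(ids: Sequence[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of `ids`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def batch_params(ids: Sequence[int], taxonomy: bool = False) -> Dict[str, str]:
    """Build query parameters for one batch call.

    The keys 'batch', 'taxonomy' and 'id' are hypothetical names; only
    the comma-separated ID format comes from the comment above.
    """
    return {
        "batch": "true",
        "taxonomy": "true" if taxonomy else "false",
        "id": ",".join(str(page_id) for page_id in ids),
    }


if __name__ == "__main__":
    all_ids = [1045608, 328023, 591753]
    for chunk in chunked(all_ids, size=2):  # size=2 only to keep the demo short
        print(batch_params(chunk))
```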

hyanwong · 2016-01-27T13:44:05Z

Thanks. Perfect. Should I open another issue for getting an identical thing coded for the data_objects and search_by_provider APIs?

AmrMMorad · 2016-01-27T13:47:34Z

Yes, please
Thank you

hyanwong · 2016-01-27T13:58:20Z

Just done so (see above). What is the ETA for the next deploy, by the way?

AmrMMorad · 2016-01-27T14:01:27Z

Thanks.
Sorry I don't know the ETA for it. I think @JRice can answer this..

JRice · 2016-01-27T18:47:41Z

The newest code will be released tomorrow (2016-01-28) after the downtime (09:00 ET). So, if it was on the master branch as of 12:00 ET today, it will be included in the deploy.

JRice changed the title ~~Allow batch mode in the API~~ Allow batch mode in the Pages API Nov 10, 2015

JRice changed the title ~~Allow batch mode in the Pages API~~ [5] Allow batch mode in the Pages API Nov 10, 2015

AmrMMorad self-assigned this Dec 1, 2015

This was referenced Dec 2, 2015

[5] Implement true/false 'taxonomy' API parameter to reduce size of returned data in the API #60

Closed

[5] Pages & data objects APIs - remove empty returned arrays #62

Closed

JRice modified the milestone: 2015.12.08 Dec 8, 2015

JRice closed this as completed Dec 22, 2015

This was referenced Jan 27, 2016

[3] Implement batch mode for the search_by_provider API #117

Open

[5] Implement batch mode for the data_objects API #118

Open

jhammock mentioned this issue Feb 9, 2016

Test the load of batch API use and restrict sizes if needed #126

Closed

hyanwong mentioned this issue Feb 10, 2016

Create a file that maps all hierarchy entries to EOL taxon concept (page) ids #162

Open

jhammock added the data services label Mar 24, 2016

hyanwong mentioned this issue Mar 29, 2016

[3] Better data structure returned by Pages API batch mode: hash not array #232

Open
