
Speed up BETYdb API #516

Open · 3 tasks
dlebauer opened this issue May 25, 2017 · 6 comments

Comments

@dlebauer (Member) commented May 25, 2017

By "speed up" I mean getting to within an order of magnitude of the equivalent SQL query.

This should resolve within a minute even if limit=none: https://terraref.ncsa.illinois.edu/bety-test/api/beta/search?key=9999999999999999999999999999999999999999&limit=1

Two suggestions I've heard (see the sketch after this list):

  1. "Figure out how to stream the results rather than holding all the data in memory."
  2. "Replace the Ruby JSON creator under the API with a SQL query that exports JSON directly."
    • This sounds like a massive undertaking ... it only seems realistic if it is easier than it sounds ...
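For concreteness, here is a minimal sketch that combines both ideas: PostgreSQL serializes each row to JSON via row_to_json, and the rows are streamed to the client one at a time so Ruby never holds the full result set in memory. This assumes Rails 4+ (for ActionController::Live) and the pg gem's single-row mode; the controller name is hypothetical, and this is not the actual BETYdb code.

# Hypothetical sketch, not the actual BETYdb implementation.
class SearchController < ApplicationController
  include ActionController::Live # enables chunked response streaming

  def index
    response.headers["Content-Type"] = "application/json"
    raw = ActiveRecord::Base.connection.raw_connection # underlying PG::Connection
    raw.send_query("SELECT row_to_json(t) FROM traits_and_yields_view AS t")
    raw.set_single_row_mode # fetch one row at a time instead of buffering all rows
    response.stream.write("[")
    first = true
    while (result = raw.get_result)
      result.each_row do |row|
        response.stream.write(",") unless first
        first = false
        response.stream.write(row.first) # row_to_json output is already a JSON string
      end
      result.clear
    end
    response.stream.write("]")
  ensure
    response.stream.close
  end
end

With this approach memory use stays flat regardless of result size, though total transfer time is still bounded by how fast PostgreSQL can produce the rows.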

Completion Criteria

@shuklaneerajdev commented

Please use pagination to implement this; it's standard practice for APIs. See this link for details:
https://stackoverflow.com/questions/13872273/api-pagination-best-practices

You can use this gem for a standard implementation (a sketch follows below):
https://github.com/davidcelis/api-pagination
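As a rough illustration of what that gem provides (the controller is hypothetical, and this assumes kaminari or will_paginate is installed as the paginator backend, per the gem's README):

# Gemfile
gem "api-pagination"
gem "kaminari" # api-pagination delegates the actual paging to kaminari or will_paginate

# app/controllers/search_controller.rb (hypothetical)
class SearchController < ApplicationController
  def index
    results = TraitsAndYieldsView.all
    # paginate renders one page of results and adds RFC 5988 Link headers
    # with first/prev/next/last page URLs for the client to follow
    paginate json: results, per_page: 500
  end
end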

@dlebauer (Member, Author) commented

@Luckn0wLe0pard if we implemented pagination, would it require sending one request per page, e.g. appending page=1, page=2, etc. to the URL? We can already do that with limit=5000, then offset=5000&limit=5000, offset=10000&limit=5000, and so on, and even then the response is very slow.

@gauravsoti1 commented

Could you please provide the database schema and describe exactly how the query is being performed?

@dlebauer (Member, Author) commented

@gauravsoti1 the schema can be found at https://www.betydb.org/schemas?partial=relationships. Can you clarify what information you are looking for with 'how exactly the query is performed'?

@shuklaneerajdev commented

Well, you need to implement pagination in every API, since responses can be very large and the client may not be able to either:
1. wait for such a huge response, or
2. handle such a huge response once it arrives.

Pagination is the preferred approach. It is better than raw offset/limit parameters because the client does not need to track offsets and limits itself: it simply follows the "previous" and "next" URLs returned with each page, which already embed the appropriate position. For example:

{
  "data": [
    { data item 1 with all relevant fields },
    { data item 2 },
    ...
    { data item 100 }
  ],
  "paging": {
    "previous": "http://api.example.com/foo?since=TIMESTAMP1",
    "next": "http://api.example.com/foo?since=TIMESTAMP2"
  }
}

Since the call is slow regardless of pagination, maybe we can find the root cause by analysing the schema and the call stack to see where most of the time is being spent.
I suggest starting with the queries. Something like https://github.com/ankane/pghero can help at a broad level, since it gives an overview of where query time goes (see the mounting sketch below). You can see an example of what it looks like here:
https://pghero.dokkuapp.com/datakick/queries
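For reference, a minimal sketch of wiring pghero into a Rails app, following its README (the mount path is arbitrary):

# Gemfile
gem "pghero"

# config/routes.rb
Rails.application.routes.draw do
  # mounts the PgHero dashboard, which surfaces slow queries and index usage
  mount PgHero::Engine, at: "pghero"
end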

@shuklaneerajdev commented

Also, looking at the code, the search results come from here:
https://github.com/PecanProject/bety/blob/master/app/models/traits_and_yields_view.rb

Clearly the search is performed over too many fields, which is making it slow.

SEARCH_FIELDS = %w{ scientificname commonname trait
                    trait_description city sitename author
                    citation_year cultivar entity }

The time seen by the client can be reduced in two ways (see the index sketch after this list):
1. If the client searches by trait_description most of the time, a separate API can be created for that field and heavily optimised, e.g. by adding an index on trait_description.
2. If the search really is over all of those fields, then we should see which of them can be indexed to optimise the overall performance.
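As a rough sketch of option 2: if the search uses ILIKE '%term%' pattern matching (an assumption), ordinary btree indexes will not help, but trigram GIN indexes can. The migration name and table/column pairs below are illustrative guesses, not the real schema; the indexes would have to go on the base tables behind traits_and_yields_view, since a plain view cannot be indexed.

# Hypothetical migration; table and column names are guesses.
class AddTrigramIndexesForSearch < ActiveRecord::Migration
  def up
    enable_extension "pg_trgm" # provides the gin_trgm_ops operator class
    # Trigram GIN indexes let PostgreSQL serve ILIKE '%term%' predicates
    # without a full sequential scan.
    execute <<-SQL
      CREATE INDEX index_variables_on_description_trgm
        ON variables USING gin (description gin_trgm_ops);
      CREATE INDEX index_sites_on_sitename_trgm
        ON sites USING gin (sitename gin_trgm_ops);
    SQL
  end

  def down
    execute "DROP INDEX index_variables_on_description_trgm"
    execute "DROP INDEX index_sites_on_sitename_trgm"
  end
end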

finist added a commit to finist/bety that referenced this issue Aug 28, 2017