
Speed up BETYdb API #516

Open · 3 tasks
dlebauer opened this issue May 25, 2017 · 6 comments

Comments

@dlebauer (Member) commented May 25, 2017

By "speed up" I mean getting to within an order of magnitude of the equivalent SQL query.

This should resolve within a minute even if limit=none: https://terraref.ncsa.illinois.edu/bety-test/api/beta/search?key=9999999999999999999999999999999999999999&limit=1

Two suggestions I've heard (see the sketch after this list):

  1. "Figure out how to stream the results rather than holding all the data in memory."
  2. "Replace the Ruby JSON creator under the API with a SQL query that exports JSON directly."
    • This sounds like a massive undertaking ... it only seems realistic if it is easier than it sounds ...
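For concreteness, here is a minimal sketch that combines both ideas: PostgreSQL serializes each row to JSON via row_to_json, and the rows are streamed to the client one at a time so Ruby never holds the full result set in memory. This assumes Rails 4+ (for ActionController::Live) and the pg gem's single-row mode; the controller name is hypothetical, and this is not the actual BETYdb code.

# Hypothetical sketch, not the actual BETYdb implementation.
class SearchController < ApplicationController
  include ActionController::Live # enables chunked response streaming

  def index
    response.headers["Content-Type"] = "application/json"
    raw = ActiveRecord::Base.connection.raw_connection # underlying PG::Connection
    raw.send_query("SELECT row_to_json(t) FROM traits_and_yields_view AS t")
    raw.set_single_row_mode # fetch one row at a time instead of buffering all rows
    response.stream.write("[")
    first = true
    while (result = raw.get_result)
      result.each_row do |row|
        response.stream.write(",") unless first
        first = false
        response.stream.write(row.first) # row_to_json output is already a JSON string
      end
      result.clear
    end
    response.stream.write("]")
  ensure
    response.stream.close
  end
end

With this approach memory use stays flat regardless of result size, though total transfer time is still bounded by how fast PostgreSQL can produce the rows.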

Completion Criteria

@shuklaneerajdev commented

Please use pagination to implement this; it's standard practice for APIs. See this link for details:
https://stackoverflow.com/questions/13872273/api-pagination-best-practices

You can use this gem for a standard implementation (a sketch follows below):
https://github.com/davidcelis/api-pagination
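As a rough illustration of what that gem provides (the controller is hypothetical, and this assumes kaminari or will_paginate is installed as the paginator backend, per the gem's README):

# Gemfile
gem "api-pagination"
gem "kaminari" # api-pagination delegates the actual paging to kaminari or will_paginate

# app/controllers/search_controller.rb (hypothetical)
class SearchController < ApplicationController
  def index
    results = TraitsAndYieldsView.all
    # paginate renders one page of results and adds RFC 5988 Link headers
    # with first/prev/next/last page URLs for the client to follow
    paginate json: results, per_page: 500
  end
end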

@dlebauer (Member, Author) commented

@Luckn0wLe0pard if we implemented pagination, would it require sending one request per page, e.g. appending page=1, page=2, etc. to the URL? We can already do that with limit=5000, then offset=5000&limit=5000, offset=10000&limit=5000, and so on, and even then the response is very slow.

@gauravsoti1 commented

Could you please provide the database schema and describe exactly how the query is being performed?

@dlebauer (Member, Author) commented

@gauravsoti1 the schema can be found at https://www.betydb.org/schemas?partial=relationships. Can you clarify what information you are looking for with 'how exactly the query is performed'?

@shuklaneerajdev commented

Well, you need to implement pagination in every API, since responses can be very large and the client may not be able to either:
1. wait for such a huge response, or
2. handle such a huge response once it arrives.

Pagination is the preferred approach. It is better than raw offset/limit parameters because the client does not need to track offsets and limits itself: it simply follows the "previous" and "next" URLs returned with each page, which already embed the appropriate position. For example:

{
  "data": [
    { data item 1 with all relevant fields },
    { data item 2 },
    ...
    { data item 100 }
  ],
  "paging": {
    "previous": "http://api.example.com/foo?since=TIMESTAMP1",
    "next": "http://api.example.com/foo?since=TIMESTAMP2"
  }
}

Since the call is slow regardless of pagination, maybe we can find the root cause by analysing the schema and the call stack to see where most of the time is being spent.
I suggest starting with the queries. Something like https://github.com/ankane/pghero can help at a broad level, since it gives an overview of where query time goes (see the mounting sketch below). You can see an example of what it looks like here:
https://pghero.dokkuapp.com/datakick/queries
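For reference, a minimal sketch of wiring pghero into a Rails app, following its README (the mount path is arbitrary):

# Gemfile
gem "pghero"

# config/routes.rb
Rails.application.routes.draw do
  # mounts the PgHero dashboard, which surfaces slow queries and index usage
  mount PgHero::Engine, at: "pghero"
end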

@shuklaneerajdev commented

Also, looking at the code, the search results come from here:
https://github.com/PecanProject/bety/blob/master/app/models/traits_and_yields_view.rb

Clearly the search is performed over too many fields, which is making it slow.

SEARCH_FIELDS = %w{ scientificname commonname trait
                    trait_description city sitename author
                    citation_year cultivar entity }

The time seen by the client can be reduced in two ways (see the index sketch after this list):
1. If the client searches by trait_description most of the time, a separate API can be created for that field and heavily optimised, e.g. by adding an index on trait_description.
2. If the search really is over all of those fields, then we should see which of them can be indexed to optimise the overall performance.
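As a rough sketch of option 2: if the search uses ILIKE '%term%' pattern matching (an assumption), ordinary btree indexes will not help, but trigram GIN indexes can. The migration name and table/column pairs below are illustrative guesses, not the real schema; the indexes would have to go on the base tables behind traits_and_yields_view, since a plain view cannot be indexed.

# Hypothetical migration; table and column names are guesses.
class AddTrigramIndexesForSearch < ActiveRecord::Migration
  def up
    enable_extension "pg_trgm" # provides the gin_trgm_ops operator class
    # Trigram GIN indexes let PostgreSQL serve ILIKE '%term%' predicates
    # without a full sequential scan.
    execute <<-SQL
      CREATE INDEX index_variables_on_description_trgm
        ON variables USING gin (description gin_trgm_ops);
      CREATE INDEX index_sites_on_sitename_trgm
        ON sites USING gin (sitename gin_trgm_ops);
    SQL
  end

  def down
    execute "DROP INDEX index_variables_on_description_trgm"
    execute "DROP INDEX index_sites_on_sitename_trgm"
  end
end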

finist added a commit to finist/bety that referenced this issue Aug 28, 2017