
resultsets fetch implementation #22

Closed
gogosofta opened this issue Sep 15, 2016 · 14 comments

Comments

@gogosofta

gogosofta commented Sep 15, 2016

Is the resultset FETCH implemented based on this concept https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-search-scrolling.html or is it restricted by the greatest of index.max_result_window and fetch.size ?
Currently I can't find a way to get the entire result set from a SELECT query if the result contains more than fetch.size (or index.max_result_window) rows because offsets are currently not implemented in sql4es.
Any hints/workarounds or plans for the future? :)

@corneversloot
Member

The fetch is implemented as a scroll and it will put fetch.size number of results within a single ResultSet. More results (if they exist) can be fetched using the Statement's getMoreResults() function, which is part of the JDBC specification. SQLWorkbench for example will execute this automatically and show a tab for each ResultSet (Result 1, Result 2, ..., Result N).
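The getMoreResults() pattern described here can be sketched as follows. FakeStatement is a hypothetical stand-in for java.sql.Statement (so the example runs without a driver or a live cluster), but the do/while loop is the same one a JDBC client would use to drain all ResultSets:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// FakeStatement is a hypothetical stand-in for java.sql.Statement that
// hands out fetch.size-sized batches, the way sql4es splits a scroll.
class FakeStatement {
    private final List<List<String>> batches = new ArrayList<>();
    private int current = 0;

    FakeStatement(List<String> allRows, int fetchSize) {
        for (int i = 0; i < allRows.size(); i += fetchSize) {
            batches.add(allRows.subList(i, Math.min(i + fetchSize, allRows.size())));
        }
    }

    List<String> getResultSet() { return batches.get(current); }

    // Mirrors Statement.getMoreResults(): true while another ResultSet exists.
    boolean getMoreResults() { return ++current < batches.size(); }
}

public class FetchAll {
    public static void main(String[] args) {
        FakeStatement st = new FakeStatement(
                Arrays.asList("a", "b", "c", "d", "e", "f", "g"), 3);
        List<String> all = new ArrayList<>();
        do {
            all.addAll(st.getResultSet());  // consume the current ResultSet
        } while (st.getMoreResults());      // advance to the next one, if any
        System.out.println(all.size());     // 7
    }
}
```

A non-programmable client cannot issue this loop itself, which is exactly the limitation raised in the next comment.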

@gogosofta
Author

Got it, but this can't be achieved through SQL alone, which makes the feature inaccessible to a non-programmable client (a simple SQL data source).

@corneversloot
Member

What exactly do you mean by 'means of SQL'? The getMoreResults is part of the JDBC specification and clients can use it (SQLWorkbench does).

Maybe it is better to approach this from another side: what are you trying to achieve, and with which client? Understanding this will enable me to help more efficiently, I think :)

As a side note, implementing OFFSET or LIMIT X, Y is not planned atm. The SQL parser used (from the Presto project) does not even support parsing these, I think.

@corneversloot
Member

corneversloot commented Sep 17, 2016

I have added an option (results.split) which can be used to control how results from Elasticsearch are put in ResultSet objects. The current default is that all results, no matter the fetch size, are put in a single ResultSet. Setting results.split to true will split the ResultSet into multiple ResultSet objects each containing at most fetch.size number of records.

I understand that it is not really an answer to your issue as it does not implement a way to use offset (which Elasticsearch does not support either). But I hope it helps :)

EDIT: Elasticsearch actually does support it, so it might be interesting to see if I can add it.
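Assuming the usual sql4es JDBC URL form (host, port, and index, with driver properties passed as query parameters; the host and index below are placeholders), the new option would be combined with fetch.size like this:

```
jdbc:sql4es://localhost:9300/myindex?results.split=true&fetch.size=10000
```

With this URL each ResultSet would hold at most 10000 records, and a JDBC client would move through them via getMoreResults().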

@gogosofta
Author

Sounds great! That's what I actually wanted.
The idea is to be able to get all records from a select into a single ResultSet - what you've said now is the default case.

@dmarkhas

dmarkhas commented Sep 18, 2016

In ESQueryState, I think the limit is not calculated correctly in lines 138-148.
If neither a fetch size nor a limit has been provided, the default value for limit is used, which is -1. I guess this is what causes Drill to return 10 results, since 10 is the default number of results ES returns unless specified otherwise (and -1 is not a valid number).
When using WorkbenchJ, I see that it is calling the method setMaxRows with 0 (which according to JDBC spec should mean no limit), which then causes determineLimit to return 0...
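A minimal sketch of the limit logic being described (method and parameter names are hypothetical, not the actual sql4es source): the buggy path lets the -1 sentinel or JDBC's 0 ("no limit") leak through as the requested size, while a fix has to fall back to an explicit value:

```java
public class LimitFix {
    // Buggy sketch: the -1 default or setMaxRows(0) leaks through unchanged,
    // so ES either falls back to its default of 10 rows or gets a size of 0.
    static int determineLimitBuggy(int queryLimit, int maxRows) {
        if (queryLimit > 0) return queryLimit;
        return maxRows; // may be 0 (JDBC "no limit") or -1 (nothing set)
    }

    // Fixed sketch: fall back to an explicit "fetch everything" value,
    // matching the fix described in the next comment.
    static int determineLimitFixed(int queryLimit, int maxRows) {
        if (queryLimit > 0) return queryLimit;
        if (maxRows > 0) return maxRows;
        return Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(determineLimitBuggy(-1, 0));  // 0
        System.out.println(determineLimitFixed(-1, 0));  // 2147483647
    }
}
```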

@corneversloot
Member

Wow, good call on this issue. I have implemented a fix to avoid setting the number of results to -1. If there is no limit or maxResults specified, the driver will fetch Integer.MAX_VALUE rows.

I have explored the option to use OFFSET using query.setFrom(N) and it turns out that its use is limited because Elasticsearch throws an exception when offset + limit > 10000. Given this limitation I have chosen not to implement OFFSET at this time. Offsetting a result will have to be done by the client by iterating over the results. Obviously this is not efficient but ES does not really provide the support for this scenario at this point.
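The exception he mentions comes from Elasticsearch's index.max_result_window setting, which defaults to 10000: a from/size request is rejected whenever from + size exceeds that window. For example, against a default-configured index a request like this fails, because 9500 + 1000 > 10000:

```json
{
  "from": 9500,
  "size": 1000,
  "query": { "match_all": {} }
}
```

The window can be raised per index, but deep paging this way gets increasingly expensive, which is why the scroll API is the recommended route for walking a full result set.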

@dmarkhas

Looks like there is another issue: when not specifying results.split/fetch.size in the URL, and with no limit provided in the query, I am getting an IndexOutOfBoundsException in Squirrel/WorkbenchJ/Drill.
There's no stack trace though, so I'm not sure where it is coming from yet, but I'll try to dig around some more.


@corneversloot
Member

Workbench's logs (help -> view logfile) could provide some information on the Exception.

One way this can happen is when your documents have more than 250 fields (including nested objects), which do not fit in the default array initialised to hold results. If this is the case you can increase the number of fields by setting the 'default.row.length' parameter.

@dmarkhas

I wonder if this is maybe coming back from ES itself? When I set a limit of 10,000 or 20,000, it doesn't happen and I get all results correctly. It only happens when the limit is really big (MAX_INT)...

I've also found this thread which seems related:
https://discuss.elastic.co/t/getting-java-lang-arrayindexoutofboundsexception-while-searching/35756

--Dan


@corneversloot
Member

Seems very likely. I actually assumed ES would fetch the MAX_INT results using the scroll in 10.000 increments, but I do not think that is the case. Setting the size to 10.000 and executing the scroll in the driver until the LIMIT number of results, or the end of the result set, has been reached should fix this.
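The batched approach he describes can be simulated without Elasticsearch. In this sketch, fetchPage is a stand-in for one scroll call (it is not the real ES client API), and the loop stops at either the LIMIT or the end of the scroll, whichever comes first:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Simulation of the batched-scroll approach: fetch fixed-size pages until
// LIMIT rows are collected or the source is exhausted.
public class ScrollLoop {
    static List<Integer> fetchAll(Function<Integer, List<Integer>> fetchPage,
                                  int limit) {
        List<Integer> rows = new ArrayList<>();
        int page = 0;
        while (rows.size() < limit) {
            List<Integer> batch = fetchPage.apply(page++);
            if (batch.isEmpty()) break;          // end of the scroll
            for (Integer r : batch) {
                if (rows.size() >= limit) break; // honour the LIMIT
                rows.add(r);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // A fake 25-row index served in pages of 10.
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 25; i++) data.add(i);
        Function<Integer, List<Integer>> fetch = p ->
                data.subList(Math.min(p * 10, data.size()),
                             Math.min(p * 10 + 10, data.size()));
        System.out.println(fetchAll(fetch, Integer.MAX_VALUE).size()); // 25
        System.out.println(fetchAll(fetch, 7).size());                 // 7
    }
}
```

Because each page stays at the scroll size, Integer.MAX_VALUE as a limit no longer reaches Elasticsearch as a request size.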

@corneversloot
Member

I have rebuilt this part and results are now always fetched in batches using a scroll. Search results are added to a single ResultSet, or to multiple if configured to do so (by setting results.split or using setMaxRows(...) on the Statement). Splitting results is generally speaking the better approach since it avoids loading all results before handing them to the client. Tested up to 3.6M rows with and without specifying a limit. I must say it is not particularly fast, at least not on my laptop. In the end, Elasticsearch is not really built to just serve large numbers of records.

Feel free to use the latest 0.8.2.4 release to test it

@dmarkhas

Yep, seems to be OK now :)

--Dan


@aronica

aronica commented Mar 15, 2018

ES has supported offset since v0.9. I have added MySQL-style LIMIT with OFFSET support in my forks of presto-parser and es4sql; maybe this will be helpful.
