resultsets fetch implementation #22
The fetch is implemented as a scroll and it will put fetch.size number of results within a single ResultSet. More results (if they exist) can be fetched using the Statement's getMoreResults() function, which is part of the JDBC specification. SQLWorkbench, for example, will execute this automatically and show a tab for each ResultSet (Result 1, Result 2, ..., Result N).
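In JDBC terms, consuming the extra ResultSets looks roughly like the sketch below. This is a hedged illustration, not sql4es source code: the connection URL, index name, and query are placeholders, and the exact URL format the driver accepts may differ.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class MoreResultsDemo {
    public static void main(String[] args) throws SQLException {
        // Hypothetical sql4es URL; adjust host, port and index to your setup.
        try (Connection conn = DriverManager.getConnection("jdbc:sql4es://localhost:9300/myindex");
             Statement stmt = conn.createStatement()) {
            boolean hasResultSet = stmt.execute("SELECT * FROM mytype");
            int batch = 0;
            while (hasResultSet) {
                try (ResultSet rs = stmt.getResultSet()) {
                    while (rs.next()) {
                        // process the current row here
                    }
                }
                batch++;
                // Advance to the next ResultSet, if the driver produced one.
                hasResultSet = stmt.getMoreResults();
            }
            System.out.println("Processed " + batch + " result set(s)");
        }
    }
}
```

This is essentially the loop SQLWorkbench runs for you when it shows Result 1, Result 2, ..., Result N as separate tabs.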
Got it, but it can't be achieved by means of SQL, which makes the feature inaccessible for a non-programmable client (simple SQL data source).
What exactly do you mean by 'means of SQL'? getMoreResults is part of the JDBC specification and clients can use it (SQLWorkbench does). Maybe it is better to approach this from another side: what are you trying to achieve, and with which client? Understanding this will enable me to help more efficiently, I think :) As a side note, implementing OFFSET or LIMIT X, Y is not planned at the moment. The SQL parser used (from the Presto project) does not even support parsing these, I think.
I have added an option (results.split) which can be used to control how results from Elasticsearch are put into ResultSet objects. The current default is that all results, no matter the fetch size, are put in a single ResultSet. Setting results.split to true will split the ResultSet into multiple ResultSet objects, each containing at most fetch.size records. I understand that this is not really an answer to your issue, as it does not implement a way to use offset (which Elasticsearch does not support either). But I hope it helps :) EDIT: Elasticsearch actually does support it, so it might be interesting to see if I can add it.
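Based on the property names mentioned above, enabling the split behaviour from a client could look like this. The property names come from this thread; the values and the connection URL are illustrative placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

public class SplitResultsDemo {
    // Builds the driver properties discussed above; names are taken from the
    // thread, values are example choices.
    static Properties buildProps() {
        Properties props = new Properties();
        props.setProperty("results.split", "true"); // one ResultSet per fetched batch
        props.setProperty("fetch.size", "1000");    // at most 1000 records per ResultSet
        return props;
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical URL; host, port and index are placeholders.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:sql4es://localhost:9300/myindex", buildProps())) {
            // execute queries here; iterate ResultSets via getMoreResults()
        }
    }
}
```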
Sounds great! That's what I actually wanted.
In ESQueryState, I think the limit is not calculated correctly in lines 138-148.
Wow, good call on this issue. I have implemented a fix to avoid setting the number of results to -1. If there is no limit or maxResults specified, the driver will fetch Integer.MAX_VALUE rows. I have explored the option to implement OFFSET using query.setFrom(N), and it turns out that its use is limited because Elasticsearch throws an exception when offset + limit > 10000. Given this limitation I have chosen not to implement OFFSET at this time. Offsetting a result will have to be done by the client by iterating over the results. Obviously this is not efficient, but ES does not really provide support for this scenario at this point.
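The constraint described above corresponds to Elasticsearch's result window (index.max_result_window, 10,000 by default): from + size must not exceed it. A client could validate a from/size pair before issuing setFrom. The helper below is hypothetical, not part of sql4es:

```java
public class WindowCheck {
    // Elasticsearch's default index.max_result_window.
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    /**
     * Returns true if a from/size (offset/limit) pair fits within the
     * default result window, i.e. from + size <= 10000.
     */
    static boolean fitsWindow(int from, int size) {
        return from >= 0 && size >= 0
                && (long) from + size <= DEFAULT_MAX_RESULT_WINDOW;
    }

    public static void main(String[] args) {
        System.out.println(fitsWindow(9_000, 1_000));  // true: 10000 <= 10000
        System.out.println(fitsWindow(9_001, 1_000));  // false: 10001 > 10000
    }
}
```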
Looks like there is another issue: when not specifying results.split/fetch.size in the URL, and with no limit provided in the query, I am getting an IndexOutOfBoundsException in Squirrel/WorkbenchJ/Drill.
Workbench's logs (Help -> View Logfile) could provide some information on the exception. One way this can happen is when your documents have more than 250 fields (including nested objects), which does not fit in the default array initialised to hold results. If this is the case you can increase the number of fields by setting the 'default.row.length' parameter.
I wonder if this is maybe coming back from ES itself? When I set a limit of 10,000 or 20,000, it doesn't happen and I get all results correctly. It only happens when the limit is really big (MAX_INT)... I've also found this thread which seems related: --Dan
Seems very likely. I actually assumed ES would fetch the MAX_INT results using the scroll in 10,000-row increments, but I do not think that is the case. Setting the size to 10,000 and executing the scroll in the driver until the LIMIT number of results, or the end of the result set, has been reached should fix this.
I have rebuilt this part and results are always fetched in batches using a scroll. Search results are added to a single ResultSet, or to multiple if configured to do so (by setting results.split or using setMaxRows(...) on the Statement). Splitting results is, generally speaking, the better approach since it avoids loading all results before handing them to the client. Tested up to 3.6M rows, with and without specifying a limit. I must say it is not particularly fast, at least not on my laptop. In the end Elasticsearch is not really built to just serve large numbers of records. Feel free to use the latest 0.8.2.4 release to test it.
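The batching behaviour described above (scrolling in fixed-size batches until the limit, or the end of the data, is reached) implies simple ceiling-division arithmetic for the number of scroll round-trips. A small sketch of that arithmetic, as a hypothetical helper rather than actual sql4es code:

```java
public class ScrollBatches {
    /**
     * Number of scroll round-trips needed to fetch `limit` rows in batches
     * of `batchSize` (ceiling division), assuming enough rows exist.
     */
    static long batchesNeeded(long limit, long batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be > 0");
        return (limit + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        // e.g. the 3.6M-row test mentioned above, at 10,000 rows per batch:
        System.out.println(batchesNeeded(3_600_000, 10_000)); // 360
        System.out.println(batchesNeeded(25_000, 10_000));    // 3
    }
}
```

This also explains why fetching millions of rows is slow: each batch is a separate round-trip to Elasticsearch.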
Yep, seems to be OK now :) --Dan
ES has supported offset since v0.9. I have added MySQL-style LIMIT with OFFSET support in both my fork of presto-parser and in es4sql; maybe this would be helpful.
Is the resultset FETCH implemented based on this concept https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-search-scrolling.html or is it restricted by the greater of index.max_result_window and fetch.size?
Currently I can't find a way to get the entire result set from a SELECT query if the result contains more than fetch.size (or index.max_result_window) rows because offsets are currently not implemented in sql4es.
Any hints/workarounds or plans for the future? :)