Elasticsearch enable Point In Time based searches #30824
Run Java_ElasticSearch_IO_Direct PreCommit
…ion entry points for PIT sort properties
Run Java_ElasticSearch_IO_Direct PreCommit
Run Java_ElasticSearch_IO_Direct PreCommit
Assigning reviewers. If you would like to opt out of this review, comment R: @Abacn for label java. Available commands:
The PR bot will only process comments in the main thread (not review comments).
Reminder, please take a look at this PR: @Abacn @johnjcasey
Sorry for the delay, taking a look.
This change adds a BoundedReader implementation based on the newer Point In Time (PIT) search API (see doc). This API is the recommended mode for reading from large indexes where retrieving documents requires deep iteration (more than 10000 hits per slice).
This mode should be available for Elasticsearch clusters running version 8 and up.
The default read implementation is based on Scrolls. When the index is big enough that a slice contains more than 10000 documents to iterate, read operations fail. With the PIT implementation this is not a problem, since ES does not have to keep track of the whole state of the index while iterating, only the current window for the PIT iterator.
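For reviewers less familiar with the pattern, here is a minimal in-memory sketch of the "current window" iteration described above. It is not the code in this PR and does not call Elasticsearch: `SearchAfterSketch`, `fetchPage`, and `readAll` are hypothetical names, and a sorted list of integers stands in for documents ordered by their sort values. The only state carried between pages is the last sort value seen, which is roughly how a PIT reader that pages with `search_after` can go past 10000 hits.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: simulates cursor-based paging over a sorted key list.
public class SearchAfterSketch {

    // Returns up to pageSize keys strictly greater than searchAfter.
    // A null searchAfter means "start from the beginning".
    static List<Integer> fetchPage(List<Integer> sortedKeys, Integer searchAfter, int pageSize) {
        List<Integer> page = new ArrayList<>();
        for (Integer key : sortedKeys) {
            if (searchAfter != null && key <= searchAfter) {
                continue; // already returned in an earlier page
            }
            page.add(key);
            if (page.size() == pageSize) {
                break;
            }
        }
        return page;
    }

    // Reads every key by repeatedly asking for the page after the last key seen.
    static List<Integer> readAll(List<Integer> sortedKeys, int pageSize) {
        List<Integer> out = new ArrayList<>();
        Integer cursor = null;
        while (true) {
            List<Integer> page = fetchPage(sortedKeys, cursor, pageSize);
            if (page.isEmpty()) {
                break; // no more hits after the cursor
            }
            out.addAll(page);
            cursor = page.get(page.size() - 1); // only state kept between pages
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            keys.add(i);
        }
        System.out.println(readAll(keys, 10).size());
    }
}
```

The sketch assumes sort values are unique and totally ordered, so that "strictly greater than the cursor" never skips or repeats an element. In a real index that usually means adding a tiebreaker to the sort.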
Adding @lord-skinner from Elastic.co for visibility.