Elasticsearch enable Point In Time based searches #30824

prodriguezdefino · 2024-04-02T06:55:01Z

This change adds a BoundedReader implementation based on the newer Point In Time (PIT) search API (see doc). PIT is the recommended mode for reading from large indexes where retrieving the documents requires deep iteration (more than 10000 hits per slice).
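For readers unfamiliar with the API, here is a minimal sketch of the request body a PIT-based page read sends to Elasticsearch. It is not the code from this PR: the class `PitRequestSketch`, the method `searchBody`, and the sample values are hypothetical, and the sketch assumes the PIT id was obtained beforehand from `POST /<index>/_pit?keep_alive=<duration>` and that none of the values need JSON escaping.

```java
import java.util.List;

/** Hypothetical helper, not part of ElasticsearchIO: builds the JSON body of one PIT search page. */
final class PitRequestSketch {

  /**
   * Builds the body for POST /_search when paging over a point in time.
   * The request targets /_search without an index name, because the PIT id already pins the index.
   *
   * @param pitId id returned by the open-PIT call (assumed to need no JSON escaping)
   * @param keepAlive how long to extend the PIT, for example "5m"
   * @param size number of hits per page
   * @param sortField field used for a stable ascending sort
   * @param searchAfter sort values of the last hit of the previous page, already serialized as JSON
   *     literals; empty for the first page
   */
  static String searchBody(
      String pitId, String keepAlive, int size, String sortField, List<String> searchAfter) {
    StringBuilder body = new StringBuilder();
    body.append("{\"size\":").append(size)
        .append(",\"pit\":{\"id\":\"").append(pitId)
        .append("\",\"keep_alive\":\"").append(keepAlive).append("\"}")
        .append(",\"sort\":[{\"").append(sortField).append("\":\"asc\"}]");
    if (!searchAfter.isEmpty()) {
      // Only the cursor of the previous page is sent; no server-side scroll context is involved.
      body.append(",\"search_after\":[").append(String.join(",", searchAfter)).append("]");
    }
    return body.append("}").toString();
  }

  public static void main(String[] args) {
    // First page: no search_after clause.
    System.out.println(searchBody("abc", "5m", 100, "@timestamp", List.of()));
    // Following page: resume after the sort values of the last hit seen.
    System.out.println(searchBody("abc", "5m", 100, "@timestamp", List.of("1712040000000", "42")));
  }
}
```

The first call prints `{"size":100,"pit":{"id":"abc","keep_alive":"5m"},"sort":[{"@timestamp":"asc"}]}`; the second adds `,"search_after":[1712040000000,42]` before the closing brace.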

This mode should be available for Elasticsearch clusters with version 8 and up.

The default read implementation is based on Scrolls. When the index is big enough that slices end up with more than 10000 documents to iterate, the read operations fail. With the PIT implementation this is not a problem, since ES does not have to keep track of the whole state of the index while iterating, only the current window for the PIT iterator.
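The "current window" idea can be illustrated with a toy in-memory analogy. This is only a sketch of cursor-based (search_after style) paging over a sorted list, assuming unique sort keys; `SearchAfterSketch`, `nextPage`, and `readAll` are hypothetical names and none of this is the reader code from the PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy analogy of search_after paging: the reader keeps only the last sort key it has seen. */
final class SearchAfterSketch {

  /** Returns up to pageSize keys strictly greater than cursor; a null cursor starts from the top. */
  static List<Long> nextPage(List<Long> sortedKeys, Long cursor, int pageSize) {
    List<Long> page = new ArrayList<>();
    for (Long key : sortedKeys) {
      if (cursor != null && key <= cursor) {
        continue; // already returned in an earlier page
      }
      page.add(key);
      if (page.size() == pageSize) {
        break;
      }
    }
    return page;
  }

  /** Iterates every page and returns how many keys were read in total. */
  static int readAll(List<Long> sortedKeys, int pageSize) {
    int total = 0;
    Long cursor = null;
    while (true) {
      List<Long> page = nextPage(sortedKeys, cursor, pageSize);
      if (page.isEmpty()) {
        break; // an empty page marks the end of the slice
      }
      total += page.size();
      cursor = page.get(page.size() - 1); // the only state carried between pages
    }
    return total;
  }

  public static void main(String[] args) {
    List<Long> keys = new ArrayList<>();
    for (long i = 0; i < 25; i++) {
      keys.add(i);
    }
    System.out.println(readAll(keys, 10)); // pages of 10, 10 and 5 keys
  }
}
```

The loop also terminates correctly when a page holds a single key, because the cursor advances to that key and the next page comes back empty.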

Adding @lord-skinner from Elastic.co for visibility.

prodriguezdefino · 2024-04-02T22:12:34Z

Run Java_ElasticSearch_IO_Direct PreCommit

prodriguezdefino · 2024-04-09T06:05:28Z

Run Java_ElasticSearch_IO_Direct PreCommit

prodriguezdefino · 2024-04-09T06:13:28Z

Run Java_ElasticSearch_IO_Direct PreCommit

github-actions · 2024-04-10T06:36:04Z

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @Abacn for label java.
R: @johnjcasey for label io.

Available commands:

stop reviewer notifications - opt out of the automated review tooling
remind me after tests pass - tag the comment author after tests pass
waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

github-actions · 2024-04-17T12:13:43Z

Reminder, please take a look at this pr: @Abacn @johnjcasey

Abacn · 2024-04-17T13:37:24Z

sorry for delay, taking a look

prodriguezdefino added 2 commits March 27, 2024 20:50

first implementation for a PIT iterator on read PTransform

f3a6123

adding deprecation of the scroll keep alive setting

12fa9ec

github-actions bot added java io elasticsearch labels Apr 2, 2024

remove deprecation from tests and fixed empty query diff between readers

50b1dc6

prodriguezdefino added 6 commits April 2, 2024 15:53

adding try catch to debug IT tests

2e82039

fixes match all default query for both readers types, remove debug related changes.

a366af6

adding first tests for PIT reads

c53c252

added default timestamp field on test documents, and added configuration entry points for PIT sort properties

c3b7597

fixing query format for PIT search test

2c18ab7

simplify query handling

65ffbd4

trigger checks

a5e0908

prodriguezdefino added 4 commits April 9, 2024 11:18

debug tests

b9f72f4

fixes the case when the reader slice only had 1 document as hits for a query

91d1ac6

spotless

0ca88f8

java doc update

e9e0325

prodriguezdefino marked this pull request as ready for review April 10, 2024 06:21

github-actions bot added the Next Action: Reviewers label Apr 10, 2024

github-actions bot added the slow-review label Apr 17, 2024

github-actions bot removed the slow-review label Apr 17, 2024

Abacn reviewed Apr 18, 2024

...sts-common/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTestCommon.java (resolved)

...ava/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java (outdated, resolved)

addressing comments from review

d132035

prodriguezdefino requested a review from Abacn April 30, 2024 02:28

Abacn reviewed Apr 30, 2024

...ava/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java (outdated, resolved)

...ava/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java (outdated, resolved)

reverting scroll rename

0a3df5b

Abacn approved these changes Apr 30, 2024

Abacn merged commit 9612fe1 into apache:master Apr 30, 2024
19 checks passed
