[FLINK-30488] OpenSearch implementation of Async Sink #5

Draft · wants to merge 8 commits into main

Conversation

Member

@reta reta commented Dec 30, 2022

OpenSearch implementation of Async Sink (https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink), a few TODO items:

  • More test cases
  • More sensible defaults

try (final StreamExecutionEnvironment env = new LocalStreamEnvironment()) {
    env.setRestartStrategy(RestartStrategies.noRestart());
    DataStream<Long> stream = env.fromSequence(1, 5);
}
Member Author

@reta reta Jan 3, 2023


Not related, but StreamExecutionEnvironment is AutoCloseable, so I'm slightly changing the test case.
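The try-with-resources change above can be sketched with a hypothetical stand-in for `StreamExecutionEnvironment` (which implements AutoCloseable in recent Flink versions); `FakeEnvironment` and its methods are assumptions for illustration, not Flink API:

```java
import java.util.ArrayList;
import java.util.List;

public class TryWithResourcesSketch {
    // Hypothetical stand-in for an AutoCloseable execution environment.
    static class FakeEnvironment implements AutoCloseable {
        boolean closed = false;
        final List<Long> records = new ArrayList<>();

        void fromSequence(long from, long to) {
            for (long i = from; i <= to; i++) records.add(i);
        }

        @Override
        public void close() { closed = true; }
    }

    static FakeEnvironment runPipeline() {
        FakeEnvironment captured;
        try (FakeEnvironment env = new FakeEnvironment()) {
            captured = env;
            env.fromSequence(1, 5);
        }
        return captured; // close() already ran when the try block exited
    }

    public static void main(String[] args) {
        FakeEnvironment env = runPipeline();
        if (!env.closed) throw new AssertionError("environment should be auto-closed");
        if (env.records.size() != 5) throw new AssertionError("expected 5 records");
        System.out.println("closed=" + env.closed + ", records=" + env.records.size());
    }
}
```

The point of the change: the environment is released even if the pipeline body throws, which a plain sequential test would not guarantee.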

@reta reta marked this pull request as ready for review January 3, 2023 20:55
@reta
Member Author

reta commented Jan 3, 2023

@zentol @MartijnVisser I would appreciate it if you have time for the review; this adds AsyncSink support for OpenSearch, as discussed initially in [1]

[1] apache/flink#18541 (comment)

@reta
Member Author

reta commented Feb 6, 2023

@zentol @MartijnVisser doing my one per month ping diligence :-), please

@MartijnVisser
Contributor

@zentol @MartijnVisser doing my one per month ping diligence :-), please

I'm currently a bit over capacity. Don't know if the same applies for @zentol tbh

@dannycranmer Could you potentially help out? You also have the experience with the Async API, or perhaps @hlteoh37 ?

@hlteoh37

hlteoh37 commented Feb 7, 2023

Sure, I can take a look

Comment on lines 65 to 66
List<HttpHost> httpHosts = new ArrayList<>();
httpHosts.add(new HttpHost("127.0.0.1", 9200, "http"));

not sure what this is used for, shall we remove it?

Member Author


That's just an example of using AsyncSink; we used to have them: https://github.com/apache/flink-connector-opensearch/tree/main/flink-connector-opensearch-e2e-tests/src/main/java/org/apache/flink/streaming/tests


@hlteoh37 hlteoh37 left a comment


Added some comments!


OpensearchAsyncSinkBuilder<Tuple2<String, String>> osSinkBuilder =
OpensearchAsyncSink.<Tuple2<String, String>>builder()
.setHosts(new HttpHost("localhost:9200"))


Hm, should we instead define a constant something like OPENSEARCH_DOMAIN so users can use the example more easily?

Member Author


Oh I see, there are 2 places where the same host is used; it should be 1. I will fix that, thank you.

int maxBufferedRequests,
long maxBatchSizeInBytes,
long maxTimeInBufferMS,
long maxRecordSizeInBytes,

This comment was marked as resolved.

Member Author


This is specific to the AWS OpenSearch managed service; it is not applicable to OpenSearch in general.

*/
public OpensearchAsyncSinkBuilder<InputT> setHosts(HttpHost... hosts) {
checkNotNull(hosts);
checkState(hosts.length > 0, "Hosts cannot be empty.");

would checkArgument be a better method to call here?


Hm, also we check this twice, once in the builder and once in the constructor. Would it be better to just validate this in the constructor?

Member Author


I think checking twice is acceptable here: we should fail as early as possible. Allowing a builder to be constructed with possibly illegal arguments and carried around could raise an exception far down the stack, when the build method is called. By validating early, we prevent that.
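The checkArgument-vs-checkState distinction the reviewer raises can be sketched like this. The helper methods mirror (as an assumption) the semantics shared by Guava's and Flink's Preconditions, and `setHosts` is a hypothetical setter shaped like the builder method under review:

```java
public class PreconditionsSketch {
    // checkArgument signals a bad caller-supplied parameter;
    // checkState signals that the object itself is in an illegal state.
    static void checkArgument(boolean condition, String message) {
        if (!condition) throw new IllegalArgumentException(message);
    }

    static void checkState(boolean condition, String message) {
        if (!condition) throw new IllegalStateException(message);
    }

    // Hypothetical setter: the hosts array is a caller argument,
    // so checkArgument is the better fit than checkState here.
    static String[] setHosts(String... hosts) {
        if (hosts == null) throw new NullPointerException("hosts");
        checkArgument(hosts.length > 0, "Hosts cannot be empty.");
        return hosts;
    }

    public static void main(String[] args) {
        if (setHosts("localhost:9200").length != 1) throw new AssertionError();
        try {
            setHosts(); // empty varargs -> empty array -> rejected early
            throw new AssertionError("expected IllegalArgumentException");
        } catch (IllegalArgumentException expected) {
            System.out.println("empty hosts rejected: " + expected.getMessage());
        }
    }
}
```

Validating in both builder and constructor is redundant but cheap; the builder check surfaces the mistake at the call site rather than at `build()` time.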

}

private void handleFullyFailedBulkRequest(
Throwable err,

Should we consider logging this error? Otherwise the sink can get stuck in a retry loop without any logs.


Are there any exceptions we want to classify as non-retryable and fail the Flink job? For example "domain doesn't exist" or "insufficient permissions"?

See example here
https://github.com/apache/flink-connector-aws/blob/9e09d57210/flink-connector-dynamodb/src/main/java/org/apache/flink/connector/dynamodb/sink/DynamoDbSinkWriter.java#L104

Member Author


Certainly makes sense!
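The non-retryable classification the reviewer links to (the DynamoDB sink's fatal-error handling) can be sketched as a cause-chain walk. This is a hypothetical classifier for illustration; the real connector would match on concrete OpenSearch/HTTP exception types rather than message text:

```java
import java.net.UnknownHostException;

public class ErrorClassifierSketch {
    // Walk the cause chain: treat unknown-host and auth failures (401/403)
    // as non-retryable so the job fails fast instead of retrying forever.
    static boolean isRetryable(Throwable err) {
        for (Throwable t = err; t != null; t = t.getCause()) {
            if (t instanceof UnknownHostException) return false;
            String msg = t.getMessage();
            if (msg != null && (msg.contains("401") || msg.contains("403"))) return false;
        }
        return true; // e.g. timeouts, connection resets: worth retrying
    }

    public static void main(String[] args) {
        Throwable wrapped =
                new RuntimeException("bulk failed", new UnknownHostException("no-such-domain"));
        if (isRetryable(wrapped)) throw new AssertionError("unknown host should be fatal");
        if (!isRetryable(new RuntimeException("connection reset"))) throw new AssertionError();
        if (isRetryable(new RuntimeException("server returned 403 Forbidden"))) throw new AssertionError();
        System.out.println("classification ok");
    }
}
```

Failing the job on a fatal error surfaces misconfiguration (wrong domain, missing permissions) immediately instead of hiding it behind an endless retry loop.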

final BulkItemResponse[] items = response.getItems();

for (int i = 0; i < items.length; i++) {
if (items[i].getFailure() != null) {

Should we consider logging this error?

Member Author


I don't think so; there could be a massive number of items in a bulk request, and logging 10k failures (failure is reported per item) would probably flood the logs.
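A possible compromise between "log nothing" and "log all 10k item failures" is one summary line: total count plus a small sample. This is a sketch of the idea, not the connector's actual logging; the method name and shape are assumptions:

```java
import java.util.List;

public class BulkFailureSummarySketch {
    // One log line: the failure count, the first sampleSize messages,
    // and an ellipsis when there are more.
    static String summarize(List<String> failures, int sampleSize) {
        StringBuilder sb = new StringBuilder(failures.size() + " bulk items failed");
        int n = Math.min(sampleSize, failures.size());
        for (int i = 0; i < n; i++) {
            sb.append("; ").append(failures.get(i));
        }
        if (failures.size() > n) sb.append("; ...");
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = summarize(
                List.of("mapper_parsing_exception", "version_conflict", "timeout"), 2);
        if (!line.startsWith("3 bulk items failed")) throw new AssertionError(line);
        if (!line.endsWith("; ...")) throw new AssertionError(line);
        System.out.println(line);
    }
}
```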

Comment on lines 252 to 305
if (networkClientConfig.getConnectionRequestTimeout() != null
|| networkClientConfig.getConnectionTimeout() != null
|| networkClientConfig.getSocketTimeout() != null) {
builder.setRequestConfigCallback(
requestConfigBuilder -> {
if (networkClientConfig.getConnectionRequestTimeout() != null) {
requestConfigBuilder.setConnectionRequestTimeout(
networkClientConfig.getConnectionRequestTimeout());
}
if (networkClientConfig.getConnectionTimeout() != null) {
requestConfigBuilder.setConnectTimeout(
networkClientConfig.getConnectionTimeout());
}
if (networkClientConfig.getSocketTimeout() != null) {
requestConfigBuilder.setSocketTimeout(
networkClientConfig.getSocketTimeout());
}
return requestConfigBuilder;
});
}

nit: Seems unnecessary to do 2 null checks. Should we instead just remove the outer if?

Member Author

@reta reta Feb 8, 2023


The presence of the first if eliminates the need to create the RequestConfigCallback instance at all when there is nothing to configure.

Comment on lines +75 to +85
private static DocWriteRequest<?> readDocumentRequest(StreamInput in) throws IOException {
byte type = in.readByte();
DocWriteRequest<?> docWriteRequest;
if (type == 0) {
docWriteRequest = new IndexRequest(in);
} else if (type == 1) {
docWriteRequest = new DeleteRequest(in);
} else if (type == 2) {
docWriteRequest = new UpdateRequest(in);
} else {
throw new IllegalStateException("Invalid request type [" + type + "]");
}
return docWriteRequest;

These methods are untested. Should we add unit tests for them?

Member Author


They are tested in scope of the integration test, OpensearchAsyncSinkITCase, on both the reading and writing side.

Contributor


+1 for a unit test. Unless there is a good reason not to, unit tests give quicker feedback.
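A unit test for the tag dispatch could be a plain round trip over the stream, no cluster needed. This sketch uses JDK `DataInput`/`DataOutput` streams and string stand-ins for the three request types, since the real `IndexRequest`/`DeleteRequest`/`UpdateRequest` constructors need OpenSearch on the classpath:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class RequestTagRoundTripSketch {
    // Mirrors readDocumentRequest's tag scheme: 0=index, 1=delete, 2=update.
    static String readRequestType(DataInputStream in) throws IOException {
        byte type = in.readByte();
        switch (type) {
            case 0: return "index";
            case 1: return "delete";
            case 2: return "update";
            default: throw new IllegalStateException("Invalid request type [" + type + "]");
        }
    }

    // Write a tag byte, then read it back through the dispatch logic.
    static String roundTrip(byte tag) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new DataOutputStream(bos).writeByte(tag);
            return readRequestType(new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (!roundTrip((byte) 0).equals("index")) throw new AssertionError();
        if (!roundTrip((byte) 2).equals("update")) throw new AssertionError();
        try {
            roundTrip((byte) 9);
            throw new AssertionError("expected IllegalStateException");
        } catch (IllegalStateException expected) {
            System.out.println("invalid tag rejected");
        }
    }
}
```

This is the "quicker feedback" argument in practice: the invalid-tag branch is hard to reach from an integration test but trivial to cover here.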

Comment on lines +73 to +69
new IndexRequest("my-index")
.id(element.f0.toString())
.source(element.f1));

Hmm.. Since we have to implement a DocSerdeRequest, should we consider exposing this in the interface instead of OpenSearch classes? This might be helpful in the event OpenSearch's interface changes.

Member Author


The DocSerdeRequest is sadly a necessary leaky abstraction (AsyncSink requires Serializable); we should expose it only in the places where it is inevitable, but in general we should operate over OpenSearch APIs.

Contributor


This is a shame indeed, because Async Sink does not actually need Serializable. https://issues.apache.org/jira/browse/FLINK-27537
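The wrapper trick being discussed — making a non-Serializable request type satisfy AsyncSink's `Serializable` requirement by hand-writing its bytes in `writeObject`/`readObject` — can be sketched with a stand-in payload class (all names here are hypothetical; the real wrapper delegates to OpenSearch's own stream serialization):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableWrapperSketch {
    // Stand-in for a non-Serializable OpenSearch request type.
    static class Payload {
        final String body;
        Payload(String body) { this.body = body; }
    }

    // Marks the payload transient and serializes it by hand instead.
    static class SerdeRequest implements Serializable {
        transient Payload payload;

        SerdeRequest(Payload payload) { this.payload = payload; }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            out.writeUTF(payload.body); // custom serialization of the payload
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            payload = new Payload(in.readUTF());
        }
    }

    static SerdeRequest roundTrip(SerdeRequest request) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(request);
            }
            try (ObjectInputStream ois =
                    new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return (SerdeRequest) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SerdeRequest copy = roundTrip(new SerdeRequest(new Payload("doc-1")));
        if (!"doc-1".equals(copy.payload.body)) throw new AssertionError();
        System.out.println("payload=" + copy.payload.body);
    }
}
```

If FLINK-27537 lands and the Serializable requirement is dropped, this whole wrapper layer becomes unnecessary — which is the reviewer's point.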

.github/workflows/push_pr.yml — outdated, resolved
@reta
Member Author

reta commented Feb 8, 2023

Thanks a lot for the review @hlteoh37 , I believe I addressed and/or answered all your comments; please let me know if I missed something.

* @param <T> type of the write request
*/
@PublicEvolving
public class DocSerdeRequest<T> implements Serializable {
Contributor


I think the class-level generics are redundant here; we are using <?> throughout. Consider changing `private final DocWriteRequest<T> request;` to `private final DocWriteRequest<?> request;` and removing the class generics. This makes the Sink interface a bit messy: `extends AsyncSinkBase<InputT, DocSerdeRequest<?>>`.

Member Author

@reta reta Feb 10, 2023


This is my bad, the T must be constrained, I will fix it. (Edit: removed T, not necessary indeed.)


1000), /* OpensearchConnectorOptions.BULK_FLUSH_MAX_ACTIONS_OPTION */
nonNullOrDefault(
getMaxInFlightRequests(), 1), /* BulkProcessor::concurrentRequests */
nonNullOrDefault(getMaxBufferedRequests(), 10000),
Contributor


Can we also promote the other magic numbers to constants? 10000 and 2 * 1024 * 1024
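Promoting the magic numbers to named constants could look like the sketch below. The constant names are hypothetical (the real connector may choose different names); the values are the ones quoted in the diff:

```java
public class SinkDefaultsSketch {
    // Named defaults in place of inline literals; names are illustrative only.
    static final int DEFAULT_MAX_BATCH_SIZE = 1000;        // bulk flush max actions
    static final int DEFAULT_MAX_IN_FLIGHT_REQUESTS = 1;   // BulkProcessor::concurrentRequests
    static final int DEFAULT_MAX_BUFFERED_REQUESTS = 10_000;
    static final long DEFAULT_MAX_BATCH_SIZE_IN_BYTES = 2L * 1024 * 1024;

    // Same shape as the nonNullOrDefault helper in the diff.
    static long nonNullOrDefault(Long value, long fallback) {
        return value != null ? value : fallback;
    }

    public static void main(String[] args) {
        if (nonNullOrDefault(null, DEFAULT_MAX_BUFFERED_REQUESTS) != 10_000) throw new AssertionError();
        if (nonNullOrDefault(500L, DEFAULT_MAX_BUFFERED_REQUESTS) != 500) throw new AssertionError();
        System.out.println("defaults ok");
    }
}
```

Besides readability, a constant gives each default a single definition to document and to change.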

Comment on lines 257 to 284
if (networkClientConfig.getPassword() != null
&& networkClientConfig.getUsername() != null) {
final CredentialsProvider credentialsProvider =
new BasicCredentialsProvider();
credentialsProvider.setCredentials(
AuthScope.ANY,
new UsernamePasswordCredentials(
networkClientConfig.getUsername(),
networkClientConfig.getPassword()));

httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider);
}

if (networkClientConfig.isAllowInsecure().orElse(false)) {
try {
httpClientBuilder.setSSLContext(
SSLContexts.custom()
.loadTrustMaterial(new TrustAllStrategy())
.build());
} catch (final NoSuchAlgorithmException
| KeyStoreException
| KeyManagementException ex) {
throw new IllegalStateException(
"Unable to create custom SSL context", ex);
}
}

return httpClientBuilder;
Contributor


nit: Should we move this out to a separate class?

Member Author


Why? That's the only place it's needed, actually; keeping it in place seems acceptable.

Contributor


In my opinion it breaches the single-responsibility principle: the writer would be responsible both for writing and for knowing how to construct the client. I am less concerned with how many times it is used. However, I marked it as a nit since I am not treating this as a blocker.

Member Author


Thanks @dannycranmer , I think this is a good idea (moreover, I was not correct: there was another place with similar instantiation logic present); extracted the utility class.

@reta
Member Author

reta commented Feb 10, 2023

Thanks @dannycranmer , I think I went through all your comments, thanks a lot, really appreciate it.

@reta reta force-pushed the FLINK-30488 branch 3 times, most recently from ab7f2eb to 455048c Compare February 13, 2023 14:58
@Test
@SuppressWarnings("unchecked")
void unsupportedRequestType() throws IOException {
final DocSerdeRequest serialized = DocSerdeRequest.from(mock(DocWriteRequest.class));
Contributor


Mockito is banned. Since you only have one usage here, can we remove it?
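The single `mock(DocWriteRequest.class)` usage can usually be replaced with a hand-written stub. This sketch uses a stand-in interface and a hypothetical tag function shaped like the serde writer under review; the stub (a lambda) plays the role of the mock — an instance that is none of the supported concrete request types:

```java
public class StubInsteadOfMockSketch {
    // Stand-in interface for DocWriteRequest.
    interface WriteRequest {
        String opType();
    }

    // Hypothetical tag function: only known operation types are supported.
    static byte typeTagFor(WriteRequest request) {
        switch (request.opType()) {
            case "index": return 0;
            case "delete": return 1;
            case "update": return 2;
            default:
                throw new IllegalStateException("Unsupported request type: " + request.opType());
        }
    }

    public static void main(String[] args) {
        // A lambda stub replaces the mock: it exercises the failure branch
        // without any mocking library on the classpath.
        WriteRequest unsupported = () -> "noop";
        try {
            typeTagFor(unsupported);
            throw new AssertionError("expected IllegalStateException");
        } catch (IllegalStateException expected) {
            System.out.println("unsupported type rejected");
        }
        if (typeTagFor(() -> "delete") != 1) throw new AssertionError();
    }
}
```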

@dannycranmer
Contributor

@reta The PR looks good to me minus the Mockito comment. However I have questions over the approach here. We are adding a new sink alongside the existing sink, we will have OpensearchSink and OpensearchAsyncSink. How do the users know which one to pick? Why not replace the existing sink with the new implementation? The Jira mentions docs, however there is no update here. Will you create a followup PR for that?

If this has already been discussed on mailing lists I missed that, please give me a link :D

reta added 5 commits March 3, 2023 10:40
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
…more)

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
reta added 3 commits March 3, 2023 10:40
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
@reta
Member Author

reta commented Mar 3, 2023

Thanks a lot for review @dannycranmer

@reta The PR looks good to me minus the Mockito comment. However I have questions over the approach here. We are adding a new sink alongside the existing sink, we will have OpensearchSink and OpensearchAsyncSink. How do the users know which one to pick? Why not replace the existing sink with the new implementation?

This is indeed a good question. I think the main difference between them lies in the internal APIs the implementations are based upon:

  • OpensearchAsyncSink uses RestHighLevelClient::bulkAsync directly to dispatch the bulk requests
  • OpensearchSink uses BulkProcessor, which offers more flexibility with respect to failure handling and backoff policies (no direct equivalent in RestHighLevelClient)

I have covered this part in the docs, thank you.

The Jira mentions docs, however there is no update here. Will you create a followup PR for that?

Updated the documentation, thank you

If this has already been discussed on mailing lists I missed that, please give me a link :D

You mean the AsyncSink implementation for OpenSearch? No, it was not discussed on the mailing list, but it was mentioned on the initial pull request apache/flink#18541 (comment)

@reta reta requested a review from dannycranmer March 3, 2023 16:12
@reta
Member Author

reta commented Mar 28, 2023

@dannycranmer would appreciate if you could take a look, thank you

@dannycranmer
Contributor

@reta I am reluctant to introduce a new Sink API based on the internal implementation unless there is a really good/semantic reason. I would prefer to encapsulate the internals via a single Flink layer that can support either RestHighLevelClient/BulkProcessor based on configuration. How will this look for SQL? We usually use a simple identifier like "opensearch", I fear that "opensearch-async" adds no semantic value to the user.

We should keep the Sink API as simple as possible with sensible defaults, and allow advanced users to configure it as they wish. For instance, a user should not need to decide between OpensearchSink and OpensearchAsyncSink; they should just use OpensearchSink and configure it as needed.

There could be reasons to have 2x Sinks if they support fundamentally different features/APIs but I would expect the naming to reflect this, for example OpensearchRestHighLevelClientSink/OpensearchBulkProcessorSink.

Apologies for raising these concerns late in the process but I cannot see this has been considered before. @MartijnVisser what are your thoughts?

@reta
Member Author

reta commented Mar 29, 2023

I am reluctant to introduce a new Sink API based on the internal implementation unless there is a really good/semantic reason.

Thanks @dannycranmer , I understand your concerns. I will move this pull request to draft (for now) so we could get to it at some point in the future, when migrating off the RestHighLevelClient to opensearch-java, thanks again for review and your thoughts.
