[FLINK-25569][core] Add decomposed Sink V2 interface #18302

fapaul · 2022-01-07T16:32:21Z

What is the purpose of the change

This is the first PR of FLIP-191 (https://issues.apache.org/jira/browse/FLINK-25555). It introduces the basic decomposed interfaces that replace the existing Sink V1 interfaces. The PR only adds the public-facing interfaces and does not implement the stream graph translation yet; that is a follow-up task.

Brief change log

fe71154 Clarifies the retry behaviour of committer and global committer
5022abf Fixes a problem if Transformations do not properly implement hashcode and equals
d742c25 Expose parts of the internal ProcessingTimeService through an external facade
412c841 Introduction of the Sink V2 interfaces

Verifying this change

The PR mostly consists of interface additions and file movements.

Added a test for 5022abf

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): (yes / no)
The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
The serializers: (yes / no / don't know)
The runtime per-record code paths (performance sensitive): (yes / no / don't know)
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
The S3 file system connector: (yes / no / don't know)

Documentation

Does this pull request introduce a new feature? (yes / no)
If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

fapaul · 2022-01-07T16:32:43Z

@gaoyunhaii do you also want to have a look at this PR?

flinkbot · 2022-01-07T16:34:53Z

CI report:

9089425 Azure: SUCCESS

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot run azure re-run the last Azure build

flinkbot · 2022-01-07T16:37:39Z

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 412c841 (Fri Jan 07 16:37:39 UTC 2022)

Warnings:

No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

❓ 1. The [description] looks good.
❓ 2. There is [consensus] that the contribution should go into Flink.
❓ 3. Needs [attention] from.
❓ 4. The change fits into the overall [architecture].
❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

The bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
@flinkbot approve all to approve all aspects
@flinkbot approve-until architecture to approve everything until architecture
@flinkbot attention @username1 [@username2 ..] to require somebody's attention
@flinkbot disapprove architecture to remove an approval you gave earlier

gaoyunhaii · 2022-01-07T17:18:14Z

Thanks a lot @fapaul for drafting the PR! I'll also have a look soon~

alpreu

I wanted to have a look at this because I'm interested in the new Sink interface as well. I left a few comments :)

flink-core/src/main/java/org/apache/flink/api/connector/sink/GlobalCommitter.java

flink-core/src/main/java/org/apache/flink/api/connector/sink2/Committer.java

flink-core/src/main/java/org/apache/flink/api/connector/sink2/TwoPhaseCommittingSink.java

...eaming-java/src/test/java/org/apache/flink/streaming/api/graph/StreamGraphGeneratorTest.java

alpreu · 2022-01-10T08:21:19Z

flink-core/src/main/java/org/apache/flink/api/connector/sink2/Committer.java

+         * <p>Currently calling this method only logs the error, discards the comittable and
+         * continues. In the future the behaviour might be configurable.


I checked the usages (also of the other methods in the CommitRequest interface to which this applies as well) but I don't see any reference implementation. Is the docstring correct as it is then?

This PR only introduces the public-facing API but not the internal implementation. I did this to split the PR into more reviewable chunks.

In general, the two failure methods are designed so that failure side channels can be added in the future, but in the first version they will only log or fail the job.
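To make the first-version behaviour described above concrete, here is a minimal sketch that uses simplified stand-in interfaces, not the real Flink classes. `failedWithKnownReason` is taken from the diff in this PR; `failedWithUnknownReason`, `LoggingCommitRequest`, and the choice that the unknown-reason path fails the job by throwing are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitRequestSketch {

    /** Simplified stand-in for the CommitRequest interface discussed in this PR. */
    interface CommitRequest<CommT> {
        CommT getCommittable();

        void failedWithKnownReason(Throwable t);

        // Name assumed by analogy; the thread only speaks of "two failure methods".
        void failedWithUnknownReason(Throwable t);
    }

    /** Hypothetical request: known failures are logged and discarded, unknown ones fail the job. */
    static final class LoggingCommitRequest<CommT> implements CommitRequest<CommT> {
        private final CommT committable;
        final List<String> log = new ArrayList<>();
        boolean discarded;

        LoggingCommitRequest(CommT committable) {
            this.committable = committable;
        }

        @Override
        public CommT getCommittable() {
            return committable;
        }

        @Override
        public void failedWithKnownReason(Throwable t) {
            // Log the error, drop the committable, and let processing continue.
            log.add("Discarding " + committable + ": " + t.getMessage());
            discarded = true;
        }

        @Override
        public void failedWithUnknownReason(Throwable t) {
            // Assumption: an unexpected failure surfaces as an exception that fails the job.
            throw new IllegalStateException("Failing job for " + committable, t);
        }
    }

    public static void main(String[] args) {
        LoggingCommitRequest<String> request = new LoggingCommitRequest<>("txn-1");
        request.failedWithKnownReason(new RuntimeException("duplicate transaction"));
        System.out.println(request.log);
    }
}
```

Keeping the failure decision behind the request object is what would allow a later version to route failed committables elsewhere without changing committer code.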

gaoyunhaii

Thanks a lot @fapaul for drafting the PR! I have left some comments~

flink-core/src/main/java/org/apache/flink/api/connector/sink2/Committer.java

gaoyunhaii · 2022-01-10T09:45:23Z

flink-core/src/main/java/org/apache/flink/api/common/operators/ProcessingTimeService.java

+ * the given {@link ProcessingTimeCallback} when firing.
+ */
+@PublicEvolving
+public interface ProcessingTimeService {


Hi @fapaul~ could you elaborate a bit on why we want to split the ProcessingTimeService into two classes~? I'm asking since the remaining methods, like scheduleWithFixedDelay, seem to be similar to registerTimer. Is it possible to move the original ProcessingTimeService directly into core~?

The idea here is to decouple the internal ProcessingTimeService from the one we want to expose. The internal one for example implements ProcessingTimeService#quiesce which we should not expose to the user.

Regarding the methods you mentioned, we can migrate them to the public ProcessingTimeService in the future, but currently I do not see the need.

I'm still a bit concerned, since tasks.ProcessingTimeService is also exposed to users via classes like ProcessingTimeServiceAware and AbstractStreamOperator. But because of that, we indeed also have to keep tasks.ProcessingTimeService there, and if we want to break the reverse dependency, we can only either introduce a separate set of processing timer services or extract a new super interface. So let's keep the current option for now.

We might reconsider this when API-breaking changes are acceptable, for example by renaming tasks.ProcessingTimeService.
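The facade idea discussed in this thread can be sketched roughly as follows. This is a simplified, self-contained illustration and not the Flink code: the interfaces are stand-ins that reuse the names `ProcessingTimeService`, `ProcessingTimeCallback`, `registerTimer`, and `quiesce` from the discussion, while `InternalProcessingTimeService`, `ManualTimeService`, `advanceTo`, and the exact quiesce semantics (new registrations are ignored) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TimeServiceFacadeSketch {

    interface ProcessingTimeCallback {
        void onProcessingTime(long time);
    }

    /** Public-facing facade: only what a connector author needs. */
    interface ProcessingTimeService {
        long getCurrentProcessingTime();

        void registerTimer(long time, ProcessingTimeCallback callback);
    }

    /** Internal view: adds runtime-only operations that users should not see. */
    interface InternalProcessingTimeService extends ProcessingTimeService {
        void quiesce();
    }

    /** Hypothetical manually driven implementation, for illustration only. */
    static final class ManualTimeService implements InternalProcessingTimeService {
        private final TreeMap<Long, List<ProcessingTimeCallback>> timers = new TreeMap<>();
        private long now;
        private boolean quiesced;

        @Override
        public long getCurrentProcessingTime() {
            return now;
        }

        @Override
        public void registerTimer(long time, ProcessingTimeCallback callback) {
            if (quiesced) {
                return; // assumption: a quiesced service silently ignores new timers
            }
            timers.computeIfAbsent(time, k -> new ArrayList<>()).add(callback);
        }

        @Override
        public void quiesce() {
            quiesced = true;
        }

        /** Moves the clock forward and fires every timer that is due. */
        void advanceTo(long time) {
            now = time;
            while (!timers.isEmpty() && timers.firstKey() <= time) {
                Map.Entry<Long, List<ProcessingTimeCallback>> due = timers.pollFirstEntry();
                for (ProcessingTimeCallback callback : due.getValue()) {
                    callback.onProcessingTime(due.getKey());
                }
            }
        }
    }

    static List<Long> demo() {
        ManualTimeService internal = new ManualTimeService();
        ProcessingTimeService exposed = internal; // user code only sees the facade
        List<Long> fired = new ArrayList<>();
        exposed.registerTimer(10L, t -> fired.add(t));
        exposed.registerTimer(20L, t -> fired.add(t));
        internal.advanceTo(15L);
        internal.quiesce();
        exposed.registerTimer(30L, t -> fired.add(t)); // ignored after quiesce
        internal.advanceTo(40L);
        return fired;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because user code holds only the narrower type, `quiesce` stays unreachable without a cast, which is the decoupling argued for above.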

flink-core/src/main/java/org/apache/flink/api/connector/sink2/Sink.java

gaoyunhaii · 2022-01-10T09:51:51Z

flink-core/src/main/java/org/apache/flink/api/connector/sink2/Committer.java

+         * <p>Currently calling this method only logs the error, discards the comittable and
+         * continues. In the future the behaviour might be configurable.
+         */
+        void failedWithKnownReason(Throwable t);


Are the methods expected to be called by Committer? If so would failWithKnownReason and failWithUnknownReason be better~?

I think the term failed is correct here because it describes the state of the committable. In the future, we may add a general configuration for how to handle failures, e.g. submitting to a dead letter queue.

Since we cannot really rename the method anymore I already made it "future-proof".

flink-core/src/main/java/org/apache/flink/api/connector/sink2/Committer.java

flink-core/src/main/java/org/apache/flink/api/connector/sink2/Sink.java

flink-core/src/main/java/org/apache/flink/api/connector/sink2/SinkWriter.java

flink-core/src/main/java/org/apache/flink/api/connector/sink2/TwoPhaseCommittingSink.java

fapaul · 2022-01-10T13:32:18Z

@zentol FYI 5c2fd49 adds public annotations to the metrics classes

fapaul · 2022-01-10T14:24:40Z

Thanks for the review @gaoyunhaii @alpreu I have addressed all your comments. PTAL.

gaoyunhaii

Thanks a lot @fapaul for the PR! LGTM~

…ng an IdentityHashMap to track transformations. The already transformed transformations are copied into a different map and compared. If a transformation does not properly implement equals, the isTransformed check may fail and the transformation is copied multiple times. This is now hardened because we check the object reference.
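The difference between equality-based and reference-based tracking can be shown with a small standalone sketch. It does not use Flink's Transformation classes: `Node` is a hypothetical class whose `hashCode` depends on mutable state, which is just one way an equals/hashCode contract can go wrong.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Objects;

public class IdentityTrackingSketch {

    /** Hypothetical node whose equals/hashCode depend on a mutable field. */
    static final class Node {
        String name;

        Node(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Node && Objects.equals(name, ((Node) o).name);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(name);
        }
    }

    /** Returns {found in HashMap, found in IdentityHashMap} after mutating the key. */
    static boolean[] trackAcrossMutation() {
        Node node = new Node("a");
        Map<Node, String> byEquals = new HashMap<>();
        Map<Node, String> byIdentity = new IdentityHashMap<>();
        byEquals.put(node, "transformed");
        byIdentity.put(node, "transformed");

        // The key changes after insertion, so its hash no longer matches the stored entry.
        node.name = "b";

        return new boolean[] {byEquals.containsKey(node), byIdentity.containsKey(node)};
    }

    public static void main(String[] args) {
        boolean[] result = trackAcrossMutation();
        System.out.println("HashMap finds node: " + result[0]);
        System.out.println("IdentityHashMap finds node: " + result[1]);
    }
}
```

The `HashMap` lookup misses the node it already holds, so an "already transformed?" check built on it would repeat the work, whereas the `IdentityHashMap` compares references and still finds it.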

…ink-core

The new interface separates concerns and will make future refactorings and extensions easier. The user immediately sees which methods need to be implemented.

JingGe · 2022-01-12T18:48:55Z

flink-core/src/main/java/org/apache/flink/api/connector/sink2/Committer.java

+         * permanently fail after reaching that maximum. Else the committable will be retried as
+         * long as this method is invoked after each attempt.
+         */
+        void retryLater();


I am not sure if I understood it correctly after reading the Javadoc. Does it mean that this method will be called as long as the maximum is not exceeded? The name retryLater sounds like an async call. Is that your intention? The follow-up question is: how much later? Will the delay be controlled by configuration, since this method takes no input?

So far none of our sinks sets a maximum number of retries, but in the future we might consider it. The retry mechanism will work internally in a similar way to the current implementation [1]. As soon as a committable is marked for retry, we enqueue it in the mailbox, which is polled "periodically", and retry it. Moreover, the committable is retried again during the next checkpoint.

[1] flink/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/operators/sink/CommitterOperator.java, line 96 in dbbf2a3:

commitRetrier.retryWithDelay();
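The retry flow described in this reply can be approximated with a plain queue. The sketch below is not the CommitterOperator implementation: the interfaces are simplified stand-ins, `getNumberOfRetries`, `QueuedRequest`, and `drain` are names assumed for illustration, and each loop iteration merely stands in for one mailbox poll or checkpoint.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Queue;

public class RetryLaterSketch {

    /** Simplified stand-in for the request handed to a committer. */
    interface CommitRequest<CommT> {
        CommT getCommittable();

        int getNumberOfRetries(); // assumed accessor, used here to drive the demo

        void retryLater();
    }

    interface Committer<CommT> {
        void commit(Collection<CommitRequest<CommT>> requests);
    }

    /** Hypothetical request that records whether a retry was asked for. */
    static final class QueuedRequest<CommT> implements CommitRequest<CommT> {
        private final CommT committable;
        private int retries;
        private boolean retryRequested;

        QueuedRequest(CommT committable) {
            this.committable = committable;
        }

        @Override
        public CommT getCommittable() {
            return committable;
        }

        @Override
        public int getNumberOfRetries() {
            return retries;
        }

        @Override
        public void retryLater() {
            retryRequested = true;
        }
    }

    /** Commits in rounds until no request asks to be retried; returns the number of rounds. */
    static <CommT> int drain(Committer<CommT> committer, List<CommT> committables) {
        Queue<QueuedRequest<CommT>> pending = new ArrayDeque<>();
        for (CommT committable : committables) {
            pending.add(new QueuedRequest<>(committable));
        }
        int rounds = 0;
        while (!pending.isEmpty()) {
            rounds++;
            List<QueuedRequest<CommT>> batch = new ArrayList<>(pending);
            pending.clear();
            committer.commit(new ArrayList<CommitRequest<CommT>>(batch));
            for (QueuedRequest<CommT> request : batch) {
                if (request.retryRequested) {
                    // Re-enqueue only because the committer asked again in this attempt.
                    request.retryRequested = false;
                    request.retries++;
                    pending.add(request);
                }
            }
        }
        return rounds;
    }

    static int demo() {
        // Hypothetical committer that succeeds on the third attempt.
        Committer<String> flaky = requests -> {
            for (CommitRequest<String> request : requests) {
                if (request.getNumberOfRetries() < 2) {
                    request.retryLater();
                }
            }
        };
        return drain(flaky, Arrays.asList("txn-1"));
    }

    public static void main(String[] args) {
        System.out.println("rounds: " + demo());
    }
}
```

Note that a committable stays in the queue only as long as the committer calls `retryLater()` on every attempt, which matches the Javadoc wording quoted in the question.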

JingGe · 2022-01-12T21:17:15Z

flink-core/src/main/java/org/apache/flink/api/connector/sink2/StatefulSink.java

+     *
+     * @return the serializer of the writer's state type.
+     */
+    SimpleVersionedSerializer<WriterStateT> getWriterStateSerializer();


Optional has been removed from multiple methods; this is one of them. Could you explain your thinking a little more?

Removing all these optionals was one of the intentions behind designing the new interfaces. Sink developers can now explicitly decide which functionality they want to support and implement the interfaces accordingly [1]. With the Sink V1 interfaces, they basically always had to implement everything, except that some of the methods had default implementations.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-191%3A+Extend+unified+Sink+interface+to+support+small+file+compaction#FLIP191:ExtendunifiedSinkinterfacetosupportsmallfilecompaction-SimpleSink
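As an illustration of the opt-in design described in this reply, the sketch below models the decomposition with heavily simplified stand-in interfaces. Only the names `Sink`, `SinkWriter`, `StatefulSink`, and `getWriterStateSerializer` come from this PR; `StateSerializer` (a stand-in for `SimpleVersionedSerializer`), the argument-free `createWriter()`, and the two example sinks are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class DecomposedSinkSketch {

    interface SinkWriter<InputT> {
        void write(InputT element);
    }

    /** Base capability: every sink can create a writer. */
    interface Sink<InputT> {
        SinkWriter<InputT> createWriter();
    }

    /** Stand-in for SimpleVersionedSerializer, reduced to a single method. */
    interface StateSerializer<T> {
        byte[] serialize(T state);
    }

    /** Opt-in capability: the serializer is mandatory here, so no Optional is needed. */
    interface StatefulSink<InputT, WriterStateT> extends Sink<InputT> {
        StateSerializer<WriterStateT> getWriterStateSerializer();
    }

    /** Hypothetical stateless sink: implements only the base interface. */
    static final class CollectingSink implements Sink<String> {
        final List<String> written = new ArrayList<>();

        @Override
        public SinkWriter<String> createWriter() {
            return written::add;
        }
    }

    /** Hypothetical stateful sink: additionally has to provide a state serializer. */
    static final class CountingSink implements StatefulSink<String, Integer> {
        int count;

        @Override
        public SinkWriter<String> createWriter() {
            return element -> count++;
        }

        @Override
        public StateSerializer<Integer> getWriterStateSerializer() {
            return state -> Integer.toString(state).getBytes(StandardCharsets.UTF_8);
        }
    }

    /** A framework can discover capabilities by type instead of probing Optionals. */
    static String describe(Sink<?> sink) {
        return sink instanceof StatefulSink ? "stateful" : "stateless";
    }

    public static void main(String[] args) {
        System.out.println(describe(new CollectingSink()));
        System.out.println(describe(new CountingSink()));
    }
}
```

In this shape the type a developer implements states which features the sink supports, so a method such as `getWriterStateSerializer` can return a plain value.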

fapaul force-pushed the FLINK-25569 branch from 412c841 to 1f32fec Compare January 7, 2022 16:38

rmetzger added the component=Connectors/Common label Jan 7, 2022

alpreu reviewed Jan 10, 2022

gaoyunhaii reviewed Jan 10, 2022

fapaul force-pushed the FLINK-25569 branch from 1f32fec to cd00723 Compare January 10, 2022 13:31

fapaul force-pushed the FLINK-25569 branch from cd00723 to 9b02dde Compare January 10, 2022 14:11

fapaul force-pushed the FLINK-25569 branch 2 times, most recently from 146bb76 to 9500281 Compare January 11, 2022 09:18

fapaul mentioned this pull request Jan 11, 2022

[FLINK-25608][metrics] Annotate metrics classes with Public(Evolving) #18325

Merged

gaoyunhaii approved these changes Jan 11, 2022

fapaul mentioned this pull request Jan 11, 2022

[FLINK-25570][streaming] Add topology extension points to Sink V2 #18330

Merged

AHeise and others added 6 commits January 12, 2022 15:24

[hotfix][core] Fix javadoc in (Global)committer for retries.

f474577

[FLINK-25569][core] Extract public facing ProcessingTimeService to fl…

87fe2f4

…ink-core

[FLINK-25569][core] Mark UserCodeClassLoader as PublicEvolving

846e6c1

[FLINK-25569][core] Mark SimpleVersionedSerializer as PublicEvolving

4bf46e8

[FLINK-25569][core] Add decomposed Sink V2 interface.

9089425

The new interface separates concerns and will make future refactorings and extensions easier. The user immediately sees which methods need to be implemented.

fapaul force-pushed the FLINK-25569 branch from 9500281 to 9089425 Compare January 12, 2022 14:26

JingGe reviewed Jan 12, 2022

fapaul merged commit 0619274 into apache:master Jan 13, 2022
