-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-25569][core] Add decomposed Sink V2 interface #18302
Conversation
@gaoyunhaii do you also want to have a look at this PR? |
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community Automated ChecksLast check on commit 412c841 (Fri Jan 07 16:37:39 UTC 2022) Warnings:
Mention the bot in a comment to re-run the automated checks. Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commandsThe @flinkbot bot supports the following commands:
|
Very thanks @fapaul for drafting the PR! I'll also have a look soon~ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to have a look at this because I'm interested in the new Sink interface as well. I left a few comments :)
flink-core/src/main/java/org/apache/flink/api/connector/sink/GlobalCommitter.java
Outdated
Show resolved
Hide resolved
flink-core/src/main/java/org/apache/flink/api/connector/sink2/Committer.java
Outdated
Show resolved
Hide resolved
flink-core/src/main/java/org/apache/flink/api/connector/sink2/TwoPhaseCommittingSink.java
Outdated
Show resolved
Hide resolved
flink-core/src/main/java/org/apache/flink/api/connector/sink2/TwoPhaseCommittingSink.java
Outdated
Show resolved
Hide resolved
...eaming-java/src/test/java/org/apache/flink/streaming/api/graph/StreamGraphGeneratorTest.java
Outdated
Show resolved
Hide resolved
* <p>Currently calling this method only logs the error, discards the comittable and | ||
* continues. In the future the behaviour might be configurable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the usages (also of the other methods in the CommitRequest
interface to which this applies as well) but I don't see any reference implementation. Is the docstring correct as it is then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR only introduces the public-facing API but not the internal implementation. I did this to split the PR into more reviewable chunks.
In general, the two failure methods are designed to provide in the future the possibility to add failure side channels but in the first version, they will only log or fail the job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very thanks @fapaul for drafting the PR! I have left some comments~
flink-core/src/main/java/org/apache/flink/api/connector/sink2/Committer.java
Outdated
Show resolved
Hide resolved
* the given {@link ProcessingTimeCallback} when firing. | ||
*/ | ||
@PublicEvolving | ||
public interface ProcessingTimeService { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @fapaul~ could you elaborate me a bit why we want to split the ProcessingTimeService
into two classes~? I'm asking since the remaining methods seems to be similar to registerTimer
, like scheduleWithFixedDelay
. Is it possible we directly move the original ProcessingTimeService
into core~?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea here is to decouple the internal ProcessingTimeService
from the one we want to expose. The internal one for example implements ProcessingTimeService#quiesce
which we should not expose to the user.
Regarding the methods, you have mentioned we can migrate them in the future to the public ProcessingTimeService
but currently I do not see the need yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still a bit concern since tasks.ProcessingTimeService
is also exposed to users via classes like ProcessingTimeServiceAware
and AbstractStreamOperator
. But also since of that, we indeed have to also keep the tasks.ProcessingTimeService
there, and if we want to break the reverse dependency, we could indeed only either introduce a separate sets of processing timer service or extract a new super interface. So currently let's keep the current option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might further consider it when it is acceptable to do some api-break changes, perhaps like rename the tasks. ProcessingTimeService
.
flink-core/src/main/java/org/apache/flink/api/connector/sink2/Sink.java
Outdated
Show resolved
Hide resolved
* <p>Currently calling this method only logs the error, discards the comittable and | ||
* continues. In the future the behaviour might be configurable. | ||
*/ | ||
void failedWithKnownReason(Throwable t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the methods expected to be called by Committer
? If so would failWithKnownReason
and failWithUnknownReason
be better~?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the term failed
is correct here because it describes the state of the comittable. In the future, we may add a general configuration on how to handle failures i.e. submit to dead letter queue.
Since we cannot really rename the method anymore I already made it "future-proof".
flink-core/src/main/java/org/apache/flink/api/connector/sink2/Committer.java
Outdated
Show resolved
Hide resolved
flink-core/src/main/java/org/apache/flink/api/connector/sink2/SinkWriter.java
Outdated
Show resolved
Hide resolved
flink-core/src/main/java/org/apache/flink/api/connector/sink2/TwoPhaseCommittingSink.java
Outdated
Show resolved
Hide resolved
flink-core/src/main/java/org/apache/flink/api/connector/sink2/TwoPhaseCommittingSink.java
Outdated
Show resolved
Hide resolved
Thanks for the review @gaoyunhaii @alpreu I have addressed all your comments. PTAL. |
146bb76
to
9500281
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very thanks @fapaul for the PR! LGTM~
…ng a IdentityHashMap to track transformations. The already transformed transformation are copied into a different map and compared. If the transformation does not properly implement equals the isTransformed check may fail and the transformation is copied multiple times. Now that is hardened because we check the object reference.
The new interface separates concerns and will make future refactorings and extensions easier. The user immediately which methods needs to be implemented.
* permanently fail after reaching that maximum. Else the committable will be retried as | ||
* long as this method is invoked after each attempt. | ||
*/ | ||
void retryLater(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if I understood it correctly after reading the java doc. Does it mean that this method will be called as long as the maximum is not exceeded? The name retryLater
sounds like an asynch call, Is that your intention? The follow-up question will be how late? Will the time period be controlled by the configuration, since there is no input of this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far none of our sinks sets a number of maximum retries but in the future, we might consider it. The retry mechanism will work internally similar to the current implementation [1]. As soon as the committable is retried we enqueue in the mailbox that is polled "periodically" and retried. Moreover during the next checkpoint, the committable is retried as well.
[1]
Line 96 in dbbf2a3
commitRetrier.retryWithDelay(); |
* | ||
* @return the serializer of the writer's state type. | ||
*/ | ||
SimpleVersionedSerializer<WriterStateT> getWriterStateSerializer(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional has been removed from multiple methods, this is one of them. Could you explain a little more about your thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing all these optionals was one of the intentions behind designing the new interfaces. Sink developers can now explicitly decide which functionality they want to support and implement the interfaces accordingly [1]. With the Sink V1 interfaces they basically always had to implement everything except that some of the methods have default implementations.
What is the purpose of the change
This is the first PR of FLIP-191 (https://issues.apache.org/jira/browse/FLINK-25555) it introduces the basic decomposed interfaces that are the replacement for the existing Sink V1 interfaces. The PR only adds the public-facing interfaces and does not implement the stream graph translation yet. It is a follow-up task
Brief change log
Verifying this change
The PR mostly consists of interface additions and file movements.
Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: (yes / no)Documentation