
[FLINK-22902] Port KafkaSink to FLIP-143 #16676

Merged
merged 3 commits on Aug 6, 2021

Conversation


@fapaul fapaul commented Aug 2, 2021

What is the purpose of the change

This commit introduces a new KafkaSink which is based on FLIP-143.

Brief change log

Besides adding the new KafkaSink, the PR includes the following additional commits:

  • ca1c2e6 extracts a test utility for finding the latest completed checkpoint of a job
  • b640ffc introduces a utility that exposes the needed information from the Sink.InitContext to a SerializationSchema

Verifying this change

The changes are covered by multiple unit tests and also integration tests against a real Kafka cluster.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

The documentation is planned as a follow-up and is tracked in https://issues.apache.org/jira/browse/FLINK-23664

@flinkbot (Collaborator)

flinkbot commented Aug 2, 2021

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 483a1a6 (Sat Aug 28 13:11:08 UTC 2021)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot (Collaborator)

flinkbot commented Aug 2, 2021

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

@fapaul fapaul force-pushed the FLINK-23124 branch 2 times, most recently from 22e7bb8 to 0f0ab58 on August 2, 2021 15:14
@AHeise AHeise self-assigned this Aug 3, 2021
@AHeise AHeise left a comment (Contributor)

Thank you very much for your contribution. The general structure looks good and I'm leaving a first impression of the production code.

@fapaul fapaul force-pushed the FLINK-23124 branch 7 times, most recently from b13e3be to 3b5f480 on August 4, 2021 14:20

fapaul commented Aug 4, 2021

@AHeise thanks for your review. I have addressed all your comments, please have another look.


fapaul commented Aug 5, 2021

@flinkbot run azure

@AHeise AHeise left a comment (Contributor)

A more detailed round. I have not deeply looked into the IT but the structure looks good and the covered cases should be sufficient.

```java
class KafkaWriterState {
    private final int subtaskId;
    private final long transactionalIdOffset;
    private final String transactionalIdPrefix;
```
Contributor:

By storing it here, do you effectively allow users to change the prefix even when resuming from checkpoint?

Author:

I have to store it here to abort transactions from previous runs. If the job is stopped and started with a new prefix the new one is used for all newly created states.

Contributor:

Okay, nice. Do we want to tell users that they may change the prefix, or should we communicate that the prefix should remain stable? I'm assuming quite a few edge cases would not work well if a prefix is changed (think of lingering transactions opened before downscaling without a recent checkpoint). So I would probably communicate that the prefix is supposed to be stable for now.

Author:

Yeah, the downscaling-before-checkpoint case is definitely a problem; I can update the doc string to hint that the prefix should remain stable.

```java
 * Exposes information about how many records have been emitted overall and at the beginning of a
 * checkpoint.
 */
private static final class InfiniteIntegerSource
```
Contributor:

Can you just use an env.fromSequence(0, Long.MAX_VALUE) with a chained map that implements this functionality? You are making our future lives harder :/

Author:

I do not see how I can easily replace it because I am relying on the fact that the Source finishes after the first checkpointCompleted event.

Contributor:

Ah correct. Let's keep it this way then.

@fapaul fapaul force-pushed the FLINK-23124 branch 2 times, most recently from e8c6ec0 to 5f63691 on August 6, 2021 09:50
@fapaul fapaul marked this pull request as ready for review August 6, 2021 10:11
@AHeise AHeise left a comment (Contributor)

A few more nits.

```java
/**
 * Adapter between {@link Sink.InitContext} and {@link SerializationSchema.InitializationContext}.
 */
public class InitContextInitializationContextAdapter
```
Contributor:

I see that this is implemented similarly to RuntimeContextSerializationInitializationContextAdapter and has the same flaws.
In general, a small class should not rely on a big class being injected; that makes reuse and testing much harder. In this case, it would be much better to pass the UserCodeClassLoader and the MetricGroup directly and have no dependency on the InitContext.
Moreover, the metric group should only be created when it's actually needed, so here I would rather use a Supplier<MetricGroup> (it doesn't make sense that the InitContext is passed to this adapter and then passed back to mapMetricGroup; the InitContext is by definition available at the call site).

So the signature should be:

```java
public InitContextInitializationContextAdapter(
        Supplier<MetricGroup> metricGroupSupplier, UserCodeClassLoader userCodeClassLoader)
```

Finally, we should cache the result from the Supplier, most easily by using Guava's Suppliers.memoize. I'd probably wrap the ctor parameter before assigning it to the field.
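For reference, a minimal sketch of the suggested shape. The MetricGroup interface and the class loader parameter here are simplified stand-ins for Flink's types, and the hand-rolled memoize stands in for Guava's Suppliers.memoize; none of this is the PR's actual code.

```java
import java.util.function.Supplier;

public class AdapterSketch {

    // Stand-in for org.apache.flink.metrics.MetricGroup.
    interface MetricGroup {}

    /** Wraps a supplier so the delegate is invoked at most once (like Guava's Suppliers.memoize). */
    static <T> Supplier<T> memoize(Supplier<T> delegate) {
        return new Supplier<T>() {
            private T value;          // cached result, created on first get()
            private boolean computed;

            @Override
            public synchronized T get() {
                if (!computed) {
                    value = delegate.get();
                    computed = true;
                }
                return value;
            }
        };
    }

    static final class InitContextInitializationContextAdapter {
        private final Supplier<MetricGroup> metricGroupSupplier;
        private final ClassLoader userCodeClassLoader; // stand-in for UserCodeClassLoader

        InitContextInitializationContextAdapter(
                Supplier<MetricGroup> metricGroupSupplier, ClassLoader userCodeClassLoader) {
            // Wrap the ctor parameter before assigning it to the field, as suggested.
            this.metricGroupSupplier = memoize(metricGroupSupplier);
            this.userCodeClassLoader = userCodeClassLoader;
        }

        MetricGroup getMetricGroup() {
            return metricGroupSupplier.get(); // created lazily, cached afterwards
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        InitContextInitializationContextAdapter adapter =
                new InitContextInitializationContextAdapter(
                        () -> { calls[0]++; return new MetricGroup() {}; },
                        AdapterSketch.class.getClassLoader());
        adapter.getMetricGroup();
        adapter.getMetricGroup();
        System.out.println("metric group created " + calls[0] + " time(s)");
    }
}
```

The adapter now depends only on what it needs, so a test can pass a counting supplier (as in main above) instead of mocking a whole InitContext.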

Contributor:

I wonder if this should be in flink-connector-base instead.

Author:

> I wonder if this should be in flink-connector-base instead.

It can also live in flink-connector-base; I just refrained from putting it there because there is no sink-specific code there yet, and all these adapters are currently also in core.

```java
@@ -84,4 +92,32 @@ public static void waitUntilJobInitializationFinished(
                () -> clusterClient.requestJobResult(id).get(),
                userCodeClassloader);
    }

    public static File getMostRecentCompletedCheckpoint(File checkpointDir) throws IOException {
        return Files.find(checkpointDir.toPath(), 2, TestUtils::isCompletedCheckpoint)
```
Contributor:

FYI there is a bug here https://issues.apache.org/jira/browse/FLINK-23647. But we would fix it with that ticket.

Author:

Which ticket do you mean by that ticket?
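For context, a self-contained version of such a "most recent completed checkpoint" lookup might look like the following. The layout assumption (chk-&lt;n&gt; directories that contain a _metadata file once the checkpoint completed) matches Flink's on-disk format, but the helper names and the max-by-number selection are illustrative, not the PR's actual code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

public class CheckpointLookup {

    /** A checkpoint directory counts as completed once its _metadata file exists. */
    static boolean isCompletedCheckpoint(Path path, BasicFileAttributes attrs) {
        return attrs.isDirectory()
                && path.getFileName().toString().startsWith("chk-")
                && Files.exists(path.resolve("_metadata"));
    }

    static Optional<Path> mostRecentCompletedCheckpoint(Path checkpointDir) throws IOException {
        try (Stream<Path> candidates =
                Files.find(checkpointDir, 2, CheckpointLookup::isCompletedCheckpoint)) {
            // Pick the checkpoint with the highest chk-<n> number.
            return candidates.max(Comparator.comparingLong(
                    p -> Long.parseLong(p.getFileName().toString().substring(4))));
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("checkpoints");
        for (int i : new int[] {1, 2, 3}) {
            Files.createDirectories(dir.resolve("chk-" + i));
        }
        // Only chk-1 and chk-3 are "completed" (they have a _metadata file).
        Files.createFile(dir.resolve("chk-1").resolve("_metadata"));
        Files.createFile(dir.resolve("chk-3").resolve("_metadata"));
        System.out.println(mostRecentCompletedCheckpoint(dir).get().getFileName()); // chk-3
    }
}
```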

Comment on lines +287 to +289

```java
private void testRecoveryWithAssertion(
        DeliveryGuarantee guarantee, java.util.function.Consumer<List<Long>> recordsAssertion)
        throws Exception {
```
Contributor:

FYI, there is another pattern that can be used to implement such wrapping setup/cleanup code:

```java
AutoCloseableResult testRecovery(DeliveryGuarantee guarantee) {
    // execute common code
    result = ...; // fetch result
    AutoCloseable after = () -> { /* after code */ };
    return wrap(result, after);
}
```

You can then use the return value in a try-with-resources block and put all your assertions inside it. It has huge benefits over your pattern when you have checked exceptions, and it is often easier on the eye after the auto-formatter has run over it.

Author:

Thanks for the suggestion, I get the idea. I will try to apply it next time, or do you want the tests refactored now?
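Spelled out, the reviewer's sketch could look like the runnable example below. All names here (ClosableResult, testRecovery, the logged strings) are hypothetical; the point is that the result and its cleanup travel together, so try-with-resources guarantees cleanup even when an assertion throws:

```java
import java.util.Arrays;
import java.util.List;

public class RecoveryPatternSketch {

    /** Pairs a test result with the cleanup that must run afterwards. */
    static final class ClosableResult<T> implements AutoCloseable {
        final T value;
        private final AutoCloseable after;

        ClosableResult(T value, AutoCloseable after) {
            this.value = value;
            this.after = after;
        }

        @Override
        public void close() throws Exception {
            after.close(); // runs even if an assertion in the try block throws
        }
    }

    static final StringBuilder log = new StringBuilder();

    /** Common setup and result fetching; cleanup is deferred to close(). */
    static ClosableResult<List<Long>> testRecovery() {
        log.append("setup;");
        List<Long> records = Arrays.asList(1L, 2L, 3L); // stand-in for fetched records
        return new ClosableResult<>(records, () -> log.append("cleanup;"));
    }

    public static void main(String[] args) throws Exception {
        try (ClosableResult<List<Long>> result = testRecovery()) {
            // assertions live in the try block; checked exceptions propagate cleanly
            if (result.value.size() != 3) {
                throw new AssertionError("unexpected record count");
            }
        }
        System.out.println(log); // setup;cleanup;
    }
}
```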

```java
 *
 * @param transactionalIdPrefix prefix for the id
 * @param subtaskId describing the subtask which is opening the transaction
 * @param offset an always incrementing number usually capturing the number of checkpoints taken
```
Contributor:

This could be confused with Kafka offset. Maybe use seq number?

@fapaul fapaul (Author) commented Aug 6, 2021:

I renamed it to checkpointOffset, and I hope the docstring for the parameter makes it apparent that it has nothing to do with the partition offset.

Comment on lines 43 to 44

```java
public static String buildTransactionalId(
        String transactionalIdPrefix, int subtaskId, long offset) {
```
Contributor:

Here we could also have an instantiable TransactionalIdFactory with constant transactionalIdPrefix and subtaskId. You could then have a pre-computed subtask prefix consisting of:

```java
prefix = sb.append(transactionalIdPrefix)
        .append(TRANSACTIONAL_ID_DELIMITER)
        .append(subtaskId)
        .append(TRANSACTIONAL_ID_DELIMITER)
```

Then this method would just `return prefix + offset;`

Author:

Hmm, it means we need to instantiate the factory basically for every transaction. What would be the benefit?
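As a sketch, the proposed factory might look like the following (the class name and delimiter are assumptions for illustration). Note it would be created once per subtask/writer rather than once per transaction, so per transaction only a single concatenation remains:

```java
public class TransactionalIdFactory {
    private static final String TRANSACTIONAL_ID_DELIMITER = "-";

    // Precomputed "<transactionalIdPrefix>-<subtaskId>-" part, built once per subtask.
    private final String subtaskPrefix;

    public TransactionalIdFactory(String transactionalIdPrefix, int subtaskId) {
        this.subtaskPrefix =
                transactionalIdPrefix
                        + TRANSACTIONAL_ID_DELIMITER
                        + subtaskId
                        + TRANSACTIONAL_ID_DELIMITER;
    }

    /** Per transaction, only the offset changes. */
    public String buildTransactionalId(long offset) {
        return subtaskPrefix + offset;
    }

    public static void main(String[] args) {
        TransactionalIdFactory factory = new TransactionalIdFactory("kafka-sink", 3);
        System.out.println(factory.buildTransactionalId(42)); // kafka-sink-3-42
    }
}
```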

Fabian Paul added 2 commits August 6, 2021 14:19
This commit introduces a new KafkaSink which is based on FLIP-143.
@AHeise AHeise left a comment (Contributor)

LGTM. Thank you very much!

@AHeise AHeise merged commit 8719481 into apache:master Aug 6, 2021