[FLINK-4027] Flush FlinkKafkaProducer on checkpoints #2108

rmetzger · 2016-06-15T20:08:07Z

A user on the mailing list pointed out that our Kafka producer can quite easily be made at-least-once.
The current producer code doesn't provide any guarantees.

We use the producer's callbacks to keep track of unacknowledged records. When a checkpoint barrier reaches the sink, the sink confirms the checkpoint only once all pending records have been acked.
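The accounting described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `AsyncSender`, `CountingSink`, `invoke` and `snapshotState` are hypothetical names, the sender is a stand-in rather than the Kafka producer API, and failed sends are not handled.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

public class PendingRecordsSketch {

    /** Hypothetical stand-in for an asynchronous producer; this is not the Kafka API. */
    interface AsyncSender {
        CompletableFuture<Void> send(String record);
    }

    /** Hypothetical sink that tracks how many sent records are still unacknowledged. */
    static class CountingSink {
        private final AsyncSender sender;
        private final AtomicLong pendingRecords = new AtomicLong();

        CountingSink(AsyncSender sender) {
            this.sender = sender;
        }

        /** Called once per record: count it as pending, then send it asynchronously. */
        void invoke(String record) {
            pendingRecords.incrementAndGet();
            // The completion callback plays the role of the producer's ack callback.
            // Error handling is omitted: failed sends are simply counted as completed.
            sender.send(record).whenComplete((ok, err) -> pendingRecords.decrementAndGet());
        }

        /** Called when the checkpoint barrier arrives: return only once nothing is pending. */
        void snapshotState() {
            while (pendingRecords.get() > 0) {
                try {
                    Thread.sleep(1);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting for acks", e);
                }
            }
        }

        long pending() {
            return pendingRecords.get();
        }
    }

    public static void main(String[] args) {
        // The sender acknowledges every record from a background thread.
        CountingSink sink = new CountingSink(record -> CompletableFuture.runAsync(() -> { }));
        sink.invoke("a");
        sink.invoke("b");
        sink.snapshotState();
        System.out.println("pending after snapshot: " + sink.pending());
    }
}
```

Because the checkpoint is only confirmed after the counter drops to zero, a record is either acknowledged before the checkpoint completes or replayed after a failure, which is what gives at-least-once behaviour.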

eliaslevy · 2016-06-15T21:20:32Z

...a-base/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaProducerBase.java

@@ -51,10 +54,11 @@
 * Flink Sink to produce data into a Kafka topic.
 *
 * Please note that this producer does not have any reliability guarantees.
+ * The producer implements the checkpointed interface for allowing synchronization on checkpoints.


May want to change:

note that this producer does not have any reliability guarantees.

to

note that this producer provides at-least-once reliability guarantees when checkpoints are enabled.

rmetzger · 2016-06-16T09:53:43Z

Thank you for the review @eliaslevy!

zentol · 2016-06-16T10:00:31Z

...afka-0.8/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaProducer08.java

+	protected void flush() {
+		// The Kafka 0.8 producer doesn't support flushing, therefore, we are using an inefficient
+		// busy wait approach
+		while(pendingRecords > 0) {


missing space after while

eliaslevy · 2016-06-16T20:40:42Z

👍

tillrohrmann · 2016-06-22T16:00:05Z

...a-base/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaProducerBase.java

- * Please note that this producer does not have any reliability guarantees.
+ * Please note that this producer provides at-least-once reliability guarantees when
+ * checkpoints are enabled and setFlushOnCheckpoint(true) is set.
+ * Otherwise, the producer doesn't provide any reliability guarantees.


Does it make sense to completely remove the old behaviour and always enable flush on checkpoint? I'm wondering because who would want to use a KafkaProducer without any processing guarantees?

rmetzger · 2016-06-22

My reasoning here was that we first provide this as an optional feature, so that users who know what they are doing and what they need can give it some exposure.
I want to be certain that it works in all environments before we activate it by default.

tillrohrmann · 2016-06-22

Alright, this makes total sense.

tillrohrmann · 2016-06-22T16:34:10Z

Good work @rmetzger. Well documented code and a good idea to solve the problem. I had some comments concerning concurrent accesses to pendingRecords and test stability on Travis. Once we've solved these points, I think the PR is good to be merged :-)
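One possible way to guard concurrent access to a pending-record counter, and to avoid a busy wait in `flush()`, is a dedicated lock with wait/notify. The sketch below is an assumption about how this could look, not necessarily what this PR ended up doing; `recordSent`, `recordAcked` and the class name are hypothetical.

```java
public class PendingRecordsLockSketch {

    // All reads and writes of pendingRecords happen while holding this lock.
    private final Object pendingRecordsLock = new Object();
    private long pendingRecords;

    /** Called by the task thread for every record handed to the producer. */
    void recordSent() {
        synchronized (pendingRecordsLock) {
            pendingRecords++;
        }
    }

    /** Called by the producer's callback thread when a record is acknowledged. */
    void recordAcked() {
        synchronized (pendingRecordsLock) {
            pendingRecords--;
            if (pendingRecords == 0) {
                // Wake up a flush() call that is waiting for the counter to reach zero.
                pendingRecordsLock.notifyAll();
            }
        }
    }

    /** Blocks without spinning until every sent record has been acknowledged. */
    void flush() {
        synchronized (pendingRecordsLock) {
            while (pendingRecords > 0) {
                try {
                    pendingRecordsLock.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while flushing", e);
                }
            }
        }
    }

    long pending() {
        synchronized (pendingRecordsLock) {
            return pendingRecords;
        }
    }

    public static void main(String[] args) {
        PendingRecordsLockSketch sketch = new PendingRecordsLockSketch();
        for (int i = 0; i < 3; i++) {
            sketch.recordSent();
        }
        // A second thread simulates the asynchronous acknowledgements.
        new Thread(() -> {
            for (int i = 0; i < 3; i++) {
                sketch.recordAcked();
            }
        }).start();
        sketch.flush();
        System.out.println("pending after flush: " + sketch.pending());
    }
}
```

The `while` loop around `wait()` re-checks the condition after every wake-up, so spurious wake-ups do not let `flush()` return early.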

rmetzger · 2016-06-30T13:05:32Z

Thank you for your review @tillrohrmann and @zentol. I tried to address all your concerns.
Please let me know what you think about it.

tillrohrmann · 2016-06-30T13:21:42Z

...-base/src/test/java/org/apache/flink/streaming/connectors/kafka/AtLeastOnceProducerTest.java

+			threadB.join(500);
+		}
+		Assert.assertFalse("Thread A is expected to be finished at this point. If not, the test is prone to fail", threadB.isAlive());
+		if (runnableError.f0 != null) {
 			runnableError.f0.printStackTrace();
 			Assert.fail("Error from thread B: " + runnableError.f0 );


Printing the stack trace with printStackTrace() is imo not so good. It goes to stderr, where it gets intermingled with the rest of the test log output. I think it's better to simply rethrow the Throwable here.
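The suggestion above could look roughly like the following sketch, which captures a Throwable from a helper thread and rethrows it in the calling thread as the cause of an AssertionError. The class and method names are hypothetical and this is not the test code from the PR.

```java
import java.util.concurrent.atomic.AtomicReference;

public class RethrowFromThreadSketch {

    /**
     * Runs the body in a separate thread, waits up to timeoutMillis for it to finish,
     * and fails in the calling thread if the body threw or is still running.
     */
    static void runInThread(Runnable body, long timeoutMillis) {
        AtomicReference<Throwable> error = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            try {
                body.run();
            } catch (Throwable t) {
                // Remember the failure instead of printing it.
                error.set(t);
            }
        });
        worker.start();
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new AssertionError("Interrupted while waiting for the worker thread", e);
        }
        if (worker.isAlive()) {
            throw new AssertionError("Worker thread still running after " + timeoutMillis + " ms");
        }
        Throwable t = error.get();
        if (t != null) {
            // The original Throwable travels along as the cause, so the test report shows it.
            throw new AssertionError("Error from worker thread", t);
        }
    }

    public static void main(String[] args) {
        runInThread(() -> { }, 1000);
        try {
            runInThread(() -> { throw new IllegalStateException("boom"); }, 1000);
        } catch (AssertionError e) {
            System.out.println("caught: " + e.getCause());
        }
    }
}
```

In a JUnit test method declared with `throws Throwable`, the captured Throwable could also be rethrown directly instead of being wrapped.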

tillrohrmann · 2016-06-30T13:58:57Z

Changes look good to me @rmetzger :-) I had only some minor comments. +1 for merging after addressing the comments.

eliaslevy reviewed Jun 15, 2016

rmetzger force-pushed the flink4027 branch from d657ca8 to a58da2e Compare June 16, 2016 07:40

zentol reviewed Jun 16, 2016

tillrohrmann reviewed Jun 22, 2016

[FLINK-4027] Flush FlinkKafkaProducer on checkpoints

fa0fca1

rmetzger force-pushed the flink4027 branch from 20e05c0 to 588e644 Compare June 30, 2016 12:46

rmetzger added 2 commits June 30, 2016 14:51

Address pull request comments

741c65c

address PR comments

12fc81a

rmetzger force-pushed the flink4027 branch from 588e644 to 12fc81a Compare June 30, 2016 12:51

tillrohrmann reviewed Jun 30, 2016

last PR comments addressed

0fceaa9

asfgit closed this in 7206b0e Jul 4, 2016

rmetzger deleted the flink4027 branch July 4, 2016 19:11

tzulitai mentioned this pull request Feb 8, 2017

[FLINK-5701] [kafka] FlinkKafkaProducer should check asyncException on checkpoints #3278

Closed

rmetzger added the component=Connectors/Kafka label Mar 14, 2019
