[hotfix][kafka][docs] Add warning regarding data losses when writing … #4631

Closed
wants to merge 2 commits into from
24 changes: 22 additions & 2 deletions docs/dev/connectors/kafka.md
@@ -475,8 +475,14 @@ are other constructor variants that allow providing the following:

### Kafka Producers and Fault Tolerance

With Flink's checkpointing enabled, the Flink Kafka Producer can provide
at-least-once delivery guarantees.
#### Kafka 0.8

Before 0.9, Kafka did not provide any mechanisms to guarantee at-least-once or exactly-once semantics.

#### Kafka 0.9 and 0.10

With Flink's checkpointing enabled, the `FlinkKafkaProducer09` and `FlinkKafkaProducer010`
can provide at-least-once delivery guarantees.
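
For illustration, a minimal at-least-once setup might look like the sketch below. The broker address, topic name, and checkpoint interval are assumptions, the two setter methods are explained next, and `FlinkKafkaProducer010` can be configured analogously:

```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// At-least-once requires checkpointing; the 5 second interval is an assumption.
env.enableCheckpointing(5000);

DataStream<String> stream = env.fromElements("a", "b", "c"); // stand-in pipeline

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address

FlinkKafkaProducer09<String> producer = new FlinkKafkaProducer09<>(
        "my-topic",                // hypothetical target topic
        new SimpleStringSchema(),  // write each record as a UTF-8 string
        props);

producer.setFlushOnCheckpoint(true);  // block checkpoints until pending writes are acknowledged
producer.setLogFailuresOnly(false);   // fail the job on write errors instead of only logging them

stream.addSink(producer);
```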

Besides enabling Flink's checkpointing, you should also configure the setter
methods `setLogFailuresOnly(boolean)` and `setFlushOnCheckpoint(boolean)` appropriately,
@@ -499,6 +505,20 @@ we recommend setting the number of retries to a higher value.
**Note**: There is currently no transactional producer for Kafka, so Flink cannot guarantee exactly-once delivery
into a Kafka topic.

<div class="alert alert-warning">
<strong>Attention:</strong> Depending on your Kafka configuration, you can still experience
data loss even after Kafka acknowledges writes. In particular, keep in mind the following
properties in the Kafka config:
<ul>
<li><tt>acks</tt></li>
<li><tt>log.flush.interval.messages</tt></li>
<li><tt>log.flush.interval.ms</tt></li>
<li><tt>log.flush.*</tt></li>
</ul>
The default values for the above options can easily lead to data loss. Please refer to the Kafka
documentation for more details.
</div>
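
For illustration only, a producer configuration biased towards durability might look like the following sketch. The values are assumptions rather than tuned recommendations, and note that the `log.flush.*` options are broker-side settings that cannot be set from the client:

```java
Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
// Wait for acknowledgement from all in-sync replicas, not only the leader.
props.setProperty("acks", "all");
// Retry transient send failures instead of dropping records.
props.setProperty("retries", "3");
// log.flush.interval.messages, log.flush.interval.ms and the other log.flush.*
// options are broker-side settings (server.properties); review them on the
// brokers as well, since they cannot be configured through this client config.
```

Stricter `acks` and higher `retries` settings trade throughput and latency for durability; pick values that match your deployment's replication factor and loss tolerance.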

## Using Kafka timestamps and Flink event time in Kafka 0.10

Since Apache Kafka 0.10, Kafka's messages can carry [timestamps](https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+timestamps+to+Kafka+message), indicating