8 changes: 7 additions & 1 deletion docs/structured-streaming-kafka-integration.md
@@ -878,7 +878,13 @@ group id, however, please read warnings for this option and use it with caution.
where to start instead. Structured Streaming manages which offsets are consumed internally, rather
than relying on the Kafka consumer to do it. This ensures that no data is missed when new
topics/partitions are dynamically subscribed. Note that `startingOffsets` only applies when a new
-streaming query is started, and that resuming will always pick up from where the query left off.
+streaming query is started, and that resuming will always pick up from where the query left off. Note
+that when the offsets a streaming application has consumed no longer exist in Kafka (e.g., topics are
+deleted, offsets are out of range, or offsets are removed after the retention period), the offsets will not
+be reset and the streaming application will see data loss. In extreme cases, for example when the
+streaming application's throughput cannot keep up with Kafka's retention speed, the number of input rows
+in a batch may gradually shrink to zero once none of the batch's offset ranges remain in Kafka. Enabling
+the `failOnDataLoss` option tells Structured Streaming to fail the query in such cases (see the sketch below).
- **key.deserializer**: Keys are always deserialized as byte arrays with ByteArrayDeserializer. Use
DataFrame operations to explicitly deserialize the keys.
- **value.deserializer**: Values are always deserialized as byte arrays with ByteArrayDeserializer.
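For illustration, here is a minimal Scala sketch of how these options fit together: it sets `startingOffsets` and `failOnDataLoss` on the Kafka source and deserializes the byte-array keys and values with DataFrame operations. The topic name, bootstrap servers, starting position, and checkpoint path below are placeholder assumptions, not values taken from this documentation.

```scala
import org.apache.spark.sql.SparkSession

object KafkaSourceOptionsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaSourceOptionsSketch")
      .getOrCreate()

    // Placeholder servers and topic; substitute your own.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:9092,host2:9092")
      .option("subscribe", "events")
      // Applies only the first time the query is started; a restarted
      // query resumes from the offsets recorded in its checkpoint.
      .option("startingOffsets", "earliest")
      // Fail the query rather than continue silently when previously
      // consumed offsets are no longer available in Kafka (e.g. aged
      // out by retention or removed with a deleted topic).
      .option("failOnDataLoss", "true")
      .load()

    // Keys and values arrive as byte arrays; deserialize them explicitly
    // with DataFrame operations (here, a simple cast to STRING).
    val decoded = raw.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    decoded.writeStream
      .format("console")
      .option("checkpointLocation", "/tmp/kafka-options-sketch")
      .start()
      .awaitTermination()
  }
}
```

If this query is stopped and restarted with the same checkpoint location, `startingOffsets` is ignored and consumption resumes from the checkpointed offsets, as described above.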