From 1a915bf20f75682b11765ccb9e885f47adaaf2d0 Mon Sep 17 00:00:00 2001
From: Gabor Somogyi
Date: Sun, 23 Jun 2019 19:23:57 -0500
Subject: [PATCH] [MINOR][SQL][DOCS] failOnDataLoss has effect on batch queries
 so fix the doc

## What changes were proposed in this pull request?

According to the [Kafka integration document](https://spark.apache.org/docs/2.4.0/structured-streaming-kafka-integration.html) `failOnDataLoss` has effect only on streaming queries. While I was implementing the DSv2 Kafka batch sources I've realized it's not true. This feature is covered in [KafkaDontFailOnDataLossSuite](https://github.com/apache/spark/blob/54da3bbfb2c936827897c52ed6e5f0f428b98e9f/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDontFailOnDataLossSuite.scala#L180). In this PR I've updated the doc to reflect this behavior.

## How was this patch tested?

```
cd docs/
SKIP_API=1 jekyll build
```

Manual webpage check.

Closes #24932 from gaborgsomogyi/failOnDataLoss.

Authored-by: Gabor Somogyi
Signed-off-by: Sean Owen
---
 docs/structured-streaming-kafka-integration.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/docs/structured-streaming-kafka-integration.md b/docs/structured-streaming-kafka-integration.md
index bbff82259e56f..d5224da2cf3f7 100644
--- a/docs/structured-streaming-kafka-integration.md
+++ b/docs/structured-streaming-kafka-integration.md
@@ -355,11 +355,10 @@ The following configurations are optional:
   <td>failOnDataLoss</td>
   <td>true or false</td>
   <td>true</td>
-  <td>streaming query</td>
+  <td>streaming and batch</td>
   <td>Whether to fail the query when it's possible that data is lost (e.g., topics
   are deleted, or offsets are out of range). This may be a false alarm. You can disable it when it doesn't work
-  as you expected. Batch queries will always fail if it fails to read any data from the provided
-  offsets due to lost data.</td>
+  as you expected.</td>
 </tr>
 <tr>
   <td>kafkaConsumer.pollTimeoutMs</td>
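
For context (not part of the patch itself), the batch behavior the doc change describes can be exercised with a reader configured like the following — a minimal sketch assuming a local Spark session with the spark-sql-kafka-0-10 package on the classpath; the topic name `events` and the bootstrap server address are illustrative placeholders:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of a *batch* Kafka read (spark.read, not spark.readStream).
// "localhost:9092" and the topic "events" are hypothetical values.
val spark = SparkSession.builder()
  .appName("kafka-batch-failOnDataLoss")
  .master("local[*]")
  .getOrCreate()

val df = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  // Per the doc fix above, failOnDataLoss applies to batch queries as well:
  // with "false", offsets lost to topic deletion or retention are skipped
  // rather than failing the query.
  .option("failOnDataLoss", "false")
  .load()

df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)").show()
```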