[MINOR][SQL][DOCS] failOnDataLoss has effect on batch queries so fix the doc

## What changes were proposed in this pull request?

According to the [Kafka integration document](https://spark.apache.org/docs/2.4.0/structured-streaming-kafka-integration.html), `failOnDataLoss` has an effect only on streaming queries. While implementing the DSv2 Kafka batch sources I realized this is not true. The batch behavior is covered in [KafkaDontFailOnDataLossSuite](https://github.com/apache/spark/blob/54da3bbfb2c936827897c52ed6e5f0f428b98e9f/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDontFailOnDataLossSuite.scala#L180).

In this PR I've updated the doc to reflect this behavior.
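For context (not part of the patch), a minimal sketch of a batch query where this option now documented as applying might look like the following; `spark` is assumed to be an active `SparkSession`, and the bootstrap server and topic name are placeholders:

```scala
// Hypothetical batch read from the Kafka source. Contrary to the old
// doc, failOnDataLoss is honored here too: with "false", offsets that
// have aged out (e.g. due to retention) are skipped instead of failing
// the query.
val df = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:9092")
  .option("subscribe", "topic1")
  .option("startingOffsets", "earliest")
  .option("endingOffsets", "latest")
  .option("failOnDataLoss", "false")
  .load()

df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
```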

## How was this patch tested?

```
cd docs/
SKIP_API=1 jekyll build
```
Manual webpage check.

Closes #24932 from gaborgsomogyi/failOnDataLoss.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
gaborgsomogyi authored and srowen committed Jun 24, 2019
1 parent 5a7aa6f commit 1a915bf
5 changes: 2 additions & 3 deletions docs/structured-streaming-kafka-integration.md
```diff
@@ -355,11 +355,10 @@ The following configurations are optional:
   <td>failOnDataLoss</td>
   <td>true or false</td>
   <td>true</td>
-  <td>streaming query</td>
+  <td>streaming and batch</td>
   <td>Whether to fail the query when it's possible that data is lost (e.g., topics are deleted, or
   offsets are out of range). This may be a false alarm. You can disable it when it doesn't work
-  as you expected. Batch queries will always fail if it fails to read any data from the provided
-  offsets due to lost data.</td>
+  as you expected.</td>
 </tr>
 <tr>
   <td>kafkaConsumer.pollTimeoutMs</td>
```
