[MINOR][SS] Add some description about auto reset and data loss note to SS doc#31089
viirya wants to merge 4 commits into apache:master
Conversation
- streaming query is started, and that resuming will always pick up from where the query left off.
+ streaming query is started, and that resuming will always pick up from where the query left off. Note
+ that when the offsets consumed by a streaming application is not in Kafka (e.g., topics are deleted,
+ offsets are out of range, or offsets are removed after offset retention period), because the offsets
Maybe we need to remove "because" in this sentence?
+ streaming query is started, and that resuming will always pick up from where the query left off. Note
+ that when the offsets consumed by a streaming application is not in Kafka (e.g., topics are deleted,
+ offsets are out of range, or offsets are removed after offset retention period), because the offsets
+ are not reset and the streaming application will see data lost. In extream cases, for example the
sunchao left a comment
I'm not familiar with SS, so just trying to help on the grammar. :)
  topics/partitions are dynamically subscribed. Note that `startingOffsets` only applies when a new
- streaming query is started, and that resuming will always pick up from where the query left off.
+ streaming query is started, and that resuming will always pick up from where the query left off. Note
+ that when the offsets consumed by a streaming application is not in Kafka (e.g., topics are deleted,
- streaming query is started, and that resuming will always pick up from where the query left off.
+ streaming query is started, and that resuming will always pick up from where the query left off. Note
+ that when the offsets consumed by a streaming application is not in Kafka (e.g., topics are deleted,
+ offsets are out of range, or offsets are removed after offset retention period), the offsets
"offset retention period": not sure if "offset" is redundant here.
Also, perhaps "the offsets are not reset" -> "they will not be reset".
+ streaming query is started, and that resuming will always pick up from where the query left off. Note
+ that when the offsets consumed by a streaming application is not in Kafka (e.g., topics are deleted,
+ offsets are out of range, or offsets are removed after offset retention period), the offsets
+ are not reset and the streaming application will see data lost. In extreme cases, for example the
"see data lost" -> "see data loss"
Test build #133824 has finished for PR 31089 at commit

Thanks @dongjoon-hyun @sunchao

Kubernetes integration test starting

Test build #133826 has finished for PR 31089 at commit

Kubernetes integration test status success

Kubernetes integration test starting

Kubernetes integration test status success

cc @HeartSaVioR
  topics/partitions are dynamically subscribed. Note that `startingOffsets` only applies when a new
- streaming query is started, and that resuming will always pick up from where the query left off.
+ streaming query is started, and that resuming will always pick up from where the query left off. Note
+ that when the offsets consumed by a streaming application are not in Kafka (e.g., topics are deleted,
It feels more natural to say "no longer exist in" instead of "are not in", but as I'm not a native speaker, please take this with a grain of salt.
thanks. ok for me. updated.
I feel this is better to be added in "failOnDataLoss", instead of "auto reset", but let's hear others' voices.

Test build #133859 has finished for PR 31089 at commit

Kubernetes integration test starting

Kubernetes integration test status failure

Merged to master for Apache Spark 3.2.0.
What changes were proposed in this pull request?
This patch adds some description to the SS doc about offset reset and data loss.
Why are the changes needed?
During recent SS tests, the behavior of gradually reducing input rows confused me. Compared with Flink, I did not see similar behavior. After looking into the code and doing some tests, I feel it is better to add some more description to the SS doc.
Does this PR introduce any user-facing change?
No, doc only.
How was this patch tested?
Doc only.
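For readers of this thread, the two Kafka source options under discussion appear together when building a stream. A minimal sketch, assuming an existing `spark` session and a reachable Kafka broker; `host1:9092` and `topic1` are hypothetical names, not from the PR:

```scala
// Sketch only: requires the spark-sql-kafka connector on the classpath.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:9092") // hypothetical broker
  .option("subscribe", "topic1")                   // hypothetical topic
  // Only applies when a NEW query starts; a resumed query always
  // picks up from its checkpointed offsets, not from this option.
  .option("startingOffsets", "earliest")
  // If checkpointed offsets are no longer in Kafka (topic deleted,
  // offsets out of range, or removed after the retention period),
  // "false" lets the query continue instead of failing, at the cost
  // of silent data loss.
  .option("failOnDataLoss", "false")
  .load()
```

This is the interplay the doc change describes: `startingOffsets` cannot rescue a resumed query whose checkpointed offsets have been removed from Kafka; only `failOnDataLoss` governs what happens then.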