Skip to content

Commit

Permalink
Explain the doc
Browse files Browse the repository at this point in the history
  • Loading branch information
HeartSaVioR committed May 22, 2020
1 parent c30d458 commit 9cc0443
Showing 1 changed file with 10 additions and 0 deletions.
10 changes: 10 additions & 0 deletions docs/structured-streaming-programming-guide.md
Expand Up @@ -1674,6 +1674,16 @@ Any of the stateful operation(s) after any of below stateful operations can have
As Spark cannot check the state function of `mapGroupsWithState`/`flatMapGroupsWithState`, Spark assumes that the state function
emits late rows if the operator uses Append mode.

Spark provides two ways to check the number of late rows on stateful operators which would help you identify the issue:

1. On Spark UI: check the metrics in stateful operator nodes in query execution details page in SQL tab
2. On Streaming Query Listener: check "numLateInputs" in "stateOperators" in QueryProcessEvent.

Please note that the definition of "input" is relative: it doesn't always mean "input rows" for the operator.
Streaming aggregation does pre-aggregate input rows and checks the late inputs against pre-aggregated inputs,
hence the number is not same as the number of original input rows. You'd like to check the fact whether the value is zero
or non-zero.

There's a known workaround: split your streaming query into multiple queries per stateful operator, and ensure
end-to-end exactly once per query. Ensuring end-to-end exactly once for the last query is optional.

Expand Down

0 comments on commit 9cc0443

Please sign in to comment.