-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-23702][SS] Forbid watermarks on both sides of stateful streaming operators. #20859
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| val statefulChildren = e.collect { | ||
| case a: Aggregate if a.isStreaming => a | ||
| case d: Deduplicate if d.isStreaming => d | ||
| case f: FlatMapGroupsWithState if f.isStreaming => f |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be for joins as well.
| } | ||
| statefulChildren.foreach { statefulNode => | ||
| if (statefulNode.collectFirst{ case e: EventTimeWatermark => e }.isDefined) { | ||
| throwError("Watermarks both before and after a stateful operator in a streaming " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This gives the impression that it makes sense but we dont support it. In fact, its just ill-defined. May change this to something like ... Multiple watermarks before and after stateful operators is not well-defined in a streaming query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT of something like "watermarks may not be present...". Talking about "well-defined" seems a bit confusing to me.
| CalendarInterval.fromString("interval 2 seconds"), | ||
| streamRelation))), | ||
| outputMode = Append, | ||
| expectedMsgs = Seq("both before and after")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add for other stateful operators as well.
|
Test build #88383 has finished for PR 20859 at commit
|
|
Test build #88424 has finished for PR 20859 at commit
|
|
Test build #88432 has finished for PR 20859 at commit
|
|
How would we like to handle this patch? I guess we add feature on handling multiple watermarks in #21701 so based on the direction this patch might be going to be abandoned. IMHO I'm not 100% sure we have clear use cases for defining multiple watermarks for one source, so this patch might help on simplify handling watermark. |
|
Test build #97493 has finished for PR 20859 at commit
|
What changes were proposed in this pull request?
Forbid watermarks on both sides of stateful streaming operators.
Multiple sequential watermarks are in general not supported by the execution engine; support is only in parallel, e.g. on both sides of a join. We can normally resolve this by simply picking the topmost watermark operator and ignoring the rest, but this is not semantically valid when there's a stateful operator in between.
How was this patch tested?
new unit test