[SPARK-24699] [SS] Make watermarks work with Trigger.Once by saving updated watermark to commit log #21746
Test build #92852 has finished for PR 21746 at commit
|
Test build #93301 has finished for PR 21746 at commit
|
Test build #93300 has finished for PR 21746 at commit
|
LGTM |
LGTM
@tdas this will not be included in |
@c-horn it's in 2.4.0. I just fixed the ticket. |
Thanks! |
I have the same problem in PySpark using Spark 2.4.0, that is: Streaming queries with watermarks do not work with .trigger(once=True). When can we expect the same fix for PySpark? |
I believe that with the current fix, a windowed aggregation only fully processes the on-hand data if you run the Trigger.Once query twice. It shouldn't matter that you are using PySpark; it is the same streaming code. My initial test case was altered so that the first pass only updates the watermark, and the second pass processes the data that fell before the watermark. |
Can you provide me with a simple example of the current fix please? |
Can someone expound on this comment? Specifically, how do I programmatically "run the Trigger.once twice"? In its simplest form, I tried this without success:
Any hints would be greatly appreciated... |
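A minimal PySpark sketch of what "running Trigger.Once twice" could look like, going by the earlier comment in this thread: start the same query two times against the same checkpoint location and wait for each run to terminate. The function names, schema, column names, paths, and the window and watermark durations are all illustrative assumptions, not taken from this PR. Whether the second run actually emits the remaining windows depends on the Spark version behaving as that comment describes.

```python
def run_once(spark, input_dir, output_dir, checkpoint_dir):
    """Start a windowed, watermarked count with trigger(once=True) and block until it stops.

    `spark` is an existing SparkSession; all paths and column names are hypothetical.
    """
    # Imported lazily so this module can be loaded without PySpark installed.
    from pyspark.sql import functions as F

    events = (
        spark.readStream
        .schema("event_time TIMESTAMP, value LONG")  # assumed input schema
        .json(input_dir)
    )

    counts = (
        events
        .withWatermark("event_time", "10 minutes")
        .groupBy(F.window("event_time", "5 minutes"))
        .count()
    )

    query = (
        counts.writeStream
        .outputMode("append")
        .format("parquet")
        .option("path", output_dir)
        # The checkpoint location must be the same on every run so that
        # state persisted by the first run is visible to the second.
        .option("checkpointLocation", checkpoint_dir)
        .trigger(once=True)
        .start()
    )
    query.awaitTermination()


def run_twice(spark, input_dir, output_dir, checkpoint_dir):
    """Run the Trigger.Once query two times in a row against one checkpoint."""
    # First pass: per the comment above, this may only advance the watermark.
    run_once(spark, input_dir, output_dir, checkpoint_dir)
    # Second pass: expected to pick up the saved watermark and emit the
    # windows that now fall before it.
    run_once(spark, input_dir, output_dir, checkpoint_dir)
```

The important detail is that both runs share one `checkpointLocation`; a fresh checkpoint directory on the second run would start the query from scratch.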
What changes were proposed in this pull request?
Streaming queries with watermarks do not work with Trigger.Once because of the following.
The simple solution is to persist the updated watermark value in the commit log when a batch is marked as completed. The next batch, in the next Trigger.Once run, can then pick it up from the commit log.
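The idea can be illustrated with a small toy model in plain Python. This is not Spark's implementation: `Checkpoint`, `run_trigger_once`, and the `save_watermark` flag are hypothetical names, event times are plain integers, and the commit log is a dictionary standing in for files in the checkpoint directory.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Toy stand-in for a checkpoint directory that survives across runs."""
    # Maps batch id -> watermark recorded when that batch was committed.
    commit_log: dict = field(default_factory=dict)


def run_trigger_once(checkpoint, event_times, delay, save_watermark=True):
    """Run exactly one batch, then stop.

    Returns (watermark the batch started with, watermark after the batch).
    """
    # Restore the watermark from the last committed batch, if there is one.
    if checkpoint.commit_log:
        last_id = max(checkpoint.commit_log)
        start_wm = checkpoint.commit_log[last_id]
        batch_id = last_id + 1
    else:
        start_wm = 0
        batch_id = 0

    # After processing, the watermark advances in memory and never moves back.
    new_wm = max(start_wm, max(event_times, default=0) - delay)

    # With save_watermark=True the advanced value is written at commit time.
    # With save_watermark=False it stays in memory and is lost when the run ends.
    checkpoint.commit_log[batch_id] = new_wm if save_watermark else start_wm
    return start_wm, new_wm
```

In this model, a first run over event times `[100, 130]` with a delay of 10 ends with watermark 120, and a second run on the same `Checkpoint` starts from 120. With `save_watermark=False`, the second run starts from 0 again, which mirrors the problem this change addresses.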
How was this patch tested?
new unit tests
Co-authored-by: Tathagata Das <tathagata.das1565@gmail.com>
Co-authored-by: c-horn <chorn4033@gmail.com>