New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-38349][SS] No need to filter events when sessionwindow gapDuration greater than 0 #35680
Conversation
I will check it later. |
Can one of the admins verify this patch? |
cc @HeartSaVioR FYI |
@nyingping |
@HeartSaVioR yeah,resolved.thanks for your time! |
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Outdated
Show resolved
Hide resolved
Sorry I have been very busy with handling stuff. I see the rationalization, but it is a bit unclear in the code which cases we want to optimize (the case gapDuration produces > 0 consistently), and the code change does not seem to be intuitive given that the code tries to disassemble with Cast. |
cc. @viirya since I see your client seems to use session window. For static gap duration this would reduce consistent overhead (filter with comparison between timestamp) per row. |
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rationalization makes sense, but the current approach looks problematic.
I guess this is being complicated to also handle the case of dynamic gap with "case when" & literals. |
sure,I think it is acceptable that only deal with static gaps will greatly reduce uncertainty. |
I've narrowed it down to Static gap duration. |
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/DataFrameSessionWindowingSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/DataFrameSessionWindowingSuite.scala
Outdated
Show resolved
Hide resolved
@viirya @HeartSaVioR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only nits about style
sql/core/src/test/scala/org/apache/spark/sql/DataFrameSessionWindowingSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/DataFrameSessionWindowingSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/DataFrameSessionWindowingSuite.scala
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Thanks! Merging to master. |
Thanks @nyingping for your contribution! I merged this to master. |
@HeartSaVioR @viirya Thanks a lot! |
What changes were proposed in this pull request?
Static gapDuration on session Window,No need to filter events when sessionwindow gapDuration greater than 0.
Why are the changes needed?
save calculation resources and improve performance.
Does this PR introduce any user-facing change?
No
How was this patch tested?
add new UT and benchmark.
a simple benchmark on [9dae8a5] . thanks again HeartSaVioR@d532b6f.
case 1
Result: