Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-38349][SS] No need to filter events when sessionwindow gapDuration greater than 0 #35680

Closed
wants to merge 15 commits into from

Conversation

nyingping
Copy link
Contributor

@nyingping nyingping commented Feb 28, 2022

What changes were proposed in this pull request?

Static gapDuration on session Window,No need to filter events when sessionwindow gapDuration greater than 0.

Why are the changes needed?

save calculation resources and improve performance.

Does this PR introduce any user-facing change?

No

How was this patch tested?

add new UT and benchmark.

a simple benchmark on [9dae8a5] . thanks again HeartSaVioR@d532b6f.


case 1

spark.range(numOfRow)
      .selectExpr("CAST(id AS timestamp) AS time")
      .select(session_window(col("time"), "10 seconds"))
      .count()

Result:

Java HotSpot(TM) 64-Bit Server VM 1.8.0_291-b10 on Windows 10 10.0
AMD64 Family 23 Model 96 Stepping 1, AuthenticAMD
session windows:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
old logic                                          1688           1730          61          5.9         168.8       1.0X
new logic                                            21             26           5        487.3           2.1      82.3X

@github-actions github-actions bot added the SQL label Feb 28, 2022
@nyingping nyingping changed the title Spark 38349 [SPARK-38214]No need to filter events when sessionwindow gapDuration greater than 0 Feb 28, 2022
@nyingping nyingping changed the title [SPARK-38214]No need to filter events when sessionwindow gapDuration greater than 0 [SPARK-38214][SS]No need to filter events when sessionwindow gapDuration greater than 0 Feb 28, 2022
@nyingping
Copy link
Contributor Author

I will check it later.

@nyingping nyingping changed the title [SPARK-38214][SS]No need to filter events when sessionwindow gapDuration greater than 0 [SPARK-38349][SS]No need to filter events when sessionwindow gapDuration greater than 0 Feb 28, 2022
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@HyukjinKwon HyukjinKwon changed the title [SPARK-38349][SS]No need to filter events when sessionwindow gapDuration greater than 0 [SPARK-38349][SS] No need to filter events when sessionwindow gapDuration greater than 0 Mar 2, 2022
@HyukjinKwon
Copy link
Member

cc @HeartSaVioR FYI

@HeartSaVioR
Copy link
Contributor

@nyingping
Now this PR has a small conflict; could you please resolve the conflict? Thanks in advance!

@nyingping
Copy link
Contributor Author

@HeartSaVioR yeah,resolved.thanks for your time!

@HeartSaVioR
Copy link
Contributor

HeartSaVioR commented Mar 23, 2022

Sorry I have been very busy with handling stuff. I see the rationalization, but it is a bit unclear in the code which cases we want to optimize (the case gapDuration produces > 0 consistently), and the code change does not seem to be intuitive given that the code tries to disassemble with Cast.

@HeartSaVioR
Copy link
Contributor

cc. @viirya since I see your client seems to use session window. For static gap duration this would reduce consistent overhead (filter with comparison between timestamp) per row.

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rationalization makes sense, but the current approach looks problematic.

@HeartSaVioR
Copy link
Contributor

I guess this is being complicated to also handle the case of dynamic gap with "case when" & literals.
What about just narrowing down the scope to the static gap? This is a majority of use case for session window, and this will simplify the logic a lot - probably need to handle Literal only.

@nyingping
Copy link
Contributor Author

sure,I think it is acceptable that only deal with static gaps will greatly reduce uncertainty.

@nyingping
Copy link
Contributor Author

I've narrowed it down to Static gap duration.
@HeartSaVioR @viirya Could you have a look when you have a chance?Thanks in advance!

@nyingping
Copy link
Contributor Author

@viirya @HeartSaVioR
Thanks for suggestion,Thanks a lot!

Copy link
Contributor

@HeartSaVioR HeartSaVioR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only nits about style

Copy link
Contributor

@HeartSaVioR HeartSaVioR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@HeartSaVioR
Copy link
Contributor

Thanks! Merging to master.

@HeartSaVioR
Copy link
Contributor

Thanks @nyingping for your contribution! I merged this to master.

@nyingping
Copy link
Contributor Author

@HeartSaVioR @viirya Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants