[SPARK-36172][SS] Document session window into Structured Streaming guide doc #33433
cc. @viirya @xuanyuanking |
|
Test build #141307 has finished for PR 33433 at commit
|
viirya
left a comment
Doesn't w1, ending at 12:14, overlap with w2, starting at 12:15 (1 min apart), and w2, ending at 12:20, overlap with w3, starting at 12:22 (2 mins apart), given the gap duration of 5 mins?
viirya
left a comment
The doc looks okay, just a question about the diagram.
|
Not sure I understand. w1 ends at 12:14 (12:09 + 5 mins), so it doesn't overlap with w2, which starts at 12:15. The same goes for w2 and w3.
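To make the arithmetic concrete, here is a minimal pure-Python sketch of the rule described in this reply: a session ends at its latest event time plus the gap duration. It is not Spark code. The `sessionize` and `at` helpers, the calendar date, and the 12:05 event are illustrative assumptions, and treating an event that arrives exactly at the session end as the start of a new session is also an assumption.

```python
from datetime import datetime, timedelta


def sessionize(event_times, gap):
    """Group event timestamps into (start, end) session windows.

    Hypothetical helper: a session's end is its latest event time plus `gap`.
    An event strictly before the current end extends the session.
    """
    sessions = []
    for t in sorted(event_times):
        if sessions and t < sessions[-1][1]:
            start, _ = sessions[-1]
            sessions[-1] = (start, t + gap)  # extend the open session
        else:
            sessions.append((t, t + gap))  # start a new session
    return sessions


def at(hhmm):
    # Arbitrary date, chosen only so the timestamps are well-formed.
    return datetime.strptime("2021-01-01 " + hhmm, "%Y-%m-%d %H:%M")


# Event times chosen to match the window boundaries discussed above.
events = [at("12:05"), at("12:09"), at("12:15"), at("12:22")]
windows = sessionize(events, timedelta(minutes=5))
for start, end in windows:
    print(start.strftime("%H:%M"), "->", end.strftime("%H:%M"))
```

Under these assumptions the first window closes at 12:14 because its last event arrived at 12:09, so the event at 12:15 opens a separate window rather than extending it.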
|
Oh, I misunderstood the diagram. |
|
Maybe we can add a sentence explaining it too? |
|
OK, we can probably add an explanation of the session end, since it doesn't look intuitive enough. Let me work on it. Thanks!
|
Just updated the figure and also updated the screenshot of the PR description. |
viirya
left a comment
What is structured-streaming.pptx used for?
|
It looks like the Spark project does a good job of keeping everything under source control, including figures. Figures in the docs seem to be created from PowerPoint, and each PPTX file contains the figures for its corresponding doc page. I also created the new figure in the PPTX file and left it there.
|
Test build #141317 has finished for PR 33433 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
I'll merge this late tomorrow if there's no further comment. |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
if following input has been received within gap duration. A session window closes when there's no input
received within gap duration after receiving the latest input.

Session window uses `session_window` function. The usage of the function is similar to the `window` function.
Do we need to mention that the session_window function should be a grouping key and should not be the only grouping key?
Actually there's no such restriction, just as there is none for the window function. You can use session_window anywhere an expression is expected. We just don't calculate any windowing aggregation if the function is not used in the grouping key, the same as for the window function.
"It should not be the only grouping key": this restriction applies only to streaming queries. In batch queries we allow a global window.
Ah yeah, that makes sense. I only considered the SS scenario.
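The difference between a session window combined with another grouping key and a global session window can be sketched in plain Python. This is only an illustration of the concept, not Spark's implementation or API. The `sessionize` helper, the user names, and the integer minute timestamps are all hypothetical.

```python
from collections import defaultdict


def sessionize(times, gap):
    """Hypothetical helper: merge sorted event times into (start, end) sessions."""
    sessions = []
    for t in sorted(times):
        if sessions and t < sessions[-1][1]:
            sessions[-1] = (sessions[-1][0], t + gap)  # extend the open session
        else:
            sessions.append((t, t + gap))  # start a new session
    return sessions


# (user, event time in minutes) pairs; purely illustrative data.
events = [("a", 0), ("b", 3), ("a", 4), ("b", 20)]
GAP = 5

# Keyed: sessions are computed independently for each user.
by_user = defaultdict(list)
for user, t in events:
    by_user[user].append(t)
keyed = {user: sessionize(ts, GAP) for user, ts in by_user.items()}

# Global: every event feeds a single set of sessions.
global_sessions = sessionize([t for _, t in events], GAP)

print(keyed)
print(global_sessions)
```

In the keyed case each user's activity gap is tracked separately, while in the global case any event from any user keeps the one shared session open.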
xuanyuanking
left a comment
LGTM.
Just an open question left.
|
Thanks all for reviewing! Merging to master/3.2 |
…uide doc

### What changes were proposed in this pull request?

This PR documents a new feature "native support of session window" into Structured Streaming guide doc.

Screenshots are following:

### Why are the changes needed?

This change is needed to explain a new feature to the end users.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Documentation changes.

Closes #33433 from HeartSaVioR/SPARK-36172.

Authored-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
(cherry picked from commit 0eb31a0)
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
|
Thanks. I merged this into master/3.2. |
What changes were proposed in this pull request?
This PR documents a new feature "native support of session window" into Structured Streaming guide doc.
Screenshots are following:
Why are the changes needed?
This change is needed to explain a new feature to the end users.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Documentation changes.