@HeartSaVioR
Contributor

@HeartSaVioR HeartSaVioR commented Jul 20, 2021

What changes were proposed in this pull request?

This PR documents a new feature, "native support of session window", in the Structured Streaming guide doc.

Screenshots follow:

Screenshot 2021-07-20 5:04:20 PM

Screenshot 2021-07-20 3:34:38 PM

Why are the changes needed?

This change is needed to explain a new feature to the end users.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Documentation changes.

@HeartSaVioR
Contributor Author

cc. @viirya @xuanyuanking

@github-actions github-actions bot added the DOCS label Jul 20, 2021
@SparkQA

SparkQA commented Jul 20, 2021

Test build #141307 has finished for PR 33433 at commit fded124.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@viirya viirya left a comment

Given the gap duration of 5 mins, doesn't w1, which ends at 12:14, overlap with w2, which starts at 12:15 (1 min apart)? Likewise, doesn't w2, which ends at 12:20, overlap with w3, which starts at 12:22 (2 mins apart)?

Member

@viirya viirya left a comment

The doc looks okay, just a question about the diagram.

@HeartSaVioR
Contributor Author

Not sure I understand. w1 ends at 12:14 (12:09 + 5 mins), so it doesn't overlap with w2, which starts at 12:15. The same holds for w2 and w3.
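
To make the arithmetic above concrete, here is a minimal pure-Python sketch of how session windows could be derived from event times. It is not Spark's implementation. The function name `session_windows`, the date, and the 12:05 event are assumptions for illustration (only 12:09, 12:15, and 12:22 come from the discussion), and treating the session end as exclusive is also an assumption of this sketch.

```python
from datetime import datetime, timedelta


def session_windows(event_times, gap):
    """Group event times into (start, end) session windows.

    Each event opens a window [t, t + gap). An event that falls inside the
    current session extends it, so a session ends `gap` after its latest event.
    """
    windows = []
    for t in sorted(event_times):
        if windows and t < windows[-1][1]:
            # The event arrived within the gap: extend the current session.
            windows[-1] = (windows[-1][0], t + gap)
        else:
            # No event arrived within the gap: start a new session.
            windows.append((t, t + gap))
    return windows


# Hypothetical events; the date and the 12:05 event are made up.
base = datetime(2021, 7, 20, 12, 0)
events = [base + timedelta(minutes=m) for m in (5, 9, 15, 22)]

for start, end in session_windows(events, timedelta(minutes=5)):
    print(f"{start:%H:%M} -> {end:%H:%M}")
# 12:05 -> 12:14
# 12:15 -> 12:20
# 12:22 -> 12:27
```

In this sketch w1 ends at 12:14 because its latest event is at 12:09, so the event at 12:15 starts a new session instead of extending w1.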

@viirya
Member

viirya commented Jul 20, 2021

Oh, I misunderstood the diagram.

@viirya
Member

viirya commented Jul 20, 2021

Maybe we can add a sentence explaining it too?

@HeartSaVioR
Contributor Author

OK, we can probably add an explanation of the session end, as it doesn't look intuitive enough. Let me work on it. Thanks!

@HeartSaVioR
Contributor Author

HeartSaVioR commented Jul 20, 2021

Just updated the figure and also updated the screenshot in the PR description.

Member

@viirya viirya left a comment

What is structured-streaming.pptx used for?

@HeartSaVioR
Contributor Author

HeartSaVioR commented Jul 20, 2021

It looks like the Spark project does a great job of keeping everything under source control, including figures. The figures in the docs seem to be created in PowerPoint, and each PPTX file contains the figures for the corresponding doc page. I also created the new figure in the PPTX file and left it there.

@SparkQA

SparkQA commented Jul 20, 2021

Test build #141317 has finished for PR 33433 at commit 5f625eb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45822/

@SparkQA

SparkQA commented Jul 20, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45822/

@HeartSaVioR
Contributor Author

I'll merge this late tomorrow if there are no further comments.

@SparkQA

SparkQA commented Jul 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45831/

@SparkQA

SparkQA commented Jul 20, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45831/

if following input has been received within gap duration. A session window closes when there's no input
received within gap duration after receiving the latest input.

Session window uses `session_window` function. The usage of the function is similar to the `window` function.
Member

Do we need to mention that the session_window function should be a grouping key and should not be the only grouping key?

Contributor Author

Actually there's no such restriction, just as there's none for the window function. You can use session_window anywhere an expression is expected. We just don't calculate any windowing aggregation if the function is not used in a grouping key, same as we do for the window function.

"It should not be the only grouping key" <= this restriction applies only to streaming queries. In batch queries we allow a global window.
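
As a rough illustration of the streaming case discussed above, the sketch below uses `session_window` as one grouping key alongside a second key. It assumes PySpark 3.2 or later; the helper name `count_events_per_session`, the column names `timestamp` and `userId`, and the watermark delay are hypothetical and not taken from this PR.

```python
def count_events_per_session(events, gap="5 minutes", delay="10 minutes"):
    """Count events per user session on a DataFrame of events.

    `events` is assumed to have a `timestamp` column (event time) and a
    `userId` column; both names are hypothetical.
    """
    # Imported here so the sketch can be loaded without a Spark installation.
    from pyspark.sql import functions as F

    return (
        events
        # Bound how late data may arrive (the delay value is an assumption).
        .withWatermark("timestamp", delay)
        # session_window is one grouping key; userId is the additional key.
        .groupBy(F.session_window(F.col("timestamp"), gap), F.col("userId"))
        .count()
    )
```

Given a streaming DataFrame with those two columns, `count_events_per_session(events)` would return one row per user and session, which is the shape the streaming restriction above calls for.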

Member

Ah yeah, makes sense. I only considered the SS scenario.

Member

@xuanyuanking xuanyuanking left a comment

LGTM.
Just an open question left.

@HeartSaVioR
Contributor Author

Thanks all for reviewing! Merging to master/3.2

HeartSaVioR added a commit that referenced this pull request Jul 21, 2021
…uide doc

### What changes were proposed in this pull request?

This PR documents a new feature "native support of session window" into Structured Streaming guide doc.

Screenshots are following:

![Screenshot 2021-07-20 5:04:20 PM](https://user-images.githubusercontent.com/1317309/126284848-526ec056-1028-4a70-a1f4-ae275d4b5437.png)

![Screenshot 2021-07-20 3:34:38 PM](https://user-images.githubusercontent.com/1317309/126276763-763cf841-aef7-412a-aa03-d93273f0c850.png)

### Why are the changes needed?

This change is needed to explain a new feature to the end users.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Documentation changes.

Closes #33433 from HeartSaVioR/SPARK-36172.

Authored-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
(cherry picked from commit 0eb31a0)
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
@HeartSaVioR
Contributor Author

Thanks. I merged this into master/3.2.

flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
…uide doc
