[SPARK-35799][SS] Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec #32952

Closed
wants to merge 2 commits into from

vkorukanti
Member

What changes were proposed in this pull request?

Fix how the metric allUpdatesTimeMs is measured in FlatMapGroupsWithStateExec so that it matches the other streaming stateful operators.

Why are the changes needed?

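The metric allUpdatesTimeMs is meant to capture the start-to-end wall-clock time of the FlatMapGroupsWithStateExec operator, but currently it only captures the time taken to create the output iterator (see https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala#L121).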

Fix it to measure the same way the other stateful operators do; one example is https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala#L406. This measurement is not exact: because the output iterator is lazy, it also includes the time the consumer operator spends processing this operator's output. Still, it gives a useful signal when comparing the metric in one microbatch with the metric in another.
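The difference between the two measurements can be illustrated with a minimal, self-contained Scala sketch. It is not the code from this PR or from Spark: TimeMetric, timeCreationOnly and timeUntilExhausted are hypothetical names, and a plain nanosecond accumulator stands in for Spark's SQL metric. The first helper stops the clock as soon as the lazy iterator has been built, which is the problem described above; the second stops it only once the consumer finds the iterator exhausted, in the spirit of the approach used by the other stateful operators.

```scala
// Hypothetical sketch, not Spark code: contrasts timing only iterator creation
// with timing until a lazy iterator has been fully consumed.
object LazyIteratorTiming {
  // Hypothetical stand-in for a time metric: accumulates nanoseconds.
  final class TimeMetric {
    var totalNs: Long = 0L
    def add(ns: Long): Unit = totalNs += ns
  }

  // Old style: the clock stops right after the lazy iterator is created,
  // so the per-row work performed later during iteration is never counted.
  def timeCreationOnly[T](metric: TimeMetric)(makeIter: => Iterator[T]): Iterator[T] = {
    val start = System.nanoTime()
    val it = makeIter
    metric.add(System.nanoTime() - start)
    it
  }

  // New style: the clock starts when the iterator is set up and stops only
  // when hasNext first returns false, i.e. after the consumer drains it.
  // Assumption: the consumer drains the iterator; otherwise nothing is recorded.
  def timeUntilExhausted[T](metric: TimeMetric)(makeIter: => Iterator[T]): Iterator[T] = {
    val start = System.nanoTime()
    val underlying = makeIter
    new Iterator[T] {
      private var recorded = false
      override def hasNext: Boolean = {
        val more = underlying.hasNext
        if (!more && !recorded) {
          recorded = true
          metric.add(System.nanoTime() - start)
        }
        more
      }
      override def next(): T = underlying.next()
    }
  }
}
```

Because the clock in timeUntilExhausted keeps running while the consumer does its own work between next() calls, the recorded time also includes the consumer's processing, which is the imprecision noted in the description.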

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing unit tests cover regressions. Because the metric is a time measurement, it is hard to write a unit test for it, but the change has been verified manually.

@SparkQA

SparkQA commented Jun 17, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44469/

@SparkQA

SparkQA commented Jun 17, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44469/

@SparkQA

SparkQA commented Jun 17, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44475/

@SparkQA

SparkQA commented Jun 17, 2021

Test build #139942 has finished for PR 32952 at commit bd16f80.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 17, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44475/

@SparkQA

SparkQA commented Jun 18, 2021

Test build #139949 has finished for PR 32952 at commit 6897578.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor

HeartSaVioR left a comment

+1
Neither the current nor the proposed measurement is accurate by nature, but the proposed one looks more meaningful.

I'll merge once the build passes.

@HeartSaVioR
Contributor

retest this, please

@SparkQA
Copy link

SparkQA commented Jun 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44633/

@SparkQA

SparkQA commented Jun 22, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44633/

@SparkQA

SparkQA commented Jun 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44636/

@SparkQA

SparkQA commented Jun 22, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44636/

@SparkQA

SparkQA commented Jun 22, 2021

Test build #140105 has finished for PR 32952 at commit 6897578.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor

Jenkins passed. Thanks! Merging to master.

@HeartSaVioR
Contributor

Thanks @vkorukanti for your contribution! I merged into master.
Sorry, I forgot to cherry-pick this. Could you please raise a PR against branch-3.1 so we can fix it there too?

@SparkQA

SparkQA commented Jun 22, 2021

Test build #140108 has finished for PR 32952 at commit 95486e2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

vkorukanti added a commit to vkorukanti/spark that referenced this pull request Jun 22, 2021
…pGroupsWithStateExec

Closes apache#32952 from vkorukanti/SPARK-35799.

Authored-by: Venki Korukanti <venki.korukanti@gmail.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
3 participants