Skip to content

[SPARK-35896][SS] Include more granular metrics for stateful operators in StreamingQueryProgress#33091

Closed
vkorukanti wants to merge 3 commits intoapache:masterfrom
vkorukanti:SPARK-35896
Closed

[SPARK-35896][SS] Include more granular metrics for stateful operators in StreamingQueryProgress#33091
vkorukanti wants to merge 3 commits intoapache:masterfrom
vkorukanti:SPARK-35896

Conversation

@vkorukanti
Copy link
Member

@vkorukanti vkorukanti commented Jun 25, 2021

What changes were proposed in this pull request?

Currently the StateOperatorProgress in StreamingQueryProgress is missing few metrics.

Why are the changes needed?

The main motivation is find hotspots and have better visibility in the stateful operations. Detailed explanations are in SPARK-35896.

Does this PR introduce any user-facing change?

Yes. The StateOperatorProgress entries within StreamingQueryProgress now contain additional fields as listed in SPARK-35896. Example StreamingQueryProgress output in JSON form.
Before:

{

  "id" : "510be3cd-a955-4faf-8456-d97c78d39af5",
  ....
  "durationMs" : {
    "triggerExecution" : 2856,
    ....
  },
  "stateOperators" : [ {
    "numRowsTotal" : 1,
    "numRowsUpdated" : 1,
    "numRowsDroppedByWatermark" : 0,
    "customMetrics" : {
      "loadedMapCacheHitCount" : 0,
      "loadedMapCacheMissCount" : 0,
      "stateOnCurrentVersionSizeBytes" : 392
    }
  }],
  ....
}

After:

{
  "id" : "510be3cd-a955-4faf-8456-d97c78d39af5",
  ....
  "durationMs" : {
    "triggerExecution" : 2856,
    ....
  },
  "stateOperators" : [ {
    "operatorName" : "dedupe", <-- new
    "numRowsTotal" : 1,
    "numRowsUpdated" : 1, <-- new
    "allUpdatesTimeMs" : 56, <-- new
    "numRowsRemoved" : 2, <-- new
    "allRemovalsTimeMs" : 45, <-- new
    "commitTimeMs" : 40, <-- new
    "numRowsDroppedByWatermark" : 0,
    "numShufflePartitions" : 2, <-- new
    "numStateStoreInstances" : 2, <-- new
    "customMetrics" : {
      "loadedMapCacheHitCount" : 0,
      "loadedMapCacheMissCount" : 0,
      "stateOnCurrentVersionSizeBytes" : 392
    }
  }],
  ....
}

How was this patch tested?

Existing tests for regressions. Added new UTs.

@SparkQA
Copy link

SparkQA commented Jun 25, 2021

Test build #140332 has finished for PR 33091 at commit 2788de7.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 25, 2021

Test build #140334 has finished for PR 33091 at commit 2b1d758.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 25, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44864/

@SparkQA
Copy link

SparkQA commented Jun 25, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44864/

@vkorukanti vkorukanti changed the title [WIP][SPARK-35896][SS] Include more granular metrics for stateful operators in StreamingQueryProgress [SPARK-35896][SS] Include more granular metrics for stateful operators in StreamingQueryProgress Jun 28, 2021
Copy link
Contributor

@HeartSaVioR HeartSaVioR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First round.

@SparkQA
Copy link

SparkQA commented Jun 29, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44898/

@SparkQA
Copy link

SparkQA commented Jun 30, 2021

Test build #140367 has finished for PR 33091 at commit ffbf818.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@HeartSaVioR HeartSaVioR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 pending build.

@SparkQA
Copy link

SparkQA commented Jun 30, 2021

Test build #140417 has finished for PR 33091 at commit c3f8a0d.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Copy link
Contributor

retest this, please

@SparkQA
Copy link

SparkQA commented Jun 30, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44932/

@HeartSaVioR
Copy link
Contributor

GA passed. Thanks! Merging to master.

@SparkQA
Copy link

SparkQA commented Jun 30, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44932/

@SparkQA
Copy link

SparkQA commented Jun 30, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44936/

@SparkQA
Copy link

SparkQA commented Jun 30, 2021

Test build #140421 has finished for PR 33091 at commit c3f8a0d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please confirm this change in the following PR.

val memoryUsedBytes: Long,
val numRowsDroppedByWatermark: Long,
val numShufflePartitions: Long,
val numStateStoreInstances: Long,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is detected as a binary incompatibility. It will be okay because this is Evolving.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants