
[SPARK-34898][CORE] We should log SparkListenerExecutorMetricsUpdateEvent of driver appropriately when spark.eventLog.logStageExecutorMetrics is true #31992

Closed
wants to merge 8 commits into from


AngersZhuuuu
Contributor

@AngersZhuuuu AngersZhuuuu commented Mar 29, 2021

What changes were proposed in this pull request?

In the current EventLoggingListener, we don't write SparkListenerExecutorMetricsUpdate events to the event log file at all. onExecutorMetricsUpdate only updates the in-memory per-stage peaks:

```scala
override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = {
  if (shouldLogStageExecutorMetrics) {
    event.executorUpdates.foreach { case (stageKey1, newPeaks) =>
      liveStageExecutorMetrics.foreach { case (stageKey2, metricsPerExecutor) =>
        // If the update came from the driver, stageKey1 will be the dummy key (-1, -1),
        // so record those peaks for all active stages.
        // Otherwise, record the peaks for the matching stage.
        if (stageKey1 == DRIVER_STAGE_KEY || stageKey1 == stageKey2) {
          val metrics = metricsPerExecutor.getOrElseUpdate(
            event.execId, new ExecutorMetrics())
          metrics.compareAndUpdatePeakValues(newPeaks)
        }
      }
    }
  }
}
```
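The stage matching in the snippet above can be modeled in a self-contained way. This is a minimal sketch, not Spark's code. `MetricPeaks` and `StagePeakTracker` are hypothetical stand-ins. Metrics are keyed by name in a `Map` rather than stored in Spark's internal array. Only the `(-1, -1)` dummy driver key comes from the snippet. The model shows why a driver update lands in every live stage while an executor update lands only in its own stage.

```scala
import scala.collection.mutable

// Hypothetical stand-in for ExecutorMetrics: peak value per metric name.
final class MetricPeaks {
  private val peaks = mutable.Map.empty[String, Long]

  // Keep the larger of the stored and the incoming value for every metric.
  def compareAndUpdate(newPeaks: Map[String, Long]): Unit =
    newPeaks.foreach { case (name, value) =>
      peaks(name) = peaks.get(name).fold(value)(math.max(_, value))
    }

  def get(name: String): Option[Long] = peaks.get(name)
}

// Hypothetical model of the per-stage peak bookkeeping in onExecutorMetricsUpdate.
final class StagePeakTracker {
  type StageKey = (Int, Int) // (stageId, stageAttemptId)
  private val DriverStageKey: StageKey = (-1, -1)

  // Live stage -> (executor id -> peaks seen while that stage was running).
  private val liveStages =
    mutable.Map.empty[StageKey, mutable.Map[String, MetricPeaks]]

  def startStage(key: StageKey): Unit = {
    liveStages(key) = mutable.Map.empty[String, MetricPeaks]
  }

  def onMetricsUpdate(execId: String, updates: Map[StageKey, Map[String, Long]]): Unit =
    updates.foreach { case (stageKey1, newPeaks) =>
      liveStages.foreach { case (stageKey2, perExecutor) =>
        // The driver reports under the dummy key, so its peaks count toward every live stage;
        // an executor's peaks count only toward the stage they were reported for.
        if (stageKey1 == DriverStageKey || stageKey1 == stageKey2) {
          perExecutor.getOrElseUpdate(execId, new MetricPeaks).compareAndUpdate(newPeaks)
        }
      }
    }

  def peak(stage: StageKey, execId: String, metric: String): Option[Long] =
    liveStages.get(stage).flatMap(_.get(execId)).flatMap(_.get(metric))
}
```

These peaks live only in memory. They are not written as SparkListenerExecutorMetricsUpdate events, which is the gap this PR addresses for the driver.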

In the History Server's REST API for executors, we can get the executors' metrics but cannot get all of the driver's metrics. Executors' metrics can also be updated from other events, such as TaskEnd.

So in this PR, I add support for logging the driver's SparkListenerExecutorMetricsUpdate events when spark.eventLog.logStageExecutorMetrics is true.
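Here is a minimal sketch of the logging rule described above. As the PR title and the later review discussion describe, the driver's updates are gated on spark.eventLog.logStageExecutorMetrics. `MetricsUpdate` and `DriverMetricsLogger` are hypothetical stand-ins for SparkListenerExecutorMetricsUpdate and EventLoggingListener. The `written` buffer stands in for the event log file. The string `"driver"` is the value of SparkContext.DRIVER_IDENTIFIER.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for SparkListenerExecutorMetricsUpdate.
final case class MetricsUpdate(execId: String, peaks: Map[String, Long])

// Hypothetical, simplified EventLoggingListener; `written` stands in for the event log file.
final class DriverMetricsLogger(shouldLogStageExecutorMetrics: Boolean) {
  private val DriverIdentifier = "driver" // same value as SparkContext.DRIVER_IDENTIFIER
  val written = ArrayBuffer.empty[MetricsUpdate]

  def onExecutorMetricsUpdate(event: MetricsUpdate): Unit = {
    // Persist only the driver's updates, and only when the flag is enabled.
    // Executor updates are not written by this rule.
    if (shouldLogStageExecutorMetrics && event.execId == DriverIdentifier) {
      written += event
    }
  }
}
```

Filtering on the exec id keeps the event log from receiving every executor heartbeat. Only the driver, which has no per-task events of its own, gets its updates persisted.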

Why are the changes needed?

So that users can get the driver's peakMemoryMetrics in the SHS.

Does this PR introduce any user-facing change?

Yes, users can get the driver's executor metrics through the SHS REST API.

How was this patch tested?

Manual test.

@SparkQA

SparkQA commented Mar 29, 2021

Test build #136645 has finished for PR 31992 at commit 58ace57.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@github-actions github-actions bot added the CORE label Mar 29, 2021
@HyukjinKwon
Member

cc @Ngone51 FYI

@SparkQA

SparkQA commented Mar 29, 2021

Test build #136646 has finished for PR 31992 at commit 936eefd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 29, 2021

Test build #136653 has started for PR 31992 at commit 7054b58.

@SparkQA

SparkQA commented Mar 29, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41227/

@SparkQA

SparkQA commented Mar 29, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41227/

@SparkQA

SparkQA commented Mar 29, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41228/

@SparkQA

SparkQA commented Mar 29, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41228/

@SparkQA

SparkQA commented Mar 29, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41235/

@SparkQA

SparkQA commented Mar 29, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41235/

Contributor

@mridulm mridulm left a comment


Who consumes this event in Spark?
Btw, if all you want is to monitor JVM stats (outside of the Spark UI), why not use the Codahale integration instead?

```diff
@@ -249,6 +249,9 @@ private[spark] class EventLoggingListener(
   }

   override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = {
+    if (event.execId == SparkContext.DRIVER_IDENTIFIER) {
+      logEvent(event)
+    }
```
Contributor


Should we do this only when shouldLogStageExecutorMetrics is enabled?

Contributor Author


> Should we do this only when shouldLogStageExecutorMetrics is enabled?

I don't think it should be controlled by shouldLogStageExecutorMetrics, since the driver's metrics are not related to stage executor metrics.

Contributor


Currently, we have a single event for both driver and executor metrics updates, differentiated by exec id.
I don't have strong opinions on this, but if we have a flag (shouldLogStageExecutorMetrics) controlling whether metrics are logged, we should apply it consistently IMO.

Contributor Author


> Currently, we have a single event for both driver and executor metrics updates, differentiated by exec id.
> I don't have strong opinions on this, but if we have a flag (shouldLogStageExecutorMetrics) controlling whether metrics are logged, we should apply it consistently IMO.

@mridulm I've updated the PR following this comment. How does the current version look?

@AngersZhuuuu
Contributor Author

> Who consumes this event in Spark?
> Btw, if all you want is to monitor JVM stats (outside of the Spark UI), why not use the Codahale integration instead?

AppStatusListener consumes this:

 override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = {

I know the Spark metrics framework can monitor this. But when we build an application analysis system like Dr. Elephant, we want to get the app status from the REST API, right? The driver's memory usage is an important indicator too.
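To illustrate what an analysis tool gains once the driver's updates are in the event log, here is a hypothetical replay. It folds logged updates, as (executor id, metric peaks) pairs, into the final peak per metric for each executor, including the driver. `PeakReplay` is an assumption for illustration. It is loosely analogous to what a consumer of the log does, not AppStatusListener's implementation.

```scala
// Hypothetical replay: fold logged metrics updates (executor id, metric peaks)
// into the final peak value of each metric for each executor, including "driver".
object PeakReplay {
  def replay(events: Seq[(String, Map[String, Long])]): Map[String, Map[String, Long]] =
    events.foldLeft(Map.empty[String, Map[String, Long]]) { case (acc, (execId, peaks)) =>
      val current = acc.getOrElse(execId, Map.empty[String, Long])
      // Keep the maximum value seen so far for every metric name.
      val merged = peaks.foldLeft(current) { case (m, (name, value)) =>
        m.updated(name, m.get(name).fold(value)(math.max(_, value)))
      }
      acc.updated(execId, merged)
    }
}
```

Without the driver's updates in the log, the `"driver"` entry would simply be missing from such a replay.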

@SparkQA

SparkQA commented Mar 30, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41262/

@SparkQA

SparkQA commented Mar 30, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41262/

@SparkQA

SparkQA commented Mar 30, 2021

Test build #136680 has finished for PR 31992 at commit 02ad9de.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Contributor Author

ping @mridulm, any more updates?

@mridulm
Contributor

mridulm commented Apr 5, 2021

As I understand it from the description and the proposed changes, the right level of integration for this would be the metrics subsystem. While we can do it via other means, it is not optimal to do so.
Having said that, I don't have strong opinions on this; I'll let someone more involved with this area comment.

@AngersZhuuuu
Contributor Author

AngersZhuuuu commented Apr 7, 2021

> As I understand it from the description and the proposed changes, the right level of integration for this would be the metrics subsystem. While we can do it via other means, it is not optimal to do so.
> Having said that, I don't have strong opinions on this; I'll let someone more involved with this area comment.

Yeah, also pinging @srowen @Ngone51 @gengliangwang

@srowen
Member

srowen commented Apr 7, 2021

I don't know enough to have an opinion on this. I think the key questions are: what is the most consistent thing to do, and are there any performance problems with adding this information to the events?
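On the performance question, here is a rough estimate of the extra event-log volume. It assumes the driver posts one metrics update per heartbeat at spark.executor.heartbeatInterval (10s by default). The per-event size is a hypothetical input, not a measured value.

```scala
// Back-of-the-envelope sizing, assuming one driver metrics update per heartbeat.
object DriverEventVolume {
  // Number of driver updates over `appSeconds` with one update every `heartbeatSeconds`.
  def eventCount(appSeconds: Long, heartbeatSeconds: Long): Long = {
    require(heartbeatSeconds > 0, "heartbeat interval must be positive")
    appSeconds / heartbeatSeconds
  }

  // Extra event-log bytes, given an assumed average serialized size per event.
  def extraBytes(appSeconds: Long, heartbeatSeconds: Long, bytesPerEvent: Long): Long =
    eventCount(appSeconds, heartbeatSeconds) * bytesPerEvent
}
```

For a one-hour application at the default interval, `eventCount(3600, 10)` is 360. Since only the driver's updates are logged, the added volume grows with application duration rather than with the number of executors.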

@SparkQA

SparkQA commented Apr 19, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42129/

@SparkQA

SparkQA commented Apr 19, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42129/

@SparkQA

SparkQA commented Apr 19, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42136/

@SparkQA

SparkQA commented Apr 19, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42136/

@SparkQA

SparkQA commented Apr 19, 2021

Test build #137559 has finished for PR 31992 at commit 02ad9de.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 19, 2021

Test build #137583 has finished for PR 31992 at commit 02ad9de.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

@AngersZhuuuu BTW, did you disable GitHub Actions (GA) in your forked repo? It should be enabled so that the PR can leverage the GA resources in your fork.

@HyukjinKwon
Member

cc @HeartSaVioR too FYI

@AngersZhuuuu
Contributor Author

> @AngersZhuuuu, mind making the PR description unambiguous? What's "driver executor peakMemoryMetrics"?

How about the current description?

@AngersZhuuuu
Contributor Author

> @AngersZhuuuu BTW, did you disable GitHub Actions (GA) in your forked repo? It should be enabled so that the PR can leverage the GA resources in your fork.

No, but this PR was opened a while ago.

@SparkQA

SparkQA commented Jun 16, 2021

Test build #139843 has finished for PR 31992 at commit 87a079e.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 16, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44365/

@SparkQA

SparkQA commented Jun 16, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44371/

@SparkQA

SparkQA commented Jun 16, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44370/

@AngersZhuuuu
Contributor Author

retest this please

@SparkQA

SparkQA commented Jun 16, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44371/

@SparkQA

SparkQA commented Jun 16, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44370/

@SparkQA

SparkQA commented Jun 16, 2021

Test build #139837 has finished for PR 31992 at commit c9a4a67.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 16, 2021

Test build #139842 has finished for PR 31992 at commit 6c81e2d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 16, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44379/

@SparkQA

SparkQA commented Jun 16, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44378/

@SparkQA

SparkQA commented Jun 16, 2021

Test build #139850 has finished for PR 31992 at commit 87a079e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 16, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44379/

@AngersZhuuuu
Contributor Author

retest this please

@SparkQA

SparkQA commented Jun 16, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44378/

@SparkQA

SparkQA commented Jun 16, 2021

Test build #139866 has finished for PR 31992 at commit 87a079e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 16, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44396/

@SparkQA

SparkQA commented Jun 16, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44396/

@AngersZhuuuu
Contributor Author

Any more suggestions?

@HyukjinKwon
Member

I'll leave it to @mridulm and @Ngone51

@asfgit asfgit closed this in 79362c4 Jun 17, 2021
@mridulm
Contributor

mridulm commented Jun 17, 2021

Merging to master. Thanks for working on this and pushing it through, @AngersZhuuuu!

Thanks for the reviews, @HyukjinKwon and @srowen!
