[SPARK-24958][WIP] Report executors' process tree total memory information to heartbeat signals #21916
Conversation
…xecutors REST API

Add new executor level memory metrics (JVM used memory, on/off heap execution memory, on/off heap storage memory), and expose them via the executors REST API. This information will help provide insight into how executor and driver JVM memory is used, and for the different memory regions. It can be used to help determine good values for spark.executor.memory, spark.driver.memory, spark.memory.fraction, and spark.memory.storageFraction.
- Add an ExecutorMetrics class, with jvmUsedMemory, onHeapExecutionMemory, offHeapExecutionMemory, onHeapStorageMemory, and offHeapStorageMemory. The new ExecutorMetrics will be sent by executors to the driver as part of Heartbeat. A heartbeat will be added for the driver as well, to collect these metrics for the driver.
- Modify the EventLoggingListener to log ExecutorMetricsUpdate events if there is a new peak value for any of the memory metrics for an executor and stage. Only the ExecutorMetrics will be logged, and not the TaskMetrics, to minimize additional logging.
- Modify the AppStatusListener to record the peak values for each memory metric.
- Add the new memory metrics to the executors REST API.
…enabled to enable/disable executor metrics update logging. Code review comments.
Metric enums
… move logic for getting metrics to Heartbeater), and modify tests for the new ExecutorMetrics format.
…o SPARK-23429.2
- remove timestamp
- change ExecutorMetrics to Array[Long]
- create new SparkListenerStageExecutorMetrics for recording stage executor metric peaks in the history log
Fix issue where metrics for a removed executor were ignored (retain dead executors while there are currently active stages that the executor was alive for).
…nerExecutorMetricsUpdate not optional. These are no longer logged, and backward compatibility should not be an issue. These events should only be used to send task and executor updates for heartbeats, and executors and driver should be the same Spark version.
… optional again, in case of existing users of SparkListenerExecutorMetricsUpdate.
…nd add ExecutorMetrics, with getMetricValue() method for accessing executor metric values. Rename MetricGetter to ExecutorMetricType. Should ExecutorMetricType be moved to executor package, or ExecutorMetrics be moved to metrics package? Should Json (de)serialization functions be moved from api.scala to ExecutorMetrics?
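The Array[Long]-backed layout described above can be sketched roughly as follows. This is a hedged illustration, not the PR's actual code: the metric names shown are only a subset, and the index-carrying trait is an assumption about how ExecutorMetricType could map enum members to array positions.

```scala
// Hypothetical sketch: metrics stored as an Array[Long], indexed by the
// position of each ExecutorMetricType member.
sealed trait ExecutorMetricType { def index: Int }
case object JVMUsedMemory extends ExecutorMetricType { val index = 0 }
case object OnHeapExecutionMemory extends ExecutorMetricType { val index = 1 }

class ExecutorMetrics(private val metrics: Array[Long]) {
  // Look up a metric value by its type's fixed array position.
  def getMetricValue(metricType: ExecutorMetricType): Long =
    metrics(metricType.index)
}
```

With this shape, serialization only needs to write the raw array, and adding a new metric means appending a new enum member with the next index.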
@rezasafi, thanks! I am a bot who has found some folks who might be able to help with the review: @JoshRosen, @vanzin and @pwendell
Can one of the admins verify this patch?
Overall LGTM. Some minor comments about code structure.
def updateProcessTree(): Unit = {
Why not just create a new Process tree instead of updating the existing tree?
Thanks @ankuriitg for the review. I will apply your comments ASAP. For this one, I may make some other improvements rather than just recreating the process tree each time. I understand that in this version updating looks more complex than just recreating it.
def getChildPIds(pid: Int): ArrayBuffer[Int] = {
  val cmd = Array("pgrep", "-P", pid.toString)
  val input = Runtime.getRuntime.exec(cmd).getInputStream
It would be better to handle any exceptions here.
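One way to address this review comment is to wrap the external `pgrep` call in a try/catch, so a missing binary or I/O failure degrades to "no children" instead of propagating. This is only a hedged sketch of the suggestion, not the PR's actual fix; the failure behavior chosen here is an assumption.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.io.Source

// Sketch of getChildPIds with the exception handling suggested in review.
def getChildPIds(pid: Int): ArrayBuffer[Int] = {
  val childPidsInInt = new ArrayBuffer[Int]()
  try {
    val process = Runtime.getRuntime.exec(Array("pgrep", "-P", pid.toString))
    val output = Source.fromInputStream(process.getInputStream).mkString
    process.waitFor()
    // Keep only purely numeric lines; pgrep may print nothing on no match.
    for (p <- output.split("\n") if p.matches("[0-9]+")) {
      childPidsInInt += Integer.parseInt(p)
    }
  } catch {
    case _: Exception =>
      // Assumed policy: if pgrep is missing or fails, report no children
      // rather than crashing the heartbeat thread.
  }
  childPidsInInt
}
```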
val childPids = new String(childPidsInByte.toArray, "UTF-8").split("\n")
val childPidsInInt: ArrayBuffer[Int] = new ArrayBuffer[Int]()
for (p <- childPids) {
  if (p.matches("[0-9][0-9]*")) {
Is it the same as "[0-9]+"? If yes, that would be more concise.
I will change this. My goal was to avoid cases where pgrep -P pid returns error messages or other non-numeric output, but it seems that it won't.
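For reference, the two patterns discussed above accept exactly the same strings: `[0-9][0-9]*` ("a digit, then zero or more digits") and `[0-9]+` ("one or more digits"). A quick check, written as standalone Scala assertions:

```scala
// "[0-9][0-9]*" and "[0-9]+" are equivalent: both require one or more digits.
assert("12345".matches("[0-9][0-9]*") && "12345".matches("[0-9]+"))
// Neither matches the empty string...
assert(!"".matches("[0-9][0-9]*") && !"".matches("[0-9]+"))
// ...and neither matches non-numeric output such as an error message.
assert(!"pgrep: no matching criteria specified".matches("[0-9]+"))
```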
for (p <- childPids) {
  if (p.matches("[0-9][0-9]*")) {
    childPidsInInt += Integer.parseInt(p)
  }
Log an error/warning in the else branch?
new FileInputStream(
  new File(pidDir, PROCFS_STAT_FILE)), Charset.forName("UTF-8"))
val in: BufferedReader = new BufferedReader(fReader)
val procInfo = in.readLine
Can you please add a comment explaining why we read just the first line?
This is what Hadoop's ProcfsBasedProcessTree does as well. I wasn't able to find a reference, but in my testing reading just one line was also enough.
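The reason one readLine suffices is that on Linux /proc/[pid]/stat is a single line of space-separated fields (per the proc(5) man page); field 23 is vsize in bytes and field 24 is rss in pages. A hedged parsing sketch, assuming that layout (the helper name is hypothetical, not the PR's code):

```scala
// Sketch: extract (vsize, rss) from one /proc/<pid>/stat line.
def parseStatLine(procInfo: String): (Long, Long) = {
  // Field 2 (comm) can itself contain spaces but is wrapped in parentheses,
  // so split only after the closing ')' to keep field offsets stable.
  val afterComm = procInfo.substring(procInfo.lastIndexOf(')') + 2)
  val fields = afterComm.split(" ")
  // afterComm starts at field 3 ("state"), so 1-indexed fields 23 (vsize)
  // and 24 (rss) land at offsets 20 and 21 here.
  (fields(20).toLong, fields(21).toLong)
}
```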
 * meaning not available
 */
final val pTreeInfo: ProcessTreeMetrics = new ProcfsBasedSystems
if (pTreeInfo.isAvailable) {
Why not create the process tree on instantiation of the ProcfsBasedSystems class? You could also make it a lazy val, so that it is instantiated only when needed.
I will change this as well. It is a final val since the other metrics were also final. I will check the lazy val, though there is probably not much difference since this initialization is executed just once anyway.
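The lazy-val behavior the reviewer is pointing at can be shown with a minimal, hypothetical stand-in (the names below are illustrative, not the PR's): a `lazy val` body runs only on first access, and exactly once, so construction cost is deferred until metrics are actually requested.

```scala
// Minimal demonstration of lazy initialization semantics in Scala.
object MetricsHolder {
  var initCount = 0  // side effect lets us observe when initialization runs
  lazy val pTreeInfo: String = {
    initCount += 1
    "process-tree-snapshot"  // hypothetical stand-in for ProcfsBasedSystems
  }
}
```

Accessing `MetricsHolder.pTreeInfo` twice still runs the initializer only once, and before any access `initCount` stays at 0.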
Some improvements in integration. Integration with the unit tests of the upstream open PR. Fix an issue with memory info computation. Fix scalastyle errors. Some changes to address comments.
…able and some improvements
Force-pushed from 0ebad31 to b14cebc
I will close this and open the actual PR shortly. Thank you everyone for the great reviews.
This is a work in progress for SPARK-24958 and this PR is opened on top of the PR for SPARK-23429:
#21221
To view the changes that are only related to SPARK-24958 you can check the following view:
rezasafi#1
Spark executors' process tree total memory information can be really useful. Currently such information is not available. The goal of this PR is to compute this information for each executor, add it to the heartbeat signals, and compute the peaks at the driver.
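The core computation described above — total memory for an executor's whole process tree — amounts to summing a per-process value over the executor PID and all of its descendants. A hedged sketch, where `childrenOf` and `memOf` are hypothetical stand-ins for the `pgrep -P` lookup and the /proc/[pid]/stat read:

```scala
// Total memory over a process tree: the root's value plus the recursive
// totals of each child subtree.
def treeTotal(root: Int, childrenOf: Int => Seq[Int], memOf: Int => Long): Long =
  memOf(root) + childrenOf(root).map(c => treeTotal(c, childrenOf, memOf)).sum
```

On a toy tree where PID 1 has children 2 and 3, and 3 has child 4, the total is just the sum of all four processes' values.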
This PR is tested by running the current unit tests and the ones that are added by the PR for SPARK-23429. I have also tested this on our internal cluster and have verified that it is working.