From ca3887a0de31fa78097ca7ee92ead914a3ce050c Mon Sep 17 00:00:00 2001
From: Luca Canali <luca.canali@cern.ch>
Date: Mon, 30 Mar 2020 18:00:54 -0700
Subject: [PATCH] [SPARK-30775][DOC] Improve the description of executor
 metrics in the monitoring documentation

### What changes were proposed in this pull request?
This PR (SPARK-30775) aims to improve the description of the executor metrics in the monitoring documentation.

### Why are the changes needed?
Improve and clarify monitoring documentation by:
- adding a reference to the Prometheus endpoint, as implemented in [SPARK-29064]
- extending the list and description of executor metrics, following up on [SPARK-27157]
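
For illustration only, here is a minimal sketch of how the executor metrics described in the updated documentation might be consumed from the JSON endpoint. It is not part of the patch. It assumes the REST API is mounted under `/api/v1` on the driver UI (for example `http://localhost:4040`); the helper names `executors_url`, `fetch_executors` and `gc_fraction`, and the sample payload, are hypothetical. The sketch computes, per executor, the fraction of task time spent in garbage collection from `totalGCTime` and `totalDuration`, both expressed in milliseconds.

```python
import json
from urllib.request import urlopen


def executors_url(base_url, app_id):
    """Build the JSON endpoint URL for the executors of one application."""
    # Assumption: the REST API is served under /api/v1 on the driver UI.
    return f"{base_url.rstrip('/')}/api/v1/applications/{app_id}/executors"


def fetch_executors(base_url, app_id):
    """Fetch and decode the executor summaries (needs a running application)."""
    with urlopen(executors_url(base_url, app_id)) as response:
        return json.load(response)


def gc_fraction(executors):
    """Map executor id -> totalGCTime / totalDuration (both in milliseconds)."""
    fractions = {}
    for executor in executors:
        duration = executor.get("totalDuration", 0)
        # Skip executors that have not run any task yet (avoids division by zero).
        if duration > 0:
            fractions[executor["id"]] = executor.get("totalGCTime", 0) / duration
    return fractions


# Hypothetical payload shaped like the endpoint response, trimmed to three fields.
sample = json.loads("""
[
  {"id": "driver", "totalDuration": 0, "totalGCTime": 0},
  {"id": "1", "totalDuration": 20000, "totalGCTime": 500}
]
""")
print(gc_fraction(sample))
```

Against a live application, `fetch_executors("http://localhost:4040", app_id)` would replace the sample payload. The Prometheus endpoint at `/metrics/executors/prometheus` is only served when `spark.ui.prometheus.enabled=true` is set.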

### Does this PR introduce any user-facing change?
Documentation update.

### How was this patch tested?
n.a.

Closes #27526 from LucaCanali/docPrometheusMetricsFollowupSpark29064.

Authored-by: Luca Canali <luca.canali@cern.ch>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit aa98ac52dbbe3fc2d3b152af9324a71f48439a38)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
---
 docs/monitoring.md | 58 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 51 insertions(+), 7 deletions(-)
diff --git a/docs/monitoring.md b/docs/monitoring.md
index ba3f1dc86becc..131cd2a844e44 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -689,31 +689,75 @@ A list of the available metrics, with a short description:
 ### Executor Metrics
 
 Executor-level metrics are sent from each executor to the driver as part of the Heartbeat to describe the performance metrics of Executor itself like JVM heap memory, GC information.
-Executor metric values and their measured peak values per executor are exposed via the REST API at the end point `/applications/[app-id]/executors`.
-In addition, aggregated per-stage peak values of the executor metrics are written to the event log if `spark.eventLog.logStageExecutorMetrics` is true.
-Executor metrics are also exposed via the Spark metrics system based on the Dropwizard metrics library.
+Executor metric values and their measured memory peak values per executor are exposed via the REST API in JSON format and in Prometheus format.
+The JSON end point is exposed at: `/applications/[app-id]/executors`, and the Prometheus endpoint at: `/metrics/executors/prometheus`.
+The Prometheus endpoint is conditional to a configuration parameter: `spark.ui.prometheus.enabled=true` (the default is `false`).
+In addition, aggregated per-stage peak values of the executor memory metrics are written to the event log if
+`spark.eventLog.logStageExecutorMetrics` is true.  
+Executor memory metrics are also exposed via the Spark metrics system based on the Dropwizard metrics library.
 A list of the available metrics, with a short description:
 
 <table class="table">
   <tr><th>Executor Level Metric name</th>
       <th>Short description</th>
   </tr>
+  <tr>
+    <td>rddBlocks</td>
+    <td>RDD blocks in the block manager of this executor.</td>
+  </tr>
+  <tr>
+    <td>memoryUsed</td>
+    <td>Storage memory used by this executor.</td>
+  </tr>
+  <tr>
+    <td>diskUsed</td>
+    <td>Disk space used for RDD storage by this executor.</td>
+  </tr>
+  <tr>
+    <td>totalCores</td>
+    <td>Number of cores available in this executor.</td>
+  </tr>
+  <tr>
+    <td>maxTasks</td>
+    <td>Maximum number of tasks that can run concurrently in this executor.</td>
+  </tr>
+  <tr>
+    <td>activeTasks</td>
+    <td>Number of tasks currently executing.</td>
+  </tr>
+  <tr>
+    <td>failedTasks</td>
+    <td>Number of tasks that have failed in this executor.</td>
+  </tr>
+  <tr>
+    <td>completedTasks</td>
+    <td>Number of tasks that have completed in this executor.</td>
+  </tr>
+  <tr>
+    <td>totalTasks</td>
+    <td>Total number of tasks (running, failed and completed) in this executor.</td>
+  </tr>
+  <tr>
+    <td>totalDuration</td>
+    <td>Elapsed time the JVM spent executing tasks in this executor.
+    The value is expressed in milliseconds.</td>
+  </tr>
   <tr>
     <td>totalGCTime</td>
-    <td>Elapsed time the JVM spent in garbage collection summed in this Executor.
+    <td>Elapsed time the JVM spent in garbage collection summed in this executor.
     The value is expressed in milliseconds.</td>
   </tr>
   <tr>
     <td>totalInputBytes</td>
-    <td>Total input bytes summed in this Executor.</td>
+    <td>Total input bytes summed in this executor.</td>
   </tr>
   <tr>
     <td>totalShuffleRead</td>
-    <td>Total shuffer read bytes summed in this Executor.</td>
+    <td>Total shuffle read bytes summed in this executor.</td>
   </tr>
   <tr>
     <td>totalShuffleWrite</td>
-    <td>Total shuffer write bytes summed in this Executor.</td>
+    <td>Total shuffle write bytes summed in this executor.</td>
   </tr>
   <tr>
     <td>maxMemory</td>
