[SPARK-31270][CORE] Expose executor metrics in task detail table in StageLevel #28061
Conversation
ok to test
@@ -572,7 +600,7 @@ $(document).ready(function () {
      var row1 = createRowMetadataForColumn(
          columnKey, taskMetricsResponse[columnKey], 4);
      var row2 = createRowMetadataForColumn(
-         "shuffleWriteTime", taskMetricsResponse[columnKey], 21);
+         "shuffleWriteTime", taskMetricsResponse[columnKey], 23);
Hi, @AngersZhuuuu. If possible, could you try to avoid this change?
If possible, could you try to avoid this change?
Since a new column is added under Additional Metrics, the column indices change, so this is needed.
To be honest, I spent a lot of time making sure the columns change correctly when a new column is added, since the Spark web UI code has changed a lot and the original PR for stagepage.js did not do anything like adding a column.
If it's OK, this PR can serve as a demo for new development of the Spark web UI.
That's the exact reason why I asked that. I've seen several UI patches like this. :)
Since a new column is added under Additional Metrics, the column indices change, so this is needed.
There already exist lots of similar commits for new development of the Spark web UI. The commits that have already been reviewed and merged are always the best demos for developers.
If it's OK, this PR can serve as a demo for new development of the Spark web UI.
There already exist lots of similar commits for new development of the Spark web UI. The commits that have already been reviewed and merged are always the best demos for developers.
Each page's JS script is different, and I found that for the stage page there is no merged PR that adds a new additional column.
@AngersZhuuuu I found the following issues.
- When a job performs shuffle, the StagePage for the shuffle-write stage has a problem. Even if you visit the StagePage where the corresponding stage performs shuffle write and check Shuffle Write Time, the metric is not shown.
- The web UI hangs. When you visit a StagePage where the corresponding stage performs shuffle read and reload it, the UI hangs.
Those issues are caused by the column indices not being updated properly. To avoid such issues, please keep it simple; I believe the column indices do not need to be changed.
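The fragility behind both issues can be sketched as follows. This is hypothetical illustration code, not the actual stagepage.js: the header names and numbers are made up, but it shows why a hard-coded column index silently goes stale when a column is inserted, while a lookup by header name stays correct.

```javascript
// Hypothetical sketch (not the real stagepage.js table): a hard-coded
// column index breaks when a new column is inserted before it, whereas
// deriving the index from the header name at render time stays correct.
const headers = ["Index", "Task ID", "Duration", "Shuffle Write Time"];

// Fragile: this constant points at "Shuffle Write Time" today, but at
// whatever column lands in position 3 after someone inserts a new one.
const hardCodedShuffleWriteTimeIndex = 3;

// Robust: look the column up by name instead of by position.
function columnIndex(name) {
  return headers.indexOf(name);
}

console.log(columnIndex("Shuffle Write Time")); // 3
```

With the lookup approach, adding "Peak JVM On Heap Memory" anywhere in the header list would not require touching any existing index constants.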
Maybe
peakJvmHeapMemory -> peakJvmOnHeapMemory?
Peak JVM Heap Memory -> Peak JVM On Heap Memory?
cc @sarutak
Test build #120542 has finished for PR 28061 at commit
Sure, I'll make it clearer.
I'll review within a week.
Test build #120555 has finished for PR 28061 at commit
Test build #120556 has finished for PR 28061 at commit
Thank you, @sarutak!
I'm inspecting the change now, but I have one fundamental question.
Peak JVM On/Off Heap Memory are Executor metrics, and they don't indicate the peak values for each task. So I wonder whether it's helpful to display those metrics per task.
To explain concretely, I tested this change with the following code in spark-shell.
$ bin/spark-shell --master local[1] --conf spark.executor.heartbeatInterval=1s

// Three tasks on a single-core executor; only partition 0 allocates
// a large amount of memory.
sc.parallelize(List(1, 2, 3), 3).map { e =>
  val context = org.apache.spark.TaskContext.get
  if (context.partitionId == 0) {
    val dummyData = List.tabulate(Int.MaxValue / 250) { i => "Index " + i }
  }
  Thread.sleep(3000)
  e
}.collect
With this code, all the tasks run on the same Executor, and task0 is memory-consuming while task1 and task2 are not.
Although task1 and task2 should not consume as much memory as task0, their Peak JVM On/Off Heap Memory values are greater than those of task0.
This is because those metrics are accumulated within the same JVM.
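The accumulation being described can be sketched like this. It's a hypothetical illustration, not Spark's actual ExecutorMetrics code: the executor JVM keeps one running maximum per metric, so a lightweight task that finishes after a memory-hungry one reports the earlier task's peak as its own.

```javascript
// Hypothetical sketch of per-executor peak tracking (not Spark's actual
// ExecutorMetrics implementation): a single running maximum is shared by
// every task that runs inside the executor JVM.
let executorPeakHeapBytes = 0;

// Each heartbeat sample updates the shared peak; a task's reported "peak"
// is just the executor-wide value when the task finishes.
function sampleHeap(currentHeapBytes) {
  executorPeakHeapBytes = Math.max(executorPeakHeapBytes, currentHeapBytes);
  return executorPeakHeapBytes;
}

const task0Peak = sampleHeap(8e9); // task0 allocates heavily
const task1Peak = sampleHeap(5e7); // task1 is lightweight...
console.log(task1Peak >= task0Peak); // true: task1 inherits task0's peak
```

This is exactly the effect seen in the spark-shell experiment above: the later, lightweight tasks can never show a smaller peak than an earlier heavy task in the same JVM.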
executorMetrics: Option[ExecutorMetrics] = None): v1.TaskMetrics = {
  executorMetrics.foreach {
    this.executorMetrics.compareAndUpdatePeakValues
  }
How about executorMetrics.foreach(this.executorMetrics.compareAndUpdatePeakValues)?
Additional comments:
It's similar to the Stage Executor Metrics, which show each executor's peak memory during the stage; but there is still the problem that, while this stage runs, tasks from other stages can run on the same executor when the executor has more than one core. We just show the JVM status of the executor while each task was running. As an SRE, I need as much information about a task's running status as possible; any one of these metrics can help the user find problems and provide help.
O.K., I understand the motivation.
@@ -327,6 +336,13 @@ $(document).ready(function () {
      "should be approximately the sum of the peak sizes across all such data structures created " +
      "in this task. For SQL jobs, this only tracks all unsafe operators, broadcast joins, and " +
      "external sort.");
+   $('#peak_jvm_on_heap_memory').attr("data-toggle", "tooltip")
+     .attr("data-placement", "top")
+     .attr("title", "Peak Executor JVM on-heap memory usage during this task running");
tasks might be more proper than this task.
    .attr("title", "Peak Executor JVM on-heap memory usage during this task running");
  $('#peak_jvm_off_heap_memory').attr("data-toggle", "tooltip")
    .attr("data-placement", "top")
    .attr("title", "Peak Executor JVM off-heap memory usage during this task running");
Same comment as here.
@AngersZhuuuu Sorry for the late reply. How about just displaying the max on/off-heap memory on the executor page?
How about this? #28036
OK, I will focus on that PR and follow your advice.
I also prefer that.
cc @sarutak @gengliangwang. Closing it.
What changes were proposed in this pull request?
Expose executor metrics in the task detail table at the stage level.
Why are the changes needed?
Helps developers check an application's running status.
Does this PR introduce any user-facing change?
Yes. Users can see the peak memory usage of the executor that a task ran on by checking the Additional Metrics checkboxes for Peak JVM On/Off Heap Memory.
How was this patch tested?
Ran with spark-sql.