Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-31081][UI][SQL] Make the display of stageId/stageAttemptId/taskId of sql metrics configurable in UI #27849

Closed
wants to merge 4 commits into from

Conversation

Ngone51
Copy link
Member

@Ngone51 Ngone51 commented Mar 8, 2020

What changes were proposed in this pull request?

This PR adds a conf to make the display of stageId/StageAttemptId/taskId of SQL metrics configurable in UI. And also propose to not display them by default.

Why are the changes needed?

It makes metrics harder to read after #26843 and user may not interest in extra info(stageId/StageAttemptId/taskId ) when they do not need debug. So, I think we need to a way to make the display configurable.

Does this PR introduce any user-facing change?

Yes.

Before this PR:

before

After this PR without stageId/StageAttemptId/taskId:
wihoutId

How was this patch tested?

Updated SQLMetricsSuite and tested manually.

@Ngone51
Copy link
Member Author

Ngone51 commented Mar 8, 2020

ping @cloud-fan @gengliangwang @nartal1 Please take a look, thanks!

@SparkQA
Copy link

SparkQA commented Mar 8, 2020

Test build #119534 has finished for PR 27849 at commit 6f61906.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @Ngone51 .
The PR title Make SQLMetrics more readable from UI sounds too broad. Could you revise more specifically?

@Ngone51 Ngone51 changed the title [SPARK-31081][UI][SQL] Make SQLMetrics more readable from UI [SPARK-31081][UI][SQL] Make metrics in SparkPlanGraphNode more readable from Spark UI Mar 9, 2020
acc
}

def createNanoTimingMetric(sc: SparkContext, name: String): SQLMetric = {
// Same with createTimingMetric, just normalize the unit of time to millisecond.
val acc = new SQLMetric(NS_TIMING_METRIC, -1)
acc.register(sc, name = Some(s"$name total (min, med, max (stageId (attemptId): taskId))"),
countFailedValues = false)
acc.register(sc, name = Some(s"name"), countFailedValues = false)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: fix "name"

@Ngone51 Ngone51 changed the title [SPARK-31081][UI][SQL] Make metrics in SparkPlanGraphNode more readable from Spark UI [SPARK-31081][UI][SQL] Make the display of stageId/stageAttemptId/taskId of sql metrics configurable in UI Mar 11, 2020
@dongjoon-hyun
Copy link
Member

Thank you for narrowing down title, @Ngone51 .

buildStaticConf("spark.sql.ui.displayTaskInfoForMaxMetric")
.doc("If turn on, Spark will display stageId-stageAttemptId-taskId of the max metrics to " +
"tell where the max value comes from. It's useful to help debug job quicker.")
.version("3.0.0")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ngone51 .
Since SPARK-31081 is filed as an Improvement, this should be 3.1.0.
If you want to consider this as a bug or regression at 3.0.0, you had better change JIRA first.

cc @rxin since he is the release manager of 3.0.0 (also cc @gatorsmile ).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a UI issue to me. The SQL web UI is really hard to read now with the stage id stuff. Ideally, we should revisit #26843 and think of a better way to improve readability. But no one has proposed an idea yet.

This PR disables the stage id stuff, which seems like a good compromise for 3.0: we keep the UI unchanged, but for people who really need the stage id info, they can still enable it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for making this 3.0. The UI changes in #26843 can be confusing to users after 3.0 release.

@@ -116,8 +117,8 @@ object SQLMetrics {
// data size total (min, med, max):
// 100GB (100MB, 1GB, 10GB)
val acc = new SQLMetric(SIZE_METRIC, -1)
acc.register(sc, name = Some(s"$name total (min, med, max (stageId (attemptId): taskId))"),
countFailedValues = false)
acc.register(sc, name = Some(s"$name total (min, med, ${attachTaskId("max", sc.conf)})"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new indentation looks incorrect at this line?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also noted that, thanks.

@@ -233,4 +228,13 @@ object SQLMetrics {
SparkListenerDriverAccumUpdates(executionId.toLong, metrics.map(m => m.id -> m.value)))
}
}

private def attachTaskId(name: String, conf: SparkConf): String = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are attaching stageId and attemptId, too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I know. I'm thinking about a better name, while adding stageId/attemptId makes me feel too long for the method name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attachTaskInfo? stage id is part of the task info which tells which stage the task comes from.

@SparkQA
Copy link

SparkQA commented Mar 11, 2020

Test build #119674 has finished for PR 27849 at commit a41d9c4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -135,6 +135,14 @@ object StaticSQLConf {
.intConf
.createWithDefault(1000)

val DISPLAY_TASK_ID_FOR_MAX_METRIC =
buildStaticConf("spark.sql.ui.displayTaskInfoForMaxMetric")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we make it an internal conf?

@gengliangwang
Copy link
Member

cc @tgravescs as well

@tgravescs
Copy link
Contributor

Sorry, I disagree. I understand that it adds more clutter and am definitely open to improvements, but I think it does bring valuable information that once people realize is there will use it. This entire ui is for debugging purposes so I don't get your argument with that. If a user isn't actively looking to see what went on with their job why would they go to this UI at all.
If you have a better way to map these SQL boxes to the stage it ran in, please let me know?
This also quickly tells you which task was the slowest to aid in debugging further.

Can we leave on by default or don't display and have a checkbox like we have on other tables? (disclaimer - I haven't looked how hard that would be in code to add the checkbox)

@tgravescs
Copy link
Contributor

BTW thanks @gengliangwang for adding me, it would have been nice if I was added right away since I added this feature that you are now trying to take back out.
I really hope in the future with all the things being reverted that I see we copy the original authors and reviewers.

@Ngone51
Copy link
Member Author

Ngone51 commented Mar 13, 2020

This entire ui is for debugging purposes so I don't get your argument with that. If a user isn't actively looking to see what went on with their job why would they go to this UI at all.

I agree that UI is mostly for debugging purposes. But please also note that this is living UI(rather than SHS UI). And living UI may shutdown at any time before user figure out the root cause (except those long running applications). So, personally, I aways need to restart and insert Thread.sleep when I need to debug on living UI in case it shutdown before I figure out the root cause. So, I think user could always turn on spark.sql.ui.displayTaskInfoForMaxMetric when they realize they need to restart and debug on living UI.

And if they really want to see those ids at any time, they can just turn it on always.

Or if we can have better way to display ids, then we can turn it on by default. And here's my previous propose to display ids in case you would like it:

ids

I really hope in the future with all the things being reverted that I see we copy the original authors and reviewers.

Sorry, this is my bad. I'll take care in the future.

@cloud-fan
Copy link
Contributor

The UI is mostly for debugging but the stage id stuff is not always needed. At least this has been distracting me multiple times when I try to debug something with the SQL UI.

The new proposal from @Ngone51 looks good, it's much less distracting to have a short (1.0:14) at the end. @tgravescs If you don't like it, I think we at least should have this config so that people can disable it.

@gengliangwang
Copy link
Member

@tgravescs as @gatorsmile and I mentioned in #26843 (comment), the new change is not readable. I prefer to hide the slowest task as well.

Can we leave on by default or don't display and have a checkbox like we have on other tables? (disclaimer - I haven't looked how hard that would be in code to add the checkbox)

I like the idea of "don't display and have a checkbox like we have on other tables". @Ngone51 will you consider that?

@gengliangwang
Copy link
Member

I checked with @Ngone51 offline. He is fine with the checkbox idea. But he is not very familiar with the front-end work. I am also busy with other priorities recently.

@sarutak @HeartSaVioR would you like to take this task (create a checkbox for hild/show the max stage/stage attamp/task id in the graph) over?

@HeartSaVioR
Copy link
Contributor

HeartSaVioR commented Mar 13, 2020

Thanks for pinging me and I'm sorry I'm not familiar with front-end work as well - that's why I only had to skim the front-end side while reviewing structured streaming UI PR.

@tgravescs
Copy link
Contributor

the problem with a config is that if you turn it off, especially by default most users won't see it to know its even an option.
I'd much prefer the checkbox type approach (note again I haven't looked to see how complicated is with these boxes).
I'm also fine with the other format (0:0:14) as long as the legend if obvious on what it is. could have a hover there or something as well.

Perhaps I should ask everyone this - when you were looking at the UI, why were you looking at it? What values in there were you using for instance? A lot of people have asked me ok, I did this groupby or join and I see it in that box, but how do I know what stage that ran in..... this at least helps with that question. So how do you do that now? my previous answer was you had to click through a lot of jobs/stages to figure it out, which is very time consuming.

@sarutak
Copy link
Member

sarutak commented Mar 16, 2020

@gengliangwang I'll try to add the checkbox.

gengliangwang pushed a commit that referenced this pull request Mar 24, 2020
…of sql metrics toggleable

### What changes were proposed in this pull request?

This is another solution for `SPARK-31081` and #27849 .
I added a checkbox which can toggle display of stageId/taskid in the SQL's DAG page.
Mainly, I implemented the toggleable texts in boxes with HTML label feature provided by `dagre-d3`.
The additional metrics are enclosed by `<span>` and control the appearance of the text.
But the exception is additional metrics in clusters.
We can use HTML label for cluster but layout will be broken so I choosed normal text label for clusters.
Due to that, this solution contains a little bit tricky code in`spark-sql-viz.js` to manipulate the metric texts and generate DOMs.

### Why are the changes needed?

It makes metrics harder to read after #26843 and user may not interest in extra info(stageId/StageAttemptId/taskId ) when they do not need debug.
#27849 control the appearance by a new configuration property but providing a checkbox is more flexible.

### Does this PR introduce any user-facing change?

Yes.
[Additional metrics shown]
![with-checked](https://user-images.githubusercontent.com/4736016/77244214-0f6cd780-6c56-11ea-9275-a30758dd5339.png)

[Additional metrics hidden]
![without-chedked](https://user-images.githubusercontent.com/4736016/77244219-14ca2200-6c56-11ea-9874-33a466085fce.png)

### How was this patch tested?

Tested manually with a simple DataFrame operation.
* The appearance of additional metrics in the boxes are controlled by the newly added checkbox.
* No error found with JS-debugger.
* Checked/not-checked state is preserved after reloading.

Closes #27927 from sarutak/SPARK-31081.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
gengliangwang pushed a commit that referenced this pull request Mar 24, 2020
…of sql metrics toggleable

### What changes were proposed in this pull request?

This is another solution for `SPARK-31081` and #27849 .
I added a checkbox which can toggle display of stageId/taskid in the SQL's DAG page.
Mainly, I implemented the toggleable texts in boxes with HTML label feature provided by `dagre-d3`.
The additional metrics are enclosed by `<span>` and control the appearance of the text.
But the exception is additional metrics in clusters.
We can use HTML label for cluster but layout will be broken so I choosed normal text label for clusters.
Due to that, this solution contains a little bit tricky code in`spark-sql-viz.js` to manipulate the metric texts and generate DOMs.

### Why are the changes needed?

It makes metrics harder to read after #26843 and user may not interest in extra info(stageId/StageAttemptId/taskId ) when they do not need debug.
#27849 control the appearance by a new configuration property but providing a checkbox is more flexible.

### Does this PR introduce any user-facing change?

Yes.
[Additional metrics shown]
![with-checked](https://user-images.githubusercontent.com/4736016/77244214-0f6cd780-6c56-11ea-9275-a30758dd5339.png)

[Additional metrics hidden]
![without-chedked](https://user-images.githubusercontent.com/4736016/77244219-14ca2200-6c56-11ea-9874-33a466085fce.png)

### How was this patch tested?

Tested manually with a simple DataFrame operation.
* The appearance of additional metrics in the boxes are controlled by the newly added checkbox.
* No error found with JS-debugger.
* Checked/not-checked state is preserved after reloading.

Closes #27927 from sarutak/SPARK-31081.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
(cherry picked from commit 999c9ed)
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…of sql metrics toggleable

### What changes were proposed in this pull request?

This is another solution for `SPARK-31081` and apache#27849 .
I added a checkbox which can toggle display of stageId/taskid in the SQL's DAG page.
Mainly, I implemented the toggleable texts in boxes with HTML label feature provided by `dagre-d3`.
The additional metrics are enclosed by `<span>` and control the appearance of the text.
But the exception is additional metrics in clusters.
We can use HTML label for cluster but layout will be broken so I choosed normal text label for clusters.
Due to that, this solution contains a little bit tricky code in`spark-sql-viz.js` to manipulate the metric texts and generate DOMs.

### Why are the changes needed?

It makes metrics harder to read after apache#26843 and user may not interest in extra info(stageId/StageAttemptId/taskId ) when they do not need debug.
apache#27849 control the appearance by a new configuration property but providing a checkbox is more flexible.

### Does this PR introduce any user-facing change?

Yes.
[Additional metrics shown]
![with-checked](https://user-images.githubusercontent.com/4736016/77244214-0f6cd780-6c56-11ea-9275-a30758dd5339.png)

[Additional metrics hidden]
![without-chedked](https://user-images.githubusercontent.com/4736016/77244219-14ca2200-6c56-11ea-9874-33a466085fce.png)

### How was this patch tested?

Tested manually with a simple DataFrame operation.
* The appearance of additional metrics in the boxes are controlled by the newly added checkbox.
* No error found with JS-debugger.
* Checked/not-checked state is preserved after reloading.

Closes apache#27927 from sarutak/SPARK-31081.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jun 25, 2020
@HyukjinKwon
Copy link
Member

Closing in favour of #27927

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
9 participants