[SPARK-31440][SQL] Improve SQL Rest API #28208

erenavsarogullari · 2020-04-13T20:51:00Z

What changes were proposed in this pull request?

The SQL REST API exposes query execution metrics as a public API. This PR proposes the following improvements to align it with the Spark UI.

Proposed Improvements:
1- Support physical operations and group metrics per physical operation, as the Spark UI does.
2- Support wholeStageCodegenId for physical operations.
3- Expose nodeId, which is useful for grouping metrics, for sorting physical operations by execution order, and for telling apart operators (and their metrics) that are used multiple times in the same query execution.
4- Filter empty metrics to align with the Spark UI SQL tab, which currently does not show empty metrics.
5- Remove line breaks (\n) from metricValue.
6- Make planDescription an optional HTTP parameter to avoid network cost for complex jobs that create large plans.
7- Expose the metrics attribute as nodes at the bottom of the response. This is especially useful when the nodes array is large.
8- Expose the edges attribute to show the relationships between nodes.
9- Reverse the order of metricDetails to match the Spark UI and follow the physical operators' execution order.
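
Items 1, 4 and 5 can be illustrated with a minimal sketch. The types `PlanMetric`, `PlanNode`, `Metric` and `NodeMetrics` and the function `groupMetrics` are hypothetical stand-ins, not the classes touched by this patch. The sketch only assumes that each node lists the accumulator ids of its metrics and that metric values arrive as a map from accumulator id to a printable string.

```scala
object MetricGroupingSketch {
  // Hypothetical stand-ins for the plan graph and REST payload types.
  final case class PlanMetric(name: String, accumulatorId: Long)
  final case class PlanNode(id: Long, name: String, metrics: Seq[PlanMetric])
  final case class Metric(name: String, value: String)
  final case class NodeMetrics(nodeId: Long, nodeName: String, metrics: Seq[Metric])

  def groupMetrics(nodes: Seq[PlanNode], values: Map[Long, String]): Seq[NodeMetrics] =
    nodes
      .sortBy(_.id) // deterministic order; the direction used by the patch is not assumed here
      .map { node =>
        val metrics = node.metrics.flatMap { m =>
          values
            .get(m.accumulatorId)      // metrics without a reported value are dropped
            .map(_.stripPrefix("\n"))  // remove a leading line break
            .filter(_.nonEmpty)        // drop blank values as well
            .map(v => Metric(m.name, v))
        }
        NodeMetrics(node.id, node.name, metrics)
      }
}
```

Because the result is keyed by nodeId, two operators with the same name stay separate entries, each with its own metrics.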

Why are the changes needed?

The proposed improvements give the end user more useful results (e.g. correlating and grouping physical operations with their metrics) and clearer results (e.g. filtering blank metrics and removing line breaks).

Does this PR introduce any user-facing change?

Yes. The current and improved versions of the results are attached for the following SQL REST endpoint:

curl -X GET http://localhost:4040/api/v1/applications/$appId/sql/$executionId?details=true

Current version:
https://issues.apache.org/jira/secure/attachment/12999821/current_version.json

Improved version:
https://issues.apache.org/jira/secure/attachment/13000621/improved_version.json

Backward Compatibility

The SQL REST API will first be exposed in Spark 3.0, and 3.0.0-preview2 (released on 12/23/19) does not include it, so if this PR makes the 3.0 release there is no backward compatibility issue.

How was this patch tested?

New unit tests are added.
The patch has also been tested manually through both the Spark Core and History Server REST APIs.

SparkQA · 2020-04-13T20:55:31Z

Test build #121221 has finished for PR 28208 at commit 9754b66.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-04-14T01:54:25Z

Test build #121223 has finished for PR 28208 at commit a26cbbe.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

erenavsarogullari · 2020-04-15T19:47:56Z

cc @gengliangwang @vanzin @dongjoon-hyun

SparkQA · 2020-04-20T21:44:00Z

Test build #121546 has finished for PR 28208 at commit f41435e.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class SparkPlanGraphCluster(

SparkQA · 2020-04-21T02:15:28Z

Test build #121547 has finished for PR 28208 at commit 115141c.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

erenavsarogullari · 2020-04-21T03:31:25Z

Last build failure seems irrelevant:
org.apache.spark.sql.streaming.StreamingDeduplicationSuite.test no-data flag
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121547/

erenavsarogullari · 2020-04-21T03:32:46Z

retest this please

sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/api.scala

sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/SqlResource.scala

gengliangwang · 2020-04-21T05:14:18Z

Hi @erenavsarogullari Thanks for the work.
The link https://issues.apache.org/jira/secure/attachment/12999822/improved_version.json is broken now. Could you fix it?

erenavsarogullari · 2020-04-21T05:19:01Z

Hi @gengliangwang,
Thanks for the review. The improved_version link has just been updated to
https://issues.apache.org/jira/secure/attachment/13000621/improved_version.json

gengliangwang · 2020-04-21T05:23:28Z

I see.
From https://issues.apache.org/jira/secure/attachment/13000621/improved_version.json, if there are only nodes without edges, providing the node id seems pointless.
We can build a graph if edges are included.

SparkQA · 2020-04-21T07:05:01Z

Test build #121560 has finished for PR 28208 at commit 115141c.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

gengliangwang · 2020-04-21T22:52:31Z

@erenavsarogullari I meant we can expose the edges, which seems better.

erenavsarogullari · 2020-04-22T00:51:16Z

@gengliangwang Currently node and edge details are as below:

Could you please share a sample json?

SparkQA · 2020-04-22T01:49:08Z

Test build #121597 has finished for PR 28208 at commit 7792741.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-04-22T03:20:11Z

Test build #121598 has finished for PR 28208 at commit da88978.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

erenavsarogullari · 2020-04-27T01:17:02Z

Hi @gengliangwang,

Thanks again for the review. Currently, all nodes are listed along with the edge info, as in the previously attached screenshot. Please let me know if you have any concerns about the current approach.

Also, I created documentation for the SQL REST API in PR #28354 as a follow-up to this work. Just FYI.

gengliangwang · 2020-04-28T00:10:51Z

Hi @erenavsarogullari ,

sorry for the late reply. I think we can output the whole SparkPlanGraph in json format:

nodes: [ {..}, {..}]
edges: [{fromId: 1, toId: 2}, ...]
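
A minimal sketch of why edges make the node ids useful: with both lists a client can rebuild the graph. The case classes `Node`, `Edge` and `Graph` and the function `successors` are hypothetical and only mirror the field names in the suggestion above; the assumption that an edge points from `fromId` to `toId` is mine.

```scala
object PlanGraphSketch {
  // Hypothetical client-side model of the suggested JSON shape.
  final case class Node(nodeId: Long, nodeName: String)
  final case class Edge(fromId: Long, toId: Long)
  final case class Graph(nodes: Seq[Node], edges: Seq[Edge])

  // For every node, the ids of the nodes its outgoing edges point to.
  def successors(graph: Graph): Map[Long, Seq[Long]] =
    graph.nodes.map { n =>
      n.nodeId -> graph.edges.filter(_.fromId == n.nodeId).map(_.toId)
    }.toMap
}
```

Without the edges list, the same node ids could only be used for sorting and grouping, not for reconstructing which operator feeds which.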

gengliangwang · 2020-04-28T00:11:52Z

sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusStore.scala

@@ -146,4 +146,6 @@ class SparkPlanGraphNodeWrapper(
 case class SQLPlanMetric(
    name: String,
    accumulatorId: Long,
-    metricType: String)
+    metricType: String,
+    nodeId: Option[Long] = None,


shall we use the node name and id in SparkPlanGraphNode instead?

The current implementation uses SQLExecutionUIData and exposes job and physicalPlanDescription details through it. However, SQLExecutionUIData does not contain SparkPlanGraphNode data; it only has SQLPlanMetric, which is why the fields were added there.

As long as we can access the sqlStore in the SqlResource.scala, we can get the corresponding SparkPlanGraphNode

Both nodeId: Option[Long] = None and nodeName: Option[String] = None were removed from SQLPlanMetric; both values are now obtained from sqlStore.planGraph(executionId) instead.

SparkQA · 2020-05-02T04:53:44Z

Test build #122184 has finished for PR 28208 at commit 817ebab.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

erenavsarogullari · 2020-05-08T00:51:08Z

@gengliangwang There is an optional HTTP parameter, details (default: false). It needs to be set to true in order to fetch node, edge and planDescription details as well:
http://localhost:4040/api/v1/applications//sql/0?details=true

I tried again and I can see them now. Thanks for pointing it out.
How about showing the details by default? Users might not be aware of the option.

@gengliangwang Yes, that makes sense. The last patch sets the details HTTP parameter to true by default. If the end user needs to, details and/or planDescription can be disabled as follows:
../sql?details=false&planDescription=false
or
../sql/0?details=false&planDescription=false
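
As a small illustration of the two query parameters, the sketch below builds the request URL for either the list endpoint or a single execution. The object and function names are hypothetical; only the path layout and the parameter names `details` and `planDescription` come from this thread, and the defaults assume the last patch described above.

```scala
object SqlEndpointUrlSketch {
  // Builds the SQL endpoint URL; leaving executionId empty targets the list endpoint.
  def sqlUrl(
      base: String,
      appId: String,
      executionId: Option[Long] = None,
      details: Boolean = true,
      planDescription: Boolean = true): String = {
    val path = s"$base/api/v1/applications/$appId/sql" + executionId.fold("")(id => s"/$id")
    s"$path?details=$details&planDescription=$planDescription"
  }
}
```

For example, `SqlEndpointUrlSketch.sqlUrl("http://localhost:4040", appId, Some(0L), details = false, planDescription = false)` yields the second form shown above for a given `appId`.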

erenavsarogullari · 2020-05-08T01:12:02Z

Also, I think this PR would be useful for the community if it can be included in v3.0.0 (the JIRA's affects version currently points to v3.1.0), so users can start with this version of the API directly. WDYT?

SparkQA · 2020-05-08T04:19:35Z

Test build #122420 has finished for PR 28208 at commit 2a1e388.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-05-08T05:39:12Z

Test build #122424 has finished for PR 28208 at commit e685b94.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-05-09T00:22:10Z

Test build #122455 has finished for PR 28208 at commit 6e9b55f.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class SparkPlanGraphNode(

SparkQA · 2020-05-09T05:00:45Z

Test build #122457 has finished for PR 28208 at commit c0660b1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gengliangwang · 2020-05-12T08:29:11Z

sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/SqlResource.scala

  }

-  private def prepareExecutionData(exec: SQLExecutionUIData, details: Boolean): ExecutionData = {
+  private def getNodeIdAndWSCGIdMap(graph: SparkPlanGraph): Map[Long, Option[Long]] = {


I think the WSCG node name contains the index. Why do we need to get the mapping here?

A WSCG node has both a nodeId and a WSCG index, as shown below. The WSCG index is part of the WSCG nodeName and has to be parsed before the wholeStageCodegenId attribute of the other nodes is populated, so this map helps both computation and readability by reducing complexity. I also cleaned up the existing implementation around this part. Please see the final patch: 2743296

{ "nodeId": 2, "nodeName": "WholeStageCodegen (1)", "metrics": [...] }
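
The parsing step described here can be sketched as follows. This is not the code from the patch: the object and function names are hypothetical, and the sketch assumes the node name always has the form `WholeStageCodegen (<index>)` as in the JSON sample above.

```scala
object WscgIdSketch {
  // Matches names such as "WholeStageCodegen (1)" and captures the index.
  private val WscgName = """WholeStageCodegen \((\d+)\)""".r

  // Returns the codegen index for a WSCG node name, or None for any other operator.
  def wholeStageCodegenId(nodeName: String): Option[Long] = nodeName match {
    case WscgName(id) => Some(id.toLong)
    case _ => None
  }
}
```

A map from WSCG nodeId to this parsed index can then be built once and looked up while the wholeStageCodegenId of the remaining nodes is filled in.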

gengliangwang · 2020-05-12T08:30:59Z

@erenavsarogullari +1 for putting this in 3.0

gengliangwang · 2020-05-12T08:33:06Z

sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/SqlResource.scala

+
+      metricValues.get(accumulatorId).map( mv => {
+        val metricValue = if (mv.startsWith("\n")) mv.substring(1, mv.length) else mv
+        Metric(metricName, metricValue)


Actually after #28037, part of the metrics name is in the value...
I can see the json output like:

{ "nodeId": 9, "nodeName": "WholeStageCodegen (1)", "metrics": [ { "name": "duration", "value": "total (min, med, max (stageId: taskId))\n4.3 s (269 ms, 270 ms, 272 ms (stage 3.0: task 16))" } ] }

But this is not strongly related to this PR. We can just fix it in another PR.

Yes, sounds good.
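
For the follow-up mentioned above, one possible way to separate the header from the value is sketched below. It is only an assumption about how such a fix could look, not what a later PR does; `splitHeader` is a hypothetical helper that treats everything before the first line break as the header.

```scala
object MetricValueSketch {
  // Splits "header\nvalue" into (Some(header), value).
  // A value without a line break, or with only a leading one, has no header.
  def splitHeader(raw: String): (Option[String], String) = {
    val idx = raw.indexOf('\n')
    if (idx < 0) (None, raw)
    else (Some(raw.substring(0, idx)).filter(_.nonEmpty), raw.substring(idx + 1))
  }
}
```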

SparkQA · 2020-05-18T00:31:33Z

Test build #122769 has finished for PR 28208 at commit 2743296.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-05-18T06:39:11Z

Test build #122771 has finished for PR 28208 at commit 2f9522f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gengliangwang · 2020-05-19T05:01:50Z

sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/SqlResource.scala

  @GET
  def sqlList(
-      @DefaultValue("false") @QueryParam("details") details: Boolean,
+      @DefaultValue("true") @QueryParam("details") details: Boolean,
+      @DefaultValue("true") @QueryParam("planDescription") planDescription: Boolean,


@erenavsarogullari one last comment: why do we have the extra option "planDescription" here? It seems reasonable for it to be covered by the option "details".

@gengliangwang Thanks for the review.

Please find my comments below:
1- planDescription exposes the physical plan, which includes dataset column names. Column names can be considered customer-sensitive data, so this option lets end users hide the plan for their use cases while still accessing metrics.
2- For complex queries, planDescription can be a large string and create network overhead. It can be disabled when only metrics are required (e.g. time-series monitoring, where metrics need to be persisted and exposed but the physical plan does not).
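
The two use cases can be sketched as two independent switches on the response. The types `Execution` and `ExecutionView` and the function `view` are hypothetical, and the sketch assumes the flags act independently, as use case 1 (metrics without the plan) suggests; the actual gating in SqlResource.scala may differ.

```scala
object ResponseOptionsSketch {
  // Hypothetical, simplified stand-ins for the stored execution and the REST payload.
  final case class Node(nodeId: Long, nodeName: String)
  final case class Edge(fromId: Long, toId: Long)
  final case class Execution(id: Long, physicalPlan: String, nodes: Seq[Node], edges: Seq[Edge])
  final case class ExecutionView(id: Long, planDescription: String, nodes: Seq[Node], edges: Seq[Edge])

  def view(exec: Execution, details: Boolean, planDescription: Boolean): ExecutionView =
    ExecutionView(
      id = exec.id,
      // The plan text is included only on request, since it may be large or sensitive.
      planDescription = if (planDescription) exec.physicalPlan else "",
      // Node metrics and edges are controlled separately by `details`.
      nodes = if (details) exec.nodes else Seq.empty,
      edges = if (details) exec.edges else Seq.empty)
}
```

With `details = true` and `planDescription = false`, a monitoring client would still receive nodes and edges but no plan text.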

gengliangwang

LGTM except for one last question. Thanks for the work!

gengliangwang · 2020-05-19T06:20:47Z

Thanks, merging to master/3.0

### What changes were proposed in this pull request?

SQL Rest API exposes query execution metrics as Public API. This PR aims to apply following improvements on SQL Rest API by aligning Spark-UI.

**Proposed Improvements:**
1- Support Physical Operations and group metrics per physical operation by aligning Spark UI.
2- Support `wholeStageCodegenId` for Physical Operations
3- `nodeId` can be useful for grouping metrics and sorting physical operations (according to execution order) to differentiate same operators (if used multiple times during the same query execution) and their metrics.
4- Filter `empty` metrics by aligning with Spark UI - SQL Tab. Currently, Spark UI does not show empty metrics.
5- Remove line breakers(`\n`) from `metricValue`.
6- `planDescription` can be `optional` Http parameter to avoid network cost where there is specially complex jobs creating big-plans.
7- `metrics` attribute needs to be exposed at the bottom order as `nodes`. Specially, this can be useful for the user where `nodes` array size is high.
8- `edges` attribute is being exposed to show relationship between `nodes`.
9- Reverse order on `metricDetails` aims to match with Spark UI by supporting Physical Operators' execution order.

### Why are the changes needed?

Proposed improvements provides more useful (e.g: physical operations and metrics correlation, grouping) and clear (e.g: filtering blank metrics, removing line breakers) result for the end-user.

### Does this PR introduce any user-facing change?

Yes. Please find both current and improved versions of the results as attached for following SQL Rest Endpoint:

```
curl -X GET http://localhost:4040/api/v1/applications/$appId/sql/$executionId?details=true
```

**Current version:**
https://issues.apache.org/jira/secure/attachment/12999821/current_version.json

**Improved version:**
https://issues.apache.org/jira/secure/attachment/13000621/improved_version.json

### Backward Compatibility

SQL Rest API will be started to expose with `Spark 3.0` and `3.0.0-preview2` (released on 12/23/19) does not cover this API so if PR can catch 3.0 release, this will not have any backward compatibility issue.

### How was this patch tested?

1. New Unit tests are added.
2. Also, patch has been tested manually through both **Spark Core** and **History Server** Rest APIs.

Closes #28208 from erenavsarogullari/SPARK-31440.

Authored-by: Eren Avsarogullari <eren.avsarogullari@gmail.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
(cherry picked from commit ab4cf49)
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>

gengliangwang · 2020-05-20T04:20:52Z

@erenavsarogullari I think we have to revert this one in branch-3.0. See my comments in #28588

### What changes were proposed in this pull request?

Revert #28208 and #24076 in branch 3.0

### Why are the changes needed?

Unfortunately, the PR #28208 is merged after Spark 3.0 RC 2 cut. Although the improvement is great, we can't break the policy to add new improvement commits into branch 3.0 now.
Also, if we are going to adopt the improvement in a future release, we should not release 3.0 with #24076, since the API result will be changed.
After discuss with cloud-fan and gatorsmile offline, we think the best choice is to revert both commits and follow community release policy.

### Does this PR introduce _any_ user-facing change?

Yes, let's hold the SQL rest API until next release.

### How was this patch tested?

Jenkins unit tests.

Closes #28588 from gengliangwang/revertSQLRestAPI.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

…rException

### What changes were proposed in this pull request?

Added null check for `exec.metricValues`.

### Why are the changes needed?

When requesting Restful API {baseURL}/api/v1/applications/$appId/sql/$executionId which is introduced by this PR #28208, it can cause NullPointerException. The root cause is, when calling method doUpdate() of `LiveExecutionData`, `metricsValues` can be null. Then, when statement `printableMetrics(graph.allNodes, exec.metricValues)` is executed, it will throw NullPointerException.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tested manually.

Closes #35884 from yym1995/fix-npe.

Lead-authored-by: Yimin <yimin.y@outlook.com>
Co-authored-by: Yimin Yang <26797163+yym1995@users.noreply.github.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
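
A null check of this kind can be sketched as below. This is a generic illustration rather than the exact change in #35884: `safeMetricValues` is a hypothetical helper, and the map type `Map[Long, String]` is assumed for the metric values.

```scala
object NullSafeMetricsSketch {
  // Wraps a possibly-null map so that later lookups never dereference null.
  def safeMetricValues(metricValues: Map[Long, String]): Map[Long, String] =
    Option(metricValues).getOrElse(Map.empty[Long, String])
}
```

`Option(null)` evaluates to `None`, so a null input falls back to the empty map and every node simply ends up with no metric values.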

probot-autolabeler bot added SQL WEB UI labels Apr 13, 2020

gengliangwang reviewed Apr 21, 2020

erenavsarogullari mentioned this pull request Apr 27, 2020

[SPARK-31566][SQL][DOCS] Add SQL Rest API Documentation #28354

Closed

gengliangwang reviewed Apr 28, 2020

Eren Avsarogullari added 5 commits May 1, 2020 17:37

Improve SQL Rest API

c89c5a4

Fix scalastyle issues

d4a78a7

Add WholeStageCodegen Grouping Support

06bc108

Fix scalastyle issues

6c5c2f9

Review comments are addressing

817ebab

Expose edges as well

b0a9149

Set details Http Parameter to true as default

e685b94

Address review comment to enable SparkPlanGraphNode

6e9b55f

Fix scalastyle issues

c0660b1

gengliangwang reviewed May 12, 2020

Refactoring on function parameters

2743296

Fix scalastyle issue

2f9522f

gengliangwang reviewed May 19, 2020

gengliangwang approved these changes May 19, 2020

gengliangwang closed this in ab4cf49 May 19, 2020

gengliangwang mentioned this pull request May 20, 2020

Revert [SPARK-27142][SPARK-31440] SQL rest API in branch 3.0 #28588

Closed

yimin-yang mentioned this pull request Mar 17, 2022

[SPARK-38579][SQL][WEBUI]Requesting Restful API can cause NullPointerException #35884

Closed
