Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-31440][SQL] Improve SQL Rest API #28208

Closed
wants to merge 14 commits into from
Closed

[SPARK-31440][SQL] Improve SQL Rest API #28208

wants to merge 14 commits into from

Conversation

erenavsarogullari
Copy link
Member

@erenavsarogullari erenavsarogullari commented Apr 13, 2020

What changes were proposed in this pull request?

SQL Rest API exposes query execution metrics as Public API. This PR aims to apply following improvements on SQL Rest API by aligning Spark-UI.

Proposed Improvements:
1- Support Physical Operations and group metrics per physical operation by aligning Spark UI.
2- Support wholeStageCodegenId for Physical Operations
3- nodeId can be useful for grouping metrics and sorting physical operations (according to execution order) to differentiate same operators (if used multiple times during the same query execution) and their metrics.
4- Filter empty metrics by aligning with Spark UI - SQL Tab. Currently, Spark UI does not show empty metrics.
5- Remove line breakers(\n) from metricValue.
6- planDescription can be optional Http parameter to avoid network cost where there is specially complex jobs creating big-plans.
7- metrics attribute needs to be exposed at the bottom order as nodes. Specially, this can be useful for the user where nodes array size is high.
8- edges attribute is being exposed to show relationship between nodes.
9- Reverse order on metricDetails aims to match with Spark UI by supporting Physical Operators' execution order.

Why are the changes needed?

Proposed improvements provides more useful (e.g: physical operations and metrics correlation, grouping) and clear (e.g: filtering blank metrics, removing line breakers) result for the end-user.

Does this PR introduce any user-facing change?

Yes. Please find both current and improved versions of the results as attached for following SQL Rest Endpoint:

curl -X GET http://localhost:4040/api/v1/applications/$appId/sql/$executionId?details=true

Current version:
https://issues.apache.org/jira/secure/attachment/12999821/current_version.json

Improved version:
https://issues.apache.org/jira/secure/attachment/13000621/improved_version.json

Backward Compatibility

SQL Rest API will be started to expose with Spark 3.0 and 3.0.0-preview2 (released on 12/23/19) does not cover this API so if PR can catch 3.0 release, this will not have any backward compatibility issue.

How was this patch tested?

  1. New Unit tests are added.
  2. Also, patch has been tested manually through both Spark Core and History Server Rest APIs.

@SparkQA
Copy link

SparkQA commented Apr 13, 2020

Test build #121221 has finished for PR 28208 at commit 9754b66.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 14, 2020

Test build #121223 has finished for PR 28208 at commit a26cbbe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@erenavsarogullari
Copy link
Member Author

@SparkQA
Copy link

SparkQA commented Apr 20, 2020

Test build #121546 has finished for PR 28208 at commit f41435e.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class SparkPlanGraphCluster(

@SparkQA
Copy link

SparkQA commented Apr 21, 2020

Test build #121547 has finished for PR 28208 at commit 115141c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@erenavsarogullari
Copy link
Member Author

erenavsarogullari commented Apr 21, 2020

Last build failure seems irrelevant:
org.apache.spark.sql.streaming.StreamingDeduplicationSuite.test no-data flag
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121547/

@erenavsarogullari
Copy link
Member Author

retest this please

@gengliangwang
Copy link
Member

Hi @erenavsarogullari Thanks for the work.
The link https://issues.apache.org/jira/secure/attachment/12999822/improved_version.json is broken now. Could you fix it?

@erenavsarogullari
Copy link
Member Author

Hi @gengliangwang,
Thanks for the review. improved_version link has just been updated as
https://issues.apache.org/jira/secure/attachment/13000621/improved_version.json

@gengliangwang
Copy link
Member

I see.
From the https://issues.apache.org/jira/secure/attachment/13000621/improved_version.json, if there are only nodes without edges, it seems providing node id is pointless.
We can build a graph if with edges.

@SparkQA
Copy link

SparkQA commented Apr 21, 2020

Test build #121560 has finished for PR 28208 at commit 115141c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Copy link
Member

@erenavsarogullari I meant we can expose the edges, which seems better.

@erenavsarogullari
Copy link
Member Author

erenavsarogullari commented Apr 22, 2020

@gengliangwang Currently node and edge details are as below:
Spark Plan Graph With Nodes and Edges

Could you please share a sample json?

@SparkQA
Copy link

SparkQA commented Apr 22, 2020

Test build #121597 has finished for PR 28208 at commit 7792741.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 22, 2020

Test build #121598 has finished for PR 28208 at commit da88978.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@erenavsarogullari
Copy link
Member Author

erenavsarogullari commented Apr 27, 2020

Hi @gengliangwang,

Thanks again for the review. Currently, all nodes are being listed where edges infos as previous attached screenshot. Please let me know if you have any concern with current way.

Also, i created documentation for SQL Rest API - PR: #28354 as the follow-up for this work. Just fyi.

@gengliangwang
Copy link
Member

Hi @erenavsarogullari ,

sorry for the late reply. I think we can output the whole SparkPlanGraph in json format:

nodes: [ {..}, {..}]
edges:  [{fromId: 1, toId, 2}, ...]

@@ -146,4 +146,6 @@ class SparkPlanGraphNodeWrapper(
case class SQLPlanMetric(
name: String,
accumulatorId: Long,
metricType: String)
metricType: String,
nodeId: Option[Long] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we use the node name and id in SparkPlanGraphNode instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current implementation uses SQLExecutionUIData and exposes job and physicalPlanDescription details through it. However, SQLExecutionUIData does not have SparkPlanGraphNode data where it has SQLPlanMetric if makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as we can access the sqlStore in the SqlResource.scala, we can get the corresponding SparkPlanGraphNode

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both nodeId: Option[Long] = None and nodeName: Option[String] = None were removed from SQLPlanMetric and provided both of them by sqlStore.planGraph(executionId) instead.

@SparkQA
Copy link

SparkQA commented May 2, 2020

Test build #122184 has finished for PR 28208 at commit 817ebab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@erenavsarogullari
Copy link
Member Author

erenavsarogullari commented May 8, 2020

@gengliangwang There is optional Http parameter: details (default: false). It needs to be set in order to fetch node, edge and planDescription details as well:
http://localhost:4040/api/v1/applications//sql/0?details=true

I tried again and I can see them now. Thanks for pointing it out.
How about showing the details by default? Users might not be aware of the option.

@gengliangwang Yes, it makes sense and set details Http parameter true as default by the last patch and if the end-user needs, both details and/or planDescription can be disabled as follows:
../sql?details=false&planDescription=false
or
../sql/0?details=false&planDescription=false

@erenavsarogullari
Copy link
Member Author

erenavsarogullari commented May 8, 2020

Also, i think this PR can be useful for the community if it can be covered by v3.0.0 (current jira affects version points v3.1.0). Users can start to use this version directly. WDYT?

@SparkQA
Copy link

SparkQA commented May 8, 2020

Test build #122420 has finished for PR 28208 at commit 2a1e388.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 8, 2020

Test build #122424 has finished for PR 28208 at commit e685b94.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 9, 2020

Test build #122455 has finished for PR 28208 at commit 6e9b55f.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class SparkPlanGraphNode(

@SparkQA
Copy link

SparkQA commented May 9, 2020

Test build #122457 has finished for PR 28208 at commit c0660b1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}

private def prepareExecutionData(exec: SQLExecutionUIData, details: Boolean): ExecutionData = {
private def getNodeIdAndWSCGIdMap(graph: SparkPlanGraph): Map[Long, Option[Long]] = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the WSCG node name contains the index. Why do we need to get the mapping here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WSCG node has both nodeId and WSCG index as follows. WSCG index comes as part of WSCG nodeName and needs to be parsed before populating of other nodes' for their wholeStageCodegenId attribute so this map is useful for computation and readability by reducing complexity. Also, i cleaned up existing implementation by covering this part. Please find final patch: 2743296

{
    "nodeId": 2,
    "nodeName": "WholeStageCodegen (1)",
    "metrics": [...]
}

@gengliangwang
Copy link
Member

@erenavsarogullari +1 for putting this in 3.0


metricValues.get(accumulatorId).map( mv => {
val metricValue = if (mv.startsWith("\n")) mv.substring(1, mv.length) else mv
Metric(metricName, metricValue)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually after #28037, part of the metrics name is in the value...
I can see the json output like:

{
"nodeId": 9,
"nodeName": "WholeStageCodegen (1)",
"metrics": [
{
"name": "duration",
"value": "total (min, med, max (stageId: taskId))\n4.3 s (269 ms, 270 ms, 272 ms (stage 3.0: task 16))"
}
]
}

But this is not strongly related to this PR. We can just fix it in another PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sounds good.

@SparkQA
Copy link

SparkQA commented May 18, 2020

Test build #122769 has finished for PR 28208 at commit 2743296.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 18, 2020

Test build #122771 has finished for PR 28208 at commit 2f9522f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@GET
def sqlList(
@DefaultValue("false") @QueryParam("details") details: Boolean,
@DefaultValue("true") @QueryParam("details") details: Boolean,
@DefaultValue("true") @QueryParam("planDescription") planDescription: Boolean,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erenavsarogullari one last comment: why do we have extra option "planDescription" here. It seems reasonable to be covered by the option "details"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gengliangwang Thanks for the review.

Please find my comments as follows:
1- planDescription exposes Physical Plan which covers dataset column-names. Column Names can be thought as customer sensitive data so with this option, end-users can disable in the light of their use-cases when they still access metrics.
2- For complex queries, planDescription can be big string and create network overhead. In this case, it can be disabled where it is not required and metrics are required(e.g: time-series monitoring - metrics need to be persisted & exposed but Physical Plan does not) (if makes sense)

Copy link
Member

@gengliangwang gengliangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except one last question. Thanks for the works!

@gengliangwang
Copy link
Member

Thanks, merging to master/3.0

gengliangwang pushed a commit that referenced this pull request May 19, 2020
### What changes were proposed in this pull request?
SQL Rest API exposes query execution metrics as Public API. This PR aims to apply following improvements on SQL Rest API by aligning Spark-UI.

**Proposed Improvements:**
1- Support Physical Operations and group metrics per physical operation by aligning Spark UI.
2- Support `wholeStageCodegenId` for Physical Operations
3- `nodeId` can be useful for grouping metrics and sorting physical operations (according to execution order) to differentiate same operators (if used multiple times during the same query execution) and their metrics.
4- Filter `empty` metrics by aligning with Spark UI - SQL Tab. Currently, Spark UI does not show empty metrics.
5- Remove line breakers(`\n`) from `metricValue`.
6- `planDescription` can be `optional` Http parameter to avoid network cost where there is specially complex jobs creating big-plans.
7- `metrics` attribute needs to be exposed at the bottom order as `nodes`. Specially, this can be useful for the user where `nodes` array size is high.
8- `edges` attribute is being exposed to show relationship between `nodes`.
9- Reverse order on `metricDetails` aims to match with Spark UI by supporting Physical Operators' execution order.

### Why are the changes needed?
Proposed improvements provides more useful (e.g: physical operations and metrics correlation, grouping) and clear (e.g: filtering blank metrics, removing line breakers) result for the end-user.

### Does this PR introduce any user-facing change?
Yes. Please find both current and improved versions of the results as attached for following SQL Rest Endpoint:
```
curl -X GET http://localhost:4040/api/v1/applications/$appId/sql/$executionId?details=true
```
**Current version:**
https://issues.apache.org/jira/secure/attachment/12999821/current_version.json

**Improved version:**
https://issues.apache.org/jira/secure/attachment/13000621/improved_version.json

### Backward Compatibility
SQL Rest API will be started to expose with `Spark 3.0` and `3.0.0-preview2` (released on 12/23/19) does not cover this API so if PR can catch 3.0 release, this will not have any backward compatibility issue.

### How was this patch tested?
1. New Unit tests are added.
2. Also, patch has been tested manually through both **Spark Core** and **History Server** Rest APIs.

Closes #28208 from erenavsarogullari/SPARK-31440.

Authored-by: Eren Avsarogullari <eren.avsarogullari@gmail.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
(cherry picked from commit ab4cf49)
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
@gengliangwang
Copy link
Member

@erenavsarogullari I think we have to revert this one in branch-3.0. See my comments in #28588

cloud-fan pushed a commit that referenced this pull request May 20, 2020
### What changes were proposed in this pull request?

Revert #28208 and #24076 in branch 3.0
### Why are the changes needed?

Unfortunately, the PR  #28208 is merged after Spark 3.0 RC 2 cut. Although the improvement is great, we can't break the policy to add new improvement commits into branch 3.0 now.

Also, if we are going to adopt the improvement in a future release, we should not release 3.0 with #24076, since the API result will be changed.

After discuss with cloud-fan and gatorsmile offline, we think the best choice is to revert both commits and follow community release policy.

### Does this PR introduce _any_ user-facing change?

Yes, let's hold the SQL rest API until next release.

### How was this patch tested?

Jenkins unit tests.

Closes #28588 from gengliangwang/revertSQLRestAPI.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
wangyum pushed a commit that referenced this pull request Mar 22, 2022
…rException

### What changes were proposed in this pull request?

Added null check for `exec.metricValues`.

### Why are the changes needed?

When requesting Restful API  {baseURL}/api/v1/applications/$appId/sql/$executionId which is introduced by this PR #28208, it can cause NullPointerException. The root cause is, when calling method doUpdate() of `LiveExecutionData`, `metricsValues` can be null. Then, when statement `printableMetrics(graph.allNodes, exec.metricValues)` is executed, it will throw NullPointerException.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tested manually.

Closes #35884 from yym1995/fix-npe.

Lead-authored-by: Yimin <yimin.y@outlook.com>
Co-authored-by: Yimin Yang <26797163+yym1995@users.noreply.github.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
wangyum pushed a commit that referenced this pull request Mar 22, 2022
…rException

### What changes were proposed in this pull request?

Added null check for `exec.metricValues`.

### Why are the changes needed?

When requesting Restful API  {baseURL}/api/v1/applications/$appId/sql/$executionId which is introduced by this PR #28208, it can cause NullPointerException. The root cause is, when calling method doUpdate() of `LiveExecutionData`, `metricsValues` can be null. Then, when statement `printableMetrics(graph.allNodes, exec.metricValues)` is executed, it will throw NullPointerException.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tested manually.

Closes #35884 from yym1995/fix-npe.

Lead-authored-by: Yimin <yimin.y@outlook.com>
Co-authored-by: Yimin Yang <26797163+yym1995@users.noreply.github.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
(cherry picked from commit 99992a4)
Signed-off-by: Yuming Wang <yumwang@ebay.com>
wangyum pushed a commit that referenced this pull request Mar 22, 2022
…rException

### What changes were proposed in this pull request?

Added null check for `exec.metricValues`.

### Why are the changes needed?

When requesting Restful API  {baseURL}/api/v1/applications/$appId/sql/$executionId which is introduced by this PR #28208, it can cause NullPointerException. The root cause is, when calling method doUpdate() of `LiveExecutionData`, `metricsValues` can be null. Then, when statement `printableMetrics(graph.allNodes, exec.metricValues)` is executed, it will throw NullPointerException.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tested manually.

Closes #35884 from yym1995/fix-npe.

Lead-authored-by: Yimin <yimin.y@outlook.com>
Co-authored-by: Yimin Yang <26797163+yym1995@users.noreply.github.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
(cherry picked from commit 99992a4)
Signed-off-by: Yuming Wang <yumwang@ebay.com>
wangyum pushed a commit that referenced this pull request Mar 22, 2022
…rException

### What changes were proposed in this pull request?

Added null check for `exec.metricValues`.

### Why are the changes needed?

When requesting Restful API  {baseURL}/api/v1/applications/$appId/sql/$executionId which is introduced by this PR #28208, it can cause NullPointerException. The root cause is, when calling method doUpdate() of `LiveExecutionData`, `metricsValues` can be null. Then, when statement `printableMetrics(graph.allNodes, exec.metricValues)` is executed, it will throw NullPointerException.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tested manually.

Closes #35884 from yym1995/fix-npe.

Lead-authored-by: Yimin <yimin.y@outlook.com>
Co-authored-by: Yimin Yang <26797163+yym1995@users.noreply.github.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
(cherry picked from commit 99992a4)
Signed-off-by: Yuming Wang <yumwang@ebay.com>
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
…rException

### What changes were proposed in this pull request?

Added null check for `exec.metricValues`.

### Why are the changes needed?

When requesting Restful API  {baseURL}/api/v1/applications/$appId/sql/$executionId which is introduced by this PR apache#28208, it can cause NullPointerException. The root cause is, when calling method doUpdate() of `LiveExecutionData`, `metricsValues` can be null. Then, when statement `printableMetrics(graph.allNodes, exec.metricValues)` is executed, it will throw NullPointerException.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tested manually.

Closes apache#35884 from yym1995/fix-npe.

Lead-authored-by: Yimin <yimin.y@outlook.com>
Co-authored-by: Yimin Yang <26797163+yym1995@users.noreply.github.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
(cherry picked from commit 99992a4)
Signed-off-by: Yuming Wang <yumwang@ebay.com>
(cherry picked from commit c3aace7)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants