Skip to content

[SPARK-39112][SQL] UnsupportedOperationException if spark.sql.ui.explainMode is set to cost#36488

Closed
ulysses-you wants to merge 2 commits intoapache:masterfrom
ulysses-you:SPARK-39112
Closed

[SPARK-39112][SQL] UnsupportedOperationException if spark.sql.ui.explainMode is set to cost#36488
ulysses-you wants to merge 2 commits intoapache:masterfrom
ulysses-you:SPARK-39112

Conversation

@ulysses-you
Copy link
Contributor

@ulysses-you ulysses-you commented May 9, 2022

What changes were proposed in this pull request?

Add a new leaf like node LeafNodeWithoutStats and apply to the list:

  • ResolvedDBObjectName
  • ResolvedNamespace
  • ResolvedTable
  • ResolvedView
  • ResolvedNonPersistentFunc
  • ResolvedPersistentFunc

Why are the changes needed?

We enable v2 command at 3.3.0 branch by default spark.sql.legacy.useV1Command. However this is a behavior change between v1 and c2 command.

  • v1 command:
    We resolve logical plan to command at analyzer phase by ResolveSessionCatalog

  • v2 commnd:
    We resolve logical plan to v2 command at physical phase by DataSourceV2Strategy

Foe cost explain mode, we will call LogicalPlanStats.stats using optimized plan so there is a gap between v1 and v2 command.
Unfortunately, the logical plan of v2 command contains the LeafNode which does not override the computeStats. As a result, there is a error running such sql:

set spark.sql.ui.explainMode=cost;
show tables;
java.lang.UnsupportedOperationException:
	at org.apache.spark.sql.catalyst.plans.logical.LeafNode.computeStats(LogicalPlan.scala:171)
	at org.apache.spark.sql.catalyst.plans.logical.LeafNode.computeStats$(LogicalPlan.scala:171)
	at org.apache.spark.sql.catalyst.analysis.ResolvedNamespace.computeStats(v2ResolutionPlans.scala:155)
	at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.SizeInBytesOnlyStatsPlanVisitor$.default(SizeInBytesOnlyStatsPlanVisitor.scala:55)
	at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.SizeInBytesOnlyStatsPlanVisitor$.default(SizeInBytesOnlyStatsPlanVisitor.scala:27)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlanVisitor.visit(LogicalPlanVisitor.scala:49)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlanVisitor.visit$(LogicalPlanVisitor.scala:25)
	at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.SizeInBytesOnlyStatsPlanVisitor$.visit(SizeInBytesOnlyStatsPlanVisitor.scala:27)
	at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats.$anonfun$stats$1(LogicalPlanStats.scala:37)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats.stats(LogicalPlanStats.scala:33)
	at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats.stats$(LogicalPlanStats.scala:33)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.stats(LogicalPlan.scala:30)

Does this PR introduce any user-facing change?

yes, bug fix

How was this patch tested?

add test

@github-actions github-actions bot added the SQL label May 9, 2022
@ulysses-you
Copy link
Contributor Author

/**
* A resolved leaf like node that its statistics is no meaning.
*/
trait ResolvedLeafObject extends LogicalPlan with LeafLike[LogicalPlan] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: LeafNodeWithoutStats seems better

}

test("SPARK-39112: UnsupportedOperationException if spark.sql.ui.explainMode is set to cost") {
withSQLConf(SQLConf.UI_EXPLAIN_MODE.key -> "cost") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we simply test with EXPLAIN COST ... command?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make it an abstract class and extends LeafNode? I'm a bit worried about changing the class hierarchy unnecessarily in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to trait LeafNodeWithoutStats extend LeafNode

Copy link
Contributor

@cloud-fan cloud-fan May 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* A resolved leaf like node that its statistics is no meaning.
* A resolved leaf node whose statistics has no meaning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the test name needs update

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.3!

@cloud-fan cloud-fan closed this in 06fd340 May 11, 2022
cloud-fan pushed a commit that referenced this pull request May 11, 2022
…ainMode is set to cost

### What changes were proposed in this pull request?

Add a new leaf like node `LeafNodeWithoutStats` and apply to the list:
- ResolvedDBObjectName
- ResolvedNamespace
- ResolvedTable
- ResolvedView
- ResolvedNonPersistentFunc
- ResolvedPersistentFunc

### Why are the changes needed?

We enable v2 command at 3.3.0 branch by default `spark.sql.legacy.useV1Command`. However this is a behavior change between v1 and c2 command.

- v1 command:
  We resolve logical plan to command at analyzer phase by `ResolveSessionCatalog`

- v2 commnd:
  We resolve logical plan to v2 command at physical phase by `DataSourceV2Strategy`

Foe cost explain mode, we will call `LogicalPlanStats.stats` using optimized plan so there is a gap between v1 and v2 command.
Unfortunately, the logical plan of v2 command contains the `LeafNode` which does not override the `computeStats`. As a result, there is a error running such sql:
```sql
set spark.sql.ui.explainMode=cost;
show tables;
```

```
java.lang.UnsupportedOperationException:
	at org.apache.spark.sql.catalyst.plans.logical.LeafNode.computeStats(LogicalPlan.scala:171)
	at org.apache.spark.sql.catalyst.plans.logical.LeafNode.computeStats$(LogicalPlan.scala:171)
	at org.apache.spark.sql.catalyst.analysis.ResolvedNamespace.computeStats(v2ResolutionPlans.scala:155)
	at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.SizeInBytesOnlyStatsPlanVisitor$.default(SizeInBytesOnlyStatsPlanVisitor.scala:55)
	at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.SizeInBytesOnlyStatsPlanVisitor$.default(SizeInBytesOnlyStatsPlanVisitor.scala:27)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlanVisitor.visit(LogicalPlanVisitor.scala:49)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlanVisitor.visit$(LogicalPlanVisitor.scala:25)
	at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.SizeInBytesOnlyStatsPlanVisitor$.visit(SizeInBytesOnlyStatsPlanVisitor.scala:27)
	at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats.$anonfun$stats$1(LogicalPlanStats.scala:37)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats.stats(LogicalPlanStats.scala:33)
	at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats.stats$(LogicalPlanStats.scala:33)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.stats(LogicalPlan.scala:30)
```

### Does this PR introduce _any_ user-facing change?

yes, bug fix

### How was this patch tested?

add test

Closes #36488 from ulysses-you/SPARK-39112.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 06fd340)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@ulysses-you ulysses-you deleted the SPARK-39112 branch May 11, 2022 12:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants