[SPARK-40595][SQL] Improve error message for unused CTE relations #38029

Closed
wants to merge 1 commit into from


cloud-fan (Contributor)

What changes were proposed in this pull request?

In CheckAnalysis, we inline CTE relations first and then check the plan. This causes a problem when a CTE relation is not used: the relation is removed during inlining, so the regular checks never see it and never report its errors. Analysis then hits the last safeguard in CheckAnalysis:

    plan.foreachUp {
      case o if !o.resolved =>
        failAnalysis(s"unresolved operator ${o.simpleString(SQLConf.get.maxToStringFields)}")
      case _ =>
    }

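This produces a poor error message: a generic "unresolved operator" failure rather than a specific description of what is wrong in the unused CTE relation.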

To fix this, this PR runs an extra analysis check on CTE relations that are not used.
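To make the failure mode concrete, here is a toy Scala model of the check order described above. Everything in it (the `Plan` nodes, `inline`, `specificError`, `checkBefore`, `checkAfter`) is hypothetical and far simpler than Spark's `CheckAnalysis` and `InlineCTE`. It assumes, as the description implies, that the final safeguard still sees the unused CTE definition even though the specific checks only ran on the inlined plan.

```scala
// Toy model only: these are NOT Spark's classes or methods.
object UnusedCteCheckSketch {
  sealed trait Plan
  case class Table(name: String) extends Plan
  case class BadColumn(name: String) extends Plan // stands in for an expression that failed to resolve
  case class CteRef(name: String) extends Plan
  case class Project(child: Plan) extends Plan
  case class WithCte(defs: Seq[(String, Plan)], body: Plan) extends Plan

  // Names of CTEs referenced anywhere in a plan.
  def refs(p: Plan): Set[String] = p match {
    case CteRef(n)           => Set(n)
    case Project(c)          => refs(c)
    case WithCte(defs, body) => defs.flatMap { case (_, d) => refs(d) }.toSet ++ refs(body)
    case _                   => Set.empty
  }

  // Substitute each CteRef with its definition; unreferenced definitions simply disappear.
  def inline(p: Plan, env: Map[String, Plan] = Map.empty): Plan = p match {
    case CteRef(n)  => env(n)
    case Project(c) => Project(inline(c, env))
    case WithCte(defs, body) =>
      val newEnv = defs.foldLeft(env) { case (e, (n, d)) => e + (n -> inline(d, e)) }
      inline(body, newEnv)
    case other => other
  }

  // The "specific" check: a precise message for the first unresolved column it finds.
  def specificError(p: Plan): Option[String] = p match {
    case BadColumn(n)        => Some(s"column not found: $n")
    case Project(c)          => specificError(c)
    case WithCte(defs, body) => (defs.map(_._2) :+ body).flatMap(specificError).headOption
    case _                   => None
  }

  // In this model, a plan is resolved iff it contains no BadColumn.
  def resolved(p: Plan): Boolean = specificError(p).isEmpty

  // Before the fix: specific checks run on the inlined plan only, then the generic safeguard.
  def checkBefore(plan: Plan): Either[String, Unit] =
    specificError(inline(plan)) match {
      case Some(msg)               => Left(msg)
      case None if !resolved(plan) => Left("unresolved operator")
      case None                    => Right(())
    }

  // After the fix: unused CTE definitions also get the specific check before the safeguard.
  def checkAfter(plan: Plan): Either[String, Unit] = {
    val unusedDefs: Seq[Plan] = plan match {
      case w @ WithCte(defs, _) =>
        val used = refs(w)
        defs.collect { case (n, d) if !used(n) => d }
      case _ => Seq.empty
    }
    specificError(inline(plan)).orElse(unusedDefs.flatMap(specificError).headOption) match {
      case Some(msg)               => Left(msg)
      case None if !resolved(plan) => Left("unresolved operator")
      case None                    => Right(())
    }
  }
}
```

For a query shaped like `WITH cte1 AS (...), cte2 AS (<bad column "nope">) SELECT * FROM cte1`, `checkBefore` returns `Left("unresolved operator")`, while `checkAfter` returns `Left("column not found: nope")`.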

Why are the changes needed?

A better error message.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

A new test.

@cloud-fan (Contributor, Author)

cc @MaxGekk @srielau

github-actions bot added the SQL label Sep 28, 2022
MaxGekk (Member) commented Sep 28, 2022

+1, LGTM. Merging to master.
Thank you, @cloud-fan and @amaliujia for the review.

MaxGekk closed this in 6adda25 Sep 28, 2022
cloud-fan pushed a commit that referenced this pull request Jan 6, 2023
### What changes were proposed in this pull request?

Commit #38029 intended to do the right thing: it checks CTEs more aggressively, including CTEs that are not used, which is fine. However, it exposes an existing issue: a subquery validates itself, and when that subquery references a CTE defined outside of it, the check fails because the CTE cannot be found (e.g. a "key not found" error).

In short:

1. Commit #38029 checks more, so in the repro example every CTE is now checked (previously only used CTEs were checked).
2. One of the CTEs that is now checked contains a subquery.
3. That subquery references another CTE that is defined outside of the subquery.
4. The subquery validates itself and fails because that CTE is not found.

This PR fixes the issue by removing the subquery self-validation in the CTE case.
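A toy Scala model of this regression may help, with the same caveat: the `Plan` nodes and the `validate`, `checkBefore`, and `checkAfter` helpers are hypothetical simplifications, not Spark code. `checkBefore` validates the whole plan (including unused CTEs, as after #38029) and then lets every subquery validate itself with no outer CTEs in scope. `checkAfter` skips that self-validation for subqueries that reference a CTE, which is one possible reading of "removing the subquery self-validation in the CTE case".

```scala
// Toy model only: these are NOT Spark's classes or methods.
object SubqueryCteScopeSketch {
  sealed trait Plan
  case class Table(name: String) extends Plan
  case class CteRef(name: String) extends Plan
  case class ScalarSubquery(plan: Plan) extends Plan // e.g. (SELECT MAX(col1) FROM cte1)
  case class WithCte(defs: Seq[(String, Plan)], body: Plan) extends Plan

  // Every CteRef must name a CTE visible in `scope`; otherwise fail with "key not found".
  def validate(p: Plan, scope: Map[String, Plan]): Unit = p match {
    case CteRef(n) =>
      if (!scope.contains(n)) throw new NoSuchElementException(s"key not found: $n")
    case ScalarSubquery(q) => validate(q, scope)
    case WithCte(defs, body) =>
      // Each definition (used or not) is validated and then becomes visible to later ones.
      val inner = defs.foldLeft(scope) { case (s, (n, d)) => validate(d, s); s + (n -> d) }
      validate(body, inner)
    case Table(_) => ()
  }

  // All subqueries in a plan, including those inside CTE definitions.
  def subqueries(p: Plan): Seq[ScalarSubquery] = p match {
    case s @ ScalarSubquery(q) => s +: subqueries(q)
    case WithCte(defs, body)   => defs.flatMap { case (_, d) => subqueries(d) } ++ subqueries(body)
    case _                     => Seq.empty
  }

  def refsCte(p: Plan): Boolean = p match {
    case CteRef(_)           => true
    case ScalarSubquery(q)   => refsCte(q)
    case WithCte(defs, body) => defs.exists { case (_, d) => refsCte(d) } || refsCte(body)
    case Table(_)            => false
  }

  private def attempt(body: => Unit): Either[String, Unit] =
    try { body; Right(()) } catch { case e: NoSuchElementException => Left(e.getMessage) }

  // Before: each subquery also validates itself in isolation, with no outer CTEs in scope.
  def checkBefore(plan: Plan): Either[String, Unit] = attempt {
    validate(plan, Map.empty)
    subqueries(plan).foreach(s => validate(s.plan, Map.empty))
  }

  // After: subqueries that reference CTEs are only validated within the enclosing plan.
  def checkAfter(plan: Plan): Either[String, Unit] = attempt {
    validate(plan, Map.empty)
    subqueries(plan).filterNot(s => refsCte(s.plan)).foreach(s => validate(s.plan, Map.empty))
  }
}
```

Modeling the repro query as `WithCte(Seq("cte1" -> Table("t"), "cte2" -> ScalarSubquery(CteRef("cte1"))), CteRef("cte1"))`, `checkBefore` returns `Left("key not found: cte1")` and `checkAfter` returns `Right(())`. A reference to a CTE that truly does not exist still fails in `checkAfter`, because the enclosing plan is still validated.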

### Why are the changes needed?

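This fixes a regression in which the following query could no longer pass the analyzer:
```scala
// Runs inside a Spark SQL test suite (e.g. one extending QueryTest with a shared
// SparkSession), which provides sql() and checkAnswer().
import org.apache.spark.sql.Row

// cte2 is never used, but its scalar subquery reads cte1, which is defined outside the subquery.
val df = sql("""
  |WITH
  |  cte1 AS (SELECT 1 col1),
  |  cte2 AS (SELECT (SELECT MAX(col1) FROM cte1))
  |SELECT * FROM cte1
  |""".stripMargin)
checkAnswer(df, Row(1) :: Nil)
```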

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit test.

Closes #39414 from amaliujia/fix_subquery_validate.

Authored-by: Rui Wang <rui.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>