Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-39354][SQL] Ensure show Table or view not found even if there are dataTypeMismatchError related to Filter at the same time #36746

Closed
wants to merge 1 commit into from

Conversation

LuciferYang
Copy link
Contributor

@LuciferYang LuciferYang commented Jun 2, 2022

What changes were proposed in this pull request?

After SPARK-38118, dataTypeMismatchError related to Filter will be checked and throw in RemoveTempResolvedColumn, this will cause compatibility issue with exception message presentation.

For example, the following case:

spark.sql("create table t1(user_id int, auct_end_dt date) using parquet;")
spark.sql("select * from t1 join t2 on t1.user_id = t2.user_id where t1.auct_end_dt >= Date_sub('2020-12-27', 90)").show

The expected message is

Table or view not found: t2

But the actual message is

org.apache.spark.sql.AnalysisException: cannot resolve 'date_sub('2020-12-27', 90)' due to data type mismatch: argument 1 requires date type, however, ''2020-12-27'' is of string type.; line 1 pos 76

For forward compatibility, this pr change to only records DATA_TYPE_MISMATCH_ERROR_MESSAGE in the RemoveTempResolvedColumn check process , and move failAnalysis to CheckAnalysis#checkAnalysis

Why are the changes needed?

Fix analysis exception message compatibility.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Pass Github Actions and add a new test case

@github-actions github-actions bot added the SQL label Jun 2, 2022
@@ -1170,13 +1170,25 @@ class AnalysisSuite extends AnalysisTest with Matchers {
|WITH t as (SELECT true c, false d)
|SELECT (t.c AND t.d) c
|FROM t
|GROUP BY t.c
|GROUP BY t.c, t.d
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without t.d, the error message is

expression 't.d' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.

@LuciferYang
Copy link
Contributor Author

Try to fix SPARK-39354, if acceptable, I will update the pr description

@wangyum
Copy link
Member

wangyum commented Jun 2, 2022

@cloud-fan

@LuciferYang LuciferYang changed the title [DON'T MERGE] [DON'T MERGE][SQL] Show Table or view not found first even if there are dataTypeMismatchError at the same time Jun 2, 2022
@LuciferYang LuciferYang changed the title [DON'T MERGE][SQL] Show Table or view not found first even if there are dataTypeMismatchError at the same time [DON'T MERGE][SQL] Ensure show Table or view not found even if there are dataTypeMismatchError at the same time Jun 2, 2022
@LuciferYang LuciferYang changed the title [DON'T MERGE][SQL] Ensure show Table or view not found even if there are dataTypeMismatchError at the same time [SPARK-39354][SQL] Ensure show Table or view not found even if there are dataTypeMismatchError at the same time Jun 2, 2022
@LuciferYang LuciferYang marked this pull request as draft June 2, 2022 02:31
@LuciferYang LuciferYang marked this pull request as ready for review June 2, 2022 02:31
@LuciferYang LuciferYang changed the title [SPARK-39354][SQL] Ensure show Table or view not found even if there are dataTypeMismatchError at the same time [WIP][SPARK-39354][SQL] Ensure show Table or view not found even if there are dataTypeMismatchError at the same time Jun 2, 2022
@LuciferYang LuciferYang changed the title [WIP][SPARK-39354][SQL] Ensure show Table or view not found even if there are dataTypeMismatchError at the same time [WIP][SPARK-39354][SQL] Ensure show Table or view not found even if there are dataTypeMismatchError for Filter at the same time Jun 2, 2022
@LuciferYang LuciferYang changed the title [WIP][SPARK-39354][SQL] Ensure show Table or view not found even if there are dataTypeMismatchError for Filter at the same time [WIP][SPARK-39354][SQL] Ensure show Table or view not found even if there are dataTypeMismatchError related to Filter at the same time Jun 2, 2022
@LuciferYang
Copy link
Contributor Author

Try to fix SPARK-39354, if acceptable, I will update the pr description

done

@LuciferYang LuciferYang changed the title [WIP][SPARK-39354][SQL] Ensure show Table or view not found even if there are dataTypeMismatchError related to Filter at the same time [SPARK-39354][SQL] Ensure show Table or view not found even if there are dataTypeMismatchError related to Filter at the same time Jun 2, 2022
@MaxGekk
Copy link
Member

MaxGekk commented Jun 2, 2022

+1, LGTM. Merging to master/3.3.
Thank you, @LuciferYang.

MaxGekk pushed a commit that referenced this pull request Jun 2, 2022
…e are `dataTypeMismatchError` related to `Filter` at the same time

### What changes were proposed in this pull request?
After SPARK-38118,  `dataTypeMismatchError` related to `Filter` will be checked and throw in `RemoveTempResolvedColumn`,  this will cause compatibility issue with exception message presentation.

For example, the following case:

```
spark.sql("create table t1(user_id int, auct_end_dt date) using parquet;")
spark.sql("select * from t1 join t2 on t1.user_id = t2.user_id where t1.auct_end_dt >= Date_sub('2020-12-27', 90)").show
```

The expected message is

```
Table or view not found: t2
```

But the actual message is
```
org.apache.spark.sql.AnalysisException: cannot resolve 'date_sub('2020-12-27', 90)' due to data type mismatch: argument 1 requires date type, however, ''2020-12-27'' is of string type.; line 1 pos 76
```

For forward compatibility, this pr change to only records `DATA_TYPE_MISMATCH_ERROR_MESSAGE` in the `RemoveTempResolvedColumn` check  process , and move `failAnalysis` to `CheckAnalysis#checkAnalysis`

### Why are the changes needed?
Fix analysis exception message compatibility.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass Github Actions and add a new test case

Closes #36746 from LuciferYang/SPARK-39354.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 89fdb8a)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
@MaxGekk MaxGekk closed this in 89fdb8a Jun 2, 2022
@LuciferYang
Copy link
Contributor Author

thanks @MaxGekk @wangyum ~

cloud-fan added a commit that referenced this pull request Jun 16, 2022
### What changes were proposed in this pull request?

This is a followup of #35404 and #36746 , to simplify the error handling of `TempResolvedColumn`.

The idea is:
1. The rule `ResolveAggregationFunctions` in the main resolution batch creates `TempResolvedColumn` and only removes it if the aggregate expression is fully resolved. It either strips `TempResolvedColumn` if it's inside aggregate function or group expression, or restores `TempResolvedColumn` to `UnresolvedAttribute` otherwise, hoping other rules can resolve it.
2. The rule `RemoveTempResolvedColumn` in a latter batch can still hit `TempResolvedColumn` if the aggregate expression is unresolved (due to input type mismatch for example, e.g. `avg(bool_col)`, `date_add(int_group_col, 1)`). At this stage, there is no way to restore `TempResolvedColumn` to `UnresolvedAttribute`  and resolve it differently. The query will fail and we should blindly strip `TempResolvedColumn` to provide better error message.
### Why are the changes needed?

code cleanup

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests

Closes #36809 from cloud-fan/error.

Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants