[SPARK-40371][SQL] Migrate type check failures of NthValue and NTile onto error classes #38457

Closed

LuciferYang
Contributor

What changes were proposed in this pull request?

This PR replaces TypeCheckFailure with DataTypeMismatch in the type checks of window expressions, including NthValue and NTile.
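
To illustrate what the migration changes, here is a minimal, self-contained sketch. The case classes below are hypothetical stand-ins that only mirror the general shape of the two result types (a free-form message versus an error sub-class plus named parameters); they are not Spark's real classes. The template text is borrowed from the error message quoted later in this thread, and the parameter names are assumptions made for this sketch.

```scala
// Hypothetical stand-ins that mirror the shape of the two result types.
// These are NOT Spark's real classes.
sealed trait TypeCheckResult

// Old style: a single free-form message string.
final case class TypeCheckFailure(message: String) extends TypeCheckResult

// New style: an error sub-class name plus named parameters for a template.
final case class DataTypeMismatch(
    errorSubClass: String,
    messageParameters: Map[String, String]) extends TypeCheckResult

object MigrationSketch {
  // Template wording follows the message quoted in this thread; the
  // placeholder names are assumptions for illustration only.
  private val templates = Map(
    "UNEXPECTED_INPUT_TYPE" ->
      ("The <paramIndex> parameter requires the <requiredType> type, " +
        "however <inputSql> has the type <inputType>."))

  // Render either kind of failure as a user-facing string.
  def render(result: TypeCheckResult): String = result match {
    case TypeCheckFailure(msg) => msg
    case DataTypeMismatch(sub, params) =>
      // Substitute each <name> placeholder with its parameter value.
      val body = params.foldLeft(templates(sub)) {
        case (text, (k, v)) => text.replace(s"<$k>", v)
      }
      s"[DATATYPE_MISMATCH.$sub] $body"
  }
}
```

Because the structured form carries the sub-class and parameters separately, a test in this sketch can compare those fields directly instead of matching on message text.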

Why are the changes needed?

Migration onto error classes unifies Spark SQL error messages.

Does this PR introduce any user-facing change?

Yes. The PR changes user-facing error messages.

How was this patch tested?

Pass GitHub Actions

@github-actions github-actions bot added the SQL label Nov 1, 2022
@MaxGekk
Member

MaxGekk commented Nov 1, 2022

+1, LGTM. Merging to master.
Thank you, @LuciferYang.

@MaxGekk MaxGekk closed this in 2f6e12d Nov 1, 2022
@LuciferYang
Contributor Author

Thanks @MaxGekk

SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
…onto error classes

Closes apache#38457 from LuciferYang/SPARK-40371.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
dongjoon-hyun pushed a commit that referenced this pull request May 2, 2024
…hen argument is non-foldable or of wrong type

### What changes were proposed in this pull request?

While migrating the `NTile` expression's type check failures to the new error class framework, PR #38457 removed a pair of necessary `return` statements, which caused the values of certain branches to be discarded rather than returned.

As a result, invalid usages like

```
select ntile(99.9) OVER (order by id) from range(10)
```

trigger internal errors like

```
 java.lang.ClassCastException: class org.apache.spark.sql.types.Decimal cannot be cast to class java.lang.Integer (org.apache.spark.sql.types.Decimal is in unnamed module of loader 'app'; java.lang.Integer is in module java.base of loader 'bootstrap')
  at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:99)
  at org.apache.spark.sql.catalyst.expressions.NTile.checkInputDataTypes(windowExpressions.scala:877)
```

instead of clear error framework errors like

```
org.apache.spark.sql.catalyst.ExtendedAnalysisException: [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "ntile(99.9)" due to data type mismatch: The first parameter requires the "INT" type, however "99.9" has the type "DECIMAL(3,1)". SQLSTATE: 42K09; line 1 pos 7;
'Project [unresolvedalias(ntile(99.9) windowspecdefinition(id#0L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())))]
+- Range (0, 10, step=1, splits=None)

  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.dataTypeMismatch(package.scala:73)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$7(CheckAnalysis.scala:315)
```
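
The failure mode described above can be reproduced in plain Scala without Spark. The following is a minimal sketch under stated assumptions: `CheckResult`, `Success`, `Mismatch`, and both `check*` methods are hypothetical names invented for illustration, not the actual `NTile.checkInputDataTypes` code. It shows how an `if` block in statement position has its value discarded, so execution falls through to an unchecked cast unless an explicit `return` is present.

```scala
// Hypothetical stand-ins for a type-check result; not Spark's real classes.
sealed trait CheckResult
case object Success extends CheckResult
final case class Mismatch(reason: String) extends CheckResult

object ReturnSketch {
  // Buggy shape: the `if` is in statement position, so the Mismatch it
  // evaluates to is discarded and execution falls through to the cast.
  def checkBuggy(arg: Any): CheckResult = {
    if (!arg.isInstanceOf[Int]) {
      Mismatch("first parameter requires INT")
    }
    val buckets = arg.asInstanceOf[Int] // throws ClassCastException for non-Int input
    if (buckets > 0) Success else Mismatch("value must be > 0")
  }

  // Fixed shape: an explicit `return` exits before the cast is reached.
  def checkFixed(arg: Any): CheckResult = {
    if (!arg.isInstanceOf[Int]) {
      return Mismatch("first parameter requires INT")
    }
    val buckets = arg.asInstanceOf[Int]
    if (buckets > 0) Success else Mismatch("value must be > 0")
  }
}
```

In this sketch, passing a `BigDecimal` to `checkBuggy` fails inside the unboxing cast, while `checkFixed` returns the `Mismatch` value. An alternative design that avoids `return` entirely is to chain the checks with `else if`, so that the whole method body is a single expression.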

### Why are the changes needed?

Improve error messages.

### Does this PR introduce _any_ user-facing change?

Yes, it improves an error message.

### How was this patch tested?

Added a new test case to AnalysisErrorSuite.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46333 from JoshRosen/SPARK-48081.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun pushed a commit that referenced this pull request May 2, 2024
…hen argument is non-foldable or of wrong type

Closes #46333 from JoshRosen/SPARK-48081.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit b99a64b)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun pushed a commit that referenced this pull request May 2, 2024
…hen argument is non-foldable or of wrong type

Closes #46333 from JoshRosen/SPARK-48081.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit b99a64b)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
JoshRosen added a commit to JoshRosen/spark that referenced this pull request May 2, 2024
…hen argument is non-foldable or of wrong type

Closes apache#46333 from JoshRosen/SPARK-48081.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
JoshRosen added a commit to JoshRosen/spark that referenced this pull request May 2, 2024
…hen argument is non-foldable or of wrong type

Closes apache#46333 from JoshRosen/SPARK-48081.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun pushed a commit that referenced this pull request May 2, 2024
…aTypes() when argument is non-foldable or of wrong type

branch-3.5 pick of PR #46333, fixing a test issue caused by differences in the expected error message parameter formatting across branches. The original description is otherwise unchanged from PR #46333.

Closes #46336 from JoshRosen/SPARK-48081-branch-3.5.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun pushed a commit that referenced this pull request May 2, 2024
…aTypes() when argument is non-foldable or of wrong type

branch-3.4 pick of PR #46333, fixing a test issue caused by differences in the expected error message parameter formatting across branches. The original description is otherwise unchanged from PR #46333.

Closes #46337 from JoshRosen/SPARK-48081-branch-3.4.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
…hen argument is non-foldable or of wrong type

Closes apache#46333 from JoshRosen/SPARK-48081.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>