
[SPARK-48081] Fix ClassCastException in NTile.checkInputDataTypes() when argument is non-foldable or of wrong type #46333

Closed
wants to merge 4 commits into master from JoshRosen:SPARK-48081

Conversation


@JoshRosen JoshRosen commented May 2, 2024

What changes were proposed in this pull request?

While migrating the `NTile` expression's type check failures to the new error class framework, PR #38457 removed a pair of necessary `return` statements, causing certain branches' values to be discarded rather than returned.

As a result, invalid usages like

```
select ntile(99.9) OVER (order by id) from range(10)
```

trigger internal errors like

```
java.lang.ClassCastException: class org.apache.spark.sql.types.Decimal cannot be cast to class java.lang.Integer (org.apache.spark.sql.types.Decimal is in unnamed module of loader 'app'; java.lang.Integer is in module java.base of loader 'bootstrap')
  at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:99)
  at org.apache.spark.sql.catalyst.expressions.NTile.checkInputDataTypes(windowExpressions.scala:877)
```

instead of a clear error-framework error like

```
org.apache.spark.sql.catalyst.ExtendedAnalysisException: [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "ntile(99.9)" due to data type mismatch: The first parameter requires the "INT" type, however "99.9" has the type "DECIMAL(3,1)". SQLSTATE: 42K09; line 1 pos 7;
'Project [unresolvedalias(ntile(99.9) windowspecdefinition(id#0L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())))]
+- Range (0, 10, step=1, splits=None)

  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.dataTypeMismatch(package.scala:73)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$7(CheckAnalysis.scala:315)
```

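The failure mode can be illustrated with a minimal, hypothetical Scala sketch (simplified names and checks; this is not the actual `windowExpressions.scala` code). In Scala a method returns the value of its *last* expression, so an error value computed earlier in the body is a discarded statement unless it is explicitly `return`ed:

```scala
// Hypothetical, simplified sketch of the bug pattern.
sealed trait TypeCheckResult
case object TypeCheckSuccess extends TypeCheckResult
final case class DataTypeMismatch(reason: String) extends TypeCheckResult

// Buggy variant: both DataTypeMismatch values sit in statement position and
// are discarded, so control always falls through to the unchecked cast,
// which throws ClassCastException (via BoxesRunTime.unboxToInt) when the
// argument is not an Int.
def buggyCheck(foldable: Boolean, value: Any): TypeCheckResult = {
  if (!foldable) {
    DataTypeMismatch("argument must be a constant (foldable)") // discarded!
  }
  if (!value.isInstanceOf[Int]) {
    DataTypeMismatch("argument must be an INT")                // discarded!
  }
  val i = value.asInstanceOf[Int] // throws for e.g. a BigDecimal argument
  if (i > 0) TypeCheckSuccess else DataTypeMismatch("argument must be positive")
}

// Fixed variant: the early `return`s make the mismatch values the method's
// result, so the cast is only reached for a foldable Int input.
def fixedCheck(foldable: Boolean, value: Any): TypeCheckResult = {
  if (!foldable) {
    return DataTypeMismatch("argument must be a constant (foldable)")
  }
  if (!value.isInstanceOf[Int]) {
    return DataTypeMismatch("argument must be an INT")
  }
  val i = value.asInstanceOf[Int]
  if (i > 0) TypeCheckSuccess else DataTypeMismatch("argument must be positive")
}
```

An equivalent fix restructures the branches into a single `if`/`else if`/`else` expression so no `return` is needed; either way, the mismatch value must be the expression the method evaluates to.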
Why are the changes needed?

Improve error messages.

Does this PR introduce any user-facing change?

Yes, it improves an error message.

How was this patch tested?

Added a new test case to AnalysisErrorSuite.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label May 2, 2024
@JoshRosen JoshRosen changed the title [SPARK-48081] Fix ClassCastException in NTile.checkInputDataTypes() when data type is mismatched [SPARK-48081] Fix ClassCastException in NTile.checkInputDataTypes() when input data type is mismatched or non-foldable May 2, 2024
@JoshRosen JoshRosen changed the title [SPARK-48081] Fix ClassCastException in NTile.checkInputDataTypes() when input data type is mismatched or non-foldable [SPARK-48081] Fix ClassCastException in NTile.checkInputDataTypes() when argument is non-foldable or of wrong type May 2, 2024

@dongjoon-hyun dongjoon-hyun left a comment


+1, LGTM.

dongjoon-hyun pushed a commit that referenced this pull request May 2, 2024
[SPARK-48081] Fix ClassCastException in NTile.checkInputDataTypes() when argument is non-foldable or of wrong type

Closes #46333 from JoshRosen/SPARK-48081.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit b99a64b)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun pushed a commit that referenced this pull request May 2, 2024
[SPARK-48081] Fix ClassCastException in NTile.checkInputDataTypes() when argument is non-foldable or of wrong type

Closes #46333 from JoshRosen/SPARK-48081.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit b99a64b)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

dongjoon-hyun commented May 2, 2024

Merged to master/3.5/3.4 for Apache Spark 4.0.0-preview and 3.5.2 and 3.4.4.


dongjoon-hyun commented May 2, 2024

Oh, there was a test failure in 3.5/3.4. I reverted them from branch-3.5/3.4.

Could you make a backporting PR to branch-3.5 and branch-3.4, @JoshRosen ?

JoshRosen added a commit to JoshRosen/spark that referenced this pull request May 2, 2024
[SPARK-48081] Fix ClassCastException in NTile.checkInputDataTypes() when argument is non-foldable or of wrong type

Closes apache#46333 from JoshRosen/SPARK-48081.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
JoshRosen added a commit to JoshRosen/spark that referenced this pull request May 2, 2024
[SPARK-48081] Fix ClassCastException in NTile.checkInputDataTypes() when argument is non-foldable or of wrong type

Closes apache#46333 from JoshRosen/SPARK-48081.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@JoshRosen JoshRosen deleted the SPARK-48081 branch May 2, 2024 16:40
dongjoon-hyun pushed a commit that referenced this pull request May 2, 2024
[SPARK-48081] Fix ClassCastException in NTile.checkInputDataTypes() when argument is non-foldable or of wrong type

branch-3.5 pick of PR #46333, fixing a test issue caused by a difference in expected error-message parameter formatting across branches.

Closes #46336 from JoshRosen/SPARK-48081-branch-3.5.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun pushed a commit that referenced this pull request May 2, 2024
[SPARK-48081] Fix ClassCastException in NTile.checkInputDataTypes() when argument is non-foldable or of wrong type

branch-3.4 pick of PR #46333, fixing a test issue caused by a difference in expected error-message parameter formatting across branches.

Closes #46337 from JoshRosen/SPARK-48081-branch-3.4.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
[SPARK-48081] Fix ClassCastException in NTile.checkInputDataTypes() when argument is non-foldable or of wrong type

Closes apache#46333 from JoshRosen/SPARK-48081.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>