Skip to content

Comments

[SPARK-40212] [SQL] SparkSQL castPartValue does not properly handle byte, short, or float#37659

Closed
BrennanStein wants to merge 2 commits intoapache:masterfrom
BrennanStein:spark40212
Closed

[SPARK-40212] [SQL] SparkSQL castPartValue does not properly handle byte, short, or float#37659
BrennanStein wants to merge 2 commits intoapache:masterfrom
BrennanStein:spark40212

Conversation

@BrennanStein
Copy link

@BrennanStein BrennanStein commented Aug 25, 2022

What changes were proposed in this pull request?

The castPartValueToDesiredType function now returns byte for ByteType and short for ShortType, rather than ints; also floats for FloatType rather than double.

Why are the changes needed?

Previously, attempting to read back in a file partitioned on one of these column types would result in a ClassCastException at runtime (for Byte, java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Byte). I can't think this is anything but a bug, as returning the correct data type prevents the crash.

Does this PR introduce any user-facing change?

Yes: it changes the observed behavior when reading in a byte/short/float-partitioned file.

How was this patch tested?

Added unit test. Without the castPartValueToDesiredType updates, the test fails with the stated exception.

===
I'll note that I'm not familiar enough with the spark repo to know if this will have ripple effects elsewhere, but tests pass on my fork and since the very similar https://github.com/apache/spark/pull/36344/files only needed to touch these two files I expect this change is self-contained as well.

@github-actions github-actions bot added the SQL label Aug 25, 2022
@BrennanStein BrennanStein changed the title [SPARK-40212] [SQL] SparkSQL castPartValue does not properly handle byte & short [SPARK-40212] [SQL] SparkSQL castPartValue does not properly handle byte, short, or float Aug 27, 2022
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@wangyum
Copy link
Member

wangyum commented Aug 28, 2022

cc @cloud-fan @HyukjinKwon

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Aug 29, 2022

Merged to master and branch-3.3.

HyukjinKwon pushed a commit that referenced this pull request Aug 29, 2022
…te, short, or float

The `castPartValueToDesiredType` function now returns byte for ByteType and short for ShortType, rather than ints; also floats for FloatType rather than double.

Previously, attempting to read back in a file partitioned on one of these column types would result in a ClassCastException at runtime (for Byte, `java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Byte`). I can't think this is anything but a bug, as returning the correct data type prevents the crash.

Yes: it changes the observed behavior when reading in a byte/short/float-partitioned file.

Added unit test. Without the `castPartValueToDesiredType` updates, the test fails with the stated exception.

===
I'll note that I'm not familiar enough with the spark repo to know if this will have ripple effects elsewhere, but tests pass on my fork and since the very similar https://github.com/apache/spark/pull/36344/files only needed to touch these two files I expect this change is self-contained as well.

Closes #37659 from BrennanStein/spark40212.

Authored-by: Brennan Stein <brennan.stein@ekata.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 146f187)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Copy link
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! late LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants