[SPARK-27160][SQL] Fix DecimalType when building orc filters #24092
Please review! I do think this should be backported to 2.4.x. And I will add unit tests later.
@@ -136,10 +137,7 @@ private[sql] object OrcFilters {
    case FloatType | DoubleType =>
      value.asInstanceOf[Number].doubleValue()
    case _: DecimalType =>
      val decimal = value.asInstanceOf[java.math.BigDecimal]
      val decimalWritable = new HiveDecimalWritable(decimal.longValue)
      decimalWritable.mutateEnforcePrecisionScale(decimal.precision, decimal.scale)
Why do we remove this line?
This can result in a RuntimeException in the hashCode method of HiveDecimalWritable.
Can you provide more details? I'm not convinced that we can skip it.
There are similar conversions in the ORC project.
@dongjoon-hyun Why do we need mutateEnforcePrecisionScale?
@cloud-fan I have an ORC file that reproduces the RuntimeException with mutateEnforcePrecisionScale, but I have not come up with a nice unit test yet.
@sadhen and @cloud-fan.
Yes, Line 140 was the bug, and mutateEnforcePrecisionScale
only amended the scale and precision for the HiveDecimalWritable(long)
case. We can remove this in the new code.
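To make the point above concrete, here is a minimal sketch that uses only `java.math.BigDecimal`, not the Hive or Spark classes from the diff. It assumes that building a decimal from `decimal.longValue` and then enforcing the original precision and scale behaves like truncating to a long and rescaling; the object name and sample rows are made up for illustration.

```scala
import java.math.{BigDecimal => JBigDecimal}

// Hypothetical stand-in for the old conversion path; no Hive classes involved.
object DecimalTruncationSketch {
  // The literal from a predicate such as `x < 3.14`.
  val literal: JBigDecimal = new JBigDecimal("3.14")

  // Going through longValue drops the fractional digits.
  val truncated: JBigDecimal = JBigDecimal.valueOf(literal.longValue())

  // Re-applying the original scale restores the shape, not the lost digits.
  val rescaled: JBigDecimal = truncated.setScale(literal.scale())

  // Sample column values (made up for this sketch).
  val rows: Seq[JBigDecimal] =
    Seq("2.50", "3.05", "3.10", "3.20").map(s => new JBigDecimal(s))

  // Rows the query should return versus rows matched by the truncated bound.
  val expected: Seq[JBigDecimal] = rows.filter(_.compareTo(literal) < 0)
  val withBug: Seq[JBigDecimal] = rows.filter(_.compareTo(rescaled) < 0)

  def main(args: Array[String]): Unit = {
    println(s"literal=$literal precision=${literal.precision()} scale=${literal.scale()}")
    println(s"after longValue: $truncated, after rescaling: $rescaled")
    println(s"expected matches: ${expected.mkString(", ")}")
    println(s"matches with truncated bound: ${withBug.mkString(", ")}")
  }
}
```

In this sketch the rescaled value has the right precision and scale but the wrong digits, which is why amending precision and scale afterwards cannot repair the conversion.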
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala
Test build #103505 has finished for PR 24092 at commit
Test build #103493 has finished for PR 24092 at commit
Thank you for pinging me, @sadhen. I'll take a look today!
Retest this please.
cc @dbtsai since he is a release manager.
Can we add an end-to-end test to demonstrate the correctness bug?
Test build #103517 has finished for PR 24092 at commit
+1 for @cloud-fan's opinion.
Do you mean generating an ORC file with DecimalType and reading it using the native reader with predicate pushdown?
Yes, right, @sadhen.
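A rough model of why such an end-to-end read would expose the bug: pushed-down predicates let the reader skip whole chunks of data based on min/max statistics, so a truncated bound can drop rows before Spark ever re-applies the filter. The sketch below is plain Scala, not Spark or ORC code; `Stats`, `mayContainLessThan`, and `scan` are hypothetical names, and the single-chunk layout is an assumption for illustration.

```scala
// Hypothetical pure-Scala model of statistics-based skipping; not the ORC API.
object StatsSkippingSketch {
  // Min/max statistics for one chunk of a decimal column.
  final case class Stats(min: BigDecimal, max: BigDecimal)

  // A chunk can hold a row with x < bound only if its minimum is below bound.
  def mayContainLessThan(stats: Stats, bound: BigDecimal): Boolean =
    stats.min < bound

  // Skip the chunk using the pushed-down bound, then filter with the query bound.
  def scan(
      chunk: Seq[BigDecimal],
      pushedBound: BigDecimal,
      queryBound: BigDecimal): Seq[BigDecimal] = {
    val stats = Stats(chunk.min, chunk.max)
    if (mayContainLessThan(stats, pushedBound)) chunk.filter(_ < queryBound)
    else Seq.empty
  }

  // Sample data (made up): every value lies between 3 and 4.
  val chunk: Seq[BigDecimal] = Seq("3.05", "3.10", "3.20").map(s => BigDecimal(s))
  val queryBound: BigDecimal = BigDecimal("3.14")

  // Pushing down the exact literal keeps the chunk.
  val correct: Seq[BigDecimal] = scan(chunk, queryBound, queryBound)

  // Pushing down the literal truncated to a long skips the whole chunk.
  val buggy: Seq[BigDecimal] = scan(chunk, BigDecimal(queryBound.toLong), queryBound)

  def main(args: Array[String]): Unit = {
    println(s"exact bound pushed down: ${correct.mkString(", ")}")
    println(s"truncated bound pushed down: ${buggy.size} rows")
  }
}
```

Under these assumptions the truncated bound returns no rows where two are expected, which is the kind of discrepancy an end-to-end test on a small ORC file can assert on.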
@cloud-fan @dongjoon-hyun Please review again.
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala
Retest this please.
+1, LGTM (Pending Jenkins)
Test build #103681 has started for PR 24092 at commit
Retest this please.
Test build #103696 has finished for PR 24092 at commit
Merged to master.
Hi, @sadhen. There is a conflict at
DecimalType Literal should not be cast to Long.

E.g., for `df.filter("x < 3.14")`, assuming df (x in DecimalType) reads from an ORC table and uses the native ORC reader with predicate pushdown enabled, we will push down the `x < 3.14` predicate to the ORC reader via a SearchArgument.

OrcFilters will construct the SearchArgument, but does not handle the DecimalType correctly.

The previous implementation will construct `x < 3` from `x < 3.14`.

```
$ sbt
> sql/testOnly *OrcFilterSuite
> sql/testOnly *OrcQuerySuite -- -z "27160"
```

Closes #24092 from sadhen/spark27160.

Authored-by: Darcy Shen <sadhen@zoho.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
DecimalType Literal should not be cast to Long.

E.g., for `df.filter("x < 3.14")`, assuming df (x in DecimalType) reads from an ORC table and uses the native ORC reader with predicate pushdown enabled, we will push down the `x < 3.14` predicate to the ORC reader via a SearchArgument.

OrcFilters will construct the SearchArgument, but does not handle the DecimalType correctly.

The previous implementation will construct `x < 3` from `x < 3.14`.

```
$ sbt
> sql/testOnly *OrcFilterSuite
> sql/testOnly *OrcQuerySuite -- -z "27160"
```

Closes #24092 from sadhen/spark27160.

Authored-by: Darcy Shen <sadhen@zoho.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit f3ba73a)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Also, I tested and landed this on
DecimalType Literal should not be cast to Long.

E.g., for `df.filter("x < 3.14")`, assuming df (x in DecimalType) reads from an ORC table and uses the native ORC reader with predicate pushdown enabled, we will push down the `x < 3.14` predicate to the ORC reader via a SearchArgument.

OrcFilters will construct the SearchArgument, but does not handle the DecimalType correctly.

The previous implementation will construct `x < 3` from `x < 3.14`.

```
$ sbt
> sql/testOnly *OrcFilterSuite
> sql/testOnly *OrcQuerySuite -- -z "27160"
```

Closes apache#24092 from sadhen/spark27160.

Authored-by: Darcy Shen <sadhen@zoho.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

Conflicts:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala
What changes were proposed in this pull request?
DecimalType Literal should not be cast to Long.
E.g., for `df.filter("x < 3.14")`, assuming df (x in DecimalType) reads from an ORC table and uses the native ORC reader with predicate pushdown enabled, we will push down the `x < 3.14` predicate to the ORC reader via a SearchArgument.
OrcFilters will construct the SearchArgument, but does not handle the DecimalType correctly.
The previous implementation will construct `x < 3` from `x < 3.14`.
How was this patch tested?