
[SPARK-27160][SQL] Fix DecimalType when building orc filters #24092

Closed
wants to merge 6 commits into apache:master from sadhen:spark27160

Conversation

da-liii
Contributor

@da-liii da-liii commented Mar 14, 2019

What changes were proposed in this pull request?

A DecimalType literal should not be cast to Long.

E.g., for `df.filter("x < 3.14")`, assuming `df` (with `x` of DecimalType) reads from an ORC table and uses the native ORC reader with predicate push down enabled, we push the `x < 3.14` predicate down to the ORC reader via a SearchArgument.

OrcFilters constructs the SearchArgument, but does not handle DecimalType correctly.

The previous implementation would construct `x < 3` from `x < 3.14`.
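
For illustration, the direction of the fix is to build the ORC literal from the full `java.math.BigDecimal` instead of its `longValue`. A minimal sketch of such a conversion (the helper name is hypothetical and this is not necessarily the exact code merged in this PR):

```
import java.math.BigDecimal

import org.apache.hadoop.hive.common.`type`.HiveDecimal
import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable

// Build the writable that ORC's SearchArgument expects for DECIMAL leaves,
// preserving the fractional part instead of truncating to a Long.
def toDecimalWritable(value: Any): HiveDecimalWritable = {
  val decimal = value.asInstanceOf[BigDecimal]
  new HiveDecimalWritable(HiveDecimal.create(decimal))
}
```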

How was this patch tested?

```
$ sbt
> sql/testOnly *OrcFilterSuite
> sql/testOnly *OrcQuerySuite -- -z "27160"
```

@da-liii
Contributor Author

da-liii commented Mar 14, 2019

@dongjoon-hyun

Please review! I do think this should be backported to 2.4.x. And I will add unit tests later.

```
@@ -136,10 +137,7 @@ private[sql] object OrcFilters {
    case FloatType | DoubleType =>
      value.asInstanceOf[Number].doubleValue()
    case _: DecimalType =>
      val decimal = value.asInstanceOf[java.math.BigDecimal]
      val decimalWritable = new HiveDecimalWritable(decimal.longValue)
      decimalWritable.mutateEnforcePrecisionScale(decimal.precision, decimal.scale)
```
Contributor


why do we remove this line?

Contributor Author

@da-liii da-liii Mar 14, 2019


This will potentially result in a RuntimeException in the hashCode method of HiveDecimalWritable.

Contributor


Can you provide more details? I'm not convinced that we can skip it.

Contributor Author


@cloud-fan I have an ORC file to reproduce the RuntimeException with mutateEnforcePrecisionScale, but have not come up with a nice unit test yet.

Member

@dongjoon-hyun dongjoon-hyun Mar 14, 2019


@sadhen and @cloud-fan.
Yes, Line 140 was the bug, and mutateEnforcePrecisionScale just amended the scale and precision for the HiveDecimalWritable(long) case. We can remove this in the new code.
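
To make the truncation concrete, here is a REPL-style sketch of the old construction path (assuming hive-storage-api on the classpath); the fractional part is lost before precision/scale are enforced:

```
import java.math.BigDecimal
import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable

val decimal = new BigDecimal("3.14")
// The Long constructor keeps only the integral part (3), so ".14" is already gone.
val decimalWritable = new HiveDecimalWritable(decimal.longValue)
// Enforcing precision/scale afterwards cannot restore the lost fraction.
decimalWritable.mutateEnforcePrecisionScale(decimal.precision, decimal.scale)
println(decimalWritable) // a value derived from 3, not 3.14
```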

@da-liii da-liii changed the title [SPARK-27160][SQL] Fix DecimalType literal casting [WIP][SPARK-27160][SQL] Fix DecimalType literal casting Mar 14, 2019
@da-liii da-liii changed the title [WIP][SPARK-27160][SQL] Fix DecimalType literal casting [SPARK-27160][SQL] Fix DecimalType literal casting Mar 14, 2019
@SparkQA

SparkQA commented Mar 14, 2019

Test build #103505 has finished for PR 24092 at commit 591c3f4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 14, 2019

Test build #103493 has finished for PR 24092 at commit 4be6d82.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Thank you for pinging me, @sadhen . I'll take a look today!

@dongjoon-hyun
Member

Retest this please.

@dongjoon-hyun
Member

cc @dbtsai since he is a release manager.

@cloud-fan
Contributor

Can we add an end-to-end test to demonstrate the correctness bug?

@SparkQA

SparkQA commented Mar 15, 2019

Test build #103517 has finished for PR 24092 at commit 591c3f4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

+1 for @cloud-fan's opinion.
@sadhen, could you add another test case for that?

@da-liii
Contributor Author

da-liii commented Mar 15, 2019

@dongjoon-hyun @cloud-fan

Do you mean generating an ORC file with DecimalType, and reading it using the native reader with predicate push down?

@dongjoon-hyun
Member

Yes, right, @sadhen .
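
For reference, an end-to-end check of that shape might look roughly like the sketch below (illustrative only: the path, data, and expected count are assumptions, not the test that was eventually merged). It writes a DecimalType column to ORC and reads it back with the native reader and filter pushdown enabled; before the fix, the pushed-down `x < 3` predicate could cause ORC to skip row groups whose minimum is at least 3, dropping rows that do satisfy `x < 3.14`:

```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[2]")
  .appName("SPARK-27160 end-to-end sketch")
  .config("spark.sql.orc.impl", "native")          // native ORC reader
  .config("spark.sql.orc.filterPushdown", "true")  // push predicates into ORC
  .getOrCreate()
import spark.implicits._

val path = "/tmp/spark-27160-decimal"  // hypothetical temporary location

// All values lie in [3.01, 3.13], so every row satisfies x < 3.14,
// but none satisfies the truncated predicate x < 3.
(1 to 13).map(i => "3.%02d".format(i)).toDF("v")
  .selectExpr("CAST(v AS DECIMAL(10, 2)) AS x")
  .repartition(1)
  .write.mode("overwrite").orc(path)

val matched = spark.read.orc(path).filter("x < 3.14").count()
println(s"matched $matched rows (expected 13)")
spark.stop()
```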

@da-liii
Contributor Author

da-liii commented Mar 19, 2019

@cloud-fan @dongjoon-hyun Please review again.

@cloud-fan cloud-fan changed the title [SPARK-27160][SQL] Fix DecimalType literal casting [SPARK-27160][SQL] Fix DecimalType when building orc filters Mar 19, 2019
@dongjoon-hyun
Member

Retest this please.

Member

@dongjoon-hyun dongjoon-hyun left a comment


+1, LGTM (Pending Jenkins)

@SparkQA

SparkQA commented Mar 19, 2019

Test build #103681 has started for PR 24092 at commit 08e158a.

@dongjoon-hyun
Member

Retest this please.

@SparkQA

SparkQA commented Mar 20, 2019

Test build #103696 has finished for PR 24092 at commit 08e158a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Merged to master.

@dongjoon-hyun
Member

Hi, @sadhen. There is a conflict at branch-2.4. Could you make a PR against branch-2.4, please?

dongjoon-hyun pushed a commit that referenced this pull request Mar 23, 2019
A DecimalType literal should not be cast to Long.

E.g., for `df.filter("x < 3.14")`, assuming `df` (with `x` of DecimalType) reads from an ORC table and uses the native ORC reader with predicate push down enabled, we push the `x < 3.14` predicate down to the ORC reader via a SearchArgument.

OrcFilters constructs the SearchArgument, but does not handle DecimalType correctly.

The previous implementation would construct `x < 3` from `x < 3.14`.

```
$ sbt
> sql/testOnly *OrcFilterSuite
> sql/testOnly *OrcQuerySuite -- -z "27160"
```

Closes #24092 from sadhen/spark27160.

Authored-by: Darcy Shen <sadhen@zoho.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@dongjoon-hyun
Member

dongjoon-hyun commented Mar 23, 2019

Since @sadhen looks busy and RC8 failed, I resolved the conflicts and backported this to branch-2.4. I built and tested it locally.

cc @dbtsai since he is a release manager for 2.4.1.

dongjoon-hyun pushed a commit that referenced this pull request Mar 23, 2019
@dongjoon-hyun
Member

Also, I tested and landed this on branch-2.3 for 2.3.4.

kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 23, 2019
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 25, 2019
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Aug 1, 2019
holdenk pushed a commit to holdenk/spark that referenced this pull request Sep 12, 2019