[SPARK-21330][SQL] Bad partitioning does not allow to read a JDBC table with extreme values on the partition column #18800

Closed
wants to merge 2 commits into from


aray (Contributor) commented Aug 1, 2017

## What changes were proposed in this pull request?

An overflow in the difference between the upper and lower bounds of the partitioning column (for example, bounds near `Long.MinValue` and `Long.MaxValue`) leads to no data being read. This patch checks for this overflow.
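To make the overflow concrete: with bounds at the extremes of `Long`, the true value of `upperBound - lowerBound` is 2^64 - 1, which does not fit in a signed 64-bit `Long` and wraps to -1. The minimal sketch below (the object and method names are hypothetical, not Spark's) contrasts plain subtraction with an overflow-checked alternative based on `java.lang.Math.subtractExact`. Note that `subtractExact` is shown only as one way to detect the overflow; the patch itself uses a sign check instead.

```scala
// Hypothetical helper names; this only illustrates 64-bit Long arithmetic.
object BoundsOverflow {
  /** Plain subtraction, as in `upperBound - lowerBound`: wraps silently on overflow. */
  def wrappingDiff(lower: Long, upper: Long): Long = upper - lower

  /** Checked subtraction: None when the true difference does not fit in a Long. */
  def checkedDiff(lower: Long, upper: Long): Option[Long] =
    try Some(java.lang.Math.subtractExact(upper, lower))
    catch { case _: ArithmeticException => None }

  def main(args: Array[String]): Unit = {
    // True difference is 2^64 - 1, which wraps to -1 as a signed Long.
    println(wrappingDiff(Long.MinValue, Long.MaxValue)) // prints -1
    println(checkedDiff(Long.MinValue, Long.MaxValue))  // prints None
  }
}
```

For ordinary bounds both helpers agree, e.g. `checkedDiff(0L, 100L)` is `Some(100)`.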

## How was this patch tested?

New unit test.

```diff
@@ -64,7 +64,8 @@ private[sql] object JDBCRelation extends Logging {
       s"bound. Lower bound: $lowerBound; Upper bound: $upperBound")
 
     val numPartitions =
-      if ((upperBound - lowerBound) >= partitioning.numPartitions) {
+      if ((upperBound - lowerBound) >= partitioning.numPartitions ||
+        (upperBound - lowerBound) < 0) {
```
Member (inline review comment):

Looks good. For bonus points, add a comment about what this is for, and indent this line two more spaces.
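The before/after condition in the diff can be modeled outside Spark as follows. This is a hypothetical, simplified sketch: `PartitionCount`, `before`, and `after` are illustrative names, and the `else` branch (falling back to `upperBound - lowerBound` partitions) is an assumption about code the hunk does not show. Assuming, as the error message at the top of the hunk suggests, that the lower bound has already been checked not to exceed the upper bound, the true difference is non-negative, so a negative `Long` result can only come from overflow. That is why `< 0` is a sufficient overflow test here.

```scala
// Hypothetical, simplified model of the numPartitions decision in the diff.
// ASSUMPTION: the `else` branch falls back to the bounds difference; the hunk
// does not show that branch.
object PartitionCount {
  /** Condition before the patch. */
  def before(lower: Long, upper: Long, requested: Int): Long =
    if ((upper - lower) >= requested) requested.toLong
    else upper - lower

  /** Condition after the patch. */
  def after(lower: Long, upper: Long, requested: Int): Long =
    if ((upper - lower) >= requested ||
        // With lower <= upper the true difference is >= 0, so a negative
        // result means `upper - lower` overflowed Long.
        (upper - lower) < 0) {
      requested.toLong
    } else {
      upper - lower
    }

  def main(args: Array[String]): Unit = {
    println(before(Long.MinValue, Long.MaxValue, 3)) // -1: negative partition count
    println(after(Long.MinValue, Long.MaxValue, 3))  // 3: requested count kept
    println(after(0L, 2L, 3))                        // 2: fewer values than partitions
  }
}
```

Under this model, the pre-patch count for the extreme bounds is negative, while the patched condition keeps the requested count and leaves the non-overflowing cases unchanged.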

```scala
val df = sql("SELECT * FROM partsoverflow")
checkNumPartitions(df, expectedNumPartitions = 3)
assert(df.collect().length == 3)

```
Member (inline review comment):

And maybe delete this blank line for tidiness.

SparkQA commented Aug 1, 2017

Test build #80130 has finished for PR 18800 at commit 7de8ccc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 1, 2017

Test build #80131 has finished for PR 18800 at commit 9587bf1.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 2, 2017

Test build #3867 has finished for PR 18800 at commit 9587bf1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Aug 4, 2017
…le with extreme values on the partition column

## What changes were proposed in this pull request?

An overflow of the difference of bounds on the partitioning column leads to no data being read. This
patch checks for this overflow.

## How was this patch tested?

New unit test.

Author: Andrew Ray <ray.andrew@gmail.com>

Closes #18800 from aray/SPARK-21330.

(cherry picked from commit 25826c7)
Signed-off-by: Sean Owen <sowen@cloudera.com>
asfgit pushed a commit that referenced this pull request Aug 4, 2017
srowen (Member) commented Aug 4, 2017

Merged to master/2.2/2.1

asfgit closed this in 25826c7 Aug 4, 2017
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
jzhuge pushed a commit to jzhuge/spark that referenced this pull request Aug 20, 2018