New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-32788][SQL] non-partitioned table scan should not have partition filter #29637
Conversation
@@ -439,6 +435,9 @@ case class FileSourceScanExec( | |||
driverMetrics("staticFilesNum") = filesNum | |||
driverMetrics("staticFilesSize") = filesSize | |||
} | |||
if (relation.partitionSchemaOption.isDefined) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is not needed, but I keep it here for: 1. avoid duplicated code. 2. make it super safe that we only update the numPartitions
if table is partitioned.
test("SPARK-32788: non-partitioned table scan should not have partition filter") { | ||
withTable("t") { | ||
spark.range(1).write.saveAsTable("t") | ||
checkAnswer(sql("SELECT id FROM t WHERE (SELECT true)"), Row(0L)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a bit surprised that this query failed... nice catch.
ExpressionSet(normalizedFilters | ||
.filter(_.references.subsetOf(partitionSet))) | ||
} | ||
|
||
logInfo(s"Pruning directories with: ${partitionKeyFilters.mkString(",")}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Is this logging still meaningful for the non-partition case? I mean, could we move it inside the else
block in the line 159-162?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense
LGTM |
Test build #128253 has finished for PR 29637 at commit
|
…on filter ### What changes were proposed in this pull request? This PR fixes a bug `FileSourceStrategy`, which generates partition filters even if the table is not partitioned. This can confuse `FileSourceScanExec`, which mistakenly think the table is partitioned and tries to update the `numPartitions` metrics, and cause a failure. We should not generate partition filters for non-partitioned table. ### Why are the changes needed? The bug was exposed by #29436. ### Does this PR introduce _any_ user-facing change? Yes, fix a bug. ### How was this patch tested? new test Closes #29637 from cloud-fan/refactor. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Yuming Wang <yumwang@ebay.com> (cherry picked from commit 76330e0) Signed-off-by: Yuming Wang <yumwang@ebay.com>
Merged to master and branch-3.0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm too
Test build #128259 has finished for PR 29637 at commit
|
…on filter ### What changes were proposed in this pull request? This PR fixes a bug `FileSourceStrategy`, which generates partition filters even if the table is not partitioned. This can confuse `FileSourceScanExec`, which mistakenly think the table is partitioned and tries to update the `numPartitions` metrics, and cause a failure. We should not generate partition filters for non-partitioned table. ### Why are the changes needed? The bug was exposed by apache#29436. ### Does this PR introduce _any_ user-facing change? Yes, fix a bug. ### How was this patch tested? new test Closes apache#29637 from cloud-fan/refactor. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Yuming Wang <yumwang@ebay.com> (cherry picked from commit 76330e0) Signed-off-by: Yuming Wang <yumwang@ebay.com>
What changes were proposed in this pull request?
This PR fixes a bug
FileSourceStrategy
, which generates partition filters even if the table is not partitioned. This can confuseFileSourceScanExec
, which mistakenly think the table is partitioned and tries to update thenumPartitions
metrics, and cause a failure. We should not generate partition filters for non-partitioned table.Why are the changes needed?
The bug was exposed by #29436.
Does this PR introduce any user-facing change?
Yes, fix a bug.
How was this patch tested?
new test