[SPARK-23523] [SQL] Fix the incorrect result caused by the rule OptimizeMetadataOnlyQuery #20684
relation.output.filter(a => partColumns.contains(a.name.toLowerCase))
val attrMap = relation.output.map(_.name).zip(relation.output).toMap
partitionColumnNames.map { colName =>
  attrMap.getOrElse(colName,
Do we need to consider the case sensitivity when comparing the names? cc @cloud-fan
Good catch! LGTM
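To make the case-sensitivity question above concrete, here is a minimal sketch of a case-insensitive lookup that still returns attributes in the order of the partition column names. It is not the code merged in this PR: `Attr`, `PartitionAttrLookup`, and `resolve` are hypothetical stand-ins for illustration, and the sketch assumes case-insensitive resolution is what is wanted.

```scala
import java.util.Locale

// Hypothetical stand-in for a Catalyst attribute; not Spark's Attribute class.
final case class Attr(name: String, id: Int)

object PartitionAttrLookup {
  // Resolve each partition column name to an attribute, ignoring case.
  // The result follows the order of partitionColumnNames, not of output.
  def resolve(partitionColumnNames: Seq[String], output: Seq[Attr]): Seq[Attr] = {
    // Key the map by lower-cased name so "CoL1" and "cOl1" hit the same entry.
    val byLowerName = output.map(a => a.name.toLowerCase(Locale.ROOT) -> a).toMap
    partitionColumnNames.map { colName =>
      byLowerName.getOrElse(
        colName.toLowerCase(Locale.ROOT),
        throw new IllegalArgumentException(
          s"Unable to find column $colName among [${output.map(_.name).mkString(", ")}]"))
    }
  }
}
```

Lower-casing both sides with `Locale.ROOT` avoids locale-dependent surprises; whether the lookup should honor a case-sensitivity setting instead is a separate design decision that this sketch does not address.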
Test build #87707 has finished for PR 20684 at commit
Test build #87700 has finished for PR 20684 at commit
Retest this please.
LGTM
Test build #87708 has finished for PR 20684 at commit
Test build #87709 has finished for PR 20684 at commit
Retest this please.
Test build #87716 has finished for PR 20684 at commit
Hi, @gatorsmile and @cloud-fan.
We are still waiting for the official announcement of the Spark 2.3 release. This will be merged to 2.3.1 for sure.
I see. Thank you for the confirmation, @gatorsmile!
Gentle ping, @gatorsmile, since 2.3 was officially announced yesterday.
…zeMetadataOnlyQuery

## What changes were proposed in this pull request?

```Scala
val tablePath = new File(s"${path.getCanonicalPath}/cOl3=c/cOl1=a/cOl5=e")
Seq(("a", "b", "c", "d", "e")).toDF("cOl1", "cOl2", "cOl3", "cOl4", "cOl5")
  .write.json(tablePath.getCanonicalPath)
val df = spark.read.json(path.getCanonicalPath).select("CoL1", "CoL5", "CoL3").distinct()
df.show()
```

It generates a wrong result.

```
[c,e,a]
```

We have a bug in the rule `OptimizeMetadataOnlyQuery`. We should respect the attribute order in the original leaf node. This PR is to fix it.

## How was this patch tested?

Added a test case

Author: gatorsmile <gatorsmile@gmail.com>

Closes apache#20684 from gatorsmile/optimizeMetadataOnly.
…he rule OptimizeMetadataOnlyQuery

This PR is to backport #20684 and #20693 to Spark 2.3 branch

---

## What changes were proposed in this pull request?

```Scala
val tablePath = new File(s"${path.getCanonicalPath}/cOl3=c/cOl1=a/cOl5=e")
Seq(("a", "b", "c", "d", "e")).toDF("cOl1", "cOl2", "cOl3", "cOl4", "cOl5")
  .write.json(tablePath.getCanonicalPath)
val df = spark.read.json(path.getCanonicalPath).select("CoL1", "CoL5", "CoL3").distinct()
df.show()
```

It generates a wrong result.

```
[c,e,a]
```

We have a bug in the rule `OptimizeMetadataOnlyQuery`. We should respect the attribute order in the original leaf node. This PR is to fix it.

## How was this patch tested?

Added a test case

Author: Xingbo Jiang <xingbo.jiang@databricks.com>
Author: gatorsmile <gatorsmile@gmail.com>

Closes #20763 from gatorsmile/backport23523.
## What changes were proposed in this pull request?

It generates a wrong result.

We have a bug in the rule `OptimizeMetadataOnlyQuery`. We should respect the attribute order in the original leaf node. This PR is to fix it.

## How was this patch tested?
Added a test case
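
The attribute-order problem can be illustrated without Spark. The sketch below is one plausible reading of the bug, not the rule's actual code: it assumes the partition values arrive in directory order (`cOl3`, `cOl1`, `cOl5`) while the relation's output lists columns in a different order. `AttributeOrderSketch`, `relationOutput`, and `select` are hypothetical names, and the output order shown is an assumption.

```scala
object AttributeOrderSketch {
  // Partition columns and values in directory order: .../cOl3=c/cOl1=a/cOl5=e
  val partitionColumnNames: Seq[String] = Seq("cOl3", "cOl1", "cOl5")
  val partitionValues: Seq[String] = Seq("c", "a", "e")

  // Assumed output order of the relation (hypothetical, for illustration only).
  val relationOutput: Seq[String] = Seq("cOl1", "cOl2", "cOl3", "cOl4", "cOl5")

  // Filtering the output keeps the output's order: cOl1, cOl3, cOl5.
  val filtered: Seq[String] =
    relationOutput.filter(a => partitionColumnNames.contains(a))

  // Pair attribute names with partition values positionally, then project cols.
  def select(attrs: Seq[String], cols: Seq[String]): Seq[String] = {
    val row = attrs.zip(partitionValues).toMap
    cols.map(row)
  }

  // Attributes in output order are paired with the wrong values.
  val wrong: Seq[String] = select(filtered, Seq("cOl1", "cOl5", "cOl3"))
  // Attributes in partition order are paired with the right values.
  val right: Seq[String] = select(partitionColumnNames, Seq("cOl1", "cOl5", "cOl3"))
}
```

Under these assumptions, `wrong` is `c, e, a`, which matches the incorrect result reported above, and `right` is `a, e, c`.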