Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-23523] [SQL] Fix the incorrect result caused by the rule OptimizeMetadataOnlyQuery #20684

Closed
wants to merge 3 commits into from

Conversation

gatorsmile
Copy link
Member

What changes were proposed in this pull request?

val tablePath = new File(s"${path.getCanonicalPath}/cOl3=c/cOl1=a/cOl5=e")
 Seq(("a", "b", "c", "d", "e")).toDF("cOl1", "cOl2", "cOl3", "cOl4", "cOl5")
 .write.json(tablePath.getCanonicalPath)
 val df = spark.read.json(path.getCanonicalPath).select("CoL1", "CoL5", "CoL3").distinct()
 df.show()

It generates a wrong result.

[c,e,a]

We have a bug in the rule OptimizeMetadataOnlyQuery . We should respect the attribute order in the original leaf node. This PR is to fix it.

How was this patch tested?

Added a test case

relation.output.filter(a => partColumns.contains(a.name.toLowerCase))
val attrMap = relation.output.map(_.name).zip(relation.output).toMap
partitionColumnNames.map { colName =>
attrMap.getOrElse(colName,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to consider the case sensitivity when comparing the names? cc @cloud-fan

@cloud-fan
Copy link
Contributor

good catch! LGTM

@SparkQA
Copy link

SparkQA commented Feb 27, 2018

Test build #87707 has finished for PR 20684 at commit 1bfaef8.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 27, 2018

Test build #87700 has finished for PR 20684 at commit ce702c7.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member

viirya commented Feb 27, 2018

retest this please.

Copy link
Contributor

@jiangxb1987 jiangxb1987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SparkQA
Copy link

SparkQA commented Feb 27, 2018

Test build #87708 has finished for PR 20684 at commit 1bfaef8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 27, 2018

Test build #87709 has finished for PR 20684 at commit 1bfaef8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Feb 27, 2018

Test build #87716 has finished for PR 20684 at commit 1bfaef8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in 414ee86 Feb 27, 2018
@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Feb 28, 2018

Hi, @gatorsmile and @cloud-fan .
Since 2.3 vote passed, can we have this in branch-2.3 for Apache Spark 2.3.1?

@gatorsmile
Copy link
Member Author

We are still waiting for the official announcement of Spark 2.3 release. This will be merged to 2.3.1 for sure.

@dongjoon-hyun
Copy link
Member

I see. Thank you for confirmation, @gatorsmile !

@dongjoon-hyun
Copy link
Member

Gentle ping, @gatorsmile since 2.3 is announced officially yesterday.

gatorsmile added a commit to gatorsmile/spark that referenced this pull request Mar 7, 2018
…zeMetadataOnlyQuery

## What changes were proposed in this pull request?
```Scala
val tablePath = new File(s"${path.getCanonicalPath}/cOl3=c/cOl1=a/cOl5=e")
 Seq(("a", "b", "c", "d", "e")).toDF("cOl1", "cOl2", "cOl3", "cOl4", "cOl5")
 .write.json(tablePath.getCanonicalPath)
 val df = spark.read.json(path.getCanonicalPath).select("CoL1", "CoL5", "CoL3").distinct()
 df.show()
```

It generates a wrong result.
```
[c,e,a]
```

We have a bug in the rule `OptimizeMetadataOnlyQuery `. We should respect the attribute order in the original leaf node. This PR is to fix it.

## How was this patch tested?
Added a test case

Author: gatorsmile <gatorsmile@gmail.com>

Closes apache#20684 from gatorsmile/optimizeMetadataOnly.
asfgit pushed a commit that referenced this pull request Mar 13, 2018
…he rule OptimizeMetadataOnlyQuery

This PR is to backport #20684 and #20693 to Spark 2.3 branch

---

## What changes were proposed in this pull request?
```Scala
val tablePath = new File(s"${path.getCanonicalPath}/cOl3=c/cOl1=a/cOl5=e")
 Seq(("a", "b", "c", "d", "e")).toDF("cOl1", "cOl2", "cOl3", "cOl4", "cOl5")
 .write.json(tablePath.getCanonicalPath)
 val df = spark.read.json(path.getCanonicalPath).select("CoL1", "CoL5", "CoL3").distinct()
 df.show()
```

It generates a wrong result.
```
[c,e,a]
```

We have a bug in the rule `OptimizeMetadataOnlyQuery `. We should respect the attribute order in the original leaf node. This PR is to fix it.

## How was this patch tested?
Added a test case

Author: Xingbo Jiang <xingbo.jiang@databricks.com>
Author: gatorsmile <gatorsmile@gmail.com>

Closes #20763 from gatorsmile/backport23523.
peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
…zeMetadataOnlyQuery

## What changes were proposed in this pull request?
```Scala
val tablePath = new File(s"${path.getCanonicalPath}/cOl3=c/cOl1=a/cOl5=e")
 Seq(("a", "b", "c", "d", "e")).toDF("cOl1", "cOl2", "cOl3", "cOl4", "cOl5")
 .write.json(tablePath.getCanonicalPath)
 val df = spark.read.json(path.getCanonicalPath).select("CoL1", "CoL5", "CoL3").distinct()
 df.show()
```

It generates a wrong result.
```
[c,e,a]
```

We have a bug in the rule `OptimizeMetadataOnlyQuery `. We should respect the attribute order in the original leaf node. This PR is to fix it.

## How was this patch tested?
Added a test case

Author: gatorsmile <gatorsmile@gmail.com>

Closes apache#20684 from gatorsmile/optimizeMetadataOnly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants