[SPARK-19246][SQL]CataLogTable's partitionSchema order and exist check #16606

windpiger · 2017-01-16T15:05:15Z

What changes were proposed in this pull request?

CatalogTable's partitionSchema should check that each column name in partitionColumnNames matches one and only one field in schema; if it does not, we should throw an exception.

CatalogTable's partitionSchema should also keep the same order as partitionColumnNames.
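
To make the two problems in the description concrete, here is a minimal sketch of the filter-based behaviour the diff replaces. It does not use Spark: `Field` is a hypothetical stand-in for Spark's `StructField`, and the object and method names are invented for illustration only.

```scala
object FilterOrderSketch {
  // Hypothetical stand-in for a schema field; not Spark's StructField.
  final case class Field(name: String, dataType: String)

  // Same shape as the pre-patch code in the diff: filter the schema by
  // membership in partitionColumnNames.
  def partitionSchemaByFilter(
      schema: Seq[Field],
      partitionColumnNames: Seq[String]): Seq[Field] =
    schema.filter(f => partitionColumnNames.contains(f.name))

  val schema: Seq[Field] =
    Seq(Field("id", "long"), Field("p1", "int"), Field("p2", "int"))

  // Partition columns declared as (p2, p1), plus a name that is not in the schema.
  // The result follows schema order (p1, p2), and "missing" is silently ignored.
  val result: Seq[Field] = partitionSchemaByFilter(schema, Seq("p2", "p1", "missing"))
}
```

In this sketch the result is ordered by the schema rather than by the declared partition columns, and an unknown column name raises no error, which are the two cases the description calls out.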

How was this patch tested?

N/A

SparkQA · 2017-01-16T15:10:10Z

Test build #71451 has finished for PR 16606 at commit eaf18ce.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-01-16T16:44:23Z

Test build #71453 has finished for PR 16606 at commit 9296624.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-01-17T07:17:43Z

Test build #71483 has started for PR 16606 at commit 4260f84.

cloud-fan · 2017-01-17T07:50:06Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala

+   * keep the schema order with partitionColumnNames
+   */
+  def partitionSchema: StructType = StructType(partitionColumnNames.flatMap {
+    p => schema.filter(_.name == p)


nit: code style

xxx.map { p => xxx }

cloud-fan · 2017-01-17T07:52:23Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala

-    c => partitionColumnNames.contains(c.name)
+  /**
+   * schema of this table's partition columns
+   * keep the schema order with partitionColumnNames


Let's keep the previous doc comment; I think it's clear enough.

gatorsmile · 2017-01-17T07:56:41Z

sql/hive/src/test/scala/org/apache/spark/sql/hive/PartitionProviderCompatibilitySuite.scala

@@ -481,4 +481,27 @@ class PartitionProviderCompatibilitySuite
      assert(spark.sql("show partitions test").count() == 5)
    }
  }
+
+  test("saveAsTable with inconsistent columns order" +


Could you move it to PartitionedWriteSuite?

gatorsmile · 2017-01-17T08:32:20Z

retest this please

SparkQA · 2017-01-17T11:00:48Z

Test build #71494 has finished for PR 16606 at commit 4260f84.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2017-01-17T11:09:40Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala

@@ -184,8 +184,8 @@ case class CatalogTable(
  import CatalogTable._

  /** schema of this table's partition columns */
-  def partitionSchema: StructType = StructType(schema.filter {
-    c => partitionColumnNames.contains(c.name)
+  def partitionSchema: StructType = StructType(partitionColumnNames.flatMap { p =>


please use

xxx.map { p => schema.find(_.name == p).getOrElse { throw ... } }
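
A rough sketch of what that suggestion could look like when spelled out. The review comment leaves the thrown exception unspecified, so the `IllegalArgumentException` and its message here are assumptions, and `Field` is a hypothetical stand-in for Spark's `StructField`.

```scala
object FindOrThrowSketch {
  // Hypothetical stand-in for a schema field; not Spark's StructField.
  final case class Field(name: String, dataType: String)

  // Walk partitionColumnNames so the result follows their order, and fail
  // loudly when a name has no matching field. The exception type is assumed.
  def partitionSchema(
      schema: Seq[Field],
      partitionColumnNames: Seq[String]): Seq[Field] =
    partitionColumnNames.map { p =>
      schema.find(_.name == p).getOrElse {
        throw new IllegalArgumentException(s"Partition column '$p' not found in schema")
      }
    }
}
```

Note that `find` returns the first match, so this form catches a missing column but would not by itself detect two fields with the same name.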

SparkQA · 2017-01-17T11:17:45Z

Test build #71497 has finished for PR 16606 at commit 6f2816e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-01-17T19:59:22Z

Test build #71519 has finished for PR 16606 at commit c08e1c9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-01-18T02:19:46Z

Test build #71552 has finished for PR 16606 at commit 8cbee32.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-01-18T04:59:47Z

Test build #71554 has finished for PR 16606 at commit 79e2e3f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2017-01-18T05:20:27Z

sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala

@@ -138,6 +138,7 @@ case class CreateDataSourceTableAsSelectCommand(
    val tableIdentWithDB = table.identifier.copy(database = Some(db))
    val tableName = tableIdentWithDB.unquotedString

+    var tableWithSchema = table.copy(schema = query.output.toStructType)


shall we set the schema in AnalyzeCreateTable?

cloud-fan · 2017-01-18T05:21:02Z

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala

@@ -1374,4 +1377,47 @@ class HiveDDLSuite
      assert(e2.message.contains("Hive data source can only be used with tables"))
    }
  }
+
+  test("table partition schema should be ordered") {


table partition schema should respect the order of partition columns

cloud-fan · 2017-01-18T05:21:13Z

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala

+
+  test("table partition schema should be ordered") {
+    withTable("t", "t1") {
+      val path = Utils.createTempDir(namePrefix = "t")


use withTempDir
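
For readers unfamiliar with the pattern being requested, the following is a generic sketch of a loan-style temp-directory helper. It is not Spark's `withTempDir`; the object name, the `"sketch"` prefix, and the cleanup logic are assumptions made for illustration.

```scala
import java.io.File
import java.nio.file.Files

object TempDirSketch {
  // Recursively delete a file or directory tree.
  private def deleteRecursively(f: File): Unit = {
    if (f.isDirectory) {
      Option(f.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
    }
    f.delete()
    ()
  }

  // Create a temp directory, lend it to the body, and always clean up,
  // so the caller needs no try/finally of its own.
  def withTempDir[T](body: File => T): T = {
    val dir = Files.createTempDirectory("sketch").toFile
    try body(dir) finally deleteRecursively(dir)
  }
}
```

The point of the request is that the helper owns creation and cleanup, which removes the manual `createTempDir` calls and the `try` block from the test body.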

cloud-fan · 2017-01-18T05:23:23Z

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala

+      val path = Utils.createTempDir(namePrefix = "t")
+      val path1 = Utils.createTempDir(namePrefix = "t1")
+      try {
+        spark.sql(s"""


nit: code style, please follow existing code: https://github.com/apache/spark/pull/16606/files#diff-b7094baa12601424a5d19cb930e3402fR1255

cloud-fan · 2017-01-18T05:29:55Z

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala

+                     |create table t (id long, P1 int, P2 int)
+                     |using parquet
+                     |options (path "$path")
+                     |partitioned by (P1, P2)""".stripMargin)


This test can pass without your changes, right? I think we can just keep the one below.

cloud-fan · 2017-01-18T05:33:10Z

sql/core/src/test/scala/org/apache/spark/sql/sources/PartitionedWriteSuite.scala

@@ -92,6 +92,30 @@ class PartitionedWriteSuite extends QueryTest with SharedSQLContext {
    }
  }

+
+  test("saveAsTable with inconsistent columns order" +


does this test improve the test coverage?

SparkQA · 2017-01-18T07:49:54Z

Test build #71580 has started for PR 16606 at commit 5e60f14.

cloud-fan · 2017-01-23T15:51:16Z

how about we just add an assert? i.e.
assert(schema.takeRight(partitionColumnNames.length).map(_.name) == partitionColumnNames)

SparkQA · 2017-01-23T17:16:57Z

Test build #71853 has finished for PR 16606 at commit 206b232.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-01-24T02:39:12Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala

+
+    StructType(schema.filter { c =>
+      partitionColumnNames.contains(c.name)
+    })


This can be shortened to a single line.

SparkQA · 2017-01-24T02:55:09Z

Test build #71889 has finished for PR 16606 at commit 7e30cc7.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-01-24T03:12:17Z

LGTM pending test

cloud-fan · 2017-01-24T03:19:33Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala

+   * schema of this table's partition columns
+   */
+  def partitionSchema: StructType = {
+    assert(schema.takeRight(partitionColumnNames.length).map(_.name) == partitionColumnNames)


how about

val partitionFields = schema.takeRight(partitionColumnNames.length)
assert(partitionFields.map(_.name) == partitionColumnNames)
StructType(partitionFields)
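
A self-contained sketch of how this assert-based form behaves, written without Spark. `Field` is a hypothetical stand-in for `StructField`, the method returns a plain `Seq` instead of a `StructType`, and the assertion message is an addition for readability.

```scala
object TrailingPartitionSketch {
  // Hypothetical stand-in for a schema field; not Spark's StructField.
  final case class Field(name: String, dataType: String)

  // The assertion encodes the expectation that the last N fields of the
  // schema are the partition columns, in the same order as partitionColumnNames.
  def partitionSchema(
      schema: Seq[Field],
      partitionColumnNames: Seq[String]): Seq[Field] = {
    val partitionFields = schema.takeRight(partitionColumnNames.length)
    assert(
      partitionFields.map(_.name) == partitionColumnNames,
      "trailing schema fields do not match partitionColumnNames")
    partitionFields
  }
}
```

With a schema of (id, p1, p2), partition columns (p1, p2) pass the check, while (p2, p1) or an unknown name trips the assertion instead of producing a reordered or truncated result.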

SparkQA · 2017-01-24T05:22:58Z

Test build #71902 has finished for PR 16606 at commit 04d3940.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-01-24T05:43:42Z

Test build #71912 has started for PR 16606 at commit 72164eb.

cloud-fan · 2017-01-24T06:30:34Z

LGTM, pending tests

cloud-fan · 2017-01-24T08:11:33Z

retest this please

SparkQA · 2017-01-24T10:50:59Z

Test build #71922 has finished for PR 16606 at commit 72164eb.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2017-01-24T12:49:54Z

thanks, merging to master!

## What changes were proposed in this pull request?

CatalogTable's partitionSchema should check that each column name in partitionColumnNames matches one and only one field in schema; if it does not, we should throw an exception.

CatalogTable's partitionSchema should also keep the same order as partitionColumnNames.

## How was this patch tested?

N/A

Author: windpiger <songjun@outlook.com>

Closes apache#16606 from windpiger/checkPartionColNameWithSchema.

[SPARK-19246][SQL]CataLogTable's partitionSchema order and exist check

eaf18ce

fix a style

9296624

Merge branch 'master' into checkPartionColNameWithSchema

09d17bc

cloud-fan reviewed Jan 17, 2017

gatorsmile reviewed Jan 17, 2017

cloud-fan reviewed Jan 17, 2017

cloud-fan reviewed Jan 18, 2017

windpiger added 2 commits January 23, 2017 22:46

Merge branch 'master' into checkPartionColNameWithSchema

d620b86

merge with master and remove some code

206b232

windpiger force-pushed the checkPartionColNameWithSchema branch from 5e60f14 to 206b232 on January 23, 2017 14:49

optimize the code

7e30cc7

gatorsmile reviewed Jan 24, 2017

fix a code style

04d3940

cloud-fan reviewed Jan 24, 2017

optimize the code

72164eb

asfgit closed this in 752502b Jan 24, 2017
