[SPARK-21085] [SQL] Failed to read the partitioned table created by Spark 2.1 #18295

gatorsmile · 2017-06-14T05:20:57Z

What changes were proposed in this pull request?

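Before this PR, Spark was unable to read a partitioned table created by Spark 2.1 when the table schema does not put the partitioning columns at the end of the schema. Reading such a table fails at the following assertion in sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala: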
assert(partitionFields.map(_.name) == partitionColumnNames)
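The failure mode can be reproduced without Spark. The sketch below assumes that the partition schema is taken as the last N fields of the table schema, as the assertion above implies. `Field` and `partitionSchema` are hypothetical stand-ins for Spark's `StructField` and catalog code, used only for illustration:

```scala
// Hypothetical stand-in for Spark's StructField: only the column name matters here.
final case class Field(name: String, dataType: String)

// Models the check above: the partition schema is the last N fields of the
// table schema, and their names must equal the declared partition column
// names, in the same order.
def partitionSchema(schema: Seq[Field], partitionColumnNames: Seq[String]): Seq[Field] = {
  val partitionFields = schema.takeRight(partitionColumnNames.length)
  assert(partitionFields.map(_.name) == partitionColumnNames,
    s"partition columns $partitionColumnNames are not at the end of ${schema.map(_.name)}")
  partitionFields
}

// A schema with the partition column `p` in the middle.
val schema = Seq(Field("a", "int"), Field("p", "string"), Field("b", "int"))

// takeRight(1) yields `b`, not `p`, so the assertion fails.
val failed = scala.util.Try(partitionSchema(schema, Seq("p"))).isFailure
println(failed)
```

`scala.util.Try` is used only to show that the check throws; in Spark the `AssertionError` propagates to the caller.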

When reading the table metadata from the metastore, we also need to reorder the columns so that the partitioning columns come last.
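A minimal sketch of that reordering, assuming columns are matched by exact name and that partition columns should keep the order in which they are declared. `Field` and `reorderSchema` are hypothetical names chosen for this example, not necessarily what the patch uses:

```scala
// Hypothetical stand-in for Spark's StructField.
final case class Field(name: String, dataType: String)

// Moves the partition columns to the end of the schema, in the order given by
// partColumnNames, and keeps the data columns in their original relative order.
def reorderSchema(schema: Seq[Field], partColumnNames: Seq[String]): Seq[Field] = {
  val partitionFields = partColumnNames.map { col =>
    schema.find(_.name == col).getOrElse(
      throw new IllegalArgumentException(
        s"Partition column $col not found in schema ${schema.map(_.name).mkString("[", ", ", "]")}"))
  }
  schema.filterNot(f => partColumnNames.contains(f.name)) ++ partitionFields
}

// Schema as it might be stored for a table whose partition column is not last.
val stored = Seq(Field("a", "int"), Field("p", "string"), Field("b", "int"))
val reordered = reorderSchema(stored, Seq("p"))
println(reordered.map(_.name)) // List(a, b, p)
```

After reordering, the last N fields are exactly the partition columns, so a `takeRight`-based check like the assertion above passes. Failing fast on a missing partition column is a design choice of this sketch; it surfaces corrupted metadata instead of silently producing a wrong partition schema.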

How was this patch tested?

Added test cases to check both Hive-serde and data source tables.

gatorsmile · 2017-06-14T05:21:09Z

cc @cloud-fan

SparkQA · 2017-06-14T06:55:01Z

Test build #78028 has finished for PR 18295 at commit 719ecf2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2017-06-14T07:25:31Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala

-          schema = schemaFromTableProps,
-          partitionColumnNames = getPartitionColumnsFromTableProperties(table),
+          schema = reorderedSchema,
+          partitionColumnNames = partColumnNames,
          bucketSpec = getBucketSpecFromTableProperties(table))
      } else {


For the case below, where the Hive metastore changes the table schema, will that assertion always pass?

cloud-fan · 2017-06-14T08:28:34Z

LGTM, merging to master/2.2!

…ark 2.1

### What changes were proposed in this pull request?
Before the PR, Spark is unable to read the partitioned table created by Spark 2.1 when the table schema does not put the partitioning column at the end of the schema.
[assert(partitionFields.map(_.name) == partitionColumnNames)](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala#L234-L236)
When reading the table metadata from the metastore, we also need to reorder the columns.

### How was this patch tested?
Added test cases to check both Hive-serde and data source tables.

Author: gatorsmile <gatorsmile@gmail.com>

Closes #18295 from gatorsmile/reorderReadSchema.

(cherry picked from commit 0c88e8d)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>


fix.

719ecf2

viirya reviewed Jun 14, 2017


asfgit closed this in 0c88e8d Jun 14, 2017
