Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-19246][SQL]CataLogTable's partitionSchema order and exist check #16606

Closed

Conversation

windpiger
Copy link
Contributor

@windpiger windpiger commented Jan 16, 2017

What changes were proposed in this pull request?

CataLogTable's partitionSchema should check if each column name in partitionColumnNames must match one and only one field in schema, if not we should throw an exception

and CataLogTable's partitionSchema should keep order with partitionColumnNames

How was this patch tested?

N/A

@SparkQA
Copy link

SparkQA commented Jan 16, 2017

Test build #71451 has finished for PR 16606 at commit eaf18ce.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 16, 2017

Test build #71453 has finished for PR 16606 at commit 9296624.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 17, 2017

Test build #71483 has started for PR 16606 at commit 4260f84.

* keep the schema order with partitionColumnNames
*/
def partitionSchema: StructType = StructType(partitionColumnNames.flatMap {
p => schema.filter(_.name == p)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: code style

xxx.map { p =>
  xxx
}

c => partitionColumnNames.contains(c.name)
/**
* schema of this table's partition columns
* keep the schema order with partitionColumnNames
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's keep the previous document, I think it's clear enough.

@@ -481,4 +481,27 @@ class PartitionProviderCompatibilitySuite
assert(spark.sql("show partitions test").count() == 5)
}
}

test("saveAsTable with inconsistent columns order" +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move it to PartitionedWriteSuite?

@gatorsmile
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Jan 17, 2017

Test build #71494 has finished for PR 16606 at commit 4260f84.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -184,8 +184,8 @@ case class CatalogTable(
import CatalogTable._

/** schema of this table's partition columns */
def partitionSchema: StructType = StructType(schema.filter {
c => partitionColumnNames.contains(c.name)
def partitionSchema: StructType = StructType(partitionColumnNames.flatMap { p =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use

xxx.map { p =>
  schema.find(_.name == p).getOrElse {
    throw ...
  }
}

@SparkQA
Copy link

SparkQA commented Jan 17, 2017

Test build #71497 has finished for PR 16606 at commit 6f2816e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 17, 2017

Test build #71519 has finished for PR 16606 at commit c08e1c9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 18, 2017

Test build #71552 has finished for PR 16606 at commit 8cbee32.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 18, 2017

Test build #71554 has finished for PR 16606 at commit 79e2e3f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -138,6 +138,7 @@ case class CreateDataSourceTableAsSelectCommand(
val tableIdentWithDB = table.identifier.copy(database = Some(db))
val tableName = tableIdentWithDB.unquotedString

var tableWithSchema = table.copy(schema = query.output.toStructType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we set the schema in AnalyzeCreateTable?

@@ -1374,4 +1377,47 @@ class HiveDDLSuite
assert(e2.message.contains("Hive data source can only be used with tables"))
}
}

test("table partition schema should be ordered") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

table partition schema should respect the order of partition columns


test("table partition schema should be ordered") {
withTable("t", "t1") {
val path = Utils.createTempDir(namePrefix = "t")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use withTempDir

val path = Utils.createTempDir(namePrefix = "t")
val path1 = Utils.createTempDir(namePrefix = "t1")
try {
spark.sql(s"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

|create table t (id long, P1 int, P2 int)
|using parquet
|options (path "$path")
|partitioned by (P1, P2)""".stripMargin)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test can pass without your changes right? I think we can just keep the below one.

@@ -92,6 +92,30 @@ class PartitionedWriteSuite extends QueryTest with SharedSQLContext {
}
}


test("saveAsTable with inconsistent columns order" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this test improve the test coverage?

@SparkQA
Copy link

SparkQA commented Jan 18, 2017

Test build #71580 has started for PR 16606 at commit 5e60f14.

@windpiger windpiger force-pushed the checkPartionColNameWithSchema branch from 5e60f14 to 206b232 Compare January 23, 2017 14:49
@cloud-fan
Copy link
Contributor

cloud-fan commented Jan 23, 2017

how about we just add an assert? i.e.
assert(schema.takeRight(partitionColumnNames.length).map(_.name) == partitionColumnNames)

@SparkQA
Copy link

SparkQA commented Jan 23, 2017

Test build #71853 has finished for PR 16606 at commit 206b232.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


StructType(schema.filter { c =>
partitionColumnNames.contains(c.name)
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be shorten to a single line.

@SparkQA
Copy link

SparkQA commented Jan 24, 2017

Test build #71889 has finished for PR 16606 at commit 7e30cc7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

LGTM pending test

* schema of this table's partition columns
*/
def partitionSchema: StructType = {
assert(schema.takeRight(partitionColumnNames.length).map(_.name) == partitionColumnNames)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about

val partitionFields = schema.takeRight(partitionColumnNames.length)
assert(partitionFields.map(_.name) == partitionColumnNames)
StructType(partitionFields)

@SparkQA
Copy link

SparkQA commented Jan 24, 2017

Test build #71902 has finished for PR 16606 at commit 04d3940.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 24, 2017

Test build #71912 has started for PR 16606 at commit 72164eb.

@cloud-fan
Copy link
Contributor

LGTM, pending tests

@cloud-fan
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Jan 24, 2017

Test build #71922 has finished for PR 16606 at commit 72164eb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@asfgit asfgit closed this in 752502b Jan 24, 2017
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
## What changes were proposed in this pull request?

CataLogTable's partitionSchema should check if each column name in partitionColumnNames must match one and only one field in schema, if not we should throw an exception

and CataLogTable's partitionSchema should keep order with partitionColumnNames

## How was this patch tested?
N/A

Author: windpiger <songjun@outlook.com>

Closes apache#16606 from windpiger/checkPartionColNameWithSchema.
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
## What changes were proposed in this pull request?

CataLogTable's partitionSchema should check if each column name in partitionColumnNames must match one and only one field in schema, if not we should throw an exception

and CataLogTable's partitionSchema should keep order with partitionColumnNames

## How was this patch tested?
N/A

Author: windpiger <songjun@outlook.com>

Closes apache#16606 from windpiger/checkPartionColNameWithSchema.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants