[SPARK-19917][SQL]qualified partition path stored in catalog #17254

Closed
wants to merge 8 commits into from

Conversation

windpiger
Contributor

@windpiger windpiger commented Mar 11, 2017

What changes were proposed in this pull request?

The partition path should be qualified before it is stored in the catalog.
There are four cases:

  1. ALTER TABLE t PARTITION(b=1) SET LOCATION '/path/x'
    should be qualified: file:/path/x
    Hive 2.0.0 does not support a location without a scheme here:
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. {0}  is not absolute or has no scheme information.  Please specify a complete absolute uri with scheme information.
  2. ALTER TABLE t PARTITION(b=1) SET LOCATION 'x'
    should be qualified: file:/tablelocation/x
    Hive 2.0.0 does not support a relative location here.
  3. ALTER TABLE t ADD PARTITION(b=1) LOCATION '/path/x'
    should be qualified: file:/path/x
    the same as Hive 2.0.0
  4. ALTER TABLE t ADD PARTITION(b=1) LOCATION 'x'
    should be qualified: file:/tablelocation/x
    the same as Hive 2.0.0

Currently, only ALTER TABLE t ADD PARTITION(b=1) LOCATION for a Hive serde table stores the expected qualified path. We should make the other cases consistent with it.
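
For illustration, the expected results listed above can be sketched with plain `java.net.URI`. This is not the code in this patch (the patch calls `makeQualifiedPath`, as the diff hunks show); `QualifySketch` and `qualify` are hypothetical names, and the sketch assumes the table location is already a qualified, hierarchical URI such as `file:/tablelocation`.

```scala
import java.net.URI

// Hypothetical helper for illustration only; not part of Spark.
object QualifySketch {
  /** Qualifies a partition location against an already-qualified table location. */
  def qualify(location: String, tableLocation: URI): URI = {
    val uri = new URI(location)
    if (uri.isAbsolute) {
      // The location already carries a scheme (e.g. hdfs://...), so keep it as-is.
      uri
    } else {
      // Make sure the base ends with "/" so that a relative child is appended
      // to the table directory instead of replacing its last path segment.
      val base =
        if (tableLocation.toString.endsWith("/")) tableLocation
        else new URI(tableLocation.toString + "/")
      // "/path/x" inherits only the scheme; "x" is placed under the table location.
      base.resolve(uri)
    }
  }
}
```

Under these assumptions, `qualify("/path/x", new URI("file:/tablelocation"))` yields `file:/path/x`, and `qualify("x", new URI("file:/tablelocation"))` yields `file:/tablelocation/x`, matching the cases listed above.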

Another change applies the same qualification to the location set by ALTER TABLE SET LOCATION.

How was this patch tested?

Added new test cases and modified existing ones.

@@ -1201,7 +1202,9 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
verifyLocation(new URI("/swanky/steak/place"))
// set table partition location without explicitly specifying database
sql("ALTER TABLE tab1 PARTITION (a='1', b='2') SET LOCATION 'vienna'")
verifyLocation(new URI("vienna"), Some(partSpec))
val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("tab1"))
Contributor Author

@windpiger windpiger Mar 11, 2017

ALTER TABLE PARTITION SET LOCATION
A relative location will be qualified against the table location as its parent path.

assert(tableLocation.isDefined)
makeQualifiedPath(new Path(tableLocation.get.toString, "paris").toString)
} else {
new URI("paris")
Contributor Author

@windpiger windpiger Mar 11, 2017

ALTER TABLE ADD PARTITION LOCATION
A relative location will be qualified against the table location as its parent path.

InMemoryCatalog now behaves the same as HiveExternalCatalog.

@@ -2180,6 +2181,13 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {

withTempDir { dir =>
assert(!dir.getAbsolutePath.startsWith("file:/"))
spark.sql(s"ALTER TABLE t SET LOCATION '$dir'")
Contributor Author

ALTER TABLE SET LOCATION
should also be qualified

@SparkQA

SparkQA commented Mar 11, 2017

Test build #74373 has finished for PR 17254 at commit c0dc3b7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 11, 2017

Test build #74377 has started for PR 17254 at commit 191d8a1.

@SparkQA

SparkQA commented Mar 11, 2017

Test build #74378 has started for PR 17254 at commit 36a3463.

@windpiger
Contributor Author

retest this please

@SparkQA

SparkQA commented Mar 11, 2017

Test build #74384 has finished for PR 17254 at commit 36a3463.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@windpiger
Contributor Author

cc @cloud-fan @gatorsmile

@SparkQA

SparkQA commented Sep 6, 2017

Test build #81437 has finished for PR 17254 at commit 36a3463.

  • This patch fails PySpark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 24, 2018

Test build #95195 has started for PR 17254 at commit 36a3463.

remove empty line

remove empty line

fix test failed

fix test failed
@SparkQA

SparkQA commented Sep 17, 2019

Test build #110779 has finished for PR 17254 at commit f59d65f.

  • This patch fails R style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@windpiger
Contributor Author

retest this please

@SparkQA

SparkQA commented Sep 17, 2019

Test build #110782 has finished for PR 17254 at commit f61af0a.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 17, 2019

Test build #110783 has finished for PR 17254 at commit c028c75.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

&& !tableDefinition.storage.locationUri.get.isAbsolute) {
// make the location of the table qualified.
val qualifiedTableLocation =
makeQualifiedPath(tableDefinition.storage.locationUri.get)
Contributor

nit: 2 space indentation

@@ -942,7 +954,8 @@ class SessionCatalog(
requireTableExists(TableIdentifier(table, Option(db)))
requireExactMatchedPartitionSpec(parts.map(_.spec), getTableMetadata(tableName))
requireNonEmptyValueInPartitionSpec(parts.map(_.spec))
externalCatalog.alterPartitions(db, table, parts)

Contributor

unnecessary blank line

@cloud-fan
Contributor

Currently only ALTER TABLE t ADD PARTITION(b=1) LOCATION for hive serde table has the expected qualified path.

Where is this done?

@SparkQA

SparkQA commented Sep 18, 2019

Test build #110847 has finished for PR 17254 at commit fadc15c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

parts: Seq[CatalogTablePartition]): Seq[CatalogTablePartition] = {
parts.map { part =>
if (part.storage.locationUri.isDefined && !part.storage.locationUri.get.isAbsolute) {
val tbl = getTableMetadata(tableIdentifier)
Contributor

we should get table metadata only once.

Contributor

we can mark it as lazy val in case we don't need to qualify any path.
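
A minimal sketch of how the two suggestions above might combine, assuming a stand-in partition type and a caller-supplied lookup function. `LazyLookupSketch`, `Part`, and `lookupTableLocation` are hypothetical and only mimic the shape of the reviewed code; they are not Spark's classes or signatures.

```scala
import java.net.URI

// Hypothetical stand-ins for illustration only; not Spark's catalog types.
object LazyLookupSketch {
  final case class Part(location: Option[URI])

  /**
   * Qualifies every relative partition location against the table location.
   * The (potentially expensive) table lookup runs at most once, and not at
   * all when no partition needs qualifying.
   */
  def qualifyAll(parts: Seq[Part], lookupTableLocation: () => URI): Seq[Part] = {
    // lazy val: evaluated on first use, then cached for later partitions.
    lazy val tableLocation: URI = lookupTableLocation()
    parts.map { part =>
      part.location match {
        case Some(uri) if !uri.isAbsolute =>
          part.copy(location = Some(tableLocation.resolve(uri)))
        case _ =>
          part // already qualified, or no explicit location
      }
    }
  }
}
```

Hoisting the lookup out of the `map` closure addresses the first comment, and `lazy val` addresses the second: if every partition already has an absolute location, the lookup is never triggered.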

@SparkQA

SparkQA commented Sep 23, 2019

Test build #111219 has finished for PR 17254 at commit e168b34.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 23, 2019

Test build #111222 has finished for PR 17254 at commit f8ab46a.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 24, 2019

Test build #111248 has finished for PR 17254 at commit 20e0168.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan cloud-fan closed this in da7e5c4 Sep 24, 2019
@cloud-fan
Contributor

thanks, merging to master!
