
[SPARK-19763][SQL] Qualified external datasource table location stored in catalog #17095

Closed
wants to merge 8 commits

Conversation

windpiger
Contributor

What changes were proposed in this pull request?

If we create an external datasource table with an unqualified location, we should qualify it before storing it in the catalog:

CREATE TABLE t(a string)
USING parquet
LOCATION '/path/xx'


CREATE TABLE t1(a string, b string)
USING parquet
PARTITIONED BY(b)
LOCATION '/path/xx'

When we read the table back from the catalog, the location should be qualified, e.g. 'file:/path/xx'.
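For illustration, qualifying a raw location with Hadoop's FileSystem API might look like the following sketch (an assumption for clarity, not the exact patch code):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Qualify an unqualified location such as "/path/xx" against the
// default filesystem, yielding e.g. "file:/path/xx".
val hadoopConf = new Configuration()
val location = new Path("/path/xx")
val qualified = location.getFileSystem(hadoopConf).makeQualified(location).toUri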

How was this patch tested?

Unit tests added.

@SparkQA

SparkQA commented Feb 28, 2017

Test build #73556 has finished for PR 17095 at commit 570ce24.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 28, 2017

Test build #73565 has finished for PR 17095 at commit 55c525e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 28, 2017

Test build #73582 has finished for PR 17095 at commit 18ec570.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 28, 2017

Test build #73591 has finished for PR 17095 at commit 22bef8b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@windpiger
Contributor Author

cc @cloud-fan

@@ -254,7 +254,18 @@ class SessionCatalog(
     val db = formatDatabaseName(tableDefinition.identifier.database.getOrElse(getCurrentDatabase))
     val table = formatTableName(tableDefinition.identifier.table)
     validateName(table)
-    val newTableDefinition = tableDefinition.copy(identifier = TableIdentifier(table, Some(db)))
+    val newTableDefinition = if (tableDefinition.storage.locationUri.isDefined) {
Contributor

would it be easier if locationUri is of type URI?

Contributor Author

windpiger commented Mar 1, 2017

A URI without a scheme is also legal, so this fix is needed even if it is a URI. That said, if it were a URI, we could qualify it when the URI is created.

Contributor

But why do we have to store the fully qualified path? What do we gain from this?

Contributor Author

If the location has no scheme such as hdfs or file, then when we restore it from the metastore we do not know which filesystem the table is stored on.
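(A hypothetical example of the ambiguity, assuming default Hadoop path resolution:)

import org.apache.hadoop.fs.Path

// "/path/xx" carries no scheme, so its meaning depends on fs.defaultFS:
//   fs.defaultFS = file:///        -> file:/path/xx
//   fs.defaultFS = hdfs://nn:8020  -> hdfs://nn:8020/path/xx
val location = new Path("/path/xx")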

Contributor

Shall we apply this to all locations, e.g. database locations and partition locations?

Contributor Author

Yes, this logic should be applied to all of them. Database locations already have it; shall I add the partition logic in another PR?
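(For context, a minimal sketch of the qualification step being discussed, assuming locationUri is an Option[String] and a hadoopConf is in scope — hypothetical, not the exact patch:)

val newTableDefinition = if (tableDefinition.storage.locationUri.isDefined) {
  // Qualify the user-supplied location before it is persisted in the catalog.
  val path = new Path(tableDefinition.storage.locationUri.get)
  val qualified = path.getFileSystem(hadoopConf).makeQualified(path).toUri.toString
  tableDefinition.copy(
    identifier = TableIdentifier(table, Some(db)),
    storage = tableDefinition.storage.copy(locationUri = Some(qualified)))
} else {
  tableDefinition.copy(identifier = TableIdentifier(table, Some(db)))
}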

@windpiger
Contributor Author

retest this please

@SparkQA

SparkQA commented Mar 2, 2017

Test build #73760 has finished for PR 17095 at commit 7e08045.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 2, 2017

Test build #73773 has finished for PR 17095 at commit 9932b03.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1843,10 +1843,12 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
           |OPTIONS(path "$dir")
         """.stripMargin)
       val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
-      assert(table.location == dir.getAbsolutePath)
+      val dirPath = new Path(dir.getAbsolutePath)
+      val fs = dirPath.getFileSystem(spark.sessionState.newHadoopConf())
Member

Can you create a helper function to avoid the duplicated code?

Contributor Author

OK, thanks~ I added the makeQualifiedPath helper in this PR.

After that PR is merged, I will fix the conflicts and make this change.
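(A rough sketch of what such a helper could look like — hypothetical signature; the actual helper in the PR may differ:)

import java.net.URI
import org.apache.hadoop.fs.Path

// Qualify a raw path string against the session's Hadoop configuration,
// e.g. "/path/xx" -> "file:/path/xx".
def makeQualifiedPath(path: String): URI = {
  val hadoopPath = new Path(path)
  val fs = hadoopPath.getFileSystem(spark.sessionState.newHadoopConf())
  fs.makeQualified(hadoopPath).toUri
}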

@@ -230,8 +230,8 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
   }

   private def getDBPath(dbName: String): URI = {
-    val warehousePath = s"file:${spark.sessionState.conf.warehousePath.stripPrefix("file:")}"
-    new Path(warehousePath, s"$dbName.db").toUri
+    val warehousePath = makeQualifiedPath(s"${spark.sessionState.conf.warehousePath}")
Contributor

Just write spark.sessionState.conf.warehousePath; no need to wrap it in s"".
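(I.e., the suggested form — same call, no interpolator:)

val warehousePath = makeQualifiedPath(spark.sessionState.conf.warehousePath)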

@cloud-fan
Contributor

thanks, merging to master!

@asfgit closed this in 274973d on Mar 9, 2017
@SparkQA

SparkQA commented Mar 9, 2017

Test build #74259 has finished for PR 17095 at commit c7e837c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@windpiger
Contributor Author

retest this please

@SparkQA

SparkQA commented Mar 9, 2017

Test build #74258 has finished for PR 17095 at commit 0919fea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public static class LongWrapper
  • public static class IntWrapper
  • case class ResolveInlineTables(conf: CatalystConf) extends Rule[LogicalPlan]
  • case class CostBasedJoinReorder(conf: CatalystConf) extends Rule[LogicalPlan] with PredicateHelper
  • case class JoinPlan(itemIds: Set[Int], plan: LogicalPlan, joinConds: Set[Expression], cost: Cost)
  • case class Cost(rows: BigInt, size: BigInt)
  • abstract class RepartitionOperation extends UnaryNode
  • case class FlatMapGroupsWithState(
  • class CSVOptions(
  • class UnivocityParser(
  • trait WatermarkSupport extends UnaryExecNode
  • case class FlatMapGroupsWithStateExec(

@kayousterhout
Contributor

kayousterhout commented Mar 17, 2017

I suspect that this PR is the cause of consistent failures in the maven build, in the HiveCatalogedDDLSuite unit test: https://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.sql.hive.execution.HiveCatalogedDDLSuite&test_name=create+temporary+view+using

Based on the error message (https://spark-tests.appspot.com/test-logs/408097945), it looks like the way the path is getting re-written (I think by the code in this PR) is causing Hadoop's path code to barf. The "create temporary view using" unit test is the only one in that suite that reads from a CSV file, which would explain why it's the only one failing. @windpiger or @cloud-fan, would one of you mind looking into this?

I filed a JIRA here: https://issues.apache.org/jira/browse/SPARK-19990

@kayousterhout
Contributor

kayousterhout commented Mar 17, 2017

Sounds like this was caused by a different PR (see the comment on the JIRA) and is now being fixed by @windpiger, so never mind here (and thanks @windpiger for looking into this!)
