
[SPARK-19763][SQL] Qualified external datasource table location stored in catalog #17095

Closed
wants to merge 8 commits

Conversation

windpiger
Contributor

What changes were proposed in this pull request?

If we create an external datasource table with an unqualified location, we should qualify it before storing it in the catalog:

CREATE TABLE t(a string)
USING parquet
LOCATION '/path/xx'


CREATE TABLE t1(a string, b string)
USING parquet
PARTITIONED BY(b)
LOCATION '/path/xx'

When we read the table back from the catalog, the location should be qualified, e.g. 'file:/path/xx'.
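For illustration, qualifying a raw location with Hadoop's FileSystem API might look like the following sketch (an assumption for clarity, not the exact patch code):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Qualify an unqualified location such as "/path/xx" against the
// default filesystem, yielding e.g. "file:/path/xx".
val hadoopConf = new Configuration()
val location = new Path("/path/xx")
val qualified = location.getFileSystem(hadoopConf).makeQualified(location).toUri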

How was this patch tested?

Unit tests added.

@SparkQA

SparkQA commented Feb 28, 2017

Test build #73556 has finished for PR 17095 at commit 570ce24.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 28, 2017

Test build #73565 has finished for PR 17095 at commit 55c525e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 28, 2017

Test build #73582 has finished for PR 17095 at commit 18ec570.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 28, 2017

Test build #73591 has finished for PR 17095 at commit 22bef8b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@windpiger
Contributor Author

cc @cloud-fan

@@ -254,7 +254,18 @@ class SessionCatalog(
     val db = formatDatabaseName(tableDefinition.identifier.database.getOrElse(getCurrentDatabase))
     val table = formatTableName(tableDefinition.identifier.table)
     validateName(table)
-    val newTableDefinition = tableDefinition.copy(identifier = TableIdentifier(table, Some(db)))
+    val newTableDefinition = if (tableDefinition.storage.locationUri.isDefined) {
Contributor

would it be easier if locationUri is of type URI?

Contributor Author

windpiger commented Mar 1, 2017

A URI without a scheme is also legal, so this fix is needed even if it is a URI. That said, if it were a URI, we could qualify it when the URI is created.

Contributor

But why do we have to store the fully qualified path? What do we gain from this?

Contributor Author

If the location has no scheme such as hdfs or file, then when we restore it from the metastore we do not know which filesystem the table is stored on.
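(A hypothetical example of the ambiguity, assuming default Hadoop path resolution:)

import org.apache.hadoop.fs.Path

// "/path/xx" carries no scheme, so its meaning depends on fs.defaultFS:
//   fs.defaultFS = file:///        -> file:/path/xx
//   fs.defaultFS = hdfs://nn:8020  -> hdfs://nn:8020/path/xx
val location = new Path("/path/xx")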

Contributor

Shall we apply this to all locations, e.g. database locations and partition locations?

Contributor Author

Yes, this logic should be applied to all of them. Database locations already have it; shall I add the partition logic in another PR?
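(For context, a minimal sketch of the qualification step being discussed, assuming locationUri is an Option[String] and a hadoopConf is in scope — hypothetical, not the exact patch:)

val newTableDefinition = if (tableDefinition.storage.locationUri.isDefined) {
  // Qualify the user-supplied location before it is persisted in the catalog.
  val path = new Path(tableDefinition.storage.locationUri.get)
  val qualified = path.getFileSystem(hadoopConf).makeQualified(path).toUri.toString
  tableDefinition.copy(
    identifier = TableIdentifier(table, Some(db)),
    storage = tableDefinition.storage.copy(locationUri = Some(qualified)))
} else {
  tableDefinition.copy(identifier = TableIdentifier(table, Some(db)))
}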

@windpiger
Contributor Author

retest this please

@SparkQA

SparkQA commented Mar 2, 2017

Test build #73760 has finished for PR 17095 at commit 7e08045.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 2, 2017

Test build #73773 has finished for PR 17095 at commit 9932b03.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1843,10 +1843,12 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
           |OPTIONS(path "$dir")
         """.stripMargin)
       val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
-      assert(table.location == dir.getAbsolutePath)
+      val dirPath = new Path(dir.getAbsolutePath)
+      val fs = dirPath.getFileSystem(spark.sessionState.newHadoopConf())
Member

Can you create a helper function to avoid the duplicated code?

Contributor Author

OK, thanks~ I added the makeQualifiedPath helper in this PR.

After that PR is merged, I will fix the conflicts and make this change.
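(A rough sketch of what such a helper could look like — hypothetical signature; the actual helper in the PR may differ:)

import java.net.URI
import org.apache.hadoop.fs.Path

// Qualify a raw path string against the session's Hadoop configuration,
// e.g. "/path/xx" -> "file:/path/xx".
def makeQualifiedPath(path: String): URI = {
  val hadoopPath = new Path(path)
  val fs = hadoopPath.getFileSystem(spark.sessionState.newHadoopConf())
  fs.makeQualified(hadoopPath).toUri
}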

@@ -230,8 +230,8 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
   }

   private def getDBPath(dbName: String): URI = {
-    val warehousePath = s"file:${spark.sessionState.conf.warehousePath.stripPrefix("file:")}"
-    new Path(warehousePath, s"$dbName.db").toUri
+    val warehousePath = makeQualifiedPath(s"${spark.sessionState.conf.warehousePath}")
Contributor

Just write spark.sessionState.conf.warehousePath; no need to wrap it in s"".
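(I.e., the suggested form — same call, no interpolator:)

val warehousePath = makeQualifiedPath(spark.sessionState.conf.warehousePath)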

@cloud-fan
Contributor

thanks, merging to master!

@asfgit closed this in 274973d on Mar 9, 2017
@SparkQA

SparkQA commented Mar 9, 2017

Test build #74259 has finished for PR 17095 at commit c7e837c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@windpiger
Contributor Author

retest this please

@SparkQA

SparkQA commented Mar 9, 2017

Test build #74258 has finished for PR 17095 at commit 0919fea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public static class LongWrapper
  • public static class IntWrapper
  • case class ResolveInlineTables(conf: CatalystConf) extends Rule[LogicalPlan]
  • case class CostBasedJoinReorder(conf: CatalystConf) extends Rule[LogicalPlan] with PredicateHelper
  • case class JoinPlan(itemIds: Set[Int], plan: LogicalPlan, joinConds: Set[Expression], cost: Cost)
  • case class Cost(rows: BigInt, size: BigInt)
  • abstract class RepartitionOperation extends UnaryNode
  • case class FlatMapGroupsWithState(
  • class CSVOptions(
  • class UnivocityParser(
  • trait WatermarkSupport extends UnaryExecNode
  • case class FlatMapGroupsWithStateExec(

@kayousterhout
Contributor

kayousterhout commented Mar 17, 2017

I suspect that this PR is the cause of consistent failures in the maven build, in the HiveCatalogedDDLSuite unit test: https://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.sql.hive.execution.HiveCatalogedDDLSuite&test_name=create+temporary+view+using

Based on the error message (https://spark-tests.appspot.com/test-logs/408097945), it looks like the way the path is getting re-written (I think by the code in this PR) is causing Hadoop's path code to barf. The "create temporary view using" unit test is the only one in that suite that reads from a CSV file, which would explain why it's the only one failing. @windpiger or @cloud-fan, would one of you mind looking into this?

I filed a JIRA here: https://issues.apache.org/jira/browse/SPARK-19990

@kayousterhout
Contributor

kayousterhout commented Mar 17, 2017

Sounds like this was caused by a different PR (see the comment on the JIRA) and is now being fixed by @windpiger, so never mind here (and thanks @windpiger for looking into this!)
