[SPARK-19763][SQL]qualified external datasource table location stored in catalog #17095
Conversation
Test build #73556 has finished for PR 17095 at commit
Test build #73565 has finished for PR 17095 at commit
Test build #73582 has finished for PR 17095 at commit
Test build #73591 has finished for PR 17095 at commit
cc @cloud-fan
@@ -254,7 +254,18 @@ class SessionCatalog(
val db = formatDatabaseName(tableDefinition.identifier.database.getOrElse(getCurrentDatabase))
val table = formatTableName(tableDefinition.identifier.table)
validateName(table)
val newTableDefinition = tableDefinition.copy(identifier = TableIdentifier(table, Some(db)))

val newTableDefinition = if (tableDefinition.storage.locationUri.isDefined) {
Would it be easier if `locationUri` were of type `URI`?
A URI without a scheme is also legal, so this fix is needed even if the field is a `URI`. That said, if it were a `URI`, we could qualify it at the point where the URI is created.
But why do we have to store the fully qualified path? What do we gain from this?
If the location has no scheme (such as `hdfs` or `file`), then when we restore it from the metastore we do not know which filesystem the table is stored on.
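The ambiguity described above can be sketched in a few lines. This is a minimal, self-contained illustration (not Spark's actual code, which goes through Hadoop's `Path`/`FileSystem`): a scheme-less location is resolved against a hypothetical default-filesystem URI, while an already-qualified location is kept as-is.

```scala
import java.net.URI

object QualifySketch {
  // Resolve a scheme-less location against the default filesystem URI,
  // so "/data/t" becomes "hdfs://nn:8020/data/t" when HDFS is the default.
  // An already-qualified location (e.g. "file:/tmp/t") is returned unchanged.
  def qualify(location: String, defaultFsUri: URI): URI = {
    val raw = new URI(location)
    if (raw.getScheme != null) raw          // already carries a scheme, keep it
    else defaultFsUri.resolve(raw)          // inherit scheme and authority from the default FS
  }
}
```

Without the qualification step, two clusters with different default filesystems would read the same catalog entry `/data/t` as two different tables.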
Shall we apply this to all locations, such as database and partition locations?
Yes, this logic should be applied to all of them. The database location already contains this logic; shall I add the partition logic in another PR?
retest this please
Test build #73760 has finished for PR 17095 at commit
Test build #73773 has finished for PR 17095 at commit
@@ -1843,10 +1843,12 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
|OPTIONS(path "$dir")
""".stripMargin)
val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
assert(table.location == dir.getAbsolutePath)
val dirPath = new Path(dir.getAbsolutePath)
val fs = dirPath.getFileSystem(spark.sessionState.newHadoopConf())
Can you create a helper function to avoid the duplicated code?
OK, thanks! I added `makeQualifiedPath` in this PR. After that PR is merged, I will fix the conflicts and make this modification.
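For the test-side deduplication discussed here, a helper only needs to turn a raw local path into a qualified `file:` URI so assertions can compare against the qualified location stored in the catalog. The sketch below is a hypothetical stdlib-only stand-in (the real Spark helper works through Hadoop's `Path.getFileSystem` and `FileSystem.makeQualified`):

```scala
import java.net.URI
import java.nio.file.Paths

object PathHelperSketch {
  // Hypothetical test helper: absolutize and normalize a local path,
  // then convert it to a URI, which always carries the "file" scheme.
  def makeQualifiedLocalPath(path: String): URI =
    Paths.get(path).toAbsolutePath.normalize.toUri
}
```

Centralizing this in one place keeps every `assert(table.location == ...)` in the suite from repeating the FileSystem-qualification dance.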
@@ -230,8 +230,8 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
}

private def getDBPath(dbName: String): URI = {
val warehousePath = s"file:${spark.sessionState.conf.warehousePath.stripPrefix("file:")}"
new Path(warehousePath, s"$dbName.db").toUri
val warehousePath = makeQualifiedPath(s"${spark.sessionState.conf.warehousePath}")
Just write `spark.sessionState.conf.warehousePath`; there is no need to wrap it with `s""`.
thanks, merging to master!
Test build #74259 has finished for PR 17095 at commit
retest this please
Test build #74258 has finished for PR 17095 at commit
I suspect that this PR is the cause of consistent failures in the maven build, in the HiveCatalogedDDLSuite unit test: https://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.sql.hive.execution.HiveCatalogedDDLSuite&test_name=create+temporary+view+using

Based on the error message (https://spark-tests.appspot.com/test-logs/408097945), it looks like the way the path is getting rewritten (I think by the code in this PR) is causing Hadoop's path code to barf. The `create temporary view using` unit test is the only one in that suite that reads from a CSV file, which would explain why it is the only one failing. @windpiger or @cloud-fan, would one of you mind looking into this? I filed a JIRA here: https://issues.apache.org/jira/browse/SPARK-19990
Sounds like this was caused by a different PR (see the comment on the JIRA) and is now being fixed by @windpiger, so never mind here (and thanks @windpiger for looking into this!) |
What changes were proposed in this pull request?
If we create an external datasource table with a non-qualified location, we should qualify it before storing it in the catalog, so that when we get the table back from the catalog, the location is fully qualified, e.g. `file:/path/xxx`.
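The proposed change can be modeled in isolation. The sketch below uses simplified stand-in case classes (Spark's real types are `CatalogTable` and `CatalogStorageFormat`, and the real default-filesystem URI comes from the Hadoop configuration): before a table definition is handed to the catalog, a scheme-less location is qualified against the default filesystem.

```scala
import java.net.URI

object CatalogQualifySketch {
  // Simplified stand-ins for Spark's CatalogStorageFormat / CatalogTable.
  case class Storage(locationUri: Option[URI])
  case class TableDef(name: String, storage: Storage)

  // Qualify the location only when it lacks a scheme; tables with no
  // explicit location or an already-qualified one pass through unchanged.
  def qualifyForCatalog(t: TableDef, defaultFs: URI): TableDef =
    t.storage.locationUri match {
      case Some(loc) if loc.getScheme == null =>
        t.copy(storage = Storage(Some(defaultFs.resolve(loc))))
      case _ => t
    }
}
```

With this in place, a table created with `path "/path/xxx"` round-trips through the metastore as `file:/path/xxx` (or `hdfs://...` on an HDFS-default cluster) instead of an ambiguous bare path.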
How was this patch tested?
Unit tests added.