[SPARK-18961][SQL] Support SHOW TABLE EXTENDED ... PARTITION statement
#16373
Conversation
Test build #70470 has finished for PR 16373 at commit
@yhuai @hvanhovell @gatorsmile Would you please review this PR when you have time? Thanks!
val isTemp = catalog.isTemporaryTable(tableIdent)
if (isExtended) {
  val information = catalog.getTempViewOrPermanentTableMetadata(tableIdent).toString
  Row(database, tableName, isTemp, s"${information}\n")
Do you know why we need s"${information}\n" instead of information?
We have to output an extra '\n' at the end of the whole row so the output format looks better.
Then why don't we have the \n in the other similar cases?
I think only in this case the output could contain multiple rows, and each row could contain multiple lines.
This PR also faces the same issue, right? Can you post the outputs of the command?
The statement SHOW TABLE EXTENDED ... PARTITION
requires the partition to be fully matched, so in this case, we output no more than one row for each command.
For example:
spark-sql> SHOW TABLE EXTENDED LIKE 'show_t1' PARTITION(c='Us', d=1);
db2 show_t1 false CatalogPartition(
Partition Values: [c=Us, d=1]
Storage(Location: file:/Users/meituan/workspace/spark/spark-warehouse/db2.db/show_t1/c=Us/d=1, InputFormat: org.apache.hadoop.mapred.TextInputFormat, OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, Serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Properties: [serialization.format=1])
Partition Parameters:{transient_lastDdlTime=1482918646})
Time taken: 0.957 seconds, Fetched 1 row(s)
spark-sql> SHOW TABLE EXTENDED LIKE 'show_t1' PARTITION(c='Us');
Error in query: Partition spec is invalid. The spec (c) must match the partition spec (c, d) defined in table '`db2`.`show_t1`';
BTW, this behavior strictly follows Hive's.
Could you post the output of Hive? I am wondering whether our output looks strange. Do we need to improve the toString function?
hive> create table t1(a int, b string) partitioned by (dt string, hour string);
hive> alter table t1 add partition (dt='2017-02-08', hour='17');
hive> alter table t1 add partition (dt='2017-02-08', hour='18');
hive> show table extended like 't1' partition(dt='2017-02-08');
FAILED: SemanticException [Error 10006]: Partition not found {dt=2017-02-08}
hive> show table extended like 't1' partition(dt='2017-02-08', hour='17');
OK
tableName:t1
owner:meituan
location:hdfs://localhost:9000/user/hive/warehouse/t1/dt=2017-02-08/hour=17
inputformat:org.apache.hadoop.mapred.TextInputFormat
outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
columns:struct columns { i32 a, string b}
partitioned:true
partitionColumns:struct partition_columns { string dt, string hour}
totalNumberFiles:0
totalFileSize:0
maxFileSize:0
minFileSize:0
lastAccessTime:0
lastUpdateTime:1486548513945
== SQL ==
SHOW TABLE EXTENDED LIKE 'show_t1' PARTITION(c='Us')
^^^
show_t1
Is this output right?
That is weird; I think I should spend some time on this problem.
The SQLQueryTestSuite runs the SQL and converts the result to a Hive-compatible sequence of strings. It handles ShowTablesCommand this way:
// SHOW TABLES in Hive only output table names, while ours outputs database, table name, isTemp.
case command: ExecutedCommandExec if command.cmd.isInstanceOf[ShowTablesCommand] =>
command.executeCollect().map(_.getString(1))
We may either change this case, or create a new command ShowTableCommand to handle the statement SHOW TABLE EXTENDED LIKE ... [PARTITION].
hiveResultString is only used for testing. I think we should fix it as long as it does not break any test case.
Test build #72575 has finished for PR 16373 at commit
Test build #72579 has finished for PR 16373 at commit
Test build #74090 has finished for PR 16373 at commit
Test build #74091 has finished for PR 16373 at commit
@gatorsmile Does this PR look good?
@cloud-fan @hvanhovell @gatorsmile Would you please look at this PR?
Will review this today. :)
@@ -123,8 +123,12 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
      .mkString("\t")
  }
  // SHOW TABLES in Hive only output table names, while ours outputs database, table name, isTemp.
  case command: ExecutedCommandExec if command.cmd.isInstanceOf[ShowTablesCommand] =>
    command.executeCollect().map(_.getString(1))
  case command@ ExecutedCommandExec(showTables: ShowTablesCommand) =>
Nit: command@ -> command @
command.executeCollect().map(_.getString(1))
case command@ ExecutedCommandExec(showTables: ShowTablesCommand) =>
  if (showTables.isExtended) {
    command.executeCollect().map(_.getString(3))
If we do not have it, does any test case fail?
This function is just for testing. If we do not need the related Hive output comparison, we can simplify it to
case command @ ExecutedCommandExec(showTables: ShowTablesCommand) if !showTables.isExtended =>
command.executeCollect().map(_.getString(1))
val isTemp = catalog.isTemporaryTable(tableIdent)
if (isExtended) {
  val information = catalog.getTempViewOrPermanentTableMetadata(tableIdent).toString
  Row(database, tableName, isTemp, s"${information}\n")
Nit: -> s"$information\n"
//
// Note: tableIdentifierPattern should be non-empty, otherwise a [[ParseException]]
// should have been thrown by the sql parser.
val tableIdentifier = TableIdentifier(tableIdentifierPattern.get, Some(db))
To be consistent, we can rename it to tableIdent
val tableName = table.table
val isTemp = catalog.isTemporaryTable(table)
val information = partition.toString
Seq(Row(database, tableName, isTemp, information))
What is the reason this is different from the above case? That is, why not add \n?
Why do we only have the negative case in the newly added test cases?
val tables =
  tableIdentifierPattern.map(catalog.listTables(db, _)).getOrElse(catalog.listTables(db))
tables.map { tableIdent =>
  val database = tableIdent.database.getOrElse("")
shall we get the current database?
For temporary views, they have empty database.
  }
} else {
  // Show the information of partitions.
When users specify the partition spec, can we only list one table?
Yes - this follows Hive's behavior.
@gatorsmile Added the positive test cases.
Test build #74451 has finished for PR 16373 at commit
// SHOW TABLES in Hive only output table names, while ours outputs database, table name, isTemp.
case command: ExecutedCommandExec if command.cmd.isInstanceOf[ShowTablesCommand] =>
// SHOW TABLES in Hive only output table names, while ours output database, table name, isTemp.
case command @ ExecutedCommandExec(showTables: ShowTablesCommand) =>
case command @ ExecutedCommandExec(showTables: ShowTablesCommand)
->
case command @ ExecutedCommandExec(_: ShowTablesCommand)
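The suggestion is purely stylistic: the @ binder already names the whole matched node, so the inner typed pattern only needs a name if that inner value is actually used. A tiny standalone illustration (not Spark code; the types here are made up for the example):

```scala
// Standalone illustration of the reviewer's suggestion: use `_` in the
// typed sub-pattern when its binding is never referenced, and keep the
// `@` binder only for the node that is actually used.
sealed trait Plan
final case class Exec(cmd: Any) extends Plan

object PatternDemo {
  def describe(p: Plan): String = p match {
    // Analogous to `command @ ExecutedCommandExec(_: ShowTablesCommand)`:
    // the inner value's type is checked but never named.
    case command @ Exec(_: String) => s"string command: $command"
    case _                         => "other"
  }
}
```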
val tables =
  tableIdentifierPattern.map(catalog.listTables(db, _)).getOrElse(catalog.listTables(db))
tables.map { tableIdent =>
  val database = tableIdent.database.getOrElse(db)
For temporary views, database is empty: val database = tableIdent.database.getOrElse("")
@@ -925,6 +925,26 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
    }
  }

  test("show table extended ... partition") {
Just a question: what is the reason why we are unable to do it in show-tables.sql?
In show-tables.sql, we only output the value of the column tableName; we should verify the schema here.
If we change hiveResultString to
case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended =>
command.executeCollect().map(_.getString(1))
I gave it a try. It works. Below is the output.
-- !query 22
SHOW TABLE EXTENDED LIKE 'show_t1' PARTITION(c='Ch', d=1)
-- !query 22 schema
struct<database:string,tableName:string,isTemporary:boolean,information:string>
-- !query 22 output
showdb show_t1 false CatalogPartition(
Partition Values: [c=Ch, d=1]
Storage(Location: file:/Users/xiao/IdeaProjects/sparkDelivery/sql/core/spark-warehouse/showdb.db/show_t1/c=Ch/d=1)
Partition Parameters:{})
Yes, it works, but it outputs the absolute path for Location, so the test suite will fail in another environment.
Then you just need to improve the function getNormalizedResult in SQLQueryTestSuite to mask it.
Okay, I'll update that later.
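A minimal sketch of the kind of masking getNormalizedResult could do, so golden files do not depend on the local filesystem. The regex and the "&lt;location&gt;" placeholder are assumptions for illustration, not the actual Spark code:

```scala
// Hypothetical sketch (not the actual Spark implementation): replace any
// environment-specific "Location: ..." value in a normalized result line
// with a stable placeholder, stopping at the ')' or ',' that ends the field.
object LocationMasking {
  def maskLocation(line: String): String =
    line.replaceAll("Location: [^),]*", "Location: <location>")
}
```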
Test build #74481 has finished for PR 16373 at commit
Test build #74510 has finished for PR 16373 at commit
Test build #74506 has finished for PR 16373 at commit
retest this please.
Test build #74520 has finished for PR 16373 at commit
LGTM
Thanks! Merging to master.
…ABLE EXTENDED

### What changes were proposed in this pull request?
Invoke the check `DDLUtils.verifyPartitionProviderIsHive()` from the V1 implementation of `SHOW TABLE EXTENDED` when partition specs are specified. This PR is a kind of follow-up to #16373 and #15515.

### Why are the changes needed?
To output a user-friendly error with a recommendation like **"... partition metadata is not stored in the Hive metastore. To import this information into the metastore, run `msck repair table tableName`"** instead of silently outputting an empty result.

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
By running the affected test suites, in particular:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "hive/test:testOnly *PartitionProviderCompatibilitySuite"
```
Closes #30618 from MaxGekk/show-table-extended-verifyPartitionProviderIsHive.
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
What changes were proposed in this pull request?
We should support the statement SHOW TABLE EXTENDED LIKE 'table_identifier' PARTITION(partition_spec), just like Hive does. When a partition is specified, the SHOW TABLE EXTENDED command should output the information of the partitions instead of the tables. Note that this statement requires an exactly matched partition spec. For example:
How was this patch tested?
Add new test SQL in the file show-tables.sql. Add a new test case in DDLSuite.