
[SPARK-18961][SQL] Support SHOW TABLE EXTENDED ... PARTITION statement #16373

Closed

Conversation

@jiangxb1987 (Contributor) commented Dec 21, 2016

What changes were proposed in this pull request?

We should support the statement SHOW TABLE EXTENDED LIKE 'table_identifier' PARTITION(partition_spec), just as Hive does.
When a partition is specified, the SHOW TABLE EXTENDED command should output the information of the partitions instead of the tables.
Note that this statement requires an exactly matched partition spec. For example:

```sql
CREATE TABLE show_t1(a String, b Int) PARTITIONED BY (c String, d String);
ALTER TABLE show_t1 ADD PARTITION (c='Us', d=1) PARTITION (c='Us', d=22);

-- Output the extended information of Partition(c='Us', d=1)
SHOW TABLE EXTENDED LIKE 'show_t1' PARTITION(c='Us', d=1);
-- Throw an AnalysisException
SHOW TABLE EXTENDED LIKE 'show_t1' PARTITION(c='Us');
```
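The exact-match rule can be illustrated with a small validation sketch. This is hypothetical Python, not Spark's actual implementation; the function name and error text are only illustrative.

```python
# Hypothetical sketch of the exact-match rule described above: the
# partition spec passed to SHOW TABLE EXTENDED must name every
# partition column defined on the table. Not Spark's actual code.
def check_partition_spec(spec, partition_columns):
    """Raise ValueError unless `spec` covers exactly the partition columns."""
    if set(spec) != set(partition_columns):
        raise ValueError(
            "Partition spec is invalid. The spec (%s) must match the "
            "partition spec (%s) defined in the table"
            % (", ".join(sorted(spec)), ", ".join(partition_columns)))

# Full spec (c, d): accepted, analogous to the first SHOW above.
check_partition_spec({"c": "Us", "d": "1"}, ["c", "d"])

# Partial spec (c only): rejected, analogous to the AnalysisException.
try:
    check_partition_spec({"c": "Us"}, ["c", "d"])
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```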

How was this patch tested?

Added new test SQL statements to show-tables.sql.
Added a new test case to DDLSuite.

@SparkQA

SparkQA commented Dec 21, 2016

Test build #70470 has finished for PR 16373 at commit d5ce86a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987 (Contributor Author)

@yhuai @hvanhovell @gatorsmile Would you please review this PR when you have time? Thanks!

val isTemp = catalog.isTemporaryTable(tableIdent)
if (isExtended) {
val information = catalog.getTempViewOrPermanentTableMetadata(tableIdent).toString
Row(database, tableName, isTemp, s"${information}\n")
Member

Do you know why we need s"${information}\n" instead of information?

Contributor Author (jiangxb1987, Dec 28, 2016)

We have to output an extra '\n' at the end of the whole row so the output format looks better.

Member

Then why don't we have the \n in the other similar cases?

Contributor Author

I think only in this case can the output contain multiple rows, with each row spanning multiple lines.

Member

This PR also faces the same issue, right? Can you post the outputs of the command?

Contributor Author

The statement SHOW TABLE EXTENDED ... PARTITION requires the partition spec to be fully matched, so in this case we output no more than one row per command.
For example:

```
spark-sql> SHOW TABLE EXTENDED LIKE 'show_t1' PARTITION(c='Us', d=1);
db2	show_t1	false	CatalogPartition(
	Partition Values: [c=Us, d=1]
	Storage(Location: file:/Users/meituan/workspace/spark/spark-warehouse/db2.db/show_t1/c=Us/d=1, InputFormat: org.apache.hadoop.mapred.TextInputFormat, OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, Serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Properties: [serialization.format=1])
	Partition Parameters:{transient_lastDdlTime=1482918646})
Time taken: 0.957 seconds, Fetched 1 row(s)
spark-sql> SHOW TABLE EXTENDED LIKE 'show_t1' PARTITION(c='Us');
Error in query: Partition spec is invalid. The spec (c) must match the partition spec (c, d) defined in table '`db2`.`show_t1`';
```

BTW, this behavior strictly follows Hive's.

Member

Could you post the output of Hive? I am wondering whether our output looks strange. Do we need to improve the toString function?

Contributor Author (jiangxb1987, Feb 8, 2017)

```
hive> create table t1(a int, b string) partitioned by (dt string, hour string);
hive> alter table t1 add partition (dt='2017-02-08', hour='17');
hive> alter table t1 add partition (dt='2017-02-08', hour='18');
hive> show table extended like 't1' partition(dt='2017-02-08');
FAILED: SemanticException [Error 10006]: Partition not found {dt=2017-02-08}
hive> show table extended like 't1' partition(dt='2017-02-08', hour='17');
OK
tableName:t1
owner:meituan
location:hdfs://localhost:9000/user/hive/warehouse/t1/dt=2017-02-08/hour=17
inputformat:org.apache.hadoop.mapred.TextInputFormat
outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
columns:struct columns { i32 a, string b}
partitioned:true
partitionColumns:struct partition_columns { string dt, string hour}
totalNumberFiles:0
totalFileSize:0
maxFileSize:0
minFileSize:0
lastAccessTime:0
lastUpdateTime:1486548513945
```

```
== SQL ==
SHOW TABLE EXTENDED LIKE 'show_t1' PARTITION(c='Us')
^^^
show_t1
```
Member

Is this output right?

Contributor Author

That is weird; I think I should spend some time on this problem.

Contributor Author

SQLQueryTestSuite runs the SQL and converts the result into a Hive-compatible sequence of strings. It handles ShowTablesCommand this way:

    // SHOW TABLES in Hive only output table names, while ours outputs database, table name, isTemp.
    case command: ExecutedCommandExec if command.cmd.isInstanceOf[ShowTablesCommand] =>
      command.executeCollect().map(_.getString(1))

We may either change this case, or create a new command ShowTableCommand to handle the statement SHOW TABLE EXTENDED LIKE ... [PARTITION].

Member

hiveResultString is only used for testing. I think we should fix it as long as it does not break any test case.

@SparkQA

SparkQA commented Feb 8, 2017

Test build #72575 has finished for PR 16373 at commit 0020c16.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 8, 2017

Test build #72579 has finished for PR 16373 at commit 0424edd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 7, 2017

Test build #74090 has finished for PR 16373 at commit 1b2a41c.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 7, 2017

Test build #74091 has finished for PR 16373 at commit fa93c80.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987 (Contributor Author)

@gatorsmile Does this PR look good?

@jiangxb1987 (Contributor Author)

jiangxb1987 commented Mar 10, 2017

@cloud-fan @hvanhovell @gatorsmile Would you please look at this PR?

@gatorsmile (Member)

Will review this today. :)

@@ -123,8 +123,12 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
.mkString("\t")
}
// SHOW TABLES in Hive only output table names, while ours outputs database, table name, isTemp.
case command: ExecutedCommandExec if command.cmd.isInstanceOf[ShowTablesCommand] =>
command.executeCollect().map(_.getString(1))
case command@ ExecutedCommandExec(showTables: ShowTablesCommand) =>
Member

Nit: command@ -> command @

command.executeCollect().map(_.getString(1))
case command@ ExecutedCommandExec(showTables: ShowTablesCommand) =>
if (showTables.isExtended) {
command.executeCollect().map(_.getString(3))
Member

If we do not have it, do any test cases fail?

This function is just for testing. If we do not have the related Hive output comparison, we can simplify it to

    case command @ ExecutedCommandExec(showTables: ShowTablesCommand) if !showTables.isExtended =>
      command.executeCollect().map(_.getString(1))

val isTemp = catalog.isTemporaryTable(tableIdent)
if (isExtended) {
val information = catalog.getTempViewOrPermanentTableMetadata(tableIdent).toString
Row(database, tableName, isTemp, s"${information}\n")
Member

Nit: -> s"$information\n"

//
// Note: tableIdentifierPattern should be non-empty, otherwise a [[ParseException]]
// should have been thrown by the sql parser.
val tableIdentifier = TableIdentifier(tableIdentifierPattern.get, Some(db))
Member

To be consistent, we can rename it to tableIdent

val tableName = table.table
val isTemp = catalog.isTemporaryTable(table)
val information = partition.toString
Seq(Row(database, tableName, isTemp, information))
Member

What is the reason this is different from the above case? That is, why not add \n?

@gatorsmile (Member)

Why do we only have the negative case in the newly added test cases?

val tables =
tableIdentifierPattern.map(catalog.listTables(db, _)).getOrElse(catalog.listTables(db))
tables.map { tableIdent =>
val database = tableIdent.database.getOrElse("")
Contributor

shall we get the current database?

Contributor Author

Temporary views have an empty database.

}
} else {
// Show the information of partitions.
Contributor

When users specify the partition spec, can we only list one table?

Contributor Author

Yes, this follows Hive's behavior.

@jiangxb1987 (Contributor Author)

@gatorsmile Added the positive test cases.

@SparkQA

SparkQA commented Mar 13, 2017

Test build #74451 has finished for PR 16373 at commit 77ca941.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// SHOW TABLES in Hive only output table names, while ours outputs database, table name, isTemp.
case command: ExecutedCommandExec if command.cmd.isInstanceOf[ShowTablesCommand] =>
// SHOW TABLES in Hive only output table names, while ours output database, table name, isTemp.
case command @ ExecutedCommandExec(showTables: ShowTablesCommand) =>
Member

case command @ ExecutedCommandExec(showTables: ShowTablesCommand) ->
case command @ ExecutedCommandExec(_: ShowTablesCommand)

val tables =
tableIdentifierPattern.map(catalog.listTables(db, _)).getOrElse(catalog.listTables(db))
tables.map { tableIdent =>
val database = tableIdent.database.getOrElse(db)
Member

For temporary views, database is empty. val database = tableIdent.database.getOrElse("")

@@ -925,6 +925,26 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
}
}

test("show table extended ... partition") {
Member

Just a question: what is the reason we are unable to do this in show-tables.sql?

Contributor Author

In show-tables.sql, we only output the value of the column tableName; we should verify the schema here.

Member

If we change hiveResultString to

    case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended =>
      command.executeCollect().map(_.getString(1))

I gave it a try and it works. Below is the output.


```
-- !query 22
SHOW TABLE EXTENDED LIKE 'show_t1' PARTITION(c='Ch', d=1)
-- !query 22 schema
struct<database:string,tableName:string,isTemporary:boolean,information:string>
-- !query 22 output
showdb	show_t1	false	CatalogPartition(
	Partition Values: [c=Ch, d=1]
	Storage(Location: file:/Users/xiao/IdeaProjects/sparkDelivery/sql/core/spark-warehouse/showdb.db/show_t1/c=Ch/d=1)
	Partition Parameters:{})
```

Contributor Author

Yes, it works, but it outputs the absolute path for Location, so the test suite will fail in other environments.

Member

Then, you just need to improve the function getNormalizedResult in SQLQueryTestSuite to mask it.
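For illustration, the kind of masking being suggested could look like the sketch below. This is hypothetical Python; the real getNormalizedResult is Scala, and the regex and placeholder text are assumptions for illustration only.

```python
import re

# Hypothetical sketch: mask machine-specific warehouse locations so that
# golden-file output stays stable across environments. Not Spark's code.
def normalize_location(line):
    # Replace any file: URI ending in a spark-warehouse directory
    # with a stable placeholder.
    return re.sub(r"file:[^\s,)]*spark-warehouse", "spark-warehouse", line)

line = ("Storage(Location: file:/Users/xiao/IdeaProjects/sparkDelivery/"
        "sql/core/spark-warehouse/showdb.db/show_t1/c=Ch/d=1)")
print(normalize_location(line))
# Storage(Location: spark-warehouse/showdb.db/show_t1/c=Ch/d=1)
```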

Contributor Author

Okay, I'll update that later.

@SparkQA

SparkQA commented Mar 14, 2017

Test build #74481 has finished for PR 16373 at commit b46d771.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 14, 2017

Test build #74510 has finished for PR 16373 at commit 62aee59.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 14, 2017

Test build #74506 has finished for PR 16373 at commit bede7ae.

  • This patch fails PySpark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@jiangxb1987 (Contributor Author)

retest this please.

@SparkQA

SparkQA commented Mar 14, 2017

Test build #74520 has finished for PR 16373 at commit 62aee59.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member)

LGTM

@gatorsmile (Member)

Thanks! Merging to master.

@asfgit asfgit closed this in a02a0b1 Mar 14, 2017
@jiangxb1987 jiangxb1987 deleted the show-partition-extended branch March 16, 2017 06:42
HyukjinKwon pushed a commit that referenced this pull request Dec 7, 2020
…ABLE EXTENDED

### What changes were proposed in this pull request?
Invoke the check `DDLUtils.verifyPartitionProviderIsHive()` from V1 implementation of `SHOW TABLE EXTENDED` when partition specs are specified.

This PR is a kind of follow-up to #16373 and #15515.

### Why are the changes needed?
To output a user-friendly error with a recommendation like
**"... partition metadata is not stored in the Hive metastore. To import this information into the metastore, run `msck repair table tableName`"**
instead of silently outputting an empty result.

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
By running the affected test suites, in particular:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "hive/test:testOnly *PartitionProviderCompatibilitySuite"
```

Closes #30618 from MaxGekk/show-table-extended-verifyPartitionProviderIsHive.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
HyukjinKwon pushed a commit that referenced this pull request Dec 7, 2020
…ABLE EXTENDED

(Same commit message as above; cherry picked from commit 29096a8.)
MaxGekk added a commit to MaxGekk/spark that referenced this pull request Dec 7, 2020
…ABLE EXTENDED

(Same commit message as above; cherry picked from commit 29096a8.)
HyukjinKwon pushed a commit that referenced this pull request Dec 7, 2020
…HOW TABLE EXTENDED

(Same commit message as above; cherry picked from commit 29096a8. Additionally tested with:)

```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *HiveCatalogedDDLSuite"
$ build/sbt -Phive-2.3 -Phive-thriftserver "hive/test:testOnly *PartitionProviderCompatibilitySuite"
```

Closes #30640 from MaxGekk/show-table-extended-verifyPartitionProviderIsHive-3.0.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
HyukjinKwon pushed a commit that referenced this pull request Dec 7, 2020
…HOW TABLE EXTENDED

(Same commit message as above; cherry picked from commit 29096a8.)

Closes #30641 from MaxGekk/show-table-extended-verifyPartitionProviderIsHive-2.4.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>