[SPARK-31113][SQL] Add SHOW VIEWS command #27897
cc @gatorsmile @cloud-fan @maropu, thanks!
Could you add a new entry for this command in the SQL doc? See: https://github.com/parano/spark-1/blob/master/docs/sql-ref-syntax-aux-show-tables.md
@@ -346,6 +346,13 @@ class InMemoryCatalog(
    StringUtils.filterPattern(listTables(db), pattern)
  }

  override def listViews(db: String, pattern: String): Seq[String] = synchronized {
    requireDbExists(db)
    val views = catalog(db).tables.filter(v => v._2.table.tableType == CatalogTableType.VIEW)
nit: .filter(_._2.table.tableType == CatalogTableType.VIEW)
Thanks, updated in c5570d2.
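For readers following this thread, here is a minimal, self-contained sketch of the filtering idea discussed above. It does not use Spark's classes: `TableDesc`, `TableType`, and this `filterPattern` are hypothetical stand-ins, and the glob semantics (`*` as a wildcard, `|` as an alternative separator, case-insensitive matching) are an assumption made for illustration.

```scala
// Simplified stand-ins for catalog types; every name here is hypothetical.
object InMemoryViewsSketch {
  sealed trait TableType
  case object Managed extends TableType
  case object View extends TableType

  final case class TableDesc(name: String, tableType: TableType)

  // Assumed glob semantics: '*' matches any run of characters, '|' separates
  // alternatives, and matching is case-insensitive. Results are sorted.
  def filterPattern(names: Seq[String], pattern: String): Seq[String] = {
    val regexes = pattern.trim.split("\\|").toSeq
      .map(p => ("(?i)" + p.trim.replace("*", ".*")).r)
    names.filter(n => regexes.exists(_.pattern.matcher(n).matches())).sorted
  }

  // Keep only entries whose type is View (using the placeholder syntax
  // suggested in the review), then apply the name pattern.
  def listViews(tables: Map[String, TableDesc], pattern: String): Seq[String] = {
    val views = tables.filter(_._2.tableType == View)
    filterPattern(views.keys.toSeq, pattern)
  }
}
```

The placeholder form `_._2.tableType == View` is equivalent to the explicit lambda `v => v._2.tableType == View`; the nit is purely stylistic.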
      parsePlan("SHOW VIEWS IN testcat.ns1.ns2.tbl"),
      ShowViewsStatement(UnresolvedNamespace(Seq("testcat", "ns1", "ns2", "tbl")), None))
    comparePlans(
      parsePlan("SHOW VIEWS IN tbl LIKE '*dog*'"),
Please add tests for the optional LIKE case: https://github.com/apache/spark/pull/27897/files#diff-8c1cb2af4aa1109e08481dae79052cc3R227
Sure, I added and updated tests for `SHOW VIEWS` and `SHOW TABLES`.
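To make the optional clauses concrete, the following is a rough sketch of how both the namespace and the `LIKE` pattern can be absent. It is a regex-based toy, not Spark's ANTLR grammar; the `ShowViews` case class and `parse` function are hypothetical names used only for illustration.

```scala
object ShowViewsParseSketch {
  // Hypothetical result type; it only mirrors the shape of the parsed statement.
  final case class ShowViews(namespace: Option[Seq[String]], pattern: Option[String])

  // SHOW VIEWS [ { FROM | IN } namespace ] [ LIKE 'pattern' ]
  // Both bracketed clauses are optional, so each capture group may be null.
  private val Stmt =
    """(?i)\s*SHOW\s+VIEWS(?:\s+(?:FROM|IN)\s+([\w.]+))?(?:\s+LIKE\s+'([^']*)')?\s*;?\s*""".r

  def parse(sql: String): Option[ShowViews] = sql match {
    case Stmt(ns, like) =>
      // Option(null) is None, which covers an omitted clause.
      Some(ShowViews(Option(ns).map(_.split('.').toSeq), Option(like)))
    case _ => None
  }
}
```

A test matrix for the real parser would cover the same four combinations: no clause, namespace only, `LIKE` only, and both.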
          throw new AnalysisException(
            s"The database name is not valid: ${ns.quoted}")
          throw new AnalysisException(
            s"The database name is not valid: ${ns.quoted}")
nit (not related to this PR, though): can you remove the line break? `throw new AnalysisException(s"The database name is not valid: ${ns.quoted}")`
Yea, removed the line break for all cases in this file.
-- !query
SHOW VIEWS
-- !query schema
struct<database:string,viewName:string,isTemporary:boolean>
Is this output schema the same as Hive's? Can you add some running examples (for both Spark and Hive) to the PR description?
I just found that `HiveResult.hiveResultString` is used to convert the query result to a Hive-compatible form for `ShowTablesCommand`. I also added a transform rule for `ShowViewsCommand` and updated the PR description. Thanks.
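The difference between the two output shapes can be sketched as follows. This is an illustration only: `ViewRow`, `sparkResult`, and `hiveResult` are hypothetical names, and the real transform operates on Spark rows rather than on a case class.

```scala
object HiveCompatSketch {
  // One output row of the three-column SHOW VIEWS result (assumed shape).
  final case class ViewRow(database: String, viewName: String, isTemporary: Boolean)

  // Spark-style rendering: all three columns, tab-separated.
  def sparkResult(rows: Seq[ViewRow]): Seq[String] =
    rows.map(r => s"${r.database}\t${r.viewName}\t${r.isTemporary}")

  // Hive-compatible rendering: only the view name, i.e. the second column.
  def hiveResult(rows: Seq[ViewRow]): Seq[String] = rows.map(_.viewName)
}
```

Projecting onto the second column is the same idea as the `_.getString(1)` call used for `SHOW TABLES` in the diff later in this conversation.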
@maropu Thanks for your review, I'll address them soon!
The `SHOW VIEWS` statement returns all the views for an optionally specified database.
Additionally, the output of this statement may be filtered by an optional matching
pattern. If no database is specified then the views are returned from the
current database. Note that both global and local temporary views are also returned.
How about changing `Note that both global and local temporary views are also returned.` to `Note that the command always lists global and temporary views regardless of a given database`?
SHOW VIEWS [ { FROM | IN } database_name ] [ LIKE 'regex_pattern' ]
{% endhighlight %}

### Parameters
Note: this section is basically the same as the `SHOW TABLES` one: https://github.com/apache/spark/blob/master/docs/sql-ref-syntax-aux-show-tables.md
  | default   | sam        | false        |
  | default   | sam1       | false        |
  | default   | suj        | false        |
  +-----------+------------+--------------+--+
How about adding an example for the `isTemporary = true` case?
    tableIdentifierPattern: Option[String]) extends RunnableCommand {

  // The result of SHOW TABLES/SHOW TABLE has three basic columns: database, tableName and
  // isTemporary. If `isExtended` is true, append column `information` to the output columns.
It seems this comment was copied and pasted from `ShowTablesCommand` by mistake.
Oops, my mistake. Updated in dc6e3cb.
  override def run(sparkSession: SparkSession): Seq[Row] = {
    // Since we need to return a Seq of rows, we will call getTables directly
    // instead of calling tables in sparkSession.
ditto
By the way, in the original comment in `ShowTablesCommand`: `getTable` -> `listTables`?
    val db = databaseName.getOrElse(catalog.getCurrentDatabase)

    // Show the information of views.
    val views =
nit style;
val views = tableIdentifierPattern.map(catalog.listViews(db, _))
.getOrElse(catalog.listViews(db, "*"))
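As a small illustration of the idiom suggested above, the sketch below falls back to the match-all pattern when no pattern is given. The `listViews` here is a hypothetical stand-in backed by a fixed map, not the session catalog, and its simple `*` wildcard handling is an assumption.

```scala
object PatternDefaultSketch {
  // Hypothetical stand-in for catalog.listViews(db, pattern).
  def listViews(db: String, pattern: String): Seq[String] = {
    val all = Map("default" -> Seq("sam", "sam1", "suj"))
    val regex = pattern.replace("*", ".*").r
    all.getOrElse(db, Seq.empty[String]).filter(n => regex.pattern.matcher(n).matches())
  }

  // The style from the review: map over the Option, else use "*".
  def showViews(db: String, tableIdentifierPattern: Option[String]): Seq[String] =
    tableIdentifierPattern.map(listViews(db, _)).getOrElse(listViews(db, "*"))

  // An equivalent form that defaults the pattern first.
  def showViewsAlt(db: String, tableIdentifierPattern: Option[String]): Seq[String] =
    listViews(db, tableIdentifierPattern.getOrElse("*"))
}
```

Both forms return the same result; which one reads better is a matter of taste.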
@@ -755,6 +755,10 @@ private[hive] class HiveClientImpl(
    client.getTablesByPattern(dbName, pattern).asScala
  }

  override def listViews(dbName: String, pattern: String): Seq[String] = withHiveState {
    shim.getTablesByType(client, dbName, pattern, HiveTableType.VIRTUAL_VIEW)
  }
Please add tests on the Hive side, too.
Yea, updated in dc6e3cb. Not sure if it's enough.
Yea, and I think the added ones are good enough.
retest this please
Overall this looks fine now. Could anyone check this? @HyukjinKwon @dongjoon-hyun @cloud-fan
retest this please
This reverts commit 3073255
retest this please
Test build #120878 has finished for PR 27897 at commit
@@ -117,6 +117,52 @@ class HiveCommandSuite extends QueryTest with SQLTestUtils with TestHiveSingleto
    }
  }

  test("show views") {
    withView("default1a", "default2b", "temp1", "global_temp.temp2") {
Hi, @Eric5553. Is this correct? Maybe the following?
`default1a` -> `show1a`?
`default2b` -> `show2b`?
`temp1` -> `global_temp.temp1`?
`global_temp.temp2` -> `temp2`?
Could you add like the following to
CREATE VIEW sam1 AS SELECT id, salary FROM employee WHERE name = 'sam1';
CREATE VIEW suj AS SELECT id, salary FROM employee WHERE name = 'suj';
USE userdb;
CREATE VIEW user1 AS SELECT id, salary FROM employee WHERE name = 'user1';
Hmm, if `employee` is a table in `default`, this will fail. It should be `default.employee` instead of `employee` because we switched the database at line 59.
If `employee` is a temp view, it's valid. However, in this example we use the `temp*` prefix convention for temp views, so `employee` is not a temp view.
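The reasoning in this comment can be sketched with a toy resolver. It is a deliberately simplified model, not Spark's actual analyzer: the `Session` type and `resolve` function are hypothetical, and the assumed rule is that an unqualified name is looked up among temp views first and then in the current database only.

```scala
object NameResolutionSketch {
  final case class Session(
      currentDb: String,
      tempViews: Set[String],
      tables: Map[String, Set[String]]) // database name -> table names

  // Returns the resolved name if the reference can be found, otherwise None.
  def resolve(s: Session, name: String): Option[String] = name.split('.') match {
    case Array(db, tbl) =>
      // Qualified name: look only in the named database.
      if (s.tables.get(db).exists(_.contains(tbl))) Some(s"$db.$tbl") else None
    case Array(tbl) =>
      // Unqualified name: temp views first, then the current database.
      if (s.tempViews.contains(tbl)) Some(tbl)
      else if (s.tables.get(s.currentDb).exists(_.contains(tbl))) Some(s"${s.currentDb}.$tbl")
      else None
    case _ => None
  }
}
```

Under this model, after `USE userdb` an unqualified `employee` no longer resolves when the table lives in `default`, while `default.employee` still does.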
@@ -50,6 +50,10 @@ object HiveResult {
    // namespace and table name.
    case command : ShowTablesExec =>
      command.executeCollect().map(_.getString(1))
    // SHOW VIEWS in Hive only outputs table names while our v1 command outputs
    // namespace, table name, and isTemp.
`table name` -> `view name`? Maybe `viewName` and `isTemporary` are better.
Hi, @Eric5553. I left four comments. For the others, it looks good to me.
`withView("..")`: [SPARK-31113][SQL] Add SHOW VIEWS command #27897 (comment)
`sql-ref-syntax.md`: [SPARK-31113][SQL] Add SHOW VIEWS command #27897 (comment)
`employee`: [SPARK-31113][SQL] Add SHOW VIEWS command #27897 (comment)
`viewName`: [SPARK-31113][SQL] Add SHOW VIEWS command #27897 (comment)
As a final test, I also verified this PR on JDK 11 with a remote HMS. Thank you for working on this for a long time.
Hi, @juliuszsompolski
@dongjoon-hyun
Simba will fix all these issues in the next release of their ODBC driver by switching to the proper SparkGetTablesOperation metadata operation (and the other metadata operations implemented by @wangyum and @AngersZhuuuu last year). So with SHOW VIEWS it's still not perfect, because it returns duplicate views, but it is better than the current behavior: the call no longer fails, PowerBI can connect, and Tableau shows tables/schemas in the schema explorer instead of an empty list. Simba audited the behaviors in their Spark driver that depend on the Hive version and found the following:
GetPrimaryKeys and GetCrossReference are a potential future problem, because Spark does not currently override those Hive operations. But Simba will fix this in the next release by making SQLPrimaryKeys and SQLForeignKeys no-ops, as Spark does not use primary/foreign keys for now.
Thank you so much for sharing, @juliuszsompolski!
Hi, @dongjoon-hyun, thanks so much for your help! And sorry for my mistakes... I've addressed them as you suggested :-)
Thank you for updating, @Eric5553.
### What changes were proposed in this pull request?
Previously, user can issue `SHOW TABLES` to get info of both tables and views.
This PR (SPARK-31113) implements `SHOW VIEWS` SQL command similar to HIVE to get views only. (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowViews)

**Hive** -- Only show view names
```
hive> SHOW VIEWS;
OK
view_1
view_2
...
```

**Spark(Hive-Compatible)** -- Only show view names, used in tests and `SparkSQLDriver` for CLI applications
```
SHOW VIEWS IN showdb;
view_1
view_2
...
```

**Spark** -- Show more information database/viewName/isTemporary
```
spark-sql> SHOW VIEWS;
userdb	view_1	false
userdb	view_2	false
...
```

### Why are the changes needed?
`SHOW VIEWS` command provides better granularity to only get information of views.

### Does this PR introduce any user-facing change?
Add new `SHOW VIEWS` SQL command

### How was this patch tested?
Add new test `show-views.sql` and pass existing tests

Closes #27897 from Eric5553/ShowViews.

Authored-by: Eric Wu <492960551@qq.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit a28ed86)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Thank you again, all!
@maropu @cloud-fan @dongjoon-hyun @HyukjinKwon @juliuszsompolski, many thanks!!!
Test build #120922 has finished for PR 27897 at commit