Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-31113][SQL] Add SHOW VIEWS command #27897

Closed
wants to merge 13 commits into from

Conversation

Eric5553
Copy link
Contributor

@Eric5553 Eric5553 commented Mar 13, 2020

What changes were proposed in this pull request?

Previously, user can issue SHOW TABLES to get info of both tables and views.
This PR (SPARK-31113) implements SHOW VIEWS SQL command similar to HIVE to get views only.(https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowViews)

Hive -- Only show view names

hive> SHOW VIEWS;
OK
view_1
view_2
...

Spark(Hive-Compatible) -- Only show view names, used in tests and SparkSQLDriver for CLI applications

SHOW VIEWS IN showdb;
view_1
view_2
...

Spark -- Show more information database/viewName/isTemporary

spark-sql> SHOW VIEWS;
userdb	view_1	false
userdb	view_2	false
...

Why are the changes needed?

SHOW VIEWS command provides better granularity to only get information of views.

Does this PR introduce any user-facing change?

Add new SHOW VIEWS SQL command

How was this patch tested?

Add new test show-views.sql and pass existing tests

@SparkQA

This comment has been minimized.

@SparkQA

This comment has been minimized.

@SparkQA

This comment has been minimized.

@Eric5553
Copy link
Contributor Author

cc @gatorsmile @cloud-fan @maropu , thanks!

@maropu
Copy link
Member

maropu commented Mar 14, 2020

Could you add a new entry for this command in the SQL doc? See: https://github.com/parano/spark-1/blob/master/docs/sql-ref-syntax-aux-show-tables.md

@@ -346,6 +346,13 @@ class InMemoryCatalog(
StringUtils.filterPattern(listTables(db), pattern)
}

override def listViews(db: String, pattern: String): Seq[String] = synchronized {
requireDbExists(db)
val views = catalog(db).tables.filter(v => v._2.table.tableType == CatalogTableType.VIEW)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: .filter(_._2.table.tableType == CatalogTableType.VIEW)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, updated in c5570d2.

parsePlan("SHOW VIEWS IN testcat.ns1.ns2.tbl"),
ShowViewsStatement(UnresolvedNamespace(Seq("testcat", "ns1", "ns2", "tbl")), None))
comparePlans(
parsePlan("SHOW VIEWS IN tbl LIKE '*dog*'"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, add/update tests for SHOW VIEWS and SHOW TABLES.

throw new AnalysisException(
s"The database name is not valid: ${ns.quoted}")
throw new AnalysisException(
s"The database name is not valid: ${ns.quoted}")
Copy link
Member

@maropu maropu Mar 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: (This is not related to this pr though) Can you remove the line break? throw new AnalysisException(s"The database name is not valid: ${ns.quoted}")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, removed the line break for all cases in this file.

-- !query
SHOW VIEWS
-- !query schema
struct<database:string,viewName:string,isTemporary:boolean>
Copy link
Member

@maropu maropu Mar 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This output schema is the same with hive? Can you add some running examples (of Spark and Hive) in the PR description?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just found HiveResult.hiveResultString is used to update query result in Hive compatible form for ShowTablesCommand. Also add transform rule for ShowViewsCommand and updated PR description. Thanks.

@Eric5553
Copy link
Contributor Author

@maropu Thanks for your review, I'll address them soon!

@SparkQA

This comment has been minimized.

docs/sql-ref-syntax-aux-show-tables.md Show resolved Hide resolved
The `SHOW VIEWS` statement returns all the views for an optionally specified database.
Additionally, the output of this statement may be filtered by an optional matching
pattern. If no database is specified then the views are returned from the
current database. Note that both global and local temporary views are also returned.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about Note that both global and local temporary views are also returned. -> Note that the command always lists global and temporary views regardless of a given database?

SHOW VIEWS [ { FROM | IN } database_name ] [ LIKE 'regex_pattern' ]
{% endhighlight %}

### Parameters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: this section is basically the same with the SHOW TABLE one: https://github.com/apache/spark/blob/master/docs/sql-ref-syntax-aux-show-tables.md

docs/sql-ref-syntax-aux-show-views.md Show resolved Hide resolved
| default | sam | false |
| default | sam1 | false |
| default | suj | false |
+-----------+------------+--------------+--+
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding an example for the isTemporary = true case?

tableIdentifierPattern: Option[String]) extends RunnableCommand {

// The result of SHOW TABLES/SHOW TABLE has three basic columns: database, tableName and
// isTemporary. If `isExtended` is true, append column `information` to the output columns.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems you wrongly copied&pasted this comment from ShowTablesCommand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, my mistake. Updated in dc6e3cb.


override def run(sparkSession: SparkSession): Seq[Row] = {
// Since we need to return a Seq of rows, we will call getTables directly
// instead of calling tables in sparkSession.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, in the original comment in ShowTablesCommand, getTable -> listTables?

val db = databaseName.getOrElse(catalog.getCurrentDatabase)

// Show the information of views.
val views =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit style;

    val views = tableIdentifierPattern.map(catalog.listViews(db, _))
      .getOrElse(catalog.listViews(db, "*"))

@@ -755,6 +755,10 @@ private[hive] class HiveClientImpl(
client.getTablesByPattern(dbName, pattern).asScala
}

override def listViews(dbName: String, pattern: String): Seq[String] = withHiveState {
shim.getTablesByType(client, dbName, pattern, HiveTableType.VIRTUAL_VIEW)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plz add tests in the hive side, too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, updated in dc6e3cb. Not sure if it's enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea and I think the added ones are good enough.

@SparkQA

This comment has been minimized.

@SparkQA

This comment has been minimized.

@maropu
Copy link
Member

maropu commented Mar 17, 2020

retest this please

@SparkQA

This comment has been minimized.

@maropu
Copy link
Member

maropu commented Mar 17, 2020

The overall looks fine now and anyone can check this? @HyukjinKwon @dongjoon-hyun @cloud-fan

@SparkQA

This comment has been minimized.

@SparkQA

This comment has been minimized.

@cloud-fan
Copy link
Contributor

retest this please

@SparkQA

This comment has been minimized.

@SparkQA

This comment has been minimized.

@cloud-fan
Copy link
Contributor

retest this please

@SparkQA

This comment has been minimized.

@SparkQA
Copy link

SparkQA commented Apr 6, 2020

Test build #120878 has finished for PR 27897 at commit 069fbc1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -117,6 +117,52 @@ class HiveCommandSuite extends QueryTest with SQLTestUtils with TestHiveSingleto
}
}

test("show views") {
withView("default1a", "default2b", "temp1", "global_temp.temp2") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @Eric5553 . Is this correct? Maybe, the following?
default1a -> show1a?
default2b -> show2b?
temp1 -> global_temp.temp1?
global_temp.temp2 -> temp2?

@dongjoon-hyun
Copy link
Member

Could you add like the following to sql-ref-syntax.md? Currently, it's missing there.

docs/sql-ref-syntax.md
 - [SHOW TBLPROPERTIES](sql-ref-syntax-aux-show-tblproperties.html)
+- [SHOW VIEWS](sql-ref-syntax-aux-show-views.html)
 - [UNCACHE TABLE](sql-ref-syntax-aux-cache-uncache-table.html)

CREATE VIEW sam1 AS SELECT id, salary FROM employee WHERE name = 'sam1';
CREATE VIEW suj AS SELECT id, salary FROM employee WHERE name = 'suj';
USE userdb;
CREATE VIEW user1 AS SELECT id, salary FROM employee WHERE name = 'user1';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ur, if employee is a table in default. This will fail. This should be default.employee instead of employee because we switched the database at line 59.

If employee is a temp view, it's valid. However, in this example, we use temp* prefix convention for temp views. So, employee is not a temp view.

@@ -50,6 +50,10 @@ object HiveResult {
// namespace and table name.
case command : ShowTablesExec =>
command.executeCollect().map(_.getString(1))
// SHOW VIEWS in Hive only outputs table names while our v1 command outputs
// namespace, table name, and isTemp.
Copy link
Member

@dongjoon-hyun dongjoon-hyun Apr 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

table name -> view name?
Maybe, viewName and isTemporary are better.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @Eric5553 . I left four comments. For the others, it looks good to me.

  1. withView(".."): [SPARK-31113][SQL] Add SHOW VIEWS command #27897 (comment)
  2. sql-ref-syntax.md: [SPARK-31113][SQL] Add SHOW VIEWS command #27897 (comment)
  3. employee: [SPARK-31113][SQL] Add SHOW VIEWS command #27897 (comment)
  4. viewName: [SPARK-31113][SQL] Add SHOW VIEWS command #27897 (comment)

I also verified this PR with JDK11 with remote HMS as the last testing. Thank you for working on this for a long time.

@dongjoon-hyun
Copy link
Member

Hi, @juliuszsompolski
After we merge this PR to branch-3.0, is there any other blockers for Simba ODBC Driver?

@juliuszsompolski
Copy link
Contributor

@dongjoon-hyun
Simba has been testing with Spark 3.0 and didn't run into other issues.
With SHOW VIEWS the SQLGetTables operation will run without error, but runs into a number of small correctness problems (all of these are just bugs, because it was never the intended behavior to start issuing SHOW VIEWS based on Hive version in Spark driver; it stems from pieces of shared codebase between Spark and Hive).
What current Simba drivers do is collect the answer to SQLGetTables by calling "SHOW SCHEMAS", and then "SHOW TABLES IN schema" and (now with Hive2.3) "SHOW VIEWS IN schema" for every schema. That results in:

  • local temporary views will be reported in duplicate, once for every schema (Simba did the deduplication of local temp views from SHOW TABLES responses for every schema, but it doesn't apply to SHOW VIEWS as it didn't expect the SHOW VIEWS to happen...)
  • views in schemas will be reported twice, and both times as "TABLE", not as "VIEW" (once from SHOW TABLES, which currently also returns VIEWS, and once from SHOW VIEWS; because of the way they lookup table types, the occurrence from SHOW VIEWS will also appear as TABLE)
  • global temporary views will not be reported (it's an existing issue with Spark 2.4 as well; they never look into global_temp schema, because it's not listed by SHOW SCHEMAS).

Simba will fix all these issues in the next release of their ODBC driver, by switching to use the proper SparkGetTablesOperation metadata operation (and other metadata operations implemented by @wangyum and @AngersZhuuuu last year as well).

So with SHOW VIEWS it's also not perfect because it returns duplicate views, but it is better than currently, because it doesn't fail the call and lets PowerBI connect and Tableau show tables/schemas instead of empty in the schema explorer.

Simba audited their behavior that depend on Hive version in the Spark driver and found the following:

If the Hive version is greater than or equal to 0.13.0 the driver expects the following:
- The server supports server side properties.
If the Hive version is greater than or equal to 1.2.0 the driver expects the following:
- The server supports the UNION syntax without requiring the ALL clause before the UNION clause.
If the Hive version is greater than or equal to 2.2.0 the driver expects the following:
- The server supports the SHOW VIEWS syntax.
If the Hive version is greater than or equal to 3.0.0 the driver expects the following:
- The server supports the GetPrimaryKeys and GetCrossReference API.

GetPrimaryKeys and GetCrossReference is a potential future problem, because Spark does not currently override those Hive operations. But Simba will fix it in next release by making SQLPrimaryKeys and SQLForeignKeys just noops - as Spark does not have any use of primary/foreign keys for now.

@dongjoon-hyun
Copy link
Member

Thank you so much for sharing, @juliuszsompolski !

@Eric5553
Copy link
Contributor Author

Eric5553 commented Apr 7, 2020

Hi, @dongjoon-hyun, thanks so much for your help! And sorry for my mistakes... I've addressed them as you suggested :-)

@dongjoon-hyun
Copy link
Member

Thank you for updating, @Eric5553 .
For the last commit, I verified the test case update locally and new generated HTML doc. The other part is only comment-updates. I'll merge this to master/3.0.

dongjoon-hyun pushed a commit that referenced this pull request Apr 7, 2020
### What changes were proposed in this pull request?
Previously, user can issue `SHOW TABLES` to get info of both tables and views.
This PR (SPARK-31113) implements `SHOW VIEWS` SQL command similar to HIVE to get views only.(https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowViews)

**Hive** -- Only show view names
```
hive> SHOW VIEWS;
OK
view_1
view_2
...
```

**Spark(Hive-Compatible)** -- Only show view names, used in tests and `SparkSQLDriver` for CLI applications
```
SHOW VIEWS IN showdb;
view_1
view_2
...
```

**Spark** -- Show more information database/viewName/isTemporary
```
spark-sql> SHOW VIEWS;
userdb	view_1	false
userdb	view_2	false
...
```

### Why are the changes needed?
`SHOW VIEWS` command provides better granularity to only get information of views.

### Does this PR introduce any user-facing change?
Add new `SHOW VIEWS` SQL command

### How was this patch tested?
Add new test `show-views.sql` and pass existing tests

Closes #27897 from Eric5553/ShowViews.

Authored-by: Eric Wu <492960551@qq.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit a28ed86)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun
Copy link
Member

Thank you again, all!

@Eric5553
Copy link
Contributor Author

Eric5553 commented Apr 7, 2020

@maropu @cloud-fan @dongjoon-hyun @HyukjinKwon @juliuszsompolski , many thanks!!!

@SparkQA
Copy link

SparkQA commented Apr 7, 2020

Test build #120922 has finished for PR 27897 at commit f6132c0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
### What changes were proposed in this pull request?
Previously, user can issue `SHOW TABLES` to get info of both tables and views.
This PR (SPARK-31113) implements `SHOW VIEWS` SQL command similar to HIVE to get views only.(https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowViews)

**Hive** -- Only show view names
```
hive> SHOW VIEWS;
OK
view_1
view_2
...
```

**Spark(Hive-Compatible)** -- Only show view names, used in tests and `SparkSQLDriver` for CLI applications
```
SHOW VIEWS IN showdb;
view_1
view_2
...
```

**Spark** -- Show more information database/viewName/isTemporary
```
spark-sql> SHOW VIEWS;
userdb	view_1	false
userdb	view_2	false
...
```

### Why are the changes needed?
`SHOW VIEWS` command provides better granularity to only get information of views.

### Does this PR introduce any user-facing change?
Add new `SHOW VIEWS` SQL command

### How was this patch tested?
Add new test `show-views.sql` and pass existing tests

Closes apache#27897 from Eric5553/ShowViews.

Authored-by: Eric Wu <492960551@qq.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
8 participants