[SPARK-20198] [SQL] Remove the inconsistency in table/function name conventions in SparkSession.Catalog APIs #17518

Closed
wants to merge 4 commits

Conversation

gatorsmile
Member

What changes were proposed in this pull request?

As observed by @felixcheung, the SparkSession.Catalog APIs follow different conventions for table/function identifiers/names: most APIs accept a qualified name (i.e., databaseName.tableName or databaseName.functionName), but the following five APIs do not.

  • def listColumns(tableName: String): Dataset[Column]
  • def getTable(tableName: String): Table
  • def getFunction(functionName: String): Function
  • def tableExists(tableName: String): Boolean
  • def functionExists(functionName: String): Boolean

To make them consistent with the other Catalog APIs, this PR changes these five APIs to accept qualified names, updates the function/API comments, and adds @param documentation to clarify the accepted inputs.
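
For illustration, a minimal sketch (in Scala) of the intended usage after this change, assuming an active SparkSession `spark`; the database, table, and function names below are hypothetical:

// Qualified names are now accepted, consistent with the other Catalog APIs.
assert(spark.catalog.tableExists("db1.tbl1"))      // table `tbl1` in database `db1`
assert(spark.catalog.functionExists("db1.fn1"))    // function `fn1` in database `db1`
val cols = spark.catalog.listColumns("db1.tbl1")   // Dataset[Column] for `tbl1` in `db1`
val tbl = spark.catalog.getTable("db1.tbl1")
val fn = spark.catalog.getFunction("db1.fn1")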

How was this patch tested?

Added test cases.

@gatorsmile gatorsmile changed the title [SPARK-20198] [SQL] Remove the inconsistency when table/function name definitions in SparkSession.Catalog APIs [SPARK-20198] [SQL] Remove the inconsistency in table/function name conventions in SparkSession.Catalog APIs Apr 3, 2017
@SparkQA

SparkQA commented Apr 3, 2017

Test build #75473 has finished for PR 17518 at commit dc9fefa.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


// Find an unqualified table using the current database
assert(!spark.catalog.tableExists("tbl_y"))
spark.catalog.setCurrentDatabase(db)
assert(spark.catalog.tableExists("tbl_y"))

// Unable to find the table, although the temp view with the given name exists
assert(!spark.catalog.tableExists(db, "tbl_x"))
Member Author


This new test case is not related to this PR; it just improves the existing test coverage.


// Find an unqualified function using the current database
assert(!spark.catalog.functionExists("fn2"))
spark.catalog.setCurrentDatabase(db)
assert(spark.catalog.functionExists("fn2"))

// Unable to find the function, although the temp function with the given name exists
assert(!spark.catalog.functionExists(db, "fn1"))
Member Author


This new test case is not related to this PR; it just improves the existing test coverage.

@SparkQA

SparkQA commented Apr 3, 2017

Test build #75474 has finished for PR 17518 at commit b3e81b4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

@rxin
Contributor

rxin commented Apr 3, 2017

Is this an API change or just a documentation change? The title suggests you are changing public-facing APIs?

@gatorsmile
Member Author

gatorsmile commented Apr 3, 2017

@rxin This includes an API change. After this PR, the following APIs will accept qualified table/function names.

  • def listColumns(tableName: String): Dataset[Column]
  • def getTable(tableName: String): Table
  • def getFunction(functionName: String): Function
  • def tableExists(tableName: String): Boolean
  • def functionExists(functionName: String): Boolean

For example, before this PR, listColumns("db1.tab") would look for a table named db1.tab; after this PR, it looks for the table tab in the database db1.
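
A minimal sketch of the new behavior, assuming an active SparkSession `spark` and an existing database `db1` containing a table `tab` (both names are illustrative):

// After this PR, the qualified name is split into database + table name.
spark.catalog.listColumns("db1.tab").show()   // columns of table `tab` in database `db1`
assert(spark.catalog.tableExists("db1.tab"))  // true, resolved against database `db1`

// Before this PR, "db1.tab" was treated as a single unqualified table name in the
// current database, so the same calls would not have found the table.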

@@ -383,36 +439,48 @@ abstract class Catalog {
   * preserved database `global_temp`, and we must use the qualified name to refer a global temp
   * view, e.g. `SELECT * FROM global_temp.view1`.
   *
-  * @param viewName the name of the view to be dropped.
+  * @param viewName the unqualified name of the temporary view to be dropped.
   * @return true if the view is dropped successfully, false otherwise.
   * @since 2.1.0
   */
  def dropGlobalTempView(viewName: String): Boolean
Contributor


hmm, just realized we don't have dropTable in Catalog...
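
For context on the scaladoc quoted above, a small sketch of how a global temporary view is referenced via the reserved `global_temp` database and then dropped by its unqualified name; the view name is illustrative and `spark` is an active SparkSession:

// Global temp views live in the reserved database `global_temp`.
spark.range(10).createGlobalTempView("view1")
spark.sql("SELECT * FROM global_temp.view1").show()   // qualified name required in SQL
assert(spark.catalog.dropGlobalTempView("view1"))     // dropped by its unqualified name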

@cloud-fan
Contributor

@rxin I think these API changes make sense. It is more intuitive for listColumns("db1.tbl1") to return the columns of tbl1 in db1 rather than those of a special table named db1.tbl1; the previous behavior was kind of buggy and not consistent with the other interfaces in Catalog.

LGTM. merging to master!
