[SPARK-39579][SQL][PYTHON][R] Make ListFunctions/getFunction/functionExists compatible with 3 layer namespace #36977

Closed

Conversation

zhengruifeng
Contributor

@zhengruifeng zhengruifeng commented Jun 24, 2022

What changes were proposed in this pull request?

Make ListFunctions/getFunction/functionExists compatible with 3 layer namespace

Why are the changes needed?

to support 3 layer namespace

Does this PR introduce any user-facing change?

yes

How was this patch tested?

added UT

@github-actions github-actions bot added the SQL label Jun 24, 2022
@zhengruifeng zhengruifeng force-pushed the sql_3L_catalog_list_functions branch from aef63ca to 80ebb13 Compare June 28, 2022 08:20
@github-actions github-actions bot added the R label Jun 28, 2022
@zhengruifeng zhengruifeng force-pushed the sql_3L_catalog_list_functions branch from f7acee4 to 67e812c Compare June 29, 2022 02:28
@zhengruifeng zhengruifeng changed the title [SPARK-39579][SQL][WIP] Make ListFunctions compatible with 3 layer namespace [SPARK-39579][SQL] Make ListFunctions compatible with 3 layer namespace Jun 29, 2022
@zhengruifeng zhengruifeng marked this pull request as ready for review June 29, 2022 10:12
@zhengruifeng zhengruifeng changed the title [SPARK-39579][SQL] Make ListFunctions compatible with 3 layer namespace [SPARK-39579][SQL] Make ListFunctions/getFunction/functionExists compatible with 3 layer namespace Jun 30, 2022
@zhengruifeng zhengruifeng force-pushed the sql_3L_catalog_list_functions branch from 67e812c to fe583c7 Compare June 30, 2022 02:44
@zhengruifeng zhengruifeng changed the title [SPARK-39579][SQL] Make ListFunctions/getFunction/functionExists compatible with 3 layer namespace [SPARK-39579][SQL][WIP] Make ListFunctions/getFunction/functionExists compatible with 3 layer namespace Jun 30, 2022
@zhengruifeng zhengruifeng force-pushed the sql_3L_catalog_list_functions branch from fe583c7 to eb98e66 Compare June 30, 2022 08:32
@zhengruifeng zhengruifeng changed the title [SPARK-39579][SQL][WIP] Make ListFunctions/getFunction/functionExists compatible with 3 layer namespace [SPARK-39579][SQL][PYTHON] Make ListFunctions/getFunction/functionExists compatible with 3 layer namespace Jun 30, 2022
@zhengruifeng zhengruifeng force-pushed the sql_3L_catalog_list_functions branch from 775970c to 6d2e578 Compare June 30, 2022 13:12
@@ -2139,16 +2139,16 @@ class Analyzer(override val catalogManager: CatalogManager)
   }

   def lookupBuiltinOrTempFunction(name: Seq[String]): Option[ExpressionInfo] = {
-    if (name.length == 1) {
-      v1SessionCatalog.lookupBuiltinOrTempFunction(name.head)
+    if (name.length == 1 || name.length == 3) {
Contributor Author

to support looking up a temp function with a 3-layer identifier.

@zhengruifeng zhengruifeng changed the title [SPARK-39579][SQL][PYTHON] Make ListFunctions/getFunction/functionExists compatible with 3 layer namespace [SPARK-39579][SQL][PYTHON][R] Make ListFunctions/getFunction/functionExists compatible with 3 layer namespace Jun 30, 2022
@zhengruifeng zhengruifeng force-pushed the sql_3L_catalog_list_functions branch 2 times, most recently from b21756d to 93c8793 Compare July 1, 2022 01:28
@zhengruifeng
Contributor Author

all tests passed; I just rebased to fix conflicts

@zhengruifeng
Contributor Author

cc @cloud-fan @amaliujia could you please take a look when you find some time?

@cloud-fan
Contributor

@zhengruifeng sorry it has conflicts now...

@zhengruifeng
Contributor Author

@cloud-fan don't worry, let me update the PR

@zhengruifeng zhengruifeng force-pushed the sql_3L_catalog_list_functions branch from 93c8793 to a3a50b9 Compare July 1, 2022 02:20
@@ -2080,7 +2080,7 @@ class Analyzer(override val catalogManager: CatalogManager)
           throw QueryCompilationErrors.expectPersistentFuncError(
             nameParts.head, cmd, mismatchHint, u)
         } else {
-          ResolvedNonPersistentFunc(nameParts.head, V1Function(info))
+          ResolvedNonPersistentFunc(nameParts.last, V1Function(info))
Contributor

is it a bug?

Contributor Author

I think it is a bug. The first argument should always be the function name:

/**
 * A plan containing resolved non-persistent (temp or built-in) function.
 */
case class ResolvedNonPersistentFunc(
    name: String,
    func: UnboundFunction)
  extends LeafNodeWithoutStats {
  override def output: Seq[Attribute] = Nil
}

Contributor

can we have a separate PR to fix this bug?

Contributor Author

sure!

Contributor Author

sorry, my change was confusing, it is not a bug:

        lookupBuiltinOrTempFunction(nameParts)
          .orElse(lookupBuiltinOrTempTableFunction(nameParts))

This can be non-empty only when nameParts.length == 1, so nameParts.head is fine.
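The reasoning in this thread can be checked with a small model. The sketch below is plain Python, not Spark code: `lookup_builtin_or_temp`, `resolved_name`, and the `BUILTIN_OR_TEMP` registry are hypothetical stand-ins, and it assumes (as stated in the comment above) that the built-in/temp lookup only succeeds for single-part names.

```python
from typing import Optional, Sequence

# Hypothetical registry of built-in/temp function names (illustration only).
BUILTIN_OR_TEMP = {"abs", "my_temp_func"}


def lookup_builtin_or_temp(name_parts: Sequence[str]) -> Optional[str]:
    """Model of a lookup that only matches single-part names."""
    if len(name_parts) == 1 and name_parts[0] in BUILTIN_OR_TEMP:
        return name_parts[0]
    return None


def resolved_name(name_parts: Sequence[str]) -> Optional[str]:
    """Return the name a resolved non-persistent function would carry."""
    if lookup_builtin_or_temp(name_parts) is None:
        return None
    # The lookup succeeded, so there is exactly one part: head == last.
    assert name_parts[0] == name_parts[-1]
    return name_parts[0]
```

Under that assumption, using the first or the last name part makes no difference, because a multi-part name never reaches the branch that builds the resolved node.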

@@ -2139,16 +2139,16 @@ class Analyzer(override val catalogManager: CatalogManager)
   }

   def lookupBuiltinOrTempFunction(name: Seq[String]): Option[ExpressionInfo] = {
-    if (name.length == 1) {
-      v1SessionCatalog.lookupBuiltinOrTempFunction(name.head)
+    if (name.length == 1 || name.length == 3) {
Contributor

why is this change needed?

Contributor Author

to look up temp function with name = spark_default.default.add

to look up temp function with name = spark_default.default.+

Contributor Author

let me try to handle system functions and user functions separately, then the changes in Analyzer can be reverted.

// a qualified namespace with catalog name. We assume it's a single database name
// and check if we can find the dbName in sessionCatalog. If so we listFunctions under
// that database. Otherwise we try 3-part name parsing and locate the database.
if (sessionCatalog.databaseExists(dbName) || sessionCatalog.isGlobalTempViewDB(dbName)) {
Contributor

we don't need to check global temp view db, as we are listing functions.

   }

   private def makeFunction(funcIdent: FunctionIdentifier): Function = {
-    val metadata = sessionCatalog.lookupFunctionInfo(funcIdent)
+    val metadata = try {
+      Some(sessionCatalog.getFunctionMetadata(funcIdent))
Contributor

why do we make this change?

Contributor Author

let me revert this

@zhengruifeng zhengruifeng force-pushed the sql_3L_catalog_list_functions branch from 1ca922c to 1a491a3 Compare July 1, 2022 07:03
val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(functionName)
val catalog =
sparkSession.sessionState.catalogManager.catalog(ident(0)).asFunctionCatalog
catalog.functionExists(Identifier.of(Array(ident(1)), ident(2)))
Contributor

This assumes the first name part is always the catalog, which is not the case. We may use the current catalog. I think we should rely on the analyzer and use UnresolvedFunc.
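To illustrate the concern, here is a minimal sketch of name resolution that does not assume the first part is a catalog. It is plain Python rather than Spark code, and `resolve_function_name` and all of its parameters are hypothetical names. The rule it models (treat the first part as a catalog only if it is registered, otherwise fall back to the current catalog) is an assumption about what the analyzer-based approach would do, not a copy of it.

```python
from typing import List, Sequence, Set, Tuple


def resolve_function_name(
    name_parts: Sequence[str],
    registered_catalogs: Set[str],
    current_catalog: str,
    current_namespace: Sequence[str],
) -> Tuple[str, List[str], str]:
    """Split a multipart name into (catalog, namespace, function name)."""
    if not name_parts:
        raise ValueError("empty identifier")
    if len(name_parts) == 1:
        # Unqualified name: use the current catalog and namespace.
        return current_catalog, list(current_namespace), name_parts[0]
    head, rest = name_parts[0], list(name_parts[1:])
    if head in registered_catalogs:
        # First part names a registered catalog; the rest is namespace + name.
        return head, rest[:-1], rest[-1]
    # First part is not a catalog: the whole prefix is a namespace
    # inside the current catalog.
    return current_catalog, list(name_parts[:-1]), name_parts[-1]
```

With this shape, a two-part name such as `db.f` resolves inside the current catalog instead of treating `db` as a catalog, which is the case the indexing by `ident(0)`, `ident(1)`, `ident(2)` above does not cover.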

init

add class name

fix ut

add getFunction & functionExists

add ut for testcat

update the py side

nit
@zhengruifeng zhengruifeng force-pushed the sql_3L_catalog_list_functions branch from ec0784b to 4ea562c Compare July 2, 2022 01:19
@zhengruifeng
Contributor Author

@cloud-fan could you please take another look? thanks!

also cc @HyukjinKwon if you are interested in this.

@zhengruifeng
Contributor Author

gentle ping @cloud-fan

@@ -288,19 +301,75 @@ def functionExists(self, functionName: str, dbName: Optional[str] = None) -> boo
name of the database to check function existence in.
If no database is specified, the current database is used

.. deprecated:: 3.4.0
Contributor

@amaliujia we need to do the same deprecation in scala

Contributor

ack, will follow up on the Scala side.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in a2c1038 Jul 5, 2022
@zhengruifeng
Contributor Author

@cloud-fan Thank you for the review

@zhengruifeng zhengruifeng deleted the sql_3L_catalog_list_functions branch July 5, 2022 03:29
Member

@HyukjinKwon HyukjinKwon left a comment

LGTM2

"a future version. Use functionExists(`dbName.tableName`) instead.",
FutureWarning,
)
return self._jcatalog.functionExists(self.currentDatabase(), functionName)
Contributor Author

oops, this should be self._jcatalog.functionExists(dbName, functionName)

let me fix it in a follow up
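A minimal sketch of the intended behaviour, for reference. It is self-contained Python with no Spark dependency: `FakeJCatalog` is a hypothetical stand-in for the JVM catalog handle, the warning text is a placeholder rather than the real message, and the `dbName is None` branch is an assumption, since that part of the method is not shown in the hunk above.

```python
import warnings
from typing import List, Optional, Tuple


class FakeJCatalog:
    """Hypothetical stand-in for the JVM catalog handle; records each call."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, ...]] = []

    def functionExists(self, *args: str) -> bool:
        self.calls.append(args)
        return True


class Catalog:
    """Toy model of the Python-side wrapper, reduced to functionExists."""

    def __init__(self, jcatalog: FakeJCatalog) -> None:
        self._jcatalog = jcatalog

    def functionExists(self, functionName: str, dbName: Optional[str] = None) -> bool:
        if dbName is None:
            # Assumed branch: pass the (possibly qualified) name through.
            return self._jcatalog.functionExists(functionName)
        warnings.warn(
            "`dbName` is deprecated; pass a qualified function name instead.",
            FutureWarning,
        )
        # Forward the caller's dbName, not the current database.
        return self._jcatalog.functionExists(dbName, functionName)
```

The only point of the model is the last line: when the caller supplies `dbName`, that value is what gets forwarded, rather than the current database.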

zhengruifeng added a commit that referenced this pull request Jul 5, 2022
…me) when dbName is not None

### What changes were proposed in this pull request?
fix functionExists(functionName, dbName)

### Why are the changes needed?
#36977 introduced a bug in `functionExists(functionName, dbName)`: when dbName is not None, it should call `self._jcatalog.functionExists(dbName, functionName)`

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
existing testsuite

Closes #37088 from zhengruifeng/py_3l_fix_functionExists.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
4 participants