
[SPARK-30039][SQL] CREATE FUNCTION should do multi-catalog resolution #26890

Conversation

planga82 (Contributor):

What changes were proposed in this pull request?

Add CreateFunctionStatement and make CREATE FUNCTION go through the same catalog/table resolution framework as v2 commands.
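
For reference, the new statement could look roughly like the sketch below; the field names are inferred from the review discussion further down and may not match the merged patch exactly.

// Rough sketch of the new parsed logical statement; field names are inferred
// from this review and may differ from the merged code.
// FunctionResource and ParsedStatement are the existing catalyst types.
case class CreateFunctionStatement(
    functionName: Seq[String],        // multi-part function name from the parser
    className: String,                // class implementing the function
    resources: Seq[FunctionResource], // JAR/FILE/ARCHIVE resources for the function
    isTemp: Boolean,
    ignoreIfExists: Boolean,
    replace: Boolean) extends ParsedStatement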

Why are the changes needed?

It's important to make all the commands have the same table resolution behavior, to avoid confusion, for example when running CREATE FUNCTION namespace.function.

Does this PR introduce any user-facing change?

Yes. When running CREATE FUNCTION namespace.function, Spark fails the command if the current catalog is set to a v2 catalog.

How was this patch tested?

Unit tests.
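
For illustration, a unit test for the new behavior could look like the sketch below; the test name, the testcat catalog setup, and the exact error message are assumptions, not copied from the patch.

// Hypothetical test sketch: CREATE FUNCTION should fail when the current
// catalog is a v2 catalog. Assumes a v2 catalog registered as 'testcat'
// in the test session; names and the message are assumptions.
test("CREATE FUNCTION is rejected when the current catalog is a v2 catalog") {
  sql("USE testcat")
  val e = intercept[AnalysisException] {
    sql("CREATE FUNCTION ns.fun AS 'com.example.SomeUdf'")
  }
  assert(e.getMessage.contains("CREATE FUNCTION is only supported in v1 catalog"))
}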


@dongjoon-hyun (Member):

ok to test.

}

  private def parseSessionCatalogFunctionIdentifier(sql: String, catalog: CatalogPlugin,
      functionIdent: Identifier): (Option[String], String) = {

Member:

-  private def parseSessionCatalogFunctionIdentifier(sql: String, catalog: CatalogPlugin,
+  private def parseSessionCatalogFunctionIdentifier(
+      sql: String,
+      catalog: CatalogPlugin,

} else {
throw new AnalysisException ("SHOW FUNCTIONS is only supported in v1 catalog")
}
val (db, fn) =

Member:

Maybe val (database, function) = for consistency? Oh, got it. It's used above.

}
DescribeFunctionCommand(functionIdentifier, extended)
val (database, function) =
parseSessionCatalogFunctionIdentifier("DESCRIBE FUNCTION", catalog, functionIdent)

Member:

Just a question. Is it hard to make SessionCatalogAndFunction?


planga82 (Contributor Author):

It would be possible if we had v2 catalog functions to resolve. But for now we need to handle a specific error for this case:

throw new AnalysisException(s"$sql is only supported in v1 catalog")

If we don't handle it here, we get an unresolved-operator exception that is not clear.
It is the same situation we have with the commands that use the parseV1Table function.


Contributor:

I think we can do it if we make the error message generic ("Function command is only supported in v1 catalog")? Either approach sounds reasonable to me, but +1 for using an extractor.


planga82 (Contributor Author):

I'm not sure what exactly you are proposing. We could have something like this:

case DescribeFunctionStatement(SessionCatalogAndFunction(catalog, functionIdent), extended) =>
...
case DropFunctionStatement(SessionCatalogAndFunction(catalog, functionIdent), ifExists, isTemp) =>
...
case CreateFunctionStatement(SessionCatalogAndFunction(catalog, functionIdent),
      className, resources, isTemp, ignoreIfExists, replace) =>
...

where SessionCatalogAndFunction is an extractor that tests isSessionCatalog, similar to what SessionCatalogAndTable does.
If !isSessionCatalog, none of these cases match, and the result is an unclear error such as unresolved operator 'CreateFunctionStatement [testcat, ns1, ns2, fun] for CREATE FUNCTION.

How could we catch this case generically and throw a generic exception like you propose?
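
For illustration only, such an extractor might look like the sketch below (not part of this patch; CatalogAndIdentifier and isSessionCatalog are the existing helpers, assumed to be in scope):

// Hypothetical sketch of the extractor being discussed; not part of the patch.
object SessionCatalogAndFunction {
  def unapply(nameParts: Seq[String]): Option[(CatalogPlugin, Identifier)] =
    nameParts match {
      case CatalogAndIdentifier(catalog, ident) if isSessionCatalog(catalog) =>
        Some((catalog, ident))
      case _ =>
        // Returning None leaves the statement unresolved, which surfaces as the
        // unclear "unresolved operator" error mentioned above; producing a
        // generic message instead would require throwing inside the extractor.
        None
    }
}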


Contributor:

Ah OK. This requires throwing an exception inside the extractor, which may not be ideal.


SparkQA commented Dec 15, 2019

Test build #115339 has finished for PR 26890 at commit 23920b2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class CreateFunctionStatement(


SparkQA commented Dec 15, 2019

Test build #115355 has finished for PR 26890 at commit c74974c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

replace)
}

private def parseSessionCatalogFunctionIdentifier(

Contributor:

I would return FunctionIdentifier to match the function name; using FunctionIdentifier also makes the intention and usage clear at the call site.


planga82 (Contributor Author):

oh, yes, it sounds good!
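
Putting the fragments quoted in this review together, the revised helper could look roughly like this (a sketch assembled from the snippets above and below; the merged patch may differ in details):

// Sketch assembled from the fragments quoted in this review; not a verbatim
// copy of the merged patch. The 'quoted' helper comes from CatalogV2Implicits.
private def parseSessionCatalogFunctionIdentifier(
    sql: String,
    catalog: CatalogPlugin,
    functionIdent: Identifier): FunctionIdentifier = {
  if (isSessionCatalog(catalog)) {
    // Flatten the v2 Identifier into its name parts (namespace + name).
    (functionIdent.namespace.toSeq :+ functionIdent.name) match {
      case Seq(db, fn) => FunctionIdentifier(fn, Some(db))
      case Seq(fn) => FunctionIdentifier(fn, None)
      case _ =>
        throw new AnalysisException(s"Unsupported function name '${functionIdent.quoted}'")
    }
  } else {
    throw new AnalysisException(s"$sql is only supported in v1 catalog")
  }
}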

parseSessionCatalogFunctionIdentifier("DROP FUNCTION", catalog, functionIdent)
DropFunctionCommand(database, function, ifExists, isTemp)

case CreateFunctionStatement(CatalogAndIdentifier(catalog, functionIdent),

imback82 (Contributor) commented Dec 16, 2019:

I would use ident when you use CatalogAndIdentifier instead of functionIdent. (And use functionIdent when it references a FunctionIdentifier object).


SparkQA commented Dec 16, 2019

Test build #115368 has finished for PR 26890 at commit 74275ff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

case Seq(db, fn) => FunctionIdentifier(fn, Some(db))
case Seq(fn) => FunctionIdentifier(fn, None)
case _ =>
throw new AnalysisException(s"Unsupported function name '${functionIdent.quoted}'")

Member:

This is not caused by this PR, but this error message looks too general, and I don't get much info from it.

Maybe we can make it clearer? Something like "v1 function name should not have multiple namespaces"?


planga82 (Contributor Author) commented Dec 17, 2019:

In the SHOW COLUMNS statement we decided on this message:

Namespace name should have only one part if specified

How about the same or a similar message? Both are clearer.


SparkQA commented Dec 17, 2019

Test build #115409 has finished for PR 26890 at commit 707b037.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

New commits:
  • [SPARK-30039][SQL] Pull requests comments
  • [SPARK-30039][SQL] Pull request comments
  • [SPARK-30039][SQL] Pull request comments

planga82 force-pushed the feature/SPARK-30039_CreateFunctionV2Command branch from 707b037 to 3796a3a (January 6, 2020).

SparkQA commented Jan 6, 2020

Test build #116184 has finished for PR 26890 at commit 3796a3a.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class CreateFunctionStatement(


SparkQA commented Jan 6, 2020

Test build #116185 has finished for PR 26890 at commit d9eb441.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

planga82 (Contributor Author) commented Jan 7, 2020:

Hi @cloud-fan @viirya @imback82 @dongjoon-hyun! Any more comments on this?

cloud-fan (Contributor):

thanks, merging to master!

cloud-fan closed this in 9479887 on Jan 7, 2020.