Skip to content

Conversation

@bomeng
Copy link
Contributor

@bomeng bomeng commented Apr 1, 2016

What changes were proposed in this pull request?

The syntax of "SHOW FUNCTIONS" can be found here:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowFunctions

By leveraging catalog.listFunctions(), we can get the proper list of functions with proper regex.

Please check listFunctions() in SessionCatalog.scala to see how "*" get escaped.

If we do not escape "*", for pattern "f", it will cause exception (PatternSyntaxException, Dangling meta character) and thus return empty result.

The original approach is to replace "" with "." to make regex work, but it will fail on the following case:
Suppose functions: sha, sha1, sha2; Query string "sha." (sha will not be shown because query becomes "sha.." - notice 2 dots)

How was this patch tested?

I have added test cases for this issue and also verified that it has the consistent result from Hive.

@DeepSparkBot
Copy link

LGTM

@DeepSparkBot
Copy link

@rxin @andrewor14 Please advise.

@andrewor14
Copy link
Contributor

add to whitelist

@andrewor14
Copy link
Contributor

deep-review this please

pattern match {
case Some(p) =>
try {
catalog.listFunctions(database, p).map{f: FunctionIdentifier => Row(f.name)}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Detected style violation.

Suggested improvement:

catalog.listFunctions(database, p).map { f => Row(f.name) }

@DeepSparkBot
Copy link

@andrewor14 Review complete. No major issues found.

@SparkQA
Copy link

SparkQA commented Apr 1, 2016

Test build #54708 has finished for PR 12104 at commit b6c3444.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@bomeng
Copy link
Contributor Author

bomeng commented Apr 1, 2016

@andrewor14 do you want me to change listTables() and InMemoryCatalog as well for the pattern matching? The original implementation has the potential issue as this one. Please let me know. Thanks.

@andrewor14
Copy link
Contributor

Yes, please change those as well.

@SparkQA
Copy link

SparkQA commented Apr 1, 2016

Test build #54720 has finished for PR 12104 at commit 0a7c916.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 1, 2016

Test build #54718 has finished for PR 12104 at commit f505855.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Apr 2, 2016

Test build #54732 has finished for PR 12104 at commit b73216e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 2, 2016

Test build #54741 has finished for PR 12104 at commit b73216e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@bomeng
Copy link
Contributor Author

bomeng commented Apr 2, 2016

@andrewor14 All tests should be passed at this time. The other failed test builds did not pick up my latest commits.

@yhuai
Copy link
Contributor

yhuai commented Apr 4, 2016

@bomeng Can you take a look at #12117 (the fix is part of that PR)? I have to fix this issue while working on that.


private def filterPattern(names: Seq[String], pattern: String): Seq[String] = {
val regex = pattern.replaceAll("\\*", ".*").r
val regex = pattern.replaceAll("\\.\\*", "\\*").replaceAll("\\*", "\\.\\*").r
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding this change, I think it depends on what is the wildcard for any character(s). If it is * (following Hive), then, we should keep the original line.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hive also supports .* though

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Hive only literally supports .*. For something like, show functions like 'year.*';, it returns nothing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewor14 If you want to include my changes in your PR, that is fine. I did not see your changes for that though.
@yhuai I think we need to have consistent behavior as Hive does. Hive supports query like "show functions like 'f' " or "show functions like 'f.*' ", for the first case, our current implementation will cause exception and return empty, the second case could missing some items.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

show functions like '*f*' and show functions like 'f.*' actually return different result. If you have a function called f, 'f.*' will not return f to you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that * and | are only two allowed wildcards.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like Hive's doc does not really match its implementation. I also took a look at postgres (http://www.postgresql.org/docs/9.1/static/app-psql.html#APP-PSQL-PATTERNS) and impala (http://www.cloudera.com/documentation/archive/impala/2-x/2-1-x/topics/impala_show.html#show_functions_unique_1). * will be interpreted as .* in regex. Also, looking at our current implementation, we do not really have a super clear syntax for the pattern. I feel it will be better to decide the syntax first and then make the change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I do see the difference. I will revisit my implementation and provide more consistent implementation shortly. Thanks @yhuai

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a deep investigation of pattern matching for LIKE in show tables/functions. Here is what I found: Hive only supports * and | as wildcards. Use "." to replace "" is right. The only issue is ShowFunctions() in commands.scala currently did not use it, thus cause the test cases fail. By using listFunctions() in SessionCatalog, the problem should be resolved.
I am closing this PR. Thanks.

@bomeng bomeng closed this Apr 4, 2016
@bomeng bomeng deleted the SPARK-14323 branch April 27, 2016 22:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants