-
Notifications
You must be signed in to change notification settings - Fork 29.1k
[SPARK-14323] [SQL] fix the show functions by using catalog listFunctions #12104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
LGTM |
|
@rxin @andrewor14 Please advise. |
|
add to whitelist |
|
deep-review this please |
| pattern match { | ||
| case Some(p) => | ||
| try { | ||
| catalog.listFunctions(database, p).map{f: FunctionIdentifier => Row(f.name)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Detected style violation.
Suggested improvement:
catalog.listFunctions(database, p).map { f => Row(f.name) }
|
@andrewor14 Review complete. No major issues found. |
|
Test build #54708 has finished for PR 12104 at commit
|
|
@andrewor14 do you want me to change listTables() and InMemoryCatalog as well for the pattern matching? The original implementation has the potential issue as this one. Please let me know. Thanks. |
|
Yes, please change those as well. |
|
Test build #54720 has finished for PR 12104 at commit
|
|
Test build #54718 has finished for PR 12104 at commit
|
|
retest this please |
|
Test build #54732 has finished for PR 12104 at commit
|
|
Test build #54741 has finished for PR 12104 at commit
|
|
@andrewor14 All tests should be passed at this time. The other failed test builds did not pick up my latest commits. |
|
|
||
| private def filterPattern(names: Seq[String], pattern: String): Seq[String] = { | ||
| val regex = pattern.replaceAll("\\*", ".*").r | ||
| val regex = pattern.replaceAll("\\.\\*", "\\*").replaceAll("\\*", "\\.\\*").r |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding this change, I think it depends on what is the wildcard for any character(s). If it is * (following Hive), then, we should keep the original line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hive also supports .* though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Hive only literally supports .*. For something like, show functions like 'year.*';, it returns nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andrewor14 If you want to include my changes in your PR, that is fine. I did not see your changes for that though.
@yhuai I think we need to have consistent behavior as Hive does. Hive supports query like "show functions like 'f' " or "show functions like 'f.*' ", for the first case, our current implementation will cause exception and return empty, the second case could missing some items.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
show functions like '*f*' and show functions like 'f.*' actually return different result. If you have a function called f, 'f.*' will not return f to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that * and | are only two allowed wildcards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like Hive's doc does not really match its implementation. I also took a look at postgres (http://www.postgresql.org/docs/9.1/static/app-psql.html#APP-PSQL-PATTERNS) and impala (http://www.cloudera.com/documentation/archive/impala/2-x/2-1-x/topics/impala_show.html#show_functions_unique_1). * will be interpreted as .* in regex. Also, looking at our current implementation, we do not really have a super clear syntax for the pattern. I feel it will be better to decide the syntax first and then make the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I do see the difference. I will revisit my implementation and provide more consistent implementation shortly. Thanks @yhuai
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a deep investigation of pattern matching for LIKE in show tables/functions. Here is what I found: Hive only supports * and | as wildcards. Use "." to replace "" is right. The only issue is ShowFunctions() in commands.scala currently did not use it, thus cause the test cases fail. By using listFunctions() in SessionCatalog, the problem should be resolved.
I am closing this PR. Thanks.
What changes were proposed in this pull request?
The syntax of "SHOW FUNCTIONS" can be found here:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowFunctions
By leveraging catalog.listFunctions(), we can get the proper list of functions with proper regex.
Please check listFunctions() in SessionCatalog.scala to see how "*" get escaped.
If we do not escape "*", for pattern "f", it will cause exception (PatternSyntaxException, Dangling meta character) and thus return empty result.
The original approach is to replace "" with "." to make regex work, but it will fail on the following case:
Suppose functions: sha, sha1, sha2; Query string "sha." (sha will not be shown because query becomes "sha.." - notice 2 dots)
How was this patch tested?
I have added test cases for this issue and also verified that it has the consistent result from Hive.