[SPARK-16284][SQL] Implement reflect SQL function #13969
petermaxlee wants to merge 4 commits into apache:master from
is it similar to StaticInvoke?
Thanks for pointing that out. It looks similar, but has some subtle differences:
- This one can invoke non-static methods.
- This one does type conversion, and as a result is more user facing. StaticInvoke seems to be used in internal implementations?
- This is a SQL function - why was StaticInvoke a "nonSQL" function?
- This one supports non-codegen.
Perhaps we can push this in, and I can look into whether it'd make sense to consolidate the two?
This one can invoke non-static methods? How do we pass in the object reference?
I'm ok to leave them separated, as this one is user-facing and StaticInvoke is used internally.
It assumes there is a no-arg constructor and creates an instance of the class automatically. That's what reflect does in Hive.
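To make the behaviour described above concrete, here is a minimal sketch using plain Java reflection from Scala. It is not the code in this PR: `ReflectSketch` and `call` are hypothetical names, and the sketch assumes every argument is a String. A static method is invoked with a null receiver; for a non-static method, an instance is created through the public no-arg constructor first.

```scala
import java.lang.reflect.Modifier

// Hypothetical helper for illustration only; not the expression added by this PR.
object ReflectSketch {
  // Assumption: all arguments are Strings, so every parameter type is String.
  def call(className: String, methodName: String, args: String*): AnyRef = {
    val clazz: Class[_] = Class.forName(className)
    val paramTypes: Seq[Class[_]] = args.map(_ => classOf[String])
    val method = clazz.getMethod(methodName, paramTypes: _*)
    // Static methods take a null receiver; otherwise build an instance
    // through the public no-arg constructor.
    val target: AnyRef =
      if (Modifier.isStatic(method.getModifiers)) null
      else clazz.getConstructor().newInstance().asInstanceOf[AnyRef]
    method.invoke(target, args: _*)
  }
}
```

For example, `ReflectSketch.call("java.util.UUID", "fromString", someUuidString)` takes the static path, while `ReflectSketch.call("java.lang.String", "concat", "abc")` instantiates an empty String first. A class without a public no-arg constructor makes `getConstructor()` throw in this sketch.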
|
Test build #3146 has finished for PR 13969 at commit
|
|
Test build #3152 has finished for PR 13969 at commit
|
What's hive's rule? This looks reasonable but I wanna make sure we don't miss anything.
Hive follows the same thing for the subset we are supporting here. Hive however also supports timestamps, decimals, etc, that this one is not supporting yet.
|
I brought this up to date. Any comments on the pull request? |
|
cc @dongjoon-hyun to take a look |
|
Oh, sure. @cloud-fan . |
You can use eval() instead of eval(null).
|
By the way, could we change the expression name? I prefer something simple like just |
|
In general, the noticeable big difference from Hive seems to be the limitation on |
|
Sorry for being late to the party. I've done my first pass. |
|
@dongjoon-hyun it's ok to not support the Hive case here. The majority of use cases will be calling some literal functions anyway. |
|
Oh, in that case, no problem. :) |
|
JavaReflectMethod isn't the best, but calling it Reflect is pretty bad because there are many "Reflect" classes in various libraries -- just within Spark dependencies there are 3 classes called Reflect. |
|
Alright - I've updated it. The expression is now called Reflect, and it prints different messages depending on whether it is a class not found or a method not found. |
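As a rough illustration of reporting the two failure modes separately, the sketch below looks up the class first and the method second, returning a different message for each failure. `ReflectLookupSketch`, `lookup`, and the message wording are all hypothetical; the PR's actual messages are not shown in this thread.

```scala
import java.lang.reflect.Method
import scala.util.Try

// Hypothetical helper for illustration only; names and messages are assumptions.
object ReflectLookupSketch {
  def lookup(
      className: String,
      methodName: String,
      argTypes: Seq[Class[_]]): Either[String, Method] = {
    Try(Class.forName(className)).toOption match {
      // The class itself could not be loaded.
      case None => Left(s"class $className not found")
      case Some(clazz) =>
        Try(clazz.getMethod(methodName, argTypes: _*)).toOption match {
          // The class exists, but no public method matches the name and argument types.
          case None => Left(s"cannot find method $methodName in class $className")
          case Some(method) => Right(method)
        }
    }
  }
}
```

Checking the class before the method is what lets the caller tell a mistyped class name apart from a mistyped method name or a mismatched argument list.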
|
Ping! |
Could you add a single space after the newline, e.g. '\n' -> '\n '?
In many cases, we do that.
|
Oh, sorry, @petermaxlee . |
Currently, this is a boolean. Can we use this for val clazz: Class[_] instead?
For false, it could be null.
What I mean is the following.
- } else if (!classExists) {
+ } else if (clazz.getOrElse(null) == null) {
...
- @transient private lazy val classExists = Reflect.classExists(className)
+ @transient private lazy val clazz = Reflect.findClass(className)
...
- private def classExists(className: String): Boolean = { ... }
+ private def findClass(className: String): Try[Class[_]] = Try(Utils.classForName(className))
...
- Reflect.findMethod(className, methodName, ...
+ Reflect.findMethod(clazz.get, methodName, ...
...
- Reflect.instantiate(className).orNull.asInstanceOf[Object]
+ Reflect.instantiate(clazz.get).orNull.asInstanceOf[Object]
...
- def findMethod(className: String, methodName: String, ...
+ def findMethod(clazz: Class[_], methodName: String, ...
...
- def instantiate(className: String): Option[Any] = {
+ def instantiate(clazz: Class[_]): Option[Any] = {
It'd make the unit tests more annoying to write. I kind of prefer doing it this way, since the cost of creating a class 3 times is very small given it's created only once.
Let's forget about Try. It's not good style anyway.
BTW, do you mean Utils.classForName is called once in this PR?
> given it's created only once.
What about timestamps, dates, decimals, arrays, maps? I suppose structs are entirely out of the question? If they are, please document this.
Let me add a comment saying only string is supported for now.
|
@petermaxlee are you sure that we shouldn't implement this using |
|
@hvanhovell in its current form we'd need some refactoring to StaticInvoke to work with this, due to Hive allowing both static invocation and dynamic invocation. Also - does StaticInvoke do type conversion? If not, it seems like extra work is needed in order to work with RuntimeReplaceable, due to a limitation that it does not really support proper type checking for expressions. As for String, I remember @rxin making a comment somewhere that string type is sufficient for now. |
|
If you mean the following mention, what @rxin said was not about the return type. It's about
|
|
I've pushed a new commit that addresses the review comments. |
|
Test build #3174 has finished for PR 13969 at commit
|
  usage = "_FUNC_(class,method[,arg1[,arg2..]]) calls method with reflection",
  extended = "> SELECT _FUNC_('java.util.UUID', 'randomUUID');\n c33fb387-8500-4bfa-81d2-6e0e3e930df2")
// scalastyle:on line.size.limit
case class Reflect(children: Seq[Expression])
Reflect is really ambiguous, how about CallMethod?
I actually named it JavaMethodReflect before but @dongjoon-hyun asked to use Reflect.
It is also annoying if we search for reflect (based on the name) and then don't find an expression with reflect in the name.
Ya. It's my fault. Sorry for that.
So what's a good name? I am not attached to Reflect, but I think Reflect should be in the name, if the function is called reflect.
|
What's Hive's behaviour when calling a non-static method on a class that doesn't have a no-arg constructor? Null or an exception? |
|
One thing ... it might be better to remove the ability to call non-static methods. At least to me it'd make things slightly simpler and clearer. |
|
You can also remove half of the test cases. |
|
I submitted a new pull request #14138 and in that version only static methods are supported. |
## What changes were proposed in this pull request?
This patch implements reflect SQL function, which can be used to invoke a Java method in SQL. Slightly different from Hive, this implementation requires the class name and the method name to be literals. This implementation also supports only a smaller number of data types, and requires the function to be static, as suggested by rxin in #13969.
java_method is an alias for reflect, so this should also resolve SPARK-16277.
## How was this patch tested?
Added expression unit tests and an end-to-end test.
Author: petermaxlee <petermaxlee@gmail.com>
Closes #14138 from petermaxlee/reflect-static.
(cherry picked from commit 56bd399)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
|
I am going to close this one since #14138 has been merged. |
What changes were proposed in this pull request?
This patch implements reflect SQL function, which can be used to invoke a Java method in SQL. Slightly different from Hive, this implementation requires the class name and the method name to be literals. This implementation also supports only a smaller number of data types.
java_method is an alias for reflect, so this should also resolve SPARK-16277.
How was this patch tested?
Added expression unit tests and an end-to-end test.