Skip to content

[SPARK-16284][SQL] Implement reflect SQL function#13969

Closed
petermaxlee wants to merge 4 commits intoapache:masterfrom
petermaxlee:reflect
Closed

[SPARK-16284][SQL] Implement reflect SQL function#13969
petermaxlee wants to merge 4 commits intoapache:masterfrom
petermaxlee:reflect

Conversation

@petermaxlee
Copy link
Contributor

@petermaxlee petermaxlee commented Jun 29, 2016

What changes were proposed in this pull request?

This patch implements reflect SQL function, which can be used to invoke a Java method in SQL. Slightly different from Hive, this implementation requires the class name and the method name to be literals. This implementation also supports only a smaller number of data types.

java_method is an alias for reflect, so this should also resolve SPARK-16277.

How was this patch tested?

Added expression unit tests and an end-to-end test.

@petermaxlee
Copy link
Contributor Author

cc @rxin @cloud-fan @dongjoon-hyun

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it similar to StaticInvoke?

Copy link
Contributor Author

@petermaxlee petermaxlee Jun 29, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out. It looks similar, but has some subtle differences:

  1. This one can invoke non-static methods.
  2. This one does type conversion, and as a result is more user facing. StaticInvoke seems to be used in internal implementations?
  3. This is a SQL function - why was StaticInvoke a "nonSQL" function?
  4. This one supports non-codegen.

Perhaps we can push this in, and I can look into whether it'd make sense to consolidate the two?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one can invoke non-static methods? How do we pass in the object reference?

I'm ok to leave them separated as this is one is userfacing and StaticInvoke is used internally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It assumes there is a no-arg constructor and creates an instance of the class automatically. That's what reflect does in Hive.

@SparkQA
Copy link

SparkQA commented Jun 29, 2016

Test build #3146 has finished for PR 13969 at commit dfe3480.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class JavaMethodReflect(children: Seq[Expression])

@SparkQA
Copy link

SparkQA commented Jun 30, 2016

Test build #3152 has finished for PR 13969 at commit 0e43c95.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class JavaMethodReflect(children: Seq[Expression])

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's hive's rule? This looks reasonable but I wanna make sure we don't miss anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hive follows the same thing for the subset we are supporting here. Hive however also supports timestamps, decimals, etc, that this one is not supporting yet.

@petermaxlee
Copy link
Contributor Author

I brought this up to date. Any comments on the pull request?

@cloud-fan
Copy link
Contributor

cc @dongjoon-hyun to take a look

@dongjoon-hyun
Copy link
Member

Oh, sure. @cloud-fan .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use eval() instead of eval(null).

@dongjoon-hyun
Copy link
Member

By the way, could we change the expression name?
I know the reason why you choose JavaMethodReflect, but we can call Scala functions, too.

select reflect('scala.Console','println', 'a')

I prefer something simple like just case class Reflect. How do you think about this, @cloud-fan and @petermaxlee ?

@dongjoon-hyun
Copy link
Member

In general, the noticeable big difference from Hive seems to be the limitation on first and second arguments. Hive supports columns for them, too.
IMHO, we can achieve that by putting all the @transient private lazy val into eval function and moving the following feature into findMethod or other places.

    } else if (method == null) {
      TypeCheckFailure("cannot find a method that matches the argument types")

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Jul 6, 2016

Sorry for late to the party. I've done my first pass.
This PR looks very valuable and nice.

@rxin
Copy link
Contributor

rxin commented Jul 6, 2016

@dongjoon-hyun it's ok to not support the hive case here. Majority of the use cases will be calling some literal functions anyway.

@dongjoon-hyun
Copy link
Member

Oh, if then, no problem. :)

@petermaxlee
Copy link
Contributor Author

JavaReflectMethod isn't the best, but calling it Reflect is pretty bad because there are many "Reflect" classes in various libraries -- just within Spark dependencies there are 3 classes called Reflect.

@petermaxlee
Copy link
Contributor Author

Alright - I've updated it. The expression is now called Reflect, and it prints different messages depending on whether it is a class not found or a method not found.

@petermaxlee
Copy link
Contributor Author

Ping!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a single space after newline, e.g. '\n->\n `?
In many cases, we do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Jul 8, 2016

Oh, sorry, @petermaxlee .
It seems we are calling Utils.classForName(className) three places.
Could we reduce the number of calling? I'll leave a comment about this in code, too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, this is a boolean. Can we use this for val clazz: Class[_] instead?
For false, it could be null.

Copy link
Member

@dongjoon-hyun dongjoon-hyun Jul 8, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is the following.

- } else if (!classExists) {
+ } else if (clazz.getOrElse(null) == null) {
...
- @transient private lazy val classExists = Reflect.classExists(className)
+ @transient private lazy val clazz = Reflect.findClass(className)
...
- private def classExists(className: String): Boolean = { ... }
+ private def findClass(className: String): Try[Class[_]] = Try(Utils.classForName(className))
...
- Reflect.findMethod(className, methodName, ...
+ Reflect.findMethod(clazz.get, methodName, ...
...
- Reflect.instantiate(className).orNull.asInstanceOf[Object]
+ Reflect.instantiate(clazz.get).orNull.asInstanceOf[Object]
...
- def findMethod(className: String, methodName: String, ...
+ def findMethod(clazz: Class[_], methodName: String, ...
...
- def instantiate(className: String): Option[Any] = {
+ def instantiate(clazz: Class[_]): Option[Any] = {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd make unit test more annoying to write. I kind of prefer doing it this way, since the cost of creating a class 3 times is very small given it's created only once.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's forget about Try. It's not a good style, too.

BTW, do you mean Utils.classForName is called once in this PR?

given it's created only once.

Copy link
Contributor

@hvanhovell hvanhovell Jul 8, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about timestamps, dates, decimals, arrays, maps? I suppose structs are entirely out of the question? If they are please document this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me add a comment saying only string is supported for now.

@hvanhovell
Copy link
Contributor

@petermaxlee are you sure that we shouldn't implement this using RuntimeReplaceable, using NewInstance -> Invoke or a StaticInvoke? The added value is that this support code generation, and reduces code duplication.

@petermaxlee
Copy link
Contributor Author

@hvanhovell in its current form we'd need some refactoring to StaticInvoke to work with this, due to Hive allowing both static invocation and dynamic invocation.

Also - does StaticInvoke do type conversion? If not, it seems like extra work is needed in order to work with RuntimeReplaceable, due to a limitation that it does not really support proper type checking for expressions.

As for String, I remember @rxin making a comment somewhere that string type is sufficient for now.

@dongjoon-hyun
Copy link
Member

If you mean the following mention, what @rxin said was not about the return type. It's about function and method name, isn't it? Is there any other mentions about String?

Majority of the use cases will be calling some literal functions anyway.

@petermaxlee
Copy link
Contributor Author

I've pushed a new commit that addresses the review comments.

@SparkQA
Copy link

SparkQA commented Jul 9, 2016

Test build #3174 has finished for PR 13969 at commit d24ca97.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • case class ParseUrl(children: Seq[Expression])

usage = "_FUNC_(class,method[,arg1[,arg2..]]) calls method with reflection",
extended = "> SELECT _FUNC_('java.util.UUID', 'randomUUID');\n c33fb387-8500-4bfa-81d2-6e0e3e930df2")
// scalastyle:on line.size.limit
case class Reflect(children: Seq[Expression])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reflect is really ambiguous, how about CallMethod?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually named it JavaMethodReflect before but @dongjoon-hyun asked to use Reflect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is also annoying if we search for reflect (based on the name) and then doesn't find an expression with reflect in the name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya. It's my fault. Sorry for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what's a good name? I am not attached to Reflect, but I think Reflect should be in the name, if the function is called reflect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CallMethodUsingReflect?

@cloud-fan
Copy link
Contributor

what's hive's behaviour if calling a non-static method but the class doesn't have no-arg constructor? null or exception?

@rxin
Copy link
Contributor

rxin commented Jul 11, 2016

One thing ... it might be better to remove the ability to call non-static methods. At least to me it'd make the things slightly simpler and more clear.

@rxin
Copy link
Contributor

rxin commented Jul 11, 2016

You can also remove half of the test cases.

@petermaxlee
Copy link
Contributor Author

I submitted a new pull request #14138 and in that version only static methods are supported.

asfgit pushed a commit that referenced this pull request Jul 13, 2016
## What changes were proposed in this pull request?
This patch implements reflect SQL function, which can be used to invoke a Java method in SQL. Slightly different from Hive, this implementation requires the class name and the method name to be literals. This implementation also supports only a smaller number of data types, and requires the function to be static, as suggested by rxin in #13969.

java_method is an alias for reflect, so this should also resolve SPARK-16277.

## How was this patch tested?
Added expression unit tests and an end-to-end test.

Author: petermaxlee <petermaxlee@gmail.com>

Closes #14138 from petermaxlee/reflect-static.
asfgit pushed a commit that referenced this pull request Jul 13, 2016
## What changes were proposed in this pull request?
This patch implements reflect SQL function, which can be used to invoke a Java method in SQL. Slightly different from Hive, this implementation requires the class name and the method name to be literals. This implementation also supports only a smaller number of data types, and requires the function to be static, as suggested by rxin in #13969.

java_method is an alias for reflect, so this should also resolve SPARK-16277.

## How was this patch tested?
Added expression unit tests and an end-to-end test.

Author: petermaxlee <petermaxlee@gmail.com>

Closes #14138 from petermaxlee/reflect-static.

(cherry picked from commit 56bd399)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@petermaxlee
Copy link
Contributor Author

I am going to close this one since #14138 has been merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants