[SPARK-17683][SQL] Support ArrayType in Literal.apply #15257
Conversation
Test build #65958 has finished for PR 15257 at commit
```scala
    case i: CalendarInterval => Literal(i, CalendarIntervalType)
    case null => Literal(null, NullType)
    case v: Literal => v
    case _ =>
      throw new RuntimeException("Unsupported literal type " + v.getClass + " " + v)
  }

  private def componentTypeToDataType(clz: Class[_]): DataType = clz match {
```
It looks very similar to the other places where Spark has to map Scala types to Spark SQL's, like https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L65. When comparing them I noticed that `CalendarInterval` is not included in your list. Why?
Just my bad. I'll try to use that function. Thanks!
@jaceklaskowski I checked that similar code, but I couldn't find a good way to use the reflection utilities in `ScalaReflection` here, because `Literal#apply` cannot capture type signatures at runtime. If possible, any better idea?
I can't find one either, but thought I'd ask.
Thx, I'll keep trying.
Test build #65960 has finished for PR 15257 at commit
Test build #66022 has finished for PR 15257 at commit
Test build #66023 has finished for PR 15257 at commit
```scala
   */
  private[this] def componentTypeToDataType(clz: Class[_]): DataType = clz match {
    // primitive types
    case c: Class[_] if c == jShort.TYPE => ShortType
```
The instance matching is not required here; it should be sufficient to use `case jShort.TYPE`.
Btw, this is a nitpick, but I would recommend that classes start with a capital letter, i.e. `java.lang.Short` could be imported as `JShort` rather than `jShort`. In this case it really doesn't matter, but lowercase class names can bite you when you're working with pattern matching.
Aha, I see. I'll fix this. thx!
@jodersky BTW, what's a concrete example of when the lowercase issue in pattern matching occurs? (Just asking out of curiosity :) )
Here is an example:

```scala
case object container { def value: Int = 42 }
case object Container { def value: Int = 42 }

def extract(wrapper: Any) = wrapper match {
  case container => container.value // error: value "value" is not a member of Any
  case Container => Container.value // ok
}
```
It's not a very good example, maybe something like http://stackoverflow.com/a/6576663/917519 will give you a better use-case
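To make the pitfall concrete, here is a small self-contained sketch (the names `magic`, `buggy`, and `fixed` are made up for illustration): a lowercase name in a pattern is a fresh variable binding that matches anything, while backticks (or a capitalized name) force a comparison against the existing value.

```scala
val magic = 42

// A lowercase name in a pattern is a *new* binding, not a reference to the
// val above -- this single case matches any input:
def buggy(x: Int): String = x match {
  case magic => s"always matches; magic is bound to $magic"
}

// Backticks force the pattern to compare against the existing val:
def fixed(x: Int): String = x match {
  case `magic` => "x == 42"
  case other   => s"x == $other"
}
```

Note that `buggy(7)` silently succeeds by shadowing `magic`, which is exactly the kind of bug that is hard to spot in review.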
aha, many thanks!
Test build #66075 has finished for PR 15257 at commit
```scala
      Literal(Decimal(d), DecimalType(Math.max(d.precision, d.scale), d.scale()))
    case d: Decimal => Literal(d, DecimalType(Math.max(d.precision, d.scale), d.scale))
    case t: Timestamp => Literal(DateTimeUtils.fromJavaTimestamp(t), TimestampType)
    case d: Date => Literal(DateTimeUtils.fromJavaDate(d), DateType)
    case a: Array[Byte] => Literal(a, BinaryType)
    case a: Array[_] =>
```
Can we use `schemaFor` for this? Looks like we can remove the type-inference code added in this PR.
Yea, I tried, but I couldn't use it here because Scala `Array` types are mapped to Java ones at runtime, e.g., `Array[Int]` => `int[]`.
I may not understand your point. Where will it be mapped to a Java type? You use `Array[_]` here, right?
Sorry for my bad explanation. For example, I added the check code below:

```scala
case a: Array[_] =>
  println("Array[] class in runtime:" + a.getClass.getSimpleName())
  ...
```

And then ran this code:

```
scala> Literal(scala.Array(1, 2, 3))
Array[] class in runtime:int[]
```

IIUC, `int[]` is a Java native array type and there is no type signature here. Since `schemaFor` needs type signatures, I'm afraid we cannot use this reflection function here. There might be other workarounds, but I couldn't find any. Any better idea?
This seems to work:

```scala
val t = scala.Array(1, 2, 3)
val classSymbol = runtimeMirror(getClass.getClassLoader).classSymbol(t.getClass)
val tpe = classSymbol.selfType
ScalaReflection.schemaFor(tpe).dataType
```
If it holds a `TypeTag`, it seems we can write this:

```scala
def create[T : TypeTag](v: T): Literal = {
  val ScalaReflection.Schema(dataType, _) = ScalaReflection.schemaFor[T]
  val convert = CatalystTypeConverters.createToCatalystConverter(dataType)
  Literal(convert(v), dataType)
}
```

This certainly works.
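As a Spark-free sketch of the same idea: a `TypeTag` context bound captures the full static type, including an array's element type, at the call site, which the runtime `Class` alone cannot. The `schemaOf` name and the string results below are made up for illustration; they stand in for the real `schemaFor`-based conversion.

```scala
import scala.reflect.runtime.universe._

// The TypeTag preserves the element type that erasure would otherwise hide:
def schemaOf[T: TypeTag](v: T): String = typeOf[T] match {
  case t if t <:< typeOf[Array[Int]]    => "ArrayType(IntegerType)"
  case t if t <:< typeOf[Array[String]] => "ArrayType(StringType)"
  case t                                => s"unsupported: $t"
}
```

With this, `schemaOf(Array(1, 2, 3))` can report the element type even though `Array(1, 2, 3).getClass` is just `int[]` at runtime.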
@maropu ya, we will get a generic type back, and `schemaFor` needs to know what the element type is.
@maropu, I tested my solution in a shell and it worked. To fix the compilation error, simply make the overloaded method `ScalaReflection.dataTypeFor` public. I don't see any reason for it being private in the first place, apart from it just being a default.
Basically, just remove the "private" from the following in ScalaReflection.scala:

```scala
private def dataTypeFor(tpe: Type): DataType = //...
```
@jodersky Can you check the returned `DataType`? From the code, it looks like it will return `ObjectType` instead of `ArrayType` for an array type.
Oh, you're right. I didn't realize you specifically wanted an `ArrayType`.
```scala
   */
  private[this] def componentTypeToDataType(clz: Class[_]): DataType = clz match {
    // primitive types
    case c: Class[_] if c == JavaShort.TYPE => ShortType
```
Matching against an instance of `Class[_]` is not necessary. A simpler solution is `case JavaShort.TYPE => ShortType`.
I just wrote it along the lines of `HiveInspectors#javaClassToDataType`. Both are okay to me, and I'll fix it.
Cool, I saw you updated the match. However, you can remove the instance check everywhere, including further down. Basically, a `case c: Class[_]` is "equivalent" to a `c.isInstanceOf[Class[_]]`; however, that is redundant since your parameter `clz` already specifies the type to be `Class[_]`.
How about the latest commit?
Hmm, I didn't think about my own advice: matching against lowercase `classOf[X]` doesn't work. So either `case _ if clz == classOf[]` or `case c if c == classOf[]` is good.
Test build #66137 has finished for PR 15257 at commit
Test build #66141 has finished for PR 15257 at commit
```
@@ -17,14 +17,25 @@
package org.apache.spark.sql.catalyst.expressions

import java.lang.{Boolean => JavaBoolean}
```
I personally think it is clearer to just say `java.lang.Boolean` in the code, but this is fine.
We have conventionally used `JLong` etc. rather than `JavaLong` when doing this kind of rename.
I just named these to match other classes such as https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala#L20. Nitpicking though, should we fix those names too (e.g., `JavaDecimal` => `JDecimal`)?
Merging in master/branch-2.1. Thanks!
Actually I didn't end up merging it. The original ticket wanted Seq support as well. Can you take a look to see how difficult it is to add support for both Seq and java.util.List? Then we should be good to go.
I think it is difficult to support `Seq` and `JList` because of type erasure. If it is possible to add a new interface below, we can support these types via
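The erasure problem can be seen without any Spark code: two `Seq`s with different element types share the same runtime class, so `Literal.apply` cannot distinguish them from the value alone, whereas arrays map to distinct JVM array classes (which is why `Array` is supportable in this PR). A minimal sketch:

```scala
val ints: Seq[Int]    = Seq(1, 2, 3)
val strs: Seq[String] = Seq("a", "b")

// Element types are erased: both values have the same List class at runtime.
println(ints.getClass == strs.getClass)

// Arrays are different: the element type survives in the JVM array class.
println(Array(1, 2).getClass.getSimpleName)     // int[]
println(Array("a", "b").getClass.getSimpleName) // String[]
```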
OK thanks for looking at this. I'm going to merge this into master/branch-2.1.
## What changes were proposed in this pull request?

This pr is to add pattern-matching entries for array data in `Literal.apply`.

## How was this patch tested?

Added tests in `LiteralExpressionSuite`.

Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>

Closes #15257 from maropu/SPARK-17683.

(cherry picked from commit 4af0ce2)
Signed-off-by: Reynold Xin <rxin@databricks.com>