
[SPARK-17683][SQL] Support ArrayType in Literal.apply #15257

Closed
wants to merge 7 commits

Conversation

maropu
Member

@maropu maropu commented Sep 27, 2016

What changes were proposed in this pull request?

This pr is to add pattern-matching entries for array data in Literal.apply.

How was this patch tested?

Added tests in LiteralExpressionSuite.
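As a minimal self-contained sketch of the idea (a toy `DataType` ADT; the names `LiteralSketch`, `dataTypeOf`, and `componentTypeToDataType` here are illustrative, not Spark's actual classes), an `ArrayType` can be recovered from a runtime array via its JVM `Class`:

```scala
// Sketch of runtime ArrayType inference, assuming a hypothetical mini DataType ADT;
// this is not Spark's API, only an illustration of the pattern-matching entry.
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType
final case class ArrayType(elementType: DataType) extends DataType

object LiteralSketch {
  // Map an array's component Class to a DataType, recursing for nested arrays.
  def componentTypeToDataType(clz: Class[_]): DataType = clz match {
    case c if c == java.lang.Integer.TYPE => IntegerType
    case c if c == classOf[String]        => StringType
    case c if c.isArray                   => ArrayType(componentTypeToDataType(c.getComponentType))
    case _ => throw new RuntimeException("Unsupported component type " + clz)
  }

  // The new entry: arrays get an ArrayType of their runtime component type.
  def dataTypeOf(v: Any): DataType = v match {
    case a: Array[_] => ArrayType(componentTypeToDataType(a.getClass.getComponentType))
    case _           => throw new RuntimeException("Unsupported literal type " + v)
  }
}
```

For example, `LiteralSketch.dataTypeOf(Array(1, 2, 3))` yields `ArrayType(IntegerType)`, and a nested `Array(Array("a"))` yields a nested `ArrayType`.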

@SparkQA

SparkQA commented Sep 27, 2016

Test build #65958 has finished for PR 15257 at commit c46e14c.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

case i: CalendarInterval => Literal(i, CalendarIntervalType)
case null => Literal(null, NullType)
case v: Literal => v
case _ =>
throw new RuntimeException("Unsupported literal type " + v.getClass + " " + v)
}

private def componentTypeToDataType(clz: Class[_]): DataType = clz match {
Contributor

It looks so similar to the other cases where Spark has to map from Scala types to Spark SQL's, like https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L65. When comparing them I noticed that CalendarInterval is not included in your list. Why?

Member Author

Just my bad. I'll try to use that function. Thanks!

Member Author

@jaceklaskowski I checked that similar code, but I couldn't find a good way to use the reflection utilities in ScalaReflection here, because Literal#apply cannot capture type signatures at runtime. Any better ideas?

Contributor

I can't find one either, but thought I'd ask.

Member Author

Thanks, I'll keep trying.

@SparkQA

SparkQA commented Sep 27, 2016

Test build #65960 has finished for PR 15257 at commit e495ff5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 28, 2016

Test build #66022 has finished for PR 15257 at commit c57d93e.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 28, 2016

Test build #66023 has finished for PR 15257 at commit c9cdd29.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

*/
private[this] def componentTypeToDataType(clz: Class[_]): DataType = clz match {
// primitive types
case c: Class[_] if c == jShort.TYPE => ShortType
Member

The instance matching is not required here; it should be sufficient to use case jShort.TYPE.

Btw, this is a nitpick, but I would recommend that classes start with a capital letter, i.e. java.lang.Short could be imported as JShort rather than jShort. In this case it really doesn't matter, but lower-case classes can bite you when you're working with pattern matching.

Member Author

Aha, I see. I'll fix this. Thanks!

Member Author

@maropu maropu Sep 29, 2016

@jodersky BTW, what's a concrete example of when the lower-case issue in pattern matching occurs? (Just asking out of curiosity :) )

Member

Here is an example:

  case object container {def value: Int = 42}
  case object Container {def value: Int = 42}

  def extract(wrapper: Any) = wrapper match {
    case container => container.value // error: value "value" is not a member of Any
    case Container => Container.value // ok
  }

It's not a very good example; maybe something like http://stackoverflow.com/a/6576663/917519 will give you a better use case.

Member Author

aha, many thanks!

@SparkQA

SparkQA commented Sep 29, 2016

Test build #66075 has finished for PR 15257 at commit c843840.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Literal(Decimal(d), DecimalType(Math.max(d.precision, d.scale), d.scale()))
case d: Decimal => Literal(d, DecimalType(Math.max(d.precision, d.scale), d.scale))
case t: Timestamp => Literal(DateTimeUtils.fromJavaTimestamp(t), TimestampType)
case d: Date => Literal(DateTimeUtils.fromJavaDate(d), DateType)
case a: Array[Byte] => Literal(a, BinaryType)
case a: Array[_] =>
Member

Can we use schemaFor for this? Looks like we can remove the type-inference code added in this PR.

Member Author

Yeah, I tried, but I couldn't use it here because Scala Array types are mapped to Java ones at runtime, e.g., Array[Int] => int[].

Member

I may not understand your point. Where will it be mapped to a Java type? You use Array[_] here, right?

Member Author

Sorry for my poor explanation.
For example, I added the check code below:

case a: Array[_] =>
   println("Array[] class in runtime:" + a.getClass.getSimpleName())
   ...

And then ran this code:

scala> Literal(scala.Array(1, 2, 3))
  Array[] class in runtime:int[]

IIUC, int[] is a Java native array type, and there is no type signature here.
Since schemaFor needs type signatures, I'm afraid we cannot use this reflection function here.
There may be other workarounds, but I couldn't find any.
Any better ideas?
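The erasure behavior described above is easy to observe directly on the JVM; a small standalone check, independent of Spark (the `ArrayErasureCheck` object is hypothetical, just for illustration):

```scala
// A Scala Array[T] is a plain Java array at runtime, so only the JVM component
// class survives; there is no Scala type signature left for schemaFor to inspect.
object ArrayErasureCheck {
  def runtimeName(a: AnyRef): String = a.getClass.getSimpleName

  def main(args: Array[String]): Unit = {
    println(runtimeName(Array(1, 2, 3)))  // int[]
    println(runtimeName(Array("a", "b"))) // String[]
  }
}
```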

Member

This seems to work:

val t = scala.Array(1, 2, 3)
val classSymbol = runtimeMirror(getClass.getClassLoader).classSymbol(t.getClass)
val tpe = classSymbol.selfType
ScalaReflection.schemaFor(tpe).dataType

Member Author

If it holds a TypeTag, it seems we can write this:

  def create[T : TypeTag](v: T): Literal = {
    val ScalaReflection.Schema(dataType, _) = ScalaReflection.schemaFor[T]
    val convert = CatalystTypeConverters.createToCatalystConverter(dataType)
    Literal(convert(v), dataType)
  }

This certainly works.

Member

@maropu yeah, we will get a generic type back, and schemaFor needs to know what the element type is.

Member

@jodersky jodersky Sep 30, 2016

@maropu, I tested my solution in a shell and it worked. To fix the compilation error, simply make the overloaded method ScalaReflection.dataTypeFor public. I don't see any reason for it being private in the first place, apart from it just being a default.
Basically, just remove the "private" from the following in ScalaReflection.scala:
private def dataTypeFor(tpe: Type): DataType = //...

Member

@jodersky Can you check the returned DataType? From the code, looks like it will return ObjectType instead of ArrayType for an array type.

Member

Oh, you're right. I didn't realize you specifically wanted an ArrayType.

*/
private[this] def componentTypeToDataType(clz: Class[_]): DataType = clz match {
// primitive types
case c: Class[_] if c == JavaShort.TYPE => ShortType
Member

Matching against an instance of Class[_] is not necessary. A simpler solution is case JavaShort.TYPE => ShortType

Member Author

I just wrote it following HiveInspectors#javaClassToDataType. Either is okay with me, and I'll fix it.

Member

Cool, I saw you updated the match. However, you can remove the instance check everywhere, including further down. Basically, case c: Class[_] is "equivalent" to c.isInstanceOf[Class[_]], but that is redundant since your parameter clz already specifies the type to be Class[_].

Member Author

How about the latest commit?

Member

@jodersky jodersky Sep 30, 2016

Hmm, I didn't think through my own advice: matching against a lowercase classOf[X] doesn't work. So either case _ if clz == classOf[] or case c if c == classOf[] works.
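A small standalone illustration of the pattern-matching rule behind this retraction (the `ClassMatchSketch` object and `describe` helper are hypothetical): a lowercase identifier in a pattern binds a fresh variable, so it must be backquoted, or the comparison moved into a guard, to actually compare:

```scala
object ClassMatchSketch {
  val jShortType: Class[_] = java.lang.Short.TYPE

  def describe(c: Class[_]): String = c match {
    // case jShortType => "short"    // would NOT compare: binds a fresh variable
    case `jShortType` => "short"     // backticks force a stable-identifier comparison
    case _ if c == classOf[String] => "string" // the guard form settled on in the PR
    case _ => "other"
  }
}
```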

@SparkQA

SparkQA commented Sep 30, 2016

Test build #66137 has finished for PR 15257 at commit 41b9276.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 30, 2016

Test build #66141 has finished for PR 15257 at commit ebee7e4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -17,14 +17,25 @@

package org.apache.spark.sql.catalyst.expressions

import java.lang.{Boolean => JavaBoolean}
Contributor

I personally think it is clearer to just say java.lang.Boolean in the code, but this is fine.

Member

We have conventionally used JLong etc rather than JavaLong when doing this kind of rename.

Member Author

I just named these in line with other classes such as https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala#L20. Nitpicking, though: should we fix those names too (e.g., JavaDecimal => JDecimal)?

@rxin
Contributor

rxin commented Nov 2, 2016

Merging in master/branch-2.1. Thanks!

@rxin
Contributor

rxin commented Nov 2, 2016

Actually I didn't end up merging it. The original ticket wanted Seq support as well. Can you take a look to see how difficult it is to add support for both Seq and java.util.List? Then we should be good to go.

@maropu
Member Author

maropu commented Nov 2, 2016

I think it is difficult to support Seq and JList because of type erasure. If it is possible to add a new interface like the one below, we can support these types via ScalaReflection.schemaFor[T]:

  def create[T : TypeTag](v: T): Literal = {
    val ScalaReflection.Schema(dataType, _) = ScalaReflection.schemaFor[T]
    val convert = CatalystTypeConverters.createToCatalystConverter(dataType)
    Literal(convert(v), dataType)
  }
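To see why a TypeTag-based entry point sidesteps erasure, here is a standalone sketch using scala-reflect directly (the `TypeTagSketch` object and `elementType` helper are hypothetical, not the proposed create method):

```scala
import scala.reflect.runtime.universe._

// With a TypeTag the compiler captures the full static type, so the element
// type of a Seq survives erasure and can be inspected at runtime.
object TypeTagSketch {
  def elementType[T: TypeTag](v: T): Type = typeOf[T] match {
    case t if t <:< typeOf[Seq[_]] => t.typeArgs.head // e.g. Int for Seq[Int]
    case t                         => t
  }
}
```

For example, `TypeTagSketch.elementType(Seq(1, 2, 3))` resolves to `Int`, which is exactly the information Literal.apply's runtime Any parameter cannot provide.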

@rxin
Contributor

rxin commented Nov 2, 2016

OK thanks for looking at this. I'm going to merge this into master/branch-2.1.

asfgit pushed a commit that referenced this pull request Nov 2, 2016
## What changes were proposed in this pull request?

This pr is to add pattern-matching entries for array data in `Literal.apply`.
## How was this patch tested?

Added tests in `LiteralExpressionSuite`.

Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>

Closes #15257 from maropu/SPARK-17683.

(cherry picked from commit 4af0ce2)
Signed-off-by: Reynold Xin <rxin@databricks.com>
@asfgit asfgit closed this in 4af0ce2 Nov 2, 2016
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
Closes apache#15257 from maropu/SPARK-17683.