Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-21567][SQL] Dataset should work with type alias #18813

Closed
wants to merge 3 commits into from

Conversation

viirya
Copy link
Member

@viirya viirya commented Aug 2, 2017

What changes were proposed in this pull request?

If we create a type alias for a type workable with Dataset, the type alias doesn't work with Dataset.

A reproducible case looks like:

object C {
  type TwoInt = (Int, Int)
  def tupleTypeAlias: TwoInt = (1, 1)
}

Seq(1).toDS().map(_ => ("", C.tupleTypeAlias))

It throws an exception like:

type T1 is not a class
scala.ScalaReflectionException: type T1 is not a class
  at scala.reflect.api.Symbols$SymbolApi$class.asClass(Symbols.scala:275)
  ...

This patch accesses the dealias of type in many places in ScalaReflection to fix it.

How was this patch tested?

Added test case.

@SparkQA
Copy link

SparkQA commented Aug 2, 2017

Test build #80151 has finished for PR 18813 at commit 8a4b63d.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Aug 2, 2017

retest this please.

@SparkQA
Copy link

SparkQA commented Aug 2, 2017

Test build #80157 has finished for PR 18813 at commit 8a4b63d.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Aug 2, 2017

retest this please.

@SparkQA
Copy link

SparkQA commented Aug 2, 2017

Test build #80160 has finished for PR 18813 at commit 8a4b63d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Aug 2, 2017

cc @cloud-fan @hvanhovell

@viirya
Copy link
Member Author

viirya commented Aug 4, 2017

ping @cloud-fan May you have time to look at this? Thanks.

@viirya
Copy link
Member Author

viirya commented Aug 7, 2017

ping @cloud-fan @hvanhovell Can you help to review this change? Thanks.

@cloud-fan
Copy link
Contributor

is it possible to dealias the Type at the very beginning and avoid touching so many places?

@@ -94,7 +94,7 @@ object ScalaReflection extends ScalaReflection {
* JVM form instead of the Scala Array that handles auto boxing.
*/
private def arrayClassFor(tpe: `Type`): ObjectType = {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arrayClassFor is called at many place. The typical calling pattern looks like:

        val TypeRef(_, _, Seq(elementType)) = tpe
        arrayClassFor(elementType)

So instead of dealiasing when calling, we dealiase it here.

@@ -62,7 +62,7 @@ object ScalaReflection extends ScalaReflection {
def dataTypeFor[T : TypeTag]: DataType = dataTypeFor(localTypeOf[T])

private def dataTypeFor(tpe: `Type`): DataType = {
Copy link
Member Author

@viirya viirya Aug 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dataTypeFor can be called like this at many places:

val TypeRef(_, _, Seq(optType)) = t
val unwrapped = UnwrapOption(dataTypeFor(optType), inputObject)

So we need to dealias it too.

@@ -193,7 +193,7 @@ object ScalaReflection extends ScalaReflection {
case _ => UpCast(expr, expected, walkedTypePath)
}

tpe match {
tpe.dealias match {
Copy link
Member Author

@viirya viirya Aug 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for deserializerFor. deserializerFor can call itself. It has many entrance points. So we need to dealias its given type parameter.

@@ -469,7 +469,7 @@ object ScalaReflection extends ScalaReflection {
}
}

tpe match {
tpe.dealias match {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For serializerFor. The same reason as deserializerFor.

@@ -690,7 +690,7 @@ object ScalaReflection extends ScalaReflection {
/*
* Retrieves the runtime class corresponding to the provided type.
*/
def getClassFromType(tpe: Type): Class[_] = mirror.runtimeClass(tpe.typeSymbol.asClass)
def getClassFromType(tpe: Type): Class[_] = mirror.runtimeClass(tpe.dealias.typeSymbol.asClass)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be saved from dealiasing. I'll remove it.

@@ -705,7 +705,7 @@ object ScalaReflection extends ScalaReflection {

/** Returns a catalyst DataType and its nullability for the given Scala Type using reflection. */
def schemaFor(tpe: `Type`): Schema = {
tpe match {
tpe.dealias match {
Copy link
Member Author

@viirya viirya Aug 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't be saved from dealiasing.

@@ -775,7 +775,7 @@ object ScalaReflection extends ScalaReflection {
* Whether the fields of the given type is defined entirely by its constructor parameters.
*/
def definedByConstructorParams(tpe: Type): Boolean = {
tpe <:< localTypeOf[Product] || tpe <:< localTypeOf[DefinedByConstructorParams]
tpe.dealias <:< localTypeOf[Product] || tpe.dealias <:< localTypeOf[DefinedByConstructorParams]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be saved from dealiasing. I'll remove it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. no. definedByConstructorParams is called in ExpressionEncoder too. So we should do dealias here.

@@ -829,7 +829,7 @@ trait ScalaReflection {
* synthetic classes, emulating behaviour in Java bytecode.
*/
def getClassNameFromType(tpe: `Type`): String = {
tpe.erasure.typeSymbol.asClass.fullName
tpe.dealias.erasure.typeSymbol.asClass.fullName
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed.

val dealiasedTpe = tpe.dealias
val formalTypeArgs = dealiasedTpe.typeSymbol.asClass.typeParams
val TypeRef(_, _, actualTypeArgs) = dealiasedTpe
val params = constructParams(dealiasedTpe)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed.

@@ -864,7 +865,7 @@ trait ScalaReflection {
}

protected def constructParams(tpe: Type): Seq[Symbol] = {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is called at different points. So it's needed too.

@viirya
Copy link
Member Author

viirya commented Aug 8, 2017

@cloud-fan I identified only one place getClassFromType we can remove the dealias. As there is only one and it's a public method, I prefer to keep it in case the future caller doesn't do dealias, except for you strongly feel we should remove it. Thanks.

@@ -34,6 +34,11 @@ import org.apache.spark.sql.types._
case class TestDataPoint(x: Int, y: Double, s: String, t: TestDataPoint2)
case class TestDataPoint2(x: Int, s: String)

object TestForTypeAlias {
type TwoInt = (Int, Int)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's also test nested type alias

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry what the nested type alias means?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like type TwoIntSeq = Seq[TwoInt]?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like

type TwoInt = (Int, Int)
type ThreeInt = (TowInt, Int)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Added another test for this case.

@cloud-fan
Copy link
Contributor

LGTM

@SparkQA
Copy link

SparkQA commented Aug 8, 2017

Test build #80374 has finished for PR 18813 at commit 312c7b0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 8, 2017

Test build #80377 has finished for PR 18813 at commit 031d1d3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in ee13041 Aug 8, 2017
@cloud-fan
Copy link
Contributor

cloud-fan commented Aug 8, 2017

thanks, merging to master!

asfgit pushed a commit that referenced this pull request Aug 8, 2017
If we create a type alias for a type workable with Dataset, the type alias doesn't work with Dataset.

A reproducible case looks like:

    object C {
      type TwoInt = (Int, Int)
      def tupleTypeAlias: TwoInt = (1, 1)
    }

    Seq(1).toDS().map(_ => ("", C.tupleTypeAlias))

It throws an exception like:

    type T1 is not a class
    scala.ScalaReflectionException: type T1 is not a class
      at scala.reflect.api.Symbols$SymbolApi$class.asClass(Symbols.scala:275)
      ...

This patch accesses the dealias of type in many places in `ScalaReflection` to fix it.

Added test case.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #18813 from viirya/SPARK-21567.

(cherry picked from commit ee13041)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@viirya
Copy link
Member Author

viirya commented Aug 8, 2017

Thanks you @cloud-fan.

MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
If we create a type alias for a type workable with Dataset, the type alias doesn't work with Dataset.

A reproducible case looks like:

    object C {
      type TwoInt = (Int, Int)
      def tupleTypeAlias: TwoInt = (1, 1)
    }

    Seq(1).toDS().map(_ => ("", C.tupleTypeAlias))

It throws an exception like:

    type T1 is not a class
    scala.ScalaReflectionException: type T1 is not a class
      at scala.reflect.api.Symbols$SymbolApi$class.asClass(Symbols.scala:275)
      ...

This patch accesses the dealias of type in many places in `ScalaReflection` to fix it.

Added test case.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes apache#18813 from viirya/SPARK-21567.

(cherry picked from commit ee13041)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@viirya viirya deleted the SPARK-21567 branch December 27, 2023 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants