[SPARK-21567][SQL] Dataset should work with type alias #18813
Test build #80151 has finished for PR 18813 at commit
retest this please.
Test build #80157 has finished for PR 18813 at commit
retest this please.
Test build #80160 has finished for PR 18813 at commit
ping @cloud-fan Do you have time to look at this? Thanks.
ping @cloud-fan @hvanhovell Can you help review this change? Thanks.
is it possible to dealias the |
@@ -94,7 +94,7 @@ object ScalaReflection extends ScalaReflection {
    * JVM form instead of the Scala Array that handles auto boxing.
    */
   private def arrayClassFor(tpe: `Type`): ObjectType = {
arrayClassFor is called in many places. The typical calling pattern looks like:
val TypeRef(_, _, Seq(elementType)) = tpe
arrayClassFor(elementType)
So instead of dealiasing at every call site, we dealias it here.
@@ -62,7 +62,7 @@ object ScalaReflection extends ScalaReflection {
   def dataTypeFor[T : TypeTag]: DataType = dataTypeFor(localTypeOf[T])

   private def dataTypeFor(tpe: `Type`): DataType = {
dataTypeFor is called like this in many places:
val TypeRef(_, _, Seq(optType)) = t
val unwrapped = UnwrapOption(dataTypeFor(optType), inputObject)
So we need to dealias it too.
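To illustrate the point above, here is a minimal sketch, assuming Scala 2 with scala-reflect on the classpath. The object names `OptionAliases` and `OptionAliasDemo` are hypothetical and not part of the patch. It shows that a type argument extracted with the `TypeRef` pattern can still be an alias, which exposes no type arguments of its own until it is dealiased.

```scala
import scala.reflect.runtime.universe._

// Hypothetical alias holder, mirroring the alias used in the PR description.
object OptionAliases {
  type TwoInt = (Int, Int)
}

object OptionAliasDemo {
  // Same extraction pattern as in the comment above.
  val TypeRef(_, _, Seq(optType)) = typeOf[Option[OptionAliases.TwoInt]]

  // The extracted argument is still the alias, so it has no type arguments.
  val argsBefore: List[Type] = optType.typeArgs

  // After dealiasing, the underlying tuple type and its arguments are visible.
  val argsAfter: List[Type] = optType.dealias.typeArgs

  def main(args: Array[String]): Unit = {
    println(s"extracted: $optType, type args: $argsBefore")
    println(s"dealiased: ${optType.dealias}, type args: $argsAfter")
  }
}
```

Dealiasing inside the callee, as the patch does, means call sites like the one above do not each have to remember to do it.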
@@ -193,7 +193,7 @@ object ScalaReflection extends ScalaReflection {
       case _ => UpCast(expr, expected, walkedTypePath)
     }

-    tpe match {
+    tpe.dealias match {
This is for deserializerFor. deserializerFor can call itself recursively and has many entry points, so we need to dealias the type it is given.
@@ -469,7 +469,7 @@ object ScalaReflection extends ScalaReflection {
       }
     }

-    tpe match {
+    tpe.dealias match {
For serializerFor. The same reason as deserializerFor.
@@ -690,7 +690,7 @@ object ScalaReflection extends ScalaReflection {
   /*
    * Retrieves the runtime class corresponding to the provided type.
    */
-  def getClassFromType(tpe: Type): Class[_] = mirror.runtimeClass(tpe.typeSymbol.asClass)
+  def getClassFromType(tpe: Type): Class[_] = mirror.runtimeClass(tpe.dealias.typeSymbol.asClass)
This doesn't need dealiasing. I'll remove it.
@@ -705,7 +705,7 @@ object ScalaReflection extends ScalaReflection {

   /** Returns a catalyst DataType and its nullability for the given Scala Type using reflection. */
   def schemaFor(tpe: `Type`): Schema = {
-    tpe match {
+    tpe.dealias match {
This one does need dealiasing.
@@ -775,7 +775,7 @@ object ScalaReflection extends ScalaReflection {
    * Whether the fields of the given type is defined entirely by its constructor parameters.
    */
   def definedByConstructorParams(tpe: Type): Boolean = {
-    tpe <:< localTypeOf[Product] || tpe <:< localTypeOf[DefinedByConstructorParams]
+    tpe.dealias <:< localTypeOf[Product] || tpe.dealias <:< localTypeOf[DefinedByConstructorParams]
This doesn't need dealiasing. I'll remove it.
Oh, no. definedByConstructorParams is called in ExpressionEncoder too, so we should dealias here.
@@ -829,7 +829,7 @@ trait ScalaReflection {
    * synthetic classes, emulating behaviour in Java bytecode.
    */
   def getClassNameFromType(tpe: `Type`): String = {
-    tpe.erasure.typeSymbol.asClass.fullName
+    tpe.dealias.erasure.typeSymbol.asClass.fullName
This is needed.
val dealiasedTpe = tpe.dealias
val formalTypeArgs = dealiasedTpe.typeSymbol.asClass.typeParams
val TypeRef(_, _, actualTypeArgs) = dealiasedTpe
val params = constructParams(dealiasedTpe)
This is needed.
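As a rough illustration of why the dealiased type matters in the lines above, here is a sketch that assumes Scala 2 with scala-reflect on the classpath. `CtorAliases`, `Pair`, `IntPair`, and `constructorParamTypes` are hypothetical names and this is not the Spark implementation. An alias such as `IntPair` carries no type arguments itself, so there is nothing to substitute for the formal parameters `A` and `B` unless the arguments are read from the dealiased type.

```scala
import scala.reflect.runtime.universe._

// Hypothetical types for illustration only.
object CtorAliases {
  case class Pair[A, B](first: A, second: B)
  type IntPair = Pair[Int, Int]
}

object CtorParamsDemo {
  // Resolves each constructor parameter to its concrete type.
  // Assumes the class has exactly one constructor.
  def constructorParamTypes(tpe: Type): List[(String, Type)] = {
    val dealiased = tpe.dealias
    val formalTypeArgs = dealiased.typeSymbol.asClass.typeParams
    val actualTypeArgs = dealiased.typeArgs
    val ctor = dealiased.decl(termNames.CONSTRUCTOR).asMethod
    ctor.paramLists.flatten.map { p =>
      p.name.toString -> p.typeSignature.substituteTypes(formalTypeArgs, actualTypeArgs)
    }
  }

  val alias: Type = typeOf[CtorAliases.IntPair]

  // Parameter names paired with their substituted types.
  val resolved: List[(String, Type)] = constructorParamTypes(alias)

  def main(args: Array[String]): Unit = {
    println(s"type args seen on the alias itself: ${alias.typeArgs}")
    resolved.foreach { case (name, t) => println(s"$name: $t") }
  }
}
```

Reading the arguments from the alias directly would leave the parameters typed as the unresolved type parameters, which is consistent with the `type T1 is not a class` error reported for tuples in this PR.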
@@ -864,7 +865,7 @@ trait ScalaReflection {
   }

   protected def constructParams(tpe: Type): Seq[Symbol] = {
This is called from several places, so it's needed too.
@cloud-fan I identified only one place |
@@ -34,6 +34,11 @@ import org.apache.spark.sql.types._
 case class TestDataPoint(x: Int, y: Double, s: String, t: TestDataPoint2)
 case class TestDataPoint2(x: Int, s: String)

+object TestForTypeAlias {
+  type TwoInt = (Int, Int)
Let's also test a nested type alias.
Sorry, what does a nested type alias mean?
like type TwoIntSeq = Seq[TwoInt]?
like
type TwoInt = (Int, Int)
type ThreeInt = (TwoInt, Int)
OK. Added another test for this case.
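The nested case discussed above can be sketched as follows, assuming Scala 2 with scala-reflect on the classpath. `NestedAliases`, `NestedAliasDemo`, and `describe` are hypothetical names and this is not the test that was added to the PR. The sketch shows that a single `dealias` only expands the outermost alias, so a recursive walk has to dealias again each time it descends into a type argument.

```scala
import scala.reflect.runtime.universe._

// The aliases from the review comment above.
object NestedAliases {
  type TwoInt = (Int, Int)
  type ThreeInt = (TwoInt, Int)
}

object NestedAliasDemo {
  // Dealiasing ThreeInt once yields a Tuple2 whose first argument
  // is still the alias TwoInt.
  val outer: Type = typeOf[NestedAliases.ThreeInt].dealias
  val inner: Type = outer.typeArgs.head

  // Hypothetical recursive walk that dealiases on entry at every level,
  // in the spirit of the recursive functions touched by the patch.
  def describe(tpe: Type): String = {
    val t = tpe.dealias
    val name = t.typeSymbol.name.toString
    if (t.typeArgs.isEmpty) name
    else t.typeArgs.map(describe).mkString(s"$name[", ", ", "]")
  }

  def main(args: Array[String]): Unit = {
    println(s"inner before dealias: $inner, type args: ${inner.typeArgs}")
    println(s"inner after dealias: ${inner.dealias}")
    println(describe(typeOf[NestedAliases.ThreeInt]))
  }
}
```

Because `describe` dealiases at its own entry point, callers can pass either an alias or the underlying type and get the same result.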
LGTM
Test build #80374 has finished for PR 18813 at commit
Test build #80377 has finished for PR 18813 at commit
thanks, merging to master!
If we create a type alias for a type workable with Dataset, the type alias doesn't work with Dataset.

A reproducible case looks like:

    object C {
      type TwoInt = (Int, Int)
      def tupleTypeAlias: TwoInt = (1, 1)
    }

    Seq(1).toDS().map(_ => ("", C.tupleTypeAlias))

It throws an exception like:

    type T1 is not a class
    scala.ScalaReflectionException: type T1 is not a class
      at scala.reflect.api.Symbols$SymbolApi$class.asClass(Symbols.scala:275)
      ...

This patch accesses the dealias of type in many places in `ScalaReflection` to fix it.

Added test case.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #18813 from viirya/SPARK-21567.

(cherry picked from commit ee13041)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Thank you @cloud-fan.
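What changes were proposed in this pull request?

If we create a type alias for a type workable with Dataset, the type alias doesn't work with Dataset.

A reproducible case looks like:

    object C {
      type TwoInt = (Int, Int)
      def tupleTypeAlias: TwoInt = (1, 1)
    }

    Seq(1).toDS().map(_ => ("", C.tupleTypeAlias))

It throws an exception like:

    type T1 is not a class
    scala.ScalaReflectionException: type T1 is not a class
      at scala.reflect.api.Symbols$SymbolApi$class.asClass(Symbols.scala:275)
      ...

This patch dealiases the type in many places in ScalaReflection to fix it.

How was this patch tested?

Added test case.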