[WIP][SPARK-20384][SQL] Support value classes and always encoded as underlying type #33316
This PR is marked [WIP] only because a solution to the same issue is already being proposed in #33205. The code is complete and should work as expected.
What changes were proposed in this pull request?
This PR adds support for using value classes in nested case classes. This has previously been proposed in the following PRs: #22309, #27153, #33205.
The first PR took the same approach as this one: always encoding the value class as its underlying type. But as noted by @cloud-fan in #22309 (comment), the implementation was quite spread out in `ScalaReflection.scala`. This PR addresses that by "unwrapping" the value class only when it is used inside a case class. This removes the need for the extra argument `instantiateValueClass` to `deserializerFor` and makes the implementation less spread out.

The other two are both by @mickjermsurawong-stripe and take the approach that value classes are only "unwrapped" in cases where there will be no runtime object. This has the advantage of being fully backwards compatible with the cases that currently work. The partial support for value classes was originally added in #15284, which only adds test cases for single-column value classes. It therefore seems that other cases where value classes happen to work, like `Option[ValueClass]` or `Array[ValueClass]`, are mostly accidental and were not deliberately designed to have the current behavior.

This PR is meant for discussion, and as an alternative approach to #33205 if there is no need for schema backwards compatibility.
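To make the "runtime object" distinction above concrete, here is a minimal plain-Scala sketch (no Spark involved). The names `MyInt`, `Wrapper` and `ValueClassErasureDemo` are hypothetical and chosen only for illustration. It uses Java reflection to inspect how the compiler represents a value class as a case class field versus as an array element.

```scala
object ValueClassErasureDemo {
  // Hypothetical value class and enclosing case class, for illustration only.
  case class MyInt(value: Int) extends AnyVal
  case class Wrapper(id: MyInt)

  // JVM type of the field backing Wrapper.id.
  // As a case class field, the value class is erased to its underlying type.
  def nestedFieldType: Class[_] =
    classOf[Wrapper].getDeclaredField("id").getType

  // JVM element type of an Array[MyInt].
  // Array elements are real (boxed) MyInt instances at runtime.
  def arrayElementType: Class[_] =
    Array(MyInt(1), MyInt(2)).getClass.getComponentType

  def main(args: Array[String]): Unit = {
    println(s"Wrapper.id is stored as: $nestedFieldType")
    println(s"Array[MyInt] elements are: $arrayElementType")
  }
}
```

In this sketch the field inside `Wrapper` has no `MyInt` object at runtime, while the array does. That is the kind of difference between "always unwrap" and "unwrap only when there is no runtime object" discussed above.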
Why are the changes needed?
Support value classes in Datasets, so that they can be used in code bases that currently use them in their modeling.
Does this PR introduce any user-facing change?
This adds support for using value classes in DataFrames and Datasets in cases that would previously have led to exceptions.
The main change that can affect existing code is that single-column value classes were previously encoded as if they were simple case classes.
To avoid extra complexity, this patch always treats value classes as their underlying type.
Therefore, given a value class `MyInt` whose underlying type is `Int`, both `Dataset[Int]` and `Dataset[MyInt]` will have the same schema. Before this patch, the schema of `Dataset[MyInt]` would instead have been that of a single-column case class.
How was this patch tested?
New unit tests in `ScalaReflectionSuite.scala`, `ExpressionEncoderSuite.scala`, `DataFrameSuite.scala` and `DatasetSuite.scala`.