[SPARK-14063][SQL] SQLContext.range should return Dataset[java.lang.Long] #11880
What changes were proposed in this pull request?
This patch changes the return type of `SQLContext.range` from `Dataset[Long]` (Scala primitive) to `Dataset[java.lang.Long]` (Java boxed long).

Previously, SPARK-13894 changed the return type of `range` from `Dataset[Row]` to `Dataset[Long]`. The problem is that, due to https://issues.scala-lang.org/browse/SI-4388, Scala compiles primitive types in generics into just `Object`, i.e. at the bytecode level `range` now just returns `Dataset[Object]`. This is really bad for Java users, because they lose type safety and also need to add a type cast every time they use `range`.

I talked to Jason Zaugg from Lightbend (Typesafe), who suggested that the best approach is to return `Dataset[java.lang.Long]`. The downside is that when Scala users want to explicitly type a closure used on the dataset returned by `range`, they need to use `java.lang.Long` instead of the Scala `Long`.
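A minimal sketch of the erasure issue, assuming a Scala 2 compiler and using a hypothetical `Box` class as a stand-in for `Dataset` so that no Spark dependency is needed. It reads back, via Java reflection, the generic return type that a Java caller would see. The type argument recorded for `Box[Long]` is expected to be `java.lang.Object`, while `Box[java.lang.Long]` keeps `java.lang.Long`, which is the difference this patch relies on.

```scala
import java.lang.reflect.{ParameterizedType, Type}

// Hypothetical stand-in for Dataset; only the type parameter matters here.
final class Box[T](val value: T)

object ErasureDemo {
  // Mirrors the old signature: a Scala primitive as the type argument.
  def primitiveBox(): Box[Long] = new Box(1L)

  // Mirrors the new signature: the Java boxed type as the type argument.
  def boxedBox(): Box[java.lang.Long] = new Box(java.lang.Long.valueOf(1L))

  // Returns the type argument recorded in the method's generic signature,
  // i.e. the T that a Java caller sees in Box<T>.
  def typeArgOf(methodName: String): Type = {
    val method = getClass.getMethod(methodName)
    method.getGenericReturnType
      .asInstanceOf[ParameterizedType]
      .getActualTypeArguments()(0)
  }

  def main(args: Array[String]): Unit = {
    println(s"primitiveBox: ${typeArgOf("primitiveBox").getTypeName}")
    println(s"boxedBox:     ${typeArgOf("boxedBox").getTypeName}")
  }
}
```

From Java, a method with the first signature can therefore only be assigned to `Box<Object>` without a cast, which is the same loss of type safety described for `range`.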
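The downside for Scala users can be sketched with a plain `Seq[java.lang.Long]` standing in for the `Dataset[java.lang.Long]` returned by `range` (an assumption made only to keep the example free of Spark; `ClosureTypingDemo` and `ids` are hypothetical names). A closure without a type annotation still works, because `Predef`'s implicit `Long2long` conversion unboxes the element for arithmetic; only an explicitly typed closure has to name `java.lang.Long`.

```scala
object ClosureTypingDemo {
  // Hypothetical stand-in for the elements of a Dataset[java.lang.Long].
  val ids: Seq[java.lang.Long] = Seq(
    java.lang.Long.valueOf(0L),
    java.lang.Long.valueOf(1L),
    java.lang.Long.valueOf(2L))

  // Untyped closure: x is inferred as java.lang.Long, and Predef.Long2long
  // converts it to scala.Long so that `x * 2` type-checks.
  def doubledInferred: Seq[Long] = ids.map(x => x * 2)

  // Explicitly typed closure: the parameter must be declared java.lang.Long.
  // Declaring it as scala.Long is expected to be rejected by the type checker,
  // since java.lang.Long is not a subtype of scala.Long.
  def doubledExplicit: Seq[Long] = ids.map((x: java.lang.Long) => x * 2)

  def main(args: Array[String]): Unit = {
    println(doubledInferred)
    println(doubledExplicit)
  }
}
```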
How was this patch tested?
The signature change should be covered by existing unit tests and API tests. I also added a new test case in DatasetSuite for range.