
[SPARK-14063][SQL] SQLContext.range should return Dataset[java.lang.Long] #11880

Closed
wants to merge 1 commit

Conversation

rxin
Contributor

rxin commented Mar 22, 2016

What changes were proposed in this pull request?

This patch changed the return type for SQLContext.range from Dataset[Long] (Scala primitive) to Dataset[java.lang.Long] (Java boxed long).

Previously, SPARK-13894 changed the return type of range from Dataset[Row] to Dataset[Long]. The problem is that, due to https://issues.scala-lang.org/browse/SI-4388, Scala compiles primitive types in generics down to just Object, i.e. at the bytecode level range now simply returns Dataset[Object]. This is really bad for Java users because they lose type safety and also need to add a type cast every time they use range.
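
To illustrate the erasure behaviour described above, here is a minimal plain-Scala sketch (no Spark involved; Box and ErasureDemo are made-up names for this example): a Scala primitive used as a type argument disappears from the emitted generic signature, which is all that Java callers can see.

  class Box[T](val value: T)

  object ErasureDemo {
    def longBox(): Box[Long] = new Box(42L)   // source type: Box[Long]

    def main(args: Array[String]): Unit = {
      val m = getClass.getMethod("longBox")
      // Prints something like "Box<java.lang.Object>": the Long is gone, which is
      // why the pre-patch range looked like Dataset<Object> to Java callers.
      println(m.getGenericReturnType)
    }
  }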

I talked to Jason Zaugg from Lightbend (Typesafe), who suggested that the best approach is to return Dataset[java.lang.Long]. The downside is that when Scala users want to explicitly type a closure used on the dataset returned by range, they need to use java.lang.Long instead of Scala's Long.
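
As a minimal sketch of that downside (assuming a local Spark setup of that era; the object and app names are made up), an explicitly typed closure over the result of range now has to name the boxed type:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext

  object RangeClosureDemo {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("range-demo").setMaster("local[2]"))
      val sqlContext = new SQLContext(sc)
      import sqlContext.implicits._

      // range now yields Dataset[java.lang.Long], so the parameter type below must be
      // java.lang.Long rather than scala.Long; Predef unboxes it for the arithmetic.
      val doubled = sqlContext.range(5).map((n: java.lang.Long) => n * 2L)
      doubled.show()

      sc.stop()
    }
  }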

How was this patch tested?

The signature change should be covered by existing unit tests and API tests. I also added a new test case in DatasetSuite for range.

@cloud-fan
Contributor

LGTM

@@ -354,7 +354,7 @@ public void testCountMinSketch() {

@Test
public void testBloomFilter() {
Dataset df = context.range(1000);
Contributor

Why does this code pass compile? Is it because we only turn scalac warnings into errors, but not javac warnings?

Contributor Author

Yes

@SparkQA

SparkQA commented Mar 22, 2016

Test build #53754 has finished for PR 11880 at commit 8272a1f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Mar 22, 2016

Good catch of course. I grepped around for similar issues and noticed this in JavaDoubleRDD:

  def histogram(bucketCount: Int): (Array[scala.Double], Array[Long]) = {
    val result = srdd.histogram(bucketCount)
    (result._1, result._2)
  }

  def histogram(buckets: Array[scala.Double]): Array[Long] = {
    srdd.histogram(buckets, false)
  }

  def histogram(buckets: Array[JDouble], evenBuckets: Boolean): Array[Long] = {
    srdd.histogram(buckets.map(_.toDouble), evenBuckets)
  }

For a second I thought it was the same problem, but I suppose an Array[scala.Long] is actually long[] and so is already fine in Java, unlike the example you're fixing.
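
For reference, a small plain-Scala sketch (not from the PR; ArrayVsGeneric is a made-up name) of that distinction: a primitive element type in an Array survives erasure as a JVM primitive array, while the same primitive as a generic type argument erases to Object.

  object ArrayVsGeneric {
    def counts(): Array[Long] = Array(1L, 2L, 3L)   // Java sees long[]
    def ids(): Seq[Long]      = Seq(1L, 2L, 3L)     // Java sees Seq<Object>

    def main(args: Array[String]): Unit = {
      println(getClass.getMethod("counts").getReturnType)       // e.g. class [J
      println(getClass.getMethod("ids").getGenericReturnType)   // e.g. scala.collection.Seq<java.lang.Object>
    }
  }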

@rxin
Contributor Author

rxin commented Mar 22, 2016

@srowen yea I think arrays are fine since those are not generics.

Merging in master.

asfgit closed this in 297c202 on Mar 22, 2016