
[SPARK-12604] [CORE] Java count(AprroxDistinct)ByKey methods return Scala Long not Java #10554

Closed · wants to merge 1 commit

Conversation

@srowen (Member) commented Jan 2, 2016

Change Java countByKey and countApproxDistinctByKey return types to use Java Long, not Scala Long; update similar methods for consistency on java.lang.Long.valueOf with no API change

@SparkQA commented Jan 2, 2016

Test build #2297 has finished for PR 10554 at commit 1a27421.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin (Contributor) commented Jan 4, 2016

Does this actually change anything w.r.t. the bytecode signature?

@srowen (Author) commented Jan 4, 2016

I don't think it does. It may cause source incompatibility, though, since the generic type changes. The diff below shows an example of the kind of fix a caller may need to make.

@@ -848,6 +848,6 @@ object JavaPairDStream {

   def scalaToJavaLong[K: ClassTag](dstream: JavaPairDStream[K, Long])
     : JavaPairDStream[K, JLong] = {
-    DStream.toPairDStreamFunctions(dstream.dstream).mapValues(new JLong(_))
+    DStream.toPairDStreamFunctions(dstream.dstream).mapValues(JLong.valueOf)
@srowen (Author) commented on the diff:

Changed for consistency; valueOf is slightly more efficient than the constructor since the JDK actually caches Long objects for small values.
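A minimal sketch of that caching behavior (the JDK specifies a cache for values in -128..127; the object name here is illustrative, not from the PR):

```scala
// Long.valueOf returns a cached instance for small values, while the
// constructor always allocates a fresh object.
object LongCacheDemo {
  def main(args: Array[String]): Unit = {
    val a = java.lang.Long.valueOf(42L)
    val b = java.lang.Long.valueOf(42L)
    println(a eq b) // true: same cached instance
  }
}
```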

A contributor replied:

+1

@rxin (Contributor) commented Jan 4, 2016

@srowen If this doesn't change any signature, I think it actually makes things slower by adding another layer of iterators. Maybe it'd make more sense to just change the signature rather than doing actual "conversions".

@srowen (Author) commented Jan 4, 2016

It doesn't compile that way though since the values are Scala Longs and the signature says Java Longs. One way or the other, such a conversion has to happen somewhere. If it works for anyone it's because they're already doing this in the calling code.

@rxin (Contributor) commented Jan 4, 2016

One thing I don't understand: how can these methods return Scala types at the bytecode level? A Scala Long is just a primitive long. All the places you found are generics, which cannot be primitive types, so I assume they are all Java boxed types?
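A minimal sketch of the boxing being described (the object name is illustrative): in a generic position a scala.Long cannot stay primitive, so the runtime value really is a java.lang.Long.

```scala
object BoxingDemo {
  def main(args: Array[String]): Unit = {
    // Long appears in a generic position here, so the compiler boxes it.
    val m: Map[String, Long] = Map("a" -> 1L)
    val v: Any = m("a")
    println(v.getClass.getName) // java.lang.Long
  }
}
```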

@srowen (Author) commented Jan 4, 2016

Yeah, that's a good point -- let me see if I can understand it better. At some level it has to be an Object, not a long, in order to be part of a java.util.Map, so something is transforming it. It may be that implicits convert it to a java.lang.Long in the end, maybe via RichLong, which would be pretty good news -- this change would then just fix the signature, make an implicit conversion explicit, and maybe even make things more efficient. Let me experiment a bit.

@srowen (Author) commented Jan 4, 2016

@rxin so I tried unpacking this code in IntelliJ:

val m = new java.util.HashMap[String, java.lang.Long]()
val l = 1L
m.put("foo", l)

... and it tells me that the implicit conversion that is applied here is Predef.long2Long:

implicit def long2Long(x: Long) = java.lang.Long.valueOf(x)

It looks like this is the same underlying transformation that happens in SerializableMapWrapper and mapAsSerializableJavaMap already. So it looks to me like this is already the implicit transformation being applied anyway.

Right now the returned type of these methods in Java is Map<K, Object>, and the wrapper does the conversion on the fly in get(), I believe. I think making it a real Map<K, Long> requires doing the transformation up front -- short of writing some other special-purpose wrapper.

WDYT?

@srowen (Author) commented Jan 4, 2016

... or I suppose the implementation can just cast to java.util.Map[K, java.lang.Long] since that's safe. It's less change, and mimics what callers are doing anyway in Java.
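A minimal sketch of that cast (names are illustrative, not the actual Spark code): the boxed values already are java.lang.Long at runtime, so the unchecked cast changes only the static type, not the data.

```scala
object CastDemo {
  def main(args: Array[String]): Unit = {
    val m = new java.util.HashMap[String, AnyRef]()
    m.put("a", java.lang.Long.valueOf(1L))
    // Safe at runtime: every value in m really is a java.lang.Long.
    val typed = m.asInstanceOf[java.util.Map[String, java.lang.Long]]
    println(typed.get("a")) // 1
  }
}
```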

@rxin (Contributor) commented Jan 4, 2016

Yea your latest suggestion sounds great (a very tiny change).

@srowen (Author) commented Jan 5, 2016

@rxin OK, I almost did that. I realized that JavaRDD.countByValue already does a mapValues, so I left countByKey acting the same way, doing the mapping. Other methods that return JavaPairRDD went back to doing a cast. (Similar methods in the streaming API actually also do a mapValues.) I also tried to use JLong vs. jl.Long consistently within each file without going overboard; both of these files use nearly every different form. I think the result is more consistent internally, but WDYT? I'm neutral, and would change things further, keeping them consistent, if anyone has a preference.

@SparkQA commented Jan 5, 2016

Test build #48759 has finished for PR 10554 at commit 39fa6e7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -288,17 +288,18 @@ class JavaPairRDD[K, V](val rdd: RDD[(K, V)])
* immediately to the master as a Map. This will also perform the merging locally on each mapper
* before sending results to a reducer, similarly to a "combiner" in MapReduce.
*/
def reduceByKeyLocally(func: JFunction2[V, V, V]): java.util.Map[K, V] =
A contributor commented on the diff:

fwiw, i actually think spelling out java.util.Map is more clear ...

@SparkQA commented Jan 6, 2016

Test build #48847 has finished for PR 10554 at commit 293b5e4.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jan 6, 2016

Test build #2335 has finished for PR 10554 at commit 293b5e4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin (Contributor) commented Jan 7, 2016

I'm going to merge this. Thanks Sean.

@asfgit asfgit closed this in ac56cf6 Jan 7, 2016
@@ -448,7 +448,7 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
   * combine step happens locally on the master, equivalent to running a single reduce task.
   */
  def countByValue(): java.util.Map[T, jl.Long] =
-    mapAsSerializableJavaMap(rdd.countByValue().map((x => (x._1, new jl.Long(x._2)))))
+    mapAsSerializableJavaMap(rdd.countByValue().mapValues(jl.Long.valueOf))
A contributor commented on the diff:

@srowen how come there is still a mapValues here?

@srowen (Author) replied:

See my reply at #10554 (comment) -- it was because I then saw this was how countByKey had been implemented. I picked consistency, but it's as reasonable to change countByKey I suppose. Do you feel strongly about it one way or the other?

@srowen srowen deleted the SPARK-12604 branch January 7, 2016 09:51