[SPARK-12604] [CORE] Java count(ApproxDistinct)ByKey methods return Scala Long not Java #10554
Test build #2297 has finished for PR 10554 at commit
Does this actually change anything w.r.t. bytecode signature?
I don't think it does. It may cause source incompatibility since the generic type changes. You can see an example of a fix/change that can happen in the caller in the example that changed.
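To make the source-incompatibility point concrete, here is a small Java sketch of what a caller might have to change. The two methods are hypothetical stand-ins, not Spark APIs: `countsBefore` assumes the old Scala `Long` value type surfaces to Java as `Object`, and `countsAfter` models a `java.lang.Long` value type.

```java
import java.util.HashMap;
import java.util.Map;

public class CallerSourceCompat {
    // Hypothetical stand-in for the old signature: assumes a Scala Long
    // type argument is seen by javac as Object.
    static Map<String, Object> countsBefore() {
        Map<String, Object> counts = new HashMap<>();
        counts.put("a", 2L); // autoboxed to java.lang.Long
        return counts;
    }

    // Hypothetical stand-in for the new signature with java.lang.Long values.
    static Map<String, Long> countsAfter() {
        Map<String, Long> counts = new HashMap<>();
        counts.put("a", 2L);
        return counts;
    }

    public static void main(String[] args) {
        // Old-style caller: every value needs an explicit cast.
        long before = (Long) countsBefore().get("a");
        // New-style caller: no cast, but a variable declared as
        // Map<String, Object> would no longer accept the result.
        long after = countsAfter().get("a");
        System.out.println(before + " " + after); // prints "2 2"
    }
}
```

The runtime objects are the same in both cases; only the declared generic type differs, which is why the change shows up at compile time in calling code rather than in behavior.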
@@ -848,6 +848,6 @@ object JavaPairDStream {

   def scalaToJavaLong[K: ClassTag](dstream: JavaPairDStream[K, Long])
   : JavaPairDStream[K, JLong] = {
-    DStream.toPairDStreamFunctions(dstream.dstream).mapValues(new JLong(_))
+    DStream.toPairDStreamFunctions(dstream.dstream).mapValues(JLong.valueOf)
Changed for consistency; valueOf is slightly more efficient than the constructor since the JDK actually caches Long objects for small values.
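A minimal Java sketch of the caching behavior mentioned here. The class and method names are illustrative only. It relies on the documented guarantee that `Long.valueOf(long)` caches values from -128 to 127; outside that range no reuse is promised, so the sketch falls back to `equals`.

```java
public class LongValueOfDemo {
    // Long.valueOf is documented to cache values in [-128, 127],
    // so two calls in that range return the same object.
    static boolean returnsCachedInstance(long value) {
        return Long.valueOf(value) == Long.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(returnsCachedInstance(42L));   // true: inside the cached range
        System.out.println(returnsCachedInstance(-128L)); // true: lower bound of the cache
        // Outside the range, reuse is not guaranteed, so compare by value.
        System.out.println(Long.valueOf(1_000_000L).equals(Long.valueOf(1_000_000L))); // true
    }
}
```

The constructor, by contrast, always allocates a fresh object, which is where the small efficiency difference comes from.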
+1
@srowen If this doesn't change any signature, I think it actually makes things slower by adding another layer of iterators. Maybe it'd make more sense to just change the signature rather than doing actual "conversions".
It doesn't compile that way though, since the values are Scala Longs and the signature says Java Longs. One way or the other, such a conversion has to happen somewhere. If it works for anyone, it's because they're already doing this in the calling code.
One thing I don't understand is how these methods can return Scala types at the bytecode level. Scala Long is just a primitive long. All the places you find are generics, which cannot be primitive types, and as a result I assume they are all Java boxed types?
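The boxing question can be illustrated from the Java side with a short sketch. This is only an analogy in plain Java with made-up names. It does not show what scalac emits for Spark. It demonstrates two JVM-level facts the question relies on: a primitive `long` placed in a generic container is stored as a `java.lang.Long`, and generic type arguments are erased at runtime.

```java
import java.util.ArrayList;
import java.util.List;

public class BoxedGenericsDemo {
    // Returns the runtime class of a primitive long after it has been
    // stored in a generic container.
    static Class<?> runtimeClassOfStoredLong(long value) {
        List<Object> values = new ArrayList<>();
        values.add(value); // autoboxing: generic containers cannot hold primitives
        return values.get(0).getClass();
    }

    // Type arguments are erased, so both lists share one runtime class.
    static boolean sameErasedClass() {
        List<Long> longs = new ArrayList<>();
        List<Object> objects = new ArrayList<>();
        return longs.getClass() == objects.getClass();
    }

    public static void main(String[] args) {
        System.out.println(runtimeClassOfStoredLong(7L).getName()); // java.lang.Long
        System.out.println(sameErasedClass());                       // true
    }
}
```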
Yeah that's a good point -- let me see if I can understand that better. At some level, it has to be an |
@rxin so I tried unpacking this code in IntelliJ:
... and it tells me that the implicit conversion that is applied here is
It looks like this is the same underlying transformation that happens in Right now the returned type of these methods in Java is WDYT? |
... or I suppose the implementation can just cast to |
Yea your latest suggestion sounds great (a very tiny change).
@rxin OK I almost did that. I realized that |
Test build #48759 has finished for PR 10554 at commit
@@ -288,17 +288,18 @@ class JavaPairRDD[K, V](val rdd: RDD[(K, V)])
   * immediately to the master as a Map. This will also perform the merging locally on each mapper
   * before sending results to a reducer, similarly to a "combiner" in MapReduce.
   */
  def reduceByKeyLocally(func: JFunction2[V, V, V]): java.util.Map[K, V] =
fwiw, i actually think spelling out java.util.Map is more clear ...
…Java Long, not Scala; update similar methods for consistency on java.lang.Long.valueOf with no API change
Test build #48847 has finished for PR 10554 at commit
Test build #2335 has finished for PR 10554 at commit
I'm going to merge this. Thanks Sean.
@@ -448,7 +448,7 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
   * combine step happens locally on the master, equivalent to running a single reduce task.
   */
  def countByValue(): java.util.Map[T, jl.Long] =
-    mapAsSerializableJavaMap(rdd.countByValue().map((x => (x._1, new jl.Long(x._2)))))
+    mapAsSerializableJavaMap(rdd.countByValue().mapValues(jl.Long.valueOf))
@srowen how come there is still a mapValues here?
See my reply at #10554 (comment) -- it was because I then saw this was how countByKey had been implemented. I picked consistency, but it's as reasonable to change countByKey, I suppose. Do you feel strongly about it one way or the other?
Change Java countByKey, countApproxDistinctByKey return types to use Java Long, not Scala; update similar methods for consistency on java.lang.Long.valueOf with no API change