[Core] Doc: PairRDDFunctions.reduceByKey should be stated as requiring a commutative binary op #11091
Make the doc more coherent wrt RDD.reduce
Can one of the admins verify this patch?
That's fine, though this is pretty much by definition for reduce.
@srowen I agree but the difference between the documentation of
@@ -300,7 +300,7 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
   }

   /**
-   * Merge the values for each key using an associative reduce function. This will also perform
+   * Merge the values for each key using an associative and commutative binary operator. This will also perform
This line exceeds 100 chars and will fail the style checker.
Yea we should make them consistent. Are there more inconsistencies you find?
Yeah, I think there are similar statements about an 'associative' operation that really mean 'associative and commutative'. There are more occurrences in this file. There are some in Accumulator.scala, JavaPairRDD.scala, JavaRDDLike.scala, JavaDStreamLike.scala, JavaPairDStream.scala, DStream.scala, PairDStreamFunctions.scala, rdd.py, dstream.py, pairRDD.R. In each
@YPares are you going to update this or should I continue it?
@srowen Oh, sorry, I was waiting a bit to see if I found other inconsistencies.
See my previous message @YPares -- I think I found all the other ones. You're welcome to address them so we can merge your PR, but I can do it too.
…ements for reduce, fold
Clarify that reduce functions need to be commutative, and fold functions do not
See #11091
Author: Sean Owen <sowen@cloudera.com>
Closes #11217 from srowen/SPARK-13339.
According to http://stackoverflow.com/questions/35205107/spark-difference-of-semantics-between-reduce-and-reducebykey , PairRDDFunctions.reduceByKey requires, just like RDD.reduce, an associative AND commutative binary operator.
This wasn't stated in the docs.
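To illustrate why commutativity matters here, below is a minimal sketch in plain Python. It is not Spark code: it only models the assumption that the values for one key are reduced per partition and that the partial results may then be merged in any order. The helper names (`reduce_partitions`, `possible_results`) and the sample data are hypothetical and exist only for this illustration.

```python
from functools import reduce
from itertools import permutations


def reduce_partitions(partitions, op, merge_order):
    """Reduce each partition locally, then merge the partial results
    in the given order (a stand-in for a nondeterministic arrival order)."""
    partials = [reduce(op, part) for part in partitions]
    return reduce(op, (partials[i] for i in merge_order))


def possible_results(partitions, op):
    """Collect the distinct results over every possible merge order."""
    orders = permutations(range(len(partitions)))
    return {reduce_partitions(partitions, op, order) for order in orders}


def add(a, b):
    # Associative and commutative on integers.
    return a + b


def concat(a, b):
    # Associative but NOT commutative on strings.
    return a + b


# Values belonging to a single key, spread over three partitions.
int_partitions = [[1, 2], [3], [4, 5]]
str_partitions = [["a", "b"], ["c"], ["d", "e"]]

sum_results = possible_results(int_partitions, add)
concat_results = possible_results(str_partitions, concat)

print(sum_results)             # a single result: {15}
print(sorted(concat_results))  # six different strings, one per merge order
```

With addition every merge order agrees, while string concatenation yields a different answer for each order even though it is associative. Under the assumption stated above, that is the behaviour the doc change warns about.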