
[Core] Doc: PairRDDFunctions.reduceByKey should be stated as requiring a commutative binary op #11091

Closed · wants to merge 1 commit

Conversation

@YPares commented Feb 5, 2016

According to http://stackoverflow.com/questions/35205107/spark-difference-of-semantics-between-reduce-and-reducebykey , PairRDDFunctions.reduceByKey requires, just like RDD.reduce, an associative AND commutative binary operator.
This wasn't stated in the docs.
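
For context, an illustrative sketch (not part of this PR, names made up): an operator that is associative but not commutative, such as string concatenation, can give partition-dependent results with reduceByKey, which is exactly why the doc should state both requirements.

```scala
// Illustrative sketch only: string concatenation is associative but NOT
// commutative, so the combined value for a key may depend on the order in
// which per-partition partial results are merged.
import org.apache.spark.{SparkConf, SparkContext}

object ReduceByKeyCommutativityDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("reduceByKey-commutativity").setMaster("local[4]"))

    val pairs = sc.parallelize(
      Seq(("k", "a"), ("k", "b"), ("k", "c"), ("k", "d")), numSlices = 4)

    // Concatenation is associative but not commutative.
    val combined = pairs.reduceByKey(_ + _).collect()
    println(combined.mkString(", ")) // (k,abcd) is likely but not guaranteed; e.g. (k,cdab) is possible

    sc.stop()
  }
}
```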

Make the doc more coherent wrt RDD.reduce
@AmplabJenkins

Can one of the admins verify this patch?

@YPares changed the title from "PairRDDFunctions.reduceByKey should be stated as requiring a commutative binary op" to "[Scala] Doc: PairRDDFunctions.reduceByKey should be stated as requiring a commutative binary op" on Feb 5, 2016
@YPares changed the title from "[Scala] Doc: PairRDDFunctions.reduceByKey should be stated as requiring a commutative binary op" to "[Core] Doc: PairRDDFunctions.reduceByKey should be stated as requiring a commutative binary op" on Feb 5, 2016
@srowen (Member) commented Feb 5, 2016

That's fine, though this is pretty much by definition for reduce.

@YPares (Author) commented Feb 5, 2016

@srowen I agree, but the difference between the documentation of reduce and reduceByKey seemed to imply a difference in behaviour.
Plus, some tests on a single node on my part (see the Stack Overflow post I mentioned) seemed to exhibit this difference in behaviour, adding to the confusion.
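
One possible reason local tests can be misleading (a hedged sketch, assuming a spark-shell session where sc is the provided SparkContext; this is not the experiment from the Stack Overflow post): with a single partition the values for a key are combined in a fixed left-to-right order, so a non-commutative operator can look well-behaved, while with several partitions the merge order of partial results is not deterministic.

```scala
// Hedged sketch: the same non-commutative (but associative) operator under
// different partitioning.
val data = Seq(("k", "a"), ("k", "b"), ("k", "c"), ("k", "d"))

// One partition: values are folded in iteration order, so the result is always (k,abcd).
val onePartition = sc.parallelize(data, numSlices = 1).reduceByKey(_ + _).collect()

// Four partitions: the partial strings may be merged in any order, so the result can vary.
val fourPartitions = sc.parallelize(data, numSlices = 4).reduceByKey(_ + _).collect()
```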

@@ -300,7 +300,7 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
   }
 
   /**
-   * Merge the values for each key using an associative reduce function. This will also perform
+   * Merge the values for each key using an associative and commutative binary operator. This will also perform
Review comment (Contributor):

this line exceeds 100 chars and will fail the style checker
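
To illustrate the amended wording above (a minimal sketch, assuming a spark-shell sc; not part of the PR): with an operator that is both associative and commutative, such as integer addition, reduceByKey's result does not depend on how the input is partitioned.

```scala
// Integer addition is associative and commutative, so the result is
// deterministic regardless of partitioning.
val counts = sc.parallelize(Seq(("a", 1), ("b", 1), ("a", 1)), numSlices = 3)
  .reduceByKey(_ + _)
  .collect()
// Array((a,2), (b,1)), possibly in a different array order, but the values never vary.
```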

@rxin (Contributor) commented Feb 5, 2016

Yeah, we should make them consistent. Are there more inconsistencies you've found?

@srowen (Member) commented Feb 6, 2016

Yeah, I think there are similar statements about an 'associative' operation that really mean 'associative and commutative'. There are more occurrences in this file, and some in Accumulator.scala, JavaPairRDD.scala, JavaRDDLike.scala, JavaDStreamLike.scala, JavaPairDStream.scala, DStream.scala, PairDStreamFunctions.scala, rdd.py, dstream.py, and pairRDD.R. In each case the wording should be updated the same way.

@srowen (Member) commented Feb 13, 2016

@YPares are you going to update this or should I continue it?

@YPares (Author) commented Feb 14, 2016

@srowen Oh, sorry, I was waiting a bit to see if I found other inconsistencies.
But none came to mind, so yes you may continue it.

@srowen (Member) commented Feb 14, 2016

See my previous message @YPares -- I think I found all the other ones. You're welcome to address them so we can merge your PR, but I can do it too.

@srowen (Member) commented Feb 16, 2016

You can close this @YPares; I created #11217.

@YPares closed this Feb 16, 2016
asfgit pushed a commit that referenced this pull request Feb 19, 2016
…ements for reduce, fold

Clarify that reduce functions need to be commutative, and fold functions do not

See #11091

Author: Sean Owen <sowen@cloudera.com>

Closes #11217 from srowen/SPARK-13339.