
[SPARK-1701] Clarify slice vs partition in the programming guide #2305

Closed
wants to merge 3 commits

Conversation

@mattf (Contributor) commented Sep 6, 2014

This is a partial solution to SPARK-1701, addressing only the documentation confusion.

Additional work would be to rename the numSlices parameter across languages, with care required in Scala and Python to preserve backward compatibility for callers that pass it as a named parameter.
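For illustration (not part of this patch), here is a minimal Scala sketch of why the rename needs care: the parameter can be passed by name, so call sites like the one below would stop compiling if numSlices became numPartitions. The app name and master URL are assumptions made only to keep the example runnable.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object NamedParamDemo {
  def main(args: Array[String]): Unit = {
    // Local master and app name are illustrative assumptions.
    val sc = new SparkContext(
      new SparkConf().setAppName("named-param-demo").setMaster("local[2]"))

    val data = Seq(1, 2, 3, 4, 5)

    // Positional form: unaffected by any parameter rename.
    val byPosition = sc.parallelize(data, 10)

    // Named form: valid against the current signature,
    //   def parallelize[T](seq: Seq[T], numSlices: Int = defaultParallelism)
    // but it would break if numSlices were renamed to numPartitions.
    val byName = sc.parallelize(data, numSlices = 10)

    println(byPosition.partitions.length) // 10
    println(byName.partitions.length)     // 10
    sc.stop()
  }
}
```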

@SparkQA commented Sep 6, 2014

QA tests have started for PR 2305 at commit 7b045e0.

  • This patch merges cleanly.

@SparkQA commented Sep 6, 2014

QA tests have finished for PR 2305 at commit 7b045e0.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mattf (Contributor, Author) commented Sep 11, 2014

@JoshRosen will you take a look at this?

@JoshRosen (Contributor) commented

Sorry for not reviewing this until now; it sort of fell off my radar.

```diff
@@ -286,7 +286,7 @@ We describe operations on distributed datasets later on.
 
 </div>
 
-One important parameter for parallel collections is the number of *slices* to cut the dataset into. Spark will run one task for each slice of the cluster. Typically you want 2-4 slices for each CPU in your cluster. Normally, Spark tries to set the number of slices automatically based on your cluster. However, you can also set it manually by passing it as a second parameter to `parallelize` (e.g. `sc.parallelize(data, 10)`).
+One important parameter for parallel collections is the number of *partitions* to cut the dataset into. Spark will run one task for each partition of the cluster. Typically you want 2-4 partitions for each CPU in your cluster. Normally, Spark tries to set the number of partitions automatically based on your cluster. However, you can also set it manually by passing it as a second parameter to `parallelize` (e.g. `sc.parallelize(data, 10)`). Note: the parameter is called numSlices (not numPartitions) to maintain backward compatibility.
```

Maybe the "Note:" should mention that in some places we still say numSlices (for backwards compatibility with earlier versions of Spark) and that "slices" should be considered a synonym for "partitions"; there are a lot of places that use numPartitions, etc., so we may want to emphasize that this discrepancy only occurs in a few places.
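To make the guide's point concrete, a small spark-shell-style sketch (the master URL and sizes are assumptions, not taken from the PR) showing that numSlices simply sets the number of partitions:

```scala
// In spark-shell, sc is provided; a local[4] master is an assumption here.
val data = 1 to 100

// Let Spark choose the partition count; on local[4] the default
// parallelism is typically the core count, i.e. 4.
val auto = sc.parallelize(data)
println(auto.partitions.length)

// Set it explicitly via the second parameter; it is still named
// numSlices, but "slices" is just a synonym for "partitions".
val manual = sc.parallelize(data, 10)
println(manual.partitions.length) // 10
```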

@mattf (Contributor, Author) commented Sep 19, 2014

Thanks for the feedback. I've changed the language to be more in line with your suggestion.

@SparkQA commented Sep 19, 2014

QA tests have started for PR 2305 at commit c0af05d.

  • This patch merges cleanly.

@SparkQA commented Sep 19, 2014

QA tests have finished for PR 2305 at commit c0af05d.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mattf (Contributor, Author) commented Sep 19, 2014

> This patch fails unit tests.

I'm getting HTTP 503 from Jenkins, but I'm going to go out on a limb and say this doc change didn't break the unit tests.

@JoshRosen (Contributor) commented

I think that Jenkins might have crashed or restarted overnight, but it seems to be working now.

This looks good to me, so I'm going to merge it. Feel free to open similar PRs for other documentation improvements / clarifications, since these types of edits are really helpful.

asfgit closed this in be0c756 on Sep 19, 2014
mattf deleted the SPARK-1701 branch on Sep 19, 2014
ghost pushed a commit to dbtsai/spark that referenced this pull request on Apr 9, 2017 (title truncated: "…nd code?)")

## What changes were proposed in this pull request?

I came across the term "slice" when running some Spark Scala code. A Google search indicated that "slices" and "partitions" refer to the same thing; see:

- [This issue](https://issues.apache.org/jira/browse/SPARK-1701)
- [This pull request](apache#2305)
- [This StackOverflow answer](http://stackoverflow.com/questions/23436640/what-is-the-difference-between-an-rdd-partition-and-a-slice) and [this one](http://stackoverflow.com/questions/24269495/what-are-the-differences-between-slices-and-partitions-of-rdds)

This pull request fixes the occurrence of "slice" I came across. Nonetheless, [it would appear](https://github.com/apache/spark/search?utf8=%E2%9C%93&q=slice&type=) there are still many references to "slice/slices", so I thought I'd raise this pull request to address the issue (sorry if this is the wrong place; I'm not too familiar with raising Apache issues).

## How was this patch tested?

(Not tested locally - only a minor exception message change.)

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: asmith26 <asmith26@users.noreply.github.com>

Closes apache#17565 from asmith26/master.