[SPARK-7150] SparkContext.range() and SQLContext.range() #6230

davies · 2015-05-18T08:31:40Z

This PR is based on #6081, thanks @adrian-wang.

Closes #6081

AmplabJenkins · 2015-05-18T08:32:11Z

Merged build triggered.

AmplabJenkins · 2015-05-18T08:32:20Z

Merged build started.

SparkQA · 2015-05-18T08:34:24Z

Test build #32984 has started for PR 6230 at commit 789eda5.

rxin · 2015-05-18T08:39:53Z

python/pyspark/sql/context.py

+        :param numPartitions: the number of partitions of the DataFrame
+        :return: A new DataFrame
+
+        >>> sqlContext.range(1, 7, 2).collect()


can we add a test for large ints (i.e. > 32 bits)?

might make sense to have that in tests.py
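
A minimal sketch of what such a large-int check could look like, for readers following the thread. This is not the test that was added to the PR. The helper name `check_large_range` and the `range_fn` parameter are hypothetical, and Python's builtin `range` stands in for Spark so the snippet runs without a cluster.

```python
# Largest value that fits in a signed 32-bit integer.
INT32_MAX = 2**31 - 1


def check_large_range(range_fn):
    """Hypothetical helper: verify that range_fn preserves values above 32 bits.

    range_fn is any callable taking (start, end, step) and returning an
    iterable of ints. With a live context it could be something like
    lambda s, e, st: [row[0] for row in sqlContext.range(s, e, st).collect()]
    (an assumption, not code from this PR).
    """
    start = INT32_MAX + 1  # first value that no longer fits in 32 bits
    end = start + 6
    values = list(range_fn(start, end, 2))
    assert values == [start, start + 2, start + 4], values
    assert all(v > INT32_MAX for v in values)
    return values


# Stand-in run using the builtin range instead of Spark.
large_values = check_large_range(range)
```

If the values were truncated to 32 bits anywhere along the way, the first assertion would fail, because the expected list is computed in plain Python integers.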

rxin · 2015-05-18T08:40:33Z

Can we update the title to "SparkContext.range() and SQLContext.range()"?

rxin · 2015-05-18T08:40:51Z

LGTM other than the unit test.

SparkQA · 2015-05-18T10:58:28Z

Test build #32984 has finished for PR 6230 at commit 789eda5.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-05-18T10:58:33Z

Merged build finished. Test FAILed.

AmplabJenkins · 2015-05-18T10:58:34Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32984/
Test FAILed.

AmplabJenkins · 2015-05-18T15:47:11Z

Merged build triggered.

AmplabJenkins · 2015-05-18T15:47:18Z

Merged build started.

AmplabJenkins · 2015-05-18T16:02:22Z

Merged build finished. Test FAILed.

AmplabJenkins · 2015-05-18T16:02:22Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32997/
Test FAILed.

rxin · 2015-05-18T17:06:40Z

Jenkins, retest this please.

AmplabJenkins · 2015-05-18T17:07:11Z

Merged build triggered.

AmplabJenkins · 2015-05-18T17:07:19Z

Merged build started.

SparkQA · 2015-05-18T17:09:19Z

Test build #33000 has started for PR 6230 at commit d3ce5fe.

adrian-wang · 2015-05-18T17:36:51Z

python/pyspark/sql/context.py

+        """
+        if numPartitions is None:
+            numPartitions = self._sc.defaultParallelism
+        jdf = self._ssql_ctx.range(int(start), int(end), int(step), int(numPartitions))


This will make the parameters unpredictable, and lead to exceptions.

If the start or end is invalid, you will get an exception anyway. By converting them in Python, we get a Python exception (failed to convert to int) rather than a Py4J exception (failed to find a method to call); the latter is much harder for most users to understand.

you are right.
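
The trade-off discussed above can be illustrated without Spark. The sketch below is only an illustration under stated assumptions: `coerce_range_args`, `describe_failure`, and the `default_parallelism` parameter are hypothetical stand-ins for the `int(...)` calls and `self._sc.defaultParallelism` in the diff, and no JVM call is made.

```python
def coerce_range_args(start, end, step=1, num_partitions=None,
                      default_parallelism=4):
    """Hypothetical helper mirroring the int(...) coercion in the diff.

    default_parallelism is an assumed stand-in for
    self._sc.defaultParallelism.
    """
    if num_partitions is None:
        num_partitions = default_parallelism
    # Any failure here is an ordinary Python exception, raised before
    # the arguments would be handed to the JVM.
    return int(start), int(end), int(step), int(num_partitions)


def describe_failure(*args):
    """Return the name of the exception raised by coercion, or None."""
    try:
        coerce_range_args(*args)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


ok_args = coerce_range_args(1, 7, 2)      # already ints
loose_args = coerce_range_args("1", 7.0)  # str and float are accepted
bad_string = describe_failure("abc", 7)   # not a number
bad_none = describe_failure(None, 7)      # wrong type entirely
```

Invalid input surfaces as a plain `ValueError` or `TypeError`. The cost is that `int()` also accepts and truncates floats (`int(1.9)` is `1`), which is one way coerced parameters can surprise a caller.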

SparkQA · 2015-05-18T19:35:31Z

Test build #33000 has finished for PR 6230 at commit d3ce5fe.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-05-18T19:35:36Z

Merged build finished. Test PASSed.

AmplabJenkins · 2015-05-18T19:35:37Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33000/
Test PASSed.

davies · 2015-05-19T00:11:56Z

@adrian-wang @rxin Is this ready to merge?

rxin · 2015-05-19T04:42:32Z

Merging. Thanks @adrian-wang and @davies.

This PR is based on #6081, thanks adrian-wang.

Closes #6081

Author: Daoyuan Wang <daoyuan.wang@intel.com>
Author: Davies Liu <davies@databricks.com>

Closes #6230 from davies/range and squashes the following commits:

d3ce5fe [Davies Liu] add tests
789eda5 [Davies Liu] add range() in Python
4590208 [Davies Liu] Merge commit 'refs/pull/6081/head' of github.com:apache/spark into range
cbf5200 [Daoyuan Wang] let's add python support in a separate PR
f45e3b2 [Daoyuan Wang] remove redundant toLong
617da76 [Daoyuan Wang] fix safe marge for corner cases
867c417 [Daoyuan Wang] fix
13dbe84 [Daoyuan Wang] update
bd998ba [Daoyuan Wang] update comments
d3a0c1b [Daoyuan Wang] add range api()

(cherry picked from commit c2437de)
Signed-off-by: Reynold Xin <rxin@databricks.com>


adrian-wang and others added 9 commits May 13, 2015 22:29

add range api()

d3a0c1b

update comments

bd998ba

update

13dbe84

fix

867c417

fix safe marge for corner cases

617da76

remove redundant toLong

f45e3b2

let's add python support in a separate PR

cbf5200

Merge commit 'refs/pull/6081/head' of github.com:apache/spark into range

4590208

add range() in Python

789eda5

rxin reviewed May 18, 2015

adrian-wang mentioned this pull request May 18, 2015

[SPARK-7150] add range() api #6233

Closed

davies changed the title ~~[SPARK-7150] add range() api~~ [SPARK-7150] SparkContext.range() and SQLContext.range() May 18, 2015

add tests

d3ce5fe

adrian-wang reviewed May 18, 2015

asfgit closed this in c2437de May 19, 2015
