[SPARK-16005][R] Add `randomSplit` to SparkR by dongjoon-hyun · Pull Request #13721 · apache/spark

dongjoon-hyun · 2016-06-16T23:38:08Z

What changes were proposed in this pull request?

This PR adds randomSplit to SparkR for API parity.

How was this patch tested?

Passes the Jenkins tests (with a new test case).

SparkQA · 2016-06-16T23:45:28Z

Test build #60669 has finished for PR 13721 at commit 3fba8b8.

This patch fails R style tests.
This patch merges cleanly.
This patch adds no public classes.

felixcheung · 2016-06-16T23:50:19Z

R/pkg/R/DataFrame.R

Could you add #' @note since 2.0.0 - we are trying to add that for 2.0, there should be some example already. I'm going to do a pass on that later - btw, you are welcome to take on that too!

Sure!
And, thank you for review, @felixcheung !

felixcheung · 2016-06-17T00:11:43Z

looks good!

dongjoon-hyun · 2016-06-17T00:14:21Z

Thank you, @felixcheung .
For SPARK-14995, I'll do that tonight. It looks like a good exercise for me.
Thank you for letting me know.

SparkQA · 2016-06-17T00:29:04Z

Test build #60670 has finished for PR 13721 at commit 5eff72b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-06-17T00:34:53Z

Test build #60671 has finished for PR 13721 at commit 0c0e00b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-06-17T00:39:58Z

Test build #60673 has finished for PR 13721 at commit 2e1a3cf.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2016-06-17T00:44:17Z

It now passes all tests and is ready for review again.
Could you review this PR, @shivaram ?

shivaram · 2016-06-17T03:43:35Z

R/pkg/R/DataFrame.R

nit: seed to use for random split will be better here

SparkQA · 2016-06-17T05:13:07Z

Test build #60682 has finished for PR 13721 at commit eed4b1f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-06-17T05:22:54Z

Test build #60684 has finished for PR 13721 at commit 09a079c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2016-06-17T05:40:44Z

Hi, @shivaram . The following items have been updated, and the PR is ready for review again.

The param description is improved.
The test case now compares the length of the returned list, and the ratio of the row counts, against the given weights.

dongjoon-hyun · 2016-06-17T17:28:38Z

Hi, @shivaram .
Although it seems too late for this to become part of Spark 2.0.0, could you review it again?

shivaram · 2016-06-17T22:17:00Z

R/pkg/R/DataFrame.R

+
+#' randomSplit
+#'
+#' Return a list of randomly split dataframes with the provided weights.


Could we explain that the dataframes are split by rows? It's not really clear what randomSplit means without some context otherwise.

Yes. Given 1000 rows, randomSplit(c(2,3,5)) will return three dataframes with approximately 200, 300, and 500 rows.
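To make the "split by rows" behaviour concrete, here is a minimal sketch in plain Python. It is not SparkR's or Spark's implementation; the function name `random_split` and its signature are hypothetical and exist only for illustration. It assumes the weights are normalized to sum to 1 and that each row is assigned independently to exactly one split, so the split sizes are only approximately proportional to the weights.

```python
import random
from itertools import accumulate


def random_split(rows, weights, seed=None):
    """Hypothetical helper: put each row into exactly one split,
    choosing split i with probability weights[i] / sum(weights)."""
    if not weights or any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")

    total = float(sum(weights))
    # Cumulative upper bounds on [0, 1]; weights 2, 3, 5 give roughly 0.2, 0.5, 1.0.
    bounds = list(accumulate(w / total for w in weights))
    bounds[-1] = 1.0  # guard against floating-point drift in the last bound

    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    splits = [[] for _ in weights]
    for row in rows:
        x = rng.random()  # uniform draw in [0, 1)
        # The row goes to the first split whose upper bound exceeds the draw.
        index = next(i for i, upper in enumerate(bounds) if x < upper)
        splits[index].append(row)
    return splits


# 1000 rows with weights 2:3:5 give sizes near 200, 300, and 500.
parts = random_split(range(1000), [2, 3, 5], seed=42)
sizes = [len(p) for p in parts]
print(sizes)
```

Because assignment is random per row, the sizes vary from seed to seed; only their sum is guaranteed to equal the number of input rows. This is why a test for such a function would compare ratios approximately rather than exactly.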

dongjoon-hyun · 2016-06-17T22:35:10Z

@shivaram . I added the description. Thank you for the review!

shivaram · 2016-06-17T22:40:15Z

Thanks @dongjoon-hyun - LGTM. Will merge once Jenkins passes

SparkQA · 2016-06-17T23:05:20Z

Test build #60731 has finished for PR 13721 at commit 019cbcf.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

## What changes were proposed in this pull request?

This PR adds `randomSplit` to SparkR for API parity.

## How was this patch tested?

Pass the Jenkins tests (with new testcase.)

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #13721 from dongjoon-hyun/SPARK-16005.

(cherry picked from commit 7d65a0d)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

dongjoon-hyun · 2016-06-18T03:59:48Z

Thank you for merging, @shivaram .

felixcheung reviewed Jun 16, 2016

shivaram reviewed Jun 17, 2016

dongjoon-hyun added 6 commits June 16, 2016 21:47

[SPARK-16005][R] Add randomSplit to SparkR

4528f51

Fix R coding style.

642f99a

Fix descriptions and add since note.

c175ec1

Add a testcase without seed.

0c7be99

Improve testcases and param description.

76ee3a5

Add blank line at the end of the file.

09a079c

shivaram reviewed Jun 17, 2016

Add more description in the example.

019cbcf

asfgit closed this in 7d65a0d Jun 17, 2016

dongjoon-hyun deleted the SPARK-16005 branch July 20, 2016 07:39
