Skip to content

[SPARK-16005][R] Add randomSplit to SparkR#13721

Closed
dongjoon-hyun wants to merge 7 commits intoapache:masterfrom
dongjoon-hyun:SPARK-16005
Closed

[SPARK-16005][R] Add randomSplit to SparkR#13721
dongjoon-hyun wants to merge 7 commits intoapache:masterfrom
dongjoon-hyun:SPARK-16005

Conversation

@dongjoon-hyun
Copy link
Member

What changes were proposed in this pull request?

This PR adds randomSplit to SparkR for API parity.

How was this patch tested?

Pass the Jenkins tests (with new testcase.)

@SparkQA
Copy link

SparkQA commented Jun 16, 2016

Test build #60669 has finished for PR 13721 at commit 3fba8b8.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add #' @note since 2.0.0 - we are trying to add that for 2.0, there should be some example already. I'm going to do a pass on that later - btw, you are welcome to take on that too!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!
And, thank you for review, @felixcheung !

@felixcheung
Copy link
Member

looks good!

@dongjoon-hyun
Copy link
Member Author

Thank you, @felixcheung .
For SPARK-14995, I'll do that tonight. It looks good as an exercise for me.
Thank you for let me know that.

@SparkQA
Copy link

SparkQA commented Jun 17, 2016

Test build #60670 has finished for PR 13721 at commit 5eff72b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 17, 2016

Test build #60671 has finished for PR 13721 at commit 0c0e00b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 17, 2016

Test build #60673 has finished for PR 13721 at commit 2e1a3cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

Now, it passed all tests and become ready for review again.
Could you review this PR, @shivaram ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: seed to use for random split will be better here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep.

@SparkQA
Copy link

SparkQA commented Jun 17, 2016

Test build #60682 has finished for PR 13721 at commit eed4b1f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 17, 2016

Test build #60684 has finished for PR 13721 at commit 09a079c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

dongjoon-hyun commented Jun 17, 2016

Hi, @shivaram . The followings are updated and become ready for review again.

  • The param description is improved.
  • The size and ratio of returned list are compared with those of weights.

@dongjoon-hyun
Copy link
Member Author

Hi, @shivaram .
Although it seems to be late for becoming a part of the Spark 2.0.0, could you review again ?


#' randomSplit
#'
#' Return a list of randomly split dataframes with the provided weights.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we explain that the dataframes are split by rows ? Its not really clear what randomSplit means without some context otherwise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Given 1000 rows, randomSplit(c(2,3,5)) will return three dataframe having approximately 200, 300, 500 rows .

@dongjoon-hyun
Copy link
Member Author

@shivaram . I added the description. Thank you for review!

@shivaram
Copy link
Contributor

Thanks @dongjoon-hyun - LGTM. Will merge once Jenkins passes

@SparkQA
Copy link

SparkQA commented Jun 17, 2016

Test build #60731 has finished for PR 13721 at commit 019cbcf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Jun 17, 2016
## What changes were proposed in this pull request?

This PR adds `randomSplit` to SparkR for API parity.

## How was this patch tested?

Pass the Jenkins tests (with new testcase.)

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #13721 from dongjoon-hyun/SPARK-16005.

(cherry picked from commit 7d65a0d)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
@asfgit asfgit closed this in 7d65a0d Jun 17, 2016
@dongjoon-hyun
Copy link
Member Author

Thank you for merging, @shivaram .

@dongjoon-hyun dongjoon-hyun deleted the SPARK-16005 branch July 20, 2016 07:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants

Comments