[SPARK-16005][R] Add randomSplit to SparkR #13721
dongjoon-hyun wants to merge 7 commits into apache:master from dongjoon-hyun:SPARK-16005
Test build #60669 has finished for PR 13721 at commit
R/pkg/R/DataFrame.R
Outdated
Could you add `#' @note since 2.0.0`? We are trying to add that for 2.0, and there should be some examples already. I'm going to do a pass on that later. By the way, you are welcome to take that on too!
Sure!
And, thank you for review, @felixcheung !
looks good!
Thank you, @felixcheung.
Test build #60670 has finished for PR 13721 at commit
Test build #60671 has finished for PR 13721 at commit
Test build #60673 has finished for PR 13721 at commit
Now it has passed all tests and is ready for review again.
R/pkg/R/DataFrame.R
Outdated
nit: `seed to use for random split` would be better here
Test build #60682 has finished for PR 13721 at commit
Test build #60684 has finished for PR 13721 at commit
Hi, @shivaram. The following have been updated, and the PR is ready for review again.
Hi, @shivaram.
#' randomSplit
#'
#' Return a list of randomly split dataframes with the provided weights.
Could we explain that the dataframes are split by rows? It's not really clear what randomSplit means without some context otherwise.
Yes. Given 1000 rows, randomSplit(c(2,3,5)) will return three dataframes with approximately 200, 300, and 500 rows.
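The row-wise behaviour described above can be illustrated with a small conceptual sketch. It is written in plain Python rather than SparkR so that it runs without a Spark installation, and it is not Spark's implementation. The function name `random_split` and its parameters are hypothetical. The sketch assumes only that weights are normalized to sum to 1 and that each row is assigned independently to exactly one split, which is why the resulting sizes are approximate rather than exact.

```python
import random
from itertools import accumulate


def random_split(rows, weights, seed=None):
    """Hypothetical helper: split rows into len(weights) lists by weight.

    Each row lands in exactly one list. Weights are normalized, so
    [2, 3, 5] behaves like [0.2, 0.3, 0.5].
    """
    if not weights or any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")

    total = sum(weights)
    # Cumulative upper bound of each bucket on [0, 1], e.g. [0.2, 0.5, 1.0].
    bounds = list(accumulate(w / total for w in weights))
    bounds[-1] = 1.0  # guard against floating-point drift in the last bound

    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    splits = [[] for _ in weights]
    for row in rows:
        u = rng.random()  # uniform draw in [0, 1)
        # Pick the first bucket whose upper bound exceeds the draw.
        idx = next(i for i, b in enumerate(bounds) if u < b)
        splits[idx].append(row)
    return splits


parts = random_split(range(1000), [2, 3, 5], seed=42)
sizes = [len(p) for p in parts]
print(sizes)  # three sizes near 200, 300, and 500 that sum to 1000
```

Because every row is drawn independently, the split sizes vary from run to run unless a seed is fixed, but they always add up to the original row count.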
@shivaram, I added the description. Thank you for the review!
Thanks @dongjoon-hyun - LGTM. Will merge once Jenkins passes.
Test build #60731 has finished for PR 13721 at commit
## What changes were proposed in this pull request?

This PR adds `randomSplit` to SparkR for API parity.

## How was this patch tested?

Pass the Jenkins tests (with new testcase.)

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #13721 from dongjoon-hyun/SPARK-16005.

(cherry picked from commit 7d65a0d)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Thank you for merging, @shivaram.
## What changes were proposed in this pull request?

This PR adds `randomSplit` to SparkR for API parity.

## How was this patch tested?

Pass the Jenkins tests (with new testcase.)