-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-12158] [SparkR] [SQL] Fix 'sample' functions that break R unit test cases #10160
Conversation
Test build #47230 has finished for PR 10160 at commit
|
@davies Could you take a look at this PR? Thank you! |
cc @sun-rui |
weird - when I added the seed param it actually was harder to fail. (See #9549) |
setMethod("sample", | ||
# we can send seed as an argument through callJMethod | ||
signature(x = "DataFrame", withReplacement = "logical", | ||
fraction = "numeric", seed = "numeric"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could in fact merge these overload/variant into one. Please see this for an example:
Line 1177 in 5011f26
if (!missing(j)) { |
if (!missing(seed)) {
sdf <- callJMethod(x@sdf, "sample", withReplacement, fraction, as.integer(seed))
} else {
sdf <- callJMethod(x@sdf, "sample", withReplacement, fraction)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
@felixcheung @sun-rui Thank you! Based on your comments, I did the changes. Please review the changes. : ) |
ok to test |
looks good! could you think of the best way to add a test for not setting seed? |
Test build #47236 has finished for PR 10160 at commit
|
@felixcheung I am not sure if we need to add a test case for
If needed, maybe we can add something like the below:
|
@gatorsmile Sure - I guess the main thing is to ensure the seed is getting set. How about:
? |
Yeah thats a good idea @felixcheung |
@felixcheung @shivaram Sure, just added that test case. Please review it. Thank you! : ) |
LGTM. I'll wait for @felixcheung and Jenkins before merging |
function(x, withReplacement, fraction) { | ||
sample(x, withReplacement, fraction) | ||
function(x, withReplacement, fraction, seed) { | ||
if (!missing(seed)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not call directly: sample(x, withReplacement, fraction, seed)?
: ) @sun-rui Done. Thank you! |
Test build #47253 has finished for PR 10160 at commit
|
function(x, withReplacement, fraction) { | ||
sample(x, withReplacement, fraction) | ||
function(x, withReplacement, fraction, seed) { | ||
sample(x, withReplacement, fraction, as.integer(seed)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 space ident. directly pass seed is ok instead of as.integer(seed)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, done! This is my first time to read and write R. : ) Thank you!
Test build #47254 has finished for PR 10160 at commit
|
Test build #47255 has finished for PR 10160 at commit
|
@shivaram @felixcheung @sun-rui Thank you everyone! Hopefully, my code changes resolve all your concerns. I learned a lot from you! : ) |
LGTM |
@gatorsmile This will need to be rebased to master as we moved test file locations in #10030 -- Let me know once thats done and I'll merge |
looks good. |
retest this please |
Test build #47279 has finished for PR 10160 at commit
|
Test build #47280 has finished for PR 10160 at commit
|
signature(x = "DataFrame", withReplacement = "logical", | ||
fraction = "numeric"), | ||
function(x, withReplacement, fraction) { | ||
function(x, withReplacement, fraction, seed) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@felixcheung Shouldn't we document this param in the roxygen doc above ? Otherwise how would anybody know we support a seed
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes we should add a @param seed
above, thanks for catching it
looks good thanks @gatorsmile |
Thank you, @shivaram @felixcheung ! |
Test build #47287 has finished for PR 10160 at commit
|
@shivaram Will it be merged before the release of 1.6? Thanks! |
Sorry @gatorsmile -- I missed this PR for a couple of days. LGTM. Merging this to master and |
…est cases The existing sample functions miss the parameter `seed`, however, the corresponding function interface in `generics` has such a parameter. Thus, although the function caller can call the function with the 'seed', we are not using the value. This could cause SparkR unit tests failed. For example, I hit it in another PR: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47213/consoleFull Author: gatorsmile <gatorsmile@gmail.com> Closes #10160 from gatorsmile/sampleR. (cherry picked from commit 1e3526c) Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Thank you, everyone! : ) |
The existing sample functions miss the parameter
seed
, however, the corresponding function interface ingenerics
has such a parameter. Thus, although the function caller can call the function with the 'seed', we are not using the value.This could cause SparkR unit tests failed. For example, I hit it in another PR:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47213/consoleFull