[SPARK-12158] [SparkR] [SQL] Fix 'sample' functions that break R unit test cases #10160

Closed
wants to merge 7 commits into from

gatorsmile
Member

The existing sample functions lack the seed parameter, even though the corresponding generic in generics declares one. As a result, a caller can pass 'seed', but the value is silently ignored.

This can cause SparkR unit tests to fail. For example, I hit it in another PR:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47213/consoleFull

@SparkQA

SparkQA commented Dec 5, 2015

Test build #47230 has finished for PR 10160 at commit ec77010.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

@davies Could you take a look at this PR? Thank you!

@shivaram
Contributor

shivaram commented Dec 5, 2015

cc @sun-rui

@felixcheung
Member

weird - when I added the seed param it actually was harder to fail. (See #9549)
anyway, thanks for fixing this.

setMethod("sample",
# we can send seed as an argument through callJMethod
signature(x = "DataFrame", withReplacement = "logical",
fraction = "numeric", seed = "numeric"),
Member

you could in fact merge these overloads/variants into one. Please see this for an example:

if (!missing(j)) {

if (!missing(seed)) {
   sdf <- callJMethod(x@sdf, "sample", withReplacement, fraction, as.integer(seed))
} else {
   sdf <- callJMethod(x@sdf, "sample", withReplacement, fraction)
}
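
The `missing()` check in this suggestion can be tried without a Spark session. The following is a minimal base-R sketch, not SparkR code: `sample_sketch` is a hypothetical name, and the list it returns stands in for the arguments that would be forwarded through `callJMethod`.

```r
# Hypothetical stand-in for the SparkR method: instead of calling the JVM,
# it returns the argument list that would be forwarded.
sample_sketch <- function(withReplacement, fraction, seed) {
  if (!missing(seed)) {
    # Seed supplied by the caller: forward it as an integer.
    list(withReplacement, fraction, as.integer(seed))
  } else {
    # Seed omitted: use the variant that takes no seed.
    list(withReplacement, fraction)
  }
}

with_seed <- sample_sketch(FALSE, 0.1, 42)
without_seed <- sample_sketch(FALSE, 0.1)

# The seed is forwarded only when the caller supplied it.
stopifnot(length(with_seed) == 3L, length(without_seed) == 2L)
```

A single function body covers both call shapes, which is why the two overloads can be merged.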

Contributor

+1

@gatorsmile
Member Author

@felixcheung @sun-rui Thank you! I made the changes based on your comments. Please review them. : )

@gatorsmile
Member Author

ok to test

@felixcheung
Member

looks good! could you think of the best way to add a test for not setting seed?
hmm.. perhaps the loop I use in #9549?

@SparkQA

SparkQA commented Dec 6, 2015

Test build #47236 has finished for PR 10160 at commit 2ab89e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

@felixcheung I am not sure if we need to add a test case for sample. Normally, using a specific seed is the common way to verify the result of sample. The existing test case may be enough?

  sampled <- sample(df, FALSE, 1.0)
  expect_equal(nrow(collect(sampled)), count(df))

If needed, maybe we can add something like the below:

repeat {
  if (count(sample(df, FALSE, 0.1)) != count(sample(df, FALSE, 0.1))) {
    break
  }
}

@gatorsmile gatorsmile changed the title [SPARK-12158] [R] [SQL] Fix 'sample' functions that break R unit test cases [SPARK-12158] [SparkR] [SQL] Fix 'sample' functions that break R unit test cases Dec 6, 2015
@felixcheung
Member

@gatorsmile Sure - I guess the main thing is to ensure the seed is getting set. How about:

count1 <-  count(sample(df, FALSE, 0.1, 0))
count2 <-  count(sample(df, FALSE, 0.1, 0))
expect_equal(count1, count2)

?
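
The idea behind this test, that a fixed seed makes the sample reproducible, can be sketched in plain R without a Spark session. This is only an illustration under stated assumptions: `sampled_count` is a hypothetical stand-in for `count(sample(df, FALSE, fraction, seed))`, and it uses R's local random number generator rather than Spark's sampler.

```r
# Hypothetical stand-in for count(sample(df, FALSE, fraction, seed)):
# keep each of n rows independently with probability `fraction`.
sampled_count <- function(n, fraction, seed) {
  if (!missing(seed)) {
    # Fix the state of R's random number generator when a seed is given.
    set.seed(as.integer(seed))
  }
  sum(runif(n) < fraction)
}

count1 <- sampled_count(1000, 0.1, 0)
count2 <- sampled_count(1000, 0.1, 0)

# Same seed, same draws, same count.
stopifnot(identical(count1, count2))
```

If the seed were dropped before reaching the sampler, the two counts would only match by chance, so an equality check like this exercises the seed path.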

@shivaram
Contributor

shivaram commented Dec 7, 2015

Yeah, that's a good idea @felixcheung

@gatorsmile
Member Author

@felixcheung @shivaram Sure, just added that test case. Please review it. Thank you! : )

@shivaram
Contributor

shivaram commented Dec 7, 2015

LGTM. I'll wait for @felixcheung and Jenkins before merging

- function(x, withReplacement, fraction) {
-   sample(x, withReplacement, fraction)
+ function(x, withReplacement, fraction, seed) {
+   if (!missing(seed)) {
Contributor

why not call directly: sample(x, withReplacement, fraction, seed)?

@gatorsmile
Member Author

: ) @sun-rui Done. Thank you!

@SparkQA

SparkQA commented Dec 7, 2015

Test build #47253 has finished for PR 10160 at commit 34d0118.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

- function(x, withReplacement, fraction) {
-   sample(x, withReplacement, fraction)
+ function(x, withReplacement, fraction, seed) {
+   sample(x, withReplacement, fraction, as.integer(seed))
Contributor

2-space indent. Is it OK to pass seed directly instead of as.integer(seed)?

Member Author

Yeah, done! This is my first time reading and writing R. : ) Thank you!

@SparkQA

SparkQA commented Dec 7, 2015

Test build #47254 has finished for PR 10160 at commit 4337c35.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 7, 2015

Test build #47255 has finished for PR 10160 at commit a78109e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

@shivaram @felixcheung @sun-rui Thank you everyone! Hopefully, my code changes resolve all your concerns. I learned a lot from you! : )

@sun-rui
Contributor

sun-rui commented Dec 7, 2015

LGTM

@shivaram
Contributor

shivaram commented Dec 7, 2015

@gatorsmile This will need to be rebased to master as we moved test file locations in #10030 -- Let me know once that's done and I'll merge

@felixcheung
Member

looks good.

@gatorsmile
Member Author

retest this please

@SparkQA

SparkQA commented Dec 7, 2015

Test build #47279 has finished for PR 10160 at commit ad3ea31.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 7, 2015

Test build #47280 has finished for PR 10160 at commit ad3ea31.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

signature(x = "DataFrame", withReplacement = "logical",
fraction = "numeric"),
- function(x, withReplacement, fraction) {
+ function(x, withReplacement, fraction, seed) {
Contributor

@felixcheung Shouldn't we document this param in the roxygen doc above? Otherwise how would anybody know we support a seed?

Member

yes we should add a @param seed above, thanks for catching it
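
For illustration, a roxygen block with the extra `@param seed` entry might look roughly like the sketch below. The wording of every tag and the function name `sample_doc_sketch` are assumptions made for this example, not the text that was merged.

```r
#' Sample (documentation sketch; the wording here is illustrative only)
#'
#' @param x A DataFrame
#' @param withReplacement Whether to sample with replacement
#' @param fraction The approximate fraction of rows to return
#' @param seed Optional numeric seed; when omitted, results may differ between calls
sample_doc_sketch <- function(x, withReplacement, fraction, seed) {
  # Hypothetical stub: the body is irrelevant to the documentation sketch.
  invisible(NULL)
}
```

Listing `seed` alongside the other parameters is what makes the optional argument discoverable from the generated help page.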

@felixcheung
Member

looks good thanks @gatorsmile

@gatorsmile
Member Author

Thank you, @shivaram @felixcheung !

@SparkQA

SparkQA commented Dec 7, 2015

Test build #47287 has finished for PR 10160 at commit 493b368.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

@shivaram Will it be merged before the release of 1.6? Thanks!

@shivaram
Contributor

Sorry @gatorsmile -- I missed this PR for a couple of days. LGTM. Merging this to master and branch-1.6. Regarding whether this will make 1.6 release, it depends on which RC becomes a release -- I think RC2 was cut earlier today, but I'm not sure.

asfgit pushed a commit that referenced this pull request Dec 12, 2015
…est cases

The existing sample functions miss the parameter `seed`, however, the corresponding function interface in `generics` has such a parameter. Thus, although the function caller can call the function with the 'seed', we are not using the value.

This could cause SparkR unit tests failed. For example, I hit it in another PR:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47213/consoleFull

Author: gatorsmile <gatorsmile@gmail.com>

Closes #10160 from gatorsmile/sampleR.

(cherry picked from commit 1e3526c)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
@asfgit asfgit closed this in 1e3526c Dec 12, 2015
@gatorsmile
Member Author

Thank you, everyone! : )

@gatorsmile gatorsmile deleted the sampleR branch December 30, 2015 17:14
5 participants