[SPARK-21780][R] Simpler Dataset.sample API in R #19243
HyukjinKwon wants to merge 2 commits into apache:master
@felixcheung, could you take a look when you have some time, please?
Force-pushed from 680157e to 3debd9f.
Test build #81818 has finished for PR 19243 at commit
retest this please
Test build #81817 has finished for PR 19243 at commit
Test build #81819 has finished for PR 19243 at commit
Let me think about this a bit...
Sure. This one is a bit tricky. Let me try to find a better way too.
Thinking about this, I wonder if it is more common in R to skip params with default values and pass the rest of the params by name. For instance, some calls of that form just work (I tested this), while others don't.
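To make the argument-matching point concrete, here is a minimal base-R sketch. `sample_sketch` is a hypothetical stand-in, not SparkR's `sample`; it only assumes a signature in which `withReplacement` has a default and `fraction` does not.

```r
# Hypothetical stand-in for illustration only; not the SparkR implementation.
# withReplacement has a default, fraction does not, seed is optional.
sample_sketch <- function(x, withReplacement = FALSE, fraction, seed) {
  if (missing(fraction)) {
    stop("fraction must be supplied")
  }
  list(
    withReplacement = withReplacement,
    fraction = fraction,
    hasSeed = !missing(seed)
  )
}

x <- seq_len(10)

# Skipping the defaulted parameter and naming the rest works.
named <- sample_sketch(x, fraction = 0.5, seed = 3)

# Fully positional calls keep working as well.
positional <- sample_sketch(x, TRUE, 0.5, 3)

# A bare positional value binds to withReplacement, so fraction stays missing.
bare <- tryCatch(sample_sketch(x, 0.5), error = function(e) conditionMessage(e))
```

With such a signature, callers who do not care about replacement have to name `fraction`, which is the trade-off being weighed in this thread.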
Sure, let me minimise the changes as you suggested for now and keep the current change somewhere in my local just in case. That makes sense to me too.
Force-pushed from 3debd9f to 2bd86c8.
Test build #81903 has finished for PR 19243 at commit
R/pkg/R/DataFrame.R
We actually want to stop documenting the default value when it is already in the signature, because it then already appears in the roxygen2-generated doc.
R/pkg/R/DataFrame.R
I think the style here is to have one space on each side of the = between the param name and its value, as in seed = 3, and likewise fraction = 0.5 for the line above.
R/pkg/R/DataFrame.R
I'd then wrap withReplacement as as.logical(withReplacement) and fraction as as.numeric(fraction), because the value passed might merely be coercible (note the L):
> is.numeric(1L)
[1] TRUE
but passing as integer could cause callJMethod to match to a different signature on the JVM.
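A short base-R sketch of the R side of this concern. It shows only how the stored type changes under coercion; how callJMethod picks a JVM signature is as described in the comment above and is not exercised here.

```r
# An integer literal passes is.numeric() but is stored as an integer.
fraction <- 1L
is.numeric(fraction)     # TRUE
typeof(fraction)         # "integer"

# Explicit coercion pins down the R type before the value is handed on.
fraction <- as.numeric(fraction)
typeof(fraction)         # "double"

# A 0/1 value coerces to logical in the same way.
withReplacement <- as.logical(0L)
typeof(withReplacement)  # "logical"
```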
R/pkg/R/DataFrame.R
We should be careful to call as.integer only if the value isn't NULL or NA:
> as.integer(NULL)
integer(0)
> as.integer(NA)
[1] NA
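One way to guard the coercion could look like the sketch below. `to_seed` is a hypothetical helper written for illustration, not the code from this PR; it rules out NULL and NA before calling as.integer.

```r
# Hypothetical helper for illustration; not the code from this PR.
# Coerces a seed to integer only after ruling out NULL and NA.
to_seed <- function(seed) {
  if (is.null(seed)) {
    stop("seed must not be NULL")
  }
  if (length(seed) != 1 || is.na(seed)) {
    stop("seed must be a single non-NA value")
  }
  as.integer(seed)
}

ok <- to_seed(3)

# Without the guard, both bad inputs pass through as.integer without an error.
null_len <- length(as.integer(NULL))   # zero-length integer vector
na_kept <- is.na(as.integer(NA))       # NA survives the coercion

null_msg <- tryCatch(to_seed(NULL), error = function(e) conditionMessage(e))
na_msg <- tryCatch(to_seed(NA), error = function(e) conditionMessage(e))
```

Checking is.null first matters because is.na(NULL) returns a zero-length logical, which cannot be used as an if condition.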
Thanks @felixcheung, will address the comments soon.
Force-pushed from 580ae5e to cea3625.
Test build #81925 has finished for PR 19243 at commit
Test build #81926 has finished for PR 19243 at commit
R/pkg/R/DataFrame.R
if (!missing(seed)) {
  if (is.null(seed) || is.na(seed)) {
    stop(paste("seed must not be NULL or NA; however, got", class(seed)))
This actually doesn't work for NA: class(seed) reports the type, not the value, so the message would not mention NA.
> class(NULL)
[1] "NULL"
> class(NA)
[1] "logical"
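A possible way around this is to describe the offending value directly instead of going through class(). The helper name `describe_bad_seed` is hypothetical, and the merged code may well handle this differently.

```r
# Hypothetical helper for illustration; the merged code may differ.
# Describes the offending seed by value rather than by class().
describe_bad_seed <- function(seed) {
  if (is.null(seed)) {
    "NULL"
  } else if (length(seed) == 1 && is.na(seed)) {
    "NA"
  } else {
    class(seed)[1]
  }
}

prefix <- "seed must not be NULL or NA; however, got"

# class(NA) is "logical", so the original message hides the NA.
by_class <- paste(prefix, class(NA))

# Describing the value keeps NA and NULL visible in the message.
by_value_na <- paste(prefix, describe_bad_seed(NA))
by_value_null <- paste(prefix, describe_bad_seed(NULL))
```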
Test build #81978 has finished for PR 19243 at commit
Merged to master.
What changes were proposed in this pull request?
This PR makes sample(...) able to omit withReplacement, which defaults to FALSE.
In short, the following examples are allowed:
In addition, this PR also adds some type-checking logic, as below:
How was this patch tested?
Manually tested; unit tests added in R/pkg/tests/fulltests/test_sparkSQL.R.