[SPARK-14353] Dataset Time Window window API for R#12141
[SPARK-14353] Dataset Time Window window API for R#12141brkyvz wants to merge 6 commits intoapache:masterfrom
window API for R#12141Conversation
|
Test build #54819 has finished for PR 12141 at commit
|
| #'} | ||
| setMethod("window", signature(x = "Column"), | ||
| function(x, windowDuration, slideDuration = NULL, startTime = NULL) { | ||
| if (!is.null(slideDuration) && !is.null(startTime)) { |
There was a problem hiding this comment.
Check we check the type of windowDuration, slideDuration and startTime?
|
@brkyvz LGTM, only one minor comment. |
|
@davies Addressed your comment |
|
Test build #54917 has finished for PR 12141 at commit
|
|
Retest this please
|
| "var_samp", | ||
| "weekofyear", | ||
| "when", | ||
| "window", |
There was a problem hiding this comment.
this should be added to https://github.com/apache/spark/blob/master/R/pkg/R/generics.R?
There was a problem hiding this comment.
also need to check window from the stats package can still be called?
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/window.html
There was a problem hiding this comment.
done. window can be called using stats::window
|
retest this please |
|
Test build #54930 has finished for PR 12141 at commit
|
|
Test build #54934 has finished for PR 12141 at commit
|
|
Test build #54938 has finished for PR 12141 at commit
|
|
Test build #54946 has finished for PR 12141 at commit
|
| })) | ||
| maskedCompletely <- masked[!funcHasAny] | ||
| namesOfMaskedCompletely <- c("cov", "filter", "sample") | ||
| namesOfMaskedCompletely <- c("cov", "filter", "sample", "window") |
There was a problem hiding this comment.
if this fails that means the current definition will block the stats::window function completely, (ie. user will not be able to call it without altering the code with stats:: prefix)
this is in fact delibrate - to avoid masking the function by accident. for instance, please set the generic (in R/pkg/R/generics.R) as window(x, ...) to match the stats package definition - generally that would allow both to be called at the same time.
@sun-rui @shivaram
There was a problem hiding this comment.
I see... Thanks for pointing it out!
|
Test build #54953 has finished for PR 12141 at commit
|
|
LGTM, merging into master. |
What changes were proposed in this pull request?
The
windowfunction was added to Dataset with this PR.This PR adds the R API for this function.
With this PR, SQL, Java, and Scala will share the same APIs as in users can use:
window(timeColumn, windowDuration)window(timeColumn, windowDuration, slideDuration)window(timeColumn, windowDuration, slideDuration, startTime)In Python and R, users can access all APIs above, but in addition they can do
window(timeColumn, windowDuration, startTime=...)that is, they can provide the startTime without providing the
slideDuration. In this case, we will generate tumbling windows.How was this patch tested?
Unit tests + manual tests