[SPARK-32799][R][SQL] Add allowMissingColumns to SparkR unionByName #29813
Test build #128908 has finished for PR 29813 at commit
#' Note: This does not remove duplicate rows across the two SparkDataFrames.
#' This function resolves columns by name (not by position).
#'
#' @param x A SparkDataFrame
#' @param y A SparkDataFrame
#' @param allowMissingColumns logical
#' @param ... further arguments to be passed to or from other methods.
... is not actually supported?
nvm, I see below that it's added to the generic
That's correct, but I am not sure if there is a better way of handling that.
Right now we have generic as follows:
setGeneric("unionByName", function(x, y, ...) { standardGeneric("unionByName") })
As far as I am aware, this is the convention we use in SparkR for handling optional arguments.
Technically speaking we could have
setGeneric("unionByName", function(x, y, allowMissingColumns) { standardGeneric("unionByName") })
but then we'd have to support
signature(x = "SparkDataFrame", y = "SparkDataFrame", allowMissingColumns = "missing")
and
signature(x = "SparkDataFrame", y = "SparkDataFrame", allowMissingColumns = "logical")
if I am not mistaken, and in the past I've been told that's too much.
Am I missing something?
The way you've done it looks natural to me
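To illustrate the convention discussed above, here is a minimal sketch in plain R (no Spark needed). It assumes a toy S4 class, ToyFrame, and a toy generic, toyUnionByName; both names are hypothetical stand-ins and not part of SparkR. The generic is declared with `...`, and the single method carries the optional argument with a default, so dispatch happens on x and y only and no separate "missing" and "logical" signatures are needed.

```r
library(methods)

# Hypothetical stand-in for SparkDataFrame; it only tracks column names.
setClass("ToyFrame", representation(cols = "character"))

# Generic declared with `...`, mirroring the form quoted in the thread.
setGeneric("toyUnionByName", function(x, y, ...) { standardGeneric("toyUnionByName") })

# A single method: the optional argument has a default in the method itself,
# so S4 dispatches only on the classes of x and y.
setMethod("toyUnionByName",
          signature(x = "ToyFrame", y = "ToyFrame"),
          function(x, y, allowMissingColumns = FALSE) {
            if (!allowMissingColumns && !setequal(x@cols, y@cols)) {
              stop("column names differ and allowMissingColumns is FALSE")
            }
            new("ToyFrame", cols = union(x@cols, y@cols))
          })

a <- new("ToyFrame", cols = c("id", "name"))
b <- new("ToyFrame", cols = c("id", "age"))

# The optional argument is passed through `...` and matched by the method.
res <- toyUnionByName(a, b, allowMissingColumns = TRUE)
print(res@cols)
```

Calling toyUnionByName(a, b) without the flag raises an error in this sketch, because the default is FALSE and the column sets differ.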
Test build #128917 has finished for PR 29813 at commit
Test build #128920 has finished for PR 29813 at commit
Merged to master.
Thanks @HyukjinKwon and @MichaelChirico!
What changes were proposed in this pull request?
Add optional allowMissingColumns argument to SparkR unionByName.
Why are the changes needed?
Feature parity.
Does this PR introduce any user-facing change?
unionByName supports allowMissingColumns.
How was this patch tested?
Existing unit tests. New unit tests targeting this feature.
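As a rough illustration of what the new argument means for users, the sketch below mimics the behaviour on plain base-R data.frames rather than SparkDataFrames. The helper union_by_name is hypothetical and is not the SparkR implementation. It assumes the semantics described for the feature: columns are matched by name, columns missing from one side are filled with NA when allowMissingColumns is TRUE, and duplicate rows are not removed.

```r
# Hypothetical helper; plain data.frames stand in for SparkDataFrames.
union_by_name <- function(x, y, allowMissingColumns = FALSE) {
  missing_in_y <- setdiff(names(x), names(y))
  missing_in_x <- setdiff(names(y), names(x))
  if (!allowMissingColumns && length(c(missing_in_x, missing_in_y)) > 0) {
    stop("column names differ; set allowMissingColumns = TRUE to fill with NA")
  }
  # Fill columns that are absent from either side with NA.
  for (col in missing_in_x) x[[col]] <- rep(NA, nrow(x))
  for (col in missing_in_y) y[[col]] <- rep(NA, nrow(y))
  # Align y to x's column order by name, then stack the rows (duplicates kept).
  rbind(x, y[names(x)])
}

a <- data.frame(id = 1:2, name = c("a", "b"))
b <- data.frame(age = c(30, 40), id = 3:4)

res <- union_by_name(a, b, allowMissingColumns = TRUE)
print(res)
```

In this sketch the result keeps the columns of the first frame in order, followed by the columns found only in the second frame.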