
[SPARK-32799][R][SQL] Add allowMissingColumns to SparkR unionByName #29813

Closed · wants to merge 3 commits

Conversation

zero323 (Member) commented Sep 20, 2020

What changes were proposed in this pull request?

Add optional allowMissingColumns argument to SparkR unionByName.

Why are the changes needed?

Feature parity: the Scala and Python unionByName APIs already accept an allowMissingColumns argument, and SparkR should offer the same option.

Does this PR introduce any user-facing change?

Yes. unionByName gains an optional allowMissingColumns argument (default FALSE); when TRUE, columns present in only one of the inputs are filled with nulls instead of raising an error.
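
A concrete illustration of the behavior (a plain-R sketch, not the SparkR implementation itself, which delegates to the JVM; `union_by_name` here is a hypothetical stand-in that mimics the semantics on base data.frames, with NA playing the role of Spark's null):

```r
# Hypothetical base-R model of unionByName's semantics; not SparkR code.
union_by_name <- function(x, y, allowMissingColumns = FALSE) {
  if (!allowMissingColumns && !setequal(names(x), names(y))) {
    stop("column sets differ; pass allowMissingColumns = TRUE to pad with NA")
  }
  all_cols <- union(names(x), names(y))
  pad <- function(df) {
    # Add any columns missing from this side as NA, then fix column order.
    missing_cols <- setdiff(all_cols, names(df))
    if (length(missing_cols)) df[missing_cols] <- NA
    df[all_cols]
  }
  rbind(pad(x), pad(y))
}

a <- data.frame(id = 1:2, name = c("x", "y"))
b <- data.frame(id = 3, extra = TRUE)

# Default (allowMissingColumns = FALSE): mismatched columns are an error.
# With allowMissingColumns = TRUE: result has columns id, name, extra,
# padded with NA where a column was absent from one side.
res <- union_by_name(a, b, allowMissingColumns = TRUE)
```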

How was this patch tested?

Existing unit tests, plus new unit tests targeting this feature.

zero323 changed the title from "[SPARK-32799][R] Add allowMissingColumns to SparkR unionByName" to "[SPARK-32799][R][SQL] Add allowMissingColumns to SparkR unionByName" on Sep 20, 2020.
SparkQA commented Sep 20, 2020

Test build #128908 has finished for PR 29813 at commit 94d8317.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

zero323 marked this pull request as ready for review on September 20, 2020, 06:11.
Review thread on R/pkg/R/DataFrame.R (outdated, resolved):
#' Note: This does not remove duplicate rows across the two SparkDataFrames.
#' This function resolves columns by name (not by position).
#'
#' @param x A SparkDataFrame
#' @param y A SparkDataFrame
#' @param allowMissingColumns logical
#' @param ... further arguments to be passed to or from other methods.
Contributor:
... is not actually supported?

Contributor:
Never mind, I see below that it's added to the generic.

zero323 (Author):
That's correct, but I am not sure if there is a better way of handling that.

Right now the generic is defined as follows:

setGeneric("unionByName", function(x, y, ...) { standardGeneric("unionByName") })

As far as I am aware, this is the convention SparkR uses for handling optional arguments.

Technically speaking we could have

setGeneric("unionByName", function(x, y, allowMissingColumns) { standardGeneric("unionByName") })

but then we'd have to support

signature(x = "SparkDataFrame", y = "SparkDataFrame", allowMissingColumns = "missing")

and

signature(x = "SparkDataFrame", y = "SparkDataFrame", allowMissingColumns = "logical")

if I am not mistaken, and in the past I've been told that's too much.

Am I missing something?
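
For reference, the trade-off can be sketched in self-contained S4 (plain R, no Spark involved; `FakeDF` and `unionByName2` are hypothetical names used only for illustration). With an explicit `allowMissingColumns` in the generic's formals, two `setMethod()` registrations are needed, one per signature:

```r
library(methods)

# A dummy stand-in for SparkDataFrame, tracking only column names.
setClass("FakeDF", representation(cols = "character"))

# Explicit-argument generic: allowMissingColumns is part of the dispatch
# signature, unlike the `...` convention used in SparkR.
setGeneric("unionByName2", function(x, y, allowMissingColumns) {
  standardGeneric("unionByName2")
})

# Signature 1: caller omits the argument -> dispatch on "missing",
# delegate with the default value FALSE.
setMethod("unionByName2",
  signature(x = "FakeDF", y = "FakeDF", allowMissingColumns = "missing"),
  function(x, y, allowMissingColumns) {
    unionByName2(x, y, FALSE)
  })

# Signature 2: caller passes a logical -> the actual implementation
# (here just a stand-in returning the merged column names).
setMethod("unionByName2",
  signature(x = "FakeDF", y = "FakeDF", allowMissingColumns = "logical"),
  function(x, y, allowMissingColumns) {
    union(x@cols, y@cols)
  })

a <- new("FakeDF", cols = c("id", "name"))
b <- new("FakeDF", cols = c("id", "extra"))
unionByName2(a, b)        # dispatches via the "missing" method
unionByName2(a, b, TRUE)  # dispatches via the "logical" method
```

Both calls work, but every optional argument added this way multiplies the number of method registrations, which is why the `...` generic is the lighter-weight convention.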

Contributor:
The way you've done it looks natural to me

Review thread on R/pkg/R/DataFrame.R (outdated, resolved):
SparkQA commented Sep 20, 2020

Test build #128917 has finished for PR 29813 at commit e019e3d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 20, 2020

Test build #128920 has finished for PR 29813 at commit 0597724.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

HyukjinKwon (Member):

Merged to master.

zero323 (Author) commented Sep 21, 2020

Thanks @HyukjinKwon and @MichaelChirico!
