
[SPARK-32799][R][SQL] Add allowMissingColumns to SparkR unionByName #29813

Closed · wants to merge 3 commits

Conversation

zero323 (Member) commented Sep 20, 2020

What changes were proposed in this pull request?

Add optional allowMissingColumns argument to SparkR unionByName.

Why are the changes needed?

Feature parity: the Scala and Python unionByName APIs already accept an allowMissingColumns argument, and SparkR should offer the same option.

Does this PR introduce any user-facing change?

Yes. unionByName gains an optional allowMissingColumns argument (default FALSE); when TRUE, columns present in only one of the inputs are filled with nulls instead of raising an error.
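
A concrete illustration of the behavior (a plain-R sketch, not the SparkR implementation itself, which delegates to the JVM; `union_by_name` here is a hypothetical stand-in that mimics the semantics on base data.frames, with NA playing the role of Spark's null):

```r
# Hypothetical base-R model of unionByName's semantics; not SparkR code.
union_by_name <- function(x, y, allowMissingColumns = FALSE) {
  if (!allowMissingColumns && !setequal(names(x), names(y))) {
    stop("column sets differ; pass allowMissingColumns = TRUE to pad with NA")
  }
  all_cols <- union(names(x), names(y))
  pad <- function(df) {
    # Add any columns missing from this side as NA, then fix column order.
    missing_cols <- setdiff(all_cols, names(df))
    if (length(missing_cols)) df[missing_cols] <- NA
    df[all_cols]
  }
  rbind(pad(x), pad(y))
}

a <- data.frame(id = 1:2, name = c("x", "y"))
b <- data.frame(id = 3, extra = TRUE)

# Default (allowMissingColumns = FALSE): mismatched columns are an error.
# With allowMissingColumns = TRUE: result has columns id, name, extra,
# padded with NA where a column was absent from one side.
res <- union_by_name(a, b, allowMissingColumns = TRUE)
```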

How was this patch tested?

Existing unit tests, plus new unit tests targeting this feature.

zero323 changed the title from "[SPARK-32799][R] Add allowMissingColumns to SparkR unionByName" to "[SPARK-32799][R][SQL] Add allowMissingColumns to SparkR unionByName" on Sep 20, 2020.
SparkQA commented Sep 20, 2020

Test build #128908 has finished for PR 29813 at commit 94d8317.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

zero323 marked this pull request as ready for review on September 20, 2020, 06:11.
Review thread on R/pkg/R/DataFrame.R (outdated, resolved):
#' Note: This does not remove duplicate rows across the two SparkDataFrames.
#' This function resolves columns by name (not by position).
#'
#' @param x A SparkDataFrame
#' @param y A SparkDataFrame
#' @param allowMissingColumns logical
#' @param ... further arguments to be passed to or from other methods.
Contributor:
... is not actually supported?

Contributor:
Never mind, I see below that it's added to the generic.

zero323 (Author):
That's correct, but I am not sure if there is a better way of handling that.

Right now the generic is defined as follows:

setGeneric("unionByName", function(x, y, ...) { standardGeneric("unionByName") })

As far as I am aware, this is the convention SparkR uses for handling optional arguments.

Technically speaking we could have

setGeneric("unionByName", function(x, y, allowMissingColumns) { standardGeneric("unionByName") })

but then we'd have to support

signature(x = "SparkDataFrame", y = "SparkDataFrame", allowMissingColumns = "missing")

and

signature(x = "SparkDataFrame", y = "SparkDataFrame", allowMissingColumns = "logical")

if I am not mistaken, and in the past I've been told that's too much.

Am I missing something?
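
For reference, the trade-off can be sketched in self-contained S4 (plain R, no Spark involved; `FakeDF` and `unionByName2` are hypothetical names used only for illustration). With an explicit `allowMissingColumns` in the generic's formals, two `setMethod()` registrations are needed, one per signature:

```r
library(methods)

# A dummy stand-in for SparkDataFrame, tracking only column names.
setClass("FakeDF", representation(cols = "character"))

# Explicit-argument generic: allowMissingColumns is part of the dispatch
# signature, unlike the `...` convention used in SparkR.
setGeneric("unionByName2", function(x, y, allowMissingColumns) {
  standardGeneric("unionByName2")
})

# Signature 1: caller omits the argument -> dispatch on "missing",
# delegate with the default value FALSE.
setMethod("unionByName2",
  signature(x = "FakeDF", y = "FakeDF", allowMissingColumns = "missing"),
  function(x, y, allowMissingColumns) {
    unionByName2(x, y, FALSE)
  })

# Signature 2: caller passes a logical -> the actual implementation
# (here just a stand-in returning the merged column names).
setMethod("unionByName2",
  signature(x = "FakeDF", y = "FakeDF", allowMissingColumns = "logical"),
  function(x, y, allowMissingColumns) {
    union(x@cols, y@cols)
  })

a <- new("FakeDF", cols = c("id", "name"))
b <- new("FakeDF", cols = c("id", "extra"))
unionByName2(a, b)        # dispatches via the "missing" method
unionByName2(a, b, TRUE)  # dispatches via the "logical" method
```

Both calls work, but every optional argument added this way multiplies the number of method registrations, which is why the `...` generic is the lighter-weight convention.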

Contributor:
The way you've done it looks natural to me

Review thread on R/pkg/R/DataFrame.R (outdated, resolved):
SparkQA commented Sep 20, 2020

Test build #128917 has finished for PR 29813 at commit e019e3d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 20, 2020

Test build #128920 has finished for PR 29813 at commit 0597724.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

HyukjinKwon (Member):

Merged to master.

zero323 (Author) commented Sep 21, 2020

Thanks @HyukjinKwon and @MichaelChirico!
