
[SPARK-12204][SPARKR] Implement drop method for DataFrame in SparkR. #10201

Closed
wants to merge 9 commits

Conversation

sun-rui
Contributor

@sun-rui sun-rui commented Dec 8, 2015

No description provided.

@SparkQA

SparkQA commented Dec 8, 2015

Test build #47335 has finished for PR 10201 at commit e4372d6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1324,12 +1312,16 @@ setMethod("selectExpr",
#' path <- "path/to/file.json"
#' df <- jsonFile(sqlContext, path)
#' newDF <- withColumn(df, "newCol", df$col1 * 5)
#' # Replace an existing column
#' newDF2 <- withColumn(newDF, "newCol", newDF$col1)
Member

I'm not 100% sure about the replace-existing-column behavior. I thought it was intentional that we supported multiple columns with the same name before?

Contributor Author

I don't know the reason. The original commit can be found at amplab-extras/SparkR-pkg#204.

I don't think it is related to supporting multiple columns with the same name. Spark Core itself allows multiple columns with the same name:

scala> val df=sqlContext.createDataFrame(Seq((1,2,3))).toDF("a","a","c")
df: org.apache.spark.sql.DataFrame = [a: int, a: int, c: int]

scala> df.show
+---+---+---+
|  a|  a|  c|
+---+---+---+
|  1|  2|  3|
+---+---+---+
scala> df.withColumn("a", df("c")).show
+---+---+---+
|  a|  a|  c|
+---+---+---+
|  3|  3|  3|
+---+---+---+

You can see in the above example that all columns with the same name are replaced.
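Outside Spark, the replace-vs-add semantics being discussed can be modeled with a small sketch (illustrative Python, not Spark code; the `with_column` helper and the tuple-list column model are made up for the example):

```python
def with_column(columns, name, value):
    """Toy model of DataFrame.withColumn() semantics since Spark 1.4:
    if any existing column shares the given name, every matching column
    is replaced; otherwise the column is appended as a new column."""
    if any(n == name for n, _ in columns):
        return [(n, value if n == name else v) for n, v in columns]
    return columns + [(name, value)]

# Mirrors the Scala session above: columns "a", "a", "c" with row (1, 2, 3)
cols = [("a", 1), ("a", 2), ("c", 3)]
print(with_column(cols, "a", 3))  # both "a" columns replaced: [('a', 3), ('a', 3), ('c', 3)]
print(with_column(cols, "d", 4))  # no name match, appended as a new column
```

This matches the Scala session: `df.withColumn("a", df("c"))` replaced both columns named "a".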

Contributor Author

Now I know the reason. When withColumn was implemented in SparkR, withColumn() in Scala supported only adding columns, without support for replacing existing columns. Later, withColumn() in Scala was enhanced to support replacing existing columns; see #5541. However, withColumn in SparkR had not been synced with Scala until this PR :)

@SparkQA

SparkQA commented Dec 11, 2015

Test build #47571 has finished for PR 10201 at commit aa7682d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

function(x, col) {
  stopifnot(class(col) == "character" || class(col) == "Column")

  if (class(col) == "character") {
Member

I'd flip this check, since @jc should only be called on a Column.
But that's a minor point, since it's checked on line 2245.
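For context, the `drop` method this PR implements removes every column matching the given name(s); in Spark, dropping a name that does not exist is a no-op. A toy sketch of that semantics (illustrative Python, not the SparkR implementation; the tuple-list column model is made up for the example):

```python
def drop(columns, names):
    """Toy model of DataFrame.drop() semantics: remove every column
    whose name is in `names`; unknown names are silently ignored.
    Accepts a single name or a list of names, like the SparkR method."""
    if isinstance(names, str):
        names = [names]
    return [(n, v) for n, v in columns if n not in names]

cols = [("a", 1), ("b", 2), ("c", 3)]
print(drop(cols, "b"))           # [('a', 1), ('c', 3)]
print(drop(cols, ["a", "x"]))    # unknown "x" ignored: [('b', 2), ('c', 3)]
```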

@felixcheung
Member

Looks good, though I'm concerned that replacing columns could be a breaking behavior change that we should document.
Also @shivaram, I think this shows it's good to be able to detect new/accidentally masked functions :) #10171

@sun-rui
Contributor Author

sun-rui commented Dec 11, 2015

@felixcheung, yes, this may cause a backward-compatibility issue. But it is not SparkR specific, as it is a change in the Spark SQL core. Where is the appropriate place to document it?

@SparkQA

SparkQA commented Dec 11, 2015

Test build #47580 has finished for PR 10201 at commit 619b946.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Member

SQL and MLlib have a "Migration Guide" section; perhaps something like that? http://spark.apache.org/docs/latest/sql-programming-guide.html#migration-guide
In fact, there is language-specific content in SQL's migration guide.

@shivaram
Contributor

@felixcheung Was there a migration guide entry for withColumn changing in Scala/Python? If so, we can also add one to a SparkR migration guide. At a high level, adding functionality that already exists in Scala seems fine to me.

@felixcheung
Member

@shivaram I checked the release notes and the programming/migration guides, and I don't see any reference to withColumn for Spark 1.4.0 or 1.4.1. Perhaps the behavior change happened before the 1.4.0 release?

@sun-rui
Contributor Author

sun-rui commented Dec 14, 2015

According to https://issues.apache.org/jira/browse/SPARK-6635 and https://issues.apache.org/jira/browse/SPARK-10073, the feature landed in Spark 1.4.0 for Scala and in 1.5.0 for Python. But it seems both just had their API docs updated, without any migration guide entry for the compatibility break. Do we need to add one specifically for SparkR?

@sun-rui
Contributor Author

sun-rui commented Dec 14, 2015

@felixcheung, @shivaram, the documentation for withColumn has been updated. Please take a look.

@SparkQA

SparkQA commented Dec 14, 2015

Test build #47643 has finished for PR 10201 at commit e6e9f10.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 14, 2015

Test build #47647 has finished for PR 10201 at commit 14215d3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sun-rui
Contributor Author

sun-rui commented Dec 15, 2015

@felixcheung, refined wording:

Prior to 1.4, DataFrame.withColumn() supported adding a column only. The column was always added as a new column with its specified name in the result DataFrame, even if existing columns had the same name. Since 1.4, DataFrame.withColumn() supports adding a column with a name different from all existing column names, or replacing existing columns of the same name.

Any comments?

@felixcheung
Member

That's good, thanks.

@SparkQA

SparkQA commented Dec 15, 2015

Test build #47718 has finished for PR 10201 at commit 5eba7f9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

#' sc <- sparkR.init()
#' sqlCtx <- sparkRSQL.init(sc)
#' path <- "path/to/file.json"
#' df <- jsonFile(sqlCtx, path)
Member

Update this to read.json?

Contributor Author

Good catch, thanks.

@felixcheung
Member

Looks good; only a minor code doc comment.

@SparkQA

SparkQA commented Dec 16, 2015

Test build #47803 has finished for PR 10201 at commit f9659db.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sun-rui
Contributor Author

sun-rui commented Dec 22, 2015

Any other comments? @shivaram, could you merge it?

@shivaram
Contributor

@sun-rui Sorry for the delay in looking at this. Could you bring this up to date with master? It looks good to me.

@@ -2073,6 +2073,8 @@ options.
--conf spark.sql.hive.thriftServer.singleSession=true \
...
{% endhighlight %}
- Since 1.6.1, withColumn method in sparkR supports adding a new column to or replacing existing columns
Contributor Author

Which version is appropriate here, 1.6.1 or 2.0?

Member

Maybe we want to put this in the R migration guide section instead of SQL's? Or both?

@sun-rui
Contributor Author

sun-rui commented Jan 20, 2016

Rebased to master.

@SparkQA

SparkQA commented Jan 20, 2016

Test build #49758 has finished for PR 10201 at commit 89657f8.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 20, 2016

Test build #49777 has finished for PR 10201 at commit c08c1ea.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 21, 2016

Test build #49846 has finished for PR 10201 at commit 5eb3004.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Contributor

LGTM

@shivaram
Contributor

Merging this to master

@asfgit asfgit closed this in 1b2a918 Jan 21, 2016