[SPARK-21727][R] Allow multi-element atomic vector as column type in SparkR DataFrame #20352

neilalex · 2018-01-22T21:01:27Z

What changes were proposed in this pull request?

A fix to https://issues.apache.org/jira/browse/SPARK-21727, "Operating on an ArrayType in a SparkR DataFrame throws error"

How was this patch tested?

Ran tests at R\pkg\tests\run-all.R (see below attached results)
Tested the following lines in SparkR, which now seem to execute without error:

indices <- 1:4
myDf <- data.frame(indices)
myDf$data <- list(rep(0, 20))
mySparkDf <- as.DataFrame(myDf)
collect(mySparkDf)
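
For readers following along without a Spark session, here is a minimal base-R sketch of the kind of check the commit titles in this PR describe: use `is.atomic()` and treat an atomic value with more than one element as an array. The helper name `classify_cell` and the labels `"array"` and `"scalar"` are hypothetical and chosen for illustration; this is not the actual SparkR serialization code.

```r
# Hypothetical helper (not the SparkR source): decide whether one cell of a
# data.frame column should be handled as an array or as a single value.
classify_cell <- function(x) {
  if (is.atomic(x) && length(x) > 1) {
    "array"   # multi-element atomic vector, e.g. rep(0, 20)
  } else {
    "scalar"  # single-element atomic value (or anything non-atomic)
  }
}

# Same local data.frame as in the snippet above.
indices <- 1:4
myDf <- data.frame(indices)
myDf$data <- list(rep(0, 20))

# Classify every cell of each column.
cell_kinds <- vapply(myDf$data, classify_cell, character(1))
index_kinds <- vapply(myDf$indices, classify_cell, character(1))
```

Under these assumptions, each cell of `data` is classified as an array, while each cell of `indices` is classified as a scalar.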

2018-01-22 SPARK-21727 Test Results.txt

@felixcheung @yanboliang @sun-rui @shivaram

The contribution is my original work and I license the work to the project under the project’s open source license

shivaram · 2018-01-22T21:22:33Z

Jenkins, ok to test

shivaram · 2018-01-22T21:26:52Z

@neilalex Can you add the code snippet in the PR description as a new test case? That way we will ensure this behavior is tested going forward.

neilalex · 2018-01-22T21:28:08Z

sure

SparkQA · 2018-01-22T22:14:27Z

Test build #86494 has finished for PR 20352 at commit f8ae698.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

neilalex · 2018-01-23T01:12:18Z

@shivaram alright, should be good with the tests now -- let me know how it seems

SparkQA · 2018-01-23T01:51:27Z

Test build #86500 has finished for PR 20352 at commit 01fc9e1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

shivaram · 2018-01-23T02:16:45Z

Thanks @neilalex - Change LGTM. Let's also see if @felixcheung has any comments.

felixcheung · 2018-01-23T14:44:41Z

That looks good. Thanks! We worked together on this fix, so I’m pretty confident about this change. On a side note, assuming we push this to 2.3, should we add the behavior change to the migration guide section of the R programming guide?

…SparkR DataFrame

## What changes were proposed in this pull request?

A fix to https://issues.apache.org/jira/browse/SPARK-21727, "Operating on an ArrayType in a SparkR DataFrame throws error"

## How was this patch tested?

- Ran tests at R\pkg\tests\run-all.R (see below attached results)
- Tested the following lines in SparkR, which now seem to execute without error:

```
indices <- 1:4
myDf <- data.frame(indices)
myDf$data <- list(rep(0, 20))
mySparkDf <- as.DataFrame(myDf)
collect(mySparkDf)
```

[2018-01-22 SPARK-21727 Test Results.txt](https://github.com/apache/spark/files/1653535/2018-01-22.SPARK-21727.Test.Results.txt)

felixcheung yanboliang sun-rui shivaram

_The contribution is my original work and I license the work to the project under the project’s open source license_

Author: neilalex <neil@neilalex.com>

Closes #20352 from neilalex/neilalex-sparkr-arraytype.

(cherry picked from commit f54b65c)
Signed-off-by: Felix Cheung <felixcheung@apache.org>

felixcheung · 2018-01-24T06:37:45Z

merged to master/2.3. we could revisit migration guide if necessary.
thanks!

neilalex · 2018-01-24T14:03:30Z

@felixcheung @yanboliang @shivaram thank you for your guidance!

neilalex added 2 commits January 22, 2018 15:10

Check if an atomic R type is actually a vector of length > 1, and treat it as an array if so.

6bdf687

Use is.atomic(object) to check type

f8ae698

tests for SparkR data.frame with multi-element atomic vector

01fc9e1

felixcheung approved these changes Jan 23, 2018

asfgit closed this in f54b65c Jan 24, 2018

neilalex deleted the neilalex-sparkr-arraytype branch January 24, 2018 14:03
