Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-10904][SPARKR] Fix to support select(df, c("col1", "col2")) #8961

Closed
wants to merge 4 commits into from

Conversation

felixcheung
Copy link
Member

The fix is to coerce c("a", "b") into a list such that it could be serialized to call JVM with.

@shivaram
Copy link
Contributor

shivaram commented Oct 2, 2015

Jenkins, ok to test

@SparkQA
Copy link

SparkQA commented Oct 2, 2015

Test build #43168 has finished for PR 8961 at commit 41b5673.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Copy link
Contributor

shivaram commented Oct 2, 2015

Thanks @felixcheung - Fix looks good to me.
cc @sun-rui -- Is there a slightly better fix with the new SerDe we have.

Also just a note to myself that we should backport this branch-1.5

@sun-rui
Copy link
Contributor

sun-rui commented Oct 2, 2015

@felixcheung, could you verify that your PR works when there are columns more than 2? for example, select(df, c("c1", "c2", "c3")). I suspect that it does not work. Note that the signature of the Scala select:
select(col:String, cols: String*)

If we want to support the case where the parameter col is a vector, I think:

  1. extract the col[[1]]
  2. merge the remaining col[2:length(col)] and the possible ... varargs.

@sun-rui
Copy link
Contributor

sun-rui commented Oct 2, 2015

My fault, ignore my comment above. I did not realize that there is select(df, list) already in SparkR.

This is irrelevant to the SerDe enhancement. This is basically OK. The remaining question is how to handle the ... varargs in this case? for example, select(df, c("c1","c2"), "c3", "c4")? Do we allow such invokation of select?

@felixcheung
Copy link
Member Author

We could certainly handle select(df, c("c1","c2"), "c3", "c4"), by changing the code to

            if (length(col) > 1) {
                select(x, append(as.list(col), list(...)))

This seems like an edge case - should we do that?

@shivaram
Copy link
Contributor

shivaram commented Oct 2, 2015

I'm fine with not supporting the edge cases as long as we can throw a reasonable error message.

@felixcheung
Copy link
Member Author

ok, either case would be similar amount of code then. Please advise.

@shivaram
Copy link
Contributor

shivaram commented Oct 2, 2015

I'd say lets keep the API simple and only support a vector or varargs but not both.

@felixcheung
Copy link
Member Author

Updated:

> head(select(df,c("name", "age"), "abc"))
Error in head(select(df, c("name", "age"), "abc")) :
  error in evaluating the argument 'x' in selecting a method for function 'head': Error in select(df, c("name", "age"), "abc") :
  To select multiple columns, use a character vector or list for col
> head(select(df,c("name", "age"), "abc", "bc"))
Error in head(select(df, c("name", "age"), "abc", "bc")) :
  error in evaluating the argument 'x' in selecting a method for function 'head': Error in select(df, c("name", "age"), "abc", "bc") :
  To select multiple columns, use a character vector or list for col
> head(select(df,c("name", "age"), c("abc", "bc")))
Error in head(select(df, c("name", "age"), c("abc", "bc"))) :
  error in evaluating the argument 'x' in selecting a method for function 'head': Error in select(df, c("name", "age"), c("abc", "bc")) :
  To select multiple columns, use a character vector or list for col

@SparkQA
Copy link

SparkQA commented Oct 2, 2015

Test build #43200 has finished for PR 8961 at commit dfac25c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 3, 2015

Test build #43204 has finished for PR 8961 at commit d4dd39c.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 3, 2015

Test build #43205 has finished for PR 8961 at commit 4f8f170.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sun-rui
Copy link
Contributor

sun-rui commented Oct 3, 2015

LGTM

@shivaram
Copy link
Contributor

shivaram commented Oct 3, 2015

Thanks @felixcheung @sun-rui -- LGTM. Merging this

@asfgit asfgit closed this in 721e8b5 Oct 4, 2015
asfgit pushed a commit that referenced this pull request Oct 4, 2015
The fix is to coerce `c("a", "b")` into a list such that it could be serialized to call JVM with.

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #8961 from felixcheung/rselect.

(cherry picked from commit 721e8b5)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants