[SPARK-10904][SPARKR] Fix to support select(df, c("col1", "col2"))
#8961
Jenkins, ok to test
Test build #43168 has finished for PR 8961 at commit
Thanks @felixcheung. The fix looks good to me. Also, a note to myself that we should backport this to branch-1.5.
@felixcheung, could you verify that your PR works when there are more than 2 columns, for example select(df, c("c1", "c2", "c3"))? I suspect that it does not work. Note that the signature of the Scala select: If we want to support the case where the parameter col is a vector, I think:
My fault, ignore my comment above. I did not realize that SparkR already has select(df, list). This is unrelated to the SerDe enhancement, and the PR is basically OK. The remaining question is how to handle the ... varargs in this case, for example select(df, c("c1", "c2"), "c3", "c4"). Do we allow such an invocation of select?
We could certainly handle
This seems like an edge case. Should we support it?
I'm fine with not supporting the edge cases as long as we throw a reasonable error message.
OK, either approach would take a similar amount of code, then. Please advise.
I'd say let's keep the API simple and only support a vector or varargs, but not both.
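The rule agreed on above (accept either a character vector or varargs, and raise a clear error when both are given) can be sketched in plain R. This is a hypothetical helper written for illustration, not the code merged in this PR; the name normalizeSelectCols and the error text are assumptions.

```r
# Hypothetical helper (not SparkR's actual implementation): collect the
# column names passed to a select()-style call. Either a single character
# vector or separate varargs are accepted, but not both at once.
normalizeSelectCols <- function(col, ...) {
  extra <- list(...)
  if (length(col) > 1 && length(extra) > 0) {
    # Mixing a vector with varargs, e.g. (c("c1", "c2"), "c3"), is rejected.
    stop("To select multiple columns, pass either a character vector or ",
         "separate column names, but not both.")
  }
  if (length(col) > 1) {
    # A vector such as c("c1", "c2") is used as-is.
    return(as.character(col))
  }
  # Otherwise combine the first name with any additional varargs.
  c(as.character(col), unlist(extra, use.names = FALSE))
}

normalizeSelectCols(c("c1", "c2", "c3"))
normalizeSelectCols("c1", "c2", "c3")
```

Both calls above yield the same column names, while normalizeSelectCols(c("c1", "c2"), "c3") stops with the error message, which keeps the API to the two supported forms.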
Updated:
Test build #43200 has finished for PR 8961 at commit
Test build #43204 has finished for PR 8961 at commit
Test build #43205 has finished for PR 8961 at commit
LGTM
Thanks @felixcheung @sun-rui. LGTM. Merging this.
The fix is to coerce c("a", "b") into a list so that it can be serialized when calling into the JVM.
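The coercion described above can be illustrated with a small sketch. It assumes, as the PR description states, that a list of column names can be passed through SparkR's serialization to the JVM where a multi-element character vector was not handled; the helper name toSerializableCols is hypothetical and this is not the merged code.

```r
# Hypothetical helper: turn a multi-element character vector of column
# names into a list (one element per name) before it is handed to the
# JVM call. Single names and non-character values are left unchanged.
toSerializableCols <- function(col) {
  if (is.character(col) && length(col) > 1) {
    as.list(col)  # c("a", "b") becomes list("a", "b")
  } else {
    col
  }
}

cols <- toSerializableCols(c("a", "b"))
str(cols)
```

Converting with as.list() preserves the order of the names, so the JVM side receives the columns in the order the user wrote them.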