[SPARK-11976][SPARKR] Support "." character in DataFrame column name #14264
rerngvit
commented
Jul 19, 2016
|
@sun-rui Please have a look. |
|
Jenkins, ok to test |
|
Test build #62550 has finished for PR 14264 at commit
|
|
There's this (and I thought one more, if I recall): And its tests in test_sparkSQL.R L779 - I'm surprised these tests pass? |
|
@rerngvit, could you share the background on how this PR fixes the issue? I see that https://issues.apache.org/jira/browse/SPARK-11976 is still open. Did another PR in Spark 2.0 make this possible? |
Hi, I'm just curious. Is it okay for other Spark module, too? The function's documented assumption is the following:
Different from resolveAsTableColumn, this assumes `name` does NOT start with a qualifier.
@dongjoon-hyun I do not fully understand your question. Could you please elaborate a bit more? What do you mean by "okay for other Spark module"?
|
Test build #62612 has finished for PR 14264 at commit
|
## What changes were proposed in this pull request?
- Add support for the "." character in DataFrame column names
- Remove R code in "createDataFrame()" that replaces "." with "_"
- Remove warning suppression in createDataFrame() for SPARK-12034
- Remove the check for "." in colnames()
- Replace "_" with "." in column names in test_mllib.R and test_sparkSQL.R
## How was this patch tested?
SparkR unit tests and manual testing with R script described in SPARK-11976
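
To make the createDataFrame() change concrete, here is a minimal sketch of the renaming behavior the PR proposes to remove. It is a plain-Python illustration, not SparkR code; the function name `sanitize_column_names` is hypothetical.

```python
from typing import List


def sanitize_column_names(names: List[str]) -> List[str]:
    """Hypothetical stand-in for the old behavior: replace "." with "_"."""
    return [name.replace(".", "_") for name in names]


# Column names of R's built-in iris data set.
iris_columns = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species"]
print(sanitize_column_names(iris_columns))
```

With the renaming removed, the names would reach Spark unchanged, which is why the tests in test_mllib.R and test_sparkSQL.R switch from "_" to ".".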
|
@felixcheung: in the latest patch, I removed the check for "." in colnames() and the corresponding test code in the file you indicated. |
As stated in https://issues.apache.org/jira/browse/SPARK-11976, Spark core already supports column names containing ".". However, this does not work without backticks (`). The missing piece is the corresponding logic in the Analyzer layer.
I do not fully understand your question. Could you please elaborate a bit more? This PR is actually SPARK-11976. |
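
The backtick requirement mentioned above can be illustrated with a toy model of name resolution. This is a sketch under stated assumptions, not Spark's Analyzer code: `split_name_parts` and `resolve` are hypothetical helpers, and the model assumes that an unquoted "." separates a qualifier from a field while a backtick-quoted run is taken literally.

```python
from typing import List, Optional


def split_name_parts(name: str) -> List[str]:
    """Split a column reference on dots that are outside backticks."""
    parts: List[str] = []
    current: List[str] = []
    in_backticks = False
    for ch in name:
        if ch == "`":
            # Toggle quoting; the backtick itself is not part of the name.
            in_backticks = not in_backticks
        elif ch == "." and not in_backticks:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if in_backticks:
        raise ValueError(f"unbalanced backtick in {name!r}")
    parts.append("".join(current))
    return parts


def resolve(name: str, columns: List[str]) -> Optional[str]:
    """Toy rule: only a single-part reference can match a top-level column."""
    parts = split_name_parts(name)
    if len(parts) == 1 and parts[0] in columns:
        return parts[0]
    return None


columns = ["Sepal.Length"]
print(resolve("Sepal.Length", columns))    # unquoted: split into two parts, no match
print(resolve("`Sepal.Length`", columns))  # quoted: treated as one name, matches
```

In this model, supporting unquoted names would mean adding a fallback that tries the whole string as a column name, which is the kind of resolution logic the comment says is missing.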
|
Test build #62640 has finished for PR 14264 at commit
|
|
@rerngvit, sorry, I meant https://issues.apache.org/jira/browse/SPARK-11977. If your PR enables access to columns with "." in their names without backticks, please first submit a PR for SPARK-11977, as the change is to Spark Core and is not SparkR-specific. After that PR is merged, you can submit a PR for SPARK-11976 that contains only the SparkR changes. |
|
@sun-rui Thanks. I understand now. SPARK-11977 (https://issues.apache.org/jira/browse/SPARK-11977) covers all special characters (e.g., "-", ".", " "), which is a broader scope than supporting the "." character (this PR). Would it make more sense to continue with this PR or to create a separate JIRA issue for Spark Core instead? Another alternative would be to limit the scope of SPARK-11977 to just ".", as originally discussed. |
|
@rerngvit, I modified the title of SPARK-11977 to narrow its scope. You can go for it. |
|
Since SPARK-11977 didn't get merged and this PR is blocked on it, I decided to close this PR. |