Skip to content

Conversation

@rerngvit
Copy link
Contributor

## What changes were proposed in this pull request?
- Add support "." character in DataFrame column name
- Remove R code in "createDataFrame()" that replaces "." with "_"
- Remove warning suppression in createDataFrame() for iris dataset (introduced in SPARK-12034)

## How was this patch tested?
SparkR unit tests and manual testing with the R script described in SPARK-11976

@rerngvit
Copy link
Contributor Author

@sun-rui Please have a look.

@shivaram
Copy link
Contributor

Jenkins, ok to test

@SparkQA
Copy link

SparkQA commented Jul 19, 2016

Test build #62550 has finished for PR 14264 at commit e1cf3d5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Copy link
Member

There's this (and I thought one more, if I recall):
https://github.com/apache/spark/blob/master/R/pkg/R/DataFrame.R#L349

And its tests in test_sparkSQL.R L779 - I'm surprised these tests pass?

@sun-rui
Copy link
Contributor

sun-rui commented Jul 20, 2016

@rerngvit, could you share the background that this PR can fix the issue. I see that https://issues.apache.org/jira/browse/SPARK-11976 is still open. Any other PR in Spark 2.0 make this possible?

Copy link
Member

@dongjoon-hyun dongjoon-hyun Jul 20, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I'm just curious. Is it okay for other Spark module, too? The assumption of function looks like the following.

Different from resolveAsTableColumn, this assumes name does NOT start with a qualifier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dongjoon-hyun I do not fully understand your question. Could you please elaborate a bit more? What do you mean by "okay for other Spark module"?

@SparkQA
Copy link

SparkQA commented Jul 20, 2016

Test build #62612 has finished for PR 14264 at commit b9000c7.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

    ## What changes were proposed in this pull request?
    - Add support "." character in DataFrame column name
    - Remove R code in "createDataFrame()" that replaces "." with "_"
    - Remove warning suppression in createDataFrame() for SPARK-12034
    - Remove checking for "." for function colnames()
    - Replace usage of "_" to "." for column names in test_mllib.R and test_sparkSQL.R

    ## How was this patch tested?
    SparkR unit tests and manual testing with R script described in SPARK-11976
@rerngvit
Copy link
Contributor Author

@felixcheung: in the recent patch, I removed checking for "." for function colnames() and its test code for the file you indicated.

@rerngvit
Copy link
Contributor Author

rerngvit commented Jul 20, 2016

@sun-rui

could you share the background that this PR can fix the issue.

As stated in https://issues.apache.org/jira/browse/SPARK-11976, Spark core is already supporting column name ".". However, it does not work without the backticks (`). The missing part to make it work is a corresponding logic in the Analyzer layer.

I see that https://issues.apache.org/jira/browse/SPARK-11976 is still open.

I do not fully understand your question. Could you please elaborate a bit more? This PR is actually SPARK-11976.

@SparkQA
Copy link

SparkQA commented Jul 21, 2016

Test build #62640 has finished for PR 14264 at commit 60d0145.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sun-rui
Copy link
Contributor

sun-rui commented Jul 21, 2016

@rerngvit, sorry, I mean https://issues.apache.org/jira/browse/SPARK-11977. If your PR can enable accesses to columns with "." in their names without backticks, please first submit a PR for SPARK-11977, as the change is for the Spark Core, not SparkR specific. After that PR gets merged, you can then submit a PR for SPARK-11976 which contains SparkR only changes.

@rerngvit
Copy link
Contributor Author

rerngvit commented Jul 21, 2016

@sun-rui Thanks. I understand now. SPARK-11977 (https://issues.apache.org/jira/browse/SPARK-11977) is for all special characters (e.g., "-", ".", " "), which is a broader scope than supporting the "." character (this PR). Would it make more sense to continue on this PR or create a separate JIRA issue for the Spark Core instead? Another alternative would be to limit the scope of SPARK-11977 to just ".", which was originally discussed?

@sun-rui
Copy link
Contributor

sun-rui commented Jul 21, 2016

@rerngvit, I modifed the title of SPARK-11977 to a narrow scope. You can go for it.

@rerngvit
Copy link
Contributor Author

rerngvit commented Aug 5, 2016

Since SPARK-11977 didn't get merge and this PR is blocked on that. I decided to close this PR.

@rerngvit rerngvit closed this Aug 5, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants