Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-20980] [SQL] Rename wholeFile to multiLine for both CSV and JSON #18202

Closed
wants to merge 3 commits into from

Conversation

gatorsmile
Copy link
Member

What changes were proposed in this pull request?

The current option name wholeFile is misleading for CSV users. Currently, it is not representing a record per file. Actually, one file could have multiple records. Thus, we should rename it. Now, the proposal is multiLine.

How was this patch tested?

N/A

@gatorsmile gatorsmile changed the title [SPARK-20980] [SQL] CSV: rename the option wholeFile to multiLine [SPARK-20980] [SQL] Rename the CSV option wholeFile to multiLine Jun 5, 2017
@gatorsmile
Copy link
Member Author

gatorsmile commented Jun 5, 2017

cc @cloud-fan @HyukjinKwon

@SparkQA
Copy link

SparkQA commented Jun 5, 2017

Test build #77733 has started for PR 18202 at commit 1fae66a.

@dongjoon-hyun
Copy link
Member

Retest this please.

@@ -128,7 +128,7 @@ class CSVOptions(
FastDateFormat.getInstance(
parameters.getOrElse("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"), timeZone, Locale.US)

val wholeFile = parameters.get("wholeFile").map(_.toBoolean).getOrElse(false)
val multiLine = parameters.get("multiLine").map(_.toBoolean).getOrElse(false)
Copy link
Member

@dongjoon-hyun dongjoon-hyun Jun 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is different. Each JSON file only can parse at most one record when wholeFile is on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. Thank you!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After rethinking the issue, we need to rename both CSV and JSON to multiLine and fix the JSON parsing to make them consistent.

@gatorsmile gatorsmile changed the title [SPARK-20980] [SQL] Rename the CSV option wholeFile to multiLine [SPARK-20980] [SQL] Rename wholeFile to multiLine for both CSV and JSON Jun 5, 2017
@SparkQA
Copy link

SparkQA commented Jun 5, 2017

Test build #77746 has finished for PR 18202 at commit 1fae66a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Jun 5, 2017

I think the change itself looks good as targeted to me (if this is going to be included in 2.2.0 - I just saw https://issues.apache.org/jira/browse/SPARK-20980?focusedCommentId=16037416&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16037416). Looks we just need a decision. Probably, please let me cc @rxin who I believe came up the option name initially.

@rxin
Copy link
Contributor

rxin commented Jun 5, 2017

Wouldn't this break compatibility?

@SparkQA
Copy link

SparkQA commented Jun 5, 2017

Test build #77751 has finished for PR 18202 at commit f70994c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 5, 2017

Test build #77752 has finished for PR 18202 at commit 5780ef5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Jun 5, 2017

Both options look added in 2.2 assuming from the JIRAs https://issues.apache.org/jira/browse/SPARK-19610 and https://issues.apache.org/jira/browse/SPARK-18352. If it targets 2.2.0, I guess It wouldn't.

@cloud-fan
Copy link
Contributor

let's hold it until RC4 finishes. If RC4 passes, we need to update this PR to support the old option name, otherwise we can just rename.

@HyukjinKwon
Copy link
Member

Hi all, it sounds RC4 vote was failed. Should we proceed this one?

@asfgit asfgit closed this in 2051428 Jun 15, 2017
@cloud-fan
Copy link
Contributor

ah lucky :) merging to master/2.2!

asfgit pushed a commit that referenced this pull request Jun 15, 2017
… JSON

The current option name `wholeFile` is misleading for CSV users. Currently, it is not representing a record per file. Actually, one file could have multiple records. Thus, we should rename it. Now, the proposal is `multiLine`.

N/A

Author: Xiao Li <gatorsmile@gmail.com>

Closes #18202 from gatorsmile/renameCVSOption.

(cherry picked from commit 2051428)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@felixcheung
Copy link
Member

@felixcheung
Copy link
Member

opened #18312

dataknocker pushed a commit to dataknocker/spark that referenced this pull request Jun 16, 2017
… JSON

### What changes were proposed in this pull request?
The current option name `wholeFile` is misleading for CSV users. Currently, it is not representing a record per file. Actually, one file could have multiple records. Thus, we should rename it. Now, the proposal is `multiLine`.

### How was this patch tested?
N/A

Author: Xiao Li <gatorsmile@gmail.com>

Closes apache#18202 from gatorsmile/renameCVSOption.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants