
[SPARK-30234][SQL][FOLLOWUP] Rename spark.sql.legacy.addDirectory.recursive.enabled to spark.sql.legacy.addSingleFileInAddFile #27725

Closed

iRakson wants to merge 2 commits into apache:master from iRakson:SPARK-30234_CONFIG

Conversation

@iRakson (Contributor) commented Feb 27, 2020

What changes were proposed in this pull request?

Rename spark.sql.legacy.addDirectory.recursive.enabled to spark.sql.legacy.addSingleFileInAddFile

Why are the changes needed?

To follow the naming convention

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing UTs.

…cursive.enabled` to `spark.sql.legacy.addSingleFileInAddFile`
@iRakson (Contributor, Author) commented Feb 27, 2020

@dongjoon-hyun (Member)

ok to test

@SparkQA commented Feb 28, 2020

Test build #119075 has finished for PR 27725 at commit 346b3e8.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member)

retest this please

@Ngone51 (Member) left a comment

I'm actually OK with this change, as it fixes the inconsistency in the config name.

But I still doubt whether we should treat it as legacy here. To me, legacy sounds like the previous behavior is not good, especially when there's some problem with it. And, in practice, I believe Spark would always recommend users to use the new behavior over the legacy behavior.

But, in this case, does Spark have a strong preference for recommending users to add a directory when using ADD FILE? I think it depends on the use case, right?

So, personally, I still think the config should be a new feature flag rather than legacy. And we can turn it on by default to be friendly to users.

@SparkQA commented Feb 28, 2020

Test build #119079 has finished for PR 27725 at commit 346b3e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member) commented Mar 1, 2020

> To me, legacy sounds like the previous behavior is not good, especially when there's some problem with it.

It doesn't necessarily have to have a problem. We add a legacy configuration case by case, for example, when the previous behaviour has been widely used, or when a newer behaviour supersedes the old one. Note that, as a general guideline, bugs have to be fixed without a legacy config!

spark.sql.legacy.addSingleFileInAddFile supersedes the old behaviour, and we will remove this configuration in the future, because disabling spark.sql.legacy.addSingleFileInAddFile supports both a directory and a single file.

Also, I don't completely buy the argument in #26863 (comment), since we already have different default values for legacy configurations, e.g., spark.sql.legacy.replaceDatabricksSparkAvro.enabled vs spark.sql.legacy.setopsPrecedence.enabled. I only buy the argument about the namespace. It doesn't quite matter whether it's called directory or file.
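The legacy-flag pattern described above can be sketched in a few lines of plain Python (hypothetical code, not Spark internals — only the config key mirrors the real name; the defaults table and helper function are invented): the legacy flag defaults to off, so the new behaviour is the default, and users who need the old single-file-only behaviour opt back in.

```python
# Hypothetical sketch of the legacy-config pattern discussed above.
DEFAULTS = {"spark.sql.legacy.addSingleFileInAddFile": "false"}

def allows_directories(conf):
    # Merge user-supplied settings over the defaults; with the legacy
    # flag left at its default ("false"), directories are accepted,
    # and setting it to "true" restores single-file-only behaviour.
    merged = {**DEFAULTS, **conf}
    return merged["spark.sql.legacy.addSingleFileInAddFile"] == "false"
```

This is why the config can later be removed entirely: once the opt-out is gone, the default (new) behaviour is the only behaviour.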

@HyukjinKwon (Member)

Merged to master and branch-3.0.

HyukjinKwon pushed a commit that referenced this pull request Mar 1, 2020
…cursive.enabled` to `spark.sql.legacy.addSingleFileInAddFile`


Closes #27725 from iRakson/SPARK-30234_CONFIG.

Authored-by: iRakson <raksonrakesh@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(cherry picked from commit 92a5ae2)
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
@HyukjinKwon (Member)

> But, in this case, does Spark have a strong preference for recommending users to add a directory when using ADD FILE? I think it depends on the use case, right?

Spark will support a directory in ADD FILE as a Hive compatibility concern, as discussed in #26863.

sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…cursive.enabled` to `spark.sql.legacy.addSingleFileInAddFile`


Closes apache#27725 from iRakson/SPARK-30234_CONFIG.

Authored-by: iRakson <raksonrakesh@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
6 participants