[SPARK-8690][SQL] Add a setting to disable SparkSQL parquet schema merge by using datasource API #7070
Can one of the admins verify this patch?
ok to test
Merged build triggered.
Merged build started.
Test build #36048 has started for PR 7070 at commit
Test build #36048 has finished for PR 7070 at commit
Merged build finished. Test PASSed.
@@ -114,7 +114,7 @@ private[sql] class ParquetRelation2(

   // Should we merge schemas from all Parquet part-files?
   private val shouldMergeSchemas =
-    parameters.getOrElse(ParquetRelation2.MERGE_SCHEMA, "true").toBoolean
+    parameters.getOrElse(ParquetRelation2.MERGE_SCHEMA, sqlContext.getConf("spark.sql.parquet.mergeSchema" , "true") ).toBoolean
Several styling issues here:
- 100 columns exceeded
- Remove the space before `,`
- Remove the space before `)`
Thanks for contributing this! This feature looks good, but still requires some more polishing:
Sure. I have already finished the coding, but I need some more time to write the test case.
Merged build triggered.
Merged build started.
Test build #36247 has started for PR 7070 at commit
Test build #36247 has finished for PR 7070 at commit
Merged build finished. Test FAILed.
Merged build triggered.
Hi @liancheng, can you help check whether this is what you suggested?
Merged build started.
Test build #36259 has started for PR 7070 at commit
Test build #36259 has finished for PR 7070 at commit
Merged build finished. Test PASSed.
@@ -227,6 +227,13 @@ private[spark] object SQLConf {
     defaultValue = Some(true),
     doc = "<TODO>")

+  val PARQUET_MERGE_SCHEMA_ENABLED = booleanConf("spark.sql.parquet.mergeSchema",
`PARQUET_SCHEMA_MERGING_ENABLED` might be a better name.
class ParquetSchemaMergeConfigSuite extends QueryTest with ParquetTest with BeforeAndAfterAll {
  val sqlContext = TestSQLContext
Please add `override`.
@thegiive I left several comments on styling issues; otherwise your changes look pretty good to me now. The Databricks Scala style guide and Spark code style guide can be good references for code styling.
import org.apache.spark.sql.{SQLConf, QueryTest}
import org.apache.spark.sql.test.TestSQLContext
import org.scalatest.BeforeAndAfterAll
Reorder imports. Please refer to https://github.com/databricks/scala-style-guide#imports
Merged build triggered.
Merged build started.
Test build #36310 has started for PR 7070 at commit
Test build #36310 has finished for PR 7070 at commit
Merged build finished. Test PASSed.
Hi @liancheng, thanks for the suggestions. Your comments were really helpful and I have already made the changes. Please check whether anything else is needed.
@thegiive Thanks for working on this! Merging to master.
Thank you, @liancheng.
The detailed problem description is in https://issues.apache.org/jira/browse/SPARK-8690
Generally speaking, I added a config, spark.sql.parquet.mergeSchema, that achieves the same effect as sqlContext.load("parquet", Map("path" -> "...", "mergeSchema" -> "false"))
It is a simple flag without any side effects.
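For readers skimming the thread, here is a minimal sketch of the lookup order the first diff hunk introduces: the per-relation option wins, then the global setting, then the default of "true". It uses plain Scala maps instead of Spark classes so it runs on its own. The object name `MergeSchemaLookup` and its helper are hypothetical, and the option key "mergeSchema" is assumed from the `Map(...)` call in the description above. It is not the code that was merged.

```scala
// Hypothetical stand-in for the lookup in ParquetRelation2; this is not Spark code.
object MergeSchemaLookup {
  // Per-relation option key, assumed from the Map(...) call in the PR description.
  val OptionKey = "mergeSchema"
  // Global setting name added by this PR.
  val ConfKey = "spark.sql.parquet.mergeSchema"

  /** Resolve the flag: relation option first, then global conf, then "true". */
  def shouldMergeSchemas(
      parameters: Map[String, String],
      conf: Map[String, String]): Boolean = {
    parameters.getOrElse(OptionKey, conf.getOrElse(ConfKey, "true")).toBoolean
  }

  def main(args: Array[String]): Unit = {
    val noConf = Map.empty[String, String]
    val mergingDisabled = Map(ConfKey -> "false")

    // Nothing set anywhere: falls back to the default.
    println(shouldMergeSchemas(Map.empty, noConf))
    // Global setting disables merging for relations that do not say otherwise.
    println(shouldMergeSchemas(Map.empty, mergingDisabled))
    // An explicit per-relation option overrides the global setting.
    println(shouldMergeSchemas(Map(OptionKey -> "true"), mergingDisabled))
  }
}
```

Under this ordering, existing callers that pass "mergeSchema" explicitly should see no change in behaviour. Only relations that omit the option pick up the global value.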