Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-8690][SQL] Add a setting to disable SparkSQL parquet schema merge by using datasource API #7070

Closed
wants to merge 5 commits into from

Conversation

thegiive
Copy link

The detail problem story is in https://issues.apache.org/jira/browse/SPARK-8690

General speaking, I add a config spark.sql.parquet.mergeSchema to achieve the sqlContext.load("parquet" , Map( "path" -> "..." , "mergeSchema" -> "false" ))

It will become a simple flag and without any side affect.

@thegiive thegiive changed the title [Spark-8690][SQL] Add a setting to disable SparkSQL parquet schema merge by using datasource API [SPARK-8690][SQL] Add a setting to disable SparkSQL parquet schema merge by using datasource API Jun 28, 2015
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@liancheng
Copy link
Contributor

ok to test

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented Jun 29, 2015

Test build #36048 has started for PR 7070 at commit b6806fb.

@SparkQA
Copy link

SparkQA commented Jun 29, 2015

Test build #36048 has finished for PR 7070 at commit b6806fb.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@@ -114,7 +114,7 @@ private[sql] class ParquetRelation2(

// Should we merge schemas from all Parquet part-files?
private val shouldMergeSchemas =
parameters.getOrElse(ParquetRelation2.MERGE_SCHEMA, "true").toBoolean
parameters.getOrElse(ParquetRelation2.MERGE_SCHEMA, sqlContext.getConf("spark.sql.parquet.mergeSchema" , "true") ).toBoolean
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several styling issue here:

  1. 100 columns exceeded
  2. Remove the space before ,
  3. Remove the space before )

@liancheng
Copy link
Contributor

Thanks for contributing this! This feature looks good, but still requires some more polishing:

  1. Please fix the style issues as I commented.

  2. Please add this configuration to SQLConf. Example:

  3. Please add a test case for this.

    For example, we can write two Parquet files with incompatible schemas and put them into the same directory, then read the directory to see whether the schema merging is triggered (if it's triggered, exception will be thrown due to incompatible schemas).

@thegiive
Copy link
Author

Sure. I have already finish the coding. But I need some more time to write the test case

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented Jul 1, 2015

Test build #36247 has started for PR 7070 at commit db8ef1b.

@SparkQA
Copy link

SparkQA commented Jul 1, 2015

Test build #36247 has finished for PR 7070 at commit db8ef1b.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

@AmplabJenkins
Copy link

Merged build triggered.

@thegiive
Copy link
Author

thegiive commented Jul 1, 2015

Hi @liancheng , Can you help to check if it is what you suggest ?

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented Jul 1, 2015

Test build #36259 has started for PR 7070 at commit 94c9307.

@SparkQA
Copy link

SparkQA commented Jul 1, 2015

Test build #36259 has finished for PR 7070 at commit 94c9307.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@@ -227,6 +227,13 @@ private[spark] object SQLConf {
defaultValue = Some(true),
doc = "<TODO>")

val PARQUET_MERGE_SCHEMA_ENABLED = booleanConf("spark.sql.parquet.mergeSchema",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PARQUET_SCHEMA_MERGING_ENABLED might be a better name.



class ParquetSchemaMergeConfigSuite extends QueryTest with ParquetTest with BeforeAndAfterAll {
val sqlContext = TestSQLContext
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add override.

@liancheng
Copy link
Contributor

@thegiive I left several comments on styling issues, otherwise your changes looks pretty good to me now. The Databricks Scala style guide and Spark code style guide can be good references for code styling.


import org.apache.spark.sql.{SQLConf, QueryTest}
import org.apache.spark.sql.test.TestSQLContext
import org.scalatest.BeforeAndAfterAll
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented Jul 1, 2015

Test build #36310 has started for PR 7070 at commit c6f3e86.

@SparkQA
Copy link

SparkQA commented Jul 2, 2015

Test build #36310 has finished for PR 7070 at commit c6f3e86.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@thegiive
Copy link
Author

thegiive commented Jul 2, 2015

HI @liancheng , thanks for the suggestion. I think your comment is really good and I have modified it already.

Please help to check if there is anything else

@liancheng
Copy link
Contributor

@thegiive Thanks for working on this! Merging to master.

@asfgit asfgit closed this in 246265f Jul 2, 2015
@thegiive
Copy link
Author

thegiive commented Jul 2, 2015

Thanks you. @liancheng

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants