Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-22158][SQL] convertMetastore should not ignore table property #19382

Closed
wants to merge 5 commits into from
Closed

[SPARK-22158][SQL] convertMetastore should not ignore table property #19382

wants to merge 5 commits into from

Conversation

dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Sep 28, 2017

What changes were proposed in this pull request?

From the beginning, convertMetastoreOrc ignores table properties and use an empty map instead. This PR fixes that. For the diff, please see this. convertMetastoreParquet also ignore.

val options = Map[String, String]()

How was this patch tested?

Pass the Jenkins with an updated test suite.

@SparkQA
Copy link

SparkQA commented Sep 28, 2017

Test build #82283 has finished for PR 19382 at commit 09c621b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

sql(
s"""
|CREATE TABLE t(id int) USING hive
|OPTIONS(fileFormat 'orc', compression 'Zlib')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like we have the same issue for parquet?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for review, @gatorsmile . I'll check that, too.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-22158][SQL] convertMetastoreOrc should not ignore table property [SPARK-22158][SQL] convertMetastoreOrc/Parquet should not ignore table property Sep 29, 2017
@@ -189,12 +189,12 @@ case class RelationConversions(
private def convert(relation: HiveTableRelation): LogicalRelation = {
val serde = relation.tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT)
if (serde.contains("parquet")) {
val options = Map(ParquetOptions.MERGE_SCHEMA ->
val options = relation.tableMeta.storage.properties + (ParquetOptions.MERGE_SCHEMA ->
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @gatorsmile . It's the same in the Paquet. Without this, the test case for parquet fails.

|LOCATION '${path.toURI}'
""".stripMargin)
|CREATE TABLE t(id int) USING hive
|OPTIONS(fileFormat 'parquet', compression 'gzip')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parquet use gzip.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you combine these two test cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, it's not feasible. First of all,

  • Old ORC test case covers convertMetastoreOrc=false only. So, I extended it to cover both.
  • New Parquet test case covers `convertMetastoreParquet=true (default) only because Parquet has another bug when convertMetastoreParquet=false.

Also, the compression codec check is quite different between ORC and Parquet. Please see line 1510.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still can create a common helper function, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try your best to reduce duplicate codes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

sparkContext.hadoopConfiguration,
new Path(maybeParquetFile.get.getPath),
NO_FILTER)
assert("GZIP" === footer.getBlocks.get(0).getColumns().get(0).getCodec.toString)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's for getting compression codec.

@dongjoon-hyun
Copy link
Member Author

@gatorsmile .
Whiling testing, I noticed that Parquet ignores table properties in case of convertMetastoreParquet=false, too. For that case, we need to proceed in another JIRA issue. #19218 may cover that, too.

@SparkQA
Copy link

SparkQA commented Sep 29, 2017

Test build #82324 has finished for PR 19382 at commit d218c98.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

It fails due to flume dependency.

[warn] 	::::::::::::::::::::::::::::::::::::::::::::::
[warn] 	::          UNRESOLVED DEPENDENCIES         ::
[warn] 	::::::::::::::::::::::::::::::::::::::::::::::
[warn] 	:: org.apache.spark#spark-streaming-flume_2.11;2.3.0-SNAPSHOT: not found

@dongjoon-hyun
Copy link
Member Author

Retest this please

@dongjoon-hyun
Copy link
Member Author

Retest this please.

@dongjoon-hyun
Copy link
Member Author

I retriggered because I noticed that master branch is fixed after the previous triggering.

@SparkQA
Copy link

SparkQA commented Sep 29, 2017

Test build #82327 has finished for PR 19382 at commit d218c98.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 29, 2017

Test build #82332 has finished for PR 19382 at commit d218c98.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-22158][SQL] convertMetastoreOrc/Parquet should not ignore table property [SPARK-22158][SQL] convertMetastore should not ignore table property Oct 2, 2017
@dongjoon-hyun
Copy link
Member Author

Retest this please

@SparkQA
Copy link

SparkQA commented Oct 2, 2017

Test build #82391 has finished for PR 19382 at commit 452a003.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 2, 2017

Test build #82395 has finished for PR 19382 at commit 17ccf9a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

gatorsmile commented Oct 2, 2017

LGTM

Thanks! Merged to master.

@gatorsmile
Copy link
Member

Please submit a separate PR to 2.2. Thanks!

@asfgit asfgit closed this in e5431f2 Oct 2, 2017
@dongjoon-hyun
Copy link
Member Author

Thank you, @gatorsmile . I'll.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants