[SPARK-10400] [SQL] Renames SQLConf.PARQUET_FOLLOW_PARQUET_FORMAT_SPEC #8566
Conversation
Opened #8568 for the same purpose but against branch-1.5.

This PR was originally part of the closed PR #7679, which aimed to refactor the Parquet write path for better interoperability. I put too many things into that one and decided to split it into several smaller ones to ease code review.
Test build #41917 has finished for PR 8566 at commit
Force-pushed from b3f7877 to 85bbfde
Test build #42413 has finished for PR 8566 at commit
retest this please
Test build #43161 has finished for PR 8566 at commit
LGTM. Merging to master.
## What changes were proposed in this pull request?

Some improvements:

1. Point out that we are using both Spark SQL native syntax and HQL syntax in the example.
2. Avoid using the same table name as the temp view, to not confuse users.
3. Create the external Hive table with a directory that already has data, which is a more common use case.
4. Remove the usage of `spark.sql.parquet.writeLegacyFormat`. This config was introduced by #8566 and has nothing to do with Hive.
5. Remove the `repartition` and `coalesce` examples. These two are not Hive specific; we should put them in a different example file. Besides, they can't accurately control the number of output files, since `spark.sql.files.maxRecordsPerFile` also controls it.

## How was this patch tested?

N/A

Author: Wenchen Fan <wenchen@databricks.com>

Closes #20081 from cloud-fan/minor.
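To illustrate point 5, here is a minimal sketch (paths and names are hypothetical, and a running SparkSession is assumed) of why `repartition` alone cannot accurately control the number of output files once `spark.sql.files.maxRecordsPerFile` is set:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("file-count-demo")
  .getOrCreate()

// Ask for 2 partitions via repartition...
val df = spark.range(1000).repartition(2)

// ...but cap each output file at 100 records. Each of the 2 partitions
// is then split into multiple files, so the final file count is driven
// by both settings, not by repartition alone.
spark.conf.set("spark.sql.files.maxRecordsPerFile", "100")
df.write.parquet("/tmp/many-files")
```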
We introduced the SQL option `spark.sql.parquet.followParquetFormatSpec` while working on implementing Parquet backwards-compatibility rules in SPARK-6777. It indicates whether we should write Parquet files using the legacy format adopted by Spark 1.4 and prior versions, or the standard format defined in the parquet-format spec. This option defaults to `false` and is marked as a non-public option (`isPublic = false`) because we haven't finished refactoring the Parquet write path.

The problem is that the name of this option is somewhat confusing, because it's not intuitive why we shouldn't follow the spec. It would be nice to rename it to `spark.sql.parquet.writeLegacyFormat` and invert its default value (the two option names have opposite meanings). Although this option is private in 1.5, we'll make it public in 1.6 after refactoring the Parquet write path, so that users can decide whether to write Parquet files in the standard format or the legacy format.
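As a sketch of how a user would toggle the renamed option (the session setup and output paths are illustrative, and a Spark runtime is assumed):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("parquet-legacy-format-demo")
  .getOrCreate()

// Default after this PR: write in the standard parquet-format layout.
spark.conf.set("spark.sql.parquet.writeLegacyFormat", "false")
spark.range(10).write.parquet("/tmp/standard-format")

// Opt back into the legacy Spark 1.4-era layout, e.g. for older
// readers that expect it.
spark.conf.set("spark.sql.parquet.writeLegacyFormat", "true")
spark.range(10).write.parquet("/tmp/legacy-format")
```

Note that the new flag reads naturally: `writeLegacyFormat = true` means "use the old layout", whereas the old flag required users to reason about why `followParquetFormatSpec` would ever be `false`.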