[SPARK-16777][SQL] Do not use deprecated listType API in ParquetSchemaConverter

## What changes were proposed in this pull request?

This PR removes the build warnings below.

```
[WARNING] .../spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala:448: method listType in object ConversionPatterns is deprecated: see corresponding Javadoc for more information.
[WARNING]         ConversionPatterns.listType(
[WARNING]                            ^
[WARNING] .../spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala:464: method listType in object ConversionPatterns is deprecated: see corresponding Javadoc for more information.
[WARNING]         ConversionPatterns.listType(
[WARNING]                            ^
```

The fix should not simply switch to `listOfElements`, the recommended replacement for `listType`. That method checks whether the element field of a Parquet `LIST` is named `element` and throws an exception if it is not. However, it seems that Spark prior to 1.4.x writes `ArrayType` as a Parquet `LIST` whose element is named `array`.
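
A minimal, self-contained Scala sketch of why the legacy layout trips that check. It does not call parquet-mr. The `Field`, `Primitive`, and `Group` types and the `checkListElement` helper are hypothetical stand-ins that only mirror the name check described above. The sketch assumes the check is a plain string comparison that throws on a mismatch.

```scala
// Hypothetical, simplified model of Parquet schema nodes (NOT the parquet-mr API).
sealed trait Field { def name: String }
final case class Primitive(repetition: String, typeName: String, name: String) extends Field
final case class Group(repetition: String, annotation: Option[String], name: String,
                       fields: List[Field]) extends Field

object ListElementCheck {
  // Hypothetical stand-in for the precondition described for `listOfElements`:
  // the element field of a LIST must be named "element", otherwise `require`
  // throws an IllegalArgumentException.
  def checkListElement(element: Field): Field = {
    require(element.name == "element",
      s"LIST element must be named 'element' but was '${element.name}'")
    element
  }
}
```

For example, `ListElementCheck.checkListElement(Primitive("optional", "int32", "array"))` throws an `IllegalArgumentException`, while the same field named `element` is returned unchanged. Any schema that uses the legacy `array` name would therefore be rejected.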

Therefore, this PR uses neither `listOfElements` nor `listType`. Instead, it uses the existing schema builder (`Types`) to construct the same `GroupType`.
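
The two `GroupType` shapes that this change builds can be sketched without parquet-mr. The following self-contained Scala sketch uses a hypothetical mini-model (`Node`, `Leaf`, `GroupNode`, `LegacyListLayouts`) rather than the real `Types` builder, and renders both legacy layouts in Parquet's textual schema notation. It assumes the 3-level layout is used for nullable elements, so the innermost field is rendered as `optional`.

```scala
// Hypothetical, simplified model of Parquet schema nodes (NOT the parquet-mr API).
sealed trait Node
final case class Leaf(repetition: String, typeName: String, name: String) extends Node
final case class GroupNode(repetition: String, annotation: Option[String], name: String,
                           children: List[Node]) extends Node

object LegacyListLayouts {
  // Render a node in a Parquet-message-like text form, indenting nested fields.
  def render(node: Node, indent: String = ""): String = node match {
    case Leaf(rep, tpe, name) => s"$indent$rep $tpe $name;"
    case GroupNode(rep, ann, name, children) =>
      val annotation = ann.map(a => s" ($a)").getOrElse("")
      val body = children.map(render(_, indent + "  ")).mkString("\n")
      s"$indent$rep group $name$annotation {\n$body\n$indent}"
  }

  // 3-level legacy layout (assumed nullable elements):
  // <repetition> group <name> (LIST) { repeated group bag { optional <type> array; } }
  def threeLevel(repetition: String, name: String, elementType: String): Node =
    GroupNode(repetition, Some("LIST"), name, List(
      GroupNode("repeated", None, "bag", List(Leaf("optional", elementType, "array")))))

  // 2-level legacy layout (non-nullable elements):
  // <repetition> group <name> (LIST) { repeated <type> array; }
  def twoLevel(repetition: String, name: String, elementType: String): Node =
    GroupNode(repetition, Some("LIST"), name, List(Leaf("repeated", elementType, "array")))
}
```

Both layouts keep `array` as the element name. Building them field by field, rather than through a helper that validates the name, is what lets the converter keep emitting the legacy format.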

## How was this patch tested?

Existing tests should cover this.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #14399 from HyukjinKwon/SPARK-16777.
HyukjinKwon authored and liancheng committed Sep 27, 2016
1 parent 6a68c5d commit 5de1737
Showing 1 changed file with 17 additions and 9 deletions.

`sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala`

```diff
@@ -445,14 +445,20 @@ private[parquet] class ParquetSchemaConverter(
         // repeated <element-type> array;
         // }
         // }
-        ConversionPatterns.listType(
-          repetition,
-          field.name,
-          Types
+
+        // This should not use `listOfElements` here because this new method checks if the
+        // element name is `element` in the `GroupType` and throws an exception if not.
+        // As mentioned above, Spark prior to 1.4.x writes `ArrayType` as `LIST` but with
+        // `array` as its element name as below. Therefore, we build manually
+        // the correct group type here via the builder. (See SPARK-16777)
+        Types
+          .buildGroup(repetition).as(LIST)
+          .addField(Types
             .buildGroup(REPEATED)
-            // "array_element" is the name chosen by parquet-hive (1.7.0 and prior version)
+            // "array" is the name chosen by parquet-hive (1.7.0 and prior version)
             .addField(convertField(StructField("array", elementType, nullable)))
             .named("bag"))
+          .named(field.name)

       // Spark 1.4.x and prior versions convert ArrayType with non-nullable elements into a 2-level
       // LIST structure. This behavior mimics parquet-avro (1.6.0rc3). Note that this case is
@@ -461,11 +467,13 @@ private[parquet] class ParquetSchemaConverter(
         // <list-repetition> group <name> (LIST) {
         // repeated <element-type> element;
         // }
-        ConversionPatterns.listType(
-          repetition,
-          field.name,
+
+        // Here too, we should not use `listOfElements`. (See SPARK-16777)
+        Types
+          .buildGroup(repetition).as(LIST)
           // "array" is the name chosen by parquet-avro (1.7.0 and prior version)
-          convertField(StructField("array", elementType, nullable), REPEATED))
+          .addField(convertField(StructField("array", elementType, nullable), REPEATED))
+          .named(field.name)

       // Spark 1.4.x and prior versions convert MapType into a 3-level group annotated by
       // MAP_KEY_VALUE. This is covered by `convertGroupField(field: GroupType): DataType`.
```