
[SPARK-27080][SQL] bug fix: mergeWithMetastoreSchema with uniform lower case comparison

When reading a Parquet file and merging the metastore schema with the file schema, field names should be compared using a uniform case. The current implementation lowercases the names for this comparison, but one lookup was missed. This patch fixes that omission.
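The omission can be reproduced without Spark. The following is a minimal sketch, not the actual Spark code: `Field`, `LookupSketch`, `fieldMap`, `lookupBefore`, and `lookupAfter` are hypothetical names introduced only for illustration, with `Field` standing in for Spark's `StructField`. It assumes the map is keyed by lower-cased field names, as in the patched method.

```scala
// Hypothetical stand-in for Spark's StructField; not the real class.
final case class Field(name: String, dataType: String, nullable: Boolean)

object LookupSketch {
  // Build a case-insensitive field map by keying on the lower-cased name.
  def fieldMap(fields: Seq[Field]): Map[String, Field] =
    fields.map(f => f.name.toLowerCase -> f).toMap

  // Lookup as done before the patch: the metastore name is used as-is,
  // so any name containing an uppercase letter misses the lower-cased key.
  def lookupBefore(m: Map[String, Field], name: String): Option[Field] =
    m.get(name)

  // Lookup as done after the patch: the name is lower-cased first,
  // matching how the keys were built.
  def lookupAfter(m: Map[String, Field], name: String): Option[Field] =
    m.get(name.toLowerCase)

  def main(args: Array[String]): Unit = {
    val m = fieldMap(Seq(
      Field("UPPERCase", "double", nullable = true),
      Field("lowerCase", "binary", nullable = true)))
    println(lookupBefore(m, "UPPERCase")) // key not found
    println(lookupAfter(m, "UPPERCase"))  // found; original casing is preserved
  }
}
```

In the patched method the lookup is a direct `inferredFields(...)` call rather than `get`, so a missed key raises an exception, which the surrounding `catch` block reports as a schema conflict.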

Unit test

Closes #24001 from codeborui/mergeSchemaBugFix.

Authored-by: CodeGod <>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit a29df5f)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
CodeGod authored and cloud-fan committed Mar 9, 2019
1 parent c45f8da commit b6d5b0a6347faf4dd95321c9646e78d8bb6bb00d
@@ -282,7 +282,7 @@ private[hive] object HiveMetastoreCatalog {
// Merge missing nullable fields to inferred schema and build a case-insensitive field map.
val inferredFields = StructType(inferredSchema ++ missingNullables)
.map(f => f.name.toLowerCase -> f).toMap
- StructType(metastoreSchema.map(f => f.copy(name = inferredFields(f.name).name)))
+ StructType(metastoreSchema.map(f => f.copy(name = inferredFields(f.name.toLowerCase).name)))
} catch {
case NonFatal(_) =>
val msg = s"""Detected conflicting schemas when merging the schema obtained from the Hive
@@ -262,6 +262,32 @@ class HiveSchemaInferenceSuite
StructType(Seq(StructField("lowerCase", BinaryType))))
}

// The Parquet schema is a subset of the metastore schema, which contains an uppercase field name
assertResult(
StructType(Seq(
StructField("UPPERCase", DoubleType, nullable = true),
StructField("lowerCase", BinaryType, nullable = true)))) {

HiveMetastoreCatalog.mergeWithMetastoreSchema(
StructType(Seq(
StructField("UPPERCase", DoubleType, nullable = true),
StructField("lowerCase", BinaryType, nullable = true))),

StructType(Seq(
StructField("lowerCase", BinaryType, nullable = true))))
}

// Metastore schema contains additional non-nullable fields.
assert(intercept[Throwable] {
HiveMetastoreCatalog.mergeWithMetastoreSchema(
StructType(Seq(
StructField("UPPERCase", DoubleType, nullable = false),
StructField("lowerCase", BinaryType, nullable = true))),

StructType(Seq(
StructField("lowerCase", BinaryType, nullable = true))))
}.getMessage.contains("Detected conflicting schemas"))

// Check that merging missing nullable fields works as expected.
assertResult(
StructType(Seq(
