Fix unqualified namespace issues in Spark3.1 #16237

@hudi-bot

Description

Spark 3.1 uses Avro 1.8.2, in which schema resolution for named types (types that can carry a namespace, such as records, enums, and fixed types) is strictly matched, i.e. they are resolved by their fully qualified name.

 

This means that the namespaces in the reader and writer schemas must match. However, when an ALTER TABLE ... RENAME TO DDL statement is executed, the tableName in hoodie.properties changes. The Avro schema generated from the requiredSchema struct therefore differs between the reader and the writer (although the field names and types are the same).
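The mismatch can be sketched with the plain Avro API. The snippet below is a minimal illustration and is not taken from the Hudi code base: the record names, the `hoodie.<tableName>` namespaces, and the helper names `recordSchema` and `priceFixed` are assumptions chosen for the example. It only assumes Avro is on the classpath. It builds two schemas with identical fields whose names differ only by a table-derived prefix, then compares the unqualified and fully qualified names of the `fixed` type that backs the decimal column.

```scala
import org.apache.avro.Schema

object NamespaceMismatchSketch {
  // Builds a record with a nullable decimal column backed by a named `fixed` type.
  // `recordName` and `namespace` are hypothetical stand-ins for values derived
  // from the table name; they are not the exact names Hudi generates.
  def recordSchema(recordName: String, namespace: String): Schema = {
    val json =
      s"""
         |{
         |  "type": "record",
         |  "name": "$recordName",
         |  "namespace": "$namespace",
         |  "fields": [
         |    {"name": "id", "type": ["null", "int"], "default": null},
         |    {"name": "price", "type": ["null", {
         |      "type": "fixed", "name": "fixed",
         |      "namespace": "$namespace.$recordName.price",
         |      "size": 9, "logicalType": "decimal", "precision": 20, "scale": 0
         |    }], "default": null}
         |  ]
         |}
       """.stripMargin
    new Schema.Parser().parse(json)
  }

  // Extracts the non-null branch of the `price` union, i.e. the named fixed type.
  def priceFixed(schema: Schema): Schema =
    schema.getField("price").schema().getTypes.get(1)

  def main(args: Array[String]): Unit = {
    // Writer schema built before the rename, reader schema built after it.
    val writer = priceFixed(recordSchema("h0_record", "hoodie.h0"))
    val reader = priceFixed(recordSchema("h0NewTableName_record", "hoodie.h0NewTableName"))

    println(s"writer full name: ${writer.getFullName}")
    println(s"reader full name: ${reader.getFullName}")
    println(s"unqualified names equal: ${writer.getName == reader.getName}")
    println(s"full names equal: ${writer.getFullName == reader.getFullName}")
  }
}
```

A resolver that compares `getFullName` treats the two `fixed` types as different, while one that compares `getName` treats them as the same type.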

 

This leads to read errors if the table already has log files when the ALTER TABLE ... RENAME TO DDL statement is executed.

 
{code:scala}
test("Test rename table") {
  withTempDir { tmp =>
    // Create table with INMEMORY index to generate log only mor table.
    val tableName = generateTableName
    spark.sql(
      s"""
         |create table $tableName (
         |  id int,
         |  name string,
         |  price decimal(20,0),
         |  ts long
         |) using hudi
         | location '${tmp.getCanonicalPath}'
         | tblproperties (
         |  primaryKey ='id',
         |  type = 'mor',
         |  preCombineField = 'ts',
         |  hoodie.index.type = 'INMEMORY',
         |  hoodie.compact.inline = 'true'
         | )
     """.stripMargin)
    spark.sql(s"insert into $tableName values(1, 'a1', 10, 1000),(2, 'a2', 10, 1000),(3, 'a3', 10, 1000)")
    spark.sql(s"ALTER TABLE $tableName rename to h0NewTableName")
    spark.sql(s"insert into h0NewTableName values(2, 'a1', 10, 1001),(2, 'a2', 10, 1000),(3, 'a3', 10, 1000)")
    spark.sql(s"select id, name, price, ts from h0NewTableName order by id").show(false)
  }
}
{code}
 

Spark 3.2 is not affected because it uses Avro 1.10.2, whose schema resolution resolves fields using their unqualified name.
