Description
Spark 3.1 uses Avro 1.8.2, whose schema resolution strictly matches every type that is allowed to define a namespace, i.e. such types are resolved using their fully qualified names.
This means that the namespaces of the reader and writer schemas must match. However, when an ALTER TABLE ... RENAME TO DDL statement is executed, the tableName in hoodie.properties changes. The Avro schema generated from the requiredSchema struct therefore differs between the reader and the writer, even though the field names and types are identical.
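To make the mismatch concrete, here is a minimal sketch built directly with Avro's `SchemaBuilder`. The helper `schemaFor` and its naming scheme (a `hoodie.<table>` namespace and a `<table>_record` record name) are assumptions made for illustration, not a statement of how Hudi generates its schemas. The sketch only shows that two schemas with identical fields end up with different fully qualified names once the table name changes.

```scala
import org.apache.avro.{Schema, SchemaBuilder}
// JavaConverters works on Scala 2.12 (Spark 3.1) and is merely deprecated on 2.13.
import scala.collection.JavaConverters._

object NamespaceMismatch {
  // Hypothetical naming scheme: record name and namespace are both derived
  // from the table name. This is an assumption for illustration only.
  def schemaFor(tableName: String): Schema =
    SchemaBuilder
      .record(s"${tableName}_record")
      .namespace(s"hoodie.$tableName")
      .fields()
      .requiredInt("id")
      .requiredString("name")
      .requiredLong("ts")
      .endRecord()

  // Schema in effect before the rename (used when the log files were written).
  val writer: Schema = schemaFor("h0")
  // Schema in effect after the rename (used when the table is read).
  val reader: Schema = schemaFor("h0NewTableName")

  def fieldNames(schema: Schema): List[String] =
    schema.getFields.asScala.map(_.name).toList

  def main(args: Array[String]): Unit = {
    println(s"same field names: ${fieldNames(writer) == fieldNames(reader)}")
    println(s"writer full name: ${writer.getFullName}")
    println(s"reader full name: ${reader.getFullName}")
    println(s"same full name:   ${writer.getFullName == reader.getFullName}")
  }
}
```

A resolver that compares fully qualified names treats these two schemas as different types, while one that compares unqualified field names does not.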
The mismatch leads to read errors if log files already exist when the rename DDL is performed, as the following test shows:
{code:java}
test("Test rename table") {
  withTempDir { tmp =>
    // Create a MOR table with the INMEMORY index so that writes produce only log files.
    val tableName = generateTableName
    spark.sql(
      s"""
         |create table $tableName (
         |  id int,
         |  name string,
         |  price decimal(20,0),
         |  ts long
         |) using hudi
         | location '${tmp.getCanonicalPath}'
         | tblproperties (
         |  primaryKey = 'id',
         |  type = 'mor',
         |  preCombineField = 'ts',
         |  hoodie.index.type = 'INMEMORY',
         |  hoodie.compact.inline = 'true'
         | )
       """.stripMargin)
    // Write log files under the original table name.
    spark.sql(s"insert into $tableName values(1, 'a1', 10, 1000),(2, 'a2', 10, 1000),(3, 'a3', 10, 1000)")
    // Rename the table; this changes the table name stored in hoodie.properties.
    spark.sql(s"ALTER TABLE $tableName rename to h0NewTableName")
    // Write again under the new name, then read back across both sets of log files.
    spark.sql(s"insert into h0NewTableName values(2, 'a1', 10, 1001),(2, 'a2', 10, 1000),(3, 'a3', 10, 1000)")
    spark.sql(s"select id, name, price, ts from h0NewTableName order by id").show(false)
  }
}
{code}
Spark 3.2 is not affected because it uses Avro 1.10.2, whose schema resolution matches fields using their unqualified names.
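Since the behavior depends on the Avro version rather than on Spark itself, it can help to confirm which Avro jar is actually on the classpath. The sketch below reads the `Implementation-Version` entry from the jar manifest through the standard `java.lang.Package` API. The object name `AvroVersionCheck` is hypothetical, and the result is `None` when the jar's manifest does not carry that entry (for example in some shaded builds).

```scala
import org.apache.avro.Schema

object AvroVersionCheck {
  // Reads Implementation-Version from the manifest of the jar that provides
  // org.apache.avro.Schema; yields None if the package or the entry is missing.
  def avroVersion: Option[String] =
    Option(classOf[Schema].getPackage)
      .flatMap(pkg => Option(pkg.getImplementationVersion))

  def main(args: Array[String]): Unit =
    println(s"Avro on classpath: ${avroVersion.getOrElse("unknown")}")
}
```

Running this inside the same Spark session or application that reads the table reports the version that is in effect there, which may differ from the one bundled with the Spark distribution.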
JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-6877
- Type: Bug