
Fail to read 2-level structure Parquet #16520

@hudi-bot

Description


If I have `spark.hadoop.parquet.avro.write-old-list-structure` explicitly set to `false` - the only way to be able to write nulls inside arrays - Hudi starts to write Parquet files with the following schema inside:

```
required group internal_list (LIST) {
  repeated group list {
    required int64 element;
  }
}
```
 
But if I had some files produced before setting `spark.hadoop.parquet.avro.write-old-list-structure` to `false`, they have the following schema inside:

```
required group internal_list (LIST) {
  repeated int64 array;
}
```
 
And Hudi 0.14.x (at least) fails to read records from such a file, failing with the exception:

```
Caused by: java.lang.RuntimeException: Null-value for required field:
```

Even though the contents of the arrays are not null (in fact they cannot be null, since Avro requires `spark.hadoop.parquet.avro.write-old-list-structure` = `false` to write `null`s).
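For context, this is how the writer option above is typically supplied (a minimal sketch; the session setup is illustrative, only the config key and value come from the report):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hudi-list-structure")
  // Makes parquet-avro write the 3-level LIST structure shown above
  // and thereby allows null elements inside arrays.
  .config("spark.hadoop.parquet.avro.write-old-list-structure", "false")
  .getOrCreate()
```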
### Expected behavior

Taken from Hudi 0.12.1 (not sure what exactly broke this since then):

If I have a file with the 2-level structure and an update arrives (no matter whether it has nulls inside the array or not - both produce the same result) with `spark.hadoop.parquet.avro.write-old-list-structure` set to `false` - the file is overwritten into the 3-level structure. (This fails in 0.14.1.)

If I have the 3-level structure with nulls and an update comes (no matter with or without nulls) - it is read and written correctly.

A simple reproduction of the issue can be found here:
https://github.com/VitoMakarevich/hudi-issue-014
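The failing write path boils down to roughly the following (a hedged sketch, not the exact linked repro: the table name, key fields, and column names are hypothetical; only the array-with-null shape and the session config come from the report):

```scala
import spark.implicits._

// An array column containing a null element - writable only when
// parquet.avro.write-old-list-structure=false is set on the session.
val df = Seq((1, Seq[java.lang.Long](1L, null, 3L))).toDF("id", "internal_list")

df.write.format("hudi")
  .option("hoodie.table.name", "list_repro")                // hypothetical
  .option("hoodie.datasource.write.recordkey.field", "id")  // hypothetical
  .option("hoodie.datasource.write.precombine.field", "id") // hypothetical
  .mode("append")
  .save("/tmp/list_repro")

// Updating a table whose earlier files were written with the old
// 2-level list structure is what triggers the
// "Null-value for required field" failure on 0.14.x.
```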

Most likely the problem appeared after Hudi made some changes such that values from the Hadoop conf started to propagate into the Reader instance (likely they were not propagated before).

