
Improve reading of Parquet files with nested schema #1619

@Allex-Nik

Description


blocked by #536

Here is the schema of a Parquet file:

{
  "name": "book",
  "type": "record",
  "fields": [
    {"name": "id", "type": "int"},
    {"name": "title", "type": "string"},
    {
      "name": "author",
      "type": {
        "type": "record",
        "name": "author",
        "fields": [
          {"name": "id", "type": "int"},
          {"name": "firstName", "type": "string"},
          {"name": "lastName", "type": "string"}
        ]
      }
    },
    {"name": "genre", "type": "string"},
    {"name": "publisher", "type": "string"}
  ]
}

The field author is nested.
The schema is parsed as an org.apache.avro.Schema, and AvroParquetWriter is used to write the Parquet file.
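
For context, a minimal sketch of how such a file can be produced with the parquet-avro API (the file names and sample values below are assumptions, not taken from the attached archive):

import java.io.File
import org.apache.avro.Schema
import org.apache.avro.generic.GenericData
import org.apache.avro.generic.GenericRecord
import org.apache.hadoop.fs.Path
import org.apache.parquet.avro.AvroParquetWriter

// Parse the schema shown above (assumed to be saved as book.avsc)
val schema = Schema.Parser().parse(File("book.avsc"))
val authorSchema = schema.getField("author").schema()

// Build one record with a nested "author" record (sample values)
val author = GenericData.Record(authorSchema).apply {
    put("id", 1)
    put("firstName", "Leo")
    put("lastName", "Tolstoy")
}
val book = GenericData.Record(schema).apply {
    put("id", 1)
    put("title", "War and Peace")
    put("author", author)
    put("genre", "Novel")
    put("publisher", "The Russian Messenger")
}

// Write the record; AvroParquetWriter preserves the nested structure in the Parquet schema
AvroParquetWriter.builder<GenericRecord>(Path("books.parquet"))
    .withSchema(schema)
    .build()
    .use { writer -> writer.write(book) }
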
When this file is read with DataFrame.readParquet(), the nested field author is represented in a DataFrame as a ValueColumn containing a map in each cell:

[screenshot: the DataFrame with the author column shown as a ValueColumn holding a map in each cell]

Fields of this kind, however, could be represented as a ColumnGroup.
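
For comparison, here is a hypothetical illustration of the requested representation, built manually from flat columns with the existing group operation (the column names and sample values are made up for this sketch):

import org.jetbrains.kotlinx.dataframe.api.*

// Flat columns, as if the nested author fields were read individually (hypothetical names)
val df = dataFrameOf("id", "title", "authorId", "firstName", "lastName", "genre", "publisher")(
    1, "War and Peace", 1, "Leo", "Tolstoy", "Novel", "The Russian Messenger",
)

// Group the author-related columns into a ColumnGroup named "author" --
// the representation this issue asks readParquet() to produce for the nested field
val grouped = df.group { "authorId" and "firstName" and "lastName" }.into("author")

println(grouped.schema())  // author is now a column group with nested columns

If readParquet() produced this structure directly, nested fields would be accessible through the usual column-group selection API instead of through maps.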

The Parquet file mentioned above and the Kotlin Notebook from the screenshot are attached below (as a zip archive, since GitHub does not accept .parquet files).

parquet_file_and_notebook.zip

    Labels

    enhancement (New feature or request)
