Closed
Labels
enhancement (New feature or request)
Description
blocked by #536
Consider a Parquet file written with the following Avro schema:
{
  "name": "book",
  "type": "record",
  "fields": [
    {"name": "id", "type": "int"},
    {"name": "title", "type": "string"},
    {
      "name": "author",
      "type": {
        "type": "record",
        "name": "author",
        "fields": [
          {"name": "id", "type": "int"},
          {"name": "firstName", "type": "string"},
          {"name": "lastName", "type": "string"}
        ]
      }
    },
    {"name": "genre", "type": "string"},
    {"name": "publisher", "type": "string"}
  ]
}
The field author is nested.
The schema is parsed as org.apache.avro.Schema and AvroParquetWriter is used to write the Parquet file.
When this file is read with DataFrame.readParquet(), the nested field author is represented in the resulting DataFrame as a ValueColumn that holds a map in each cell.
Fields of this kind could instead be represented as a ColumnGroup.
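To make the difference between the two representations concrete, here is a minimal sketch in plain Kotlin. It does not use the DataFrame API: ordinary collections stand in for a column of maps and for a group of sub-columns, and the names `MapColumn`, `GroupedColumns`, and `toGroupedColumns` are hypothetical, as are the sample rows. It only illustrates the reshaping that reading author as a ColumnGroup would imply, not how the library would implement it.

```kotlin
// Plain Kotlin collections stand in for DataFrame columns; no DataFrame API is used.

// Current shape: a single column whose cells are maps (one map per row).
typealias MapColumn = List<Map<String, Any?>>

// Requested shape: named sub-columns, each holding one value per row.
typealias GroupedColumns = Map<String, List<Any?>>

// Hypothetical helper: splits a column of maps into one sub-column per key.
// A key that is missing from a row yields null for that row.
fun toGroupedColumns(cells: MapColumn): GroupedColumns {
    // Collect keys in first-seen order so the sub-column order is stable.
    val keys = LinkedHashSet<String>()
    cells.forEach { keys.addAll(it.keys) }
    return keys.associateWith { key -> cells.map { row -> row[key] } }
}

fun main() {
    // Made-up sample rows that follow the nested author record of the schema.
    val author: MapColumn = listOf(
        mapOf("id" to 1, "firstName" to "Jane", "lastName" to "Austen"),
        mapOf("id" to 2, "firstName" to "Mark", "lastName" to "Twain"),
    )
    val grouped = toGroupedColumns(author)
    println(grouped.keys)          // prints [id, firstName, lastName]
    println(grouped["firstName"])  // prints [Jane, Mark]
}
```

With the grouped shape, each nested field (id, firstName, lastName) becomes addressable as its own column, which is what a ColumnGroup would offer over a map per cell.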
The Parquet file mentioned above and the Kotlin Notebook from the screenshot are attached below (as a zip archive, because GitHub does not accept .parquet files).
