
[Rust] [Parquet] List schema to Arrow parser misinterpreting child nullability #27172

Closed
asfimport opened this issue Jan 16, 2021 · 1 comment

@asfimport

We currently do not propagate child nullability correctly when reading Parquet files written by Spark 3.0.1 (parquet-mr 1.10.1).

For example, the following schema, taken from https://github.com/apache/parquet-format/blob/master/LogicalTypes.md, is currently interpreted incorrectly:


// List<String> (list nullable, elements non-null) 
optional group my_list (LIST) {
    repeated group list { 
        required binary element (UTF8); 
    } 
}

The Arrow type should be:

Field::new(
    "my_list",
    DataType::List(Box::new(Field::new("element", DataType::Utf8, false))), // elements non-null
    true, // list nullable
)
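A minimal sketch of the conversion the expected type implies, assuming the standard 3-level LIST layout from LogicalTypes.md: the outer group's repetition decides whether the list is nullable, while the element's name and nullability come from the element node inside the repeated group, not from the repeated group itself. `Node`, `Repetition`, `Field`, `DataType` and `to_field` below are hypothetical stand-ins for illustration, not the `parquet` or `arrow` crate APIs.

```rust
// Hypothetical, minimal stand-ins for Parquet schema nodes and Arrow fields.
// These are NOT the `parquet` or `arrow` crate types.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Repetition {
    Required,
    Optional,
    Repeated,
}

#[derive(Debug)]
enum Node {
    // A UTF8 leaf, e.g. `required binary element (UTF8);`
    Utf8Leaf { name: String, repetition: Repetition },
    // A group; `is_list` marks the LIST annotation.
    Group { name: String, repetition: Repetition, is_list: bool, children: Vec<Node> },
}

#[derive(Debug, PartialEq)]
enum DataType {
    Utf8,
    List(Box<Field>),
}

#[derive(Debug, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

// Only `optional` maps to a nullable Arrow field.
fn is_nullable(rep: Repetition) -> bool {
    rep == Repetition::Optional
}

// Convert a node, assuming every LIST uses the standard 3-level layout.
fn to_field(node: &Node) -> Option<Field> {
    match node {
        Node::Utf8Leaf { name, repetition } => Some(Field {
            name: name.clone(),
            data_type: DataType::Utf8,
            nullable: is_nullable(*repetition),
        }),
        Node::Group { name, repetition, is_list: true, children } => {
            // The LIST group holds exactly one `repeated` group ...
            let [Node::Group { repetition: Repetition::Repeated, children: inner, .. }] =
                children.as_slice()
            else {
                return None;
            };
            // ... whose single child is the element. The element keeps its own
            // name and repetition; the repeated group only marks the list level.
            let [element] = inner.as_slice() else {
                return None;
            };
            Some(Field {
                name: name.clone(),
                data_type: DataType::List(Box::new(to_field(element)?)),
                nullable: is_nullable(*repetition),
            })
        }
        // Plain structs are out of scope for this sketch.
        Node::Group { .. } => None,
    }
}

fn main() {
    // optional group my_list (LIST) { repeated group list { required binary element (UTF8); } }
    let my_list = Node::Group {
        name: "my_list".to_string(),
        repetition: Repetition::Optional,
        is_list: true,
        children: vec![Node::Group {
            name: "list".to_string(),
            repetition: Repetition::Repeated,
            is_list: false,
            children: vec![Node::Utf8Leaf {
                name: "element".to_string(),
                repetition: Repetition::Required,
            }],
        }],
    };
    let field = to_field(&my_list).expect("standard 3-level list");
    println!("{field:?}");
}
```

Because `to_field` recurses on the element, an element that is itself a LIST group goes through the same rule, which is where carrying the child's nullability through correctly matters for nested lists.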

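but we currently end up with the following, where the child field takes the repeated group's name ("list") and is marked nullable: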

Field::new(
    "my_list",
    DataType::List(Box::new(Field::new("list", DataType::Utf8, true))), // wrong name, wrongly nullable
    true,
)
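The incorrect output names the child "list", the name of the repeated group, which suggests the repeated group was taken as the element, as it would be under the legacy 2-level list layout. LogicalTypes.md lists backward-compatibility rules for telling the two layouts apart; the following is a minimal sketch of our reading of them. `RepeatedField`, `ElementSource` and `element_source` are hypothetical names for illustration, not part of the `parquet` crate.

```rust
// Hypothetical summary of the repeated field directly under a LIST-annotated
// group. These are illustrative types, NOT the `parquet` crate's API.
struct RepeatedField<'a> {
    name: &'a str,
    is_group: bool,
    num_children: usize,
}

#[derive(Debug, PartialEq)]
enum ElementSource {
    // Legacy 2-level list: the repeated field itself is the element,
    // and elements are required (non-nullable).
    RepeatedFieldIsElement,
    // Standard 3-level list: the repeated group's single child is the element,
    // and that child's own repetition decides nullability.
    ChildIsElement,
}

// Our reading of the backward-compatibility rules in LogicalTypes.md.
fn element_source(list_name: &str, repeated: &RepeatedField) -> ElementSource {
    // Rule 1: a repeated primitive is the element.
    if !repeated.is_group {
        return ElementSource::RepeatedFieldIsElement;
    }
    // Rule 2: a repeated group with several fields is the element (a struct).
    if repeated.num_children > 1 {
        return ElementSource::RepeatedFieldIsElement;
    }
    // Rule 3: a one-field group named `array` or `<list name>_tuple` is the element.
    if repeated.num_children == 1
        && (repeated.name == "array" || repeated.name == format!("{list_name}_tuple"))
    {
        return ElementSource::RepeatedFieldIsElement;
    }
    // Otherwise: the standard 3-level layout.
    ElementSource::ChildIsElement
}

fn main() {
    // The schema in this issue: `repeated group list { required binary element (UTF8); }`
    let list = RepeatedField { name: "list", is_group: true, num_children: 1 };
    println!("{:?}", element_source("my_list", &list)); // prints "ChildIsElement"
}
```

Under these rules the schema in this issue falls through to the 3-level case, so the element should be the `element` child with its own `required` repetition.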

As of opening this issue, this doesn't seem to cause problems on the master branch, so it might not be severe enough to try to force it into the 3.0.0 release.

I tested Spark files with null and non-null values and was able to read them correctly. The problem surfaces with nested lists, which I'm working on.


Reporter: Neville Dipale / @nevi-me
Assignee: Neville Dipale / @nevi-me


Note: This issue was originally created as ARROW-11271. Please see the migration documentation for further details.

@asfimport

Andrew Lamb / @alamb:
Issue resolved by pull request #9216.

@asfimport asfimport added this to the 3.0.0 milestone Jan 11, 2023