JSON reader can create null struct children on empty lists #825

Closed
nevi-me opened this issue Oct 12, 2021 · 0 comments · Fixed by #826
nevi-me commented Oct 12, 2021

Describe the bug

If a list of nested structs is empty, the JSON reader can still create a null struct value in the list's child array. It should not.

To Reproduce

Read the JSON below with the JSON reader and check the length of the list's child array:

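// Imports assume the test is compiled against the public `arrow` crate;
// inside the crate's own test module the paths would differ.
use std::sync::Arc;

use arrow::array::ListArray;
use arrow::datatypes::{DataType, Field, Schema};
use arrow::json::ReaderBuilder;

#[test]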
fn json_list_roundtrip() {
    let json_content = r#"
    {"list": [{"ints": 1}]}
    {"list": [{}]}
    {"list": []}
    {"list": null}
    {"list": [{"ints": null}]}
    {"list": [null]}
    "#;
    let ints_struct =
        DataType::Struct(vec![Field::new("ints", DataType::Int32, true)]);
    let list_type = DataType::List(Box::new(Field::new("item", ints_struct, true)));
    let list_field = Field::new("list", list_type, true);
    let schema = Arc::new(Schema::new(vec![list_field]));
    let builder = ReaderBuilder::new().with_schema(schema).with_batch_size(64);
    let mut reader = builder.build(std::io::Cursor::new(json_content)).unwrap();

    let batch = reader.next().unwrap().unwrap();

    let list_row = batch
        .column(0)
        .as_any()
        .downcast_ref::<ListArray>()
        .unwrap();
    let values = list_row.values();
    // the {"list": []} row is interpreted as holding an empty struct, so the length below is 5 instead of 4
    assert_eq!(values.len(), 4);
    assert_eq!(values.null_count(), 1);
}

Expected behavior

An empty list should not add an empty or null struct to the child array. Otherwise the child array's length disagrees with the list offsets, which can lead to data being written incorrectly.
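To make the expected invariant concrete, here is a minimal sketch using only the standard library. It is not the arrow reader's implementation: `Row`, `ListColumn`, and `build_list_column` are hypothetical names that model a list column as offsets plus a flattened child array. It shows that empty and null lists add no child values, so the last offset always equals the child length.

```rust
/// One JSON row of the "list" column (hypothetical model, not an arrow type).
/// Outer `None` is a null list; inside the list, `None` is a null struct and
/// `Some(ints)` is a struct whose "ints" field may itself be null.
type Row = Option<Vec<Option<Option<i32>>>>;

/// Hypothetical stand-in for a list array: offsets into a flattened child array.
struct ListColumn {
    offsets: Vec<usize>,
    list_validity: Vec<bool>,
    child: Vec<Option<Option<i32>>>,
}

fn build_list_column(rows: &[Row]) -> ListColumn {
    let mut offsets = vec![0];
    let mut list_validity = Vec::with_capacity(rows.len());
    let mut child = Vec::new();
    for row in rows {
        match row {
            // Only real elements are appended; an empty list appends nothing.
            Some(items) => {
                child.extend(items.iter().copied());
                list_validity.push(true);
            }
            // A null list also appends nothing to the child array.
            None => list_validity.push(false),
        }
        offsets.push(child.len());
    }
    ListColumn { offsets, list_validity, child }
}

fn main() {
    let rows: Vec<Row> = vec![
        Some(vec![Some(Some(1))]), // {"list": [{"ints": 1}]}
        Some(vec![Some(None)]),    // {"list": [{}]}
        Some(vec![]),              // {"list": []}
        None,                      // {"list": null}
        Some(vec![Some(None)]),    // {"list": [{"ints": null}]}
        Some(vec![None]),          // {"list": [null]}
    ];
    let col = build_list_column(&rows);

    // Empty and null lists repeat the previous offset.
    assert_eq!(col.offsets, vec![0, 1, 2, 2, 2, 3, 4]);
    assert_eq!(col.list_validity, vec![true, true, true, false, true, true]);
    // Four child structs in total, one of them null, matching the test above.
    assert_eq!(col.child.len(), 4);
    assert_eq!(col.child.iter().filter(|v| v.is_none()).count(), 1);
    // The invariant a writer relies on: last offset == child length.
    assert_eq!(*col.offsets.last().unwrap(), col.child.len());
}
```

If the empty list pushed a placeholder struct, the child length would become 5 while the last offset stayed at 4, which is the mismatch described in this issue.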

Additional context

This relates to #704 but doesn't yet fix it
