Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FixedSizeBinaryArray::try_from_sparse_iter failed when given all Nones #1390

Closed
jamescorbett opened this issue Mar 3, 2022 · 10 comments · Fixed by #1405 or #3054
Closed

FixedSizeBinaryArray::try_from_sparse_iter failed when given all Nones #1390

jamescorbett opened this issue Mar 3, 2022 · 10 comments · Fixed by #1405 or #3054
Labels
arrow Changes to the arrow crate bug good first issue Good for newcomers

Comments

@jamescorbett
Copy link

jamescorbett commented Mar 3, 2022

Describe the bug

try_from_sparse_iter expects at least one item to have Some to determine the size. This is a bad assumption as it may recieve all Nones.

InvalidArgumentError("column types must match schema types, expected FixedSizeBinary(32) but found FixedSizeBinary(0) at column index 0"

To Reproduce

        let none_option: Option<[u8; 32]> = None;
        let item = arrow::array::FixedSizeBinaryArray::try_from_sparse_iter(
            vec![none_option, none_option, none_option].into_iter(),
        )
        .unwrap();
       let rb = arrow::record_batch::RecordBatch::try_new(
            arrow_schema.clone(),
            vec![std::sync::Arc::new(item)],
        )
        .unwrap();

Expected behavior

A record batch with a colum of all nulls.

Additional context

I would suggest fixing it by changing the behaviour of defaulting in array_binary.rs:560

let size = size.unwrap_or(0);

or changing the schema validation to handle a 0 size if the array is empty.

@alamb alamb added good first issue Good for newcomers arrow Changes to the arrow crate labels Mar 3, 2022
@alamb
Copy link
Contributor

alamb commented Mar 3, 2022

Thank you for the report @jamescorbett

@alamb alamb changed the title try_from_sparse_iter failed when given all Nones FixedSizeBinaryArray::try_from_sparse_iter failed when given all Nones Mar 3, 2022
@jackwener
Copy link
Member

Has it already been fixed?

fn test_fixed_size_binary_array_from_sparse_iter() {
    let none_option: Option<[u8; 32]> = None;
    let input_arg = vec![none_option, none_option, none_option, none_option];
    let arr =
        FixedSizeBinaryArray::try_from_sparse_iter(input_arg.into_iter()).unwrap();
    assert_eq!(0, arr.value_length());
    assert_eq!(4, arr.len())
}

I create and run this unit test, and it is ok.

@alamb
Copy link
Contributor

alamb commented Mar 6, 2022

@jackwener one thing I noticed that that function isn't marked with #[test] (not sure if you did that locally or not).

#[test]
fn test_fixed_size_binary_array_from_sparse_iter() {
...
}

Is there any chance you can put up a PR with the test? That way we can close this ticket -- as 'already fixed' if it passes

@jamescorbett
Copy link
Author

Has it already been fixed?

fn test_fixed_size_binary_array_from_sparse_iter() {
    let none_option: Option<[u8; 32]> = None;
    let input_arg = vec![none_option, none_option, none_option, none_option];
    let arr =
        FixedSizeBinaryArray::try_from_sparse_iter(input_arg.into_iter()).unwrap();
    assert_eq!(0, arr.value_length());
    assert_eq!(4, arr.len())
}

I create and run this unit test, and it is ok.

The issue is not this. The error occurs when using the schema.

@jackwener
Copy link
Member

jackwener commented Mar 6, 2022

I reproduce it successfully.

Error case

    #[test]
    fn test() {
        let schema =
            Schema::new(vec![Field::new("a", DataType::FixedSizeBinary(2), false)]);
        let none_option: Option<[u8; 2]> = None;
        let item = FixedSizeBinaryArray::try_from_sparse_iter(
            vec![none_option, none_option, none_option].into_iter(),
        )
        .unwrap();
        let rb = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(item)]).unwrap();
    }

correct example

#[test]
    fn test() {
        let schema =
            Schema::new(vec![Field::new("a", DataType::FixedSizeBinary(2), false)]);
        let item = FixedSizeBinaryArray::try_from_sparse_iter(
            vec![
                None,
                Some(vec![7, 8]),
                Some(vec![9, 10]),
                None,
                Some(vec![13, 14]),
            ]
            .into_iter(),
        )
        .unwrap();
        let rb = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(item)]).unwrap();
    }

@jackwener
Copy link
Member

jackwener commented Mar 6, 2022

The essence of the problem is that we can't get the the type of None. We need Some() to determine None type.

@jamescorbett
Copy link
Author

The essence of the problem is that we can't get the the type None. We need Some() to determine None type.

Or the schema doesn't validate size for all none array

@jamescorbett
Copy link
Author

Please reopen this issue. The pr does not fix the problem. See Jack's comment above for a test reproducing the issue

@alamb alamb reopened this Mar 7, 2022
@alamb
Copy link
Contributor

alamb commented Mar 7, 2022

Sorry @jamescorbett -- Github got overly excited and automatically closed the PR

@alamb
Copy link
Contributor

alamb commented Apr 12, 2022

Here is a PR to add the test: #1551

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate bug good first issue Good for newcomers
Projects
None yet
3 participants