Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arrow-arith is_null is buggy with NullArray #4565

Closed
smiklos opened this issue Jul 24, 2023 · 3 comments · Fixed by #4566
Closed

arrow-arith is_null is buggy with NullArray #4565

smiklos opened this issue Jul 24, 2023 · 3 comments · Fixed by #4566
Labels
arrow Changes to the arrow crate bug good first issue Good for newcomers help wanted

Comments

@smiklos
Copy link
Contributor

smiklos commented Jul 24, 2023

Describe the bug

When using the arith kernel's is_null or is_not_null on arrays of type NullArray (with datatype Null) the .nulls() buffer is always empty for these types. Therefor the kernel returns an array with all values false/true which is the exact opposite of what should be returned.

To Reproduce

Noticed it here apache/datafusion#7038

Run this unit test in datafusion/is_null.rs

   fn is_null_op_allnull() -> Result<()> {
        let schema = Schema::new(vec![Field::new("a", DataType::Null, true)]);
        let mut a = NullBuilder::with_capacity(1);

        // expression: "a is null"
        let expr = is_null(col("a", &schema)?).unwrap();
        let batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(a.finish())])?;

        let result = expr.evaluate(&batch)?.into_array(batch.num_rows());
        let result =
            as_boolean_array(&result).expect("failed to downcast to BooleanArray");

        let expected = &BooleanArray::from(vec![true]);

        assert_eq!(expected, result);

        Ok(())
    } 

Expected behavior

for is_null I expect a BoolArray of true values for NullArray input (irrespective of length).
for is_not_null I expect a BoolArray of false values for NullArray input (irrespective of length).

Additional context

@smiklos smiklos added the bug label Jul 24, 2023
@smiklos
Copy link
Contributor Author

smiklos commented Jul 24, 2023

I can create a pr for this with my solution but was curious if the NullArray::nulls should return a nullbuffer based on self length or if the kernel should be made Null aware

@tustvold
Copy link
Contributor

tustvold commented Jul 24, 2023

I think the kernel should special case NullArray, as the arrow specification states that a NullArray should not have a null buffer

@tustvold tustvold added the arrow Changes to the arrow crate label Jul 30, 2023
@tustvold
Copy link
Contributor

label_issue.py automatically added labels {'arrow'} from #4566

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate bug good first issue Good for newcomers help wanted
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants