Remove `null_count` from ArrayData::try_new() #911

alamb · 2021-11-04T19:14:53Z

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Re #817 ; As @jhorstmann pointed out on #810 at https://github.com/apache/arrow-rs/pull/810/files#r742905920

ArrayData::try_new() function includes an optional null_count argument and a validity buffer. If these numbers differ, wrong results can occur (and maybe also undefined behavior) in some later operation.

Note that as part of #817 we will be validating that the bitmaps are consistent with the declared null_count

Describe the solution you'd like

Since most callers pass None anyway at which point we calculate the number, if we simply removed the option to pass in an inconsistent value we would avoid a class of inconsistencies I would suggest to remove the null_count parameter from try_new to completely avoid of inconsistencies.

Describe alternatives you've considered
N/A

Additional context
Add any other context or screenshots about the feature request here.

The text was updated successfully, but these errors were encountered:

HaoYang670 · 2022-05-16T09:28:24Z

If we need to implement this, I can have a try!

alamb · 2022-05-16T11:46:04Z

If we need to implement this, I can have a try!

I think it is still worth trying 👍

HaoYang670 · 2022-05-17T13:26:25Z

Something I find is that we have to move the check of null_bit_buffer size from ArrayData::validate to ArrayData::new_unchecked:

    pub unsafe fn new_unchecked(
        data_type: DataType,
        len: usize,
        null_bit_buffer: Option<Buffer>,
        offset: usize,
        buffers: Vec<Buffer>,
        child_data: Vec<ArrayData>,
    ) -> Self {
...
        if let Some(null_bit_map) = null_bitmap.as_ref() {
            let null_bit_buffer = null_bit_map.buffer_ref();
            let needed_len = bit_util::ceil(len + offset, 8);
            if null_bit_buffer.len() < needed_len {
                return Err(ArrowError::InvalidArgumentError(format!(
                    "null_bit_buffer size too small. got {} needed {}",
                    null_bit_buffer.len(),
                    needed_len
                )));
            }
        }
        let null_count = count_nulls(null_bit_buffer.as_ref(), offset, len);
...

Otherwise, we may panic if the null_bit_buffer is too short.

HaoYang670 · 2022-05-17T13:34:26Z

I find the above bug in the test :

    #[test]
    #[should_panic(expected = "null_bit_buffer size too small. got 1 needed 2")]
    fn test_bitmap_too_small() {
        let buffer = make_i32_buffer(9);
        let null_bit_buffer = Buffer::from(vec![0b11111111]);

        ArrayData::try_new(
            DataType::Int32,
            9,
            Some(0),
            Some(null_bit_buffer),
            0,
            vec![buffer],
            vec![],
        )
        .unwrap();
    }

Here we will not calculate the null count in new_unchecked because we have provided a null count Some(0). And when we validate the array data, we will get the error ArrayError(null_bit_buffer size too small. ...).

However, if we set null_count to be None in this test, we will panic at range end index 2 out of range for slice of length 1 when calling count_nulls(null_bit_buffer.as_ref(), offset, len) in new_unchecked.

HaoYang670 · 2022-05-17T13:40:54Z

Personally, my suggestions are:

move the check of the size of null_bit_buffer into new_unchecked.
Change the return type of new_unchecked from Self to Result<Self>.

alamb · 2022-05-17T20:07:28Z

cc @tustvold

tustvold · 2022-05-17T22:01:59Z

I think I'm missing something, new_unchecked is unsafe because it trusts the caller. Making it fallible / perform validation seems counter to that, and I'm not sure I follow why it is necessary? Isn't the issue in try_new?

HaoYang670 · 2022-05-18T01:11:42Z

Isn't the issue in try_new?

Sorry, my mistake. I think you are right.
Maybe we should modify ArrayData::try_new like that:

    pub fn try_new(
        ...
    ) -> Result<Self> {
        check_size_of_null_bit_buffer()?

        let new_self = unsafe {
            Self::new_unchecked(
                ...
            )
        };

        // As the data is not trusted, do a full validation of its contents
        new_self.validate_full()?;
        Ok(new_self)
    }

HaoYang670 · 2022-05-18T03:42:31Z

I will fix this in #1707

alamb · 2022-05-23T11:23:35Z

🎉

alamb added arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog api-change Changes to the arrow API labels Nov 4, 2021

alamb mentioned this issue Nov 4, 2021

Validate arguments to ArrayData::new and null bit buffer and buffers #810

Merged

2 tasks

HaoYang670 mentioned this issue May 17, 2022

ArrayData::try_new cannot always return expected error. #1707

Closed

HaoYang670 mentioned this issue May 22, 2022

Remove null_count from ArrayData::try_new() #1721

Merged

viirya closed this as completed in #1721 May 22, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove `null_count` from ArrayData::try_new() #911

Remove `null_count` from ArrayData::try_new() #911

alamb commented Nov 4, 2021

HaoYang670 commented May 16, 2022

alamb commented May 16, 2022

HaoYang670 commented May 17, 2022 •

edited

Loading

HaoYang670 commented May 17, 2022

HaoYang670 commented May 17, 2022

alamb commented May 17, 2022

tustvold commented May 17, 2022 •

edited

Loading

HaoYang670 commented May 18, 2022 •

edited

Loading

HaoYang670 commented May 18, 2022

alamb commented May 23, 2022

Remove null_count from ArrayData::try_new() #911

Remove null_count from ArrayData::try_new() #911

Comments

alamb commented Nov 4, 2021

HaoYang670 commented May 16, 2022

alamb commented May 16, 2022

HaoYang670 commented May 17, 2022 • edited Loading

HaoYang670 commented May 17, 2022

HaoYang670 commented May 17, 2022

alamb commented May 17, 2022

tustvold commented May 17, 2022 • edited Loading

HaoYang670 commented May 18, 2022 • edited Loading

HaoYang670 commented May 18, 2022

alamb commented May 23, 2022

Remove `null_count` from ArrayData::try_new() #911

Remove `null_count` from ArrayData::try_new() #911

HaoYang670 commented May 17, 2022 •

edited

Loading

tustvold commented May 17, 2022 •

edited

Loading

HaoYang670 commented May 18, 2022 •

edited

Loading