Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-10448: [Rust] Remove PrimitiveArray::new that can cause UB #8560

Closed
wants to merge 1 commit into from
Closed

ARROW-10448: [Rust] Remove PrimitiveArray::new that can cause UB #8560

wants to merge 1 commit into from

Conversation

jorgecarleitao
Copy link
Member

This PR removes PrimitiveArray::new. PrimitiveArray::new is pub, but it is dangerous because:

  • when the buffer's content is not aligned with T, many of its methods cause UB
  • when used with null_count > 0, many calls panic as the null bitmap is None, but the null_count != 0
  • when used with null_count > 0, it creates an array out of spec (as the buffer for the null bitmap is None but the null count is not zero)

Since:

  • a change in this method's signature (to either add the bitmap or remove null_count) requires a backward incompatible change
  • it is only used in tests
  • we have good offers to create primitive arrays:
    • from an ArrayData,
    • from a vector or vector of optionals
    • from an iterator

This PR removes it.

@jorgecarleitao jorgecarleitao changed the title ARROW-10448: [Rust] Remove PrimitiveArray::new. ARROW-10448: [Rust] Remove PrimitiveArray::new that is out of spec Oct 31, 2020
@jorgecarleitao jorgecarleitao changed the title ARROW-10448: [Rust] Remove PrimitiveArray::new that is out of spec ARROW-10448: [Rust] Remove PrimitiveArray::new that causes UB Oct 31, 2020
@github-actions
Copy link

@jorgecarleitao jorgecarleitao changed the title ARROW-10448: [Rust] Remove PrimitiveArray::new that causes UB ARROW-10448: [Rust] Remove PrimitiveArray::new that can cause UB Nov 1, 2020
@nevi-me nevi-me self-requested a review November 3, 2020 22:24
@nevi-me
Copy link
Contributor

nevi-me commented Nov 3, 2020

I think we have a similar problem with building arrays from ArrayDataBuilder because when building ArrayData, we don't check if the length of the array corresponds with the buffer's length (or if it's even specified).

So something like the below gets created, but ends up as a 0-len array

// try build array data without specifying length
ArrayData::builder(DataType::_).buffers(vec![buffer1, buffer2]).build()

@jorgecarleitao
Copy link
Member Author

I think we have a similar problem with building arrays from ArrayDataBuilder because when building ArrayData, we don't check if the length of the array corresponds with the buffer's length (or if it's even specified).

So something like the below gets created, but ends up as a 0-len array

// try build array data without specifying length
ArrayData::builder(DataType::_).buffers(vec![buffer1, buffer2]).build()

That is a very good point.

I think that we have a trade-off here: do we check that the ArrayData is according to spec (and incur the testing cost), or do we not check and live with potential UB? We can have APIs for a safe and unsafe, but we still need to decide what is public and what is not :P

This could be a good topic for the mailing list, as it is a general question, specially for IPC and the c data interface.

@alamb
Copy link
Contributor

alamb commented Nov 6, 2020

Is it the case that "UB" means "Undefined Behavior"?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good improvement to me. 👍 It also makes it clearer to people how to construct arrays (aka use the Builders) rather than also potentially using PrimitiveArray::new directly

Copy link
Contributor

@nevi-me nevi-me left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nevi-me nevi-me closed this in 4dfbc8b Nov 7, 2020
@jorgecarleitao jorgecarleitao deleted the remove_new branch November 7, 2020 13:14
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
This PR removes `PrimitiveArray::new`. `PrimitiveArray::new` is `pub`, but it is dangerous because:
* when the buffer's content is not aligned with T, many of its methods cause UB
* when used with `null_count > 0`, many calls panic as the null bitmap is `None`, but the `null_count != 0`
* when used with `null_count > 0`, it creates an array out of spec (as the buffer for the null bitmap is `None` but the null count is not zero)

Since:
* a change in this method's signature (to either add the bitmap or remove `null_count`) requires a backward incompatible change
* it is only used in tests
* we have good offers to create primitive arrays:
    * from an ArrayData,
    * from a vector or vector of optionals
    * from an iterator

This PR removes it.

Closes apache#8560 from jorgecarleitao/remove_new

Authored-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Signed-off-by: Neville Dipale <nevilledips@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants