ARROW-2744: [C++] Avoid creating list arrays with a null values buffer #2243
@@            Coverage Diff             @@
##           master    #2243      +/-   ##
==========================================
+ Coverage   84.34%   86.78%    +2.44%
==========================================
  Files         281      237       -44
  Lines       43760    41931     -1829
==========================================
- Hits        36909    36390      -519
+ Misses       6820     5541     -1279
+ Partials       31        0       -31
referenced this pull request
Jul 12, 2018
wesm left a comment
As a matter of usability, I don't know that we should expect all public API users to create a length-0 buffer when there is no data. I believe that code that interacts with the buffers in an array needs to treat length-0 and null equivalently.
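To illustrate what treating length-0 and null equivalently could look like in consumer code, here is a minimal sketch. The `Buffer` struct and the helpers `SafeSize`, `IsEmpty`, and `SumBytes` are hypothetical stand-ins invented for this example; they are not Arrow's actual classes or functions.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a buffer type; NOT Arrow's actual Buffer class.
struct Buffer {
  std::vector<uint8_t> bytes;
  int64_t size() const { return static_cast<int64_t>(bytes.size()); }
  const uint8_t* data() const { return bytes.data(); }
};

// A null buffer pointer and a length-0 buffer both mean "no bytes".
inline int64_t SafeSize(const std::shared_ptr<Buffer>& buf) {
  return buf ? buf->size() : 0;
}

inline bool IsEmpty(const std::shared_ptr<Buffer>& buf) {
  return SafeSize(buf) == 0;
}

// Sums all bytes. The loop bound comes from SafeSize, so a null or
// empty buffer is never dereferenced.
inline int64_t SumBytes(const std::shared_ptr<Buffer>& buf) {
  int64_t total = 0;
  const int64_t n = SafeSize(buf);
  for (int64_t i = 0; i < n; ++i) {
    total += buf->data()[i];
  }
  return total;
}
```

Routing every access through one size helper keeps the null check in a single place, so individual call sites do not need to remember it.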
A possibly extreme approach to resolve the issue would be to have a length-0 singleton
Allowing the result of
Another side of this is that for validity bitmaps, it would be incorrect to return a length-0 buffer in the event that there are no nulls, but right now we permit that buffer to be null. In Java they allocate an array of all set bits, which I don't think we should do. So any way you slice it, some code will have to deal with the null buffer case.
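As a rough sketch of handling the null validity bitmap case, the reader below interprets a null bitmap pointer as "every slot is valid". `IsValid` and `CountNulls` are hypothetical helpers written for this example, not Arrow APIs; the sketch assumes the Arrow convention that a set bit marks a valid slot, with bits numbered from the least significant bit of each byte.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, NOT an Arrow API.
// A null bitmap pointer means the array has no nulls, so every slot is valid.
inline bool IsValid(const uint8_t* validity_bitmap, int64_t i) {
  assert(i >= 0);
  if (validity_bitmap == nullptr) {
    return true;
  }
  // Assumed layout: bit i lives in byte i / 8, at position i % 8 (LSB first).
  return ((validity_bitmap[i / 8] >> (i % 8)) & 1) != 0;
}

// Counts null slots among the first `length` entries.
inline int64_t CountNulls(const uint8_t* validity_bitmap, int64_t length) {
  int64_t nulls = 0;
  for (int64_t i = 0; i < length; ++i) {
    if (!IsValid(validity_bitmap, i)) {
      ++nulls;
    }
  }
  return nulls;
}
```

With this shape, a producer does not have to allocate a bitmap of all set bits just to say that nothing is null.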
My gut feeling is that we should allow the null buffers and document the issue well so that users can defend themselves from untrusted data.
I don't think this PR is doing that, except in
I agree. The parquet-cpp issue was already fixed in apache/parquet-cpp#474
However, I think it's also safer to ensure that we don't generate such buffers unwillingly. I don't think it was deliberate for
Yes, I agree that for validity bitmaps, code will have to deal with it. For actual values it is a bit unexpected, though (at least the person who wrote the parquet-cpp code clearly didn't expect it :-)).
Where would you document it? In ArrayData?
What would you say to adding an option to
I think in the APIs where
I'm OK with doing this later. I'll give this another review and merge, since I don't think it does anything problematic.