[Rust] [Experiment] [WIP]: Use SmallVec in ArrayData to optimize the common usecase of single-buffer arrays #9330

jhorstmann · 2021-01-26T19:00:48Z

Another experiment for speeding up ArrayData. Most array types contain only a single buffer (not counting the validity buffer) and at most one child data object. I think the only types this does not apply to are struct and union arrays. In the common cases, using a SmallVec costs one extra comparison but saves an allocation and the resulting indirection.

The ArrayData::new method is left in place unchanged for compatibility.
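A minimal, std-only sketch of how a Vec-taking constructor can stay source compatible while the storage changes underneath. Everything here is an assumption made for illustration: `Buffer` is a stand-in for arrow's type, `BufferStorage` is a hand-rolled substitute for `SmallVec<[Buffer; 1]>`, `ArrayDataSketch` omits most fields of the real `ArrayData`, and `new_single_buffer` is a hypothetical name rather than the signature of the PR's `new_smallvec`.

```rust
use std::slice;

/// Hypothetical stand-in for arrow's `Buffer`.
#[derive(Debug, Clone, PartialEq)]
pub struct Buffer(pub Vec<u8>);

/// Std-only substitute for `SmallVec<[Buffer; 1]>` (assumption, not the PR's type):
/// a single buffer is stored inline, anything else lives in a heap-allocated Vec.
#[derive(Debug)]
enum BufferStorage {
    Inline(Buffer),
    Heap(Vec<Buffer>),
}

/// Heavily simplified sketch; the real `ArrayData` has more fields.
#[derive(Debug)]
pub struct ArrayDataSketch {
    len: usize,
    buffers: BufferStorage,
}

impl ArrayDataSketch {
    /// Compatibility constructor: callers keep passing a `Vec<Buffer>`.
    pub fn new(len: usize, mut buffers: Vec<Buffer>) -> Self {
        let buffers = if buffers.len() == 1 {
            BufferStorage::Inline(buffers.pop().expect("length checked above"))
        } else {
            BufferStorage::Heap(buffers)
        };
        Self { len, buffers }
    }

    /// Hypothetical optimized constructor: no intermediate Vec is built.
    pub fn new_single_buffer(len: usize, buffer: Buffer) -> Self {
        Self { len, buffers: BufferStorage::Inline(buffer) }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Both storage variants are exposed as a plain slice.
    pub fn buffers(&self) -> &[Buffer] {
        // This match is the extra comparison paid on each access.
        match &self.buffers {
            BufferStorage::Inline(b) => slice::from_ref(b),
            BufferStorage::Heap(v) => v.as_slice(),
        }
    }
}

fn main() {
    let values = Buffer(vec![1, 2, 3, 4]);
    let old_style = ArrayDataSketch::new(4, vec![values.clone()]);
    let new_style = ArrayDataSketch::new_single_buffer(4, values);
    // Both constructors expose the same buffers to readers.
    assert_eq!(old_style.buffers(), new_style.buffers());
    println!("{} buffer(s), len {}", new_style.buffers().len(), new_style.len());
}
```

Because readers only ever see `&[Buffer]`, kernels could be migrated to the optimized constructor one at a time without touching code that consumes the buffers.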

TODO:

Use the optimized new_smallvec method in kernels whenever applicable
Run all benchmarks on a non-throttling machine

@jorgecarleitao @Dandandan this is also related to the discussion on #9271. I tried this initially some months ago and the benchmarks were not conclusive, but it might be worth trying again, or in combination with removing the Arc indirection.

github-actions · 2021-01-26T19:12:32Z

Thanks for opening a pull request!

Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW

Then could you also rename pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

See also:

alamb · 2021-01-29T12:24:57Z

rust/arrow/src/array/data.rs

@@ -225,11 +226,11 @@ pub struct ArrayData {
    /// The buffers for this array data. Note that depending on the array types, this
    /// could hold different kinds of buffers (e.g., value buffer, value offset buffer)
    /// at different positions.
-    buffers: Vec<Buffer>,
+    buffers: SmallVec<[Buffer; 1]>,


I wonder if using something like a specific enum might also work (and be more memory efficient):

pub enum ArrayBuffer { One(ArrayDataRef), Two(ArrayDataRef, ArrayDataRef), Many(Vec<ArrayDataRef>) }

I honestly don't know what the maximum number of buffers any specific ArrayData can have,

jorgecarleitao · 2021-01-29

FYI, the maximum is 3 including the null buffer, so the maximum is 2 in this context. See MutableArrayData, where we keep two buffers on the stack to avoid the overhead. When unused, a buffer takes something like a usize plus a pointer.

jhorstmann · 2021-02-02

Good idea! I haven't checked the memory layout of both, but in the common case of one buffer this could inline better (in the absence of LTO). Since we already have the validity bitmap in a separate Option, the variants are Zero, One and Many.
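The Zero/One/Many idea can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code from the "Replace SmallVec with custom enum" commit: `Buffer` is a hypothetical stand-in for arrow's type, and the enum name `Buffers`, its `as_slice` method and the `From<Vec<Buffer>>` conversion are invented here.

```rust
use std::mem::size_of;
use std::slice;

/// Hypothetical stand-in for arrow's `Buffer`.
#[derive(Debug, Clone, PartialEq)]
struct Buffer(Vec<u8>);

/// Buffers of an array, excluding the validity bitmap, which is stored separately.
#[derive(Debug, Clone, PartialEq)]
enum Buffers {
    Zero,
    One(Buffer),
    Many(Vec<Buffer>),
}

impl Buffers {
    /// Expose every variant as a slice so callers do not need to match.
    fn as_slice(&self) -> &[Buffer] {
        match self {
            Buffers::Zero => &[],
            // `from_ref` views a single value as a one-element slice without copying.
            Buffers::One(b) => slice::from_ref(b),
            Buffers::Many(v) => v.as_slice(),
        }
    }
}

impl From<Vec<Buffer>> for Buffers {
    fn from(mut v: Vec<Buffer>) -> Self {
        match v.len() {
            0 => Buffers::Zero,
            1 => Buffers::One(v.pop().expect("length is 1")),
            _ => Buffers::Many(v),
        }
    }
}

fn main() {
    let offsets = Buffer(vec![0, 1, 3]);
    let values = Buffer(b"abc".to_vec());

    let none: Buffers = Vec::new().into();
    let single: Buffers = vec![values.clone()].into();
    let pair: Buffers = vec![offsets, values].into();

    for b in [&none, &single, &pair] {
        println!("{} buffer(s)", b.as_slice().len());
    }

    // Sizes are platform and compiler dependent, so they are printed, not asserted.
    println!("size_of::<Vec<Buffer>>() = {}", size_of::<Vec<Buffer>>());
    println!("size_of::<Buffers>()     = {}", size_of::<Buffers>());
}
```

Printing the two sizes is a cheap first look at the memory-layout question raised in the thread; the numbers for the stand-in `Buffer` will differ from those for arrow's real type.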

alamb · 2021-03-03T11:29:27Z

@jhorstmann I am closing this PR for the time being to clean up the Rust/Arrow PR backlog. Please let me know if this is a mistake.

Use SmallVec in ArrayData to optimize the common usecase of arrays containing only one buffer

725cce0

github-actions bot added the Component: Rust label Jan 26, 2021

alamb reviewed Jan 29, 2021

jhorstmann added 3 commits February 2, 2021 10:17

Benchmark for slicing nested array

083f147

Replace SmallVec with custom enum

f79ee54

Simplify slicing for single element

75ca9fa

jorgecarleitao force-pushed the master branch from d4608a9 to 356c300 Compare February 14, 2021 12:09

alamb closed this Mar 3, 2021

jhorstmann commented Jan 26, 2021

github-actions bot commented Jan 26, 2021

alamb Jan 29, 2021

jorgecarleitao Jan 29, 2021

jhorstmann Feb 2, 2021

alamb commented Mar 3, 2021
