Huge amount of llvm code generated by comparison kernels, potentially slowing compile times #1858

jhorstmann · 2022-06-13T09:20:32Z

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Incremental compile times feel like they have gotten slower some time ago. While investigating I notices that a large amount of llvm code is generated for the comparison kernels, especially dyn and dict versions. These end up calling the generic BooleanArray::from_iter method which therefore gets duplicated for each possible iterator type.

$ cargo llvm-lines | head -n30
   Compiling arrow v16.0.0 (/home/jhorstmann/Source/github/apache/arrow-rs/arrow)
    Finished dev [unoptimized + debuginfo] target(s) in 35.58s
  Lines           Copies        Function name
  -----           ------        -------------
  3156218 (100%)  73225 (100%)  (TOTAL)
   292166 (9.3%)   1164 (1.6%)  <arrow::array::array_boolean::BooleanArray as core::iter::traits::collect::FromIterator<Ptr>>::from_iter
   112200 (3.6%)   1545 (2.1%)  core::iter::traits::iterator::Iterator::fold
    92112 (2.9%)    912 (1.2%)  arrow::compute::kernels::comparison::cmp_dict
    85538 (2.7%)   1236 (1.7%)  <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}}
    84972 (2.7%)   1164 (1.6%)  <arrow::array::array_boolean::BooleanArray as core::iter::traits::collect::FromIterator<Ptr>>::from_iter::{{closure}}
    83903 (2.7%)   1478 (2.0%)  core::iter::adapters::map::map_fold::{{closure}}
    80639 (2.6%)   1533 (2.1%)  core::option::Option<T>::map
    80304 (2.5%)    912 (1.2%)  arrow::compute::kernels::comparison::cmp_dict::{{closure}}
    78427 (2.5%)   1478 (2.0%)  <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
    65942 (2.1%)   1236 (1.7%)  <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold
    59070 (1.9%)   1517 (2.1%)  core::iter::traits::iterator::Iterator::for_each
    39108 (1.2%)   2360 (3.2%)  core::iter::adapters::map::Map<I,F>::new
    38034 (1.2%)    137 (0.2%)  arrow::util::trusted_len::trusted_len_unzip
    37336 (1.2%)   1511 (2.1%)  core::iter::traits::iterator::Iterator::for_each::call::{{closure}}
    33218 (1.1%)     62 (0.1%)  core::slice::sort::partition_in_blocks
    29970 (0.9%)     81 (0.1%)  arrow::compute::kernels::take::take_primitive
    27904 (0.9%)   2360 (3.2%)  core::iter::traits::iterator::Iterator::map
    27631 (0.9%)    249 (0.3%)  <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::next
    27392 (0.9%)    202 (0.3%)  <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::size_hint
    26906 (0.9%)    140 (0.2%)  arrow::array::array_primitive::PrimitiveArray<T>::from_trusted_len_iter
    26764 (0.8%)    136 (0.2%)  arrow::buffer::mutable::MutableBuffer::try_from_trusted_len_iter
    25676 (0.8%)    277 (0.4%)  core::iter::traits::iterator::Iterator::try_fold
    22117 (0.7%)    257 (0.4%)  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
    19091 (0.6%)    215 (0.3%)  <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::next
    18150 (0.6%)    415 (0.6%)  core::option::Option<T>::ok_or_else
    17976 (0.6%)    168 (0.2%)  arrow::buffer::mutable::MutableBuffer::from_trusted_len_iter_bool
    17430 (0.6%)   1478 (2.0%)  core::iter::adapters::map::map_fold

Removing the dyn and dict comparison kernels gives the following number of lines (but this is of course not a realistic option):

$ cargo llvm-lines | head -n30
   Compiling arrow v16.0.0 (/home/jhorstmann/Source/github/apache/arrow-rs/arrow)

  Lines           Copies        Function name
  -----           ------        -------------
  1737110 (100%)  41764 (100%)  (TOTAL)
    49753 (2.9%)    939 (2.2%)  core::option::Option<T>::map
    38034 (2.2%)    137 (0.3%)  arrow::util::trusted_len::trusted_len_unzip
    33218 (1.9%)     62 (0.1%)  core::slice::sort::partition_in_blocks
    30112 (1.7%)    385 (0.9%)  core::iter::traits::iterator::Iterator::fold
    29970 (1.7%)     81 (0.2%)  arrow::compute::kernels::take::take_primitive
    26906 (1.5%)    140 (0.3%)  arrow::array::array_primitive::PrimitiveArray<T>::from_trusted_len_iter
    26764 (1.5%)    136 (0.3%)  arrow::buffer::mutable::MutableBuffer::try_from_trusted_len_iter
    25676 (1.5%)    277 (0.7%)  core::iter::traits::iterator::Iterator::try_fold
    22117 (1.3%)    257 (0.6%)  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
    19091 (1.1%)    215 (0.5%)  <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::next
    18337 (1.1%)    318 (0.8%)  core::iter::adapters::map::map_fold::{{closure}}
    16947 (1.0%)    318 (0.8%)  <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
    16576 (1.0%)     64 (0.2%)  arrow::compute::kernels::cast::pack_numeric_to_dictionary
    16177 (0.9%)    261 (0.6%)  <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
    15683 (0.9%)     41 (0.1%)  <arrow::array::array_string::GenericStringArray<OffsetSize> as core::iter::traits::collect::FromIterator<core::option::Option<Ptr>>>::from_iter
    14998 (0.9%)    838 (2.0%)  core::iter::adapters::map::Map<I,F>::new
    14586 (0.8%)     86 (0.2%)  arrow::buffer::mutable::MutableBuffer::from_trusted_len_iter
    14551 (0.8%)     96 (0.2%)  arrow::buffer::mutable::MutableBuffer::extend_from_iter
    14145 (0.8%)     77 (0.2%)  <arrow::array::array_primitive::PrimitiveArray<T> as core::iter::traits::collect::FromIterator<Ptr>>::from_iter
    13912 (0.8%)     39 (0.1%)  arrow::array::array::print_long_array
    13830 (0.8%)    357 (0.9%)  core::iter::traits::iterator::Iterator::for_each
    13743 (0.8%)     27 (0.1%)  arrow::compute::kernels::filter::filter_primitive
    13726 (0.8%)     62 (0.1%)  core::slice::sort::partition
    13125 (0.8%)     98 (0.2%)  <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::try_fold::{{closure}}
    12256 (0.7%)     64 (0.2%)  arrow::array::builder::PrimitiveDictionaryBuilder<K,V>::append
    11804 (0.7%)     62 (0.1%)  core::slice::sort::partition_equal
    11215 (0.6%)     88 (0.2%)  <arrow::buffer::immutable::Buffer as core::iter::traits::collect::FromIterator<T>>::from_iter

This seems to indicate that nearly half of the llvm code is caused by the comparison kernels.

Describe the solution you'd like

I'm not sure whether there is a good solution for this, but wanted to document these findings.

The kernels use macros very extensively, but replacing these with generic functions will probably lead to the same amount of llvm lines because of monomorphization. Maybe some of the non-generic common code could be extracted into helper methods.

Describe alternatives you've considered

The dyn kernels could also be put behind a feature flag, but that would again complicate the test matrix.

The text was updated successfully, but these errors were encountered:

nevi-me · 2022-06-13T22:35:54Z

Comparison from the last time this work was done #715 (comment)

tustvold · 2022-07-28T12:10:56Z

It occurs to me that #2038 might actually help with this, as it reduces the amount of logic in the FromIterator implementation, and delegates it to the Builder

tustvold · 2023-01-19T10:36:08Z

I'm closing as these comparison kernels are now gated with feature flags, and crates can also choose not to depend on arrow-ord at all. Feel free to re-open if there is additional work to be done in this area

jhorstmann added the enhancement Any new improvement worthy of a entry in the changelog label Jun 13, 2022

tustvold mentioned this issue Jul 26, 2022

Quite long compilation time #2170

Closed

Marwes mentioned this issue Aug 8, 2022

refactor: Reduce how much code is instantiated for comparisons #2365

Closed

tustvold mentioned this issue Aug 26, 2022

Split up Arrow Crate #2594

Closed

tustvold mentioned this issue Oct 6, 2022

Inconsistent Dyn Scalar Kernels #2837

Closed

tustvold closed this as completed Jan 19, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Huge amount of llvm code generated by comparison kernels, potentially slowing compile times #1858

Huge amount of llvm code generated by comparison kernels, potentially slowing compile times #1858

jhorstmann commented Jun 13, 2022

nevi-me commented Jun 13, 2022

tustvold commented Jul 28, 2022

tustvold commented Jan 19, 2023

Huge amount of llvm code generated by comparison kernels, potentially slowing compile times #1858

Huge amount of llvm code generated by comparison kernels, potentially slowing compile times #1858

Comments

jhorstmann commented Jun 13, 2022

nevi-me commented Jun 13, 2022

tustvold commented Jul 28, 2022

tustvold commented Jan 19, 2023