Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Huge amount of llvm code generated by comparison kernels, potentially slowing compile times #1858

Closed
jhorstmann opened this issue Jun 13, 2022 · 3 comments
Labels
enhancement Any new improvement worthy of a entry in the changelog

Comments

@jhorstmann
Copy link
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Incremental compile times feel like they have gotten slower some time ago. While investigating I notices that a large amount of llvm code is generated for the comparison kernels, especially dyn and dict versions. These end up calling the generic BooleanArray::from_iter method which therefore gets duplicated for each possible iterator type.

$ cargo llvm-lines | head -n30
   Compiling arrow v16.0.0 (/home/jhorstmann/Source/github/apache/arrow-rs/arrow)
    Finished dev [unoptimized + debuginfo] target(s) in 35.58s
  Lines           Copies        Function name
  -----           ------        -------------
  3156218 (100%)  73225 (100%)  (TOTAL)
   292166 (9.3%)   1164 (1.6%)  <arrow::array::array_boolean::BooleanArray as core::iter::traits::collect::FromIterator<Ptr>>::from_iter
   112200 (3.6%)   1545 (2.1%)  core::iter::traits::iterator::Iterator::fold
    92112 (2.9%)    912 (1.2%)  arrow::compute::kernels::comparison::cmp_dict
    85538 (2.7%)   1236 (1.7%)  <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}}
    84972 (2.7%)   1164 (1.6%)  <arrow::array::array_boolean::BooleanArray as core::iter::traits::collect::FromIterator<Ptr>>::from_iter::{{closure}}
    83903 (2.7%)   1478 (2.0%)  core::iter::adapters::map::map_fold::{{closure}}
    80639 (2.6%)   1533 (2.1%)  core::option::Option<T>::map
    80304 (2.5%)    912 (1.2%)  arrow::compute::kernels::comparison::cmp_dict::{{closure}}
    78427 (2.5%)   1478 (2.0%)  <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
    65942 (2.1%)   1236 (1.7%)  <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold
    59070 (1.9%)   1517 (2.1%)  core::iter::traits::iterator::Iterator::for_each
    39108 (1.2%)   2360 (3.2%)  core::iter::adapters::map::Map<I,F>::new
    38034 (1.2%)    137 (0.2%)  arrow::util::trusted_len::trusted_len_unzip
    37336 (1.2%)   1511 (2.1%)  core::iter::traits::iterator::Iterator::for_each::call::{{closure}}
    33218 (1.1%)     62 (0.1%)  core::slice::sort::partition_in_blocks
    29970 (0.9%)     81 (0.1%)  arrow::compute::kernels::take::take_primitive
    27904 (0.9%)   2360 (3.2%)  core::iter::traits::iterator::Iterator::map
    27631 (0.9%)    249 (0.3%)  <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::next
    27392 (0.9%)    202 (0.3%)  <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::size_hint
    26906 (0.9%)    140 (0.2%)  arrow::array::array_primitive::PrimitiveArray<T>::from_trusted_len_iter
    26764 (0.8%)    136 (0.2%)  arrow::buffer::mutable::MutableBuffer::try_from_trusted_len_iter
    25676 (0.8%)    277 (0.4%)  core::iter::traits::iterator::Iterator::try_fold
    22117 (0.7%)    257 (0.4%)  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
    19091 (0.6%)    215 (0.3%)  <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::next
    18150 (0.6%)    415 (0.6%)  core::option::Option<T>::ok_or_else
    17976 (0.6%)    168 (0.2%)  arrow::buffer::mutable::MutableBuffer::from_trusted_len_iter_bool
    17430 (0.6%)   1478 (2.0%)  core::iter::adapters::map::map_fold

Removing the dyn and dict comparison kernels gives the following number of lines (but this is of course not a realistic option):

$ cargo llvm-lines | head -n30
   Compiling arrow v16.0.0 (/home/jhorstmann/Source/github/apache/arrow-rs/arrow)

  Lines           Copies        Function name
  -----           ------        -------------
  1737110 (100%)  41764 (100%)  (TOTAL)
    49753 (2.9%)    939 (2.2%)  core::option::Option<T>::map
    38034 (2.2%)    137 (0.3%)  arrow::util::trusted_len::trusted_len_unzip
    33218 (1.9%)     62 (0.1%)  core::slice::sort::partition_in_blocks
    30112 (1.7%)    385 (0.9%)  core::iter::traits::iterator::Iterator::fold
    29970 (1.7%)     81 (0.2%)  arrow::compute::kernels::take::take_primitive
    26906 (1.5%)    140 (0.3%)  arrow::array::array_primitive::PrimitiveArray<T>::from_trusted_len_iter
    26764 (1.5%)    136 (0.3%)  arrow::buffer::mutable::MutableBuffer::try_from_trusted_len_iter
    25676 (1.5%)    277 (0.7%)  core::iter::traits::iterator::Iterator::try_fold
    22117 (1.3%)    257 (0.6%)  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
    19091 (1.1%)    215 (0.5%)  <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::next
    18337 (1.1%)    318 (0.8%)  core::iter::adapters::map::map_fold::{{closure}}
    16947 (1.0%)    318 (0.8%)  <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
    16576 (1.0%)     64 (0.2%)  arrow::compute::kernels::cast::pack_numeric_to_dictionary
    16177 (0.9%)    261 (0.6%)  <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
    15683 (0.9%)     41 (0.1%)  <arrow::array::array_string::GenericStringArray<OffsetSize> as core::iter::traits::collect::FromIterator<core::option::Option<Ptr>>>::from_iter
    14998 (0.9%)    838 (2.0%)  core::iter::adapters::map::Map<I,F>::new
    14586 (0.8%)     86 (0.2%)  arrow::buffer::mutable::MutableBuffer::from_trusted_len_iter
    14551 (0.8%)     96 (0.2%)  arrow::buffer::mutable::MutableBuffer::extend_from_iter
    14145 (0.8%)     77 (0.2%)  <arrow::array::array_primitive::PrimitiveArray<T> as core::iter::traits::collect::FromIterator<Ptr>>::from_iter
    13912 (0.8%)     39 (0.1%)  arrow::array::array::print_long_array
    13830 (0.8%)    357 (0.9%)  core::iter::traits::iterator::Iterator::for_each
    13743 (0.8%)     27 (0.1%)  arrow::compute::kernels::filter::filter_primitive
    13726 (0.8%)     62 (0.1%)  core::slice::sort::partition
    13125 (0.8%)     98 (0.2%)  <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::try_fold::{{closure}}
    12256 (0.7%)     64 (0.2%)  arrow::array::builder::PrimitiveDictionaryBuilder<K,V>::append
    11804 (0.7%)     62 (0.1%)  core::slice::sort::partition_equal
    11215 (0.6%)     88 (0.2%)  <arrow::buffer::immutable::Buffer as core::iter::traits::collect::FromIterator<T>>::from_iter

This seems to indicate that nearly half of the llvm code is caused by the comparison kernels.

Describe the solution you'd like

I'm not sure whether there is a good solution for this, but wanted to document these findings.

The kernels use macros very extensively, but replacing these with generic functions will probably lead to the same amount of llvm lines because of monomorphization. Maybe some of the non-generic common code could be extracted into helper methods.

Describe alternatives you've considered

The dyn kernels could also be put behind a feature flag, but that would again complicate the test matrix.

@jhorstmann jhorstmann added the enhancement Any new improvement worthy of a entry in the changelog label Jun 13, 2022
@nevi-me
Copy link
Contributor

nevi-me commented Jun 13, 2022

Comparison from the last time this work was done #715 (comment)

@tustvold
Copy link
Contributor

It occurs to me that #2038 might actually help with this, as it reduces the amount of logic in the FromIterator implementation, and delegates it to the Builder

@tustvold
Copy link
Contributor

I'm closing as these comparison kernels are now gated with feature flags, and crates can also choose not to depend on arrow-ord at all. Feel free to re-open if there is additional work to be done in this area

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants