Consider tuning the #[inline] declarations #52

huonw · 2015-11-12T01:11:12Z

Most of the operations on big integers probably don't benefit much from inlining across crates (the time to run vastly outweighs the cost of the call itself), so it might be nice to reduce how much is pushed across crates.

This should improve compile times of things that use ramp (such as the quickcheck test runner), and avoid difficulties where not everything is inlined, e.g. currently bit_length() is inlined into other crates and ends up doing a division because some of the constants it needs are not inlined. (cc #53)

There are probably some exceptions to a no-#[inline] rule:

- things that operate on a bigint and a primitive, such as x == 0
- simple constructors like Int::zero
- very fast/O(1) operations (like bit_length)
- possibly, an outer layer of fast-paths for things like addition (e.g. for Int + Int detect if one arg is zero, or a single-limb), since the compiler may be able to deduce this earlier/statically when things are inlined. However, I expect this is mostly in the noise.
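
To make the last exception concrete, here is a minimal sketch of an inlinable fast-path layer in front of a non-inlined slow path. `ToyInt`, `add` and `add_slow` are hypothetical names invented for illustration; this is not ramp's `Int` or its actual addition code, and whether the split pays off would need measuring.

```rust
/// Toy big integer (hypothetical, not ramp's `Int`): little-endian
/// 64-bit limbs, with an empty vector representing zero.
#[derive(Clone, Debug, PartialEq)]
pub struct ToyInt {
    pub limbs: Vec<u64>,
}

impl ToyInt {
    /// Simple constructor: cheap enough that `#[inline]` is reasonable.
    #[inline]
    pub fn zero() -> ToyInt {
        ToyInt { limbs: Vec::new() }
    }

    #[inline]
    pub fn from_u64(v: u64) -> ToyInt {
        if v == 0 { ToyInt::zero() } else { ToyInt { limbs: vec![v] } }
    }

    #[inline]
    pub fn is_zero(&self) -> bool {
        self.limbs.is_empty()
    }

    /// Thin outer layer: only the zero checks are exposed for inlining,
    /// so a caller that statically knows an operand is zero may have the
    /// whole call simplified away.
    #[inline]
    pub fn add(&self, other: &ToyInt) -> ToyInt {
        if self.is_zero() {
            return other.clone();
        }
        if other.is_zero() {
            return self.clone();
        }
        self.add_slow(other)
    }

    /// General limb-by-limb addition with carry, kept out of line.
    #[inline(never)]
    fn add_slow(&self, other: &ToyInt) -> ToyInt {
        let (long, short) = if self.limbs.len() >= other.limbs.len() {
            (&self.limbs, &other.limbs)
        } else {
            (&other.limbs, &self.limbs)
        };
        let mut out = Vec::with_capacity(long.len() + 1);
        let mut carry = false;
        for i in 0..long.len() {
            let b = if i < short.len() { short[i] } else { 0 };
            let (s1, c1) = long[i].overflowing_add(b);
            let (s2, c2) = s1.overflowing_add(carry as u64);
            out.push(s2);
            // At most one of the two additions can overflow.
            carry = c1 || c2;
        }
        if carry {
            out.push(1);
        }
        ToyInt { limbs: out }
    }
}
```

The design choice is that only the cheap checks cross the crate boundary, while the loop is compiled once in the defining crate.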

It might be nice if there were a way to request inlining at specific call sites, since some calls to things in ll are particularly performance-sensitive within ramp itself; then not everything would have to be explicitly marked #[inline] (e.g. inlining num_base_digits into bit_length is helpful, because it allows constant propagation to simplify things a lot). I guess this can be simulated with something like:

pub fn foo(...) {
    foo_inline(...)
}

#[inline]
pub fn foo_inline(...) {
    // implementation here
}

One defaults to calling foo directly, but sensitive call sites can call foo_inline if they really need to.
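
A compilable version of that pattern might look like the sketch below. `limb_sum` and `limb_sum_inline` are hypothetical stand-ins for some helper in ll, not functions that exist in ramp. Keep in mind that `#[inline]` is only a hint, and the compiler remains free to inline (or not inline) either function.

```rust
/// Hypothetical helper standing in for something in `ll`.
/// `#[inline]` makes the body available to callers in other crates,
/// so performance-sensitive call sites can call this version directly.
#[inline]
pub fn limb_sum_inline(limbs: &[u64]) -> u64 {
    // Wrapping sum of all limbs; the actual work is irrelevant to the pattern.
    limbs.iter().fold(0u64, |acc, &l| acc.wrapping_add(l))
}

/// Default entry point with no `#[inline]` attribute: most callers use
/// this and pay an ordinary function call.
pub fn limb_sum(limbs: &[u64]) -> u64 {
    limb_sum_inline(limbs)
}
```

The implementation lives in one place, so the two entry points cannot drift apart; the only difference is which symbol a call site names.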


The old code would end up with an actual division when inlined into another crate, because the `BASES` symbol isn't inlined and hence the compiler can't tell that the division will be by a power of two. This caused `bit_length` to be very high in the profiles of my `float` crate, with the `div` instruction taking the majority of the time. It's especially unfortunate because the division for `bit_length` is a division by 1. (A simple benchmark of `Int::bit_length` shows the time dropping from 9 ns/iter to 1-2 ns/iter, with `div` taking ~80% of the time in the former case.) cc Aatch#52
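
As a rough illustration of the idea (not the actual change made in #53), a power-of-two base can be handled without consulting any table: the digit count is just the bit count divided by log2 of the base, rounded up, and for base 2 no division is needed at all. The function name `pow2_base_digits` and its signature are assumptions made up for this sketch.

```rust
/// Hypothetical sketch, not ramp's `num_base_digits`.
/// Returns the number of base-`base` digits needed for a value with
/// `bit_len` significant bits, or `None` if `base` is not a power of
/// two that is at least 2.
#[inline]
pub fn pow2_base_digits(bit_len: usize, base: u32) -> Option<usize> {
    if base < 2 || !base.is_power_of_two() {
        return None;
    }
    if base == 2 {
        // log2(2) == 1, so the "division" is by 1: skip it entirely.
        return Some(bit_len);
    }
    // For a power of two, log2(base) is the number of trailing zero bits.
    let bits_per_digit = base.trailing_zeros() as usize;
    // Ceiling division written to avoid overflow for large `bit_len`.
    let extra = if bit_len % bits_per_digit != 0 { 1 } else { 0 };
    Some(bit_len / bits_per_digit + extra)
}
```

When the base is a compile-time constant at an inlined call site, the compiler can usually reduce the remaining division to a shift; with the explicit base-2 branch, that case does not depend on constant propagation at all.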

Aatch · 2015-11-12T01:19:42Z

Yeah, I tried to limit it to small functions/methods anyway. Most of the stuff marked #[inline] is internal helper stuff (copy_incr, zero, etc.) or trait implementations that do little more than forward to other traits.

huonw mentioned this issue Nov 12, 2015

Optimise ll::base::num_base_digits for 2 & other powers of it. #53

Merged
