You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Most of the operations on big-integers probably don't benefit much from inlining across crates (the time to run vastly outweights the call itself), so it might be nice to reduce how much is pushed across crates.
This should improving compile times of things that use ramp (such as the quickcheck test runner), and avoiding difficulties where not everything is inlined, e.g. currently bit_length() is inlined into other crates and ends up doing a division because some of the constants it needs are not inlined. (cc #53)
There's some probable exceptions to a no-#[inline] rule:
things that operate on a bigint and a primitive, such as x == 0
simple constructors like Int::zero
very fast/O(1) operations (like bit_length)
possibly, an outer layer of fast-paths for things like addition (e.g. for Int + Int detect if one arg is zero, or a single-limb), since the compiler may be able to deduce this earlier/statically when things are inlined. However, I expect this is mostly in the noise.
It might be nice if there was a way to request inlining at specific call sites, where calls to things in ll are particularly sensitive in ramp itself, so that not everything has to be explicitly marked #[inline] (e.g. inlining num_base_digits into bit_length is helpful, because it allows constant propagation to simplify things a lot). I guess this can be simulated with something like:
The old code would end up with an actual division when inlined into
another crate, because the `BASES` symbol isn't inlined and hence the
compiler can't tell that the division will be by a power of two. This
caused `bit_length` to be very high in the profiles of my `float` crate
with the `div` instruction taking the majority of the time. It's
especially unfortunate because the divison for `bit_length` is a
division by 1.
(A simple benchmark of `Int::bit_length` shows the performance dropping
from 9 ns/iter to 1-2 ns/iter, with `div` taking ~80% of the time in the
former case.)
cc Aatch#52
Yeah, I tried to limit it to small functions/methods anyway, most of the stuff marked #[inline] is internal helper stuff (copy_incr, zero, etc.) or trait implementations that do little more than forward to other traits.
Most of the operations on big-integers probably don't benefit much from inlining across crates (the time to run vastly outweights the call itself), so it might be nice to reduce how much is pushed across crates.
This should improving compile times of things that use
ramp
(such as thequickcheck
test runner), and avoiding difficulties where not everything is inlined, e.g. currentlybit_length()
is inlined into other crates and ends up doing a division because some of the constants it needs are not inlined. (cc #53)There's some probable exceptions to a no-
#[inline]
rule:x == 0
Int::zero
bit_length
)Int + Int
detect if one arg is zero, or a single-limb), since the compiler may be able to deduce this earlier/statically when things are inlined. However, I expect this is mostly in the noise.It might be nice if there was a way to request inlining at specific call sites, where calls to things in
ll
are particularly sensitive inramp
itself, so that not everything has to be explicitly marked#[inline]
(e.g. inliningnum_base_digits
intobit_length
is helpful, because it allows constant propagation to simplify things a lot). I guess this can be simulated with something like:One defaults to calling
foo
directly, but sensitive call sites can callfoo_inline
if they really need to.The text was updated successfully, but these errors were encountered: