Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider tuning the #[inline] declarations #52

Open
huonw opened this issue Nov 12, 2015 · 1 comment
Open

Consider tuning the #[inline] declarations #52

huonw opened this issue Nov 12, 2015 · 1 comment

Comments

@huonw
Copy link
Contributor

huonw commented Nov 12, 2015

Most of the operations on big-integers probably don't benefit much from inlining across crates (the time to run vastly outweights the call itself), so it might be nice to reduce how much is pushed across crates.

This should improving compile times of things that use ramp (such as the quickcheck test runner), and avoiding difficulties where not everything is inlined, e.g. currently bit_length() is inlined into other crates and ends up doing a division because some of the constants it needs are not inlined. (cc #53)

There's some probable exceptions to a no-#[inline] rule:

  • things that operate on a bigint and a primitive, such as x == 0
  • simple constructors like Int::zero
  • very fast/O(1) operations (like bit_length)
  • possibly, an outer layer of fast-paths for things like addition (e.g. for Int + Int detect if one arg is zero, or a single-limb), since the compiler may be able to deduce this earlier/statically when things are inlined. However, I expect this is mostly in the noise.

It might be nice if there was a way to request inlining at specific call sites, where calls to things in ll are particularly sensitive in ramp itself, so that not everything has to be explicitly marked #[inline] (e.g. inlining num_base_digits into bit_length is helpful, because it allows constant propagation to simplify things a lot). I guess this can be simulated with something like:

pub fn foo(...) {
    foo_inline(...)
}

#[inline]
pub fn foo_inline(...) {
     // implementation here
}

One defaults to calling foo directly, but sensitive call sites can call foo_inline if they really need to.

huonw added a commit to huonw/ramp that referenced this issue Nov 12, 2015
The old code would end up with an actual division when inlined into
another crate, because the `BASES` symbol isn't inlined and hence the
compiler can't tell that the division will be by a power of two. This
caused `bit_length` to be very high in the profiles of my `float` crate
with the `div` instruction taking the majority of the time. It's
especially unfortunate because the divison for `bit_length` is a
division by 1.

(A simple benchmark of `Int::bit_length` shows the performance dropping
from 9 ns/iter to 1-2 ns/iter, with `div` taking ~80% of the time in the
former case.)

cc Aatch#52
@Aatch
Copy link
Owner

Aatch commented Nov 12, 2015

Yeah, I tried to limit it to small functions/methods anyway, most of the stuff marked #[inline] is internal helper stuff (copy_incr, zero, etc.) or trait implementations that do little more than forward to other traits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants