Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Digit Counting Algorithm #1

Closed
Alexhuszagh opened this issue Aug 23, 2021 · 2 comments
Closed

Optimize Digit Counting Algorithm #1

Alexhuszagh opened this issue Aug 23, 2021 · 2 comments

Comments

@Alexhuszagh
Copy link

Issue

decimal_length_minus_1 uses a very naive digit counting algorithm, when a few faster algorithms exist, which can be fast, or relatively quick and economical. There's a few approaches to speed up decimal_length_minus_1, which currently uses a very naive algorithm. This can be optimized significantly. These algorithms determine the decimal length, so decimal_length_minus_1 would just subtract 1 from these values. The assembly generated for various different algorithms is on Compiler Explorer here.

Implementations

1). Fast, but requires a bit of static storage, is described here.

This computes a fast log2 , and uses a pre-computed table to determine rounding of the value.

#[inline]
pub fn fast_log2(x: u32) -> usize {
    32 - 1 - (x | 1).leading_zeros() as usize
}

#[inline]
pub fn fast_digit_count(x: u32) -> usize {
    const TABLE: [u64; 32] = [
        4294967296,
        8589934582,
        8589934582,
        8589934582,
        12884901788,
        12884901788,
        12884901788,
        17179868184,
        17179868184,
        17179868184,
        21474826480,
        21474826480,
        21474826480,
        21474826480,
        25769703776,
        25769703776,
        25769703776,
        30063771072,
        30063771072,
        30063771072,
        34349738368,
        34349738368,
        34349738368,
        34349738368,
        38554705664,
        38554705664,
        38554705664,
        41949672960,
        41949672960,
        41949672960,
        42949672960,
        42949672960,
    ];
    let shift = unsafe { *TABLE.get_unchecked(fast_log2(x)) };
    let count = (x as u64 + shift) >> 32;
    count as usize
}

This requires a bit of static storage, but computes the value extremely cheaply, and optimizes to an add and shr instruction, along with a table lookup.

2). A more economical solution, but still fast algorithm is the following:

pub fn fast_digit_count_v2(x: u32) -> usize {
    const TABLE: [u32; 9] = [9, 99, 999, 9999, 99999, 999999, 9999999, 99999999, 999999999];
    let mut y = (9 * fast_log2(x)) >> 5;
    y += (x > unsafe { *TABLE.get_unchecked(y) }) as usize;
    y + 1
}

These are all significantly more efficient than the current algorithm (which has been shifted by 1):

fn fast_digit_count_v3(v: u32) -> i32 {
    if v >= 100000000 {
        9
    } else if v >= 10000000 {
        8
    } else if v >= 1000000 {
        7
    } else if v >= 100000 {
        6
    } else if v >= 10000 {
        5
    } else if v >= 1000 {
        4
    } else if v >= 100 {
        3
    } else if v >= 10 {
        2
    } else {
        1
    }
}

A quick look at the optimization results can be found here. The second solution is likely the best, since it's effectively very cheap, requires minimal static storage, the storage requirements can be reduced due to the small number of digits required (never >= 10^9).

Solution

A simple implementation of decimal_length_minus_1 would therefore be:

#[inline]
pub fn fast_log2(x: u32) -> i32 {
    32 - 1 - (x | 1).leading_zeros() as i32
}

pub fn decimal_length_minus_1(x: u32) -> i32 {
    const TABLE: [u32; 9] = [9, 99, 999, 9999, 99999, 999999, 9999999, 99999999, 999999999];
    let mut y = (9 * fast_log2(x)) >> 5;
    y += (x > unsafe { *TABLE.get_unchecked(y) }) as i32;
    y
}

This can never be "unsafe", since the maximum value from fast_log2) is 31, and (9 * 31) >> 5 is 8.

@dtolnay
Copy link
Owner

dtolnay commented Aug 23, 2021

Thanks for flagging this!

This sort of discussion needs to go upstream to https://github.com/jk-jeon/dragonbox. I will periodically pull in batch updates same as I do for the ryu crate. I don't intend to diverge algorithmically from upstream because I don't have the expertise with the algorithm to maintain it, even if the function in this issue is relatively self contained.

@dtolnay dtolnay closed this as completed Aug 23, 2021
@Alexhuszagh
Copy link
Author

Thanks for flagging this!

This sort of discussion needs to go upstream to https://github.com/jk-jeon/dragonbox. I will periodically pull in batch updates same as I do for the ryu crate. I don't intend to diverge algorithmically from upstream because I don't have the expertise with the algorithm to maintain it, even if the function in this issue is relatively self contained.

Thanks for flagging this, I'll do just that then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants