decimal_length_minus_1 currently counts digits with a naive chain of comparisons, although faster algorithms exist. They range from very fast at the cost of a lookup table to nearly as fast with minimal static storage. The algorithms below compute the decimal length, so decimal_length_minus_1 would just subtract 1 from their result. The assembly generated for the different algorithms is on Compiler Explorer here.
Implementations
1). A fast approach, which requires a bit of static storage, is described here.
It computes a fast log2 and uses a pre-computed table to determine the rounding of the value.
The value is computed extremely cheaply: the code optimizes to an add and a shr instruction, along with a table lookup.
2). A more economical, but still fast, algorithm is the following:
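pub fn fast_digit_count_v2(x: u32) -> usize {
    const TABLE: [u32; 9] = [9, 99, 999, 9999, 99999, 999999, 9999999, 99999999, 999999999];
    // fast_log2 returns an i32 in 0..=31, so the index below is in 0..=8.
    let mut y = ((9 * fast_log2(x)) >> 5) as usize;
    y += (x > unsafe { *TABLE.get_unchecked(y) }) as usize;
    y + 1
}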
Both are significantly more efficient than the current algorithm (shown here shifted by 1, so that it returns the digit count):
fn fast_digit_count_v3(v: u32) -> i32 {
    if v >= 100000000 {
        9
    } else if v >= 10000000 {
        8
    } else if v >= 1000000 {
        7
    } else if v >= 100000 {
        6
    } else if v >= 10000 {
        5
    } else if v >= 1000 {
        4
    } else if v >= 100 {
        3
    } else if v >= 10 {
        2
    } else {
        1
    }
}
A quick look at the optimization results can be found here. The second solution is likely the best: it is very cheap and requires minimal static storage, and the storage can be reduced further because only a small number of digits is required (the input is never >= 10^9).
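For comparison, here is a minimal sketch of a table-driven approach in the spirit of the first option. It is an illustration under stated assumptions, not necessarily the exact code behind the link above. The names build_table, DIGIT_TABLE and fast_digit_count_v1 are hypothetical. The table is built at compile time rather than hard-coded, and occupies 32 * 8 = 256 bytes of static storage.

```rust
/// Builds the 32-entry table at compile time (hypothetical helper).
/// Entry k covers inputs whose floor(log2) is k, i.e. the range [2^k, 2^(k+1)).
const fn build_table() -> [u64; 32] {
    let mut table = [0u64; 32];
    let mut k = 0usize;
    let mut low = 1u64; // 2^k
    while k < 32 {
        let high = low * 2; // 2^(k+1), exclusive upper bound of the range
        // Count the decimal digits of `low`; afterwards `pow10` is the
        // smallest power of ten strictly greater than `low`.
        let mut digits = 1u64;
        let mut pow10 = 10u64;
        while pow10 <= low {
            digits += 1;
            pow10 *= 10;
        }
        table[k] = if pow10 < high {
            // A power of ten falls inside the range: adding this entry makes
            // the sum reach (digits + 1) << 32 exactly when x >= pow10.
            ((digits + 1) << 32) - pow10
        } else {
            // Every value in the range has the same number of digits.
            digits << 32
        };
        low = high;
        k += 1;
    }
    table
}

static DIGIT_TABLE: [u64; 32] = build_table();

/// Decimal digit count of `x` (0 counts as one digit): one table lookup,
/// one 64-bit add and one shift.
pub fn fast_digit_count_v1(x: u32) -> u32 {
    let log2 = 31 - (x | 1).leading_zeros();
    ((x as u64 + DIGIT_TABLE[log2 as usize]) >> 32) as u32
}
```

Unlike the second solution, this variant handles the full u32 range up to 10 digits, at the cost of the larger table.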
Solution
A simple implementation of decimal_length_minus_1 would therefore be:
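#[inline]
pub fn fast_log2(x: u32) -> i32 {
    // floor(log2(x)); `x | 1` makes the result 0 for x == 0.
    32 - 1 - (x | 1).leading_zeros() as i32
}

pub fn decimal_length_minus_1(x: u32) -> i32 {
    const TABLE: [u32; 9] = [9, 99, 999, 9999, 99999, 999999, 9999999, 99999999, 999999999];
    // Index with usize, then convert the result back to i32.
    let mut y = ((9 * fast_log2(x)) >> 5) as usize;
    y += (x > unsafe { *TABLE.get_unchecked(y) }) as usize;
    y as i32
}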
The unchecked index can never be out of bounds, since the maximum value returned by fast_log2 is 31, and (9 * 31) >> 5 is 8, the last index of TABLE.
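A quick way to gain confidence in both the result and the bounds argument is to compare the fast version against a plain repeated-division reference around every power of ten and power of two, where an off-by-one error would show up. The sketch below is self-contained and uses safe indexing for simplicity. reference_length_minus_1 and first_mismatch are hypothetical helper names, not part of the crate.

```rust
/// floor(log2(x)), with 0 treated like 1.
fn fast_log2(x: u32) -> u32 {
    31 - (x | 1).leading_zeros()
}

/// Same logic as the proposed function, with bounds-checked indexing.
fn decimal_length_minus_1(x: u32) -> u32 {
    const TABLE: [u32; 9] = [9, 99, 999, 9999, 99999, 999999, 9999999, 99999999, 999999999];
    let mut y = ((9 * fast_log2(x)) >> 5) as usize;
    y += (x > TABLE[y]) as usize;
    y as u32
}

/// Reference implementation (hypothetical helper): repeated division by 10.
fn reference_length_minus_1(mut x: u32) -> u32 {
    let mut n = 0;
    while x >= 10 {
        x /= 10;
        n += 1;
    }
    n
}

/// Returns the first boundary value on which the two versions disagree, if any.
fn first_mismatch() -> Option<u32> {
    let mut candidates = vec![0u32, u32::MAX];
    // Values around every power of ten that fits in a u32.
    let mut p: u32 = 1;
    loop {
        candidates.extend([p - 1, p, p + 1]);
        match p.checked_mul(10) {
            Some(next) => p = next,
            None => break,
        }
    }
    // Values around every power of two, where fast_log2 changes.
    for k in 0..32 {
        let b = 1u32 << k;
        candidates.extend([b - 1, b, b + 1]);
    }
    candidates
        .into_iter()
        .find(|&x| decimal_length_minus_1(x) != reference_length_minus_1(x))
}
```

Calling first_mismatch() and checking for None covers the boundary cases. An exhaustive loop over all u32 values is also feasible if more certainty is wanted.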
This sort of discussion needs to go upstream to https://github.com/jk-jeon/dragonbox. I will periodically pull in batch updates same as I do for the ryu crate. I don't intend to diverge algorithmically from upstream because I don't have the expertise with the algorithm to maintain it, even if the function in this issue is relatively self contained.