
use a faster log2 > log10 conversion without floats #1555

Open. nolange wants to merge 1 commit into Tencent:master from nolange:improve_log10_conversion

@nolange commented Aug 16, 2019

  • There was odd rounding logic: a nonzero whole number times log10(2) is never a whole number, because log10(2) is irrational.
  • The multiplication by log10(2) is easy to approximate with fixed-point math over the restricted range of possible inputs.

So I replaced the floating-point multiplication with a 32-bit fixed-point approximation, which should give the biggest gains on CPUs without an FPU (or without double-precision support).

I tested the range -1711 <= e < 1092. The old and new algorithms return the same results throughout this range, which is larger than the range of possible values.
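For readers who want to see the idea in isolation, here is a minimal sketch of computing ceil(n * log10(2)) with integer arithmetic only. It is not the code from this PR: the function names, the `kBias` offset, and the input bound of |n| <= 2000 are all assumptions made for illustration. The constant is log10(2) scaled by 2^32 and truncated; the bias keeps the intermediate value positive so that the shift operates on an unsigned value.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical constant: floor(log10(2) * 2^32), i.e. log10(2) in Q32 fixed point.
static const int64_t kLog10Of2Q32 = 1292913986;  // 0x4D104D42

// Hypothetical bias that keeps the intermediate value positive, so the
// right shift below is applied to an unsigned value and is portable.
static const int64_t kBias = 1024;

// Returns ceil(n * log10(2)) without floating point.
// Assumption of this sketch: |n| <= 2000. Within that range the truncation
// error of the constant is much smaller than the distance from
// n * log10(2) to the nearest integer, so the ceiling is not disturbed.
inline int CeilMulLog10Of2(int n) {
    assert(n >= -2000 && n <= 2000);
    // Q32 product plus the bias, also expressed in Q32.
    const int64_t scaled = static_cast<int64_t>(n) * kLog10Of2Q32 + (kBias << 32);
    // Adding 2^32 - 1 before shifting rounds up any nonzero fractional part.
    const uint64_t rounded_up = static_cast<uint64_t>(scaled) + 0xFFFFFFFFu;
    return static_cast<int>(rounded_up >> 32) - static_cast<int>(kBias);
}

// Floating-point reference used only for comparison.
inline int CeilMulLog10Of2Float(int n) {
    return static_cast<int>(std::ceil(n * 0.30102999566398114));
}
```

Comparing `CeilMulLog10Of2(n)` with `CeilMulLog10Of2Float(n)` in a loop over an interval such as -1711 <= n < 1092 is a cheap way to check that the two agree, in the same spirit as the exhaustive test described above.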

Results from dtoa-benchmark (the modified version is milo2; the CPU is a Core i7):

Verifying grisu2               ... OK. Length Avg = 21.975, Max = 25
Verifying milo2                ... OK. Length Avg = 21.975, Max = 25
Verifying sprintf              ... Error: expect 0.1 but actual 0.10000000000000001
Error: expect 1.2345 but actual 1.2344999999999999
OK. Length Avg = 22.935, Max = 24
Verifying floaxie              ... OK. Length Avg = 21.974, Max = 24
Verifying doubleconv           ... OK. Length Avg = 22.420, Max = 25
Verifying milo                 ... OK. Length Avg = 21.975, Max = 25
Verifying ostringstream        ... Error: expect 0.1 but actual 0.10000000000000001
Error: expect 1.2345 but actual 1.2344999999999999
OK. Length Avg = 22.935, Max = 24
Verifying emyg                 ... OK. Length Avg = 21.975, Max = 25
Verifying ostrstream           ... Error: expect 0.1 but actual 0.10000000000000001
Error: expect 1.2345 but actual 1.2344999999999999
OK. Length Avg = 22.935, Max = 24
Verifying fpconv               ... OK. Length Avg = 22.444, Max = 25
Benchmarking randomdigit grisu2               ... [  93.500ns,  131.700ns]
Benchmarking randomdigit milo2                ... [  26.400ns,   62.200ns]
Benchmarking randomdigit sprintf              ... [ 681.600ns,  818.100ns]
Benchmarking randomdigit floaxie              ... [  23.700ns,   85.300ns]
Benchmarking randomdigit doubleconv           ... [  60.700ns,  119.200ns]
Benchmarking randomdigit milo                 ... [  28.800ns,   63.800ns]
Benchmarking randomdigit ostringstream        ... [1044.100ns, 1207.700ns]
Benchmarking randomdigit emyg                 ... [  28.600ns,   63.600ns]
Benchmarking randomdigit ostrstream           ... [ 982.400ns, 1125.600ns]
Benchmarking randomdigit fpconv               ... [ 101.700ns,  162.200ns]
Benchmarking randomdigit null                 ... [   1.500ns,    1.500ns]

Old vs new code on a CPU without FPU via Godbolt

@coveralls commented Aug 16, 2019

Coverage increased (+0.003%) to 99.922% when pulling 0bf2947 on nolange:improve_log10_conversion into d87b698 on Tencent:master.

@nolange force-pushed the nolange:improve_log10_conversion branch from 1541e9c to c5793be on Aug 19, 2019

use a faster log2 > log10 conversion without floats
replace the floating-point multiplication with a 32-bit
fixed-point approximation.
the results are exact in the required range.

on x86_64 there is a notable speedup in dtoa-benchmark;
there should be a huge speedup on CPUs without an FPU.
milo2                ... [  26.400ns,   62.200ns]
milo                 ... [  28.800ns,   63.800ns]

V2 now has further simplifications

@nolange force-pushed the nolange:improve_log10_conversion branch from c5793be to 0bf2947 on Aug 19, 2019
