Speed up approximate exponential function by 2.1x #210

rmlarsen · 2023-12-15T21:05:07Z

This change replaces the fast exponential approximation based on the limit exp(x) = lim_{n->inf} (1+x/n)^n, which uses 1 double division, 1 double addition and 12 double multiplications. The new implementation uses Schraudolph's approximation, which uses 1 double multiplication, 1 double addition, and 1 integer shift. Schraudolph's approximation is accurate (maximum relative error 2.98%) over a much larger interval than the old implementation. In fact, the 2.98% maximum error was measured on the interval [-707.703272; 707.703272], which is almost the entire interval on which exp(x) stays within the IEEE double range: [-709.78271; 709.78271].

In comparison, the old approximation has zero correct bits when the absolute value of the argument exceeds ~88.

Pivoted flame graph before:

Pivoted flame graph after:

… = lim_{n->inf} (1+x/n)^n, which uses 1 double division, 1 double addition and 12 double multiplications, with Schraudolph's approximation from https://www.schraudolph.org/pubs/Schraudolph99.pdf, which uses 1 double multiplication, 1 integer addition, and 1 integer shift. This makes exp2 disappear from the profile entirely in our application, and reduced the time spent in sta::findRoot by 12%. For our application, this reduces … Signed-off-by: Rasmus Munk Larsen <rmlarsen@google.com>


rmlarsen · 2023-12-15T23:05:28Z

@maliberty OK, this is ready for review.

maliberty · 2023-12-16T01:27:11Z

Looks like https://github.com/The-OpenROAD-Project/OpenROAD/blob/1749609708ddfa210d26a18c3746ac6f0cc4764f/src/gpl/src/nesterovBase.cpp#L2690 is similar.

rmlarsen · 2023-12-18T17:26:48Z

@maliberty indeed. Let me update that as well.

maliberty · 2023-12-18T22:22:48Z

I've started a full test run to make sure the approximation is fine.

rmlarsen · 2023-12-18T22:33:37Z

@maliberty I'm thinking of switching the bias to the value kExpC = 45799, which minimizes the maximum relative error.
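One way to check a claim like this is to sweep the bias and measure the worst-case relative error against std::exp. The sketch below is illustrative only: both function names are hypothetical, the constants are the standard ones for IEEE doubles, and the comparison value 60801 mentioned in the usage note is an assumption (the RMS-optimal bias as I recall it from Schraudolph's paper), not something stated in this thread.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstring>

// Schraudolph-style approximation with the bias c exposed as a parameter.
inline double schraudolph_exp_with_bias(double x, double c) {
  constexpr double kExpA = 1048576.0 / 0.69314718055994531;  // 2^20 / ln(2)
  constexpr double kExpB = 1072693248.0;                      // 1023 * 2^20
  const auto hi = static_cast<std::int64_t>(kExpA * x + (kExpB - c));
  const std::uint64_t bits = static_cast<std::uint64_t>(hi) << 32;
  double result;
  std::memcpy(&result, &bits, sizeof(result));
  return result;
}

// Largest |approx / exact - 1| over `samples` (>= 2) evenly spaced points
// in [lo, hi]. The interval must stay inside the approximation's valid range.
inline double max_relative_error(double c, double lo, double hi, int samples) {
  double worst = 0.0;
  for (int i = 0; i < samples; ++i) {
    const double x = lo + (hi - lo) * i / (samples - 1);
    const double ratio = schraudolph_exp_with_bias(x, c) / std::exp(x);
    worst = std::max(worst, std::abs(ratio - 1.0));
  }
  return worst;
}
```

Calling max_relative_error(45799.0, -700.0, 700.0, 200001) and comparing it with the same call for 60801.0 should show the smaller worst-case error for 45799, at the cost of a slightly larger average error.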

maliberty · 2023-12-18T22:35:02Z

> @maliberty I'm thinking of switching the bias to the value kExpC = 45799, which minimizes the maximum relative error.

Let me know soon as I'll have to restart the run.

maliberty · 2023-12-18T22:37:29Z

I see no check that enforces the documented requirement: "The user must ensure that the argument is in the valid range (roughly, -700 to 700)."

rmlarsen · 2023-12-18T22:50:46Z

@maliberty I'll add that. The old implementation didn't check for large positive arguments either and has worse accuracy for |x| > ~15.8.
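A rough derivation of where a threshold like ~15.8 can come from, assuming the old implementation used n = 2^12 = 4096 (inferred from the 12 multiplications; not confirmed in this thread):

$$
n \ln\left(1 + \frac{x}{n}\right) = x - \frac{x^2}{2n} + O\left(\frac{x^3}{n^2}\right)
\quad\Longrightarrow\quad
\frac{(1 + x/n)^n}{e^x} \approx e^{-x^2/(2n)}
$$

$$
1 - e^{-x^2/(2n)} = 0.0298
\quad\Longrightarrow\quad
|x| = \sqrt{-2n \ln(1 - 0.0298)} \approx \sqrt{8192 \cdot 0.03025} \approx 15.7
$$

So under that assumption the old approximation's relative error passes the 2.98% bound of the new one at |x| of roughly 15.7, in line with the figure quoted above.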

rovinski · 2023-12-18T22:55:20Z

Another possibility is to simply fall back to a slower but more accurate method if the input is outside of an acceptable range.

rmlarsen · 2023-12-18T23:01:53Z

@rovinski agreed. exp<double>(x) overflows to infinity at roughly x=709.8. I don't see why Schraudolph's approximation shouldn't hold all the way to that limit, but let me check and set the overflow limit explicitly in the code. Personally I'm a big fan of handling the IEEE special values correctly, so let me ask: Do you care to make sure that exp(NaN) == NaN?

maliberty · 2023-12-18T23:04:13Z

> Do you care to make sure that exp(NaN) == NaN?

not especially


rmlarsen · 2023-12-19T01:08:26Z

OK, the approximation holds to <3% relative error up to |x|<=707.703272. This is pretty close to where exp(x) overflows to infinity anyway, so I just shifted the point at which we return infinity down a bit. Please take another look. This reduces the speedup from 3x to 2x, but we have a better handle on the numerics, so it's probably worth it.


rovinski · 2023-12-19T06:47:57Z

> OK, the approximation holds to <3% relative error up to |x|<=707.703272. This is pretty close to where exp(x) overflows to infinity anyway, so I just shifted the point at which we return infinity down a bit. Please take another look. This reduces the speedup from 3x to 2x, but we have a better handle on the numerics, so it's probably worth it.

I'm willing to bet that if you profiled the inputs to this function, you don't get values anywhere near +/-707 because apparently the previous approximation has been working fine. If the accuracy is already improved over the prior method, then you could probably hold onto the runtime improvements.

rovinski · 2023-12-19T06:53:56Z

It could also be that even though you can more accurately represent values, it isn't algorithmically useful to represent, e.g. exp(-100) as 3.720076e-44 vs. 0.0 considering the prior snap point to 0.0 was at exp(-12).


rmlarsen · 2023-12-19T19:08:17Z

@rovinski I was able to recover the speed by only branching on the magnitude in the common case.
@tspyrou I simplified the long comment.
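A minimal sketch of what "only branching on the magnitude in the common case" could look like, assuming the cutoff 707.703272 quoted earlier in this thread. The function name and constants are hypothetical and this is not the code from the PR: in-range arguments pay for a single comparison, and the sign is only inspected on the rare out-of-range path.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

inline double guarded_fast_exp(double x) {
  constexpr double kMaxMagnitude = 707.703272;                // ~ ln(DBL_MAX / 8)
  constexpr double kExpA = 1048576.0 / 0.69314718055994531;  // 2^20 / ln(2)
  constexpr double kExpB = 1072693248.0;                      // 1023 * 2^20
  constexpr double kExpC = 45799.0;                           // bias (assumed)
  if (std::abs(x) <= kMaxMagnitude) {
    // Common case: one magnitude comparison, then the approximation.
    const auto hi = static_cast<std::int64_t>(kExpA * x + (kExpB - kExpC));
    const std::uint64_t bits = static_cast<std::uint64_t>(hi) << 32;
    double result;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
  }
  // Rare case: saturate by sign. NaN fails both comparisons and therefore
  // comes back as +infinity; this sketch does not preserve exp(NaN) == NaN.
  return x < 0.0 ? 0.0 : std::numeric_limits<double>::infinity();
}
```

With this layout, guarded_fast_exp(-1000.0) returns 0.0 and guarded_fast_exp(1000.0) returns infinity, so the function never produces denormal results.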

@rovinski I observed something interesting when trying to drop down to std::exp(x) for arguments with magnitudes larger than 707.7. When doing this, the library function for exp, __GI___exp, showed up with a significant contribution to the profile:

This must mean that sta::exp2 is actually called with arguments outside the finite range a fair bit. I suspect this implies that some non-trivial improvements could be made in the root finder.

rmlarsen · 2023-12-19T19:19:14Z

@maliberty I think this is ready for another test run. Thank you for your patience.

maliberty · 2023-12-19T19:58:35Z

dcalc/DmpCeff.cc

+    // For arguments with magnitude greater than ln(DBL_MAX/8) ~= 707.703272,
+    // Schraudolph's approximation degrades severely in accuracy, so we return
+    // zero or infinity, depending on the sign of x.
+    return x < 0.0 ? 0.0 : std::numeric_limits<double>::infinity();


maliberty · 2023-12-19

Why not return std::exp(x)? This should be a rare case, and std::exp would handle overflow.

rmlarsen · 2023-12-19

@maliberty as mentioned above, it seems not to be a rare case, and it significantly hurts performance because std::exp is so slow. I think this is something in the root finder that needs to be debugged.

rmlarsen · 2023-12-19

Perhaps the root finder keeps chasing denormal values because the stopping criterion is not very good? Capping at -707.7 means that we do not return denormals. (Just a wild guess.)

maliberty · 2023-12-19

The prior code capped at x < -12.0, so it could well be that the calling code assumes this. I wonder if you wouldn't get a benefit by keeping the old limit, as it was apparently acceptable. Investigating the solver also makes sense.

I'm guessing >707 doesn't happen much and it could use std::exp.

rmlarsen · 2023-12-19

Unfortunately that would yield significantly slower code because we'd need two conditionals. I doubt that the calling code makes such specific assumptions, since the tests pass. With the current version of this PR, you get much better numerics and a 2.1x speedup. What's not to like?

rmlarsen · 2023-12-19

Also, if we are OK underflowing at -12, why do we care if we overflow at 707.7 instead of 709.8?

maliberty · 2023-12-19T21:39:52Z

I've started the tests

maliberty · 2023-12-20T18:45:23Z

I had a wrong submodule and so had to restart the run this morning.


maliberty · 2023-12-30T16:02:00Z

In the public CI I see aes_lvt asap7 and mock-array asap7 both fail with this change. It might be a case of a small diff being magnified in the end result, but the failures need to be investigated.

rmlarsen · 2024-01-08T20:02:13Z

@maliberty thanks for running the tests. I'll investigate. Do you have a link to the test logs?

maliberty · 2024-01-08T23:19:58Z

https://jenkins.openroad.tools/job/OpenROAD-flow-scripts-All-Tests-Private/job/secure-schraudolph/

I'm not sure if you'll be able to see that pipeline or not

rmlarsen · 2024-01-08T23:29:51Z

@maliberty no, I'm afraid not. I'll try to run it locally when I'm back in the office tomorrow.

rmlarsen changed the title ~~Speedup approximate exponential function~~ Speed up approximate exponential function Dec 15, 2023

Use std::memcpy for type punning. Follow naming convention in Schraudolph's paper for easier reference.

a5a7f5c

Signed-off-by: Rasmus Munk Larsen <rmlarsen@google.com>

rmlarsen changed the title ~~Speed up approximate exponential function~~ Speed up approximate exponential function by 3x Dec 15, 2023

Handle arguments that fall outside the range where the Schraudolph approximation is accurate. Fortunately, this is almost the entire range in which exp(x) is finite.

b2db72b

Signed-off-by: Rasmus Munk Larsen <rmlarsen@google.com>

rmlarsen changed the title ~~Speed up approximate exponential function by 3x~~ Speed up approximate exponential function by 2x Dec 19, 2023

format

27a6f73

Signed-off-by: Rasmus Munk Larsen <rmlarsen@google.com>

rmlarsen changed the title ~~Speed up approximate exponential function by 2x~~ Speed up approximate exponential function by 63% Dec 19, 2023

tspyrou reviewed Dec 19, 2023


Recover most of the speed lost to range checking by only checking magnitude in the common case.

785bc8e

Signed-off-by: Rasmus Munk Larsen <rmlarsen@google.com>

rmlarsen changed the title ~~Speed up approximate exponential function by 63%~~ Speed up approximate exponential function by 2.1x Dec 19, 2023

maliberty reviewed Dec 19, 2023


Merge branch 'master' into schraudolph

210e82e

Signed-off-by: Matt Liberty <mliberty@precisioninno.com>
