Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up approximate exponential function by 2.1x #210

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

rmlarsen
Copy link
Contributor

@rmlarsen rmlarsen commented Dec 15, 2023

This change replaces the fast exponential approximation based on the limit exp(x) = lim_{n->inf} (1+x/n)^n, which uses 1 double division, 1 double addition and 12 double multiplications. The new implementation uses Shraudolph's approximation, which uses 1 double multiplication, 1 double addition, and 1 integer shift. The Shraudolph approximation is accurate (maximum absolute error 2.98%) over a much larger interval than the old implementation. In fact, the 2.98% max error was measured on the interval [-707.703272; 707.703272], which is almost the entire interval on which exp(x) doesn't overflow the IEEE double range: [-709.78271; 709.78271]

In comparison, the old approximation has zero correct bits when the absolute value of the argument exceeds ~88.

Pivoted flame graph before:
image

Pivoted flame graph after:
image

… = lim_{n->inf} (1+1/n)^n, which uses 1 double division, 1 double addition and 12 double multiplications, with Shraudolph's approximation from https://www.schraudolph.org/pubs/Schraudolph99.pdf, which uses 1 double multiplication, 1 integer addition, 1 integer shift.

This makes exp2 disappear from the profile entirely in our application, and reduced the time spent in sta::findRoot by 12%.

For our application, this reduces

Signed-off-by: Rasmus Munk Larsen <rmlarsen@google.com>
@rmlarsen rmlarsen changed the title Speedup approximate exponential function Speed up approximate exponential function Dec 15, 2023
…lph's

paper for easier reference.

Signed-off-by: Rasmus Munk Larsen <rmlarsen@google.com>
@rmlarsen rmlarsen changed the title Speed up approximate exponential function Speed up approximate exponential function by 3x Dec 15, 2023
@rmlarsen
Copy link
Contributor Author

@maliberty OK, this is ready for review.

@rmlarsen
Copy link
Contributor Author

@maliberty indeed. Let me update that as well.

@maliberty
Copy link
Member

I've started a full test run to make sure the approximation is fine.

@rmlarsen
Copy link
Contributor Author

@maliberty I'm thinking of switching the bias to the value kExpC =45799, which minimizes the maximum relative error.

@maliberty
Copy link
Member

@maliberty I'm thinking of switching the bias to the value kExpC =45799, which minimizes the maximum relative error.

Let me know soon as I'll have to restart the run.

@maliberty
Copy link
Member

I see no check to ensure " The user must ensure that the argument is in the valid range (roughly, -700 to 700)."

@rmlarsen
Copy link
Contributor Author

@maliberty I'll add that. The old implementation didn't check for large positive arguments either and has worse accuracy for |x| > ~15.8.

@rovinski
Copy link
Member

Another possibility is to simply fall back to a slower but more accurate method if the input is outside of an acceptable range.

@rmlarsen
Copy link
Contributor Author

@rovinski agreed. exp<double>(x) overflows to infinity at roughly x=709.8. I don't see why Shraudolph's approximation shouldn't hold all the way to that limit, but let me check and set the overflow limit explicitly in the code. Personally I'm a big fan of handling the IEEE special values correctly, so let me ask: Do you care to make sure that exp(NaN) == NaN?

@maliberty
Copy link
Member

maliberty commented Dec 18, 2023

Do you care to make sure that exp(NaN) == NaN?

not especially

…roximation is accurate. Fortunately, this is almost the entire range in which exp(x) is finite.

Signed-off-by: Rasmus Munk Larsen <rmlarsen@google.com>
@rmlarsen rmlarsen changed the title Speed up approximate exponential function by 3x Speed up approximate exponential function by 2x Dec 19, 2023
@rmlarsen
Copy link
Contributor Author

rmlarsen commented Dec 19, 2023

OK, the approximation holds to <3% relative error up to |x|<=707.703272. This is pretty close to where exp(x) overflows to infinity anyway, so I just shifted the point at which we return infinity down a bit. Please take another look. This reduces the speedup from 3x to 2x, but we have a better handle on the numerics, so it's probably worth it.

Signed-off-by: Rasmus Munk Larsen <rmlarsen@google.com>
@rmlarsen rmlarsen changed the title Speed up approximate exponential function by 2x Speed up approximate exponential function by 63% Dec 19, 2023
dcalc/DmpCeff.cc Outdated Show resolved Hide resolved
@rovinski
Copy link
Member

OK, the approximation holds to <3% relative error up to |x|<=707.703272. This is pretty close to where exp(x) overflows to infinity anyway, so I just shifted the point at which we return infinity down a bit. Please take another look. This reduces the speedup from 3x to 2x, but we have a better handle on the numerics, so it's probably worth it.

I'm willing to bet that if you profiled the inputs to this function, you don't get values anywhere near +/-707 because apparently the previous approximation has been working fine. If the accuracy is already improved over the prior method, then you could probably hold onto the runtime improvements.

@rovinski
Copy link
Member

It could also be that even though you can more accurately represent values, it isn't algorithmically useful to represent, e.g. exp(-100) as 3.720076e-44 vs. 0.0 considering the prior snap point to 0.0 was at exp(-12).

…nitude in the common case.

Signed-off-by: Rasmus Munk Larsen <rmlarsen@google.com>
@rmlarsen rmlarsen changed the title Speed up approximate exponential function by 63% Speed up approximate exponential function by 2.1x Dec 19, 2023
@rmlarsen
Copy link
Contributor Author

rmlarsen commented Dec 19, 2023

@rovinski I was able to recover the speed by only branching on the magnitude in the common case.
@tspyrou I simplified the long comment.

@rovinski I observed something interesting when trying to drop down to std::exp(x) for arguments with magnitudes larger than 707.7. When doing this, the library function for exp __Gl___exp showed up with a significant contribution to the profile:

image

This must mean that sta::exp2 is actually called with arguments outside the finite range a fair bit. I suspect this implies that some non-trivial improvements could be made in the root finder.

@rmlarsen
Copy link
Contributor Author

@maliberty I think this is ready for another test run. Thank you for your patience.

// For arguments with magnitude greater than ln(DBL_MAX/8) ~= 707.703272,
// Shraudolphs approximation degrades severely in accuracy, so we return
// zero or infinity, depending on the sign of x.
return x < 0.0 ? 0.0 : std::numeric_limits<double>::infinity();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not return std::exp(x) ? This should be a rare case and let it handle overflow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maliberty as mentioned above, it seems to not be a rare case, and significantly hurts performance because std::exp is so slow. I think this is something in the root finder that needs to be debugged.

Copy link
Contributor Author

@rmlarsen rmlarsen Dec 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the root finder keeps chasing denormal values because the stopping criterion is not very good? Capping at -707.7 means that we do not return denormals. (Just a wild guess).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prior code had capping for x < -12.0 so it could well be that the calling code assumes such. I wonder if you wouldn't get a benefit by keeping the old limit as it was apparently acceptable. Investigating the solver also makes sense.

I'm guessing >707 doesn't happen much and it could use std::exp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately that would yield significantly slower code because we'd need two conditionals. I doubt that the calling code makes such specific assumptions, since the tests pass. With the current version of this PR, you get much better numerics and a 2.1x speedup. What's not to like?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if we are OK underflowing at -12, why do we care if we overflow at 707.7 instead of 709.8?

@maliberty
Copy link
Member

I've started the tests

@maliberty
Copy link
Member

I had a wrong submodule and so had to restart the run this morning.

Signed-off-by: Matt Liberty <mliberty@precisioninno.com>
@maliberty
Copy link
Member

In the public CI I see aes_lvt asap7 and mock-array asap7 both fail with this change. It might be a case of a small diff leading to a magnified diff in the end result but they need to be investigated.

@rmlarsen
Copy link
Contributor Author

rmlarsen commented Jan 8, 2024

@maliberty thanks for running the tests. I'll investigate. Do you have a link to the test logs?

@maliberty
Copy link
Member

https://jenkins.openroad.tools/job/OpenROAD-flow-scripts-All-Tests-Private/job/secure-schraudolph/

I'm not sure if you'll be able to see that pipeline or not

@rmlarsen
Copy link
Contributor Author

rmlarsen commented Jan 8, 2024

@maliberty no, I'm afraid not. I'll try to run it locally when I'm back in the office tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants