Speed up approximate exponential function by 2.1x #210
Conversation
… = lim_{n->inf} (1+x/n)^n, which uses 1 double division, 1 double addition, and 12 double multiplications, with Schraudolph's approximation from https://www.schraudolph.org/pubs/Schraudolph99.pdf, which uses 1 double multiplication, 1 integer addition, and 1 integer shift. This makes exp2 disappear from the profile entirely in our application, and reduced the time spent in sta::findRoot by 12%. For our application, this reduces … Signed-off-by: Rasmus Munk Larsen <rmlarsen@google.com>
…lph's paper for easier reference. Signed-off-by: Rasmus Munk Larsen <rmlarsen@google.com>
@maliberty OK, this is ready for review.
@maliberty indeed. Let me update that as well.
I've started a full test run to make sure the approximation is fine.
@maliberty I'm thinking of switching the bias to the value …
Let me know soon as I'll have to restart the run.
I see no check to ensure "The user must ensure that the argument is in the valid range (roughly, -700 to 700)."
@maliberty I'll add that. The old implementation didn't check for large positive arguments either and has worse accuracy for |x| > ~15.8.
Another possibility is to simply fall back to a slower but more accurate method if the input is outside of an acceptable range.
@rovinski agreed.
Not especially.
…roximation is accurate. Fortunately, this is almost the entire range in which exp(x) is finite. Signed-off-by: Rasmus Munk Larsen <rmlarsen@google.com>
OK, the approximation holds to <3% relative error up to …
I'm willing to bet that if you profiled the inputs to this function, you don't get values anywhere near +/-707, because apparently the previous approximation has been working fine. If the accuracy is already improved over the prior method, then you could probably hold onto the runtime improvements.
It could also be that even though you can more accurately represent values, it isn't algorithmically useful to represent, e.g., exp(-100) as 3.720076e-44 vs. 0.0, considering the prior snap point to 0.0 was at exp(-12).
…nitude in the common case. Signed-off-by: Rasmus Munk Larsen <rmlarsen@google.com>
@rovinski I was able to recover the speed by only branching on the magnitude in the common case. @rovinski I observed something interesting when trying to drop down to … This must mean that …
@maliberty I think this is ready for another test run. Thank you for your patience.
// For arguments with magnitude greater than ln(DBL_MAX/8) ~= 707.703272,
// Schraudolph's approximation degrades severely in accuracy, so we return
// zero or infinity, depending on the sign of x.
return x < 0.0 ? 0.0 : std::numeric_limits<double>::infinity();
Why not return std::exp(x) ? This should be a rare case and let it handle overflow
@maliberty as mentioned above, it seems to not be a rare case, and significantly hurts performance because std::exp
is so slow. I think this is something in the root finder that needs to be debugged.
Perhaps the root finder keeps chasing denormal values because the stopping criterion is not very good? Capping at -707.7 means that we do not return denormals. (Just a wild guess).
The prior code had capping for x < -12.0
so it could well be that the calling code assumes such. I wonder if you wouldn't get a benefit by keeping the old limit as it was apparently acceptable. Investigating the solver also makes sense.
I'm guessing >707 doesn't happen much and it could use std::exp.
Unfortunately that would yield significantly slower code because we'd need two conditionals. I doubt that the calling code makes such specific assumptions, since the tests pass. With the current version of this PR, you get much better numerics and a 2.1x speedup. What's not to like?
Also, if we are OK underflowing at -12, why do we care if we overflow at 707.7 instead of 709.8?
I've started the tests.
I had a wrong submodule and so had to restart the run this morning.
Signed-off-by: Matt Liberty <mliberty@precisioninno.com>
In the public CI I see aes_lvt asap7 and mock-array asap7 both fail with this change. It might be a case of a small diff leading to a magnified diff in the end result, but they need to be investigated.
@maliberty thanks for running the tests. I'll investigate. Do you have a link to the test logs?
https://jenkins.openroad.tools/job/OpenROAD-flow-scripts-All-Tests-Private/job/secure-schraudolph/ I'm not sure if you'll be able to see that pipeline or not.
@maliberty no, I'm afraid not. I'll try to run it locally when I'm back in the office tomorrow.
This change replaces the fast exponential approximation based on the limit exp(x) = lim_{n->inf} (1+x/n)^n, which uses 1 double division, 1 double addition, and 12 double multiplications. The new implementation uses Schraudolph's approximation, which uses 1 double multiplication, 1 double addition, and 1 integer shift. The Schraudolph approximation is accurate (maximum relative error 2.98%) over a much larger interval than the old implementation. In fact, the 2.98% max error was measured on the interval `[-707.703272, 707.703272]`, which is almost the entire interval on which `exp(x)` doesn't overflow the IEEE `double` range: `[-709.78271, 709.78271]`. In comparison, the old approximation has zero correct bits when the absolute value of the argument exceeds ~88.
Pivoted flame graph before:
Pivoted flame graph after: