Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up approximate exponential function by 2.1x #210

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
37 changes: 17 additions & 20 deletions dcalc/DmpCeff.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1766,30 +1766,27 @@ DmpError::DmpError(const char *what) :
//printf("DmpError %s\n", what);
}

// This saves about 2.5% in overall run time on designs with SPEF.
// https://codingforspeed.com/using-faster-exponential-approximation
// Fast approximate exponential. See
// https://www.schraudolph.org/pubs/Schraudolph99.pdf
static double
exp2(double x)
{
if (x < -12.0)
// exp(-12) = 6.1e-6
return 0.0;
else {
double y = 1.0 + x / 4096.0;
y *= y;
y *= y;
y *= y;
y *= y;
y *= y;
y *= y;
y *= y;
y *= y;
y *= y;
y *= y;
y *= y;
y *= y;
return y;
if (std::abs(x) > 707.703272) {
// For arguments with magnitude greater than ln(DBL_MAX/8) ~= 707.703272,
// Shraudolphs approximation degrades severely in accuracy, so we return
// zero or infinity, depending on the sign of x.
return x < 0.0 ? 0.0 : std::numeric_limits<double>::infinity();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not return std::exp(x) ? This should be a rare case and let it handle overflow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maliberty as mentioned above, it seems to not be a rare case, and significantly hurts performance because std::exp is so slow. I think this is something in the root finder that needs to be debugged.

Copy link
Contributor Author

@rmlarsen rmlarsen Dec 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the root finder keeps chasing denormal values because the stopping criterion is not very good? Capping at -707.7 means that we do not return denormals. (Just a wild guess).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prior code had capping for x < -12.0 so it could well be that the calling code assumes such. I wonder if you wouldn't get a benefit by keeping the old limit as it was apparently acceptable. Investigating the solver also makes sense.

I'm guessing >707 doesn't happen much and it could use std::exp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately that would yield significantly slower code because we'd need two conditionals. I doubt that the calling code makes such specific assumptions, since the tests pass. With the current version of this PR, you get much better numerics and a 2.1x speedup. What's not to like?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if we are OK underflowing at -12, why do we care if we overflow at 707.7 instead of 709.8?

}

// This approximation has a maximum relative error of 3%.
constexpr int32_t kExpA = (1 << 20) / M_LN2; // 2^20 / ln(2)
constexpr int32_t kExpB = 1023 * (1 << 20); // 1023 * 2^20
constexpr int32_t kExpC = 45799; // heuristic to minimize maximum relative error

int64_t exp_bits = static_cast<int64_t>(kExpA * x + (kExpB - kExpC)) << 32;
double exp_double;
std::memcpy(&exp_double, &exp_bits, sizeof(double));
return exp_double;
}

} // namespace