-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: Remove x87 elliptic integral in solenoid B-field #1469
perf: Remove x87 elliptic integral in solenoid B-field #1469
Conversation
As it stands, the solenoid B-field code uses the `boost::math::ellint_1` and `boost::math::ellint_2` methods. For reasons which I do not understand, these functions use 80-bit x87 floating point numbers, which leads to a lot of conversion between x87 and SSE floating points. It also has the downside that x87 operations are generally slower then their SSE counterparts on modern CPUs. The interesting thing is that Boost's method for deciding on the floating point precision seems to be quite arbitrary; even when the functions are explicitly templated to use 64-bit `double` types, the internals of these functions still use 80-bit floating points. The best solution I have found is to use a compiler preprocessor flag, `BOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS`, which tells Boost not to use x87 floating points at all. Enabling this flag only in the solenoid B-field translation unit should be safe and improve performance. Very unscientifically benchmarked on my Intel i5-7300U CPU, the solenoid B-field benchmark with the old code takes 615.7±7.3µs per iteration, while the new code takes 353.7±7.0µs, a speedup of about 75%.
Codecov Report
@@ Coverage Diff @@
## main #1469 +/- ##
=======================================
Coverage 48.59% 48.59%
=======================================
Files 380 380
Lines 20589 20589
Branches 9431 9431
=======================================
Hits 10005 10005
Misses 4097 4097
Partials 6487 6487
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice! lgtm
Seeing this only now, I'm guessing you checked that the solenoid field output is approximately compatible @stephenswat? (not identical of course) |
The backport to
To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-develop/v19.x develop/v19.x
# Navigate to the new working tree
cd .worktrees/backport-develop/v19.x
# Create a new branch
git switch --create backport-1469-to-develop/v19.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 3795148974ec2682f84218b7f8c14d1f3be121c1
# Push it to GitHub
git push --set-upstream origin backport-1469-to-develop/v19.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-develop/v19.x Then, create a pull request where the |
…1469) As it stands, the solenoid B-field code uses the `boost::math::ellint_1` and `boost::math::ellint_2` methods. For reasons which I do not understand, these functions use 80-bit x87 floating point numbers, which leads to a lot of conversion between x87 and SSE floating points. It also has the downside that x87 operations are generally slower then their SSE counterparts on modern CPUs. The interesting thing is that Boost's method for deciding on the floating point precision seems to be quite arbitrary; even when the functions are explicitly templated to use 64-bit `double` types, the internals of these functions still use 80-bit floating points. The best solution I have found is to use a compiler preprocessor flag, `BOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS`, which tells Boost not to use x87 floating points at all. Enabling this flag only in the solenoid B-field translation unit should be safe and improve performance. Very unscientifically benchmarked on my Intel i5-7300U CPU, the solenoid B-field benchmark with the old code takes 615.7±7.3µs per iteration, while the new code takes 353.7±7.0µs, a speedup of about 75%.
…to develop/v19.x] (#1487) Backport 3795148 from #1469 --- As it stands, the solenoid B-field code uses the `boost::math::ellint_1` and `boost::math::ellint_2` methods. For reasons which I do not understand, these functions use 80-bit x87 floating point numbers, which leads to a lot of conversion between x87 and SSE floating points. It also has the downside that x87 operations are generally slower then their SSE counterparts on modern CPUs. The interesting thing is that Boost's method for deciding on the floating point precision seems to be quite arbitrary; even when the functions are explicitly templated to use 64-bit `double` types, the internals of these functions still use 80-bit floating points. The best solution I have found is to use a compiler preprocessor flag, `BOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS`, which tells Boost not to use x87 floating points at all. Enabling this flag only in the solenoid B-field translation unit should be safe and improve performance. Very unscientifically benchmarked on my Intel i5-7300U CPU, the solenoid B-field benchmark with the old code takes 615.7±7.3µs per iteration, while the new code takes 353.7±7.0µs, a speedup of about 75%.
As it stands, the solenoid B-field code uses the
boost::math::ellint_1
andboost::math::ellint_2
methods. For reasons which I do not understand, these functions use 80-bit x87 floating point numbers, which leads to a lot of conversion between x87 and SSE floating points. It also has the downside that x87 operations are generally slower then their SSE counterparts on modern CPUs.The interesting thing is that Boost's method for deciding on the floating point precision seems to be quite arbitrary; even when the
functions are explicitly templated to use 64-bit
double
types, the internals of these functions still use 80-bit floating points.The best solution I have found is to use a compiler preprocessor flag,
BOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS
, which tells Boost not to use x87 floating points at all. Enabling this flag only in the solenoid B-field translation unit should be safe and improve performance. Very unscientifically benchmarked on my Intel i5-7300U CPU, the solenoid B-field benchmark with the old code takes 615.7±7.3µs per iteration, while the new code takes 353.7±7.0µs, a speedup of about 75%.