Improve fmath.h madd function #2492

Merged: 1 commit, Feb 23, 2020

lgritz (Collaborator) commented Feb 20, 2020

I was previously trying to use `std::fma`, but on further investigation
it was compiling to a single vfmadd instruction far less often than I
thought. In many cases it ended up as a function call, which was
especially slow when fma had to be emulated on hardware that lacked it.
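To make the concern concrete: `std::fma` must compute `a*b+c` with a single rounding, so on a target without an FMA instruction the library has to emulate it, which is where the slow call comes from. Since C++11, `<cmath>` defines the macro `FP_FAST_FMAF` when the float overload is expected to be at least as fast as a separate multiply and add. The sketch below uses that macro to report whether the fast path is advertised; the helper names `fast_fmaf_available` and `exact_madd` are hypothetical and are not part of fmath.h.

```cpp
#include <cmath>

// Hypothetical helper: true if the implementation advertises a fast float
// fma for the current target (typically meaning a hardware FMA instruction).
// If FP_FAST_FMAF is not defined, std::fma(float, float, float) may be a
// library call that emulates the single-rounding result in software.
constexpr bool fast_fmaf_available()
{
#if defined(FP_FAST_FMAF)
    return true;
#else
    return false;
#endif
}

// Hypothetical wrapper: std::fma always returns a*b+c rounded once,
// whether that is done by one instruction or by a slower emulation.
inline float exact_madd(float a, float b, float c)
{
    return std::fma(a, b, c);
}
```

Because `std::fma` promises a correctly rounded result, the compiler cannot substitute a cheaper unfused multiply-add when no FMA hardware is available; it has to pay for the emulation.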

The better strategy seems to be simply writing `a*b+c`, which gcc and
icc automatically turn into vfmadd when the hardware supports it.
Adding a clang-specific pragma ensures the same behavior for clang.
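A minimal sketch of the approach described above, written as a stand-alone function rather than the verbatim fmath.h code (the real `madd` may differ in signature, inlining macros, and SIMD overloads). It writes the plain expression `a * b + c` and, only when compiling with clang, places `#pragma clang fp contract(fast)` at the start of the function body so the multiply and add may be fused. Whether a vfmadd is actually emitted still depends on target flags, for example `-mfma` or a suitable `-march`.

```cpp
// Hypothetical stand-alone madd: computes a*b+c and lets the compiler
// contract it into one fused multiply-add when the target supports it.
inline float madd(float a, float b, float c)
{
#if defined(__clang__)
    // Clang-specific: allow a*b+c in this function to be fused into an FMA.
    #pragma clang fp contract(fast)
#endif
    // gcc and icc contract this expression on their own when FMA
    // instructions are enabled for the target.
    return a * b + c;
}
```

Unlike `std::fma`, this does not guarantee a single rounding: where no FMA is available or contraction is disabled (for example with `-ffp-contract=off`), it compiles to an ordinary rounded multiply followed by a rounded add, so results may differ in the last bit between builds with and without FMA.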

Thanks to Alex Wells of Intel for pointing this out to me (in OSL land).

lgritz (Collaborator, Author) commented Feb 20, 2020

@AlexMWells

lgritz merged commit b008da8 into AcademySoftwareFoundation:master on Feb 23, 2020
lgritz deleted the lg-madd branch on February 23, 2020 20:03
lgritz added a commit to lgritz/OpenImageIO that referenced this pull request Feb 27, 2020
lgritz added a commit to lgritz/OpenImageIO that referenced this pull request Feb 27, 2020