
Backport fmath performance and other fixes from OSL #2495

Merged 1 commit on Feb 27, 2020

Conversation

lgritz
Collaborator

@lgritz lgritz commented Feb 23, 2020

A variety of minor rewrites of certain math functions to ensure they
generate better machine code and auto-vectorize more cleanly.

  • Change many `inline` qualifiers to OIIO_FORCEINLINE.

  • Improve the clamp used in fast_sin and fast_cos.

  • Improve fast_safe_pow code generation with OIIO_UNLIKELY.

  • Introduce OIIO_FMATH_SIMD_FRIENDLY to let an application switch
    between implementations that give the best scalar performance and
    ones that sacrifice scalar perf for the best SIMD vectorization of
    loops containing the fmath function. We anticipate needing this very
    rarely, and of course we strive to be simultaneously fastest on
    scalar and SIMD, but we have one or two cases where such a tradeoff
    exists.

  • Add many more fmath benchmarks to help us judge how the functions
    compare to their std counterparts.

  • New safe_fmod not only prevents division by zero but is also much
    faster than std::fmod in the other cases.

  • New fast_neg is faster than negating a float with `-`, in cases where
    you are ok with -(0.0f) being 0.0f instead of the true floating-point
    -0.0f. If you have no idea what I'm talking about or why it matters,
    you will definitely like this function!

  • Most of these improvements were backported from OSL, where they were
    made by Alex Wells (Intel).
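For readers unfamiliar with the force-inline and branch-hint macros mentioned above: the real definitions live in OpenImageIO's platform.h and also cover MSVC and other compilers, but a minimal sketch of the common GCC/Clang pattern (using hypothetical MY_* names and a hypothetical safe_divide example) might look like:

```cpp
// Sketch of force-inline and branch-hint macros in the style of
// OIIO_FORCEINLINE and OIIO_UNLIKELY (hypothetical MY_* names; the real
// definitions are in OpenImageIO's platform.h and handle more compilers).
#if defined(__GNUC__) || defined(__clang__)
#    define MY_FORCEINLINE inline __attribute__((always_inline))
#    define MY_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#    define MY_FORCEINLINE inline
#    define MY_UNLIKELY(x) (x)
#endif

// Usage in the spirit of fast_safe_pow's special-case checks: marking
// the rare path lets the compiler lay out the common path hot, and the
// forced inlining keeps the function from blocking auto-vectorization.
MY_FORCEINLINE float safe_divide(float a, float b)
{
    if (MY_UNLIKELY(b == 0.0f))
        return 0.0f;  // rare path: define x/0 as 0
    return a / b;     // common path
}
```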
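The safe_fmod idea can be sketched as follows. This is a hedged illustration (hypothetical name my_safe_fmod), not the exact implementation from the PR: it computes the truncated quotient directly instead of calling the much slower std::fmod, and defines a zero divisor to yield 0.

```cpp
// Sketch of a division-safe, fast fmod in the spirit of safe_fmod
// (hypothetical name; not the exact OIIO implementation).
// Caveat: the int truncation is only valid while |a/b| fits in an int,
// so a production version must guard or fall back for huge ratios.
inline float my_safe_fmod(float a, float b)
{
    if (b != 0.0f) {
        int q = static_cast<int>(a / b);  // truncated quotient
        return a - q * b;                 // remainder, same sign as a
    }
    return 0.0f;  // std::fmod(a, 0) would be NaN; define it as 0 instead
}
```

The branch-free arithmetic in the common case is also friendlier to auto-vectorization than a call into the math library.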
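And the fast_neg trick, sketched under a hypothetical name (my_fast_neg): subtracting from zero lets the compiler emit a plain subtract instead of loading a sign-bit mask constant to XOR with, which can matter inside tight vectorized loops. The price is exactly the -0.0f behavior described in the bullet above.

```cpp
#include <cmath>

// Sketch of the fast_neg idea (hypothetical name my_fast_neg).
// 0.0f - x gives the same result as -x for every finite nonzero x,
// but my_fast_neg(-0.0f) and my_fast_neg(0.0f) are both +0.0f,
// whereas true negation would flip the sign bit of zero too.
inline float my_fast_neg(float x)
{
    return 0.0f - x;
}
```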

@lgritz
Collaborator Author

lgritz commented Feb 24, 2020

@AlexMWells

@lgritz
Collaborator Author

lgritz commented Feb 25, 2020

Any objections to any of this?

@lgritz
Collaborator Author

lgritz commented Feb 27, 2020

Merging. If anything turns out to be an issue later, we can always amend.

@lgritz lgritz merged commit 88feb65 into AcademySoftwareFoundation:master Feb 27, 2020
@lgritz lgritz deleted the lg-fmath branch February 27, 2020 07:57
lgritz added a commit to lgritz/OpenImageIO that referenced this pull request Feb 28, 2020
…oundation#2495)
