Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable C and lighter weight half <-> float conversion #141

Merged
merged 12 commits into from
May 31, 2021

Conversation

kdt3rd
Copy link
Contributor

@kdt3rd kdt3rd commented May 9, 2021

This enables half to be used in C code.

Further, adds a more flexible half, with optimizations for F16C as well
as a light weight table improvement for embedded systems. Additionally
optimize the two conversion routines, at least for x86, and offer
preprocessor control for some behavior.

Signed-off-by: Kimball Thurston kdt3rd@gmail.com

This enables half to be used in C code.

Further, adds a more flexible half, with optimizations for F16C as well
as a light weight table improvement for embedded systems. Additionally
optimize the two conversion routines, at least for x86, and offer
preprocessor control for some behavior.

Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented May 9, 2021

CLA Signed

The committers are authorized under a signed CLA.

kdt3rd added 6 commits May 9, 2021 21:28
Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>
Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>
…patterns

Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>
Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>
Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>
Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>
Copy link
Contributor

@lgritz lgritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

I don't quite see how one, at build time, can determine whether or not f16c is enabled (without editing the cmake files themselves). Are you still editing?

@kdt3rd
Copy link
Contributor Author

kdt3rd commented May 10, 2021

Nice!

I don't quite see how one, at build time, can determine whether or not f16c is enabled (without editing the cmake files themselves). Are you still editing?

I am not actively editing - thanks for taking a look! There is some "dead" code that can be eliminated once we're happy, although makes the performance tests a bit harder to validate the old behavior. But maybe that's ok.

Anyway, it seems appropriate to defer whether F16C was enabled to the downstream user, since we largely want this stuff inlined for performance reasons, so just use the seemingly common #define that is set (F16C). At some point, may want to refactor / split the header file in two if we start getting a bunch of different versions of these two functions for the various hardware (neon, etc) out there. So the function will be adaptive based on what environment it is being compiled in and any specified overrides.

I haven't found a good way to manage SSE / neon / whatever flags inside of "modern" cmake, have you? I've seen people doing generator expressions for some things based on gcc / clang / msvc, but that doesn't deal with arm vs x86 easily.

@kdt3rd
Copy link
Contributor Author

kdt3rd commented May 10, 2021

One comment: I have not been able to find a way to do this f16c switch "automatically" so far, as with function multi-versioning under gcc or similar - those happen after the preprocessor, so you've already lost the other code :( Those constructs only seem to work to do (either) custom architecture functions that you then compile and dispatch to yourself, or auto-vectorization is a thing that works for you...

@lgritz
Copy link
Contributor

lgritz commented May 10, 2021

OH, of course, I get it. It's all inline, so we can push all the decisions downstream. Perfect.

I don't recommend ever doing it automatically. It's just too common to build on a machine that has f16c (say) but deploy on a set of machines that can't be counted on having it.

@kdt3rd
Copy link
Contributor Author

kdt3rd commented May 10, 2021

I agree, for a while yet anyway - until ivybridge, or whatever architecture, is the minimum spec supported. That's why I have been looking into FMV, which both clang and gcc have, albeit via different mechanisms. All of clang/gcc/icc/msvc have multiple with different function targets (i.e. functionA is compiled w/ f16c, functionB is not) in the same .c file, but the common denominator there involves you making the dispatch routine yourself. However, as per what I wrote last night, this is fine if you are typing the sse/whatever code yourself but unfortunately that mechanism happens after the preprocessor. So in reality, it is more convenient if you do this via multiple translation units (i.e. functionA.c and functionB.c and compile the entire file with different arguments).

The CPU detection and dispatch code is easy. I was planning on doing some form of the above in the EXR library in the (new) revamp of utilities for converting buffers (i.e. where you might use cuda, f16c, etc).

But I have not found a good way of managing the flags for doing that via cmake other than generator expressions and hard coded compile flags for each compiler, have you seen any good means of doing this lying around?

Copy link
Contributor

@meshula meshula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good to me. Reasonable defaults throughout, I don't spot any problems. I am amused by the decisiveness of the naming of IMATH_HALF_NO_TABLES_AT_ALL

@lgritz
Copy link
Contributor

lgritz commented May 11, 2021

src/Imath/half.h Outdated Show resolved Hide resolved
src/Imath/half.h Outdated Show resolved Hide resolved
src/Imath/half.h Outdated Show resolved Hide resolved
src/Imath/half.h Outdated Show resolved Hide resolved
src/Imath/half.h Outdated Show resolved Hide resolved
Copy link
Contributor

@meshula meshula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a suggestion that we use the new suggestion feature :)

src/Imath/half.h Outdated Show resolved Hide resolved
src/Imath/half.h Outdated Show resolved Hide resolved
src/Imath/half.h Outdated Show resolved Hide resolved
src/Imath/half.h Outdated Show resolved Hide resolved
src/Imath/half.h Outdated Show resolved Hide resolved
src/Imath/half.h Outdated Show resolved Hide resolved
Copy link
Member

@cary-ilm cary-ilm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I left several formatting/terminology comments, but the substance is fine.

@kthurston
Copy link

fyi, couple of things:

  • the macros I just moved so they weren't in an ifdef __cplusplus block, but then I ran clang-format on the file, and it did that, so I hadn't done that myself :)
  • I was assuming to actually remove the IMATH_USE_ORIGINAL_HALF_IMPLEMENTATION - I was only doing that for performance testing
  • I have been playing a lot with the code, and have the half to float non-table implementation running faster than what is pushed, and in fact faster than the partial table

will write more when I get home and can look at the changes you all suggest in more depth

@cary-ilm
Copy link
Member

Take my comments as suggestions,

@cary-ilm cary-ilm closed this May 17, 2021
@cary-ilm cary-ilm reopened this May 17, 2021
@cary-ilm
Copy link
Member

sorry, hit the wrong button.

@meshula
Copy link
Contributor

meshula commented May 17, 2021

I wonder if there's some tweaks required for the clang-format file? https://clang.llvm.org/docs/ClangFormatStyleOptions.html I couldn't find anything specifically about formatting macros besides some alignment rules, unfortunately. Maybe those blocks need to be wrapped with a "no format"...

Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>
Copy link
Contributor

@meshula meshula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good, I'm jazzed for this new implementation!

src/Imath/half.h Outdated Show resolved Hide resolved
src/ImathTest/CMakeLists.txt Show resolved Hide resolved
src/ImathTest/half_perf_test.cpp Outdated Show resolved Hide resolved
@kdt3rd
Copy link
Contributor Author

kdt3rd commented May 20, 2021

Bah, important safety tip - accepting the suggestions and adding that as a commit does not add a DCO signature, just a co-author comment :(

Good suggestions for better comment descriptions

Co-authored-by: Nick Porcino <meshula@hotmail.com>
Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>
Copy link
Contributor

@meshula meshula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@kdt3rd kdt3rd merged commit ee192dd into AcademySoftwareFoundation:master May 31, 2021
@kdt3rd kdt3rd deleted the enable_c_half_optimize branch May 31, 2021 07:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants