Enable C and lighter weight half <-> float conversion #141

kdt3rd · 2021-05-09T09:19:53Z

This enables half to be used in C code.

Further, this adds a more flexible half, with optimizations for F16C as
well as a lightweight table improvement for embedded systems. It also
optimizes the two conversion routines, at least for x86, and offers
preprocessor control for some behavior.

Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>

linux-foundation-easycla · 2021-05-09T09:20:00Z

The committers are authorized under a signed CLA.

✅ Kimball Thurston (568b904)

lgritz

Nice!

I don't quite see how one, at build time, can determine whether or not f16c is enabled (without editing the cmake files themselves). Are you still editing?

kdt3rd · 2021-05-10T07:51:11Z

> Nice!
>
> I don't quite see how one, at build time, can determine whether or not f16c is enabled (without editing the cmake files themselves). Are you still editing?

I am not actively editing - thanks for taking a look! There is some "dead" code that can be eliminated once we're happy, although removing it makes it a bit harder for the performance tests to validate against the old behavior. But maybe that's ok.

Anyway, it seems appropriate to defer whether F16C is enabled to the downstream user, since we largely want this stuff inlined for performance reasons, so just use the seemingly common #define that the compiler sets (__F16C__). At some point, we may want to refactor / split the header file in two if we start getting a bunch of different versions of these two functions for the various hardware (neon, etc) out there. So the function will be adaptive based on what environment it is being compiled in and any specified overrides.
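A minimal sketch of the compile-time selection described above, assuming a GCC/clang-style compiler that predefines `__F16C__` when F16C code generation is enabled (for example via `-mf16c`). The name `sketch_half_to_float` is hypothetical and is not the function in this PR; the fallback is a plain bit-manipulation conversion chosen for illustration, not the table or optimized routines from half.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__F16C__)
#  include <immintrin.h> /* provides the F16C intrinsic _cvtsh_ss */
#endif

/* Hypothetical helper: convert IEEE 754 binary16 bits to float.
 * The path is chosen by the preprocessor in whatever translation unit
 * includes this code, so the decision rests with the downstream build. */
static inline float sketch_half_to_float(uint16_t h)
{
#if defined(__F16C__)
    return _cvtsh_ss(h);
#else
    uint32_t sign = ((uint32_t) h & 0x8000u) << 16;
    uint32_t exp  = ((uint32_t) h >> 10) & 0x1fu;
    uint32_t man  = (uint32_t) h & 0x3ffu;
    uint32_t bits;
    float    f;

    assert(sizeof(float) == sizeof(uint32_t));

    if (exp == 0x1fu) {
        /* infinity or NaN: all-ones exponent, widen the mantissa */
        bits = sign | 0x7f800000u | (man << 13);
    } else if (exp != 0u) {
        /* normal number: rebias the exponent from 15 to 127 */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0u) {
        /* signed zero */
        bits = sign;
    } else {
        /* subnormal half: shift until the implicit bit appears */
        exp = 113u;
        while ((man & 0x400u) == 0u) {
            man <<= 1;
            --exp;
        }
        bits = sign | (exp << 23) | ((man & 0x3ffu) << 13);
    }
    memcpy(&f, &bits, sizeof f);
    return f;
#endif
}
```

Because the function is `static inline` in a header-style block, a project built with `-mf16c` gets the single-instruction path and one built without it gets the portable path, with no change to the library's own build files.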

I haven't found a good way to manage SSE / neon / whatever flags inside of "modern" cmake, have you? I've seen people doing generator expressions for some things based on gcc / clang / msvc, but that doesn't deal with arm vs x86 easily.

kdt3rd · 2021-05-10T08:35:33Z

One comment: I have not been able to find a way to do this f16c switch "automatically" so far, as with function multi-versioning under gcc or similar - those happen after the preprocessor, so you've already lost the other code :( Those constructs only seem to work if you (either) write custom architecture functions that you then compile and dispatch to yourself, or auto-vectorization is a thing that works for you...

lgritz · 2021-05-10T16:25:18Z

OH, of course, I get it. It's all inline, so we can push all the decisions downstream. Perfect.

I don't recommend ever doing it automatically. It's just too common to build on a machine that has f16c (say) but deploy on a set of machines that can't be counted on having it.

kdt3rd · 2021-05-10T20:13:37Z

I agree, for a while yet anyway - until ivybridge, or whatever architecture, is the minimum spec supported. That's why I have been looking into FMV (function multi-versioning), which both clang and gcc have, albeit via different mechanisms. All of clang/gcc/icc/msvc support multiple functions with different targets (i.e. functionA is compiled w/ f16c, functionB is not) in the same .c file, but the common denominator there involves you making the dispatch routine yourself. However, as per what I wrote last night, this is fine if you are typing the sse/whatever code yourself, but unfortunately that mechanism happens after the preprocessor. So in reality, it is more convenient to do this via multiple translation units (i.e. functionA.c and functionB.c, compiling each entire file with different arguments).

The CPU detection and dispatch code is easy. I was planning on doing some form of the above in the EXR library in the (new) revamp of utilities for converting buffers (i.e. where you might use cuda, f16c, etc).
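As a rough sketch of the "write the dispatch routine yourself" approach for buffer conversion, assuming GCC or clang on x86: one function is compiled with the `target("f16c")` attribute, a second is portable, and a selector picks between them at run time from CPUID leaf 1 (ECX bit 29 is F16C; bits 27 and 28 are OSXSAVE and AVX). All `sketch_*` names are hypothetical and are not part of Imath or OpenEXR. The CPU check is simplified: a production check would also confirm through XGETBV that the OS has enabled AVX state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#  define SKETCH_HAVE_X86_DISPATCH 1
#  include <cpuid.h>
#  include <immintrin.h>
#endif

typedef void (*sketch_convert_fn)(float *dst, const uint16_t *src, size_t n);

/* Portable path: bit-manipulation conversion of binary16 bits to float. */
static void sketch_convert_scalar(float *dst, const uint16_t *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; ++i) {
        uint32_t h    = src[i];
        uint32_t sign = (h & 0x8000u) << 16;
        uint32_t exp  = (h >> 10) & 0x1fu;
        uint32_t man  = h & 0x3ffu;
        uint32_t bits;
        if (exp == 0x1fu) {
            bits = sign | 0x7f800000u | (man << 13); /* inf / NaN */
        } else if (exp != 0u) {
            bits = sign | ((exp + 112u) << 23) | (man << 13); /* normal */
        } else if (man == 0u) {
            bits = sign; /* signed zero */
        } else {
            exp = 113u; /* subnormal: normalize the mantissa */
            while ((man & 0x400u) == 0u) {
                man <<= 1;
                --exp;
            }
            bits = sign | (exp << 23) | ((man & 0x3ffu) << 13);
        }
        memcpy(&dst[i], &bits, sizeof bits);
    }
}

#ifdef SKETCH_HAVE_X86_DISPATCH
/* Only this function is compiled with F16C enabled. */
static __attribute__((target("f16c")))
void sketch_convert_f16c(float *dst, const uint16_t *src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = _cvtsh_ss(src[i]);
}

/* Simplified check: CPUID.1:ECX bits 27 (OSXSAVE), 28 (AVX), 29 (F16C). */
static int sketch_cpu_has_f16c(void)
{
    unsigned int       eax, ebx, ecx, edx;
    const unsigned int need = (1u << 27) | (1u << 28) | (1u << 29);
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx & need) == need;
}
#endif

/* Hand-written dispatch: choose an implementation for this machine. */
static sketch_convert_fn sketch_select_convert(void)
{
#ifdef SKETCH_HAVE_X86_DISPATCH
    if (sketch_cpu_has_f16c())
        return sketch_convert_f16c;
#endif
    return sketch_convert_scalar;
}
```

A caller would typically fetch the pointer once, for example `sketch_convert_fn cvt = sketch_select_convert();`, and then call `cvt(dst, src, n)` per buffer. The same selector shape applies to the multiple-translation-unit variant, where each implementation lives in its own .c file compiled with its own flags.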

But I have not found a good way of managing the flags for doing that via cmake other than generator expressions and hard coded compile flags for each compiler, have you seen any good means of doing this lying around?

meshula

This all looks good to me. Reasonable defaults throughout, I don't spot any problems. I am amused by the decisiveness of the naming of IMATH_HALF_NO_TABLES_AT_ALL

lgritz · 2021-05-11T05:14:00Z

@kdt3rd Here's the best I've come up with: https://github.com/OpenImageIO/oiio/blob/master/src/cmake/compiler.cmake#L241

src/Imath/half.h

src/ImathTest/half_c_main.c

meshula

Added a suggestion that we use the new suggestion feature :)

src/Imath/half.h

src/Imath/half.cpp

cary-ilm

LGTM. I left several formatting/terminology comments, but the substance is fine.

kthurston · 2021-05-17T03:02:25Z

fyi, couple of things:

- the macros I just moved so they weren't in an ifdef __cplusplus block, but then I ran clang-format on the file, and it did that reformatting, so I hadn't done that myself :)
- I was planning to actually remove the IMATH_USE_ORIGINAL_HALF_IMPLEMENTATION - I was only using that for performance testing
- I have been playing a lot with the code, and have the half to float non-table implementation running faster than what is pushed, and in fact faster than the partial table

will write more when I get home and can look at the changes you all suggest in more depth

cary-ilm · 2021-05-17T03:13:34Z

Take my comments as suggestions,

cary-ilm · 2021-05-17T03:14:01Z

sorry, hit the wrong button.

meshula · 2021-05-17T04:52:35Z

I wonder if there are some tweaks required for the clang-format file? https://clang.llvm.org/docs/ClangFormatStyleOptions.html I couldn't find anything specifically about formatting macros besides some alignment rules, unfortunately. Maybe those blocks need to be wrapped with a "no format"...

meshula

looking good, I'm jazzed for this new implementation!

src/Imath/half.h

src/ImathTest/CMakeLists.txt

src/ImathTest/half_perf_test.cpp

kdt3rd · 2021-05-20T08:16:31Z

Bah, important safety tip - accepting the suggestions and adding that as a commit does not add a DCO signature, just a co-author comment :(

Good suggestions for better comment descriptions Co-authored-by: Nick Porcino <meshula@hotmail.com> Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>

meshula

lgtm!

kdt3rd added 6 commits May 9, 2021 21:28

Fix flag which was set to the wrong default

2320265

Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>

Keep extern "C" from adding an extra indent

0d3d037

Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>

Make half perf test compile for win32, fix nan handling in funky bit …

b0b6a8c

…patterns Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>

Fix includes, types for MacOS

d42afe3

Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>

re-add missing exception spec for windows, indent, use generic function

51b3d83

Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>

Fix missing brace, accidental flag checkin

a50311d

Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>

lgritz reviewed May 9, 2021

meshula approved these changes May 10, 2021

Merge branch 'master' into enable_c_half_optimize

1253a33