-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable C and lighter weight half <-> float conversion #141
Enable C and lighter weight half <-> float conversion #141
Conversation
This enables half to be used in C code. Further, adds a more flexible half, with optimizations for F16C as well as a light weight table improvement for embedded systems. Additionally optimize the two conversion routines, at least for x86, and offer preprocessor control for some behavior. Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>
|
Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>
Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>
…patterns Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>
Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>
Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>
Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
I don't quite see how one, at build time, can determine whether or not f16c is enabled (without editing the cmake files themselves). Are you still editing?
I am not actively editing - thanks for taking a look! There is some "dead" code that can be eliminated once we're happy, although makes the performance tests a bit harder to validate the old behavior. But maybe that's ok. Anyway, it seems appropriate to defer whether F16C was enabled to the downstream user, since we largely want this stuff inlined for performance reasons, so just use the seemingly common #define that is set (F16C). At some point, may want to refactor / split the header file in two if we start getting a bunch of different versions of these two functions for the various hardware (neon, etc) out there. So the function will be adaptive based on what environment it is being compiled in and any specified overrides. I haven't found a good way to manage SSE / neon / whatever flags inside of "modern" cmake, have you? I've seen people doing generator expressions for some things based on gcc / clang / msvc, but that doesn't deal with arm vs x86 easily. |
One comment: I have not been able to find a way to do this f16c switch "automatically" so far, as with function multi-versioning under gcc or similar - those happen after the preprocessor, so you've already lost the other code :( Those constructs only seem to work to do (either) custom architecture functions that you then compile and dispatch to yourself, or auto-vectorization is a thing that works for you... |
OH, of course, I get it. It's all inline, so we can push all the decisions downstream. Perfect. I don't recommend ever doing it automatically. It's just too common to build on a machine that has f16c (say) but deploy on a set of machines that can't be counted on having it. |
I agree, for a while yet anyway - until ivybridge, or whatever architecture, is the minimum spec supported. That's why I have been looking into FMV, which both clang and gcc have, albeit via different mechanisms. All of clang/gcc/icc/msvc have multiple with different function targets (i.e. functionA is compiled w/ f16c, functionB is not) in the same .c file, but the common denominator there involves you making the dispatch routine yourself. However, as per what I wrote last night, this is fine if you are typing the sse/whatever code yourself but unfortunately that mechanism happens after the preprocessor. So in reality, it is more convenient if you do this via multiple translation units (i.e. functionA.c and functionB.c and compile the entire file with different arguments). The CPU detection and dispatch code is easy. I was planning on doing some form of the above in the EXR library in the (new) revamp of utilities for converting buffers (i.e. where you might use cuda, f16c, etc). But I have not found a good way of managing the flags for doing that via cmake other than generator expressions and hard coded compile flags for each compiler, have you seen any good means of doing this lying around? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks good to me. Reasonable defaults throughout, I don't spot any problems. I am amused by the decisiveness of the naming of IMATH_HALF_NO_TABLES_AT_ALL
@kdt3rd Here's the best I've come up with: https://github.com/OpenImageIO/oiio/blob/master/src/cmake/compiler.cmake#L241 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a suggestion that we use the new suggestion feature :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I left several formatting/terminology comments, but the substance is fine.
fyi, couple of things:
will write more when I get home and can look at the changes you all suggest in more depth |
Take my comments as suggestions, |
sorry, hit the wrong button. |
I wonder if there's some tweaks required for the clang-format file? https://clang.llvm.org/docs/ClangFormatStyleOptions.html I couldn't find anything specifically about formatting macros besides some alignment rules, unfortunately. Maybe those blocks need to be wrapped with a "no format"... |
Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking good, I'm jazzed for this new implementation!
Bah, important safety tip - accepting the suggestions and adding that as a commit does not add a DCO signature, just a co-author comment :( |
4240cd7
to
3a723df
Compare
Good suggestions for better comment descriptions Co-authored-by: Nick Porcino <meshula@hotmail.com> Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
This enables half to be used in C code.
Further, adds a more flexible half, with optimizations for F16C as well
as a light weight table improvement for embedded systems. Additionally
optimize the two conversion routines, at least for x86, and offer
preprocessor control for some behavior.
Signed-off-by: Kimball Thurston kdt3rd@gmail.com