-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update Imath docs for 3.1 #178
Update Imath docs for 3.1 #178
Conversation
* Improve wording in README.md; mention C half, and slack channel. * INSTALL.md: - IMATH_HALF_NO_TABLES_AT_ALL - Change cmake to CMake where appropriate - Update example cmake toolchain file from VFX ref platform 2015 -> 2021 * Fix formatting of code examples in doxygen comments in various headers. * Add reference to operator<< for the classes that have one. * Euler::operator<< was incorrectly inside a @Doxygen_Suppress block * Rearrange introductory doxygen @file comment in half.h * Add new half sections to rst/readthedocs docs, include new C-language conversion functions. Signed-off-by: Cary Phillips <cary@ilm.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. slack URL works for me.
* Better description of table-related compile options * Fix formatting of doxygen comments for half limits Signed-off-by: Cary Phillips <cary@ilm.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to work out the details for msvc f16c, it's not obvious how to set it up...
docs/half_conversion.rst
Outdated
To use the F16C instruction set on an architecture that supports it, | ||
simply provide the appropriate compiler flags. For gcc/g++, for example: | ||
|
||
% cmake -dCMAKE_CXX_FLAGS="-m16fc" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should lay out explicitly at least gcc/clang/msvc. The first two are easy, however, I'm not sure 'simply' is the right word -
To use the F16C instruction set on an architecture that supports it, | |
simply provide the appropriate compiler flags. For gcc/g++, for example: | |
% cmake -dCMAKE_CXX_FLAGS="-m16fc" | |
To use the F16C instruction set on an architecture that supports it, | |
provide the appropriate compiler flags. | |
- gcc/g++/ clang: | |
% cmake -dCMAKE_CXX_FLAGS="-m16fc" | |
- msvc: | |
???? |
@kdt3rd I'm puzzling through the MSVC docs and blogs (e.g. https://walbourn.github.io/directxmath-f16c-and-fma/) Do you know if there's a corresponding /A setting or something for msvc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's an interesting question. Technically it's an extension to the core SSE functionality on it's own added after AVX (that's how both Intel and AMD talk about it anyway), but MSVC doesn't seem to do that particularly clearly. Per that blog post, they indicate that MSFT will emit the intrinsic regardless of settings, and it is unclear if they set the __F16C__
#define so maybe it is not as clean as I hoped (I presumed they would have the same *mmintrin.h behavior as others). When you look in the preprocessor checks in that directx math project, it force F16C on when AVX2 is enabled, so certainly /arch:AVX2 would be safe to mention. But I'm not sure the existing preprocessor checks are sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, is setting CMAKE_CXX_FLAGS="-m16fc" indeed the proper advice for g++ and clang?
@kdt3rd, have you been testing on MSVC, and if so, how?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In response to Kimball's comment, I searched a bit. I think clang and gcc in addition to having -m16fc (links below) also automatically enable the intrinsics for architectures that have it (gcc link below, search 16fc) like msvc. I bet, but haven't found a reference, that clang does as well.
clang and gcc have the switch. gcc documents that many of the architecture targets enable f16c automatically.
clang: https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-mf16c
gcc: https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
maybe it's enough for us to summarize the flag, and to note that it's automatically available when targeting processors that have it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying the intrinsics are enabled by default if available? That's not my experience, I'm building on an Unbuntu x86_64 machine and the __F16C__
cpp symbol is only defined when I include "-m16c" on the command line, for both g++ and clang++.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NO, they are not enabled by default, because you don't know what kind of machine you'll be running the code on. But they are enabled by a variety of -mfoo flags (not just -mf16c), if the foo architecture is known to always have f16c capabilities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was responding to Nick's questions around MSVC - Microsoft generates the instruction regardless and doesn't have an explicit switch or seemingly good tests if it's around. I have not been testing this part with windows other than validating that the code is correct using godbolt, but rather, I have been testing the windows filesystem stuff for EXR using the low level win32 api to do the pread/pwrite equivalents
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for helping out with all the doc updates and cleaning up my half-baked comments!
require lookup tables. Performance of both options is generally | ||
float-to-half conversion, using the F16C SSE extension if available, | ||
or otherwise using an optimized bit-manipulation algorithm that does | ||
not require lookup tables. Performance of both options is generally | ||
significantly faster than the lookup-table based conversions that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, this is 100% true for the float->half using eLut. However, for the half->float, the table is still generally faster than the bit manipulation on x86 hardware, which is why it is still the default. The optimized code is provided for scenarios that do not want to link in a table of that size to their application, or for hardware where the table is slower.
float-to-half conversion, using the F16C SSE extension if available, | ||
or otherwise using an optimized bit-manipulation algorithm that does | ||
not require lookup tables. Performance of both options is generally | ||
significantly faster than the lookup-table based conversions that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as above
/// | ||
/// Converting from a float to a half requires some non-trivial bit | ||
/// manipulations. In some cases, this makes conversion relatively | ||
/// slow, but the most common case is accelerated via table lookups. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably change this. I might suggest something like:
Converting from a float to a half requires some non-trivial bit manipulations, and quite commonly has to deal with denormalized numbers, and has to always handle rounding. The rounding is always round to even. If possible on modern hardware, it is suggested to use the hardware intrinsics.
/// original: 5.2 ns / call | ||
/// no exp table + opt: 1.27 ns / call | ||
/// f16c: 0.45 ns / call | ||
/// - original: 5.2 ns / call |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have a header on it to indicate it is for float to half conversion, thanks! (I missed that originally, appreciate you touching all this up)
Signed-off-by: Cary Phillips <cary@ilm.com>
In the process of improving the language of the docs, I've found a better set of names for the options, but it's hard to separate changing the variable names from describing the names in the docs, and it would be better to change the names in a separate PR. I'm going to merge this PR as is and submit a follow-up with the proposed name changes and associated explanation. |
Are you saying MSVC uses the instructions if they’re available, without the
need for special flags? Should the doc say just that?
I’m fine leaving the doc as is for now, mentioning g++ and clang++ but
saying nothing about MSVC until someone with more knowledge explains it
more precisely.
…--
Cary Phillips | R&D Supervisor | ILM | San Francisco
|
no, I'm saying they don't seem to offer a good ifdef and if you call the intrinsic they emit it, which doesn't seem to be a good thing (where gcc / clang will error unless you add -mf16c). But it also could be I don't know something about MSVC - I think what you have is good, until someone who knows more can look at it |
You have to opt in to get non-baseline instruction sets. Those instructions being available on the machine that is compiling doesn't guarantee that they will be available anyplace that the program will be deployed. The executable would crash if run on a computer whose CPU didn't support the instructions. |
C-language conversion functions.
Docs are previewable at: https://cary-ilm-imath.readthedocs.io/en/latest/
Signed-off-by: Cary Phillips cary@ilm.com