Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Imath docs for 3.1 #178

Merged
merged 3 commits into from
Jul 8, 2021

Conversation

cary-ilm
Copy link
Member

@cary-ilm cary-ilm commented Jul 6, 2021

  • Improve wording in README.md; mention C half, and slack channel.
  • INSTALL.md:
    • IMATH_HALF_NO_TABLES_AT_ALL
    • Change cmake to CMake where appropriate
    • Update example cmake toolchain file from VFX ref platform 2015 -> 2021
  • Fix formatting of code examples in doxygen comments in various headers.
  • Add reference to operator<< for the classes that have one.
  • Euler::operator<< was incorrectly inside a @Doxygen_Suppress block
  • Rearrange introductory doxygen @file comment in half.h
  • Add new half sections to rst/readthedocs docs, include new
    C-language conversion functions.

Docs are previewable at: https://cary-ilm-imath.readthedocs.io/en/latest/

Signed-off-by: Cary Phillips cary@ilm.com

* Improve wording in README.md; mention C half, and slack channel.
* INSTALL.md:
  - IMATH_HALF_NO_TABLES_AT_ALL
  - Change cmake to CMake where appropriate
  - Update example cmake toolchain file from VFX ref platform 2015 -> 2021
* Fix formatting of code examples in doxygen comments in various headers.
* Add reference to operator<< for the classes that have one.
* Euler::operator<< was incorrectly inside a @Doxygen_Suppress block
* Rearrange introductory doxygen @file comment in half.h
* Add new half sections to rst/readthedocs docs, include new
  C-language conversion functions.

Signed-off-by: Cary Phillips <cary@ilm.com>
Copy link
Contributor

@meshula meshula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. slack URL works for me.

* Better description of table-related compile options
* Fix formatting of doxygen comments for half limits

Signed-off-by: Cary Phillips <cary@ilm.com>
Copy link
Contributor

@meshula meshula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to work out the details for msvc f16c, it's not obvious how to set it up...

Comment on lines 16 to 19
To use the F16C instruction set on an architecture that supports it,
simply provide the appropriate compiler flags. For gcc/g++, for example:

% cmake -dCMAKE_CXX_FLAGS="-m16fc"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should lay out explicitly at least gcc/clang/msvc. The first two are easy, however, I'm not sure 'simply' is the right word -

Suggested change
To use the F16C instruction set on an architecture that supports it,
simply provide the appropriate compiler flags. For gcc/g++, for example:
% cmake -dCMAKE_CXX_FLAGS="-m16fc"
To use the F16C instruction set on an architecture that supports it,
provide the appropriate compiler flags.
- gcc/g++/ clang:
% cmake -dCMAKE_CXX_FLAGS="-m16fc"
- msvc:
????

@kdt3rd I'm puzzling through the MSVC docs and blogs (e.g. https://walbourn.github.io/directxmath-f16c-and-fma/) Do you know if there's a corresponding /A setting or something for msvc?

Copy link
Contributor

@kdt3rd kdt3rd Jul 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's an interesting question. Technically it's an extension to the core SSE functionality on it's own added after AVX (that's how both Intel and AMD talk about it anyway), but MSVC doesn't seem to do that particularly clearly. Per that blog post, they indicate that MSFT will emit the intrinsic regardless of settings, and it is unclear if they set the __F16C__ #define so maybe it is not as clean as I hoped (I presumed they would have the same *mmintrin.h behavior as others). When you look in the preprocessor checks in that directx math project, it force F16C on when AVX2 is enabled, so certainly /arch:AVX2 would be safe to mention. But I'm not sure the existing preprocessor checks are sufficient.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, is setting CMAKE_CXX_FLAGS="-m16fc" indeed the proper advice for g++ and clang?
@kdt3rd, have you been testing on MSVC, and if so, how?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In response to Kimball's comment, I searched a bit. I think clang and gcc in addition to having -m16fc (links below) also automatically enable the intrinsics for architectures that have it (gcc link below, search 16fc) like msvc. I bet, but haven't found a reference, that clang does as well.

clang and gcc have the switch. gcc documents that many of the architecture targets enable f16c automatically.

clang: https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-mf16c
gcc: https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html

maybe it's enough for us to summarize the flag, and to note that it's automatically available when targeting processors that have it?

Copy link
Member Author

@cary-ilm cary-ilm Jul 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying the intrinsics are enabled by default if available? That's not my experience, I'm building on an Unbuntu x86_64 machine and the __F16C__ cpp symbol is only defined when I include "-m16c" on the command line, for both g++ and clang++.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NO, they are not enabled by default, because you don't know what kind of machine you'll be running the code on. But they are enabled by a variety of -mfoo flags (not just -mf16c), if the foo architecture is known to always have f16c capabilities.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was responding to Nick's questions around MSVC - Microsoft generates the instruction regardless and doesn't have an explicit switch or seemingly good tests if it's around. I have not been testing this part with windows other than validating that the code is correct using godbolt, but rather, I have been testing the windows filesystem stuff for EXR using the low level win32 api to do the pread/pwrite equivalents

Copy link
Contributor

@kdt3rd kdt3rd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for helping out with all the doc updates and cleaning up my half-baked comments!

require lookup tables. Performance of both options is generally
float-to-half conversion, using the F16C SSE extension if available,
or otherwise using an optimized bit-manipulation algorithm that does
not require lookup tables. Performance of both options is generally
significantly faster than the lookup-table based conversions that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, this is 100% true for the float->half using eLut. However, for the half->float, the table is still generally faster than the bit manipulation on x86 hardware, which is why it is still the default. The optimized code is provided for scenarios that do not want to link in a table of that size to their application, or for hardware where the table is slower.

float-to-half conversion, using the F16C SSE extension if available,
or otherwise using an optimized bit-manipulation algorithm that does
not require lookup tables. Performance of both options is generally
significantly faster than the lookup-table based conversions that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above

///
/// Converting from a float to a half requires some non-trivial bit
/// manipulations. In some cases, this makes conversion relatively
/// slow, but the most common case is accelerated via table lookups.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably change this. I might suggest something like:

Converting from a float to a half requires some non-trivial bit manipulations, and quite commonly has to deal with denormalized numbers, and has to always handle rounding. The rounding is always round to even. If possible on modern hardware, it is suggested to use the hardware intrinsics.

/// original: 5.2 ns / call
/// no exp table + opt: 1.27 ns / call
/// f16c: 0.45 ns / call
/// - original: 5.2 ns / call
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have a header on it to indicate it is for float to half conversion, thanks! (I missed that originally, appreciate you touching all this up)

Signed-off-by: Cary Phillips <cary@ilm.com>
@cary-ilm
Copy link
Member Author

cary-ilm commented Jul 8, 2021

In the process of improving the language of the docs, I've found a better set of names for the options, but it's hard to separate changing the variable names from describing the names in the docs, and it would be better to change the names in a separate PR.

I'm going to merge this PR as is and submit a follow-up with the proposed name changes and associated explanation.

@cary-ilm cary-ilm merged commit 478f8fa into AcademySoftwareFoundation:master Jul 8, 2021
@cary-ilm
Copy link
Member Author

cary-ilm commented Jul 9, 2021 via email

@kthurston
Copy link

kthurston commented Jul 9, 2021

no, I'm saying they don't seem to offer a good ifdef and if you call the intrinsic they emit it, which doesn't seem to be a good thing (where gcc / clang will error unless you add -mf16c).

But it also could be I don't know something about MSVC - I think what you have is good, until someone who knows more can look at it

@lgritz
Copy link
Contributor

lgritz commented Jul 9, 2021

You have to opt in to get non-baseline instruction sets. Those instructions being available on the machine that is compiling doesn't guarantee that they will be available anyplace that the program will be deployed. The executable would crash if run on a computer whose CPU didn't support the instructions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants