Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cl_half.h header #60

Merged
merged 10 commits into from
May 12, 2020
Merged

Add cl_half.h header #60

merged 10 commits into from
May 12, 2020

Conversation

jrprice
Copy link
Contributor

@jrprice jrprice commented Dec 23, 2019

This is going to address #57 when complete. I'm posting it now to get early feedback on the approach before I go on to do the half->float direction and FP64.

This introduces a function to convert from FP32 to FP16. I've tested this by brute-forcing all FP32
values for all four rounding modes, and comparing against the functions in the CTS. I've also run the half suite from CTS using this function instead.

I've ended up essentially rewriting the function from scratch rather than copying the CTS versions, which is why I wanted to check with the group before pushing forwards. This new version does not depend on the hex float macros used in CTS, or on math.h. Also, while testing this I discovered that the versions in CTS do not always work correctly if certain compiler optimizations are enabled (due to the use of floating point arithmetic). This new version solely uses bitwise integer operations, so does not have this issue.

I've also opted to merge all the rounding modes into a single function, to avoid duplicating much of the code as is done in CTS. There's an enum to select the rounding mode. We could provide separate suffixed functions as well if that's desirable (e.g. cl_float_to_half_rte()).

I do not have any strong feelings about the function names, so feel free to suggest something different if desired.

@jrprice jrprice changed the title Add cl_half.h header [WIP] Add cl_half.h header Dec 23, 2019
@jrprice jrprice changed the title [WIP] Add cl_half.h header Add cl_half.h header Jan 2, 2020
@jrprice jrprice requested a review from kpet January 2, 2020 22:46
@jrprice
Copy link
Contributor Author

jrprice commented Jan 2, 2020

This now has the following three conversion routines:

  • FP32 -> FP16
  • FP16 -> FP32
  • FP64 -> FP16

I've tested all three in the half CTS suite on an NVIDIA GPU. I've also brute-forced all inputs for the first two, comparing bitwise to the same routines from CTS.

Copy link
Contributor

@kpet kpet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've finally had a chance to look at this. This is great work!

I've also opted to merge all the rounding modes into a single function

I think this is better. The only potential issue would be that the code might be a bit slower but since the function is going to be inlined and the rounding mode known at compilation time in the vast majority of cases, I don't think this is going to be a problem in practice.

I do not have any strong feelings about the function names,

FWIW, I think they're good.

There is quite a bit of code duplication between the float and double routines but there isn't a single obvious way of resolving this that I've been able to find. A few ideas that might be worth exploring:

  • Introduce a common function that would take all the CL_{DBL,FLT}_ values as parameters. You'd need to either use 64-bit arithmetic in the float case or maybe come up with some way of moving to 32-bit arithmetic when possible.
  • Introduce a few utility functions for common blocks of code (thinking of the overflow checks for example).

I'm not sure either of these ideas would make the code really better and I don't think it's strictly required to spend time on this for this change to go in.

CL/cl_half.h Show resolved Hide resolved
CL/cl_half.h Outdated Show resolved Hide resolved
CL/cl_half.h Outdated Show resolved Hide resolved
CL/cl_half.h Outdated Show resolved Hide resolved
CL/cl_half.h Outdated Show resolved Hide resolved
CL/cl_half.h Outdated Show resolved Hide resolved
CL/cl_half.h Show resolved Hide resolved
Copy link
Contributor Author

@jrprice jrprice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! I agree with the comments about performance - I don't think it's a big issue due to inlining as you mention. I also benchmarked the first drop of this code against the routines in CTS and it was faster, FWIW. I don't think performance is a particular priority for this header however (IMO).

I've pulled out some common routines for handling overflow and underflow. I agree that there's still a bunch of similar code between the FP32 and FP64 routines which could potentially be addressed, but I've left that for future work if somebody thinks it's important.

Copy link
Contributor

@bashbaug bashbaug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments are mostly cosmetic. I don't think any should necessarily hold up merging the changes, but given that this is the first "header-only utility library" we're considering, I think they are worth considering.

CL/cl_half.h Show resolved Hide resolved
CL/cl_half.h Show resolved Hide resolved
CL/cl_half.h Show resolved Hide resolved
@jrprice
Copy link
Contributor Author

jrprice commented Jan 9, 2020

The comment about Windows reminds me to mention that I have not tested this on Windows yet. It'd be great if somebody with a Windows platform is able to do this before we merge this.

@bashbaug
Copy link
Contributor

bashbaug commented Jan 9, 2020

I have not tested this on Windows yet.

Windows compiled but generated some warnings (tested VS2019):

CL/cl_half.h(266,59): warning C4293:  '<<': shift count negative or too big, undefined behavior
CL/cl_half.h(272,18): warning C4244:  'initializing': conversion from 'int64_t' to 'uint16_t', possible loss of data
CL/cl_half.h(280,23): warning C4244:  'initializing': conversion from 'uint64_t' to 'uint16_t', possible loss of data
CL/cl_half.h(316,43): warning C4293:  '<<': shift count negative or too big, undefined behavior
CL/cl_half.h(319,20): warning C4244:  '=': conversion from 'int64_t' to 'uint32_t', possible loss of data
CL/cl_half.h(328,19): warning C4244:  'initializing': conversion from 'uint64_t' to 'uint16_t', possible loss of data
CL/cl_half.h(331,42): warning C4334:  '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)

I believe the shift warnings are because unsigned long is still 32-bits on Windows, and can be fixed by replacing 1ul with (uint64_t)1. The remainder can be fixed with explicit casts.

CL/cl_half.h Outdated Show resolved Hide resolved
@kpet
Copy link
Contributor

kpet commented Jan 10, 2020

I also benchmarked the first drop of this code against the routines in CTS and it was faster, FWIW. I don't think performance is a particular priority for this header however (IMO).

That's great to hear! Agree performance is not a primary concern.

I've pulled out some common routines for handling overflow and underflow.

Thanks! Looks good.

On naming generally:

  • Having said that I liked the function names, I'm now wondering whether we shouldn't prefix all of them with cl_half (or whatever we agree should be the prefix): cl_half_to_float, cl_half_from_double and cl_half_from_float.
  • @bashbaug I understand the concern but I really think this is functionality that we should ultimately provide in the cl namespace. I think what I'm saying is that I'd rather we document whatever additions we make (a spec appendix?) rather than move them to a separate namespace. One of the things I had in mind for future changes was to provide a better definition for cl_half for C++ applications where conversions and operations do the right thing automatically. If it looks like the discussions wouldn't converge reasonably quickly though, it might be better to just change the namespace for now (agree clu is reasonable) so we get forward progress. On the case, I think using snake case for these is a plus as it makes it clear these are not API functions.

@bashbaug
Copy link
Contributor

what I'm saying is that I'd rather we document whatever additions we make (a spec appendix?) rather than move them to a separate namespace.

This seems like a reasonable solution too, if we would prefer to keep the cl prefix.

@jrprice
Copy link
Contributor Author

jrprice commented Jan 13, 2020

I've fixed the Windows warnings (thanks @bashbaug!), and renamed the functions to cl_half_from_float() as suggested by @kpet .

I've left things in the cl namespace and using snake-case for now until we have consensus on the desired convention for these functions. I'm leaning (slightly) away from snake-case now since it makes them look a bit like the type definitions we have in the API. If we document them in a spec appendix, I'd be happy with clHalfToFloat(...) and clHalfFromFloat(...).

@bashbaug
Copy link
Contributor

Thanks - the latest changes work for me, and I confirm that the Windows warnings are fixed.

Copy link
Contributor

@kpet kpet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@kpet
Copy link
Contributor

kpet commented Mar 20, 2020

We've got two approvals from two different implementers. This can't break existing functionatlity. This has been discussed several times in the working group; nobody opposed the change. I'm suggesting we just merge this, any objections?

@bashbaug
Copy link
Contributor

any objections?

No objections, but do note that this file has the old license header and will need to be updated (see #76).

@kpet
Copy link
Contributor

kpet commented Apr 1, 2020

@jrprice With updated license headers, this could go in straight away ;).

Copy link
Contributor

@Jeremy-Kemp Jeremy-Kemp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Late to the party, but this looks to be a very useful addition to the headers. Thanks for doing this.

@kpet
Copy link
Contributor

kpet commented Apr 28, 2020

@jrprice Ping! This just needs a change of license headers and then we can pop the champagne :).

@jrprice
Copy link
Contributor Author

jrprice commented Apr 28, 2020

Sorry for the radio silence. The reason I've been holding off on doing this is because I'm still not clear that this header is best suited for this repository as opposed to the utility library in the OpenCL SDK, and we haven't had a chance to discuss this yet. Now that the fact we're doing an SDK is public maybe we can discuss this here.

So, given that this is a header providing utility functions, shouldn't it be in the SDK's utility library instead? I'm not sure I understand @kpet's previous point that having these here means we could have automatic conversion for C++ - you'd still need to include an extra header (i.e. not the same one that provides the cl_half definition), so surely that's no different from providing the functionality in the SDK instead?

@kpet
Copy link
Contributor

kpet commented Apr 28, 2020

I had a suspicion that was the case :). The argument was that we could with further work do that provided the conversion routines live in the headers repo. There are a number of approaches that come to mind:

  • Include cl_half.h from cl_platform.h, always or only for C++. Doesn't sound quite right but can be made fully backwards compatible.
  • Remove the existing definition in cl_platform.h and include cl_half.h from cl.h. Might break some code that includes cl_platform.h directly.
  • Any other?

If we're giving up on trying to do that, then I agree that these routines are definitely better placed in the SDK. The question of the type remains though: would we want to provide a better definition for cl_half in the SDK and make the one in headers opt-out? Would we give up altogether on trying to make the right conversions automatic?

@CLAassistant
Copy link

CLAassistant commented Apr 29, 2020

CLA assistant check
All committers have signed the CLA.

@jrprice
Copy link
Contributor Author

jrprice commented Apr 29, 2020

The part that hadn't clicked for me was that a different definition of cl_half would be needed to provide these conversion functions, which is why it would need to be done here. I have no further objections :-)

I've updated the license and also added some test coverage (not at all comprehensive, but better than nothing).

Copy link
Contributor

@kpet kpet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the tests! Looks good for a first set but I'm guessing we'll want to increase coverage over time.

@kpet
Copy link
Contributor

kpet commented May 12, 2020

There's been agreement on this PR for a while, we now have tests and correct license headers. Merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants