Allow CeedScalar to be single precision #788
Conversation
I started working on modifying the Julia interface to allow single precision. The work is in this branch. For the most part I think it should be relatively straightforward. It would be useful if there were a runtime function that returned e.g. 32 or 64 depending on the value of …

Thanks @pazner! I'm not sure I understand exactly how the Julia interface works. You mean the user could supply a compiled version of libCEED from a directory other than the one they're using the Julia package from? There was some discussion on #784 about leaving things open to potentially more "exotic" types in the future, which may not be entirely defined by size. Would this be possible with the Julia interface? I think the idea was that it would be better to know the actual type rather than just its size.
Yeah, the idea would be that the Julia package comes pre-installed with a "basic version" of libCEED that is precompiled. This basic version does not include CUDA or HIP support, is compiled with the "generic" compiler flags, would use double precision for CeedScalar, etc. If the user wants to use more advanced features (GPU support, architecture-specific compiler flags, single precision, etc.), they would have to compile their own version of libCEED from source, and tell LibCEED.jl to use that library instead of its preinstalled basic version.
That's a good point. Depending on the type, that should be fine. Any interface would be fine as far as I'm concerned; it could even just return a … I just think it would be good to expose this information as a runtime function. If the user provides their own version of libCEED that uses e.g. float instead of double, then the Julia interface needs to be aware of this.

I added a draft of the code to add to the GitLab CI YAML above. I think running the core tests in f32 with FC= as a separate CI job on Noether should be fine.

But what about the separate compilation and the change to …?

Each CI job has its own working directory, and I think we only do one CI job at a time (we could increase the number of slots, but this could be an issue for GPUs).

Is the working directory just for building/running the tests, or does it temporarily copy the source as well (since we'd need to edit the header in the source)? Though I guess it wouldn't matter as long as the jobs run in serial.

We'd just use …

I tried adding a float job based on @jeremylt's suggestions above, but I see it's stuck: "This job is stuck because you don't have any active runners online or available with any of these tags assigned to them: float." When I try to click the link it provides about CI settings, I get a 404 error -- maybe because I don't have admin rights?

Let's just drop the …
Co-authored-by: Jeremy L Thompson <jeremy@jeremylt.org>
Never mind, I guess I'm just impatient. :)

Looks like the float job passed! (Assuming it truly ran in single precision, that is. The sed commands show up in the output, so it should have worked.) I think I'm going to officially remove the WIP tag from this PR -- in my opinion, we should be ready for review with the intent of squash-merging.

Are there plans to also run the tests for the various language bindings in single precision mode? Or just the C test suite?

It would be nice to have single-precision versions of the Python, Rust, and Julia CI. These should be easy to add as separate jobs in those GitHub Actions. I could do that Tuesday, possibly in a separate PR if we're eager to merge this one.

As this is experimental and evolving, I'm fine merging it without every-language CI.

@nbeams Would you like to selectively squash commits (e.g., you have some WIP commit messages) or squash-merge the whole thing?

With so many commits, I was worried that getting a nice rebase might be quite the headache, but I'll give it a go and see if I can clean it up a bit.

Cool. One (somewhat extreme) approach is to squash the entire history into one, then amend (…).

Well, I spent almost an hour trying to come up with a nice history through rebase, and I'm not sure it's that much better (and I once again appear to have messed up the Julia conflict resolution process during the rebase, because the final result has some differences from the branch as it is here, so I definitely won't be pushing that version). Personally, I'm okay with squash-merging the whole thing and replacing the default squash-merge commit message with a descriptive message explaining the major changes (addition of the new headers, addition of the new enum, changes to the tests to use epsilon, and various changes to the other language bindings to get CeedScalar information). There are a lot of files changed, but almost everything fits within those categories, I think? However, I can try again if there is a strong preference for keeping separate commits. I don't think I understand your suggestion @jedbrown, how to pick out individual commits after a squash -- if you want me to try this, do you have a link to an explanation of this somewhere? There were some minor bugfixes included in this branch (notably changing the location of …).

I'll just go ahead and merge. We can screen-share a demo on a smaller PR sometime. Basically, you run …

Congrats @nbeams on this big lift. 🚀 And thanks to @pazner @YohannDudouit @jeremylt for the discussion and contributions.

Yes, a big thank you to everyone who contributed to this PR! A team effort, for sure. 🎉
```diff
@@ -28,15 +28,15 @@ static const char *atomicAdd = QUOTE(
 //------------------------------------------------------------------------------
 // Atomic add, for older CUDA
 //------------------------------------------------------------------------------
-__device__ double atomicAdd(double *address, double val) {
+__device__ CeedScalar atomicAdd(CeedScalar *address, CeedScalar val) {
```
I think this one should stay `double`, or simply be removed: with `float`, `atomicAdd` for `float` is always defined.
It's in CUDA 8 and later? We don't test ancient versions, and I think it doesn't hurt us to just not support versions less than 8.
This PR would expand the use of `CeedScalar` to be double or single precision. See also discussion in #784.

Summary of main changes thus far:

- New headers `ceed-f32.h` and `ceed-f64.h`. To change between the types, you change which file is included in `ceed.h`. These files contain different values for `CEED_EPSILON` and `CEED_ALIGN`.
- New `CeedScalarType` enum and `CEED_SCALAR_TYPE` definition. In libCEED C/C++ code, `CEED_SCALAR_TYPE` can be used to check for type-dependent code (e.g. whether to use the AVX basis functions, several places in the MAGMA backend, CUDA/HIP vector norms, etc.).
- Places that had `dtype="float64"` now have `dtype=scalar_types[lib.CEED_SCALAR_TYPE]`, where `scalar_types` is a Python dictionary with type strings.
- Changed `CeedQFunctionContextGetContextSize` to be a public/user function -- this was actually done because of the Python bindings. It was used in `get_data` (not sure why it wasn't causing errors? It did when I was running tests directly in ipython) but not available through `libceed.lib`, and not having the correct size (in bytes) was causing failures in one of the Python QFunction tests when using single precision. (This was fixed by 9c4a53c.)

Outstanding issues/todos:

- … (`/ceed` have been updated already)

Closes #784.