
Allow CeedScalar to be single precision #788

Merged Sep 2, 2021 (65 commits)

Conversation

@nbeams (Contributor) commented Jul 7, 2021

This PR would expand the use of CeedScalar to be double or single precision. See also discussion in #784.

Summary of main changes thus far:

  • Introduction of two different headers, ceed-f32.h and ceed-f64.h. To change between the types, you change which file is included in ceed.h. These files contain different values for CEED_EPSILON and CEED_ALIGN.
  • Addition of CeedScalarType enum and CEED_SCALAR_TYPE definition. In libCEED C/C++ code, CEED_SCALAR_TYPE can be used to check for type-dependent code (e.g. whether to use the AVX basis functions, several places in the MAGMA backend, CUDA/HIP vector norms, etc.).
  • Updating Python bindings: Most notably, places that used to have dtype="float64" now have dtype=scalar_types[lib.CEED_SCALAR_TYPE], where scalar_types is a Python dictionary with type strings.
  • Moved CeedQFunctionContextGetContextSize to be a public/user function -- this was done because of the Python bindings. It was used in get_data but was not available through libceed.lib (I'm not sure why this wasn't causing errors before; it did when I ran the tests directly in ipython), and not having the correct size (in bytes) was causing failures in one of the Python QFunction tests when using single precision. (This was fixed by 9c4a53c.)

Outstanding issues/todos:

  • Test Python bindings with CUDA -- I made some adjustments based on the changes required to fix the QFunctionContext-related failures mentioned above, but haven't tested them yet.
  • Possibly add a macro to avoid compiling MAGMA DGEMM non-tensor basis code when CeedScalar is float (currently gives compiler warnings, but code should never be executed), or add full SGEMM support.
  • Fix/decide what to do about the following tests: t312, t314, t315, t316, t321, t322, t323 in the basis tests, and t501, t502, t503, t505, t506, t507, t511, t521, t524, t530, t550, t551, t552, t553 in the operator tests. These tests have a tolerance such that adjusting them directly for float as was done for other tests in 2c97b84 would result in a very large value for the tolerance (probably more than we want).
  • Fix/decide what to do about the tests that compare directly to output files: t300, t301, t302, t304, t305, t320; and t402. (Also affects the Python versions of these tests)
  • Update other examples as needed to work with float (the examples in /ceed have been updated already)
  • Fix Fortran tests for float
  • Update Rust bindings
  • Update Julia bindings

Closes #784.

@pazner (Member) commented Jul 8, 2021

I started working on modifying the Julia interface to allow single precision. The work is in this branch.

For the most part I think it should be relatively straightforward.

It would be useful if there were a runtime function that returned, e.g., 32 or 64 depending on the value of CEED_SCALAR_TYPE. That would allow the Julia interface to detect whether it was misconfigured. Since the user can supply their own version of the library, we cannot know a priori whether CeedScalar is 32 or 64 bits.

@nbeams (Contributor, author) commented Jul 8, 2021

Thanks @pazner!

I'm not sure I understand exactly how the Julia interface works. You mean the user could supply a compiled version of libCEED from a directory other than the one they're using the Julia package from?

There was some discussion on #784 about leaving things open to potentially more "exotic" types in the future, which may not be entirely defined by size. Would this be possible with the Julia interface? I think the idea was that it would be better to know the actual type rather than just its size.

@pazner (Member) commented Jul 8, 2021

> I'm not sure I understand exactly how the Julia interface works. You mean the user could supply a compiled version of libCEED from a directory other than the one they're using the Julia package from?

Yeah, the idea would be that the Julia package comes pre-installed with a "basic version" of libCEED that is precompiled. This basic version does not include CUDA or HIP support, is compiled with the "generic" compiler flags, would use double precision for CeedScalar, etc.

If the user wants to use more advanced features (GPU support, architecture-specific compiler flags, single precision, etc.), they would have to compile their own version of libCEED from source, and tell LibCEED.jl to use that library instead of its preinstalled basic version.

> There was some discussion on #784 about leaving things open to potentially more "exotic" types in the future, which may not be entirely defined by size. Would this be possible with the Julia interface? I think the idea was that it would be better to know the actual type rather than just its size.

That's a good point. Depending on the type, that should be fine. Any interface would be fine as far as I'm concerned; it could even just return a CeedScalarType enum value.

I just think it would be good to expose this information as a runtime function. If the user provides their own version of libCEED that uses e.g. float instead of double, then the Julia interface needs to be aware of this.

@jeremylt (Member) commented

I added a draft of the code to add to the GitLab CI YAML above. I think running the core tests in f32 with FC= as a separate CI job on Noether should be fine.

@nbeams (Contributor, author) commented Aug 31, 2021

But what about the separate compilation and change to ceed.h? Can we ensure this job will not run in parallel with any of the other CI jobs on Noether?

@jedbrown (Member) commented

Each CI job has its own working directory, and I think we only run one CI job at a time (we could increase the number of slots, but that could be an issue for GPUs).

@nbeams (Contributor, author) commented Aug 31, 2021

Is the working directory just for building/running the tests, or does it temporarily copy the source as well (since we'd need to edit the header in the source)? Though I guess it wouldn't matter as long as the jobs run in serial.

@jedbrown (Member) commented

We'd just use sed to edit the header. gitlab-runner is responsible for cleaning the repo before the next job runs.

@nbeams (Contributor, author) commented Sep 1, 2021

I tried adding a float job based on @jeremylt's suggestions above, but I see it's stuck: "This job is stuck because you don't have any active runners online or available with any of these tags assigned to them: float." When I try to click the link it provides about CI settings, I get a 404 error -- maybe because I don't have admin rights?

@nbeams nbeams marked this pull request as ready for review September 1, 2021 21:52
@jedbrown (Member) commented Sep 1, 2021

Let's just drop the float label for now. I can add the runner slot, but we'll want to avoid overloading the GPU. I probably need to configure a one-node SLURM setup or find a different solution to moderate over-subscription. (CPUs have the lovely property that oversubscription just makes them slow, but GPUs crash.)

@nbeams nbeams force-pushed the natalie/ceedscalar-generalize branch from bfbfd12 to 08af2bf Compare September 1, 2021 22:22
@nbeams (Contributor, author) commented Sep 1, 2021

Since it's "stuck," does it have to be manually stopped or something? I'm not seeing GitLab showing up after my force push.

Never mind, I guess I'm just impatient. :)

@nbeams (Contributor, author) commented Sep 1, 2021

Looks like the float job passed! (Assuming it truly ran in single precision, that is. The sed commands show up on the output, so it should have worked.)

I think I'm going to officially remove the WIP tag from this PR -- in my opinion, we should be ready for review with the intent of squash-merging.

@nbeams nbeams added 1-In Review and removed 0-WIP labels Sep 1, 2021
@pazner (Member) commented Sep 1, 2021

Are there plans to also run the tests for the various language bindings in single precision mode? Or just the C test suite?

@jeremylt (Member) commented Sep 1, 2021

It would be nice to have single precision versions of Python, Rust, and Julia CI. These should be easy to add as separate jobs in those GitHub actions. I could do that Tuesday, possibly in a separate PR if we're eager to merge this one.

@jedbrown (Member) commented Sep 2, 2021

As this is experimental and evolving, I'm fine merging it without every-language CI.

@jedbrown (Member) commented Sep 2, 2021

@nbeams Would you like to selectively squash commits (e.g., you have some WIP commit messages) or squash-merge the whole thing?

@nbeams (Contributor, author) commented Sep 2, 2021

With so many commits, I was worried that getting a nice rebase might be quite the headache, but I'll give it a go and see if I can clean it up a bit.

@jedbrown (Member) commented Sep 2, 2021

Cool. One (somewhat extreme) approach is to squash the entire history into one, then amend (git citool and some others like Emacs Magit can help) to split out some parts that deserve to be separate. I'm also okay with the whole thing being one commit if you think that makes sense.

@nbeams (Contributor, author) commented Sep 2, 2021

Well, I spent almost an hour trying to come up with a nice history through rebase, and I'm not sure the result is much better. (I also appear to have once again messed up the Julia conflict resolution during the rebase -- the final result has some differences from the branch as it is here -- so I definitely won't be pushing that version.)

Personally, I'm okay with squash-merging the whole thing and replacing the default squash-merge commit message with a descriptive message explaining the major changes (addition of the new headers, addition of new enum, changes to the tests to use epsilon, and various changes to the other language bindings to get CeedScalar information). There are a lot of files changed, but most everything fits within those categories, I think? However, I can try again if there is a strong preference for keeping separate commits. I don't think I understand your suggestion @jedbrown, how to pick out individual commits after a squash -- if you want me to try this, do you have a link to an explanation of this somewhere?

There were some minor bugfixes included in this branch (notably changing the location of CeedQFunctionContextGetContextSize). @jeremylt mentioned that I might want to do a PR with just those changes that points to release rather than main -- are we starting this new workflow yet? The problem is that changes that are strictly bugfixes are not nicely separated from other changes in the commits, since I wasn't yet thinking in terms of this framework when "dividing up" commits. So we'd still ultimately have some differences of git history between release and this branch/main even without a squash-merge over here (can't just cherry-pick commits to go to release).

@jedbrown (Member) commented Sep 2, 2021

I'll just go ahead and merge. We can screenshare demo on a smaller PR sometime. Basically, you run git citool, which brings up a GUI and click "Amend", which allows you to deselect files/hunks/lines from the last commit, commit, then add new commits.

@jedbrown jedbrown merged commit 80a9ef0 into main Sep 2, 2021
@jedbrown jedbrown deleted the natalie/ceedscalar-generalize branch September 2, 2021 21:28
@jedbrown (Member) commented Sep 2, 2021

Congrats @nbeams on this big lift. 🚀 And thanks to @pazner @YohannDudouit @jeremylt for the discussion and contributions.

@nbeams (Contributor, author) commented Sep 2, 2021

Yes, a big thank you to everyone who contributed to this PR! A team effort, for sure. 🎉

@@ -28,15 +28,15 @@ static const char *atomicAdd = QUOTE(
 //------------------------------------------------------------------------------
 // Atomic add, for older CUDA
 //------------------------------------------------------------------------------
-__device__ double atomicAdd(double *address, double val) {
+__device__ CeedScalar atomicAdd(CeedScalar *address, CeedScalar val) {
Review comment (Contributor):

I think this one should stay double, or simply be removed for float; atomicAdd for float is always defined.

Reply (Member):

It's in CUDA-8 and later? We don't test ancient versions and I think it doesn't hurt us to just not support versions less than 8.
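
Following the reviewer's point, the shim could be guarded so it is only compiled when it is actually needed. This is a sketch, not the code merged in the PR: the guard macro CEED_SCALAR_IS_FP64 is an assumption, while the CAS loop itself is the standard double-precision atomicAdd workaround from the CUDA programming guide (needed only before sm_60, since atomicAdd for float has always been provided):

```cuda
// Sketch (guard name is an assumption): compile the double-precision
// atomicAdd shim only when CeedScalar is double and the target architecture
// predates native double atomicAdd (sm_60); CUDA always provides atomicAdd
// for float, so no shim is needed in single precision.
#if defined(CEED_SCALAR_IS_FP64) && defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 600
__device__ double atomicAdd(double *address, double val) {
  unsigned long long int *address_as_ull = (unsigned long long int *)address;
  unsigned long long int old = *address_as_ull, assumed;
  do {
    assumed = old;
    // Retry the compare-and-swap until no other thread modified *address
    old = atomicCAS(address_as_ull, assumed,
                    __double_as_longlong(val + __longlong_as_double(assumed)));
  } while (assumed != old);
  return __longlong_as_double(old);
}
#endif
```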

@nbeams nbeams mentioned this pull request Mar 20, 2024

Successfully merging this pull request may close these issues.

Compiling with CeedScalar defined as float
5 participants