Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify and unify Makefile build system #219

Merged
merged 61 commits into from
Dec 2, 2021
Merged

Conversation

bozbez
Copy link
Collaborator

@bozbez bozbez commented Nov 14, 2021

Changes

  • Removal of old, unmaintained CMake build system.
  • Unification of op2/c and op2/fortran source folders, with op2/fortran moving to op2/include/fortran and op2/src/fortran.
  • Removal of auto-generated source files from the apps/, with a new make generate rule to build.
  • Full re-write of the GNU Make build system.

New build system

The old build system was entirely removed and a new one with the following structure implemented:

  • makefiles/common.mk - This sets variables common to both the OP2 and app builds, such as the OP2 include and library directories as well as the compilers and flags (by including makefiles/compilers/*) and the include/link flags of the dependencies.
  • makefiles/compilers/{c, c_cuda, fortran}/*.mk - These are Makefiles follow a common structure and define the compiler executables, flags, and features for each compiler. The selection of these is controlled by OP2_COMPILER and/or OP2_{C, C_CUDA, FORTRAN}_COMPILER
  • makefiles/profiles/*.mk - These follow a custom structure that splits the Makefiles into a "pre" and "post" section, where the "pre" section is included at the start of common.mk and the "post" section at the end. The intention here is to allow for machine specific configuration of compilers, libraries and flags without needing to change the generic compiler makefiles. Variables set in the "pre" section override the defaults set by common.mk and the compiler makefiles, while variables set in the "post" section can append to them.
  • makefiles/{c, f}_app.mk - These are generic makefiles that define rules to build all of the possible app variants (e.g. [mpi_]{seq, genseq, openmp, openmp4, cuda}), and are included by each of the makefiles in the app directories with a selection of input variables such as APP_NAME and VARIANT_FILTER.
  • op2/Makefile - This defines the build for the OP2 libraries in op2/lib (by default). Object lists for each of the static libraries are first defined, and then the object builds are handled by pattern rules.
  • apps/*/Makefile - These simply define APP_NAME and sometimes VARIANT_FILTER[_OUT] and then include common.mk and {f, c}_app.mk.

As well as reducing duplication and inconsistency, the new system also allows for parallel builds of the OP2 libraries as well as the apps. In addition, incremental builds with automatically generated header dependencies (via -MMD -MP) now work.

Building

The process is kept as similar as possible to the previous system, but with more configurability. For a simple build:

  1. Set your compiler toolchain: export OP2_COMPILER={gnu, cray, intel, ...}
  2. Set your ParMETIS, PT-Scotch and HDF5 library directories: export {PARMETIS, PTSCOTCH, HDF5}_INSTALL_PATH=<path>
  3. Build the OP2 libraries: cd op2; make -j.
  4. Build an app: cd apps/c/airfoil_hdf5; make generate; make -j.

TODO

  • Fix parallel compilation of FORTRAN apps when using nvfortran.
  • Allow specification of both sequential and parallel HDF5 builds, and set app variants accordingly (Fix/hdf5 compile #218).
  • Update build instructions in the READMEs.
  • Fix missing #includes in headers that occasionally break incremental compilation.
  • Test the _openmp4 app builds with an offloading-enabled compiler.
  • Update the CI pipeline.

bozbez and others added 30 commits October 21, 2021 09:52
@gihanmudalige
Copy link
Collaborator

@reguly any comment on this ?
I am sort of leaning towards the simple source files we have already - just to showcase the setting up of the environment. But happy to do otherwise.

@reguly
Copy link
Collaborator

reguly commented Nov 24, 2021

I never really used the source files committed to these repos, they were always hardcoded to your setup :) so I am fine without having those committed in as @bozbez suggests

@gihanmudalige
Copy link
Collaborator

@reguly actually I was more considering whether its useful to a user to see one of these source files. Of course the hard coded setup in the profiles also has the same issue.

@reguly
Copy link
Collaborator

reguly commented Nov 25, 2021

I don't think that is particularly relevant, as all the environment variables a properly documented.

@gihanmudalige
Copy link
Collaborator

@reguly ok. great. Lets leave it at that. I am happy with this.

@gihanmudalige
Copy link
Collaborator

@bozbez Thanks, your most recent push fixed the build issue and Hydra is subsequently running the basic Rotor 37 problem correctly. Continuing testing with the coupler now.

@bozbez
Copy link
Collaborator Author

bozbez commented Nov 29, 2021

@bozbez I can certainly see your point. I can see similarities with the source files with what you have in makefiles/profiles/cirrus-intel-nvhpc.mk, Now in it we have things like NVCC :=/lustre/sw/nvidia/hpcsdk-219/Linux_x86_64/21.9/compilers/bin/nvcc which surely will be a module load on Cirrus, then simply using nvcc will work for compilation. Given as you say module loads should not be used in a Makefile, are these "profiles" useful at all ? A user could think that something that can be built with a simple module load needs the absolute paths set to get OP2 working. On HPC clusters sometimes finding these paths can be quite tricky. Besides if Cirrus changes these locations then that would be again problematic in terms of utility and maintenance. In which case we should altogether not have such profiles.

So the reason the nvcc executable is hardcoded there is because of the Cirrus MPI module struggling to sort out it's paths when there was more than one compiler loaded - in particular if you load intel-compilers-19 and then nvidia/nvhpc then you ended up with a broken mpicxx. Since the compilers on Cirrus are manually inserting rpaths into /lustre/sw already so that you don't need module loads to run previously compiled executables, I think that hardcoding the path here is an acceptable workaround.

I think its also worth noting the difference between hardcoding paths to shared resources on known clusters and hardcoding paths into a user folder which others can't access.

The intention of the profiles, and the reason they are there instead of source files, is that they allow the user to set make variables after the "configuration" of the compilers and the libraries. This allows you to easily override or the append to e.g. CXXFLAGS and HDF5_LIB, whereas source files would only immediately allow you to override the variables. The solution here for the source files would be to add _EXTRA variables to the end of each configure variable (which might still be worth doing). Even with the _EXTRA variables though, the profiles still have the advantage of being able to do conditional environment setup using ifdef and the lot.

Source files still have a place, but rather than using them for cluster common setup like -march= and compiler paths they should be used as a shortcut to set OP2_PROFILE and X_INSTALL_PATH as well as doing the module loads. However I think rather than committing them to the repo, it is reasonable to expect the user to produce these in a directory above as a shortcut to manually setting the variables each time - as long as the variables are clearly documented.

@gihanmudalige
Copy link
Collaborator

I have tested OP2-Hydra and OP2-Hydra with the coupler with this new OP2 branch. All tests passes. We should get Arun to switch to this version, when we are ready to merge this pull request.

@bozbez bozbez marked this pull request as ready for review December 1, 2021 11:56
@bozbez
Copy link
Collaborator Author

bozbez commented Dec 1, 2021

So unfortunately I haven't been able to actually get an OpenMP 4.0 offloading build to offload - on Cirrus with the NVHPC SDK compilers the binaries fail to find the GPU, and on my new work machine with GCC built for offloading the same happens but perhaps this time it might be that the GCC offloading doesn't support Ampere. The offload compilers are being invoked however, and the PTX is generated, but no luck at runtime.

Since the builds complete and compiler flags seem fine and are equivalent to the old build system I am going to mark this task as complete despite not being able to actually run the binaries.

In general I think this is pretty much ready to merge now, there are perhaps a few things that might need tweaking over time (and perhaps for Spack) but that should not pose a significant problem given the level of flexibility in the unified system.

@reguly
Copy link
Collaborator

reguly commented Dec 1, 2021 via email

@bozbez
Copy link
Collaborator Author

bozbez commented Dec 1, 2021

I was using the default loaded by nvidia/nvhpc which was 21.9 I think? I'll give it another go (if the login node stops timing out...).

@gihanmudalige
Copy link
Collaborator

@bozbez looks good. @reguly and @bozbez we should now move the current master to a release/2020 branch and merge this branch to master. Assuming this naming convention is ok. This PR is working with Hydra as I noted above, but I will need to push some minor modifications to Hydra which I can do soon.

Additionally, I was thinking of making a default develop branch (as we do in OPS). Did we have an agreement on if this is ok ? I would rather like to keep both OP2 and OPS similar as possible.

@reguly
Copy link
Collaborator

reguly commented Dec 1, 2021 via email

@reguly
Copy link
Collaborator

reguly commented Dec 1, 2021 via email

@bozbez
Copy link
Collaborator Author

bozbez commented Dec 1, 2021

Personally I don't think we gain anything from having a develop branch and a master branch; the feature development is already done in separate feature branches and the stable versions are now in release branches.

I agree that we should try and align with OPS though, so there's that.

@gihanmudalige
Copy link
Collaborator

@reguly thanks for creating the relevant branches. So we are going with the convention of v1.0 etc. (no problem). I suppose this PR and also the completion of the other tasks will soon get us to v2.0 ? Particularly the new code gen should take us to 2.0 I think.

W.R.T develop branch, I think it will help external developers PR on to the develop branch and then us being able to test it and merge to develop. This way we are sure that the master branch is always working and we periodically merge in develop branch as major features are added. So in affect we have a leading edge branch and a "stable" branch.

@reguly
Copy link
Collaborator

reguly commented Dec 1, 2021 via email

@bozbez bozbez merged commit a312d66 into master Dec 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants