Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build workflow utilities with CMake. #264

Merged
merged 8 commits into from
Feb 19, 2021
Merged

Conversation

aerorahul
Copy link
Contributor

This PR enables the utilities under sorc/ to be built using cmake. modulefiles for hera, orion and wcoss_dell_p3 are added that load hpc-stack.

These workflow utilities were compiled on hera and orion, but not on wcoss_dell_p3
These workflow utilities were compiled on macOS, EMC Linux box (RHEL7), Docker with GNU and IntelOneAPI containers.
All utilities built as expected. They were not tested against the hand-written makefiles on the supported HPC's for reproducibility.

Minor changes in 3 Fortran source codes syndat_qctropcy.fd/qctropcy.f, gfs_bufr.fd/meteorg.f, and fv3nc2nemsio.fd/fv3_main.f90 were required to get these utilities to be compiled with GNU Fortran.

@KateFriedman-NOAA
Copy link
Member

I cloned the feature/cmake branch on Hera and ran build_workflow_utils.sh without doing anything else. I got this error:

-bash-4.2$ pwd
/scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc
-bash-4.2$ git branch
* feature/cmake
-bash-4.2$ sh build_workflow_utils.sh
++ uname -s
+ [[ Linux == Darwin ]]
++ which readlink
+ cmd=/usr/bin/readlink
++++ /usr/bin/readlink -f -n build_workflow_utils.sh
+++ dirname /scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc/build_workflow_utils.sh
++ cd /scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc
++ pwd -P
+ readonly UTILS_DIR=/scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc
+ UTILS_DIR=/scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc
+ target=NULL
+ modulefile=/scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc/../modulefiles/workflow_utils.NULL
+ [[ -f /scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc/../modulefiles/workflow_utils.NULL ]]
+ [[ -d nceplibs-ncio ]]
+ git clone -b develop https://github.com/noaa-emc/nceplibs-ncio
Cloning into 'nceplibs-ncio'...
remote: Enumerating objects: 28, done.
remote: Counting objects: 100% (28/28), done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 198 (delta 11), reused 16 (delta 6), pack-reused 170
Receiving objects: 100% (198/198), 107.73 KiB | 4.90 MiB/s, done.
Resolving deltas: 100% (102/102), done.
+ cd nceplibs-ncio
+ mkdir -p build
+ cd build
+ cmake -DCMAKE_INSTALL_PREFIX=../install
CMake Error: The source directory "/scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc/nceplibs-ncio/build" does not appear to contain CMakeLists.txt.
Specify --help for usage, or press the help button on the CMake GUI.

I noticed it didn't know the target (other build scripts source machine-setup.sh) so I first added "target=hera" to test and then tried adding a line to source machine-setup.sh. It got "hera" as the target either way but it still error:

-bash-4.2$ sh build_workflow_utils.sh
++ uname -s
+ [[ Linux == Darwin ]]
++ which readlink
+ cmd=/usr/bin/readlink
++++ /usr/bin/readlink -f -n build_workflow_utils.sh
+++ dirname /scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc/build_workflow_utils.sh
++ cd /scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc
++ pwd -P
+ readonly UTILS_DIR=/scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc
+ UTILS_DIR=/scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc
+ source ./machine-setup.sh
+ target=hera
+ modulefile=/scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc/../modulefiles/workflow_utils.hera
+ [[ -f /scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc/../modulefiles/workflow_utils.hera ]]
+ set +x

The following have been reloaded with a version change:
  1) impi/2018.4.274 => impi/2018.0.4


Currently Loaded Modules:
  1) cmake/3.16.1       4) hpc-intel/18.0.5.274   7) jasper/2.0.22  10) w3emc/2.7.3  13) bacio/2.4.1   16) nemsiogfs/2.5.3  19) bufr/11.4.0
  2) hpc/1.1.0          5) impi/2018.0.4          8) zlib/1.2.11    11) sp/2.3.3     14) w3nco/2.4.1   17) sigio/2.3.2      20) hdf5/1.10.6
  3) intel/18.0.5.274   6) hpc-impi/2018.0.4      9) png/1.6.35     12) ip/3.3.3     15) nemsio/2.5.2  18) g2/3.4.1         21) netcdf/4.7.4



+ [[ -d nceplibs-ncio ]]
+ rm -rf nceplibs-ncio
+ git clone -b develop https://github.com/noaa-emc/nceplibs-ncio
Cloning into 'nceplibs-ncio'...
remote: Enumerating objects: 28, done.
remote: Counting objects: 100% (28/28), done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 198 (delta 11), reused 16 (delta 6), pack-reused 170
Receiving objects: 100% (198/198), 107.73 KiB | 5.67 MiB/s, done.
Resolving deltas: 100% (102/102), done.
+ cd nceplibs-ncio
+ mkdir -p build
+ cd build
+ cmake -DCMAKE_INSTALL_PREFIX=../install
CMake Warning:
  No source or binary directory provided.  Both will be assumed to be the
  same as the current working directory, but note that this warning will
  become a fatal error in future CMake releases.


CMake Error: The source directory "/scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc/nceplibs-ncio/build" does not appear to contain CMakeLists.txt.
Specify --help for usage, or press the help button on the CMake GUI.

I'm not experienced with cmake...am I missing something? Thanks!

… compiled files. bugfix in build_workflow_utils.sh hack. nceplibs-ncio now creates the module ncio and not fv3gfs_ncio.
@aerorahul
Copy link
Contributor Author

@KateFriedman-NOAA
My apologies. The hack was missing a path to the source directory for nceplibs-ncio.
It has been fixed. I tested on Orion:
target=orion ./build_workflow_utils.sh

@KateFriedman-NOAA
Copy link
Member

@aerorahul Thanks for the fix! I updated my clone of feature/cmake on Hera and all workflow util codes built, woohoo!

These workflow utilities were compiled on hera and orion, but not on wcoss_dell_p3

Question...why not on wcoss_dell_p3? Are we waiting for a library update there?

@aerorahul
Copy link
Contributor Author

I did not build on wcoss_dell_p3. I was lazy. But, it should build.
When hpc-stack gets nceplibs-ncio installed on all machines within hpc-stack, we can remove the hack that is in build_workflow_utils.sh and uncomment the line in modulefiles/workflow_utils.<target> that loads the ncio module.

@KateFriedman-NOAA
Copy link
Member

@aerorahul Gotcha, I'll try the build on WCOSS-Dell and report any issues.

I ran a cycle test overnight on Hera with feature/cmake. The gdasechgres job segfaulted:

/scratch1/NCEPDEV/stmp4/Kate.Friedman/comrot/cmake/logs/2020090118/gdasechgres.log.1

I reran with module_base.hera using hpc-stack to test that but it segfaulted similarly:

/scratch1/NCEPDEV/stmp4/Kate.Friedman/comrot/cmake/logs/2020090118/gdasechgres.log

I see mention of module_ncio, do we need to load an additional module at runtime? Thanks!

Also, there is some additional updating/cleanup/decommissioning work to be done on feature/cmake (e.g. remove unneeded build scripts). I had to do some updating to get the branch ready to run so I'll finish that and commit those updates.

@aerorahul
Copy link
Contributor Author

@KateFriedman-NOAA
module_ncio is a fortran module and not a modulefile.
I will investigate the failure.

The cleanup/decommissioning should be done in a separate branch and not in feature/cmake and in a follow up PR.
You can push changes that you need to make the branch ready to run.

@aerorahul
Copy link
Contributor Author

@KateFriedman-NOAA
I suspect there is an issue in the ipolates routine coming from ip/3.3.3.
Can you confirm that your branch feature/hpc-stack works on hera for enkf_chgres_recenter_nc.fd?
I will tag @GeorgeGayno-NOAA and @kgerheiser as they have been in the weeds of ip lately.

…_nc.fd. When going from nemsio to netcdf, the linking of ip, sp and w3nco changed from _d to _4.
@aerorahul
Copy link
Contributor Author

@KateFriedman-NOAA
I found the copy/paste error from enkf_chgres_recenter.fd to enkf_chgres_recenter_nc.fd in CMakeLists.txt. When going from nemsio to netcdf, the linking of ip, sp and w3nco changed from _d to _4.
I tested the program and it now terminates normally.

@KateFriedman-NOAA
Copy link
Member

@aerorahul Thanks for the fix! I updated my clone on Hera, reran build_workflow_utils.sh and then rewound the failed gdasechgres job. The job completed successfully!

/scratch1/NCEPDEV/stmp4/Kate.Friedman/comrot/cmake/logs/2020090118/gdasechgres.log

I also ran the gdasechgres job in my feature/hpc-stack test (symlinked workflow execs to feature/cmake execs) and it also completed successfully:

/scratch1/NCEPDEV/stmp4/Kate.Friedman/comrot/stacktest1/logs/2020090118/gdasechgres.log

I am letting my cmake test on Hera continue forward another cycle. I want to run a cycle on both Orion and WCOSS-Dell, and repeat the test on Hera for a reproducibility check.

I'll have a commit shortly to add workflow_utils to the build_all and linking scripts so it builds/links out of the box again. Further cleanup and decommissioning will happen in a follow-up PR.

- add build_workflow_utils.sh to build_all.sh
- add workflow utils exec symlinks to link_fv3gfs.sh since these execs no longer land in /exec folder
- add workflow_utils build to partial_build.sh and fv3gfs_build.cfg for use by build_all
- turn off builds in fv3gfs_build.cfg that are now covered by workflow_utils build
- move machine-setup source in build_workflow_utils.sh before target is needed

Further cleanup and decommissioning of consolidated builds will come in follow-up commit/PR.

Refs: #264
sorc/build_all.sh Outdated Show resolved Hide resolved
- Add $target to build_workflow_utils.sh command in build_all.sh
- Move machine-setup source back down into if-block in build_workflow_utils.sh

Refs: #264
@KateFriedman-NOAA
Copy link
Member

@aerorahul The analcalc jobs are failing in my test of feature/cmake on Hera. Can you take a look? Thanks!

/scratch1/NCEPDEV/stmp4/Kate.Friedman/comrot/cmake/logs/2020090200/gdasanalcalc.log
/scratch1/NCEPDEV/stmp4/Kate.Friedman/comrot/cmake/logs/2020090200/gfsanalcalc.log

@aerorahul
Copy link
Contributor Author

@WalterKolczynski-NOAA @KateFriedman-NOAA
Can you as part of the review check and make sure that all the compiler flags and library linking was accurately transferred from the Makefile to CMakeLists.txt.

@WalterKolczynski-NOAA
Copy link
Contributor

@WalterKolczynski-NOAA @KateFriedman-NOAA
Can you as part of the review check and make sure that all the compiler flags and library linking was accurately transferred from the Makefile to CMakeLists.txt.

Will do

Copy link
Contributor

@WalterKolczynski-NOAA WalterKolczynski-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did what I could to cross-check against existing makefiles/build scripts with my rudimentary knowledge of cmake and the library dependencies. I suspect most of my comments will be either covered by cmake picking up library dependencies or just inconsequential changes. It's also possible I missed a flag being set somewhere, though I tried to check both the build script and the makefile.

sorc/fv3nc2nemsio.fd/CMakeLists.txt Outdated Show resolved Hide resolved
sorc/fv3nc2nemsio.fd/fv3_main.f90 Show resolved Hide resolved
modulefiles/workflow_utils.hera Outdated Show resolved Hide resolved
modulefiles/workflow_utils.orion Outdated Show resolved Hide resolved
modulefiles/workflow_utils.wcoss_dell_p3 Outdated Show resolved Hide resolved
sorc/syndat_maksynrc.fd/CMakeLists.txt Show resolved Hide resolved
sorc/tave.fd/CMakeLists.txt Show resolved Hide resolved
sorc/tocsbufr.fd/CMakeLists.txt Show resolved Hide resolved
sorc/vint.fd/CMakeLists.txt Show resolved Hide resolved
sorc/CMakeLists.txt Show resolved Hide resolved
@aerorahul
Copy link
Contributor Author

@WalterKolczynski-NOAA
Thank you for a careful review and cross-examination of compiler flags and dependencies.
As you can see, the ingredients are sprinkled across files and there is no "convention" or "consistency".
Please examine the responses and mark the ones that you are satisfied as "resolved".
I will work on what is left and request a re-review.

@WalterKolczynski-NOAA
Copy link
Contributor

Sorry for the hold-up. I've resolved all the ones that are definitely taken care of. I will leave it to your discretion about what to do with the remainder and whether they are impactful or not.

@aerorahul
Copy link
Contributor Author

@WalterKolczynski-NOAA Thanks again for the careful review.
There are no more changes from me on this PR.

@WalterKolczynski-NOAA
Copy link
Contributor

@KateFriedman-NOAA Go ahead and merge if you are satisfied.

Copy link
Member

@KateFriedman-NOAA KateFriedman-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for cmake-ifying these codes @aerorahul ! Tested branch on Hera, Orion, and WCOSS-Dell. Ran fine. Duplicate run on Hera reproduced output.

@KateFriedman-NOAA
Copy link
Member

Reran build_workflow_utils.sh on WCOSS-Dell, Hera, and Orion after commit at 1d7776d. Build ran fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants