fully coupled model does not compile or run due to mpas-framework issues. #31

Closed
dazlich opened this issue Feb 15, 2024 · 10 comments
Labels: bug (Something isn't working), release (Related to a release version of the code)


@dazlich
Contributor

dazlich commented Feb 15, 2024

The mpas-framework is keeping the fully coupled model from compiling. Specifically, there are issues with mpas_constants.F at the link step. Inspection shows that the mpas-framework version of this file is identical to the cam version, which it should not be.

Backing off to a previous version of mpas_constants.F in mpas-framework permits compilation, but now the fully coupled model does not run. I am seeing errors similar to when the cam framework version was updated with the Post cesm2_3_alpha17a Updates at the beginning of the month. It appears that the mpas-framework files I modified then now need to be changed to match the v7openacc version.

I've created a modify-mpas-framework branch to work on this. This may also be a good place to consider how we will deal with the two frameworks in the future.

@dazlich
Contributor Author

dazlich commented Feb 16, 2024

Ok, I've got the coupled model working again for intel, intel-oneapi, gnu, and nvhpc on derecho. Of course, nvhpc doesn't restart.

@gdicker1
Contributor

This is partially addressed by EarthWorksOrg/mpas-framework#6. However, as this comment in the PR discussion shows, we still have issues with GPU builds of the FullyCoupled compset.


Error (copied)

nvlink error   : Multiple definition of 'mpas_dmpar_mpas_dmpar_exch_halo_3d_real_acc_8350_gpu' in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libocn.a:mpas_dmpar.o', first defined in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libice.a:mpas_dmpar.o'
nvlink error   : Multiple definition of 'mpas_dmpar_mpas_dmpar_exch_halo_3d_real_acc_8308_gpu' in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libocn.a:mpas_dmpar.o', first defined in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libice.a:mpas_dmpar.o'
nvlink error   : Multiple definition of 'mpas_dmpar_mpas_dmpar_exch_halo_2d_real_acc_7433_gpu' in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libocn.a:mpas_dmpar.o', first defined in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libice.a:mpas_dmpar.o'
nvlink error   : Multiple definition of 'mpas_dmpar_mpas_dmpar_exch_halo_2d_real_acc_7393_gpu' in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libocn.a:mpas_dmpar.o', first defined in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libice.a:mpas_dmpar.o'
nvlink error   : Multiple definition of 'mpas_dmpar_mpas_dmpar_exch_halo_1d_real_acc_6519_gpu' in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libocn.a:mpas_dmpar.o', first defined in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libice.a:mpas_dmpar.o'
nvlink error   : Multiple definition of 'mpas_dmpar_mpas_dmpar_exch_halo_1d_real_acc_6481_gpu' in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libocn.a:mpas_dmpar.o', first defined in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libice.a:mpas_dmpar.o'
nvlink fatal   : merge_elf failed
pgacclnk: child process exit status 2: /glade/u/apps/common/23.04/spack/opt/spack/nvhpc/23.5/Linux_x86_64/23.5/compilers/bin/tools/nvdd
gmake: *** [/glade/work/gdicker/EarthWorks/EWRepo_PullRequests/2024Feb20_MPASfrwk6/cases/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/Tools/Makefile:978: /glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/cesm.exe] Error 2
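A long nvlink log like the one above is easier to read once the duplicated symbols are grouped by the pair of archive members that define them. The following is a minimal sketch, assuming the log lines keep the exact `nvlink error : Multiple definition of ... in ..., first defined in ...` shape shown above; the function names are hypothetical and not part of any EarthWorks or CIME tooling.

```python
import re
from collections import defaultdict

# Shape of one duplicate-symbol line, taken from the nvlink output above.
NVLINK_DUP = re.compile(
    r"nvlink error\s*:\s*Multiple definition of '(?P<symbol>[^']+)' "
    r"in '(?P<second>[^']+)', first defined in '(?P<first>[^']+)'"
)


def archive_member(path):
    """Reduce '/long/path/libocn.a:mpas_dmpar.o' to 'libocn.a:mpas_dmpar.o'."""
    return path.rsplit("/", 1)[-1]


def summarize_duplicates(log_text):
    """Group duplicated symbols by the (first, second) objects defining them."""
    groups = defaultdict(list)
    for match in NVLINK_DUP.finditer(log_text):
        key = (archive_member(match["first"]), archive_member(match["second"]))
        groups[key].append(match["symbol"])
    return dict(groups)


def report(log_text):
    """Print one block per conflicting pair of archive members."""
    for (first, second), symbols in summarize_duplicates(log_text).items():
        print(f"{first} vs {second}: {len(symbols)} duplicated symbol(s)")
        for symbol in symbols:
            print(f"  {symbol}")
```

Passing the contents of a build log to `report` would list each conflicting pair once. For the log above, every match points at `mpas_dmpar.o` in both `libice.a` and `libocn.a`, which is consistent with the same framework source being compiled into both component libraries.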

@dazlich
Contributor Author

dazlich commented Feb 20, 2024 via email

@dazlich
Contributor Author

dazlich commented Feb 21, 2024 via email

@gdicker1
Contributor

Looks like the GPU compilation is more sensitive to this multiple definition problem than CPU compilation.

I did something quick and dirty last night that could address this and I want you to try. My work is in ~dazlich/ewv2.1/EarthWorks/components/mpas-*
The idea is to replace the ‘mpas’ in framework module names with a string that the preprocessor can modify, and the ocean and seaice can use different values. The string I chose is ‘MPASSO’ which doesn’t occur anywhere else in earthworks if your search is case-sensitive. Specifically:
In mpas-framework/src I renamed all Fortran files from mpas_*.F to mpasso_*.F in the framework and operators subdirectories. In each of the files I changed the module name from mpas_* to MPASSO_*. In all the module use statements I changed use mpas_* to use MPASSO_*.
In the mpas-[ocean,seaice]/cime_config/buildlib I added a -DMPASSO=mpaso for ocean and mpass for seaice. In all the .F and .F90 files in the src and driver_nuopc subdirectories I changed the module use statements from use mpas_* to use MPASSO_*.

This quick and dirty change should fix the multiple definition problem until we tackle creating the one shared framework for all mpas components.
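The textual part of the rename described above can be sketched as a small script. This is only an illustration under stated assumptions: the regular expression, the function names, and the sample module names are hypothetical, and the sketch covers only the `module` and `use` statement rewrite and the file rename, not the buildlib change or how the preprocessor later resolves the `MPASSO` token.

```python
import re

# 'module mpas_x', 'end module mpas_x', and 'use mpas_x' in any case.
# 'module procedure ...' and plain calls such as 'call mpas_x' do not match.
MODULE_OR_USE = re.compile(r"\b(module|use)(\s+)mpas_", re.IGNORECASE)


def rename_framework_modules(source, token="MPASSO"):
    """Rewrite module and use statements from mpas_* to <token>_*."""
    return MODULE_OR_USE.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{token}_", source
    )


def renamed_filename(name, old_prefix="mpas_", new_prefix="mpasso_"):
    """Map a framework file name such as mpas_dmpar.F to mpasso_dmpar.F."""
    if name.startswith(old_prefix):
        return new_prefix + name[len(old_prefix):]
    return name
```

Only statements that begin with `module` or `use` are touched, so subroutine and function names that start with `mpas_` keep their spelling. Note that the pattern also rewrites matching text inside comments, which is harmless for compilation but worth knowing when reviewing a diff.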

I haven’t put my changes into a branch yet so just grab from ~dazlich/ewv2.1/EarthWorks/components/mpas-*

It looks like copying these into a fresh clone of EarthWorks failed for me

From the case.build output:

...
cam built in 1248.923534 seconds
ERROR: BUILD FAIL: mpassi.buildlib failed, cat /glade/derecho/scratch/gdicker/2024Feb21_101946_gpu-EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/ice.bldlog.240221-102010
BUILD FAIL: mpaso.buildlib failed, cat /glade/derecho/scratch/gdicker/2024Feb21_101946_gpu-EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/ocn.bldlog.240221-102010
ERROR: case.build failed

Excerpt from the mentioned "/glade/derecho/scratch/gdicker/2024Feb21_101946_gpu-EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/ice.bldlog.240221-102010":

NVFORTRAN-S-0142-halogrouppool is not a component of this OBJECT (/glade/derecho/scratch/gdicker/2024Feb21_101946_gpu-EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/ice/source/mpas_halo.F: 182)
NVFORTRAN-S-0142-halogroups is not a component of this OBJECT (/glade/derecho/scratch/gdicker/2024Feb21_101946_gpu-EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/ice/source/mpas_halo.F: 189)

Derecho-specific files/paths:

  • My EW sandbox on Derecho: /glade/work/gdicker/EarthWorks/EWRepo_PullRequests/2024Feb20_MPASfrwk6/ew_daz-qAd-frwk
  • Full info about this run (all commands issued+results): /glade/work/gdicker/EarthWorks/EWRepo_PullRequests/2024Feb20_MPASfrwk6/EarthWorks/tools/Earthworks_scripts/EWv2_CreateBuildRun/log.gpudaz.att02.txt

(This is just an update, I'm looking into it some more)

@dazlich
Contributor Author

dazlich commented Feb 21, 2024 via email

@dazlich
Contributor Author

dazlich commented Feb 21, 2024 via email

@gdicker1
Contributor

Thanks for the catch! Let me empty those dirs and re-try the copies.
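When renamed files are copied over an existing checkout, old mpas_*.F sources can remain next to their mpasso_*.F replacements. The thread does not confirm that this caused the failure above, so treat the following as a hedged sketch of one check that could be run before re-copying; the function name and the default prefixes are assumptions for illustration.

```python
from pathlib import Path


def stale_framework_sources(directory, old_prefix="mpas_", new_prefix="mpasso_"):
    """Return old-prefix .F files whose renamed counterpart also exists."""
    stale = []
    for old in sorted(Path(directory).rglob(f"{old_prefix}*.F")):
        # Build the name the file would have after the rename.
        renamed = old.with_name(new_prefix + old.name[len(old_prefix):])
        if renamed.exists():
            stale.append(old)
    return stale
```

An empty result means no old-prefix file sits beside a renamed copy in the tree that was searched. It says nothing about files already staged into a case's bld directory, which would need to be checked or cleaned separately.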

gdicker1 added the bug and release labels Mar 6, 2024

@dazlich
Contributor Author

dazlich commented Mar 8, 2024

@gdicker1 @areanddee - I think I am just about ready to issue PRs for the framework-ext-ref branches of the mpas-framework, mpas-ocean, and mpas-seaice repositories. The only sticking point is that some of the mpas-seaice files will have to be merged between this branch and the prescribed-seaice-mode branch. It is just a few files (cime_config/buildlib, driver_nuopc/*.F90), and any editing will be simple.

@gdicker1
Contributor

As I mentioned in the previous comment, I think this discussion can be closed since CPU builds are fixed (due to #32 being merged). I think the GPU side of the conversation is best continued in #36.
