fully coupled model does not compile or run due to mpas-framework issues. #31
Comments
Ok, I've got the coupled model working again for intel, intel-oneapi, gnu, and nvhpc on derecho. Of course, nvhpc doesn't restart. |
This is partially addressed by EarthWorksOrg/mpas-framework#6. However, as this comment in the PR discussion shows, we still have issues with GPU builds of the FullyCoupled compset.
Error (copied):
nvlink error : Multiple definition of 'mpas_dmpar_mpas_dmpar_exch_halo_3d_real_acc_8350_gpu' in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libocn.a:mpas_dmpar.o', first defined in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libice.a:mpas_dmpar.o'
nvlink error : Multiple definition of 'mpas_dmpar_mpas_dmpar_exch_halo_3d_real_acc_8308_gpu' in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libocn.a:mpas_dmpar.o', first defined in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libice.a:mpas_dmpar.o'
nvlink error : Multiple definition of 'mpas_dmpar_mpas_dmpar_exch_halo_2d_real_acc_7433_gpu' in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libocn.a:mpas_dmpar.o', first defined in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libice.a:mpas_dmpar.o'
nvlink error : Multiple definition of 'mpas_dmpar_mpas_dmpar_exch_halo_2d_real_acc_7393_gpu' in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libocn.a:mpas_dmpar.o', first defined in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libice.a:mpas_dmpar.o'
nvlink error : Multiple definition of 'mpas_dmpar_mpas_dmpar_exch_halo_1d_real_acc_6519_gpu' in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libocn.a:mpas_dmpar.o', first defined in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libice.a:mpas_dmpar.o'
nvlink error : Multiple definition of 'mpas_dmpar_mpas_dmpar_exch_halo_1d_real_acc_6481_gpu' in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libocn.a:mpas_dmpar.o', first defined in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libice.a:mpas_dmpar.o'
nvlink fatal : merge_elf failed
pgacclnk: child process exit status 2: /glade/u/apps/common/23.04/spack/opt/spack/nvhpc/23.5/Linux_x86_64/23.5/compilers/bin/tools/nvdd
gmake: *** [/glade/work/gdicker/EarthWorks/EWRepo_PullRequests/2024Feb20_MPASfrwk6/cases/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/Tools/Makefile:978: /glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/cesm.exe] Error 2
|
We need to get working on the one shared framework issue.
|
Looks like the GPU compilation is more sensitive to this multiple definition problem than CPU compilation.
I did something quick and dirty last night that could address this and I want you to try. My work is in ~dazlich/ewv2.1/EarthWorks/components/mpas-*
The idea is to replace the ‘mpas’ in framework module names with a string that the preprocessor can modify, and the ocean and seaice can use different values. The string I chose is ‘MPASSO’ which doesn’t occur anywhere else in earthworks if your search is case-sensitive. Specifically:
In mpas-framework/src I renamed all Fortran files from mpas_*.F to mpasso_*.F in the framework and operators subdirectories. In each of the files I changed the module name from mpas_* to MPASSO_*. In all the module use statements I changed use mpas_* to use MPASSO_*.
In mpas-[ocean,seaice]/cime_config/buildlib I added -DMPASSO=mpaso for ocean and -DMPASSO=mpass for seaice. In all the .F and .F90 files in the src and driver_nuopc subdirectories I changed the module use statements from use mpas_* to use MPASSO_*.
This quick and dirty change should fix the multiple definition problem until we tackle creating the one shared framework for all mpas components.
I haven’t put my changes into a branch yet so just grab from ~dazlich/ewv2.1/EarthWorks/components/mpas-*
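For anyone who wants to reproduce the rename step rather than copy the files, here is a minimal Python sketch of it. It is not the script that was actually used for this change; the function name `rename_framework_modules` and the regular expression are assumptions made for illustration, and it only covers the file rename plus the module and use statement rewrite described above, not the buildlib changes.

```python
import re
from pathlib import Path

# Hypothetical helper, not part of EarthWorks or MPAS.
# Matches "module mpas_x", "end module mpas_x" and "use mpas_x" at the start
# of a line; "module procedure mpas_x" lines are deliberately left alone.
_DECL = re.compile(
    r"^(\s*(?:end\s+)?module\s+|\s*use\s+)mpas_",
    re.IGNORECASE | re.MULTILINE,
)


def rename_framework_modules(src_dir, placeholder="MPASSO"):
    """Rename mpas_*.F to mpasso_*.F and rewrite module/use names in place."""
    renamed = []
    for old in sorted(Path(src_dir).glob("mpas_*.F")):
        text = _DECL.sub(lambda m: m.group(1) + placeholder + "_", old.read_text())
        # mpas_dmpar.F -> mpasso_dmpar.F (lower-case placeholder for file names)
        new = old.with_name(placeholder.lower() + old.name[len("mpas"):])
        new.write_text(text)
        old.unlink()  # drop the old file so both copies are never compiled
        renamed.append(new.name)
    return renamed
```

It would be called once per directory, for example on the framework and operators subdirectories of mpas-framework/src. The .inc files are not matched by the glob, so they stay untouched.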
|
It looks like copying these into a fresh clone of EarthWorks failed for me.
From the case.build output:
...
cam built in 1248.923534 seconds
ERROR: BUILD FAIL: mpassi.buildlib failed, cat /glade/derecho/scratch/gdicker/2024Feb21_101946_gpu-EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/ice.bldlog.240221-102010
BUILD FAIL: mpaso.buildlib failed, cat /glade/derecho/scratch/gdicker/2024Feb21_101946_gpu-EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/ocn.bldlog.240221-102010
ERROR: case.build failed
Excerpt from the mentioned "/glade/derecho/scratch/gdicker/2024Feb21_101946_gpu-EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/ice.bldlog.240221-102010":
NVFORTRAN-S-0142-halogrouppool is not a component of this OBJECT (/glade/derecho/scratch/gdicker/2024Feb21_101946_gpu-EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/ice/source/mpas_halo.F: 182)
NVFORTRAN-S-0142-halogroups is not a component of this OBJECT (/glade/derecho/scratch/gdicker/2024Feb21_101946_gpu-EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/ice/source/mpas_halo.F: 189)
Derecho-specific files/paths:
* My EW sandbox on Derecho: /glade/work/gdicker/EarthWorks/EWRepo_PullRequests/2024Feb20_MPASfrwk6/ew_daz-qAd-frwk
* Full info about this run (all commands issued+results): /glade/work/gdicker/EarthWorks/EWRepo_PullRequests/2024Feb20_MPASfrwk6/EarthWorks/tools/Earthworks_scripts/EWv2_CreateBuildRun/log.gpudaz.att02.txt
(This is just an update, I'm looking into it some more) |
Halogroup in the error messages pops out at me.
I had to add three files, mpas_halo.F, mpas_halo_types.inc, and mpas_string_utils.F, to the mpas-framework when the alpha17a updates were merged in. Then mpasav7_openacc was merged in to create v2.1. To match this, among other modifications, I had to remove those three files to make my modify-mpas-framework branch work with the v2.1 code on CPU.
If you are working with v2.1 main or develop, ocean and ice shouldn’t be compiling mpas_halo or mpas_string_utils.
|
I’m looking at your mpas-framework/src/framework now. You brought in my mpasso_*.F files but you kept the old mpas_*.F files. They need to be removed - be sure to keep the mpas_*.inc files, though. Same in the operators directory.
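A small Python sketch of that cleanup, under the assumption that every stale mpas_*.F file has an mpasso_*.F counterpart sitting next to it. The helper name `remove_stale_sources` is hypothetical, and the sketch is more conservative than the instruction above: it only deletes an old file when the renamed copy exists, and it never touches the mpas_*.inc files.

```python
from pathlib import Path


def remove_stale_sources(src_dir):
    """Delete mpas_*.F files that have an mpasso_*.F counterpart; keep *.inc."""
    removed = []
    for old in sorted(Path(src_dir).glob("mpas_*.F")):
        counterpart = old.with_name("mpasso" + old.name[len("mpas"):])
        if counterpart.exists():
            old.unlink()
            removed.append(old.name)
    return removed
```

Running it once on the framework directory and once on the operators directory, then checking the returned list, shows exactly which files were dropped. Any mpas_*.F file left behind has no renamed copy and needs a manual look.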
|
Thanks for the catch! Let me empty those dirs and re-try the copies. |
@gdicker1 @areanddee - I think I am just about ready to issue PRs for the framework-ext-ref branches of the mpas-framework, mpas-ocean, and mpas-seaice repositories. The only sticking point is that some of the mpas-seaice files will have to be merged between this branch and the prescribed-seaice-mode branch. It is just a few files (cime_config/buildlib, driver_nuopc/*.F90), and any editing will be simple. |
The mpas-framework is keeping the fully coupled model from compiling. Specifically, there are issues with mpas_constants.F at the link step. Inspection shows that the mpas-framework version is identical to the cam version, and it shouldn't be.
Backing off to a previous version of mpas_constants.F in mpas-framework permits compilation, but now the fully coupled model does not run. I am seeing errors similar to when the cam framework version was updated with the Post cesm2_3_alpha17a Updates at the beginning of the month. It appears that the mpas-framework files I modified then now need to be changed to match the v7openacc version.
I've created a modify-mpas-framework branch to work on this. This may also be a good place to consider how we will deal with the two frameworks in the future.
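The "identical to the cam version" check can be repeated with a few lines of Python. This is only a sketch: the helper name `same_contents` is made up for illustration, and the two paths to mpas_constants.F depend on the local checkout, so they are left to the caller.

```python
import filecmp


def same_contents(path_a, path_b):
    """Return True if the two files are byte-for-byte identical."""
    # shallow=False forces a content comparison instead of trusting os.stat() data
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Passing the mpas-framework copy and the cam copy of mpas_constants.F should return True for the situation described above.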