Runs with MPAS-A dycore and CAM7 physics fail - missing variables in inic files #995

Closed
gdicker1 opened this issue Mar 14, 2024 · 13 comments
Labels: bug (Something isn't working correctly), CoupledEval3

Comments

@gdicker1
Collaborator

What happened?

Runs of the F2000dev compset on MPAS-A grids fail. This seems to be due to the combination of the MPAS-A dycore and CAM7 (a.k.a. cam_dev) physics.

The last output from a case's atm.log:

  ----- done assigning dimensions from Registry.xml -----


 Allocating fields ...
  34 MB allocated for fields on this task
  4346 MB total allocated for fields across all tasks
  ----- done allocating fields -----

Last output from cesm.log (reorganized for 1 thread):

dec0360.hsn.de.hpc.ucar.edu 124: forrtl: severe (174): SIGSEGV, segmentation fault occurred
dec0360.hsn.de.hpc.ucar.edu 124: Image              PC                Routine            Line        Source
dec0360.hsn.de.hpc.ucar.edu 124: libpthread-2.31.s  000014BDC4E318C0  Unknown               Unknown  Unknown
dec0360.hsn.de.hpc.ucar.edu 124: cesm.exe           0000000002CAE620  mpas_io_streams_m        1037  mpas_io_streams.F
dec0360.hsn.de.hpc.ucar.edu 124: cesm.exe           0000000002B40B6D  cam_mpas_subdrive        1154  cam_mpas_subdriver.F90
dec0360.hsn.de.hpc.ucar.edu 124: cesm.exe           0000000000643D5E  dyn_grid_mp_dyn_g         464  dyn_grid.F90
dec0360.hsn.de.hpc.ucar.edu 124: cesm.exe           0000000000592015  cam_comp_mp_cam_i         165  cam_comp.F90
dec0360.hsn.de.hpc.ucar.edu 124: cesm.exe           000000000057ACDD  atm_comp_nuopc_mp         635  atm_comp_nuopc.F90
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so         000014BDCC973B40  _ZN5ESMCI6FTable1     Unknown  Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so         000014BDCC973607  ESMCI_FTableCallE     Unknown  Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so         000014BDCCC5DF85  _ZN5ESMCI2VM5ente     Unknown  Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so         000014BDCC974351  c_esmc_ftablecall     Unknown  Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so         000014BDCCEEE6E0  esmf_compmod_mp_e     Unknown  Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so         000014BDCD22F851  esmf_gridcompmod_     Unknown  Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so         000014BDCD60C9E0  nuopc_driver_mp_l     Unknown  Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so         000014BDCD629055  nuopc_driver_mp_i     Unknown  Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so         000014BDCC973B40  _ZN5ESMCI6FTable1     Unknown  Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so         000014BDCC973607  ESMCI_FTableCallE     Unknown  Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so         000014BDCCC5DF85  _ZN5ESMCI2VM5ente     Unknown  Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so         000014BDCC974351  c_esmc_ftablecall     Unknown  Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so         000014BDCCEEE6E0  esmf_compmod_mp_e     Unknown  Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so         000014BDCD22F851  esmf_gridcompmod_     Unknown  Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so         000014BDCD60C9E0  nuopc_driver_mp_l     Unknown  Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so         000014BDCD628F3F  nuopc_driver_mp_i     Unknown  Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so         000014BDCD63DD80  nuopc_driver_mp_i     Unknown  Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so         000014BDCC973B40  _ZN5ESMCI6FTable1     Unknown  Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so         000014BDCC973607  ESMCI_FTableCallE     Unknown  Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so         000014BDCCC5DF85  _ZN5ESMCI2VM5ente     Unknown  Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so         000014BDCC974351  c_esmc_ftablecall     Unknown  Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so         000014BDCCEEE6E0  esmf_compmod_mp_e     Unknown  Unknown
dec0360.hsn.de.hpc.ucar.edu 124:
dec0360.hsn.de.hpc.ucar.edu 124: Stack trace terminated abnormally.

What are the steps to reproduce the bug?

The easiest way is to create a case with --compset F2000dev to get cam_dev physics and --res mpasa120_mpasa120 to get the MPAS-A dycore. After setting up, building, and submitting the case, the run will fail.

E.g. on Derecho:

./cime/scripts/create_newcase --case "${CASENAME}" --project "${PROJ}" --run-unsupported --compiler intel --res mpasa120_mpasa120 --compset F2000dev
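
The remaining steps described above (set up, build, submit) can be sketched as follows. This is a minimal sketch, assuming the standard CIME case scripts (case.setup, case.build, case.submit) are present in the case directory created by create_newcase. The run and build_and_submit helpers and the DRY_RUN variable are hypothetical names introduced here so the sequence can be printed without a CESM checkout.

```shell
# Hypothetical helper: print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Steps that follow create_newcase; the argument is the case directory.
# Each step runs only if the previous one succeeded.
build_and_submit() {
    run cd "$1" &&
    run ./case.setup &&
    run ./case.build &&
    run ./case.submit
}

# Example (prints the four commands without executing them):
#   DRY_RUN=1 build_and_submit "${CASENAME}"
```

With DRY_RUN unset, the same function would execute the steps for real, and the submitted run would be expected to fail as reported above.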

What CAM tag were you using?

cam6_3_148

What machine were you running CAM on?

CISL machine (e.g. cheyenne)

What compiler were you using?

Intel

Path to a case directory, if applicable

/glade/derecho/scratch/gdicker/F2000dev_mpasa120_intel_1710435350

Will you be addressing this bug yourself?

Any CAM SE can do this

Extra info

No response

@gdicker1 gdicker1 added the bug (Something isn't working correctly) label Mar 14, 2024
@adamrher

Can you confirm whether this occurs with F2000climo a.k.a. CAM6 physics?

Are these runs with ./xmlchange DEBUG=TRUE?

Thanks.

@gdicker1
Collaborator Author

Hi @adamrher, I can confirm that F2000climo works. I was testing the RRTMGP changes in CAM with MPAS-A, and I was able to run with F2000climo.

I have not tried with DEBUG=TRUE yet. I will update when I do.

@gdicker1
Collaborator Author

Here's one thread's content in cesm.log from a run with DEBUG=TRUE:

dec0314.hsn.de.hpc.ucar.edu 2:  ERROR:
dec0314.hsn.de.hpc.ucar.edu 2:  cam_mpas_subdriver::cam_mpas_read_static: FATAL: Failed to add 2 fields to stat
dec0314.hsn.de.hpc.ucar.edu 2:  ic input stream.
dec0314.hsn.de.hpc.ucar.edu 2: Image              PC                Routine            Line        Source
dec0314.hsn.de.hpc.ucar.edu 2: cesm.exe           000000000A913110  shr_abort_mod_mp_         114  shr_abort_mod.F90
dec0314.hsn.de.hpc.ucar.edu 2: cesm.exe           000000000A912F7A  shr_abort_mod_mp_          61  shr_abort_mod.F90
dec0314.hsn.de.hpc.ucar.edu 2: cesm.exe           0000000009DF56A2  cam_mpas_subdrive        1161  cam_mpas_subdriver.F90
dec0314.hsn.de.hpc.ucar.edu 2: cesm.exe           0000000000CE1FFF  dyn_grid_mp_setup         464  dyn_grid.F90
dec0314.hsn.de.hpc.ucar.edu 2: cesm.exe           0000000000CDC9B0  dyn_grid_mp_dyn_g         138  dyn_grid.F90
dec0314.hsn.de.hpc.ucar.edu 2: cesm.exe           0000000000957350  cam_comp_mp_cam_i         165  cam_comp.F90
dec0314.hsn.de.hpc.ucar.edu 2: cesm.exe           00000000008FEED9  atm_comp_nuopc_mp         635  atm_comp_nuopc.F90
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so         000014C4BF90DDA9  callVFuncPtr             2167  ESMCI_FTable.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so         000014C4BF90CDE8  ESMCI_FTableCallE         824  ESMCI_FTable.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so         000014C4BFD9DB72  enter                    2318  ESMCI_VMKernel.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so         000014C4BFD87010  enter                    1216  ESMCI_VM.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so         000014C4BF90E18F  c_esmc_ftablecall         981  ESMCI_FTable.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so         000014C4C03ED650  esmf_compmod_mp_e        1223  ESMF_Comp.F90
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so         000014C4C0D7B8E5  esmf_gridcompmod_        1412  ESMF_GridComp.F90
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so         000014C4C1821DFC  nuopc_driver_mp_l        2889  NUOPC_Driver.F90
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so         000014C4C180A69F  nuopc_driver_mp_i        1992  NUOPC_Driver.F90
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so         000014C4BF90DDA9  callVFuncPtr             2167  ESMCI_FTable.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so         000014C4BF90CDE8  ESMCI_FTableCallE         824  ESMCI_FTable.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so         000014C4BFD9DB72  enter                    2318  ESMCI_VMKernel.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so         000014C4BFD87010  enter                    1216  ESMCI_VM.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so         000014C4BF90E18F  c_esmc_ftablecall         981  ESMCI_FTable.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so         000014C4C03ED650  esmf_compmod_mp_e        1223  ESMF_Comp.F90
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so         000014C4C0D7B8E5  esmf_gridcompmod_        1412  ESMF_GridComp.F90
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so         000014C4C1821DFC  nuopc_driver_mp_l        2889  NUOPC_Driver.F90
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so         000014C4C180A44C  nuopc_driver_mp_i        1987  NUOPC_Driver.F90
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so         000014C4C17CF051  nuopc_driver_mp_i         487  NUOPC_Driver.F90
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so         000014C4BF90DDA9  callVFuncPtr             2167  ESMCI_FTable.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so         000014C4BF90CDE8  ESMCI_FTableCallE         824  ESMCI_FTable.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so         000014C4BFD9DB72  enter                    2318  ESMCI_VMKernel.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so         000014C4BFD87010  enter                    1216  ESMCI_VM.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so         000014C4BF90E18F  c_esmc_ftablecall         981  ESMCI_FTable.C
dec0314.hsn.de.hpc.ucar.edu 2:
dec0314.hsn.de.hpc.ucar.edu 2: Stack trace terminated abnormally.
dec0314.hsn.de.hpc.ucar.edu 2: MPICH ERROR [Rank 2] [job id 5d63df0c-2c01-4c32-88d0-b8a50fe5fa22] [Thu Mar 14 11:26:20 2024] [dec0314] - Abort(1001) (rank 2  in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1001) - process 2
dec0314.hsn.de.hpc.ucar.edu 2:
dec0314.hsn.de.hpc.ucar.edu 2: aborting job:
dec0314.hsn.de.hpc.ucar.edu 2: application called MPI_Abort(MPI_COMM_WORLD, 1001) - process 2

From a run on Derecho within "/glade/derecho/scratch/gdicker/F2000dev_mpasa120_intel_dbg_1710436541"

@briandobbins
Collaborator

Is this just a problem with the IC file? I've run this with my own analytic IC files and cam_dev physics before. I think it just needs those two missing fields (cell_gradient_coef_x and cell_gradient_coef_y).

@mgduda
Collaborator

mgduda commented Mar 14, 2024

As a temporary workaround, if testing without the frontogenesis gravity wave drag (?) scheme is acceptable, setting use_gw_front = false in CAM's namelist might suffice. It looks like the cell_gradient_coef_x and cell_gradient_coef_y fields are only read if use_gw_front or use_gw_front_igw is true: https://github.com/ESCOMP/CAM/blob/cam6_3_148/src/dynamics/mpas/driver/cam_mpas_subdriver.F90#L1152-L1162
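
The condition described above can be checked against a case's generated namelist before submitting. This is a minimal sketch, assuming each flag appears on its own line as `name = .true.` or `name = .false.`. needs_gradient_coefs is a hypothetical helper, and CaseDocs/atm_in is assumed to be where the case's generated CAM namelist is written.

```shell
# Hypothetical helper: exit 0 if either frontal gravity-wave flag is .true.
# in the given namelist file, non-zero otherwise.
# Assumes the literal spelling .true. (matched case-insensitively).
needs_gradient_coefs() {
    grep -Eiq '^[[:space:]]*use_gw_front(_igw)?[[:space:]]*=[[:space:]]*\.true\.' "$1"
}

# Example:
#   needs_gradient_coefs CaseDocs/atm_in \
#       && echo "inic file must provide cell_gradient_coef_x and cell_gradient_coef_y"
```

The check only inspects the namelist. It does not confirm that the initial-conditions file actually contains the two fields.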

@gdicker1
Copy link
Collaborator Author

Thanks @briandobbins and @mgduda for the tips.

Is this just a problem with the IC file?

It might be. From what I checked, only "atm/cam/inic/mpas/mpasa60_L32_notopo_coords_c230707.nc" has the cell_gradient_coef_x and cell_gradient_coef_y variables.
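
One way to repeat this kind of check on other inic files is to scan the NetCDF header for the two variable names. This is a minimal sketch, assuming the netCDF `ncdump` utility is available. missing_gradient_coefs is a hypothetical helper and INIC_FILE is a placeholder for the path to the file being checked.

```shell
# Hypothetical helper: read `ncdump -h` output on stdin and print the name of
# each gradient-coefficient variable that is NOT declared in the header.
missing_gradient_coefs() {
    header="$(cat)"
    for var in cell_gradient_coef_x cell_gradient_coef_y; do
        # A variable declaration looks like "<type> <name>(<dims>) ;"
        if ! printf '%s\n' "$header" | grep -Eq "[[:space:]]${var}\("; then
            echo "$var"
        fi
    done
}

# Example (INIC_FILE is a placeholder):
#   ncdump -h "${INIC_FILE}" | missing_gradient_coefs
```

Empty output means both variables are declared in the header. Any printed name is a field the file lacks.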

... setting use_gw_front = false in CAM's namelist might suffice....

I just tried a couple of these F2000dev MPAS-A runs with use_gw_front = .false. added to user_nl_cam, and they succeeded!
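
The workaround above amounts to one extra line in user_nl_cam. This is a minimal sketch of adding it without duplicating the setting on repeated runs. disable_gw_front is a hypothetical helper, and it deliberately leaves any existing use_gw_front line untouched, so an existing `.true.` entry would still need to be edited by hand.

```shell
# Hypothetical helper: append the workaround to a user_nl_cam file unless a
# use_gw_front setting is already present. Defaults to ./user_nl_cam.
disable_gw_front() {
    nl_file="${1:-user_nl_cam}"
    if ! grep -Eq '^[[:space:]]*use_gw_front[[:space:]]*=' "$nl_file" 2>/dev/null; then
        echo "use_gw_front = .false." >> "$nl_file"
    fi
}

# Example, from the case directory:
#   disable_gw_front
```

The pattern matches only use_gw_front itself, not use_gw_front_igw, which would have to be handled separately if it is enabled.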

@adamrher

As a temporary workaround, if testing without the frontogenesis gravity wave drag (?) scheme is acceptable

This was off in CAM6, so it's not terrible to omit this process in the near term. But this should get fixed for production runs as our midlatitude jets and polar vortex are too strong, and so the additional drag caused by turning the frontal scheme on does move the solution in the right direction.

This is less important at higher resolutions where these waves start to become resolved.

@adamrher

@gdicker1 if this issue is just due to missing variables in the inic file when running the frontal scheme, should we close (or rename) this issue?

@gdicker1 gdicker1 changed the title from "MPAS-A dycore doesn't work with CAM7 physics" to "Runs with MPAS-A dycore and CAM7 physics fail - missing variables in inic files" Mar 19, 2024
@gdicker1
Collaborator Author

If the issue isn't fixed, I'm not sure why it should be closed. Unless someone has regenerated the files already?

@adamrher I think the issue title was fine but I changed it to "Runs with MPAS-A dycore and CAM7 physics fail - missing variables in inic files." If that still isn't what you imagined, I don't mind if the title changes again.

@adamrher

@gdicker1 understood. You're right, the original name still conveyed this issue. I was just confused since folks have been running cam_dev with MPAS for a while now, but the issue is that our namelist_defaults have a large number of inic files without the variables required to run cam_dev.

@adamrher

Hi @gdicker1. I was looking through the issues and we don't have a general issue for bringing in L58/L93 support for MPAS. This issue is related but does not encompass the entire effort, which now includes #1102. I was going to open the issue but wanted to check with you first.

Only mpasa120 and mpasa480 are supported in cam_development. So I was thinking the issue could just provide support for those two grids -- hi-res and var-res can be a separate issue that we can address after supporting the coarser grids. Thoughts?

@gdicker1
Collaborator Author

Hi @adamrher, thanks for checking. I think this sounds reasonable, especially to add other resolutions later.

Just to add some other thoughts: other times this has come up, there wasn't agreement on what the level heights should be for L58 and L93 (but I think this has been resolved). There have also been concerns about the amount of space the (high-resolution) files could take up on CESM data servers, especially since we could have 3 versions of a similar grid (notopo, topo, and real-data).

@briandobbins
Collaborator

briandobbins commented Jul 24, 2024 via email

Projects
Status: Done
6 participants