
Memory leak in RK4 #137

Closed

mark-petersen opened this issue Jan 22, 2019 · 13 comments

@mark-petersen
Contributor

We don't run RK4 very often. Running EC60to30 on 576 cores on grizzly with an optimized gnu build, the model runs for 7 days and then dies with memory errors.

@mark-petersen
Contributor Author

@xylar and @pwolfram, I'm bringing this issue over from MPAS-Dev/MPAS#1374 into the newer MPAS-Model repo.

@mark-petersen
Contributor Author

This is what I see. I am running from commit 2763249 on mark-petersen/ocean/smoothBLD_Luke_Phil_Todd. Identical runs with split-explicit time stepping do not have this problem. I expect to see the same thing on MPAS-Dev/ocean/develop, but have not checked.

gr1112.localdomain> mpirun -n 576 ocean_model_smoothBLD_Luke_Phil_Todd
App launch reported: 16 (out of 16) daemons - 540 (out of 576) procs
Operating system error: Operating system error: Cannot allocate memory
Allocation would exceed memory limit
...
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[46293,1],286]
  Exit code:    1
--------------------------------------------------------------------------
gr1112.localdomain> pwd
/lustre/scratch2/turquoise/mpeterse/runs/c67n/forward

@sbrus89

sbrus89 commented Jan 23, 2019

@mark-petersen, @pwolfram, and @xylar, I also just experienced this same error today.

@mark-petersen
Contributor Author

I am working with @hrmoncada to find the memory leak. He is using both valgrind and totalview on LANL IC, running simple examples for practice and then trying to find memory leaks in MPAS-Ocean with both split-explicit and RK4 time stepping.

We are having trouble, though. The valgrind output is very hard to interpret and only reports total leaked-memory numbers. We have also struggled with totalview: we can't get it to load the code correctly and point to the actual source lines. We compiled with gnu and DEBUG (-g), but I suspect the netcdf/pnetcdf/pio libraries are confusing totalview.

@philipwjones and @amametjanov, if you could advise on how to find memory leaks, or assist @hrmoncada with this, that would be very helpful. This problem is blocking some of the work by @sbrus89 and @xylar, so it is a priority.

I scanned by eye and with grep to check for matching allocate/deallocate calls in MPAS-Ocean. I found a few unmatched ones here: #142, but not the main one causing trouble in RK4. Thanks!

@amametjanov
Contributor

Here is an E3SM test on Anvil:

$ ./cime/scripts/create_test --no-build SMS_D_Ln5.T62_oQU120.CMPASO-NYF --machine anvil --compiler gnu
$ cd $case-dir
$ ./xmlchange --id PIO_TYPENAME --val netcdf
$ ./xmlchange --id HIST_OPTION --val none
$ ./pelayout 
Comp  NTASKS  NTHRDS  ROOTPE
CPL :     36/     1;      0
ATM :     36/     1;      0
LND :     36/     1;      0
ICE :     36/     1;      0
OCN :     36/     1;      0
ROF :     36/     1;      0
GLC :     36/     1;      0
WAV :     36/     1;      0
ESP :      1/     1;      0
$ ./case.setup -r && ./case.build
$ qsub -I -q acme -A ACME -l nodes=1:ppn=36,walltime=1:00:00
$ source .env_mach_specific.sh
$ cd run/
$ mpiexec -n 1 valgrind -v --leak-check=yes --track-origins=yes ../bld/e3sm.exe : -n 35 ../bld/e3sm.exe >> e3sm.log 2>&1

And there appear to be no leaks in the default configuration:

==163534== HEAP SUMMARY:
==163534==     in use at exit: 0 bytes in 0 blocks
==163534==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==163534==
==163534== All heap blocks were freed -- no leaks are possible
==163534==
==163534== ERROR SUMMARY: 54320 errors from 91 contexts (suppressed: 26 from 9)
==163534==
==163534== 1 errors in context 1 of 91:
==163534== Conditional jump or move depends on uninitialised value(s)
...

Line numbers are printed for local sources: e.g.

==163534== Conditional jump or move depends on uninitialised value(s)
==163534==    at 0x979DE58: MPIDI_CH3_Rndv_transfer (in /blues/gpfs/home/software/spack/opt/spack/linux-x86_64/gcc-5.3.0/mvapich2-2.2b-yzk3up5jlbhoyyfgu7tuezfbh2dhee6f/lib/libmpi.so.12.0.5)
==163534==    by 0x9788D0B: MPIDI_CH3_PktHandler_RndvClrToSend (in /blues/gpfs/home/software/spack/opt/spack/linux-x86_64/gcc-5.3.0/mvapich2-2.2b-yzk3up5jlbhoyyfgu7tuezfbh2dhee6f/lib/libmpi.so.12.0.5)
==163534==    by 0x97A5AB9: MPIDI_CH3I_SMP_read_progress (in /blues/gpfs/home/software/spack/opt/spack/linux-x86_64/gcc-5.3.0/mvapich2-2.2b-yzk3up5jlbhoyyfgu7tuezfbh2dhee6f/lib/libmpi.so.12.0.5)
==163534==    by 0x979C289: MPIDI_CH3I_Progress_test (in /blues/gpfs/home/software/spack/opt/spack/linux-x86_64/gcc-5.3.0/mvapich2-2.2b-yzk3up5jlbhoyyfgu7tuezfbh2dhee6f/lib/libmpi.so.12.0.5)
==163534==    by 0x978F5AC: MPID_Isend (in /blues/gpfs/home/software/spack/opt/spack/linux-x86_64/gcc-5.3.0/mvapich2-2.2b-yzk3up5jlbhoyyfgu7tuezfbh2dhee6f/lib/libmpi.so.12.0.5)
==163534==    by 0x971D3CC: PMPI_Isend (in /blues/gpfs/home/software/spack/opt/spack/linux-x86_64/gcc-5.3.0/mvapich2-2.2b-yzk3up5jlbhoyyfgu7tuezfbh2dhee6f/lib/libmpi.so.12.0.5)
==163534==    by 0x92A6A18: mpi_isend (in /blues/gpfs/home/software/spack/opt/spack/linux-x86_64/gcc-5.3.0/mvapich2-2.2b-yzk3up5jlbhoyyfgu7tuezfbh2dhee6f/lib/libmpifort.so.12.0.5)
==163534==    by 0x531497: __mpas_dmpar_MOD_mpas_dmpar_exch_group_start_send (mpas_dmpar.f90:7644)
==163534==    by 0x5379B9: __mpas_dmpar_MOD_mpas_dmpar_exch_group_full_halo_exch (mpas_dmpar.f90:6916)
==163534==    by 0x537CF0: __mpas_dmpar_MOD_mpas_dmpar_field_halo_exch (mpas_dmpar.f90:6983)
==163534==    by 0xCD73B0: __ocn_time_integration_split_MOD_ocn_time_integrator_split (mpas_ocn_time_integration_split.f90:411)
==163534==    by 0xCD364E: __ocn_time_integration_MOD_ocn_timestep (mpas_ocn_time_integration.f90:110)
==163534==  Uninitialised value was created
==163534==    at 0xA638A4A: brk (in /lib64/libc-2.12.so)
==163534==    by 0xA638AF4: sbrk (in /lib64/libc-2.12.so)
==163534==    by 0x97DF7BD: _int_malloc (in /blues/gpfs/home/software/spack/opt/spack/linux-x86_64/gcc-5.3.0/mvapich2-2.2b-yzk3up5jlbhoyyfgu7tuezfbh2dhee6f/lib/libmpi.so.12.0.5)
==163534==    by 0x97E02D1: calloc (in /blues/gpfs/home/software/spack/opt/spack/linux-x86_64/gcc-5.3.0/mvapich2-2.2b-yzk3up5jlbhoyyfgu7tuezfbh2dhee6f/lib/libmpi.so.12.0.5)
==163534==    by 0x975ECFC: MPIU_Handle_obj_alloc_unsafe (in /blues/gpfs/home/software/spack/opt/spack/linux-x86_64/gcc-5.3.0/mvapich2-2.2b-yzk3up5jlbhoyyfgu7tuezfbh2dhee6f/lib/libmpi.so.12.0.5)
==163534==    by 0x97734D0: MPIDI_CH3U_Recvq_FDU_or_AEP (in /blues/gpfs/home/software/spack/opt/spack/linux-x86_64/gcc-5.3.0/mvapich2-2.2b-yzk3up5jlbhoyyfgu7tuezfbh2dhee6f/lib/libmpi.so.12.0.5)
==163534==    by 0x978E8BF: MPID_Irecv (in /blues/gpfs/home/software/spack/opt/spack/linux-x86_64/gcc-5.3.0/mvapich2-2.2b-yzk3up5jlbhoyyfgu7tuezfbh2dhee6f/lib/libmpi.so.12.0.5)
==163534==    by 0x974C369: MPIC_Irecv (in /blues/gpfs/home/software/spack/opt/spack/linux-x86_64/gcc-5.3.0/mvapich2-2.2b-yzk3up5jlbhoyyfgu7tuezfbh2dhee6f/lib/libmpi.so.12.0.5)
==163534==    by 0x957A074: MPIR_Gather_MV2_Direct (in /blues/gpfs/home/software/spack/opt/spack/linux-x86_64/gcc-5.3.0/mvapich2-2.2b-yzk3up5jlbhoyyfgu7tuezfbh2dhee6f/lib/libmpi.so.12.0.5)
==163534==    by 0x957BA1E: MPIR_Gather_index_tuned_intra_MV2 (in /blues/gpfs/home/software/spack/opt/spack/linux-x86_64/gcc-5.3.0/mvapich2-2.2b-yzk3up5jlbhoyyfgu7tuezfbh2dhee6f/lib/libmpi.so.12.0.5)
==163534==    by 0x952BF6F: MPIR_Gather_impl (in /blues/gpfs/home/software/spack/opt/spack/linux-x86_64/gcc-5.3.0/mvapich2-2.2b-yzk3up5jlbhoyyfgu7tuezfbh2dhee6f/lib/libmpi.so.12.0.5)
==163534==    by 0x952CB2B: PMPI_Gather (in /blues/gpfs/home/software/spack/opt/spack/linux-x86_64/gcc-5.3.0/mvapich2-2.2b-yzk3up5jlbhoyyfgu7tuezfbh2dhee6f/lib/libmpi.so.12.0.5)

Full log is at

/lcrc/group/acme/azamat/acme_scratch/anvil/SMS_D_Ln5.T62_oQU120.CMPASO-NYF.anvil_gnu.20190228_135852_wwch45/run/e3sm.log

@xylar
Collaborator

xylar commented Mar 1, 2019

@amametjanov, I'm pretty sure E3SM is only configured to run MPAS-Ocean with split-explicit time stepping (@mark-petersen or @jonbob, please correct me if I'm wrong). So I don't think the above test would have caught the RK4 memory leak we're concerned about.

@philipwjones
Contributor

@hrmoncada and @mark-petersen - can you point me to the valgrind output on IC? And were you running with @amametjanov's arguments (i.e., --track-origins)?

@mark-petersen
Contributor Author

I'm pretty sure E3SM is only configured to run MPAS-Ocean with split-explicit time stepping ... So I don't think the above test would have caught the RK4 memory leak we're concerned about.

That is correct.

@hrmoncada has run a test with me using stand-alone MPAS-Ocean with
config_time_integrator = 'RK4'
Both valgrind and totalview show memory leaks. To use totalview on IC, only the intel compiler worked (compiled with DEBUG=true). Here are the two types of output from totalview. The line numbers probably refer to the C-preprocessed code; just look at the MB-sized entries for now.

Leak detection source report
[screenshot from 2019-02-28 12-32-14]

Leak detection backtrace report
[screenshot from 2019-02-28 12-45-30]

@mark-petersen
Contributor Author

mark-petersen commented Mar 13, 2019

I know the general problem but not the specific solution.

Summary:
In RK4 we clone the state pool and then destroy it on each timestep, because it holds a copy of the RHS of the timestepping scheme for every single state variable. The clone is created in the framework, which cycles over all variables in the source pool to make an exact copy.

The problem must be that the pool destroy routine is missing a corresponding deallocate for a particular type of variable, or is not deallocating it correctly. A cloned pool exists only in RK4, and nowhere else in MPAS, which is why this is not a problem in E3SM.

Thanks @matthewhoffman for a helpful conversation.

Details
Looking closely at the memory trace above:
[screenshot: totalview memory trace]

Pools are created and cloned here. The line numbers don't exactly match those in the trace above, in either the .F or .f90 files; I'm not sure why.

src/core_ocean/mode_forward/mpas_ocn_time_integration_rk4.F
  74    subroutine ocn_time_integrator_rk4(domain, dt)!{{{
 207       block => domain % blocklist
 208       do while (associated(block))
 209          call mpas_pool_get_subpool(block % structs, 'state', statePool)
 
 213          call mpas_pool_create_pool(provisStatePool)
 215          call mpas_pool_clone_pool(statePool, provisStatePool, 1)

Pools are destroyed here:

src/core_ocean/mode_forward/mpas_ocn_time_integration_rk4.F
 707       block => domain % blocklist
 708       do while(associated(block))
 709          call mpas_pool_get_subpool(block % structs, 'provis_state', provisStatePool)
 711          call mpas_pool_destroy_pool(provisStatePool)
 713          call mpas_pool_remove_subpool(block % structs, 'provis_state')

In the framework, the variables are cloned here for 3D reals. The trace above indicates that field3d_real is the problem:

src/framework/mpas_pool_routines.F
 479    recursive subroutine mpas_pool_clone_pool(srcPool, destPool, overrideTimeLevels)!{{{

 757                else if (associated(dptr % r3a)) then
 758                   if ( newmem % data % contentsTimeLevs > 1) then
 759                      allocate(newmem % data % r3a(newmem % data % contentsTimeLevs))
 760                      minTimeLevels = min(dptr % contentsTimeLevs, newmem % data % contentsTimeLevs)
 761                      do j = 1, minTimeLevels
 762                         call mpas_duplicate_field(dptr % r3a(j), newmem % data % r3)
 763                         newmem % data % r3a(j) = newmem % data % r3
 764                         deallocate(newmem % data % r3)
 765                      end do
 766
 767                      do j = minTimeLevels, newmem % data % contentsTimeLevs
 768                      call mpas_duplicate_field(dptr % r3a(dptr % contentsTimeLevs), newmem % data % r3)
 769                         newmem % data % r3a(j) = newmem % data % r3
 770                         deallocate(newmem % data % r3)
 771                      end do

and destroyed here

src/framework/mpas_pool_routines.F
203    recursive subroutine mpas_pool_destroy_pool(inPool)!{{{
 329                   else if (associated(dptr % r3a)) then
 330                      do j=1,dptr % contentsTimeLevs
 331                         if (associated(dptr % r3a(j) % array)) then
 332                            deallocate(dptr % r3a(j) % array, stat=local_err)
 333                         end if
 334                      end do
 335                      deallocate(dptr % r3a, stat=local_err)

Comparing these two blocks of code, the 3D field must not be fully deallocated: every allocation made by mpas_duplicate_field must be matched by a deallocate. More soon...

@mark-petersen
Copy link
Contributor Author

I think I found it. Change the initial index from minTimeLevels to minTimeLevels+1 on the second do loop, here:

src/framework/mpas_pool_routines.F
767                      do j = minTimeLevels+1, newmem % data % contentsTimeLevs

In the code right now,
newmem % data % r3a(minTimeLevels) % array
is allocated twice, with two calls to mpas_duplicate_field. I'm not sure what happens when an array is allocated twice and deallocated once, but it is definitely a memory leak hazard.
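
For illustration only, here is a minimal stand-alone sketch of the same pattern (this is not the MPAS code; field3d, r3a, and the array sizes are made up). Because the second loop starts at minTimeLevels instead of minTimeLevels+1, the pointer at that index is re-associated by a second allocation, and the array from the first loop can never be deallocated:

program overlap_leak
   implicit none

   ! Stand-in for the MPAS field type: only the pointer component matters here.
   type :: field3d
      real, dimension(:,:,:), pointer :: array => null()
   end type field3d

   integer, parameter :: nTimeLevs = 2, minTimeLevels = 1
   type(field3d) :: r3a(nTimeLevs)
   integer :: j

   ! First loop: duplicate the existing time levels (j = 1 .. minTimeLevels).
   do j = 1, minTimeLevels
      allocate(r3a(j) % array(60, 60, 60))     ! stands in for mpas_duplicate_field
   end do

   ! Buggy second loop: it starts at minTimeLevels, so that index is allocated
   ! a second time. The pointer is simply re-associated, and the first array
   ! from the loop above becomes unreachable.
   do j = minTimeLevels, nTimeLevs              ! fix: do j = minTimeLevels+1, nTimeLevs
      allocate(r3a(j) % array(60, 60, 60))
   end do

   ! Destroy, as mpas_pool_destroy_pool does: one deallocate per index frees
   ! only the most recent allocation, so the orphaned copy leaks on every call.
   do j = 1, nTimeLevs
      if (associated(r3a(j) % array)) deallocate(r3a(j) % array)
   end do
end program overlap_leak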

@hrmoncada, @xylar, or @sbrus89, if any of you could retest your memory-leak failure with that change, it would be most helpful. I can then make a PR. There are corresponding changes for r1a, r2a, r4a, and r5a.

@mark-petersen
Contributor Author

I emailed Doug, and he kindly responded:

Yes, that would be a memory leak, and your fix looks correct.

Another tool you can use in the future is Intel Inspector, which is useful for helping to find threading race conditions, and memory issues (like memory leaks).

A memory leak is when you allocate memory and never deallocate it, so the OS thinks it's still in use but you can't actually access it anymore. If you do this once per time step, and you have many, many time steps, eventually you run out of available memory to allocate anything new, and the program dies.

In this case, you allocate space for a field, and store a pointer to it. Then you allocate space for the same field, and overwrite the pointer so it points to the new field rather than the old one. Then the first allocation is left dangling forever.
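
As a minimal sketch of the pattern Doug describes (again, not the MPAS code; the provis name and sizes are made up): each pass through the time loop allocates through a pointer and never deallocates, so memory use grows with the number of steps until an allocation fails, which matches the out-of-memory failure seen in the long RK4 runs above.

program per_step_leak
   implicit none
   real, dimension(:,:,:), pointer :: provis => null()
   integer :: step

   do step = 1, 100000
      ! "Clone" a provisional work array for this step.
      allocate(provis(60, 60, 60))
      provis = real(step)
      ! Missing deallocate(provis): on the next step the pointer is
      ! re-associated and this step's array becomes unreachable, so the
      ! leaked memory accumulates until allocate eventually fails.
   end do
end program per_step_leak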

mark-petersen added a commit that referenced this issue Apr 8, 2019
Currently, the subroutine `mpas_pool_clone_pool` double-allocates arrays
due to an index error, so that `mpas_duplicate_field` is called twice
(by mistake) but deallocate is only called once. This PR fixes that
index. This error previously caused ocean simulations using RK4 time
stepping to die with out-of-memory errors.

fixes #137
@mark-petersen
Contributor Author

Fixed by #185. Will appear on ocean/develop very soon, after develop is merged in.

@pwolfram
Contributor

I'm not 100% sure this issue was entirely resolved; it may be related to #350.
