Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prescribed soil moisture (and LAI) streams are not threadsafe #791

Closed
ekluzek opened this issue Aug 23, 2019 · 12 comments · Fixed by #883
Closed

prescribed soil moisture (and LAI) streams are not threadsafe #791

ekluzek opened this issue Aug 23, 2019 · 12 comments · Fixed by #883
Assignees
Labels
bug something is working incorrectly
Milestone

Comments

@ekluzek
Copy link
Collaborator

ekluzek commented Aug 23, 2019

Brief summary of bug

Streams in cime for cime5.6.20 are not threadsafe. So any calls to streams needs to be done outside a clumps loop.

@olyson @swensosc

General bug information

CTSM version you are using: release-clm5.0.27

Does this bug cause significantly incorrect results in the model's science? No

Configurations affected:

With threading on and use_soil_moisture_streams = .true.

Details of bug

My test case is: ERP_P72x2_D_Ld3.f09_g17.I2000Clm50SpGs.cheyenne_intel.clm-prescribed

Note, the non-threaded version of above works fine.

Important details of your setup / configuration so we can reproduce the bug

Looks like the two important things are using the prescribed soil moisture streams, and running threaded.

The call in clm_driver looks like this...

    !$OMP PARALLEL DO PRIVATE (nc,bounds_clump)
    do nc = 1,nclumps
       call get_clump_bounds(nc, bounds_clump)

       if (use_soil_moisture_streams) then
          call t_startf('prescribed_sm')
          call PrescribedSoilMoistureInterp(bounds_clump, soilstate_inst, &
               waterstate_inst)
          call t_stopf('prescribed_sm')
       endif

If it's moved outside the clump loop and run over bounds_proc, it should work fine.

Important output or errors that show the problem

21:MPT ERROR: shared memory sequence number error received 46073 expected 61394
26:MPT ERROR: Assertion failed at req.h:347: "0 <= ref_count"
21:MPT ERROR: shared memory sequence number error received 61394 expected 61395
0:forrtl: severe (151): allocatable array is already allocated
0:Image PC Routine Line Source
0:cesm.exe 00000000043657BC Unknown Unknown Unknown
0:cesm.exe 0000000003916CDD shr_stream_mod_mp 1542 shr_stream_mod.F90
0:cesm.exe 00000000038FE5AB shr_stream_mod_mp 1123 shr_stream_mod.F90
0:cesm.exe 00000000036EB10A shr_dmodel_mod_mp 622 shr_dmodel_mod.F90
0:cesm.exe 0000000003891136 shr_strdata_mod_m 743 shr_strdata_mod.F90
0:cesm.exe 00000000022B7CC7 soilmoisturestrea 244 SoilMoistureStreamMod.F90
0:cesm.exe 0000000000882A6F clm_driver_mp_clm 339 clm_driver.F90
0:libiomp5.so 00002B58773B6D13 __kmp_invoke_micr Unknown Unknown
0:libiomp5.so 00002B5877386B17 Unknown Unknown Unknown
0:libiomp5.so 00002B58773861C5 Unknown Unknown Unknown
0:libiomp5.so 00002B58773B7193 Unknown Unknown Unknown
0:libdplace.so.0.0. 00002B58731BA1AD Unknown Unknown Unknown
0:libpthread.so.0 00002B58781A4734 Unknown Unknown Unknown
0:libc.so.6 00002B5878EA6D3D clone Unknown Unknown
-1:MPT ERROR: MPI_COMM_WORLD rank 0 has terminated without calling MPI_Finalize()
-1: aborting job

@ekluzek ekluzek added the bug something is working incorrectly label Aug 23, 2019
@ekluzek ekluzek added this to the cesm2.1.2 milestone Aug 23, 2019
@ekluzek ekluzek self-assigned this Aug 23, 2019
@ekluzek
Copy link
Collaborator Author

ekluzek commented Aug 23, 2019

I've documented the underlying problem in cime here...

ESMCI/cime#3216

@ekluzek ekluzek changed the title prescribed soil moisture streams is not threadsafe prescribed soil moisture (and LAI) streams are not threadsafe Aug 23, 2019
@ekluzek
Copy link
Collaborator Author

ekluzek commented Aug 23, 2019

The same problem exists for LAI streams. But, the LAI streams call is in the middle of the 3rd big Open-MP clumps loop. If order doesn't matter, I can move it to before the loop or after the loop. If order does matter (and it might) I'll need to break the loop up into two. That would likely make Open-MP a little less efficient.

@ekluzek
Copy link
Collaborator Author

ekluzek commented Aug 23, 2019

The traceback for the LAI streams issue looks like this...

0:forrtl: severe (151): allocatable array is already allocated
0:Image PC Routine Line Source
0:cesm.exe 00000000043657BC Unknown Unknown Unknown
0:cesm.exe 0000000003916CDD shr_stream_mod_mp 1542 shr_stream_mod.F90
0:cesm.exe 00000000038FE5AB shr_stream_mod_mp 1123 shr_stream_mod.F90
0:cesm.exe 00000000036EB10A shr_dmodel_mod_mp 622 shr_dmodel_mod.F90
0:cesm.exe 0000000003891136 shr_strdata_mod_m 743 shr_strdata_mod.F90
0:cesm.exe 0000000001D4A36F satellitephenolog 226 SatellitePhenologyMod.F90
0:cesm.exe 0000000001D4DBD0 satellitephenolog 336 SatellitePhenologyMod.F90
0:cesm.exe 000000000089C19A clm_driver_mp_clm 837 clm_driver.F90

@ekluzek ekluzek added the next this should get some attention in the next week or two. Normally each Thursday SE meeting. label Aug 26, 2019
@ekluzek
Copy link
Collaborator Author

ekluzek commented Aug 26, 2019

OK, to move this to a non-threaded portion I also have to create a bounds_proc filter (all filters are currently bounds_clump). If I make a bounds_clump filter loop without OMP it dies with the following message....

/glade/p/cesmdata/cseg/inputdata/lnd/clm2/surfdata_map/release-clm5.0.18/surfda
ta_0.9x1.25_hist_16pfts_Irrig_CMIP6_simyr2000_c190214.nc
Successfully read monthly vegetation data for
month 1

ERROR: get_clump_bounds ERROR: Calling from inside a non-threaded region)

@billsacks billsacks removed the next this should get some attention in the next week or two. Normally each Thursday SE meeting. label Aug 26, 2019
ekluzek added a commit to ekluzek/CTSM that referenced this issue Oct 4, 2019
@ekluzek
Copy link
Collaborator Author

ekluzek commented Oct 11, 2019

The change for LAI seems to preserve answers. But, the change for soil-moisture changes answers.

@olyson
Copy link
Contributor

olyson commented Oct 11, 2019

Here is the present day soil moisture climatology SSP585 case that used the workaround:

/glade/work/cmip6/cases/LS3MIP/f.e21.FHIST_SSP585.f09_f09_mg17.CMIP6-lfmip-pdLC.001

Here are diagnostics comparing that case to the case where the soil moisture came from:

http://webext.cgd.ucar.edu/FHIST_BGC/f.e21.FHIST_SSP585.f09_f09_mg17.CMIP6-lfmip-pdLC/001/lnd/f.e21.FHIST_SSP585.f09_f09_mg17.CMIP6-lfmip-pdLC.001.1980_2014-b.e21.BHIST.f09_g17.CMIP6-historical.011.1980_2014/setsIndex.html

We see here, for example, that soil moisture is quite similar in the two cases, differing mostly at high latitudes because of the redistribution of liquid/solid and/or crop distribution (because we didn't prescribe crop soil moisture):

http://webext.cgd.ucar.edu/FHIST_BGC/f.e21.FHIST_SSP585.f09_f09_mg17.CMIP6-lfmip-pdLC/001/lnd/f.e21.FHIST_SSP585.f09_f09_mg17.CMIP6-lfmip-pdLC.001.1980_2014-b.e21.BHIST.f09_g17.CMIP6-historical.011.1980_2014/set2/set2_ANN_H2OSOI_4.png

@ekluzek
Copy link
Collaborator Author

ekluzek commented Oct 16, 2019

OK, I'm doing more testing, and I'm reaching the conclusion that the new code (that was run in the long simulation f.e21.FHIST_SSP585.f09_f09_mg17.CMIP6-lfmip-pdLC.001) is NOT correct. I'm not completely sure why not, as it looks like it should be. But, I'm finding when I hold the streams constant using soilm_tintalgo = 'nearest', the old code maintains H2OSOI constant in time and agreeing with the input stream file -- but the new code does NOT. The new code has H2OSOI correct at nstep==0, but then starts diverging after that.

@dlawrenncar
Copy link
Contributor

dlawrenncar commented Oct 16, 2019 via email

@ekluzek
Copy link
Collaborator Author

ekluzek commented Oct 16, 2019

@dlawrenncar this does NOT affect CMIP6 simulations. It only points out a problem in the one
prescribed soil moisture simulation that @olyson did. And even further it doesn't even mean that specific simulation is even completely wrong (because it does agree with the stream value at nstep==0). I think it just means that the soil moisture for these prescribed simulations isn't completely "prescribed" it's allowed to vary more than it should.

Now, I also don't completely understand this right now. So it's possible I'll discover something else in tracking this down. But, I would assume that it's only confined to prescribed soil moisture simulations.

@dlawrenncar
Copy link
Contributor

dlawrenncar commented Oct 16, 2019 via email

@ekluzek
Copy link
Collaborator Author

ekluzek commented Oct 16, 2019

Ahh, OK. Yes, the crop soil moisture was allowed to evolve in that simulation, so crop would be completely fine. The problem that I'm finding is that I don't believe the natural vegetation soil moisture was held as tightly to the values prescribed as it should have been. So there's a certain amount of suspicion you should have for the natural vegetation soil moisture -- but not to the level of thinking that it's completely wrong.

In my testing I'm prescribing both crop and nat-veg just so I can can verify exactly what's going on. And it's just pointing out that prescribed soil moisture with the fix put in for @olyson simulation wasn't completely correct.

@ekluzek
Copy link
Collaborator Author

ekluzek commented Nov 15, 2019

Fixed in release-clm5.0.28.

@ekluzek ekluzek closed this as completed Dec 2, 2019
fritzt pushed a commit to CESM-GC/CTSM that referenced this issue May 19, 2021
AGonzalezNicolas pushed a commit to HPSCTerrSys/clm5_0 that referenced this issue Jun 27, 2024
AGonzalezNicolas pushed a commit to HPSCTerrSys/clm5_0 that referenced this issue Jul 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug something is working incorrectly
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants