controlling maxCohortsPerPatch #853
Coincidentally, I was also planning on doing a sensitivity analysis on max patches and max cohorts per patch. I started a repo here. I wasn't sure what to do about
@rgknox ELM caps natural land columns at 16+1 patches (numpft + bare ground). I tried increasing the maximum patches from 14 (10 primary and 4 secondary) to 20 (10 primary and another 10 secondary), but the run crashed with:

```
*** Error in `/global/cscratch1/sd/sshu3/e3sm_scratch/cori-haswell/sshu.cori-haswell.E4f9ce69d2-Faaa10622.2022-03-30/bld/e3sm.exe': corrupted size vs. prev_size: 0x000000000c94ac50 ***
```

Given the complexity of the ELM infrastructure, I suspect something was missed when I revised the code. Hope this information helps.
I was looking at the coupling between FATES and CLM/ELM today with @adrifoster . We need to pass the number of patches desired by FATES into CLM/ELM, and then allow CLM/ELM to use that value to allocate space for its various data structures (instead of using the value dictated by the surface dataset). Currently we don't pass anything in that direction, and it will be a little tricky to do. We found that it is in initialize1() that CLM/ELM uses the surface dataset to decide how many patches to allocate, see surfrd_get_num_patches(). But it isn't until initialize2() that we read in the FATES parameter file (where we could specify max patches), see CLMFatesGlobals(). I'd like to consult some ELM/CLM devs about their thoughts or concerns about moving the reading of the FATES parameter file to initialize1(), just prior to clm_varpar_init(), where the number of patches is first used.
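To make the ordering concrete, here is a minimal Python sketch of the problem (hypothetical names throughout; the real code is Fortran, and `fates_maxpatch` is a made-up parameter name for illustration, not an existing FATES parameter):

```python
# Hypothetical sketch, not CLM/ELM code: why reading the FATES parameter
# file would need to move ahead of the host model's patch allocation.

def allocate_patches(n_patches):
    """Stand-in for the per-patch data structures sized in initialize1()."""
    return [None] * n_patches

# initialize1(): today the patch count comes from the surface dataset ...
surface_dataset = {"num_patches": 17}  # e.g. numpft + bareground
patches = allocate_patches(surface_dataset["num_patches"])

# ... but only later, in initialize2(), is the FATES parameter file read,
# which is where a max-patches parameter would live.
fates_params = {"fates_maxpatch": 20}  # "fates_maxpatch" is illustrative

# Proposed ordering: read the parameter file first, then allocate.
patches = allocate_patches(fates_params["fates_maxpatch"])
print(len(patches))  # 20
```

The sketch only shows the data dependency: whichever source is read first wins the allocation, which is why the read order matters.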
One thing to note: we do have parameters in the parameter file that control how many cohorts and patches the model ends up maintaining. These parameters aren't caps on the maximum possible; they are the fusion tolerances.
For instance, I just ran a simulation at BCI using my nutrient-enabled branch, and found that even though my maximum possible number of cohorts was 100 per patch, I was averaging around 20 per patch in actuality. This site has one tropical broadleaf evergreen PFT. The fusion algorithm will attempt to fuse cohorts at the specified tolerance, even if the number of cohorts has not exceeded the maximum.
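As a toy illustration of why the tolerance, not the cap, ends up governing the cohort count, here is a hypothetical Python sketch. This is not the FATES fusion algorithm; the greedy pairwise rule and all names are my own simplification of the idea described above.

```python
def fuse_once(cohorts, tolerance):
    """One sweep: merge adjacent (size-sorted) cohorts whose relative
    size difference is below the tolerance. Illustrative only."""
    out, i = [], 0
    while i < len(cohorts):
        if (i + 1 < len(cohorts)
                and (cohorts[i + 1] - cohorts[i]) / cohorts[i + 1] < tolerance):
            out.append((cohorts[i] + cohorts[i + 1]) / 2.0)  # fuse the pair
            i += 2
        else:
            out.append(cohorts[i])
            i += 1
    return out

def fuse_cohorts(sizes, tolerance):
    """Repeat sweeps until no pair is similar enough to fuse."""
    cohorts = sorted(sizes)
    while True:
        fused = fuse_once(cohorts, tolerance)
        if len(fused) == len(cohorts):
            return cohorts
        cohorts = fused

# 60 closely spaced cohort sizes, nominal cap of 100: a 5% relative
# tolerance still collapses them to a handful of distinct cohorts.
sizes = [1.02 ** k for k in range(60)]
print(len(fuse_cohorts(sizes, 0.05)))  # 15
```

The point of the toy: the steady-state cohort count is set by how far apart surviving cohorts must be (the tolerance), so the hard cap of 100 is never reached.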
We can now control this; closing.
maxCohortsPerPatch controls the maximum number of cohorts that will be simulated on a patch, and is defined here:
https://github.com/NGEET/fates/blob/sci.1.55.5_api.22.1.0/main/EDTypesMod.F90#L34
It defaults to 100. That's a lot! In ED2 I think we were typically using around 20 (or trying less). This is a value I suspect many people would like to adjust (lower!), and it may be slowing down runs unnecessarily (edit: maybe, maybe not).

Lots to say here. First: it is not defined as a constant, but it is also not intended to change over the run. Its value is overwritten during the initialization sequence here: https://github.com/NGEET/fates/blob/sci.1.55.5_api.22.1.0/main/FatesInterfaceMod.F90#L777-L781 This is confusing because it is set twice. I propose we at least remove the initial value in EDTypesMod, since it is overwritten.
Second: I think it might (debating this) be better to get this value, and the max number of patches, into the parameter file. This is a little tricky for a few reasons. One is order of operations, but we have made a point of reading in the FATES parameter file early in the initialization sequence, so we should be OK there. Another is that we may have some statically allocated arrays using the max patch values, so how we allocate may have to change; not sure, I'll have to look. And finally, the maximum number of patches is also constrained by the host model, which has its own expectations and caps. @sshu88 may be investigating this.

Third: as mentioned, the total number of cohorts may be our largest lever in the fight against slow runs, so really dialing in this value and not using more cohorts than we need is important. A sensitivity analysis seems in order, right? Getting this value into the parameter file seems an important step.