controlling maxCohortsPerPatch #853

Closed
rgknox opened this issue Apr 12, 2022 · 5 comments

@rgknox
Contributor

rgknox commented Apr 12, 2022

maxCohortsPerPatch controls the maximum number of cohorts that will be simulated on a patch, and is defined here:

https://github.com/NGEET/fates/blob/sci.1.55.5_api.22.1.0/main/EDTypesMod.F90#L34

It defaults to 100. That's a lot! In ED2 I think we were typically using 20ish (or trying fewer?). This is a value that I suspect many people would like to adjust (lower!), and it may be slowing down runs unnecessarily (edit: maybe, maybe not). Lots to say here.

First: it is not defined as a constant, but it is also not intended to change over the run. Its value is overwritten during the initialization sequence here: https://github.com/NGEET/fates/blob/sci.1.55.5_api.22.1.0/main/FatesInterfaceMod.F90#L777-L781. This is confusing because it is set twice. I propose we at least remove the initial value in EDTypesMod, since it is overwritten anyway.
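To illustrate why dropping the initial value helps, here is a minimal Python analogue (not FATES code, which is Fortran). It assumes the cap is a module-level setting assigned exactly once during initialization; `init_fates_dims` and `max_cohorts_per_patch` are hypothetical names. With no default, any read before initialization fails loudly instead of silently returning a stale 100.

```python
# Hypothetical Python analogue of a module-level setting assigned once at init.
# No default value: it is set only during the initialization sequence.
_max_cohorts_per_patch = None


def init_fates_dims(max_cohorts_per_patch: int) -> None:
    """Set the cap once during initialization; refuse a second assignment."""
    global _max_cohorts_per_patch
    if _max_cohorts_per_patch is not None:
        raise RuntimeError("maxCohortsPerPatch has already been initialized")
    if max_cohorts_per_patch < 1:
        raise ValueError("maxCohortsPerPatch must be positive")
    _max_cohorts_per_patch = max_cohorts_per_patch


def max_cohorts_per_patch() -> int:
    """Return the cap, refusing to hand out an uninitialized value."""
    if _max_cohorts_per_patch is None:
        raise RuntimeError("maxCohortsPerPatch read before initialization")
    return _max_cohorts_per_patch
```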

Second: I think it might (debating this) be better to get this value, and the max number of patches, into the parameter file. This is a little tricky for a few reasons. One is that we have to be sure about the order of operations, but we have made a point of reading the FATES parameter file early in the initialization sequence, so we should be OK there. Another is that we may have some statically allocated arrays sized by the max patch values, and how we allocate them may have to change (not sure yet; I have to look). A third is that the maximum number of patches is also constrained by the host model, which has its own expectations and caps. @sshu88 may be investigating this.

Third: As mentioned, the total number of cohorts may be our largest lever in the fight against slow runs, so dialing in this value and not using more cohorts than we need is important. A sensitivity analysis seems in order, right? Getting this value into the parameter file seems like an important step.

@adrifoster
Contributor

adrifoster commented Apr 12, 2022

Coincidentally I was also planning on doing a sensitivity analysis on max patches and max cohorts per patch.

I started a repo here.

I wasn't sure what to do about maxPatchesPerSite_by_disttype (https://github.com/adrifoster/fates/blob/9725247b1d8432233666d271222d61d4bf1e9670/main/EDTypesMod.F90#L33), other than making it some set percentage.
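One way to read the "set percentage" idea: derive each disturbance type's cap from a single total using fixed fractions. The sketch below is an assumption, not FATES behavior; `split_patch_budget` and the fractions are hypothetical. It reserves one patch per type and uses largest-remainder rounding so the caps always sum exactly to the total.

```python
import math


def split_patch_budget(total: int, fractions: dict[str, float]) -> dict[str, int]:
    """Split a total patch cap across disturbance types by fixed fractions.

    Every type gets at least one patch; the rest is shared proportionally
    with largest-remainder rounding so the caps sum exactly to `total`.
    """
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    if total < len(fractions):
        raise ValueError("need at least one patch per disturbance type")
    spare = total - len(fractions)  # patches left after reserving one per type
    raw = {k: f * spare for k, f in fractions.items()}
    caps = {k: 1 + math.floor(v) for k, v in raw.items()}
    leftover = total - sum(caps.values())
    # Give any leftover patches to the types with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda k: raw[k] - math.floor(raw[k]), reverse=True)
    for k in by_remainder[:leftover]:
        caps[k] += 1
    return caps
```

For example, `split_patch_budget(14, {"primary": 0.75, "secondary": 0.25})` gives 10 primary and 4 secondary, the 14-patch split mentioned later in this thread.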

@sshu88
Contributor

sshu88 commented Apr 12, 2022

@rgknox ELM caps natural land columns at 16+1 patches (numpft + bare ground). I tried increasing the maximum number of patches from 14 (10 primary and 4 secondary) to 20 (10 primary and 10 secondary), since nocomp mode can generate more than 4 secondary patches when do_harvest is turned on in a global FATES simulation. The Brazil site case works fine, but the global 4x5 case fails to finalize normally after completing the calculation and writing the restart files:

*** Error in `/global/cscratch1/sd/sshu3/e3sm_scratch/cori-haswell/sshu.cori-haswell.E4f9ce69d2-Faaa10622.2022-03-30/bld/e3sm.exe': corrupted size vs. prev_size: 0x000000000c94ac50 ***
160: forrtl: error (76): Abort trap signal
160: Image PC Routine Line Source
160: e3sm.exe 0000000002CE3004 Unknown Unknown Unknown
160: e3sm.exe 000000000262CD00 Unknown Unknown Unknown
160: e3sm.exe 00000000028FE5F0 Unknown Unknown Unknown
160: e3sm.exe 0000000002E634B1 Unknown Unknown Unknown
160: e3sm.exe 0000000002E8A8C7 Unknown Unknown Unknown
160: e3sm.exe 0000000002E90D33 Unknown Unknown Unknown
160: e3sm.exe 0000000002E92EA7 Unknown Unknown Unknown
160: e3sm.exe 000000000156C628 Unknown Unknown Unknown
160: e3sm.exe 000000000157D319 perf_mod_mp_t_fin 1760 perf_mod.F90
160: e3sm.exe 00000000004120B4 cime_comp_mod_mp_ 3571 cime_comp_mod.F90
160: e3sm.exe 0000000000424C6A MAIN__ 154 cime_driver.F90
160: e3sm.exe 00000000004020F2 Unknown Unknown Unknown
160: e3sm.exe 0000000002E59B0F Unknown Unknown Unknown
160: e3sm.exe 0000000000401FDA Unknown Unknown Unknown

Given the complexity of the ELM infrastructure, I suspect I missed something when I revised the code. I hope this information helps.

ELM branch
FATES branch
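A consistency check between FATES's patch caps and the host's patch slots could be sketched as below. It assumes a natural column has one slot per PFT plus one for bare ground (the 16+1 layout above) and that FATES patches may only use the non-bare-ground slots; both the mapping and the function names are hypothetical. Under those assumptions the 20-patch setup exceeds 16 vegetated slots, a mismatch worth ruling out, though the thread does not establish that it caused the crash above.

```python
def host_patch_slots(numpft: int) -> int:
    """Patch slots assumed on a natural host column: one per PFT plus bare ground."""
    return numpft + 1


def check_fates_fits_host(max_patches_by_disttype: dict[str, int], numpft: int) -> None:
    """Raise ValueError if FATES could create more vegetated patches than the
    host column has slots for (assuming bare ground keeps its own slot)."""
    fates_max = sum(max_patches_by_disttype.values())
    vegetated_slots = host_patch_slots(numpft) - 1  # exclude the bare-ground slot
    if fates_max > vegetated_slots:
        raise ValueError(
            f"FATES may create {fates_max} patches but the host column "
            f"only has {vegetated_slots} vegetated patch slots"
        )
```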

@rgknox
Contributor Author

rgknox commented Apr 12, 2022

I was looking at the coupling between FATES and CLM/ELM today with @adrifoster. We need to pass the number of patches desired by FATES into CLM/ELM, and then allow CLM/ELM to use that value to allocate space for its various data structures (instead of using the value dictated by the surface dataset). Currently, nothing is passed in that direction.

It will be a little tricky to do this. We found that CLM/ELM uses the surface dataset in initialize1() to decide how many patches to allocate; see surfrd_get_num_patches(). But it isn't until initialize2() that we read in the FATES parameter file (where we could specify max patches); see CLMFatesGlobals(). I'd like to consult some ELM/CLM devs about their thoughts or concerns about moving the reading of the FATES parameter file to initialize1(), just prior to clm_varpar_init(), where the number of patches is first used.
@ekluzek @billsacks @olyson
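The proposed reordering can be sketched in Python. Everything here is a stand-in: the real routines are Fortran with different signatures, `read_fates_max_patches` and the `max_patches` key are hypothetical, and adding one bare-ground slot to the FATES count is an assumption. The point is only the order: the FATES value must be known before `clm_varpar_init` sizes the patch dimension.

```python
from dataclasses import dataclass


@dataclass
class VarPar:
    """Stand-in for the host's patch-dimension settings (hypothetical)."""
    natpft_size: int = 0


def surfrd_get_num_patches(surface_dataset: dict) -> int:
    # Stand-in: today the host sizes patches from the surface dataset.
    return surface_dataset["numpft"] + 1  # PFTs plus bare ground


def read_fates_max_patches(fates_params: dict) -> int:
    # Hypothetical: assumes the parameter file carries a max-patches entry.
    return fates_params["max_patches"]


def clm_varpar_init(varpar: VarPar, num_patches: int) -> None:
    # Stand-in: the first place the number of patches is used.
    varpar.natpft_size = num_patches


def initialize1(surface_dataset: dict, fates_params: dict, use_fates: bool) -> VarPar:
    varpar = VarPar()
    num_patches = surfrd_get_num_patches(surface_dataset)
    if use_fates:
        # Proposed: read the FATES value here, before clm_varpar_init,
        # rather than waiting until initialize2().
        num_patches = read_fates_max_patches(fates_params) + 1  # assumed bare-ground slot
    clm_varpar_init(varpar, num_patches)
    return varpar
```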

@rgknox
Contributor Author

rgknox commented Apr 13, 2022

One thing to note: we do have parameters in the parameter file that control how many cohorts and patches the model ends up maintaining. These parameters are not caps on the maximum possible number, but fusion tolerances:

	double fates_cohort_age_fusion_tol ;
		fates_cohort_age_fusion_tol:units = "unitless" ;
		fates_cohort_age_fusion_tol:long_name = "minimum fraction in differece in cohort age between cohorts." ;
	double fates_cohort_size_fusion_tol ;
		fates_cohort_size_fusion_tol:units = "unitless" ;
		fates_cohort_size_fusion_tol:long_name = "minimum fraction in difference in dbh between cohorts" ;
	double fates_patch_fusion_tol ;
		fates_patch_fusion_tol:units = "unitless" ;
		fates_patch_fusion_tol:long_name = "minimum fraction in difference in profiles between patches" ;
--------
fates_cohort_age_fusion_tol = 0.08 ;
fates_cohort_size_fusion_tol = 0.08 ;
fates_patch_fusion_tol = 0.05 ;

For instance, I just ran a simulation at BCI using my nutrient-enabled branch and found that, even though the maximum possible number of cohorts was 100 per patch, I was averaging around 20 per patch in practice. This site has one tropical broadleaf evergreen PFT. The fusion algorithm will attempt to fuse cohorts at the specified tolerance even if the number of cohorts has not exceeded the maximum.
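To show how a size tolerance can keep cohort counts well below the cap, here is a simplified sketch, not the FATES implementation. It assumes fusion compares same-PFT cohorts by relative dbh difference (normalized by the pair's mean dbh), merges with an n-weighted mean dbh, and raises the tolerance geometrically only while the count still exceeds the cap. It ignores age (`fates_cohort_age_fusion_tol`) and any other criteria the real algorithm checks; all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    pft: int
    dbh: float  # diameter at breast height
    n: float    # number of individuals


def size_diff(a: Cohort, b: Cohort) -> float:
    """Relative dbh difference, normalized by the pair's mean dbh."""
    return abs(a.dbh - b.dbh) / (0.5 * (a.dbh + b.dbh))


def fuse_pass(cohorts: list[Cohort], tol: float) -> list[Cohort]:
    """One pass: merge size-sorted neighbors of the same PFT within tol.
    Works on copies, so the input list is left unchanged."""
    fused: list[Cohort] = []
    for c in sorted(cohorts, key=lambda x: (x.pft, x.dbh)):
        last = fused[-1] if fused else None
        if last is not None and last.pft == c.pft and size_diff(last, c) < tol:
            total_n = last.n + c.n
            last.dbh = (last.dbh * last.n + c.dbh * c.n) / total_n  # n-weighted mean
            last.n = total_n
        else:
            fused.append(Cohort(c.pft, c.dbh, c.n))
    return fused


def fuse_cohorts(cohorts: list[Cohort], tol: float = 0.08,
                 max_cohorts: int = 100, tol_growth: float = 1.1) -> list[Cohort]:
    """Fuse at tol even below the cap; loosen tol only while over the cap."""
    if max_cohorts < len({c.pft for c in cohorts}):
        raise ValueError("cap is smaller than the number of PFTs present")
    fused = fuse_pass(cohorts, tol)
    while len(fused) > max_cohorts:
        tol *= tol_growth  # assumed: geometric loosening of the tolerance
        fused = fuse_pass(fused, tol)
    return fused
```

With 50 closely spaced cohorts of one PFT, `fuse_cohorts(..., tol=0.08, max_cohorts=100)` returns far fewer than 50, even though the cap was never reached.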

@rgknox
Contributor Author

rgknox commented Feb 22, 2023

We can now control this; closing.

@rgknox rgknox closed this as completed Feb 22, 2023