
Stem area index AND canopy snow radiation fixes. #750

Merged
34 commits merged into NGEET:master on Nov 23, 2021

Conversation

@rosiealice (Contributor) commented Jun 7, 2021:

This PR addresses issue #744 by implementing leaf-layer-level optical properties and using these instead of the previous, erroneous usage of rhol and taul in the radiative transfer code.

It also now seems likely that the absence of a proper treatment of the radiative effects of canopy snow interception in FATES is causing large discrepancies between FATES-SP and CLM5-SP simulations.

The implementation of snow on canopy layers needs to follow from the fix to the treatment of stem area, because snow affects the reflectance and transmittance parameters that are re-weighted by the other changes in this PR; hence I have bundled them together.

The physics/energy/hydrology dynamics of canopy snow interception, sublimation, melting, etc. are handled by the HLM, so this PR modifies the interface to pass fcansno (the fraction of the canopy that has snow on it) into FATES.

fcansno is currently a patch-level property, so it is applied uniformly to all FATES layers.

In CLM5, the treatment of snow affects the 'omega' parameter of the big-leaf model, which is the reflectance + transmittance of the leaves. This is not applicable to the FATES radiative transfer scheme, so instead I have modified the rho and tau parameters directly to account for the snow. The reflectance parameters for snow are hard-wired here, but they could be put in the parameter file, be sent from the HLM, or be hard-wired somewhere more sensible (I couldn't figure out where that might be, though), as people see fit.
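
For illustration, a minimal sketch of that weighting (all names here are placeholders, not the PR code): the per-layer reflectance and transmittance are blended with snow optical properties according to the HLM-supplied fcansno.

    ! hypothetical sketch only: blend layer optical properties with snow,
    ! weighted by fcansno (fraction of the canopy holding snow, from the HLM);
    ! rho_snow/tau_snow stand in for the hard-wired snow values discussed above
    do ib = 1, hlm_numSWb
       rho_layer(ft,ib) = (1.0_r8 - fcansno)*rho_layer(ft,ib) + fcansno*rho_snow(ib)
       tau_layer(ft,ib) = (1.0_r8 - fcansno)*tau_layer(ft,ib) + fcansno*tau_snow(ib)
    end do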

Description:

This PR has a ctsm counterpart: ESCOMP/CTSM#1531

Collaborators:

Expectation of Answer Changes:

This is expected to be (slightly!) answer-changing where SAI > 0.

Checklist:

  • My change requires a change to the documentation.
  • I have updated the in-code documentation .AND. (the technical note .OR. the wiki) accordingly.
  • I have read the CONTRIBUTING document.
  • FATES PASS/FAIL regression tests were run
  • If answers were expected to change, evaluation was performed and provided

Test Results:

I have tested these changes in full FATES mode (this PR being independent of SP mode).

Analysis of results is ongoing. I wanted to get the PR up for the sake of discussion.

CTSM (or) E3SM (specify which) test hash-tag:

CTSM (or) E3SM (specify which) baseline hash-tag:

FATES baseline hash-tag:

Test Output:

@rosiealice changed the title from 'Sai bugfix' to 'Stem area radiation bugfix' on Jun 7, 2021
if(total_lai_sai(L,ft,iv).gt.0._r8)then
   frac_lai = currentPatch%elai_profile(L,ft,iv)/total_lai_sai(L,ft,iv)
   frac_sai = 1.0_r8 - frac_lai
   f_abs(L,ft,iv,ib) = 1.0_r8 - (frac_lai*(rhol(ft,ib) + taul(ft,ib))+&
@rgknox (Contributor) commented on this diff:

Is this initialized when total_lai_sai <= 0?
If uninitialized, it could propagate NaNs, even in cases where it is multiplied by zero.
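
For reference, one way to guard against this (a sketch only; the frac_sai weighting of the stem properties rhos/taus is my assumption about the truncated line above, not a quote of the PR code):

    if (total_lai_sai(L,ft,iv) > 0._r8) then
       frac_lai = currentPatch%elai_profile(L,ft,iv)/total_lai_sai(L,ft,iv)
       frac_sai = 1.0_r8 - frac_lai
       f_abs(L,ft,iv,ib) = 1.0_r8 - (frac_lai*(rhol(ft,ib) + taul(ft,ib)) + &
                                     frac_sai*(rhos(ft,ib) + taus(ft,ib)))
    else
       ! empty layer: nothing to absorb, but give f_abs a defined value so
       ! that zero*f_abs products downstream cannot become NaN
       f_abs(L,ft,iv,ib) = 0.0_r8
    end if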

@rosiealice (Contributor, Author) replied:

Good catch @rgknox. That might explain some of the slightly odd behaviour I'm currently chasing...

@rosiealice (Contributor, Author) replied:

I've taken that if statement out now...

@rosiealice added the 'bug - science' and 'PR status: Not Ready' (the author is signaling that this PR is a work in progress and not ready for integration) labels on Jun 7, 2021
@rosiealice removed the 'PR status: Not Ready' label on Jun 23, 2021
@rosiealice (Contributor, Author) commented:

Made some changes to simplify this PR, and went through the process of checking whether it gives identical answers to the main branch when frac_lai = 1.0, which it should, and now does.

At the moment this makes SABV go up and GPP go down. My hypothesis is that this is to do with the vertical layering: the layers getting more radiation are light-saturated and the layers getting less are not. I will plot out vertical profiles to check, and then this will be good to go, I think.

@rosiealice (Contributor, Author) commented:

On further consideration, it is in fact impossible to assess the impacts of this change in the absence, ironically, of SP mode. Either I need to merge this into the SP branch for testing, or wait until it is fixed/integrated. I'm not sure it's worth doing all of that merging or whether it's best to wait until the SP tag is ready.

The impact of this change, in any case, will be a decline in GPP resulting from no longer using PAR absorbed by stems (which is currently being used for PSN).

@rgknox (Contributor) commented Jun 30, 2021:

@rosiealice I'm wondering what we should do with the error checking; see here:

https://github.com/NGEET/fates/pull/750/files#diff-d6ad5d3f7cd1ec4c5194c3821f739e5d6bde18ba02c4a2174e97774f03ed291cR1005

I think there are a few things here to consider:

  1. I would recommend that if we don't actually kill the run, we should only generate log messages when a local debug flag is on (see the sketch after this list).
  2. What is an acceptable level of error, and should we really consider killing the run if we don't meet that threshold? I see that the error is being added back in to force energy conservation (which is a good thing). I'm speaking extemporaneously, trying to decide whether 0.15 is something to worry about.
  3. Maybe re-arrange the sequence so that the "if direct/indirect" is inside the "if exceeds error" logical; that might make the code shorter and more readable, not sure.
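
A rough sketch of the pattern in point 1 (the tolerance value and flag name are placeholders, not FATES code; only the structure matters):

    real(r8), parameter :: err_tol = 0.15_r8   ! placeholder tolerance
    logical,  parameter :: debug   = .false.   ! local debug flag

    if (abs(error) > err_tol) then
       if (debug) then
          write(fates_log(),*) 'radiation imbalance exceeds tolerance: ', error
       end if
       ! only if we decide the threshold is fatal:
       ! call endrun(msg=errMsg(sourcefile, __LINE__))
    end if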

@rosiealice (Contributor, Author) commented:

I think the thing to do is to actually print out the error and then look at what the status quo is before we decide what to do with all these thresholds. Then we'll know whether there is an existing problem, and also how it will respond to changes in e.g. the SAI handling etc.

I'd put in a placeholder to track the error, and on my branch I'm using an existing history field to output it. I need to make a new field to add to this PR.

@rosiealice changed the title from 'Stem area radiation bugfix' to 'Stem area index AND canopy snow radiation fixes.' on Jul 19, 2021
@glemieux (Contributor) commented:

Reviewing the other tests, particularly the non-debug tests, reveals that more of them are seeing the radiation_error blow up:
FatesColdDefSizeAgeMort, FatesColdDefReducedComplexFixedBiogeo, FatesColdDefLogging, FatesColdDefNoFire, Fates

@rgknox (Contributor) commented Oct 25, 2021:

It seems like the accumulating error should be weighted by the footprint fraction of the scattering element. For instance, if the leaf layer in question that is generating some error has a very, very tiny footprint compared to the rest of the canopy, it would not contribute as much to the total energy imbalance. The energy balance is W/m2 of patch, as I understand it. If the horizontal footprint of each leaf-layer element is the PFT's area, then the error term is per m2 of PFT. Do we need to multiply each error by [m2 pft / m2 patch]? (Or maybe not; I'm checking the units on these errors and where they are defined...)

https://github.com/rosiealice/fates/blob/e760438f82bc02422165a167a048e9c75906dffb/biogeophys/EDSurfaceAlbedoMod.F90#L1002

@glemieux, it does look like the error term is being zeroed properly; I'm just wondering if the errors are growing because the canopy is growing, and with more material to reflect there is more error.

@rgknox (Contributor) commented Oct 25, 2021:

In these lines:

https://github.com/rosiealice/fates/blob/e760438f82bc02422165a167a048e9c75906dffb/biogeophys/EDSurfaceAlbedoMod.F90#L974-L976

It seems to me that we are subtracting the total fraction of direct radiation absorbed by the ground from the total fraction of radiation absorbed by the canopy. I don't think that is an error though, right? Don't we want to add them together and compare against how much is upwelling (i.e. the total reflected)?

EDIT: The descriptions of these variables are not correct and need to be tracked down, e.g. sabs_dir, sabs_dif, and the local abs_rad().

sabs_dir is filled by the local abs_dir variable, but abs_dir is tough to follow; I'm not sure what it is supposed to be.

i.e.: in = absorbed + reflected

             error = abs(currentPatch%sabs_dir(ib) - (currentPatch%tr_soil_dir(ib) * &
                  (1.0_r8-currentPatch%gnd_alb_dir(ib) ) + &
                  currentPatch%tr_soil_dir_dif(ib) * (1.0_r8-currentPatch%gnd_alb_dif(ib)     )))

@rgknox (Contributor) commented Oct 25, 2021:

It's also possible that the zeroing of the radiation_error is happening inside the "radtype" loop, when it seems like it should be outside of, and prior to, this loop:

https://github.com/rosiealice/fates/blob/e760438f82bc02422165a167a048e9c75906dffb/biogeophys/EDSurfaceAlbedoMod.F90#L595
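
A sketch of the suggested ordering (the loop bounds and variable names here are assumed from the linked file, not quoted from it):

    ! zero the accumulator once per patch, before the loop over radiation
    ! streams, so direct and diffuse contributions add into one patch total
    currentPatch%radiation_error = 0.0_r8
    do radtype = 1, n_rad_stream_types
       ! ... Norman radiation solution and error calculation for this stream ...
       currentPatch%radiation_error = currentPatch%radiation_error + error
    end do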

@jkshuman (Contributor) commented:

This came up in discussion during the SE call. The consensus was that there needs to be a general clean-up of the radiation code, identified in issue #794. For this PR, the plan was to address the immediate problem without refactoring the radiation code, and then to address a clean-up, refactor, and stand-alone unit test as part of a new issue or within #794. @glemieux @rgknox

@ckoven (Contributor) commented Nov 2, 2021:

I went down a bit of a rabbit hole today trying to understand the radiation errors behind the test failures that @glemieux ran into in #750 (comment). I think there are (at least) two distinct problems with different causes, and interestingly they both appear to be situations that don't happen in FATES-SP mode, and so don't show up when we look at that.

So if we look at the radiation error in FATES-SP mode, we get something that looks well behaved, with values on the order of 10^-9 W/m2 and structure in space and time, which you can see here in both time-average map and zonal-mean fields:
[Screenshots: time-average map and zonal-mean of the FATES-SP radiation error]

If we run the same thing in full-FATES mode, the behavior is pretty different. First I have to plot it as log(abs(RAD_ERROR)) to span the wide dynamic range, and in map and zonal-mean views it looks like this:
[Screenshots: map and zonal-mean of log(abs(RAD_ERROR)) in full-FATES mode]

So as I see it, there are three categories of gridcells in full-FATES mode: some where the error is still order 10^-8 to 10^-9 W/m2, which are in arid to semi-arid ecosystems; some where the error is order 10^-1 W/m2, which are in forested ecosystems; and then a handful of gridcells in the high Arctic where the error is order 10^20-ish W/m2. I think those last ones are what's causing the model to fail when debug is turned on and leading to the test failures.

Given that neither of the types of badly-behaved gridcells show up in FATES-SP mode, I was wondering which of the differences in model dynamics between those configurations might be driving things, e.g.:

  • SP mode only has one canopy level, whereas full-FATES can have two (or more)
  • SP mode has well-behaved LAI that can't be tiny or too large, whereas full FATES has fewer guardrails
  • SP mode has no patch or cohort creation or fusion whereas full-FATES does.

I am pretty sure, based on spatial patterns alone, that the 10^-1 error gridcells are from multiple canopy layers (see the next map of the mean number of canopy layers). I tried setting nclmax to one in order to confirm this, but it crashed when I tried to reuse my old restart file and I didn't pursue it further. So we're going to have to look into that, but I'll set it aside for now as I don't think it's what's crashing the model.

[Screenshot: map of the mean number of canopy layers]

For the 10^20 W/m2 errors, it's less clear to me. Since I initially suspected uninitialized values from patch creation, I tried running in both FATES-ST3 mode (i.e. no phenology, no growth, no disturbance) and in full-FATES with no patch creation, by setting mort_disturb_frac to zero and turning off fire. The FATES-ST3 run had none of the 10^20 W/m2 error gridcells but did have the 10^0 gridcells. The full-FATES run with no new patch creation had both types of bad gridcells. So that says the 10^20 error is not related to disturbance per se, but to something relevant to plant growth at high latitudes. I turned off the snow occlusion of LAI by asserting ELAI = TLAI, and that didn't fix the problem. I looked at patterns of LAI and it does seem that the really bad gridcells correspond to ones with really small LAI values (see the next map of log(max(ELAI))):

[Screenshot: map of log(max(ELAI))]

So then I looked into the lai_min parameter, which was reinstated via #739 to prevent overflows (we had removed it in #733 but then reversed that, so #261 should really be reopened). I changed the lai_min value from 0.1 to 10^-3 (any smaller than that and the model crashes), and that changed the value of the 10^20 W/m2 errors by a couple of orders of magnitude, which helped confirm that this is an edge case related to super-small LAI values.

So I think what's going on is that there is some divide-by-tiny-number error that is causing these radiation error blowups. Interestingly they don't seem to show up in other downstream things, so possibly this only happens in cases where the plant-occupied fraction of the patches is also tiny and so the signal is getting swamped by the better-behaved bare-ground patch?

Anyway, I haven't fixed the problem but I think I've narrowed it down a lot.

@ckoven (Contributor) commented Nov 3, 2021:

OK, I think I figured out what's causing the crashes, and fixed it. Basically, on patches with a really small fractional coverage by vegetation, the calculations are blowing up because ftweight appears in the denominator so often. But because the whole Norman radiation scheme is only applied to the vegetation-covered fraction of the gridcell, the radiation errors should also be scaled by currentPatch%total_canopy_area / currentPatch%area when treated as a patch-level quantity and output to history. When I added that multiplier, the problem of really huge radiation errors mostly went away, but there were still a couple of instances where it didn't, because a really huge number was being multiplied by a really tiny number and running into precision errors. So I also added a further check to only track the radiation_error at all when currentPatch%total_canopy_area / currentPatch%area is greater than some tiny number. Doing both of those things makes the really bad gridcells go away, and also lets the model complete when I run it with DEBUG=TRUE.
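
For concreteness, a minimal sketch of that scaling (the cutoff value and local names are placeholders, not necessarily what the PR commit uses):

    real(r8), parameter :: min_canopy_frac = 1.e-10_r8   ! placeholder cutoff
    real(r8) :: canopy_frac

    ! the Norman solution covers only the vegetated part of the patch, so
    ! convert the per-unit-canopy error into a per-unit-patch quantity and
    ! skip patches whose canopy fraction is too small to resolve reliably
    canopy_frac = currentPatch%total_canopy_area / currentPatch%area
    if (canopy_frac > min_canopy_frac) then
       currentPatch%radiation_error = currentPatch%radiation_error + error*canopy_frac
    end if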

See maps and zonal-mean plots of log(abs(RAD_ERROR)) below. There is still clearly a problem that we need to fix for the multiple-canopy-layer case -- some of these errors are not small. But I think that should be a separate PR from this one.

[Screenshots: map and zonal-mean of log(abs(RAD_ERROR)) after the fix]

PR to the PR branch with this change is rosiealice#10

@rosiealice (Contributor, Author) commented:

Phew. Thanks @ckoven! So the upshot is that this PR, with the radiation error tracking, will now not crash, but it will nonetheless tell us that the error is large in full-FATES cases?

There is a lot of complexity in the multiple-canopy-layer representation relating to how light gets through layers that are only partially filled, spatially. I do wonder whether it makes sense to try a run with the maximum NCL set to 1 (from bare ground), so that one can isolate whether it's the multiple canopy layers per se, or just the n>1 cohorts, that is the source of the problem.

Good thing that SP mode is behaving moderately sensibly to act as a backstop, even if this is troublesome...

Thanks for doing this. Much appreciated.

@ckoven (Contributor) commented Nov 3, 2021:

Yep, and agreed. I'll start that run off to confirm the multi-layer hypothesis. Thank you @rosiealice for this PR!

Commit: fix for patch%radiation_error
@ckoven (Contributor) commented Nov 4, 2021:

A brief update to answer @rosiealice's question of what happens when I run with a single canopy layer from bare ground in the full-FATES configuration. It turns out that the radiation errors are still much higher (order 10^-1 W/m2) than in the FATES-SP case. So I'm not sure exactly what's behind them, but I guess multiple canopy levels are not required for that to happen. Also, there still seem to be some very-high-error (10^20-ish) blips sneaking through. That is consistent with @glemieux finding that the revised code is still running into crashes in debug mode in the test suite.

[Screenshots: map and zonal-mean of log(abs(RAD_ERROR)) for the single-canopy-layer full-FATES run]

@glemieux (Contributor) commented Nov 4, 2021:

> Also there still seem to be some very-high-error (10^20ish) blips sneaking through. That is consistent with @glemieux finding that the revised code also is still running into crashing situations when in debug mode in the test suite.

This appears to be strongly correlated with the check on the patch variable nrad that occurs here in ED_Norman_Radiation:

if (maxval(currentPatch%nrad(1,:))==0)then
   ! there are no leaf layers in this patch. it is effectively bare ground.
   ! no radiation is absorbed
   bc_out(s)%fabd_parb(ifp,:) = 0.0_r8
   bc_out(s)%fabi_parb(ifp,:) = 0.0_r8
   do ib = 1,hlm_numSWb
      bc_out(s)%albd_parb(ifp,ib) = bc_in(s)%albgr_dir_rb(ib)
      bc_out(s)%albi_parb(ifp,ib) = bc_in(s)%albgr_dif_rb(ib)
      bc_out(s)%ftdd_parb(ifp,ib) = 1.0_r8
      !bc_out(s)%ftid_parb(ifp,ib) = 1.0_r8
      bc_out(s)%ftid_parb(ifp,ib) = 0.0_r8
      bc_out(s)%ftii_parb(ifp,ib) = 1.0_r8
   enddo
else
   call PatchNormanRadiation (currentPatch, &
        bc_out(s)%albd_parb(ifp,:), &
        bc_out(s)%albi_parb(ifp,:), &
        bc_out(s)%fabd_parb(ifp,:), &
        bc_out(s)%fabi_parb(ifp,:), &
        bc_out(s)%ftdd_parb(ifp,:), &
        bc_out(s)%ftid_parb(ifp,:), &
        bc_out(s)%ftii_parb(ifp,:))
endif ! is there vegetation?

If the check returns .true., the call to PatchNormanRadiation will be skipped, as well as the subsequent zeroing of the radiation error. The weird thing is that the radiation error patch-level variable is being initialized to zero for each new patch, so I'm not sure why that results in the value increasing. That said, I wrote out some diagnostics and found that all the patches where the error is blowing up are indeed showing maxval(currentPatch%nrad(1,:))==0. My guess is that there is some floating-point issue with doing the radiation error update in ED_SunShadeFracs. Note that this subroutine is called directly from clmfates_interfaceMod via wrap_sunfrac in clm_drv: https://github.com/rosiealice/fates/blob/1ce8c80b65869efa6e99813a2b5fa60745771ea9/biogeophys/EDSurfaceAlbedoMod.F90#L1240-L1241

Perhaps we can zero out this error within the above nrad check?
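
A sketch of that suggestion, reusing the names from the snippet above (not the final code):

    if (maxval(currentPatch%nrad(1,:)) == 0) then
       ! effectively bare ground: PatchNormanRadiation is skipped, so zero the
       ! accumulator explicitly to prevent a stale value from carrying forward
       currentPatch%radiation_error = 0.0_r8
       ! ... existing bare-ground boundary-condition assignments ...
    else
       ! existing call to PatchNormanRadiation and its error book-keeping
    end if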

Commit: Incorporate leaf xylem water potential to address overestimation of transpiration. Additionally, provide an option to use _gross in the BB stomatal conductance model.
@glemieux (Contributor) commented Nov 4, 2021:

After implementing fc4a301, I merged in the latest tags and retested the fates suite. All expected tests now pass, with DIFFs as expected. The testing directory can be found here: /glade/u/home/glemieux/scratch/ctsm-tests/tests_saibug_fix-nradcheck_fix

Note that the ctsm aux_clm suite still needs to be exercised. This PR will be integrated after ctsm5.1.dev062 has been integrated.

@rosiealice (Contributor, Author) commented:

Nice job Greg!!!

On the multiple layers thing, it's interesting that it's not the multiple layers themselves. I suspect it's to do with the complexity of the canopy structure when there are multiple cohorts of different LAI depths. That creates lots of oddities in how light is passed through. For example, if cohort 1 has an LAI of 4 and cohort 2 has 6, then for layers 5 and 6 light has to pass through empty space for cohort 1 and a leaf matrix for cohort 2, both on the way up and on the way down. I've been trying to figure out a test that would stop this happening in full-FATES mode, if only to check out this theory, i.e. by changing how the LAI is distributed into the boxes. One way might be to 'spread out' the leaf area in the spatially incomplete layers so that the canopy is always 'rectangular'. That might not make scientific sense, but it would allow this to be checked at least!

@glemieux added the 'tag: next' (next PR to be incorporated) label on Nov 18, 2021
@glemieux (Contributor) commented:

All tests pass with DIFFs as expected on izumi: /scratch/cluster/glemieux/ctsm-tests/tests_pr750-fates

Commit: This set of changes by Junyan Ding updates the Van Genuchten P-V and P-K relationships to use the secondary pore-size distribution parameter N. It also fixes a redundant parameter entry for lwp_k.
@glemieux (Contributor) commented:

Final testing is complete. All expected tests pass, with DIFFs as expected:

  • cheyenne: /glade/u/home/glemieux/scratch/ctsm-tests/tests_pr750-fates-dev062
  • izumi: /scratch/cluster/glemieux/ctsm-tests/tests_pr750-fates-dev062-3

@glemieux glemieux merged commit 2dc17d5 into NGEET:master Nov 23, 2021
Labels: bug - science; ctsm (has a corresponding ctsm-side PR); tag: next (next PR to be incorporated)

Successfully merging this pull request may close these issues: Use of SAI in surface albedo routines.

5 participants