
Stem area index AND canopy snow radiation fixes. #750

Merged
34 commits merged into NGEET:master on Nov 23, 2021

Conversation

@rosiealice (Contributor) commented Jun 7, 2021:

This PR addresses issue #744 by implementing leaf-layer-level optical properties and using these instead of the previous, erroneous usage of rhol and taul in the radiative transfer code.

It also now seems likely that the absence of a proper treatment of the radiative effects of canopy snow interception in FATES is causing large discrepancies between FATES-SP and CLM5-SP simulations.

The implementation of snow on canopy layers needs to follow from the fix to the treatment of stem area, because snow affects the reflectance and transmittance parameters that are re-weighted by the other changes in this PR; hence I have bundled them together.

The physics/energy/hydrology dynamics of canopy snow interception, sublimation, melting, etc. are handled by the HLM, so this PR modifies the interface to pass fcansno (the fraction of the canopy that has snow on it) into FATES.

fcansno is currently a patch-level property, so it is applied uniformly to all FATES layers.

In CLM5, the treatment of snow affects the 'omega' parameter of the big-leaf model, which is the reflectance + transmittance of the leaves. This is not applicable to the FATES radiative transfer scheme, so instead I have modified the rho and tau parameters directly to account for the snow. The reflectance parameters for snow are hard-wired here, but they could be put in the parameter file, be sent from the HLM, or be hard-wired somewhere more sensible (I couldn't figure out where that might be, though), as people see fit.
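
For illustration, a minimal sketch of that weighting (all names here are placeholders, not the PR code): the per-layer reflectance and transmittance are blended with snow optical properties according to the HLM-supplied fcansno.

    ! hypothetical sketch only: blend layer optical properties with snow,
    ! weighted by fcansno (fraction of the canopy holding snow, from the HLM);
    ! rho_snow/tau_snow stand in for the hard-wired snow values discussed above
    do ib = 1, hlm_numSWb
       rho_layer(ft,ib) = (1.0_r8 - fcansno)*rho_layer(ft,ib) + fcansno*rho_snow(ib)
       tau_layer(ft,ib) = (1.0_r8 - fcansno)*tau_layer(ft,ib) + fcansno*tau_snow(ib)
    end do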

Description:

This PR has a ctsm counterpart: ESCOMP/CTSM#1531

Collaborators:

Expectation of Answer Changes:

This is expected to be (slightly!) answer-changing where SAI > 0.

Checklist:

  • My change requires a change to the documentation.
  • I have updated the in-code documentation .AND. (the technical note .OR. the wiki) accordingly.
  • I have read the CONTRIBUTING document.
  • FATES PASS/FAIL regression tests were run
  • If answers were expected to change, evaluation was performed and provided

Test Results:

I have tested these changes in full FATES mode (this PR being independent of SP mode).

Analysis of results is ongoing. I wanted to get the PR up for the sake of discussion.

CTSM (or) E3SM (specify which) test hash-tag:

CTSM (or) E3SM (specify which) baseline hash-tag:

FATES baseline hash-tag:

Test Output:

@rosiealice changed the title from 'Sai bugfix' to 'Stem area radiation bugfix' on Jun 7, 2021
if(total_lai_sai(L,ft,iv).gt.0._r8)then
   frac_lai = currentPatch%elai_profile(L,ft,iv)/total_lai_sai(L,ft,iv)
   frac_sai = 1.0_r8 - frac_lai
   f_abs(L,ft,iv,ib) = 1.0_r8 - (frac_lai*(rhol(ft,ib) + taul(ft,ib))+&
@rgknox (Contributor) commented on this diff:

Is this initialized when total_lai_sai <= 0?
If uninitialized, it could propagate NaNs, even in cases where it is multiplied by zero.
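
For reference, one way to guard against this (a sketch only; the frac_sai weighting of the stem properties rhos/taus is my assumption about the truncated line above, not a quote of the PR code):

    if (total_lai_sai(L,ft,iv) > 0._r8) then
       frac_lai = currentPatch%elai_profile(L,ft,iv)/total_lai_sai(L,ft,iv)
       frac_sai = 1.0_r8 - frac_lai
       f_abs(L,ft,iv,ib) = 1.0_r8 - (frac_lai*(rhol(ft,ib) + taul(ft,ib)) + &
                                     frac_sai*(rhos(ft,ib) + taus(ft,ib)))
    else
       ! empty layer: nothing to absorb, but give f_abs a defined value so
       ! that zero*f_abs products downstream cannot become NaN
       f_abs(L,ft,iv,ib) = 0.0_r8
    end if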

@rosiealice (Contributor, Author) replied:

Good catch @rgknox. That might explain some of the slightly odd behaviour I'm currently chasing...

@rosiealice (Contributor, Author) replied:

I've taken that if statement out now...

@rosiealice added the 'bug - science' and 'PR status: Not Ready' (the author is signaling that this PR is a work in progress and not ready for integration) labels on Jun 7, 2021
@rosiealice removed the 'PR status: Not Ready' label on Jun 23, 2021
@rosiealice (Contributor, Author) commented:

Made some changes to simplify this PR, and went through the process of checking whether it gives identical answers to the main branch when frac_lai = 1.0, which it should, and now does.

At the moment this makes SABV go up and GPP go down. My hypothesis is that this is to do with the vertical layering: the layers getting more radiation are light-saturated and the layers getting less are not. I will plot out vertical profiles to check, and then this will be good to go, I think.

@rosiealice (Contributor, Author) commented:

On further consideration, it is in fact impossible to assess the impacts of this change in the absence, ironically, of SP mode. Either I need to merge this into the SP branch for testing, or wait until it is fixed/integrated. I'm not sure it's worth doing all of that merging or whether it's best to wait until the SP tag is ready.

The impact of this change, in any case, will be a decline in GPP resulting from no longer using PAR absorbed by stems (which is currently being used for PSN).

@rgknox (Contributor) commented Jun 30, 2021:

@rosiealice I'm wondering what we should do with the error checking; see here:

https://github.com/NGEET/fates/pull/750/files#diff-d6ad5d3f7cd1ec4c5194c3821f739e5d6bde18ba02c4a2174e97774f03ed291cR1005

I think there are a few things here to consider:

  1. I would recommend that if we don't actually kill the run, we should only generate log messages when a local debug flag is on (see the sketch after this list).
  2. What is an acceptable level of error, and should we really consider killing the run if we don't meet that threshold? I see that the error is being added back in to force energy conservation (which is a good thing). I'm speaking extemporaneously, trying to decide whether 0.15 is something to worry about.
  3. Maybe re-arrange the sequence so that the "if direct/indirect" is inside the "if exceeds error" logical; that might make the code shorter and more readable, not sure.
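
A rough sketch of the pattern in point 1 (the tolerance value and flag name are placeholders, not FATES code; only the structure matters):

    real(r8), parameter :: err_tol = 0.15_r8   ! placeholder tolerance
    logical,  parameter :: debug   = .false.   ! local debug flag

    if (abs(error) > err_tol) then
       if (debug) then
          write(fates_log(),*) 'radiation imbalance exceeds tolerance: ', error
       end if
       ! only if we decide the threshold is fatal:
       ! call endrun(msg=errMsg(sourcefile, __LINE__))
    end if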

@rosiealice (Contributor, Author) commented:

I think the thing to do is to actually print out the error and then look at what the status quo is before we decide what to do with all these thresholds. Then we'll know whether there is an existing problem, and also how it will respond to changes in e.g. the SAI handling etc.

I'd put in a placeholder to track the error, and on my branch I'm using an existing history field to output it. I need to make a new field to add to this PR.

@rosiealice changed the title from 'Stem area radiation bugfix' to 'Stem area index AND canopy snow radiation fixes.' on Jul 19, 2021
@glemieux (Contributor) commented:

Reviewing the other tests, particularly the non-debug tests, reveals that more of them are seeing the radiation_error blow up:
FatesColdDefSizeAgeMort, FatesColdDefReducedComplexFixedBiogeo, FatesColdDefLogging, FatesColdDefNoFire, Fates

@rgknox (Contributor) commented Oct 25, 2021:

It seems like the accumulating error should be weighted by the footprint fraction of the scattering element. For instance, if the leaf layer in question that is generating some error has a very, very tiny footprint compared to the rest of the canopy, it would not contribute as much to the total energy imbalance. The energy balance is W/m2 of patch, as I understand it. If the horizontal footprint of each leaf-layer element is the PFT's area, then the error term is per m2 of PFT. Do we need to multiply each error by [m2 pft / m2 patch]? (Or maybe not; I'm checking the units on these errors and where they are defined...)

https://github.com/rosiealice/fates/blob/e760438f82bc02422165a167a048e9c75906dffb/biogeophys/EDSurfaceAlbedoMod.F90#L1002

@glemieux, it does look like the error term is being zeroed properly; I'm just wondering if the errors are growing because the canopy is growing, and with more material to reflect there is more error.

@rgknox (Contributor) commented Oct 25, 2021:

In these lines:

https://github.com/rosiealice/fates/blob/e760438f82bc02422165a167a048e9c75906dffb/biogeophys/EDSurfaceAlbedoMod.F90#L974-L976

It seems to me that we are subtracting the total fraction of direct radiation absorbed by the ground from the total fraction of radiation absorbed by the canopy. I don't think that is an error though, right? Don't we want to add them together and compare against how much is upwelling (i.e. the total reflected)?

EDIT: The descriptions of these variables are not correct and need to be tracked down, e.g. sabs_dir, sabs_dif, and the local abs_rad().

sabs_dir is filled by the local abs_dir variable, but abs_dir is tough to follow; I'm not sure what it is supposed to be.

i.e.: in = absorbed + reflected

             error = abs(currentPatch%sabs_dir(ib) - (currentPatch%tr_soil_dir(ib) * &
                  (1.0_r8-currentPatch%gnd_alb_dir(ib) ) + &
                  currentPatch%tr_soil_dir_dif(ib) * (1.0_r8-currentPatch%gnd_alb_dif(ib)     )))

@rgknox (Contributor) commented Oct 25, 2021:

It's also possible that the zeroing of the radiation_error is happening inside the "radtype" loop, when it seems like it should be outside of, and prior to, this loop:

https://github.com/rosiealice/fates/blob/e760438f82bc02422165a167a048e9c75906dffb/biogeophys/EDSurfaceAlbedoMod.F90#L595
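
A sketch of the suggested ordering (the loop bounds and variable names here are assumed from the linked file, not quoted from it):

    ! zero the accumulator once per patch, before the loop over radiation
    ! streams, so direct and diffuse contributions add into one patch total
    currentPatch%radiation_error = 0.0_r8
    do radtype = 1, n_rad_stream_types
       ! ... Norman radiation solution and error calculation for this stream ...
       currentPatch%radiation_error = currentPatch%radiation_error + error
    end do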

@jkshuman (Contributor) commented:

This came up in discussion during the SE call. The consensus was that there needs to be a general clean-up of the radiation code, identified in issue #794. For this PR, the plan was to address the immediate problem without refactoring the radiation code, and then to address a clean-up, refactor, and stand-alone unit test as part of a new issue or within #794. @glemieux @rgknox

@ckoven (Contributor) commented Nov 2, 2021:

I went down a bit of a rabbit hole today trying to understand the radiation errors behind the test failures that @glemieux ran into in #750 (comment). I think there are (at least) two distinct problems with different causes, and interestingly they both appear to be situations that don't happen in FATES-SP mode, and so don't show up when we look at that.

So if we look at the radiation error in FATES-SP mode, we get something that looks well behaved, with values on the order of 10^-9 W/m2 and structure in space and time, which you can see here in both time-average map and zonal-mean fields:
[Screenshots: time-average map and zonal-mean of the FATES-SP radiation error]

If we run the same thing in full-FATES mode, the behavior is pretty different. First I have to plot it as log(abs(RAD_ERROR)) to span the wide dynamic range, and in map and zonal-mean views it looks like this:
[Screenshots: map and zonal-mean of log(abs(RAD_ERROR)) in full-FATES mode]

So as I see it, there are three categories of gridcells in full-FATES mode: some where the error is still order 10^-8 to 10^-9 W/m2, which are in arid to semi-arid ecosystems; some where the error is order 10^-1 W/m2, which are in forested ecosystems; and then a handful of gridcells in the high Arctic where the error is order 10^20-ish W/m2. I think those last ones are what's causing the model to fail when debug is turned on and leading to the test failures.

Given that neither of the types of badly-behaved gridcells show up in FATES-SP mode, I was wondering which of the differences in model dynamics between those configurations might be driving things, e.g.:

  • SP mode only has one canopy level, whereas full-FATES can have two (or more)
  • SP mode has well-behaved LAI that can't be tiny or too large, whereas full FATES has fewer guardrails
  • SP mode has no patch or cohort creation or fusion whereas full-FATES does.

I am pretty sure, based on spatial patterns alone, that the 10^-1 error gridcells are from multiple canopy layers (see the next map of the mean number of canopy layers). I tried setting nclmax to one in order to confirm this, but it crashed when I tried to reuse my old restart file and I didn't pursue it further. So we're going to have to look into that, but I'll set it aside for now as I don't think it's what's crashing the model.

[Screenshot: map of the mean number of canopy layers]

For the 10^20 W/m2 errors, it's less clear to me. Since I initially suspected uninitialized values from patch creation, I tried running in both FATES-ST3 mode (i.e. no phenology, no growth, no disturbance) and in full-FATES with no patch creation, by setting mort_disturb_frac to zero and turning off fire. The FATES-ST3 run had none of the 10^20 W/m2 error gridcells but did have the 10^0 gridcells. The full-FATES run with no new patch creation had both types of bad gridcells. So that says the 10^20 error is not related to disturbance per se, but to something relevant to plant growth at high latitudes. I turned off the snow occlusion of LAI by asserting ELAI = TLAI, and that didn't fix the problem. I looked at patterns of LAI and it does seem that the really bad gridcells correspond to ones with really small LAI values (see the next map of log(max(ELAI))):

[Screenshot: map of log(max(ELAI))]

So then I looked into the lai_min parameter, which was reinstated via #739 to prevent overflows (we had removed it in #733 but then reversed that, so #261 should really be reopened). I changed the lai_min value from 0.1 to 10^-3 (any smaller than that and the model crashes), and that changed the value of the 10^20 W/m2 errors by a couple of orders of magnitude, which helped confirm that this is an edge case related to super-small LAI values.

So I think what's going on is that there is some divide-by-tiny-number error that is causing these radiation error blowups. Interestingly they don't seem to show up in other downstream things, so possibly this only happens in cases where the plant-occupied fraction of the patches is also tiny and so the signal is getting swamped by the better-behaved bare-ground patch?

Anyway, I haven't fixed the problem but I think I've narrowed it down a lot.

@ckoven (Contributor) commented Nov 3, 2021:

OK, I think I figured out what's causing the crashes, and fixed it. Basically, on patches with a really small fractional coverage by vegetation, the calculations are blowing up because ftweight appears in the denominator so often. But because the whole Norman radiation scheme is only applied to the vegetation-covered fraction of the gridcell, the radiation errors should also be scaled by currentPatch%total_canopy_area / currentPatch%area when treated as a patch-level quantity and output to history. When I added that multiplier, the problem of really huge radiation errors mostly went away, but there were still a couple of instances where it didn't, because a really huge number was being multiplied by a really tiny number and running into precision errors. So I also added a further check to only track the radiation_error at all when currentPatch%total_canopy_area / currentPatch%area is greater than some tiny number. Doing both of those things makes the really bad gridcells go away, and also lets the model complete when I run it with DEBUG=TRUE.
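
For concreteness, a minimal sketch of that scaling (the cutoff value and local names are placeholders, not necessarily what the PR commit uses):

    real(r8), parameter :: min_canopy_frac = 1.e-10_r8   ! placeholder cutoff
    real(r8) :: canopy_frac

    ! the Norman solution covers only the vegetated part of the patch, so
    ! convert the per-unit-canopy error into a per-unit-patch quantity and
    ! skip patches whose canopy fraction is too small to resolve reliably
    canopy_frac = currentPatch%total_canopy_area / currentPatch%area
    if (canopy_frac > min_canopy_frac) then
       currentPatch%radiation_error = currentPatch%radiation_error + error*canopy_frac
    end if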

See maps and zonal-mean plots of log(abs(RAD_ERROR)) below. There is still clearly a problem that we need to fix for the multiple-canopy-layer case -- some of these errors are not small. But I think that should be a separate PR from this one.

[Screenshots: map and zonal-mean of log(abs(RAD_ERROR)) after the fix]

PR to the PR branch with this change is rosiealice#10

@rosiealice (Contributor, Author) commented:

Phew. Thanks @ckoven! So the upshot is that this PR, with the radiation error tracking, will now not crash, but it will nonetheless tell us that the error is large in full-FATES cases?

There is a lot of complexity in the multiple-canopy-layer representation relating to how light gets through layers that are only partially filled, spatially. I do wonder whether it makes sense to try a run with the maximum NCL set to 1 (from bare ground), so that one can isolate whether it's the multiple canopy layers per se, or just the n>1 cohorts, that is the source of the problem.

Good thing that SP mode is behaving moderately sensibly to act as a backstop, even if this is troublesome...

Thanks for doing this. Much appreciated.

@ckoven (Contributor) commented Nov 3, 2021:

Yep, and agreed. I'll start that run off to confirm the multi-layer hypothesis. Thank you @rosiealice for this PR!

Commit: fix for patch%radiation_error
@ckoven (Contributor) commented Nov 4, 2021:

A brief update to answer @rosiealice's question of what happens when I run with a single canopy layer from bare ground in the full-FATES configuration. It turns out that the radiation errors are still much higher (order 10^-1 W/m2) than in the FATES-SP case. So I'm not sure exactly what's behind them, but I guess multiple canopy levels are not required for that to happen. Also, there still seem to be some very-high-error (10^20-ish) blips sneaking through. That is consistent with @glemieux finding that the revised code is still running into crashes in debug mode in the test suite.

[Screenshots: map and zonal-mean of log(abs(RAD_ERROR)) for the single-canopy-layer full-FATES run]

@glemieux (Contributor) commented Nov 4, 2021:

> Also there still seem to be some very-high-error (10^20ish) blips sneaking through. That is consistent with @glemieux finding that the revised code also is still running into crashing situations when in debug mode in the test suite.

This appears to be strongly correlated with the check on the patch variable nrad that occurs here in ED_Norman_Radiation:

if (maxval(currentPatch%nrad(1,:))==0)then
   ! there are no leaf layers in this patch. it is effectively bare ground.
   ! no radiation is absorbed
   bc_out(s)%fabd_parb(ifp,:) = 0.0_r8
   bc_out(s)%fabi_parb(ifp,:) = 0.0_r8
   do ib = 1,hlm_numSWb
      bc_out(s)%albd_parb(ifp,ib) = bc_in(s)%albgr_dir_rb(ib)
      bc_out(s)%albi_parb(ifp,ib) = bc_in(s)%albgr_dif_rb(ib)
      bc_out(s)%ftdd_parb(ifp,ib) = 1.0_r8
      !bc_out(s)%ftid_parb(ifp,ib) = 1.0_r8
      bc_out(s)%ftid_parb(ifp,ib) = 0.0_r8
      bc_out(s)%ftii_parb(ifp,ib) = 1.0_r8
   enddo
else
   call PatchNormanRadiation (currentPatch, &
        bc_out(s)%albd_parb(ifp,:), &
        bc_out(s)%albi_parb(ifp,:), &
        bc_out(s)%fabd_parb(ifp,:), &
        bc_out(s)%fabi_parb(ifp,:), &
        bc_out(s)%ftdd_parb(ifp,:), &
        bc_out(s)%ftid_parb(ifp,:), &
        bc_out(s)%ftii_parb(ifp,:))
endif ! is there vegetation?

If the check returns .true., the call to PatchNormanRadiation will be skipped, as well as the subsequent zeroing of the radiation error. The weird thing is that the radiation error patch-level variable is being initialized to zero for each new patch, so I'm not sure why that results in the value increasing. That said, I wrote out some diagnostics and found that all the patches where the error is blowing up are indeed showing maxval(currentPatch%nrad(1,:))==0. My guess is that there is some floating-point issue with doing the radiation error update in ED_SunShadeFracs. Note that this subroutine is called directly from clmfates_interfaceMod via wrap_sunfrac in clm_drv: https://github.com/rosiealice/fates/blob/1ce8c80b65869efa6e99813a2b5fa60745771ea9/biogeophys/EDSurfaceAlbedoMod.F90#L1240-L1241

Perhaps we can zero out this error within the above nrad check?
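
A sketch of that suggestion, reusing the names from the snippet above (not the final code):

    if (maxval(currentPatch%nrad(1,:)) == 0) then
       ! effectively bare ground: PatchNormanRadiation is skipped, so zero the
       ! accumulator explicitly to prevent a stale value from carrying forward
       currentPatch%radiation_error = 0.0_r8
       ! ... existing bare-ground boundary-condition assignments ...
    else
       ! existing call to PatchNormanRadiation and its error book-keeping
    end if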

Commit: Incorporate leaf xylem water potential to address overestimation of transpiration. Additionally, provide an option to use _gross in the BB stomatal conductance model.
@glemieux (Contributor) commented Nov 4, 2021:

After implementing fc4a301, I merged in the latest tags and retested the fates suite. All expected tests now pass, with DIFFs as expected. The testing directory can be found here: /glade/u/home/glemieux/scratch/ctsm-tests/tests_saibug_fix-nradcheck_fix

Note that the ctsm aux_clm suite still needs to be exercised. This PR will be integrated after ctsm5.1.dev062 has been integrated.

@rosiealice (Contributor, Author) commented:

Nice job Greg!!!

On the multiple layers thing, it's interesting that it's not the multiple layers themselves. I suspect it's to do with the complexity of the canopy structure when there are multiple cohorts of different LAI depths. That creates lots of oddities in how light is passed through. For example, if cohort 1 has an LAI of 4 and cohort 2 has 6, then for layers 5 and 6 light has to pass through empty space for cohort 1 and a leaf matrix for cohort 2, both on the way up and on the way down. I've been trying to figure out a test that would stop this happening in full-FATES mode, if only to check out this theory, i.e. by changing how the LAI is distributed into the boxes. One way might be to 'spread out' the leaf area in the spatially incomplete layers so that the canopy is always 'rectangular'. That might not make scientific sense, but it would allow this to be checked at least!

@glemieux added the 'tag: next' (next PR to be incorporated) label on Nov 18, 2021
@glemieux (Contributor) commented:

All tests pass with DIFFs as expected on izumi: /scratch/cluster/glemieux/ctsm-tests/tests_pr750-fates

Commit: This set of changes by Junyan Ding updates the Van Genuchten P-V and P-K relationships to use the secondary pore-size distribution parameter N. It also fixes a redundant parameter entry for lwp_k.
@glemieux (Contributor) commented:

Final testing is complete. All expected tests pass, with DIFFs as expected:

  • cheyenne: /glade/u/home/glemieux/scratch/ctsm-tests/tests_pr750-fates-dev062
  • izumi: /scratch/cluster/glemieux/ctsm-tests/tests_pr750-fates-dev062-3

@glemieux glemieux merged commit 2dc17d5 into NGEET:master Nov 23, 2021
Labels: bug - science; ctsm (has a corresponding ctsm-side PR); tag: next (next PR to be incorporated)

Successfully merging this pull request may close these issues: Use of SAI in surface albedo routines.

5 participants