Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BalanceCheck: Calls get_proc_bounds from threaded region #164

Closed
ekluzek opened this issue Dec 16, 2017 · 2 comments · Fixed by #1447
Closed

BalanceCheck: Calls get_proc_bounds from threaded region #164

ekluzek opened this issue Dec 16, 2017 · 2 comments · Fixed by #1447
Assignees
Labels
bug something is working incorrectly priority: low Background task that doesn't need to be done right away.

Comments

@ekluzek
Copy link
Collaborator

ekluzek commented Dec 16, 2017

Gautam Bisht < gbisht > - 2017-04-05 14:01:12 -0600
Bugzilla Id: 2439
Bugzilla Depends: 2027, 2310, 2384,
Bugzilla CC: rfisher,

The bug was discovered by Azamat Mametjanov (azamat@mcs.anl.gov) with ACME, but it is also applicable to CLM. BalanceCheck() is called by each thread. When a balance check is violated, the thread attempts to determine the global index and if CLM had more than 1 thread, the code will crash. Below is the call sequence:

BalanceCheck()

  • GetGlobalIndex()
    • call get_proc_bounds()
@ekluzek ekluzek added this to the clm5 milestone Dec 16, 2017
@ekluzek
Copy link
Collaborator Author

ekluzek commented Dec 16, 2017

Erik Kluzek < erik > - 2017-05-02 13:45:04 -0600

Note, we have several other bugs recognized with this problem. I added them to the depends on above.

@ekluzek ekluzek modified the milestones: clm5, future Jan 19, 2018
@ekluzek ekluzek removed this from the future milestone May 29, 2019
@ekluzek ekluzek added priority: low Background task that doesn't need to be done right away. bug something is working incorrectly labels May 29, 2019
mariuslam pushed a commit to NordicESMhub/ctsm that referenced this issue Aug 26, 2019
Merge branch 'rgknox-leafnpp-diag-fix'

This pull request mostly deals with some bug fixes to carbon
accounting. @jenniferholm noticed that the diagnostics for leaf_npp
were periodically showing negative values. Through ESCOMP#164 we identified
that this occurred because daily carbon balances during the allocation
sequences were sometimes negative to to high respiration and low
gpp. The model was correctly using storage carbon to "pay" the
negative daily carbon balance and allow maintenance respiration to
occur, it was not however correctly diagnosing this flow. The
accounting of this process is a little complicated because storage can
be used to pay for maintenance respiration as well as maintenance
turnover demand.

During the process of verifying that carbon accounting errors were
low, I triggered spurious values of output variables that triggered
netcdf write errors, this lead to the identification that
cohort%npp_accum was not being properly copied during copying of
cohorts during patch fission.

A fix was needed to lawrencium machine files for its lr2 partition for
serial runs. While these changes are unrelated to carbon accounting,
they are trivial and simple, so I bundled them here.

Fixes: ESCOMP#164, ESCOMP#169, ESCOMP#168 and possibly ESCOMP#154

User interface changes?: no

Code review: requesting @jenniferholm and @serbinsh for evaluation in
their science algorithms.

Testing:
  rgknox:
    Test suite: lawrencium lr3 (baseline) and lawrencium-lr2 (non-baseline) edTest, Rapid Science Check tool (single site multi-decadal analysis)
    Test baseline: 5c5928f
    Test namelist changes: none
    Test answer changes:
    Test summary: all PASS

  andre:
    Test suite: ed - yellowstone gnu, intel, pgi
                     hobart nag
    Test baseline: 30f84d7
    Test namelist changes: none
    Test answer changes: bit for bit
    Test summary: all tests pass

    Test suite: clm_short - yellowstone gnu, intel, pgi
    Test baseline: clm4_5_12_r195
    Test namelist changes: none
    Test answer changes: bit for bit
    Test summary: all tests pass
@billsacks
Copy link
Member

Some additional thoughts on this issue are given in #102. In particular, see my suggestion here #102 of adding an optional argument to get_proc_bounds that lets it proceed even if you're in a threaded region.

@billsacks billsacks self-assigned this Jul 20, 2021
billsacks added a commit that referenced this issue Aug 2, 2021
Improve and document method for getting diagnostics on a problem point

Following from the changes in ctsm5.1.dev047 to change the lower bounds
to 1 on all processors: The one downside of that change was that it made
it a bit harder to write diagnostics on a given point. This PR cleans up
the method for writing these diagnostics and adds relevant
documentation.

- Resolves #164 (BalanceCheck: Calls get_proc_bounds from
  threaded region)
- Resolves #1424 (Change all the endrun calls to write out
  the point context information)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug something is working incorrectly priority: low Background task that doesn't need to be done right away.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants