Support multiple concurrent instances per MPI task #463
Conversation
Changes to add an instance index for CCPP physics Changes to be committed: modified: scripts/mkstatic.py modified: src/ccpp_types.F90 modified: src/ccpp_types.meta
I assume that for now a hardcoded maximum of 200 ensemble members is enough ... maybe that needs to become configurable in the future.
I am approving because it is not going to get in the way of our work.
However, like @climbfuji, I wonder about the hard coded limit. Can it be checked or eliminated?
scripts/mkstatic.py (outdated)

@@ -896,7 +896,7 @@ class Group(object):
       private
       public :: {subroutines}

-      logical, save :: initialized = .false.
+      logical, dimension(200), save :: initialized = .false.
The `save` is redundant; it is on by default at module scope.

How / where is the index initialized (there is nothing in this PR to indicate that, as far as I can tell)?

Could the `200` limit be a public parameter that can be checked somewhere in case of an overflow?

Better yet, is it known when the `write` method is called? The actual size could be passed in.
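For illustration, here is a minimal sketch of how the hard-coded limit could become a named public parameter with an overflow check in the generated Fortran. The module, parameter, and subroutine names (CCPP_MAX_INSTANCES, ccpp_physics_init_sketch) and the error-handling style are assumptions for this sketch, not the actual mkstatic output.

```fortran
module ccpp_static_api_sketch
   implicit none
   private
   public :: ccpp_physics_init_sketch

   ! Hypothetical named limit instead of a literal 200
   integer, parameter, public :: CCPP_MAX_INSTANCES = 200
   ! One initialization latch per CCPP instance (save is implicit at module scope)
   logical, dimension(CCPP_MAX_INSTANCES) :: initialized = .false.

contains

   subroutine ccpp_physics_init_sketch(instance, errmsg, errflg)
      integer,          intent(in)  :: instance
      character(len=*), intent(out) :: errmsg
      integer,          intent(out) :: errflg

      errmsg = ''
      errflg = 0
      ! Guard against overflowing the hard-coded limit
      if (instance < 1 .or. instance > CCPP_MAX_INSTANCES) then
         errflg = 1
         write(errmsg, '(a,i0,a,i0)') 'Invalid CCPP instance ', instance, &
                                      '; maximum is ', CCPP_MAX_INSTANCES
         return
      end if
      if (initialized(instance)) return
      ! ... call the init phases of the suite for this instance here ...
      initialized(instance) = .true.
   end subroutine ccpp_physics_init_sketch

end module ccpp_static_api_sketch
```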
I agree that having this as a public parameter is better than hard-coding it to 200 (though that's plenty high enough for NRL's purposes). I'm at the edge of my practical knowledge of CCPP's innards and would need to spend some time figuring out how to add the size parameter. Someone else reading this would probably know how to do it in 10 minutes. If so, I support that.
@michalakes I have opened a PR onto your branch that makes this value a little less hard-coded (defined in common.py). Ideally it would have been defined in the host model's config file, but it looks like the order of operations in the ccpp_prebuild.py script doesn't allow for that without relatively invasive changes, and that doesn't seem worth the effort for the prebuild script.
@gold2718 @climbfuji Let me know if you have any comments on this, or if there's something I'm missing. It does seem like it should be relatively easy to define this variable in the host config file with capgen, since all the config variables are available in the `run_env` structure.
> The `save` is redundant; it is on by default at module scope. How / where is the index initialized (there is nothing in this PR to indicate that, as far as I can tell)? Could the `200` limit be a public parameter that can be checked somewhere in case of an overflow? Better yet, is it known when the `write` method is called? The actual size could be passed in.
While the size can easily be passed around with capgen, that will not help with either non-compliant parameterizations or with capgen itself.

capgen makes use of module variables to handle automatic memory management for variables that need to be passed between suite phases (e.g., created at init time and used at run time). To accommodate multiple instances would require a pretty extensive reworking of this infrastructure.
I'll just observe that the instances array is a one-dimensional array of logicals. One could hard-code its dimension to 2000 (well beyond any foreseeable number of ensemble members) and it would still probably use less storage than what's devoted to string constants in CCPP. Perfect being the enemy of the good, I would hard-code it, put in an error check somewhere, and move on with this.
I respectfully request that it be approved and merged and that we worry about hypotheticals if and when they arise. Agile.
I understand what you are saying, but it would also help if you had followed the published workflow for this project (https://github.com/NCAR/ccpp-framework/wiki/Development-Workflow). Also, the governance committee is supposed to approve changes to the functionality of the framework before they are implemented. I will not be a holdout, in the sense that if the other reviewers approve, I will as well. However, the issues of non-compliant physics schemes are not hypotheticals. capgen is also not hypothetical, and IMHO anything which makes unification harder should definitely be approved by the CCPP Framework governance committee.
> My understanding of the purpose of the logical is to indicate whether shared data that gets initialized/ingested is set up only once. In your example, if each instance needed that array, then it would be initialized.
I think these statements are correct. However, the module variables created by capgen are 2-D or 3-D variables, and they are there because the suite modifies them. Every instance will need its own copy of each module variable, which can be done but will require quite a bit of work because this is a completely new feature that was not anticipated. Since these are usually 3-D variables, hard-coding a large number of instances would indeed use a lot of memory. I do not see how the logical can help with this implementation (but please educate me if I'm still missing something).
> I respectfully request that it be approved and merged and that we worry about hypotheticals if and when they arise. Agile.

> I understand what you are saying, but it would also help if you had followed the published workflow for this project (https://github.com/NCAR/ccpp-framework/wiki/Development-Workflow). Also, the governance committee is supposed to approve changes to the functionality of the framework before they are implemented. I will not be a holdout, in the sense that if the other reviewers approve, I will as well. However, the issues of non-compliant physics schemes are not hypotheticals. capgen is also not hypothetical, and IMHO anything which makes unification harder should definitely be approved by the CCPP Framework governance committee.
You're quite right, and I respect your process. Could you (plural) please expedite this through the process? The PR is solid and addresses needed functionality. If tweaks are needed, then please go ahead; we've already done the heavy lifting on this. As to whether the feature was anticipated or not, I'd only point out that it supports ensembles, and if it wasn't, it should have been.
> As to whether the feature was anticipated or not, I'd only point out that it supports ensembles, and if it wasn't, it should have been.
Understood, however, until now, I have never seen ensembles in the same MPI task so I missed that possibility.
I have not been going to any meetings (where I understand this PR has been discussed). Can anyone fill me (or us) in on the current thinking on this PR?
> Understood, however, until now, I have never seen ensembles in the same MPI task so I missed that possibility. I have not been going to any meetings (where I understand this PR has been discussed). Can anyone fill me (or us) in on the current thinking on this PR?
Not sure if this is what you are asking (others please chime in), but in addition to supporting ensembles, the feature is also needed to support our 4DVAR, in which the tangent linear and nonlinear instances of the model call the same (but separate instances of the) physics. There are probably other use cases in which multiple instances of the same physics would need to be invoked within the same process (superparameterizations, maybe?). Thanks for being open to considering this further.
I have been thinking about this PR and have a few concerns (so I'm withdrawing my approval for now).

It seems that the real use case is that you want to run multiple instances of a model in the same set of tasks (multiple overlapping instances), is that correct?

Can someone point me to the section of the current CCPP documentation (https://ccpp-techdoc.readthedocs.io/en/v6.0.0/) that covers rules for the CCPP framework? I do not see anything, and in particular am having trouble finding anything similar to the "driver" rules section of the historical rules document (D1 -- D35 in https://dtcenter.org/sites/default/files/community-code/ccpp-requirements-historical.pdf). Still, looking at both documents, I do not see any requirement that states or implies that multiple overlapping instances of a model need to be supported by the CCPP. I think this is important because supporting multiple overlapping instances puts requirements on the CCPP Framework as well as on CCPP Physics Parameterizations, and a clear requirement will guide both framework and parameterization development.

As pointed out in the issue, there are existing physics parameterizations that contain module variables (e.g., h2ointerp) that might contain model state. Please correct me if I am wrong, but if a parameterization holds model state (e.g., an evolving cloud PDF) as a module variable, then multiple instances would all be trying to use the same data, which seems bound to fail (even without threading). @michalakes sort of pointed this out back in March when he said:

Can the CCPP governance committee please issue a statement on this requirement and get it into documentation somewhere? If it is decided that supporting multiple overlapping instances of a physics suite is a CCPP requirement, can someone please add a "capgen unification" issue for it?
@michalakes I remember discussing having a second new member of the ccpp_types data structure to capture the maximum number of instances (default one). This could be set during the ccpp (framework) init phase and then used to tell the physics to allocate the heap data to the correct size?
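A minimal sketch of what such an addition to the ccpp_t derived type might look like; the member names (ccpp_instance, ccpp_num_instances) are assumptions for illustration, and the other members shown are stand-ins for the existing contents of ccpp_types.F90:

```fortran
module ccpp_types_sketch
   implicit none
   private
   public :: ccpp_t

   type :: ccpp_t
      character(len=512) :: errmsg = ''              ! stand-in for existing members
      integer            :: errflg = 0               ! stand-in for existing members
      integer            :: ccpp_instance      = 1   ! assumed name: index of this instance
      integer            :: ccpp_num_instances = 1   ! assumed name: maximum number of instances,
                                                     ! set during the framework init phase
   end type ccpp_t

end module ccpp_types_sketch
```

The host could then set ccpp_num_instances during the framework init phase, and the physics could use it to size any per-instance heap data.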
Done
> Yes, we want multiple concurrent instances of a model, and thus multiple concurrent instances of CCPP and other libraries it uses, to work correctly when running on the same process, that is, MPI task.
and
The reason the first step in our process is Open an Issue is so that discussions like this can happen well before the PR. I'm entering a discussion below since there is still no issue for this feature.

As for capgen, I would treat this by adding a new dimension to module-level variables. I could allocate enough space to hold the data for each instance and select the current instance by using the instance index. Thoughts on this approach? Is this an approach that could also be used in mkstatic to avoid the hard-coding issue (which seems to be causing at least some discomfort aside from the memory issue)?

And a side point, y'all must have a lot of memory to be able to run multiple instances in a single task (but maybe that's just the mental scar tissue left over from the Blue Gene era talking :)
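A minimal sketch of the module-variable idea described above, not actual capgen output; the module, variable, and subroutine names are illustrative. A suite-level module variable gains a trailing instance dimension, is allocated once the number of instances is known, and is sliced by the current instance index at run time.

```fortran
module suite_module_vars_sketch
   implicit none
   private
   public :: suite_init_sketch, suite_run_sketch

   ! Previously: real, allocatable :: tendency_buffer(:,:)
   real, allocatable :: tendency_buffer(:,:,:)   ! (ncol, nlev, num_instances)

contains

   subroutine suite_init_sketch(ncol, nlev, num_instances)
      integer, intent(in) :: ncol, nlev, num_instances
      ! Allocate once, sized by the actual number of instances (no hard-coded limit)
      if (.not. allocated(tendency_buffer)) then
         allocate(tendency_buffer(ncol, nlev, num_instances))
         tendency_buffer = 0.0
      end if
   end subroutine suite_init_sketch

   subroutine suite_run_sketch(instance, dt)
      integer, intent(in) :: instance
      real,    intent(in) :: dt
      ! Operate only on the slice that belongs to the current instance
      tendency_buffer(:,:,instance) = tendency_buffer(:,:,instance) * dt
   end subroutine suite_run_sketch

end module suite_module_vars_sketch
```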
Sorry for my late reply, I was on leave for the last 2+ weeks.

I think it's worth noting that @michalakes and @areinecke joined the CCPP developer meeting a few months ago to explain exactly what the problem is and how this PR addresses it. So I'd say they did what we expected them to do. Unfortunately @gold2718 wasn't on that call and therefore didn't get all the information. I get that issues need to be created, but at the same time the developer meeting is the main system for discussing/proposing changes to the framework, and if someone makes the effort to show up there, that should be good enough.

Regarding the proposed solution for capgen above (#463 (comment)), this is what I would have done for …
Sure, but there is still a community that will not find out what is going on this way (heck, even I didn't). I took a minute and closed that gap (mostly referencing the really good description at NCAR/ccpp-physics#999).
I agree with the two reviewers who've approved the pull request and think the PR as it currently stands is good enough to go in as-is, with a more comprehensive approach to follow if needed. I'm a dilettante when it comes to the ins and outs of CCPP's infrastructure. I managed to learn just enough to incorporate and test the changes to get our project rolling again and to provide them back to you in the form of the current PR. So beyond what I've already said, I don't have any additional thoughts on your proposal, other than to encourage you all to go forward expeditiously in whatever manner satisfies your requirements and processes.
The NEPTUNE code scales well both in memory and computationally. Dividing the job over separate sets of tasks for each ensemble member consumes about the same total amount of memory as spreading the ensembles over all the tasks. There are good scaling-efficiency arguments for having each ensemble member on its own set of tasks, but there are also advantages, for computing the ensemble statistics, to having all the members available on each task. NEPTUNE supports either mode of operation, notwithstanding this issue with the physics.
So do we have a quorum? @mkavulich, @grantfirl, are there remaining objections to the short-term plan?
CAM scales as well but CAM-physics probably uses a lot more memory per column. Still, on runs where we run a few dozen columns per task, I suppose we could run a small ensemble instead (and use more cores). Interesting approach! Thanks for bearing with me on the discussion (where I started way behind).
If I recall correctly it was mainly @climbfuji and @gold2718 who advocated for a less hard-coded definition here, so if both of you approve of the current version then I also approve. I'll double-check with @grantfirl to confirm before merging.
I approved it on Apr 29 with a note that in the future we may want to make it configurable (#463 (review)).
Sorry, I am having a bit of trouble following the conclusion here. We decided a couple of meetings ago that I should try to pull the number of instances out into a more configurable location, and the PR I opened to John's branch (michalakes#1) is the best effort I could come up with without making invasive changes to the CCPP prebuild. Is the decision to go forward without those modifications?
I looked at your PR and it's totally fine using this. The back and forth here was about the nature of the change in general, i.e. creating multiple instances for members in one MPI task, and how to add this to …
Is there any reason why @michalakes can't just accept your PR and then you can merge to main?
Is there a PR I need to accept? Sorry, I don't see it.
If I go to your fork and look at Pull Requests, I see this:

> …-physics Make maximum instances a little less hard-coded

Note this is a PR to your branch, so it shows up on your fork, not on the NCAR organization page.
Got it, thanks. All merged. (That is, I have merged Mike's PR into my fork and it's ready to be merged to main, all willing.)
Thanks everyone for the conversation and getting this PR wrapped up. I will go ahead and merge now.
Thanks, everyone!
Short description
Support multiple concurrent instances per MPI task
closes #469
Changes to be committed:
Description
In order to have an ensemble of model instances in a JEDI executable, we
need the packages the model relies on to be "stateless", meaning that
the data for one instance are not shared with other instances. A more
comprehensive solution would be to implement the physics so that all such
data are either passed in via arguments to the package, or allocated as
an instance of a class defining the physics.
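As an illustration of that more comprehensive, class-based alternative (not actual CCPP code; the type and procedure names are made up for this sketch), all scheme state would live inside a derived type so that each model instance simply owns its own object:

```fortran
module physics_instance_sketch
   implicit none
   private
   public :: physics_state_t

   type :: physics_state_t
      logical           :: initialized = .false.
      real, allocatable :: lookup_table(:)
   contains
      procedure :: init => physics_state_init
   end type physics_state_t

contains

   subroutine physics_state_init(self, nlev)
      class(physics_state_t), intent(inout) :: self
      integer,                intent(in)    :: nlev
      ! Each instance initializes its own state; nothing is shared at module scope
      if (self%initialized) return
      allocate(self%lookup_table(nlev))
      self%lookup_table = 0.0
      self%initialized = .true.
   end subroutine physics_state_init

end module physics_instance_sketch
```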
Deferring the larger issue of revisiting the CCPP requirements to cover
support for concurrent instances, this PR takes a more modest approach. It
adds a new "instance" dimension to the types already defined for GFS
physics (GFS_control, GFS_data, GFS_interstitial). It also includes some
changes that were needed to address first-time initialization latches
inside certain packages.
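A minimal host-side sketch of the instance-dimension idea; the stand-in type below takes the place of the real GFS DDTs (GFS_control_type, GFS_data_type, GFS_interstitial_type), and in the real code these arrays may already carry a block dimension alongside the new instance dimension:

```fortran
module host_instances_sketch
   implicit none
   private
   public :: run_all_instances

   type :: gfs_data_sketch_type              ! stand-in for GFS_data_type and friends
      real, allocatable :: state(:,:)
   end type gfs_data_sketch_type

contains

   subroutine run_all_instances(num_instances, ncol, nlev)
      integer, intent(in) :: num_instances, ncol, nlev
      ! One element per concurrent CCPP instance instead of a single scalar DDT
      type(gfs_data_sketch_type), allocatable :: gfs_data(:)
      integer :: n

      allocate(gfs_data(num_instances))
      do n = 1, num_instances
         allocate(gfs_data(n)%state(ncol, nlev))
         gfs_data(n)%state = 0.0
         ! ... pass gfs_data(n), along with the instance index n, into the CCPP caps ...
      end do
   end subroutine run_all_instances

end module host_instances_sketch
```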
One of these, in Thompson MP, declares the heap-allocated fields by CCPP
instance; the issue there was that only the first instance of Thompson
MP was initializing itself. The other guards against trying to allocate
data that has already been allocated (e.g. h2ointerp and ozinterp).
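The two patterns just described can be sketched as follows (illustrative only, not the actual Thompson MP or h2ointerp code; the names max_instances, is_initialized, and shared_climatology are made up): module data that is genuinely shared is allocated exactly once, while a per-instance latch ensures every instance still performs its own one-time setup.

```fortran
module scheme_init_sketch
   implicit none
   private
   public :: scheme_init

   integer, parameter :: max_instances = 200                  ! assumed limit, see discussion above
   logical            :: is_initialized(max_instances) = .false.
   real, allocatable  :: shared_climatology(:)                ! stand-in for h2ointerp/ozinterp data

contains

   subroutine scheme_init(instance, nlev)
      integer, intent(in) :: instance, nlev

      ! Guard: only allocate shared data if another instance has not already done so
      if (.not. allocated(shared_climatology)) then
         allocate(shared_climatology(nlev))
         shared_climatology = 0.0
      end if

      ! Per-instance latch: each instance performs its own one-time setup
      if (.not. is_initialized(instance)) then
         ! ... instance-specific setup (lookup tables, etc.) ...
         is_initialized(instance) = .true.
      end if
   end subroutine scheme_init

end module scheme_init_sketch
```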
The PR touches only the packages in one NRL-used suite, which is a
subset of all the packages in CCPP.
User interface changes?: No
Fixes:
Allow multiple instances of CCPP physics in single executable for ensemble DA (NCAR/ccpp-physics#999)
Testing:
Has been successfully tested in the NRL NEPTUNE code, which includes a twoinsts-physics test in its CI test harness.
test removed:
unit tests:
system tests:
manual testing: