Extend mepo support to include the fixture repo #114

Closed
tclune opened this issue Oct 28, 2020 · 2 comments

@tclune
Contributor

tclune commented Oct 28, 2020

The original design for mepo largely excluded the fixture repository itself. Fixture repositories were assumed to be small and thus the missing functionality has not been a problem for many users.

New use cases have emerged in which more robust support for tracking changes at the fixture level is important. Such extensions must be carefully considered though. E.g., a checkout of a different branch on the fixture could alter the components.yaml file and essentially give mepo a lobotomy.

This issue is to track specific capabilities of mepo that should be expanded. I will shortly post the previous email thread on this topic into a response.

@tclune tclune added the enhancement New feature or request label Oct 28, 2020
@tclune
Contributor Author

tclune commented Oct 28, 2020

From email:

Ricardo,

I understand. We do need to think carefully about what it would mean for certain mepo operations to be supported on the fixture. E.g., if mepo can checkout a branch on the fixture, then it could overwrite the components.yaml file which at best would make the results of stat/diff confusing and at worst may cause some serious problems.

Certainly mepo can/should support the ability to get a status and a diff in the fixture. Probably any passive operations at that level will make sense.
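The "passive operations" idea can be read as an allow-list. The following is a minimal sketch under that assumption; `PASSIVE_COMMANDS`, `FixtureOperationError` and `check_fixture_command` are hypothetical names and are not taken from mepo's code. Only `status` and `diff` are named in the thread as safe on the fixture.

```python
# Hypothetical sketch: none of these names come from mepo itself.
# Commands that only read repository state; "status" and "diff" are the
# two the thread names as sensible to run on the fixture.
PASSIVE_COMMANDS = frozenset({"status", "diff"})


class FixtureOperationError(Exception):
    """Raised when a state-changing command targets the fixture."""


def check_fixture_command(command: str) -> None:
    """Allow passive commands on the fixture; refuse everything else.

    A checkout on the fixture could replace components.yaml underneath
    the tool, so anything not known to be read-only is rejected.
    """
    if command not in PASSIVE_COMMANDS:
        raise FixtureOperationError(
            f"'{command}' may modify the fixture (and components.yaml); "
            "use git directly in the fixture instead"
        )
```

Refusing by default, rather than listing the dangerous commands, means a newly added command cannot reach the fixture until someone has decided it is read-only.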

Let me know if you have a preferred approach to what should happen if a user requests to switch branches in the fixture.

- Tom

On Oct 28, 2020, at 11:51 AM, Todling, Ricardo (GSFC-6101) ricardo.todling@nasa.gov wrote:

Tom,

Although I can understand what you say, and the vision from Arlindo, I still think that we cannot leave anything out of a tool you trust to wrap the git commands and provide the user information on what is and what is not changed and committed. In my personal opinion, no matter how small the fixture may be – even if a single file – mepo must look at everything.

I still think, what I suggested - having a set of lines in components.yaml that refer to the fixture branch - is not an unreasonable request, and it has the potential to do the job.

As it stands, I see this as a major problem w/ mepo and a recommendation for people to use mepo would not come from me. For my part, I am now very aware of this flaw and I have been using mepo very conservatively – though I wish it didn’t have to be this way.

Ricardo

From: "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Date: Tuesday, October 27, 2020 at 12:54 PM
To: RICARDO TODLING ricardo.todling@nasa.gov
Cc: "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov, "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov, Joseph Stassi joe.stassi@nasa.gov, "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Hi Ricardo,

I think Arlindo’s vision for fixture repositories is that they would be small things that just assemble other repos. And as such, the mepo design that Purnendu came up with seems to be pretty good (or at least better than the alternatives according to those that have used it with any regularity.)

ADAS, as it currently stands, is not a small fixture and, as you have discovered, this highlights the design choices that went into mepo. We probably can/should do something to allow mepo status to at least report back status and diffs in the fixture. To go further and allow mepo to manage branching and such will require a special entry in the configuration.yaml file. This is not impossible, but I’d like to be certain that this is the best path before we go too far in that direction.

Ultimately, I think you’ll probably want to migrate big chunks of the ADAS fixture into a finer-grained set of repos. Note - you don’t necessarily have to move each subdir under Applications into a separate repository. We could make “Applications” itself a repository (though the repository should have a different name of course). This would minimize the near term disruption, but I would only recommend that if you were confident you don’t want to go even finer grain. We should endeavor to aim at where you think you will end up.

Cheers,

- Tom

On Oct 26, 2020, at 5:52 PM, Todling, Ricardo (GSFC-6101) ricardo.todling@nasa.gov wrote:

Matt:

By far and foremost: thank you very much for looking into the nuances I seem to be stumbling on. And also for the many thoughtful insights.

I am starting to think – pls don’t take any of this badly – that mepo is not really a solution. As you say, there will always be a level that is beyond the control of mepo. So ultimately, git is the only thing that helps. I understand, from your explanation, that the smaller we make the fixture, the more mepo can show and the closer the user is to not missing something; but again, ultimately, mepo might not be able to do it all.

On the other hand, perhaps there is a way to put a dummy set of lines in the components.yaml file that refer to “self”; this would indicate to mepo that it should: (a) look for the branch the file components.yaml is sitting under; and (b) issue the command line the user is trying to use w/ mepo and invoke the equivalent git command line for that branch.
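The "self" suggestion could look roughly like this once components.yaml has been parsed into a dictionary. This is only a sketch of the path-resolution step: the `SELF_MARKER` value, the `local` key and the `component_dir` helper are assumptions made for illustration, not mepo's actual schema or code.

```python
from pathlib import PurePosixPath

# Hypothetical marker; the thread only proposes an entry that refers to "self".
SELF_MARKER = "self"


def component_dir(fixture_root: PurePosixPath, entry: dict) -> PurePosixPath:
    """Return the directory in which git commands for a component would run.

    `entry` mimics one parsed components.yaml record. An entry whose
    'local' value is the marker resolves to the fixture root (the directory
    holding components.yaml); any other entry resolves relative to it.
    """
    local = entry["local"]
    if local == SELF_MARKER:
        return fixture_root
    return fixture_root / local


root = PurePosixPath("/work/GEOSadas")
components = {
    "GEOSadas": {"local": SELF_MARKER},      # the fixture itself
    "MAPL": {"local": "src/Shared/@mapl"},   # an ordinary sub-repository
}
for name, entry in components.items():
    print(name, "->", component_dir(root, entry))
```

With the fixture resolved to a directory like any other component, a wrapped git command could in principle be issued there, which is step (b) of the proposal.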

My inclination, at least for the time being, is to minimize the use of mepo. I think relying on git is the safest. If no solution for such things is found, I think it will be important to make sure some of these nuances are spelled out to avoid confusion as more people start using the GMAO git repo – although newcomers will get confused no matter what (myself included!).

Thank you.
Ricardo
From: "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov
Date: Monday, October 26, 2020 at 4:25 PM
To: RICARDO TODLING ricardo.todling@nasa.gov, "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov, "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Cc: Joseph Stassi joe.stassi@nasa.gov, "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Note: A "solution" for this, would be to put many of the Applications that are currently in the GEOSadas fixture:

https://github.com/GEOS-ESM/GEOSadas/tree/main/src/Applications

into separate repositories themselves. I suppose we didn't do this initially because unlike, say, GEOSgcm_App, folders like GEOSdas_App are not "shared" amongst fixtures (or with outside collaborators like GOCART soon will be). The GEOSadas is the only one with GEOSdas_App.

Similarly, the GEOSldas has Components and Applications that are only in the fixture:

https://github.com/GEOS-ESM/GEOSldas/tree/main/src/Components/GEOSldas_GridComp

so mepo does not control them, only git.

--
Matt Thompson, SSAI, Ld Scientific Programmer/Analyst
NASA GSFC, Global Modeling and Assimilation Office
Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
Phone: 301-614-6712 Fax: 301-614-6246
http://science.gsfc.nasa.gov/sed/bio/matthew.thompson

From: "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov
Date: Monday, October 26, 2020 at 4:09 PM
To: "Todling, Ricardo (GSFC-6101)" ricardo.todling@nasa.gov, "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov, "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Cc: "Stassi, Joe (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" joe.stassi@nasa.gov, "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Ahhh. Okay. Yeah, that's a limitation of mepo, and there is no good simple way around it that I can think of (maybe Tom would have better luck?)

Much like manage_externals, mepo only tracks the repositories it manages under components.yaml. But there is one repo not in that: the fixture itself!

When you do:

mepo clone fixture-URL

it's actually a 'convenience' doing:

git clone fixture-URL
cd fixture-url
mepo init
mepo clone
cd ..

In the old days of mepo, this is how we used it. I added the ability to do a "one-step" clone which automated that process because some trial users were a bit annoyed at the repetition.

Mepo is able to track everything that the "mepo init clone" phases did. But, the fixture itself isn't seen as mepo never cloned it.
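To make the gap concrete, the one-step clone can be written out as the command sequence described above. This is a sketch, not mepo's implementation: `default_clone_dir` and `one_step_clone_commands` are hypothetical helpers, and the directory-name rule is a simplified version of git's default.

```python
from typing import List


def default_clone_dir(url: str) -> str:
    """Approximate git's default directory name: last URL component, minus '.git'."""
    name = url.rstrip("/").replace(":", "/").split("/")[-1]
    return name[:-4] if name.endswith(".git") else name


def one_step_clone_commands(url: str) -> List[List[str]]:
    """Spell out the manual sequence that `mepo clone <url>` is described as automating."""
    return [
        ["git", "clone", url],            # plain git: mepo state does not exist yet
        ["cd", default_clone_dir(url)],
        ["mepo", "init"],                 # mepo state is created here
        ["mepo", "clone"],                # only these sub-repositories are recorded
        ["cd", ".."],
    ]


for command in one_step_clone_commands("git@github.com:GEOS-ESM/MAPL.git"):
    print(" ".join(command))
```

The fixture is the only repository created before `mepo init` runs, which is why it never appears in what mepo tracks.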

Now git is tracking that, so if you do

git status

within the fixture, it will see the difference and you can use git commands to work with the fixture. (Like how if you ran "git status" in, say, src/Shared/@mapl it will only tell you about that repo.)

Essentially, there will always be some level of repository that a tool like this cannot manage. And, at the moment, that is the fixture containing the components.yaml. As you were editing:

src/Applications/GAAS_App/ana_aod.j.tmpl

and that is part of the fixture repo itself, mepo doesn't know about it.

I can try to look at getting, say, "mepo status" to also report the status of the fixture itself. This would be a longer-term project, though, as it would add a lot of complexity to mepo: there would need to be one set of code for all the mepo-cloned repos and a different set for the fixture itself.
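A fixture-level status report could be built on `git status --porcelain`, which prints one `XY path` line per changed file. The sketch below is an illustration only: the function names are hypothetical, the wording of the descriptions merely imitates the mepo output quoted in this thread, and only a few status codes are handled.

```python
import subprocess
from typing import List


def git_porcelain(repo_dir: str) -> str:
    """Run `git status --porcelain` in repo_dir and return its raw output."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout


def describe_change(line: str) -> str:
    """Turn one porcelain (v1) line, 'XY path', into a short description."""
    index_state, worktree_state, path = line[0], line[1], line[3:]
    if index_state == "?" and worktree_state == "?":
        return f"{path}: untracked"
    if worktree_state == "M":
        return f"{path}: modified, not staged"
    if index_state == "M":
        return f"{path}: modified, staged"
    # Any other code is passed through unchanged rather than guessed at.
    return f"{path}: {line[:2].strip()}"


def fixture_status(porcelain_output: str) -> List[str]:
    """Describe every changed file reported for the fixture repository."""
    return [describe_change(l) for l in porcelain_output.splitlines() if len(l) > 3]
```

Calling `fixture_status(git_porcelain(fixture_root))` from the directory that holds components.yaml would list an edit such as the one to `src/Applications/GAAS_App/ana_aod.j.tmpl` mentioned above.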

Matt


From: "Todling, Ricardo (GSFC-6101)" ricardo.todling@nasa.gov
Date: Monday, October 26, 2020 at 3:55 PM
To: "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov, "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov, "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Cc: "Stassi, Joe (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" joe.stassi@nasa.gov, "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Ok …

I have something even simpler that does not work for me.

I have a working version (checkout from feature/rtodling/migrate-to-geosadas5271)

Then I do:

cd src/Applications/ana_aod.j.tmpl
I edit a line
I do: git status
I get:
modified: ana_aod.j.tmpl
I do: mepo status
I get

Checking status...
env | (b) origin/feature/bmauer/updates-from-geosadas5_27_0 (DH)
cmake | (t) v1.0.11 (DH)
ecbuild | (t) geos/v1.0.0 (DH)
NCEP_Shared | (b) origin/feature/mathomp4/update-to-geosadas527 (DH)
GMAO_Shared | (b) origin/feature/rtodling/migrate-to-geosadas5271 (DH)
MAPL | (b) origin/feature/rtodling/migrate-to-geosadas5271 (DH)
FMS | (t) geos/orphan/v1.0.3 (DH)
GEOSana_GridComp | (b) origin/feature/rtodling/migrate-to-geosadas5271 (DH)
GEOSgcm_GridComp | (b) origin/feature/rtodling/migrate-to-geosadas5271 (DH)
g5pert | (b) origin/feature/rtodling/migrate-to-geosadas5271 (DH)
GEOSagcmPert_GridComp | (b) origin/feature/bmauer/updates-from-geosadas5_27_0 (DH)
FVdycoreCubed_GridComp | (t) v1.0.9 (DH)
fvdycore | (t) geos/v1.0.2 (DH)
GEOSchem_GridComp | (b) origin/feature/bmauer/updates-from-geosadas5_27_0 (DH)
mom | (t) geos/v1.0.1 (DH)
GEOSgcm_App | (b) origin/feature/rtodling/migrate-to-geosadas5271 (DH)
UMD_Etc | (t) v1.0.2 (DH)
CPLFCST_Etc | (t) v1.0.1 (DH)

Nothing shows.

I did the same mepo status, but like this:

/home/mathomp4/MepoDevelopment/mepo status

That returns the same thing as mepo …

Ricardo

From: "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov
Date: Monday, October 26, 2020 at 3:47 PM
To: RICARDO TODLING ricardo.todling@nasa.gov, "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov, "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Cc: Joseph Stassi joe.stassi@nasa.gov, "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Note: this was all with my new mepo with the fixes for your "move model" workflow. Old mepo will not be as happy.


From: "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov
Date: Monday, October 26, 2020 at 3:46 PM
To: "Todling, Ricardo (GSFC-6101)" ricardo.todling@nasa.gov, "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov, "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Cc: "Stassi, Joe (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" joe.stassi@nasa.gov, "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Well,

You are working in a different way than any of us have since using mepo, so you might be exposing bugs.

Still I just tried this:

$ mepo clone git@github.com:GEOS-ESM/MAPL.git
$ cd MAPL

I then made a new branch and edited ESMA_env/README.md to have a new line:
$ mepo checkout -b test-branch-1 ESMA_env
$ cat ESMA_env/README.md

ESMA Environment

Repository containing modulefiles for GEOS-ESM

This is a new line
$ mepo status
Checking status...
ESMA_env | (b) test-branch-1
| README.md: modified, not staged
ESMA_cmake | (t) v3.2.1 (DH)
ecbuild | (t) geos/v1.0.5 (DH)

Now I'll stage, commit and push:

$ mepo stage ESMA_env
+ ESMA_env: README.md
$ mepo commit -m "test commit" ESMA_env
+ ESMA_env: README.md
$ mepo push ESMA_env

And then I did:

$ cd ..
$ mv MAPL MAPL-OLD
$ mepo clone git@github.com:GEOS-ESM/MAPL.git
$ cd MAPL
$ mepo status
Checking status...
ESMA_env | (t) v3.0.1 (DH)
ESMA_cmake | (t) v3.2.1 (DH)
ecbuild | (t) geos/v1.0.5 (DH)

Check the Env/README:
$ cat ESMA_env/README.md

ESMA Environment

Repository containing modulefiles for GEOS-ESM

Now I grab that branch:
$ mepo checkout test-branch-1 ESMA_env
$ mepo status
Checking status...
ESMA_env | (b) test-branch-1
ESMA_cmake | (t) v3.2.1 (DH)
ecbuild | (t) geos/v1.0.5 (DH)
$ cat ESMA_env/README.md

ESMA Environment

Repository containing modulefiles for GEOS-ESM

This is a new line

Now I add a new line:
$ echo "Another line" >> ESMA_env/README.md
$ cat ESMA_env/README.md

ESMA Environment

Repository containing modulefiles for GEOS-ESM

This is a new line
Another line

and I see the new change:
$ mepo status
Checking status...
ESMA_env | (b) test-branch-1
| README.md: modified, not staged
ESMA_cmake | (t) v3.2.1 (DH)
ecbuild | (t) geos/v1.0.5 (DH)
$ mepo diff
Diffing...
ESMA_env (location: ESMA_env):

diff --git a/README.md b/README.md
index 7ab94c6..c661a4f 100644
--- a/README.md
+++ b/README.md
@@ -2,3 +2,4 @@
Repository containing modulefiles for GEOS-ESM

This is a new line
+Another line

But if I go to MAPL-OLD:
$ cd ../MAPL-OLD
$ mepo status
Checking status...
ESMA_env | (b) test-branch-1
ESMA_cmake | (t) v3.2.1 (DH)
ecbuild | (t) geos/v1.0.5 (DH)

It does not see the edit.

Matt


From: "Todling, Ricardo (GSFC-6101)" ricardo.todling@nasa.gov
Date: Monday, October 26, 2020 at 3:32 PM
To: "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov, "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov, "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Cc: "Stassi, Joe (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" joe.stassi@nasa.gov, "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

I am willing to try it …

But I guess I am running into another issue w/ mepo.

After I committed and pushed some changes, and checked out a fresh copy of what I pushed, I recompiled, made more changes and now I did a mepo status and mepo does not detect my changes! Maybe the problem is related to what you are working on … but the funny thing is that what I checked out, I placed in a directory of the same name as what I had before I committed/pushed.

These are really odd behaviors – I find it strange that others have not run into similar issues!

Ricardo

From: "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov
Date: Monday, October 26, 2020 at 3:19 PM
To: RICARDO TODLING ricardo.todling@nasa.gov, "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov, "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Cc: Joseph Stassi joe.stassi@nasa.gov, "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Ricardo,

Oh, I have the edits, but not yet released in the "main" mepo that we expose to people. I was finding little bugs here and there as I tried various things, but I think it's better now.

If you'd like to be a pioneer tester, you can try:

/home/mathomp4/MepoDevelopment/mepo

on discover. That does point to my version with fixes. If you have a chance, try doing a:

clone->change-things->move-directory->re-clone->check-original

quick workflow using that mepo instead of the one from the module.

If that works for you, then I can look at getting it released in "main" mepo quickly.

Matt


From: "Todling, Ricardo (GSFC-6101)" ricardo.todling@nasa.gov
Date: Monday, October 26, 2020 at 1:54 PM
To: "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov, "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov, "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Cc: "Stassi, Joe (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" joe.stassi@nasa.gov, "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Hum … Matt, have you already “implemented” your changes? Meaning is my env by default picking up changes you made in this regard?

If so, something is not working. I just made changes to my source code; git sees them, but not mepo; mepo is looking elsewhere!

R

From: "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov
Date: Monday, October 26, 2020 at 11:20 AM
To: RICARDO TODLING ricardo.todling@nasa.gov, "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov, "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Cc: Joseph Stassi joe.stassi@nasa.gov, "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Ricardo,

I believe I have a fix for the "move the repo" now. I'm doing a bit more testing, but things seem to work for me.

Note: This will require new clones to use, though, as I had to change the internal state mepo references that are created at initialization time. Mepo isn't really capable of doing much internal surgery to itself after cloning for fundamental things like "disk location of sub-repository". Most of what I'm checking now is that I don't break using mepo on older clones of the model.

Matt

PS: Also, it's possible some edge cases of weird mepo operators I don't use often might have issues, but I'm making sure all the usual ones still work. But I suppose the good thing about OO programming is you can fix bugs for lots of things all at once. :)


From: "Todling, Ricardo (GSFC-6101)" ricardo.todling@nasa.gov
Date: Monday, October 26, 2020 at 8:51 AM
To: "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov, "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov, "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Cc: "Stassi, Joe (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" joe.stassi@nasa.gov, "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Well, it doesn’t have to be today … come on – give yourself some slack …

From: "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov
Date: Monday, October 26, 2020 at 8:49 AM
To: RICARDO TODLING ricardo.todling@nasa.gov, "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov, "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Cc: Joseph Stassi joe.stassi@nasa.gov, "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Ricardo,

Yes, it is a bug indeed. But it will take a bit of time to fix as I've stared at mepo. Much of what is coded in there assumes the path is fixed. So I'll need to be careful to make sure all the functionality doesn't break.

Hopefully I can get a fix for it today.

Matt


From: "Todling, Ricardo (GSFC-6101)" ricardo.todling@nasa.gov
Date: Monday, October 26, 2020 at 8:47 AM
To: "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov, "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov, "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Cc: "Stassi, Joe (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" joe.stassi@nasa.gov, "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Hi Matt,

First let me thank you for the answer.

Let me say that, although nobody in the SI Team might have renamed a checked out directory, this is rather common practice among us using CVS. So, most of us do exactly what I have just described. I am 100% sure others will do exactly what I did; and I am 100% sure, many people will not notice the issue w/ this change and will get bitten by it rather badly.

So, I appreciate you making this an issue. For my part, I say, this is fundamental and we should aim to have this addressed before a Git version of the DAS is released to anyone (other than our little experimental group here).

Thank you again.
Ricardo
PS: basically, the path used by mepo should not be absolute; it should be relative.

From: "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov
Date: Monday, October 26, 2020 at 8:29 AM
To: RICARDO TODLING ricardo.todling@nasa.gov, "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov, "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Cc: Joseph Stassi joe.stassi@nasa.gov, "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Ricardo,

I was off Friday, so I'll start trying to answer some of these questions.

First, as for moving the entire fixture, indeed that could cause mepo issues, usually a crash. But, actually, you did things juuuuust right in a certain way that exposed an issue.

Why? My guess is because you did something neither Purnendu, nor I, nor anyone has really done... which is move a checkout and reclone to the same path. Staring at the mepo code, when you clone a fixture, the system uses absolute paths in the state; thus, if you move a fixture, the state still thinks all the clones are in the original paths.

So, one, you clone to "/path/to/GEOSadas", then you rename the fixture to "/path/to/GEOSadas_" and then reclone a fixture in the exact same path as the old fixture, "/path/to/GEOSadas". If this happens, well, the mepo state in both cases will refer to "/path/to/GEOSadas". So mepo commands in GEOSadas_ will actually affect the other fixture!

(The other scenario is if you didn't make the second clone then mepo will fail because the paths in the state no longer exist.)
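The rename-and-reclone scenario can be reproduced with two toy resolvers. This is a sketch of the failure mode only; `resolve_absolute` and `resolve_relative` are hypothetical and say nothing about how mepo's state is actually stored. The sub-repository path is the one mentioned elsewhere in this thread.

```python
from pathlib import PurePosixPath


def resolve_absolute(stored: str, current_root: PurePosixPath) -> PurePosixPath:
    """State holds an absolute path: the current fixture location is ignored."""
    return PurePosixPath(stored)


def resolve_relative(stored: str, current_root: PurePosixPath) -> PurePosixPath:
    """State holds a path relative to the fixture root: it follows a rename."""
    return current_root / stored


original_root = PurePosixPath("/path/to/GEOSadas")
renamed_root = PurePosixPath("/path/to/GEOSadas_")

# State as it would be written at clone time, in both styles.
absolute_state = str(original_root / "src/Shared/@mapl")
relative_state = "src/Shared/@mapl"

# After renaming the fixture to GEOSadas_ and running a command inside it:
print(resolve_absolute(absolute_state, renamed_root))  # still points into GEOSadas
print(resolve_relative(relative_state, renamed_root))  # points into GEOSadas_
```

If a fresh clone now occupies `/path/to/GEOSadas`, the absolute variant silently operates on that clone; if nothing is there, the path no longer exists and the command fails, which matches the two scenarios described above.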

As I said, I suppose this is a scenario none of us has encountered because I guess we usually don't rename clones this way, we usually just re-clone anew in a new path. That said, I am pretty sure it doesn't need to be this way, it was just coded that way at the beginning. I have made an issue here:

#106

and I'll start working on a fix.


From: "Todling, Ricardo (GSFC-6101)" ricardo.todling@nasa.gov
Date: Friday, October 23, 2020 at 8:15 PM
To: "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov, "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Cc: "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov, "Stassi, Joe (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" joe.stassi@nasa.gov, "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Ben: thanks, we’ll work on that when time comes …

For now, I think there is something messed up with mepo. Here is what’s happening to me. I checked out a version of the system (5_27_0) – following your original instructions; compiled it; ran w/ it … all worked. Then I made changes to merge it w/ the cvs version 5_27_1 – compiled, reproduced both the AOD analysis and the GSI analysis (as compared to those of 5_27_1) … but the model is completely messed up … I don’t know yet why that is the case – though I have a suspect (related to how the repo is working), but before working on the suspect – I did this:

I moved my version to a different name (from its top directory); then I cloned and checked out again from scratch following your original instructions ... and while working on building that, I went to my renamed directory – the one w/ all my changes – and did “mepo status” and it shows nothing – it thinks nothing is changed in that directory … when I cd to one of the repos and do a git status, I see all my changes/additions/etc …

What I think is happening is that when I do mepo status … mepo looks in the recent fresh check out I have … mepo probably hides some information somewhere and even though I am in another directory it is still looking into the dir w/ that same original name.

In other words

I had a version checked out under a directory named GEOSadas; I had all my changes there. I renamed this directory to _GEOSadas; I cloned and checked out another version; so now I have sitting side by side two directories: GEOSadas and _GEOSadas; I cd to _GEOSadas … do “mepo status” and nothing shows … it is as if I had cd’ed to GEOSadas (the freshly checked out version) … that’s no good!

Something is wrong w/ mepo.

Ricardo

PS: Worse now … the freshly checked out version does not compile to completion!

From: "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov
Date: Friday, October 23, 2020 at 4:18 PM
To: RICARDO TODLING ricardo.todling@nasa.gov, "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Cc: "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov, Joseph Stassi joe.stassi@nasa.gov, "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

The short answer is once we are satisfied with the configuration, everywhere that is currently pointing to a branch in the components.yaml must be tagged and the components.yaml must point to a tag instead of a branch.
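A pre-release check along these lines could list the components that still point to a branch. The sketch assumes components.yaml has already been parsed into a dictionary and that each entry carries either a `tag` or a `branch` key; that layout and the helper name are assumptions for illustration. The sample values are taken from the status output quoted in this thread.

```python
from typing import List


def components_still_on_branches(components: dict) -> List[str]:
    """Names of components whose entry pins a branch rather than a tag."""
    return sorted(
        name for name, entry in components.items()
        if "branch" in entry and "tag" not in entry
    )


# Assumed shape of a parsed components.yaml (hypothetical subset).
components = {
    "cmake": {"tag": "v1.0.11"},
    "MAPL": {"branch": "feature/rtodling/migrate-to-geosadas5271"},
    "GMAO_Shared": {"branch": "feature/rtodling/migrate-to-geosadas5271"},
}
print(components_still_on_branches(components))  # ['GMAO_Shared', 'MAPL']
```

An empty list would mean every component is pinned to a tag, so a later checkout of the fixture reproduces the same set of sources.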
How to do this within the semantic versioning scheme is less easy since we can’t stuff these on top of main in many of these and simply make them the next major version. Once Tom is back we can discuss.

From: "Todling, Ricardo (GSFC-6101)" ricardo.todling@nasa.gov
Date: Friday, October 23, 2020 at 3:54 PM
To: "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Cc: "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov, "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov, "Stassi, Joe (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" joe.stassi@nasa.gov, "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Hi Tom,

Welcome back!

Thanks for the extra clarification. So far, I haven’t stumbled on anything that jumps the eye as an issue in my mind. I am happy, as I said to Ben, that the approach you’ve taken is perhaps a little less aggressive in adopting “features” of ecbuild.

I need to get a little more familiarized w/ mepo since committing through the various repositories when making changes across them is a bit of a pain without knowing mepo’s syntax. In any case, I will be doing the dummy thing for now until I learn better.

I am trying to piggy back from Matt and Ben’s work to bring 5_27_0 to Git and am putting the changes that just went in in CVS as the 5_27_1 into Git; this latter is the version being passed to Mark and Rob for our next parallel system and is the one that combines tests done on top of 5_27_0 … I think the changes are all-in-all minor but they provide me with a robust set of changes to experience the process.

What is still not clear to me is how, when all is said and done, I will be able to checkout a consistent version of whatever I come up w/ after I commit the bits and pieces in the various repos and branches. I’ll have to ask Matt and Ben when I get to that.

Cheers.
Ricardo

From: "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Date: Friday, October 23, 2020 at 3:31 PM
To: RICARDO TODLING ricardo.todling@nasa.gov
Cc: "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov, "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov, Joseph Stassi joe.stassi@nasa.gov, "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Ricardo,

Just for a bit more background on our implementation choice. The ecbuild mechanism for building subprojects places their source trees in a flat directory at the top of the super project source tree. There are several nice aspects of this, but Arlindo was insistent that as much as possible the git translation from cvs preserve our hierarchically nested structure. Our strategy is also a bit friendlier to performing the cmake configuration step on a batch node. The ecbuild/jedi approach requires web access at that stage. (I say it is a minor advantage, because we still need web access for the recursive clone step anyway.)

The main ways in which we are currently using ecbuild are:

  • macros for installation. This is one thing that cmake can do well, but it is very difficult to figure out from the documentation and requires lots of scattered lines of cmake code to do it well. ecbuild nicely encapsulates this. We had to submit a bugfix to make it work for our hierarchical structure, and that was readily accepted. The only other aspect that I could wish to be improved is that they don’t include the version number in the installation directory structure.
  • wrapper macros for add_library() and add_executable(). For the former, we actually again wrap their ecbuild_library(). Basically this is because we had already introduced our own wrapper that had capabilities that their macro lacks (e.g. stubs). This is a bit ugly and should be improved further in our abundant spare time.
  • packages to support finding 3rd party libraries (e.g., netcdf).

There are likely other great things in there that we could exploit. To my understanding NOAA is skipping ecbuild entirely - it will be interesting to see how they absorb jedi.

Cheers,

- Tom

On Oct 23, 2020, at 3:05 PM, Todling, Ricardo (GSFC-6101) ricardo.todling@nasa.gov wrote:

Hi Ben,

Don’t get me wrong: I could not dislike more the way this aspect of JEDI works; I am very glad to know you have not adopted that.

While I waited to hear from you, I went ahead and rebuilt anyway, even being unsure of what would happen, but sure enough I see that after compiling all I had is intact …

I certainly prefer and would never advocate in favor of the mode used in JEDI.

Thanks for clarifying.
Ricardo
From: "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov
Date: Friday, October 23, 2020 at 3:00 PM
To: RICARDO TODLING ricardo.todling@nasa.gov, "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov, Joseph Stassi joe.stassi@nasa.gov, "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Cc: "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Ricardo,
We are operating very differently from JEDI in a few regards and are not using the parts of ecbuild you are seeing in JEDI. To my knowledge, we are using a limited subset of the ecbuild capabilities.

As you are aware, with JEDI the pull and build are tightly coupled: you clone the empty bundle, and their build script both pulls the individual repos and builds via ecbuild. Just getting the source without building is not easy or supported, which seems odd to me.
Then, as you have noticed, when you go to rebuild it will attempt to pull updates from GIT unless you go out of your way to disable this.
After GEOS it took a while to get used to this, and I still don’t like how tightly coupled the checkout/build is, but it is what it is. Half the time when I check out code I know I am going to want to change it before building, so forcing one to build right away is frustrating. It has been a month since I last built JEDI, so if this has changed I would not be aware.

We are doing none of this with GEOS.
We use mepo to pull the repos.
Then you have to go build; the build process never pulls anything from git and is incapable of doing so. We are just not making use of that ability of ecbuild. Once you have built once, you have to explicitly pull yourself; there is no danger of changes being pulled from GIT without your knowing.

The parallel_build script looks like it combines the mepo clone and then the build in 1 step but this is just for the initial checkout, it only clones if it detects that the initial clone has not happened.
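
The clone-once behavior described above can be illustrated with a small Python sketch. It is not the parallel_build script: the marker directory name `.mepo` and both function names are assumptions made for illustration, and the clone and build steps are passed in as callables so that the guard is the only logic shown.

```python
from pathlib import Path


def needs_initial_clone(fixture_dir, marker=".mepo"):
    """Return True when the marker left by a previous clone is absent.

    The marker name is an assumption; the real script may detect a prior
    clone differently.
    """
    return not (Path(fixture_dir) / marker).is_dir()


def clone_then_build(fixture_dir, clone, build):
    """Clone only on the first run, then always build.

    `clone` and `build` are caller-supplied callables (hypothetical
    stand-ins for the mepo clone and the CMake build). The build step
    itself never touches git.
    """
    cloned = False
    if needs_initial_clone(fixture_dir):
        clone(fixture_dir)
        cloned = True
    build(fixture_dir)
    return cloned
```

With this shape, a second invocation in the same directory skips the clone and goes straight to the build, so local edits are never overwritten by an implicit pull.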

From: "Todling, Ricardo (GSFC-6101)" ricardo.todling@nasa.gov
Date: Friday, October 23, 2020 at 1:18 PM
To: "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov, "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov, "Stassi, Joe (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" joe.stassi@nasa.gov, "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Cc: "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Matt and/or Ben:

I have a dummy question about how things are set up in the make procedures for GEOS from the Git checkout.

If I understand it right, parts of the build use ecbuild. If you are following the “approach” adopted by JEDI, the build/compile is preceded by a pull, so if any changes were made by others working on the versions checked out, the code automatically updates. To prevent this from happening, so that no conflicts arise w/ changes made by the user trying to compile, you have to explicitly go somewhere in the make procedure and comment out the update option.

Is this how things also work in our GEOSgit build procedure? In other words, if I make changes to my working copy and try rebuilding, will the build pull from the repo without my knowing? If so, can you tell me how I de-activate this feature? If not … all good.

Thank you.
Ricardo

From: "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov
Date: Thursday, October 22, 2020 at 9:40 AM
To: RICARDO TODLING ricardo.todling@nasa.gov, "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov, Joseph Stassi joe.stassi@nasa.gov, "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Cc: "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Ricardo,

It's ugly indeed. Hmm. Well, I guess MPIRUN_AOD is defined in fvsetup? If so, we could do something in the "if nccs" section (around line 7792) a la:

if ($?MPT_VERSION) then
    setenv MPIRUN_AOD "mpirun "
else
    setenv MPIRUN_AOD "esma_mpirun "
endif

and I guess for $ANAAODX you could do:

$MPIRUN $ANAAODX -v -x $EXPID $flags -o $ods_a $aer_f

as an MPIRUN seems to be defined in that file:

setenv MPIRUN "$MPIRUN_AOD -np $NCPUS_AOD"

This should be pretty safe unless $NCPUS_AOD is larger than the number of cores on a node. mpirun is odd with MPT, as running on multiple nodes requires extra effort.
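
For illustration, the selection logic above can be sketched in Python. This is a minimal sketch, not the fvsetup code: the function names `choose_aod_launcher` and `build_aod_command` are hypothetical, and the only facts taken from the thread are the `MPT_VERSION` check, the two launcher names, and the `-np $NCPUS_AOD` form.

```python
def choose_aod_launcher(env):
    """Return the launcher to use for the AOD analysis.

    Mirrors the csh test `if ($?MPT_VERSION)`: MPT is assumed to be in
    use whenever MPT_VERSION is set in the environment, whatever its value.
    """
    return "mpirun" if "MPT_VERSION" in env else "esma_mpirun"


def build_aod_command(env, ncpus, executable):
    """Assemble the launch command as an argument list (hypothetical helper)."""
    # An explicit -np is always passed: elsewhere in this thread, MPT's
    # mpirun rejected the executable path as "not a legal hostname"
    # when -np was omitted.
    return [choose_aod_launcher(env), "-np", str(ncpus), executable]


if __name__ == "__main__":
    # Example environments; the version string is only a sample value.
    print(" ".join(build_aod_command({"MPT_VERSION": "2.17"}, 1, "ana_aod.x")))
    print(" ".join(build_aod_command({}, 1, "ana_aod.x")))
```

Returning an argument list rather than a single string avoids the doubled space that the csh version produces when `"mpirun "` is concatenated with `" -np"`.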

Matt Thompson, SSAI, Ld Scientific Programmer/Analyst
NASA GSFC, Global Modeling and Assimilation Office
Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
Phone: 301-614-6712 Fax: 301-614-6246
http://science.gsfc.nasa.gov/sed/bio/matthew.thompson

From: "Todling, Ricardo (GSFC-6101)" ricardo.todling@nasa.gov
Date: Thursday, October 22, 2020 at 9:03 AM
To: "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov, "Stassi, Joe (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" joe.stassi@nasa.gov, "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov, "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Cc: "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Also, a question for Matt and Joe: what’s the take on how we are going to proceed w/ the handling of the ugliness in the AOD analysis?
As I understand it, there are two levels of changes:

  • In the main DAS job script, where the var MPIRUN_AOD needs to be defined as mpirun instead of esma_mpirun
  • And, in ana_aod.j.tmpl, where an mpiexec_mpt -np 1 needs to be put in front of $ANAAODX …

Are you just going to wire these in, or what?

Ricardo

From: "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov
Date: Thursday, October 22, 2020 at 8:55 AM
To: RICARDO TODLING ricardo.todling@nasa.gov, Joseph Stassi joe.stassi@nasa.gov, "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov, "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Cc: "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Ricardo,
Sorry, I did miss those files. This merge is all being done by hand, using xxdiff on directories, as there is no other way to do it, so there were bound to be a few. I’ve added these files and pushed them to the feature/bmauer/updates-from-geosadas5_27_0 branch, so you can re-pull the branch.

From: "Todling, Ricardo (GSFC-6101)" ricardo.todling@nasa.gov
Date: Wednesday, October 21, 2020 at 10:46 PM
To: "Stassi, Joe (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" joe.stassi@nasa.gov, "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov, "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Cc: "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov, "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Notice that there is at least one file missing from the merge that causes the ensemble to fail:

GMAO_Shared | (b) origin/feature/bmauer/updates-from-geosadas5_27_0 (DH)
| GMAO_etc/PUBLICTAG: modified, not staged
| GMAO_etc/Recd_State.pm: untracked file
| GMAO_etc/arbitrary.pl: untracked file
| GMAO_etc/obsys-nccs-arc.rc: untracked file

The file GMAO_etc/arbitrary.pl must be added … and I don’t understand why the other files marked untracked above are missing in Git; also PUBLICTAG is outdated … I can push these, and anything else I find, once I finish testing the ensemble. But I wanted Joe to know that w/ the key missing file the ensemble will fail badly.

Ricardo
From: Joseph Stassi joe.stassi@nasa.gov
Date: Wednesday, October 14, 2020 at 9:59 AM
To: "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov, "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Cc: "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov, RICARDO TODLING ricardo.todling@nasa.gov, "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

The C48f job completed one cycle. The output restarts are zero diff with what I got from the CVS build for GEOSadas-5_27_0.

From: Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] matthew.thompson@nasa.gov
Sent: Wednesday, October 14, 2020 9:39 AM
To: Stassi, Joe (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] joe.stassi@nasa.gov; Clune, Thomas L. (GSFC-6101) thomas.l.clune@nasa.gov
Cc: Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] benjamin.m.auer@nasa.gov; Todling, Ricardo (GSFC-6101) ricardo.todling@nasa.gov; tom.clune@nasa.gov tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Whoops. Yeah, mpirun -np 1 as you figured out.

MPT is so weird...

I think (in the end) SchedMD thought the issue was internal to MPT and they thought they'd fixed it in SLURM 17:

https://bugs.schedmd.com/show_bug.cgi?id=3520

but maybe the issue that Aaron found wasn't the only one?

--
Matt Thompson, SSAI, Ld Scientific Programmer/Analyst
NASA GSFC, Global Modeling and Assimilation Office
Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
Phone: 301-614-6712 Fax: 301-614-6246
http://science.gsfc.nasa.gov/sed/bio/matthew.thompson

From: "Stassi, Joe (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" joe.stassi@nasa.gov
Date: Wednesday, October 14, 2020 at 9:19 AM
To: "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov, "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Cc: "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov, "Todling, Ricardo (GSFC-6101)" ricardo.todling@nasa.gov, "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

"mpirun -np 1 solve.x" worked.

The job is still running. I will see how far it gets.

From: Stassi, Joe (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] joe.stassi@nasa.gov
Sent: Wednesday, October 14, 2020 9:00 AM
To: Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] matthew.thompson@nasa.gov; Clune, Thomas L. (GSFC-6101) thomas.l.clune@nasa.gov
Cc: Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] benjamin.m.auer@nasa.gov; Todling, Ricardo (GSFC-6101) ricardo.todling@nasa.gov; tom.clune@nasa.gov tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Matt,

Here is what I get:

m_GetAI::GetAI_External: mpirun /gpfsm/dswdev/jstassi/GEOSadas/g5270/install/bin/solve.x
MPT ERROR: '/gpfsm/dswdev/jstassi/GEOSadas/g5270/install/bin/solve.x' is not a legal hostname
(HPE MPT 2.17 11/30/17 07:46:44)
m_GetAI::GetAI_External: cannot run mpirun /gpfsm/dswdev/jstassi/GEOSadas/g5270/install/bin/solve.x
000.die.: from m_GetAI::GetAI_External()
MPT ERROR: Rank 0(g:0) is aborting with error code 2.
Process ID: 21313, Host: borgn136, Program: /gpfsm/dswdev/jstassi/GEOSadas/g5270/install/bin/ana_aod.x
MPT Version: HPE MPT 2.17 11/30/17 07:45:32

It looks like I need "mpirun -np 1 solve.x". Does that look right?

Joe

From: Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] matthew.thompson@nasa.gov
Sent: Wednesday, October 14, 2020 8:31 AM
To: Stassi, Joe (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] joe.stassi@nasa.gov; Clune, Thomas L. (GSFC-6101) thomas.l.clune@nasa.gov
Cc: Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] benjamin.m.auer@nasa.gov; Todling, Ricardo (GSFC-6101) ricardo.todling@nasa.gov; tom.clune@nasa.gov tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Joe,

This is a bit of a longshot, but can you try calling it with just "mpirun"? The mpirun in MPT is a weirdo, but for a single core, it's not (it's when you have multiple nodes it's odd).

So not 'mpiexec_mpt' or 'esma_mpirun' but just straight 'mpirun'.

Note: I believe Ben is now testing MPT with the tag as well, so we can also look at this.

Matt

--
Matt Thompson, SSAI, Ld Scientific Programmer/Analyst
NASA GSFC, Global Modeling and Assimilation Office
Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
Phone: 301-614-6712 Fax: 301-614-6246
http://science.gsfc.nasa.gov/sed/bio/matthew.thompson

From: "Stassi, Joe (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" joe.stassi@nasa.gov
Date: Wednesday, October 14, 2020 at 7:47 AM
To: "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov, "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Cc: "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov, "Todling, Ricardo (GSFC-6101)" ricardo.todling@nasa.gov, "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Matt,

It looks like solve.x is being called from within the ana_aod.x job, based on what I see in the error messages below.

I modified m_GetAI.F90 so that solve.x is not called with mpiexec_mpt, and I reran the job. It died with the following error:

m_GetAI::GetAI_External: /gpfsm/dswdev/jstassi/GEOSadas/g5270/install/bin/solve.x
ctrl_vsend/writev failed: Bad file descriptor
m_GetAI::GetAI_External: cannot run /gpfsm/dswdev/jstassi/GEOSadas/g5270/install/bin/solve.x
000.die.: from m_GetAI::GetAI_External()
MPT ERROR: Rank 0(g:0) is aborting with error code 2.
Process ID: 2023, Host: borgp153, Program: /gpfsm/dswdev/jstassi/GEOSadas/g5270/install/bin/ana_aod.x
MPT Version: HPE MPT 2.17 11/30/17 07:45:32

Here is the logfile: /discover/nobackup/jstassi/C48f/run/ana_aod.abnormal.log.20190117_00z.txt

This looks like the same error I was getting when ana_aod.x was not called with mpiexec_mpt.

Do you have suggestions about how to call solve.x when it is called from within an mpiexec_mpt job?

Joe

From: Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] matthew.thompson@nasa.gov
Sent: Tuesday, October 13, 2020 1:02 PM
To: Clune, Thomas L. (GSFC-6101) thomas.l.clune@nasa.gov; Stassi, Joe (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] joe.stassi@nasa.gov
Cc: Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] benjamin.m.auer@nasa.gov; Todling, Ricardo (GSFC-6101) ricardo.todling@nasa.gov; tom.clune@nasa.gov tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

No. It has it:

https://github.com/GEOS-ESM/GMAO_Shared/blob/47ca847958d8d3ebc9bd337084ef5512c5596540/GMAO_psas/m_GetAI.F90#L394-L403

In fact, I'm wondering if it might be that we are now hitting another infamous MPT bug/issue that seems to randomly(?) pop up. MPT often has real problems with someone calling MPI_Init twice. That is, running mpiexec_mpt within an mpiexec_mpt job can have issues. (prund.pl is another place this happens.)

Joe, can you refresh our memory on who is calling this? I see that solve.x is part of psas, but I'm having trouble figuring out how m_GetAI gets called, etc. Is this being called within another MPI job? Or not? I can't tell.

Matt

Matt Thompson, SSAI, Ld Scientific Programmer/Analyst
NASA GSFC, Global Modeling and Assimilation Office
Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
Phone: 301-614-6712 Fax: 301-614-6246
http://science.gsfc.nasa.gov/sed/bio/matthew.thompson

From: "Clune, Thomas L. (GSFC-6101)" thomas.l.clune@nasa.gov
Date: Tuesday, October 13, 2020 at 12:50 PM
To: "Stassi, Joe (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" joe.stassi@nasa.gov
Cc: "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov, "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov, "Todling, Ricardo (GSFC-6101)" ricardo.todling@nasa.gov, "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Is this just the same problem at another level? My poor memory is that ‘solve.x’ is launched from within Fortran. I further seem to recall it is being run with a single process to minimize some bug in PSAS that has arisen in recent compilers. (A bug that I’m quite likely to be the cause of during a much earlier part of my career.)

Maybe solve.x is lacking the mpi launch?

On Oct 13, 2020, at 12:12 PM, Stassi, Joe (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] joe.stassi@nasa.gov wrote:

Matt,

Here is the logfile: /discover/nobackup/jstassi/C48f/fvwork.28533/C48f.ana_aod.log.20190117_00z.txt

I didn't see any other error; just the messages I showed, and then it sat there until the job timed out.

Joe

From: Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] matthew.thompson@nasa.gov
Sent: Tuesday, October 13, 2020 11:44 AM
To: Stassi, Joe (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] joe.stassi@nasa.gov; Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] benjamin.m.auer@nasa.gov; Todling, Ricardo (GSFC-6101) ricardo.todling@nasa.gov
Cc: tom.clune@nasa.gov tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Joe,

Were there any errors or anything?

This is of course one of the more "infamous" bits of the ADAS. Where the MPI command is run inside a Fortran program.

But I can't see any obvious reason this would work differently because of CMake...

Matt

--
Matt Thompson, SSAI, Ld Scientific Programmer/Analyst
NASA GSFC, Global Modeling and Assimilation Office
Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
Phone: 301-614-6712 Fax: 301-614-6246
http://science.gsfc.nasa.gov/sed/bio/matthew.thompson

From: "Stassi, Joe (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" joe.stassi@nasa.gov
Date: Tuesday, October 13, 2020 at 6:39 AM
To: "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov, "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov, "Todling, Ricardo (GSFC-6101)" ricardo.todling@nasa.gov
Cc: "tom.clune@nasa.gov" tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Matt,

I added 'mpiexec_mpt -np 1' to the front of the call to ana_aod.x in the C48f/run/gaas/ana_aod.j.tmpl file.

The job gets past the error that I was getting previously, but then it gets hung up calling the solve.x program.

m_GetAI::GetAI_External: esma_mpirun -np 1 /gpfsm/dswdev/jstassi/GEOSadas/g5270/install/bin/solve.x
/gpfsm/dswdev/jstassi/GEOSadas/g5270/install/bin/esma_mpirun: mpi_type = mpt
/usr/local/sgi/mpi/mpt-2.17/bin/mpiexec_mpt -np 1 /gpfsm/dswdev/jstassi/GEOSadas/g5270/install/bin/solve.x
srun: cluster configuration lacks support for cpu binding
srun: Job 40315743 step creation temporarily disabled, retrying
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.

Joe

From: Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] matthew.thompson@nasa.gov
Sent: Friday, October 9, 2020 8:18 AM
To: Stassi, Joe (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] joe.stassi@nasa.gov; Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] benjamin.m.auer@nasa.gov; Todling, Ricardo (GSFC-6101) ricardo.todling@nasa.gov
Cc: tom.clune@nasa.gov tom.clune@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Joe,

My first guess is that sometimes with MPT you can get odd errors if you try and execute code that is compiled with MPT but not calling it with mpiexec_mpt. So my first thought would be to try and execute ana_aod.x with 'mpiexec_mpt -np 1' and see if that helps.

Matt

--
Matt Thompson, SSAI, Ld Scientific Programmer/Analyst
NASA GSFC, Global Modeling and Assimilation Office
Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
Phone: 301-614-6712 Fax: 301-614-6246
http://science.gsfc.nasa.gov/sed/bio/matthew.thompson

From: "Stassi, Joe (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" joe.stassi@nasa.gov
Date: Friday, October 9, 2020 at 7:47 AM
To: "Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" benjamin.m.auer@nasa.gov, "Todling, Ricardo (GSFC-6101)" ricardo.todling@nasa.gov
Cc: "tom.clune@nasa.gov" tom.clune@nasa.gov, "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" matthew.thompson@nasa.gov
Subject: Re: Candidate git configuration for GEOSadas-5_27_0

Hi Ben,

Thank you for the help. I tried running the C48f test job, and I got the following error during AOD analysis.

file: /discover/nobackup/jstassi/C48f/run/ana_aod.abnormal.log.20190117_00z.txt
error message: ctrl_vsend/writev failed: Bad file descriptor

Do you recognize this error?

Joe

From: Auer, Benjamin M. (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] benjamin.m.auer@nasa.gov
Sent: Wednesday, October 7, 2020 5:13 PM
To: Todling, Ricardo (GSFC-6101) ricardo.todling@nasa.gov; Stassi, Joe (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] joe.stassi@nasa.gov
Cc: tom.clune@nasa.gov tom.clune@nasa.gov; Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] matthew.thompson@nasa.gov
Subject: Candidate git configuration for GEOSadas-5_27_0

Ricardo, Joe,
With Matt’s help I have created an initial git configuration that we believe captures GEOSadas-5_27_0 for testing.

Using the latest git tag Scott created that was synced to Larry’s CVS version, I was able to start from there. I took the changes to the adas-specific code relative to our first adas version on git and any changes we saw in the shared repositories, i.e. GCMgridcomp etc …

So far I have run Joe’s C48 3D var test case and it was zero-diff in spot checking things like the agcm_import_rst after 8 cycles to the CVS tag so the work done last year to match gnumake/cmake seems still valid.

Please try building this and doing whatever tests are needed to get this validated or find problems. I will try running other test cases of Joe’s but you both have a much better sense of what to test and what to look for.

Once you are satisfied, we (me, you, Matt, Scott, Tom, others) will really need to coordinate to figure out the path forward. Right now I had to pretty much branch most of the repos that are common to both the gcm and adas fixtures to capture the changes, so we will need to version/tag these, and chart the path forward to get the adas caught up to the model.

To obtain the git version:

Clone the adas fixture

git clone git@github.com:GEOS-ESM/GEOSadas.git

Then checkout the branch with our changes for GEOSadas-5_27_0

git checkout feature/mathomp4/update-to-geosadas527-withoutMAPL2changes

From here you can run parallel build. This will checkout all the other repositories and build. The g5_modules has the logic I took from the g5_modules in the CVS tag and will build with MPT if it finds itself on a haswell and Intel MPI if it finds itself on a skylake.

Note that parallel build is set to use “mepo” and the components.yaml file to specify the configuration of the fixture so make sure you have mepo in your path. The simplest way is to load the GEOSenv module. I put this in my tcshrc:

setenv SITEAM /discover/swdev/gmao_SIteam

if ( ! $?OS12 ) then
    module use -a ${SITEAM}/modulefiles-SLES11
else
    module use -a ${SITEAM}/modulefiles-SLES12
endif
module load GEOSenv

@mathomp4
Member

Please note that #113 does seem to be able to do this in current testing. You do need to add:

GEOSgcm:
   fixture: true
   develop: main

to the components.yaml to enable mepo to see the fixture itself.

I am currently doing some heavy testing to make sure that using the PR's version of mepo with fixtures that don't have these lines is still okay. I need to confirm whether this change is backward compatible or not (as I might have to call it v2.0.0 if it introduces a break in components.yaml).
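
To illustrate the backward-compatibility question, here is a minimal Python sketch of how a tool might look for an optional fixture entry in an already-parsed components.yaml. It is an illustration under stated assumptions, not mepo's implementation: `find_fixture` is a hypothetical name, and the `env` component with its `local` and `tag` values is made-up sample data.

```python
def find_fixture(components):
    """Return the name of the component flagged `fixture: true`, or None.

    `components` is the mapping obtained by parsing components.yaml.
    Files written before the fixture entry existed have no such flag,
    so they yield None instead of raising an error.
    """
    flagged = [name for name, cfg in components.items()
               if isinstance(cfg, dict) and cfg.get("fixture") is True]
    if len(flagged) > 1:
        raise ValueError(f"more than one fixture entry: {flagged}")
    return flagged[0] if flagged else None


new_style = {
    "GEOSgcm": {"fixture": True, "develop": "main"},
    "env": {"local": "./@env", "tag": "v0.0.0"},  # made-up sample component
}
old_style = {"env": {"local": "./@env", "tag": "v0.0.0"}}

print(find_fixture(new_style))
print(find_fixture(old_style))
```

Treating the entry as optional, as in this sketch, is one way an older components.yaml could keep working unchanged; whether the PR behaves this way is exactly what the testing above needs to confirm.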
