Fixes to COMPASS to support conda MPI #480
This PR is based on #468, so it should be merged with (or after) that PR.
Testing
I successfully ran all steps of the
I got this working on my laptop and on Grizzly today. This required modifications to the approach, but these changes mean COMPASS users will not have to do anything other than load a compass conda environment, and So far, it is not necessary to specify the
This merge moves soma/4km/32to4km and soma/8km/32to8km test cases into a new subdirectory called "broken" since these test cases are not working and won't be fixed anytime soon. With this change, `./list_testcases.py` and `./setup_testcases.py` won't pick up these tests because their driver config files aren't at the expected directory level.
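The reasoning above relies on the listing scripts only looking for driver config files at a fixed directory depth. A minimal Python sketch of that idea follows; the layout `<core>/<configuration>/<resolution>/<test>/config_driver.xml`, the file name `config_driver.xml` and the function name `list_driver_configs` are assumptions for illustration, not the actual code in `./list_testcases.py`.

```python
import glob
import os
import tempfile


def list_driver_configs(base_dir):
    """Return test-case paths (relative to base_dir) whose driver config sits
    at the assumed depth <core>/<configuration>/<resolution>/<test>/."""
    # Each "*" matches exactly one path component, so a driver config that has
    # been pushed one level deeper (e.g. under a "broken" directory) is skipped.
    pattern = os.path.join(base_dir, "*", "*", "*", "*", "config_driver.xml")
    return sorted(
        os.path.relpath(os.path.dirname(path), base_dir)
        for path in glob.glob(pattern)
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        # hypothetical test-case directories; the second mimics a test case
        # moved into a "broken" subdirectory
        for test_dir in ["ocean/baroclinic_channel/10km/default",
                         "ocean/soma/broken/4km/32to4km"]:
            os.makedirs(os.path.join(base, test_dir))
            open(os.path.join(base, test_dir, "config_driver.xml"), "w").close()
        print(list_driver_configs(base))
```

Under these assumptions, moving a test case one level deeper is enough to hide it from this kind of lister without deleting any of its files.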
The Maine, QU60 and SOQU60to15 test cases now have the links they need to the python script that defines their vertical grids, so they can be set up successfully.
We no longer define a path to metis in the config file, so the version from the conda environment needs to be used instead.
Add support for a "conda_mpi" attribute to "step" tags. If this attribute is set to "true" and MPI is present in the conda environment, that command will be called with `mpirun` from the conda environment. This is needed to support compass conda environments with mpich. Python scripts and modules that use the netcdf4 package with mpich support don't work properly on many compute nodes (e.g. Grizzly at LANL and Anvil at ANL) unless they are prefixed with `mpirun -np 1`.
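A minimal sketch of the `conda_mpi` behavior described above, assuming the active environment is identified by the `CONDA_PREFIX` environment variable and that an MPI-enabled environment provides `bin/mpirun`. The helper names `find_conda_mpirun` and `wrap_step_command` are hypothetical; this is not the COMPASS implementation.

```python
import os
from typing import List, Optional


def find_conda_mpirun(conda_prefix: Optional[str] = None) -> Optional[str]:
    """Return the path to mpirun in the conda environment, or None if the
    environment has no MPI (assumed layout: <prefix>/bin/mpirun)."""
    prefix = conda_prefix if conda_prefix is not None else os.environ.get("CONDA_PREFIX")
    if not prefix:
        return None
    candidate = os.path.join(prefix, "bin", "mpirun")
    # only report mpirun if it exists and is executable
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return None


def wrap_step_command(args: List[str], conda_mpi: bool,
                      mpirun: Optional[str]) -> List[str]:
    """Prefix a step's command with '<mpirun> -np 1' when conda_mpi is set and
    MPI was found in the conda environment; otherwise leave it unchanged."""
    if conda_mpi and mpirun is not None:
        return [mpirun, "-np", "1"] + list(args)
    return list(args)
```

Taking `mpirun` from the same environment, rather than whatever is first on `PATH`, is one way to make sure the launcher matches the mpich library the conda packages were built against.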
The paraview extractor can now be called as a function rather than a script, and this is done during base-mesh generation and culling. SCRIP files can now also be created with a function, so a script call is replaced with a function here as well. With these changes, calls to python scripts that use NetCDF in the parallel conda environment will now work as long as they are called with `mpirun -np 1`.
Also rename the load script for convenience.
This will make sure the compatible version of MPI gets used.
Since we can't automatically detect that this is a python script (and that it needs to support compass MPI), we need to say so explicitly.
Testing of all ocean test cases on Grizzly:
Successful tests
Tests checked here were successful; those unchecked have not run yet:
Tests that fail
All these tests failed for reasons unrelated to the PR.
These tests need to copy
These tests are missing local links to a python script called
All the above test cases should be fixed in #514.
This test case is missing a local link to a file called
This test case crashes during forward run on 4 nodes (144 cores) with insufficient memory and
Tests that were skipped
Some have prerequisites that are broken, others are too big to test:
Ran only partly (because it takes too long):
@mark-petersen, this has been thoroughly tested on Grizzly and is now ready to test and merge. To test, please make sure you use the environment
Important: you need to use this environment both to set up the test cases and to run them. If you don't use this conda environment during setup, links to the wrong
…evelop
Optionally add links to `load_compass_env.sh` in test cases #492
This is specified either in the config file or at the command line. Like #480, this involves changes to common COMPASS infrastructure, and we should consider making a separate PR to `develop` instead of merging those changes to `ocean/develop`. Closes #490.
@mark-petersen, I'm not sure what is different in So it seems like there's still something to sort out here and maybe the MPICH environment still isn't ready for general use.
@xylar,
Can you please be more specific about what "broken link" means? Also, what is the best way you propose to remediate the issue now that the file location is identified?
@pwolfram, I suggest you try setting up the test case with |
@xylar I confirmed on LANL IC that mapping_analysis fails with the error message
I can get details
This may be a clue: the mpirun command from the details
So we don't need to revert this commit; we could temporarily point
One side note: it seems to be hanging on something for a long time on IC. With all steps disabled, the
This merge converts several script calls to function calls, which seems to work more reliably with conda MPI:
With these changes, calls to python scripts that use NetCDF in the parallel conda environment will now work as long as they are called with `mpirun -np 1`.
This merge also adds support for a `conda_mpi` attribute to `step` tags in COMPASS XML files. If this is set to `true`, the step will have `mpirun -np 1` prepended to the executable in conda environments with MPI support; if it is set to `false`, it will not. If no `conda_mpi` attribute is specified, `mpirun -np 1` is prepended only to python scripts (calls starting with `python` or ending with `.py`). This is needed to support compass conda environments with MPI. Python scripts and modules that use the `netcdf4` package with mpich support will work properly if they are called with `mpirun`.
Changes are made to `setup_testcase.py` in b1a2b3b, so this commit should be merged to `develop` in a separate PR.
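The three cases for `conda_mpi` can be sketched in Python as follows. The function names are hypothetical, the attribute value is assumed to arrive as the string `"true"` or `"false"` (or `None` when absent), and "starting with `python` or ending with `.py`" is interpreted here as applying to the executable, i.e. the first token of the command. The real `setup_testcase.py` may differ.

```python
import shlex
from typing import List, Optional


def uses_conda_mpi(args: List[str], conda_mpi: Optional[str] = None) -> bool:
    """Decide whether 'mpirun -np 1' should be prepended to a step."""
    if conda_mpi is not None:
        # an explicit attribute always wins
        return conda_mpi.strip().lower() == "true"
    if not args:
        return False
    # no attribute: only python scripts get the prefix
    executable = args[0]
    return executable.startswith("python") or executable.endswith(".py")


def step_command(command: str, conda_mpi: Optional[str] = None,
                 env_has_mpi: bool = True) -> List[str]:
    """Build the argument list for a step, prefixed only in MPI-enabled envs."""
    args = shlex.split(command)
    if env_has_mpi and uses_conda_mpi(args, conda_mpi):
        return ["mpirun", "-np", "1"] + args
    return args


if __name__ == "__main__":
    print(step_command("python build_base_mesh.py"))
    print(step_command("./make_vertical_grid", conda_mpi="true"))
    print(step_command("gpmetis graph.info 4"))
```

The second call (with a hypothetical executable name) illustrates a python script that can't be detected from its name, so the step has to set `conda_mpi` explicitly; the third is left untouched because it is neither marked nor recognized as a python script.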