Make GEOSldas work for SLES15 / Milan #693

Merged: 11 commits, Feb 8, 2024

README.md (43 changes: 27 additions & 16 deletions)

@@ -13,18 +13,20 @@ module use -a (path)
module load GEOSenv
```

-where `(path)` depends on the computer and operating system:
+where `(path)` depends on the computing system; at NCCS, `(path)` also depends on the operating system (SLES12 on Skylake and Cascade Lake nodes; SLES15 on Milan nodes, as of Jan. 2024):

| System | Path |
| ------------- |---------------------------------------------------|
-| NCCS          | `/discover/swdev/gmao_SIteam/modulefiles-SLES12`   |
+| NCCS Discover | `/discover/swdev/gmao_SIteam/modulefiles-SLES12`   |
+|               | `/discover/swdev/gmao_SIteam/modulefiles-SLES15`   |
| NAS | `/nobackup/gmao_SIteam/modulefiles` |
| GMAO desktops | `/ford1/share/gmao_SIteam/modulefiles` |

Step 1 can be coded into the user's shell configuration file (e.g., `.bashrc` or `.cshrc`). See the [GEOSgcm Wiki](https://github.com/GEOS-ESM/GEOSgcm/wiki/) for sample shell configuration files.
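
For example (an illustrative snippet, not an official sample; the path shown is the NCCS Discover SLES12 entry from the table above), the relevant lines in a `.cshrc` might be:
```
# Load the GMAO SI-team module environment at login
module use -a /discover/swdev/gmao_SIteam/modulefiles-SLES12
module load GEOSenv
```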

### Step 2: Obtain the Model

-For development work, clone the _entire_ repository and use the `develop` branch as your starting point (equivalent to the `UNSTABLE` tag in the old CVS repository):
+For development work, clone the _entire_ repository and use the `develop` branch as your starting point:
```
git clone -b develop git@github.com:GEOS-ESM/GEOSldas.git
```
@@ -36,25 +38,32 @@ git clone -b v17.9.1 --single-branch git@github.com:GEOS-ESM/GEOSldas.git

### Step 3: Build the Model

-To build the model in a single step, do the following:
+To build the model in a single step, do the following from a head node:
```
cd ./GEOSldas
parallel_build.csh
```
-from a head node. Doing so will check out all the external repositories of the model (albeit only on the first run, [see subsection on mepo below](#mepo)!) and build the model. When done, the resulting model build will be found in `build-SLES12/` and the installation will be found in `install-SLES12/`, with setup scripts like `ldas_setup` in `install-SLES12/bin`.
+This checks out all the external repositories of the model (albeit only on the first run, [see subsection on mepo below](#mepo)!) and then builds and installs the model.

-To obtain a build that is suitable for debugging, use `parallel_build.csh -debug`, which will build in `build-Debug-SLES12/` and install in `install-Debug-SLES12/`. There is also an option for aggressive optimization. For details, see [GEOSldas Wiki](https://github.com/GEOS-ESM/GEOSldas/wiki).
+At **NCCS**, the default is to build GEOSldas on SLES12 (Skylake or Cascade Lake nodes); to build GEOSldas on SLES15 (Milan nodes), use `parallel_build.csh -mil`.

-See below for how to build the model in multiple steps.
+The resulting model build is found in `build[-SLESxx]/`, and the installation is found in `install[-SLESxx]/`, with setup scripts like `ldas_setup` in `install[-SLESxx]/bin`.
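
As a concrete illustration (a sketch only; the `-SLES15` directory suffix is inferred from the naming pattern above), a single-step Milan build might look like:
```
cd ./GEOSldas
parallel_build.csh -mil     # SLES15 / Milan build; expect build-SLES15/ and install-SLES15/
```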

+To obtain a build that is suitable for debugging, use `parallel_build.csh -debug`, which builds in `build-Debug[-SLESxx]/` and installs in `install-Debug[-SLESxx]/`. There is also an option for aggressive optimization. For details, see the [GEOSldas Wiki](https://github.com/GEOS-ESM/GEOSldas/wiki).

+Instructions for building the model in multiple steps are provided below.

---

## How to Set Up (Configure) and Run GEOSldas

-a) Set up the job as follows:

+a) At **NCCS**, GEOSldas must be built, configured, and run on the same operating system. To run GEOSldas on Milan nodes (SLES15), start with `ssh discover-mil`.

+b) Set up the job as follows:

```
-cd (build_path)/GEOSldas/install/bin
+cd (build_path)/GEOSldas/install[-SLESxx]/bin
source g5_modules [for bash or zsh: source g5_modules.[z]sh]
./ldas_setup setup [-v] (exp_path) ("exe"_input_filename) ("bat"_input_filename)
```
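
As a purely illustrative example of the last command (hypothetical experiment path and input file names):
```
./ldas_setup setup -v ~/my_ldas_experiments my_exeinp.txt my_batinp.txt
```
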
@@ -82,7 +91,7 @@ Edit these sample files following the examples and comments within the sample fi
The ldas_setup script creates a run directory and other directories at:
`[exp_path]/[exp_name]`

-Configuration input files will be created at:
+Configuration input files are created at:
`[exp_path]/[exp_name]/run`

For more options and documentation, use any of the following:
Expand All @@ -92,16 +101,19 @@ ldas_setup sample -h
ldas_setup setup -h
```

-b) Configure the experiment output by editing the ```./run/HISTORY.rc``` file as needed.
+c) Configure the experiment output by editing the ```./run/HISTORY.rc``` file as needed.

-c) Run the job:
+d) Run the job:
```
cd [exp_path]/[exp_name]/run/
sbatch lenkf.j
```

-For more information, see the files in `./doc/`.
-Moreover, descriptions of the configuration (resource) parameters are included in the sample "exeinp" and "batinp" files that can be generated using `ldas_setup`.
+At **NCCS**, the appropriate SLURM directive `#SBATCH --constraint=[xxx]` is automatically added into `lenkf.j` depending on the operating system.
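
For orientation, the generated directive looks roughly like this (a sketch of what `ldas_setup` writes into `lenkf.j`; surrounding directives omitted):
```
#SBATCH --constraint=mil    # if GEOSldas was built on SLES15 (Milan)
#SBATCH --constraint=cas    # otherwise (SLES12, Cascade Lake)
```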

+For more information, see the files in `./doc/`. Moreover, descriptions of the configuration (resource) parameters are included in the sample "exeinp" and "batinp" files that can be generated using `ldas_setup`.



-----------------------------------------------------------------------------------

@@ -138,15 +150,14 @@ We currently do not allow in-source builds of GEOSldas. So we must make a direct
```
mkdir build
```
The advantage of this is that you can build both a Debug and a Release version from the same clone if desired.
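
For instance (directory names are illustrative), one might keep two build trees side by side:
```
mkdir build          # Release build tree
mkdir build-debug    # separate Debug build tree
```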

#### Run CMake
CMake generates the Makefiles needed to build the model.
```
cd build
cmake .. -DBASEDIR=$BASEDIR/Linux -DCMAKE_Fortran_COMPILER=ifort -DCMAKE_INSTALL_PREFIX=../install
```
-This will install to a directory parallel to your `build` directory. If you prefer to install elsewhere change the path in:
+This installs into a directory parallel to your `build` directory. If you prefer to install elsewhere, change the path in:
```
-DCMAKE_INSTALL_PREFIX=<path>
```
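
Similarly, a Debug configuration can be requested at this step (a sketch using the generic CMake build-type flag; project-specific debug options may differ):
```
cmake .. -DBASEDIR=$BASEDIR/Linux -DCMAKE_Fortran_COMPILER=ifort \
         -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=../install-debug
```
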
src/Applications/LDAS_App/CMakeLists.txt (5 changes: 4 additions & 1 deletion)

@@ -20,7 +20,6 @@ ecbuild_add_executable (
LIBS GEOSlandassim_GridComp)

set (scripts
-ldas_setup
process_hist.csh
process_rst.py
ens_forcing/average_ensemble_forcing.py
@@ -35,6 +34,10 @@ install (
DESTINATION bin
)

+set(file ldas_setup)
+configure_file(${file} ${file} @ONLY)
+install(PROGRAMS ${CMAKE_CURRENT_BINARY_DIR}/${file} DESTINATION bin)

file(GLOB rc_files GEOSldas_*rc)
file(GLOB nml_files LDASsa_DEFAULT*nml)
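
An aside on the mechanism: `configure_file(... @ONLY)` replaces `@VARIABLE@` placeholders in `ldas_setup` at build time, which is how the `@BUILT_ON_SLES15@` marker used below receives its value. A purely hypothetical sketch of how such a CMake variable could be set elsewhere in the build (not the code from this PR):
```
if(EXISTS "/etc/os-release")
  file(READ "/etc/os-release" _os_release)
  if(_os_release MATCHES "SLES" AND _os_release MATCHES "15")
    set(BUILT_ON_SLES15 "TRUE")   # substituted into ldas_setup via configure_file()
  else()
    set(BUILT_ON_SLES15 "")
  endif()
endif()
```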

src/Applications/LDAS_App/ldas_setup (24 changes: 19 additions & 5 deletions)

@@ -51,6 +51,13 @@ class LDASsetup:
'MINLON','MAXLON','MINLAT','MAXLAT','EXCLUDE_FILE','INCLUDE_FILE','MWRTM_PATH','GRIDNAME',
'ADAS_EXPDIR', 'BCS_RESOLUTION' ]

+# if built on sles15, BUILT_ON_SLES15 is "TRUE", else empty ""
+BUILT_ON_SLES15 = "@BUILT_ON_SLES15@"
+
+if BUILT_ON_SLES15 == "TRUE":
+    self.BUILT_ON_SLES15 = True
+else:
+    self.BUILT_ON_SLES15 = False

# ------
# Required resource manager input fields
@@ -693,7 +700,11 @@ class LDASsetup:
print ('\nCorrect the tile file if it is an old EASE tile format... \n')
EASEtile=self.bcsdir+'/MAPL_'+short_tile
cmd = './preprocess_ldas.x correctease '+ tile + ' '+ EASEtile
print ("cmd: " + cmd)
if self.BUILT_ON_SLES15 :
print ("Executables were built on SLES15 and must be run on SLES15: " + cmd)
else:
print ("cmd: " + cmd)

sp.call(shlex.split(cmd))

if os.path.isfile(EASEtile) :
@@ -1333,8 +1344,13 @@ class LDASsetup:
elif 'MY_NODES' in line :
line_ = line.replace('MY_NODES',str(self.optRmInp['nodes']))
fout.write(line_.replace('MY_NTASKS_PER_NODE',str(self.rqdRmInp['ntasks-per-node'])))
-if int(self.rqdRmInp['ntasks-per-node']) > 40:
-    fout.write("#SBATCH --constraint=cas\n")

+if self.BUILT_ON_SLES15 :
+    fout.write("#SBATCH --constraint=mil\n")
+else:
+    assert int(self.rqdRmInp['ntasks-per-node']) <= 46, 'ntasks-per-node should be <=46 for cas'
+    fout.write("#SBATCH --constraint=cas\n")

elif 'MY_OSERVER_NODES' in line :
fout.write(line.replace('MY_OSERVER_NODES',str(self.optRmInp['oserver_nodes'])))
elif 'MY_WRITERS_NPES' in line :
@@ -1365,8 +1381,6 @@ class LDASsetup:
elif 'MY_ADAS_EXPDIR' in line :
if self.ladas_coupling > 0:
fout.write(line.replace('MY_ADAS_EXPDIR', self.rqdExeInp['ADAS_EXPDIR']))


else :
fout.write(line.replace('MY_EXPDIR',self.exphome+'/$EXPID'))

src/Applications/LDAS_App/lenkf.j.template (25 changes: 19 additions & 6 deletions)

@@ -40,7 +40,23 @@ setenv argv

source $GEOSBIN/g5_modules

-setenv I_MPI_DAPL_UD enable
+# OPENMPI flags
+# Turn off warning about TMPDIR on NFS
+setenv OMPI_MCA_shmem_mmap_enable_nfs_warning 0
+# pre-connect MPI procs on mpi_init
+setenv OMPI_MCA_mpi_preconnect_all 1
+setenv OMPI_MCA_coll_tuned_bcast_algorithm 7
+setenv OMPI_MCA_coll_tuned_scatter_algorithm 2
+setenv OMPI_MCA_coll_tuned_reduce_scatter_algorithm 3
+setenv OMPI_MCA_coll_tuned_allreduce_algorithm 3
+setenv OMPI_MCA_coll_tuned_allgather_algorithm 4
+setenv OMPI_MCA_coll_tuned_allgatherv_algorithm 3
+setenv OMPI_MCA_coll_tuned_gather_algorithm 1
+setenv OMPI_MCA_coll_tuned_barrier_algorithm 0
+# required for a tuned flag to be effective
+setenv OMPI_MCA_coll_tuned_use_dynamic_rules 1
+# disable file locks
+setenv OMPI_MCA_sharedfp "^lockedfile,individual"

# By default, ensure 0-diff across processor architecture by limiting MKL's freedom to pick algorithms.
# As of June 2021, MKL_CBWR=AVX2 is fastest setting that works for both haswell and skylake at NCCS.
@@ -53,11 +69,8 @@ setenv MKL_CBWR "AVX2"
# reversed sequence for LADAS_COUPLING (Sep 2020) (needed when coupling with ADAS using different BASEDIR)
setenv LD_LIBRARY_PATH ${BASEDIR}/${ARCH}/lib:${ESMADIR}/lib:${LD_LIBRARY_PATH}

-if ( -e /etc/os-release ) then
-    module load nco/4.8.1
-else
-    module load other/nco-4.6.8-gcc-5.3-sp3
-endif
+module load nco

setenv RUN_CMD "$GEOSBIN/esma_mpirun -np "

#######################################################################