Update Discover site configs, fix CI builds, install GEOS-enabled spack-stack on Discover / AWS ParallelCluster #993

Merged
climbfuji merged 60 commits into JCSDA:develop from feature/discover_scu17
Feb 26, 2024

Conversation

@climbfuji climbfuji commented Feb 7, 2024

Summary

This PR unfortunately bundles several changes that became necessary as I worked through the issues.

  • Container CI tests:
    • Correct mapl-mpich incompatibility
    • Update versions in jedi-ci specs
    • Build mpich@4.2.0 externally for the clang container (due to problems with mpich in spack-stack; issue created in spack develop)
    • Correct Intel Compiler version (don't use oneAPI version, but the actual Intel compiler version)
    • Update OpenMPI version and associated OpenMPI oversubscribe env variables/config settings
  • Ubuntu Intel CI tests: correct Intel Compiler version (don't use oneAPI version, but the actual Intel compiler version) and don't find external curl
  • configs/common/modules*.yaml: set openmpi@5: oversubscribe env variables
  • configs/common/packages.yaml: Use require syntax for boost
  • Site config update for aws-pcluster
  • Site config update for (existing) discover, rename to discover-scu16
  • Add discover-scu17 site config (Milan)
  • For Hercules, make current "alternative" GNU+OpenMPI stack the only option for GNU (i.e. remove mvapich2)
  • Make mysql an optional dependency (variant) for ewok-env, off by default
  • Documentation updates for all of the above
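
For readers unfamiliar with the oversubscribe settings mentioned in the list above, here is a minimal shell sketch of what they amount to at runtime. It is an illustration only: the variable names `OMPI_MCA_rmaps_base_oversubscribe` (OpenMPI 4.x) and `PRTE_MCA_rmaps_default_mapping_policy` (OpenMPI 5.x, which launches through PRRTE) are the commonly used ones and are assumed here, since the exact entries in configs/common/modules*.yaml are not reproduced in this description. The helper name `set_oversubscribe` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of spack-stack): export the oversubscribe
# setting that matches the given OpenMPI major version.
set_oversubscribe() {
    major="$1"
    if [ "$major" -ge 5 ]; then
        # Assumption: OpenMPI 5.x reads its mapping policy from PRRTE settings.
        export PRTE_MCA_rmaps_default_mapping_policy=":oversubscribe"
    else
        # Assumption: OpenMPI 4.x and earlier use the OMPI_MCA_ prefix.
        export OMPI_MCA_rmaps_base_oversubscribe=1
    fi
}

set_oversubscribe 5
echo "PRTE_MCA_rmaps_default_mapping_policy=${PRTE_MCA_rmaps_default_mapping_policy}"
```

In a module-based install these variables would normally be set by the generated openmpi module rather than by hand; the function above only makes the version-dependent difference explicit.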

Testing

  • Discover SCU16 GNU
    • Build jedi-bundle / skylab and run experiments qg-fullDA, l95-fullDA, skylab-aero-weather (1 cycle)
    • Run jedi-bundle ctests (jedi-bundle branch feature/no_oops_trapfpe)
    • Build GEOS-GCM and run a c12 test
  • Discover SCU16 Intel
  • Discover SCU17 GNU
    • Note I had to load ecflow_ui like this: LD_PRELOAD="/usr/local/other/gcc/12.3.0/lib64/libstdc++.so" ecflow_ui
    • Build jedi-bundle / skylab and run experiments qg-fullDA, l95-fullDA, skylab-aero-weather (1 cycle)
    • Run jedi-bundle ctests (jedi-bundle branch feature/no_oops_trapfpe)
    • Build GEOS-GCM and run a c12 test
  • Discover SCU17 Intel
  • AWS ParallelCluster GNU
    • Build jedi-bundle / skylab and run experiments qg-fullDA, l95-fullDA, skylab-atm-land-small, skylab-trace-gas (1 cycle), skylab-aero-weather (1 cycle)
    • Build GEOS-GCM and run a c12 test
  • AWS ParallelCluster Intel
  • @climbfuji's macOS (M1 / Monterey) Clang (13.1.6)
    • Build jedi-bundle / skylab and run experiments qg-fullDA, l95-fullDA, skylab-atm-land-small
    • Build GEOS-GCM and run a c12 test
      • Used: cmake -DUSE_F2PY=OFF -DCMAKE_SHARED_LINKER_FLAGS="-Wl,-flat_namespace" -DCMAKE_EXE_LINKER_FLAGS="-Wl,-flat_namespace" -DCMAKE_INSTALL_PREFIX=/Users/heinzell/scratch/geos-gcm-spack-stack-20210118/GEOS_20240119/GEOSgcm/install-20240212 .. 2>&1 | tee log.cmake
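
The `LD_PRELOAD` workaround noted for ecflow_ui on SCU17 could be wrapped as follows. This is a sketch under stated assumptions, not something included in this PR: the function name `run_with_preload` is hypothetical, and the only path taken from the notes above is the GCC 12.3.0 `libstdc++.so` location. The wrapper preloads the library only if the file exists and otherwise runs the command unchanged.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with a library preloaded, but only
# if that library is actually present on this system.
run_with_preload() {
    lib="$1"
    shift
    if [ -e "$lib" ]; then
        # The assignment applies only to the environment of this one command.
        LD_PRELOAD="$lib" "$@"
    else
        echo "warning: $lib not found, running without LD_PRELOAD" >&2
        "$@"
    fi
}

# Example call on Discover SCU17 (path taken from the testing notes):
# run_with_preload /usr/local/other/gcc/12.3.0/lib64/libstdc++.so ecflow_ui
```

Keeping the preload scoped to a single command avoids exporting `LD_PRELOAD` into the whole shell session, where it could affect unrelated programs.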

Applications affected

List all known applications (UFS WM, JEDI, SRW, etc.) intentionally or unintentionally affected by this PR.

Systems affected

List all systems intentionally or unintentionally affected by this PR.

Dependencies

Issue(s) addressed

Resolves #1001
Resolves #996
Resolves #987

Checklist

  • This PR addresses one issue/problem/enhancement, or has a very good reason for not doing so.
  • These changes have been tested on the affected systems and applications.
  • All dependency PRs/issues have been resolved and this PR can be merged.

…ommented-out version before merging, after extensive testing
…os)' in spack-ext/lib/jcsda-emc/spack-stack/stack/stack_env.py
@climbfuji climbfuji self-assigned this Feb 7, 2024
@climbfuji climbfuji added the INFRA JEDI Infrastructure label Feb 7, 2024
@climbfuji climbfuji changed the title from "WIP: feature/discover_scu17" to "Update Discover site configs, fix CI builds, install GEOS-enabled spack-stack on Discover / AWS ParallelCluster" Feb 16, 2024
@climbfuji

@mathomp4 FYI - here are the site config updates for Discover SCU16 and SCU17. I am going to add you as reviewer, but I don't expect you to look at all of this - if you could take a peek at the Discover stuff, that would be appreciated though. Thanks!


@mathomp4 mathomp4 left a comment


The discover stuff looks good to me!


@srherbener srherbener left a comment


The changes for openmpi 5.x and hdf5 1.14.3 look good. Thanks!

@climbfuji climbfuji merged commit b63249e into JCSDA:develop Feb 26, 2024
7 checks passed
@climbfuji climbfuji deleted the feature/discover_scu17 branch February 26, 2024 03:41
Labels
INFRA JEDI Infrastructure
4 participants