Add PNNL CI w/ GitLab for GPU testing #64

Merged
merged 52 commits into from
Jan 3, 2023

cameronrutherford
Collaborator

@cameronrutherford cameronrutherford commented Nov 16, 2022

This seems to be functional with empty testing scripts in PNNL CI.

Remaining TODOs are listed in the README in the pnnl-ci directory.

@cameronrutherford
Collaborator Author

Only run in merge requests
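In GitLab, the usual way to restrict a job to merge requests is a `rules:` entry in `.gitlab-ci.yml` that matches `$CI_PIPELINE_SOURCE == "merge_request_event"`. As a fallback inside the job script itself, a minimal guard could look like the sketch below; the function name is hypothetical, and only `CI_PIPELINE_SOURCE` is a real GitLab predefined variable.

```bash
#!/bin/bash
# Hypothetical guard for the top of a CI job script: succeeds only when GitLab
# started the pipeline for a merge request. CI_PIPELINE_SOURCE is a GitLab
# predefined variable whose value is "merge_request_event" for such pipelines.
is_merge_request_pipeline() {
  [ "${CI_PIPELINE_SOURCE:-}" = "merge_request_event" ]
}

# Example use inside the job script (kept as a comment so that sourcing this
# file has no side effects):
#   is_merge_request_pipeline || { echo "Not a merge request; skipping."; exit 0; }
```

Keeping the check in YAML is generally preferable, since the job then never starts at all; the shell guard only avoids the expensive GPU build once the job is already running.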

Collaborator

@jeff-cohere jeff-cohere left a comment

Thanks, @cameronrutherford, for the clear instructions and comments everywhere. Very nice.

.github/pnnl-ci/README.md (outdated, resolved)
.github/pnnl-ci/ci.sh (outdated, resolved)

@jaelynlitz
Collaborator

> Thanks, @cameronrutherford, for the clear instructions and comments everywhere. Very nice.

+1 to this 😄 Thanks for all the docs along with the implementation.

@cameronrutherford
Collaborator Author

https://code.pnnl.gov/e3sm/eagles/mam4xx/-/jobs/7350 is now failing while building mam4xx in CI.

I was able to get HAERO to build with GPUs enabled using this pipeline, but for some reason I can't get the C/C++ compiler configured correctly.

The relevant script is .github/pnnl-ci/ci.sh. Running the following script manually in the same partition, I was able to build successfully:

#!/bin/bash

export BUILD_TYPE=Debug
export HAERO_INSTALL=$(pwd)/test-haero-install
export PRECISION=single
export SYSTEM_NAME=deception

# Assuming that this is the directory where you have mam4xx cloned
pushd mam4xx-git
./.github/pnnl-ci/ci.sh
popd
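ci.sh picks up BUILD_TYPE, HAERO_INSTALL, PRECISION and SYSTEM_NAME from the environment, and the script echoed in the CI log later in this thread carries a TODO about verifying that they are set. A minimal pre-flight check along those lines might look like this; the function name is hypothetical, and the accepted values (the standard CMake build types, single/double precision) are assumptions about what the build supports.

```bash
#!/bin/bash
# Hypothetical pre-flight check for the environment that ci.sh expects.
# Accepted values are assumptions: the standard CMake build types and
# single/double precision.
check_ci_env() {
  local var missing=0
  for var in BUILD_TYPE HAERO_INSTALL PRECISION SYSTEM_NAME; do
    # ${!var} is bash indirect expansion: the value of the variable named $var
    if [ -z "${!var:-}" ]; then
      echo "error: $var is not set" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] || return 1

  case "$BUILD_TYPE" in
    Debug|Release|RelWithDebInfo|MinSizeRel) ;;
    *) echo "error: unexpected BUILD_TYPE '$BUILD_TYPE'" >&2; return 1 ;;
  esac
  case "$PRECISION" in
    single|double) ;;
    *) echo "error: unexpected PRECISION '$PRECISION'" >&2; return 1 ;;
  esac
  if [ ! -d "$HAERO_INSTALL" ]; then
    echo "error: HAERO_INSTALL='$HAERO_INSTALL' is not a directory" >&2
    return 1
  fi
}

# Example, near the top of ci.sh:
#   check_ci_env || exit 1
```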

And this is the script I used to build HAERO when testing this myself:

#!/bin/bash

export BUILD_TYPE=Debug
export HAERO_INSTALL=$(pwd)/test-haero-install
export PRECISION=single
export SYSTEM_NAME=deception

# Assuming mam4xx-git is where you have the repo cloned
./mam4xx-git/.github/pnnl-ci/rebuild-haero.sh
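The configure error in the CI log pasted further down reports that CMAKE_CXX_COMPILER points at an nvcc_wrapper under /people/svceagles/gitlab/3412/haero_Debug_single/..., which appears to be the working directory of a different pipeline (3412, while the failing job belongs to pipeline 3419), and CMake says that path is not an existing compiler. Assuming the HAERO install records that compiler as an absolute path in plain-text files (for example its exported CMake configuration), a quick sketch to list and check such references follows; the function name is hypothetical.

```bash
#!/bin/bash
# Sketch: find absolute paths ending in "nvcc_wrapper" inside text files under
# a HAERO install tree and report whether each one is still an executable.
# Assumption: the install stores the compiler path verbatim in text files.
# Returns non-zero if any referenced wrapper is missing.
check_nvcc_wrapper_paths() {
  local install_dir="$1" path status=0 found=0
  while IFS= read -r path; do
    found=$((found + 1))
    if [ -x "$path" ]; then
      echo "ok:      $path"
    else
      echo "missing: $path"
      status=1
    fi
  # -r recurse, -h no filenames, -o print only the match, -I skip binary files
  done < <(grep -rhoI '/[^"[:space:];]*nvcc_wrapper' "$install_dir" 2>/dev/null | sort -u)
  [ "$found" -gt 0 ] || echo "no nvcc_wrapper references found under $install_dir"
  return "$status"
}

# Example: check_nvcc_wrapper_paths "$HAERO_INSTALL"
```

If this reports a missing path, rebuilding HAERO so that its build tree lives somewhere that outlives a single pipeline would be one option to try.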

@cameronrutherford
Collaborator Author

While CI is failing, I am also getting test failures when running the mam4xx tests myself on a compute node with a P100 GPU:

[ruth521@dl build]$ ctest -VV --rerun-failed
UpdateCTestConfiguration  from :/qfs/people/ruth521/projects/mam4xx-eagles/mam4xx-git/build/DartConfiguration.tcl
Parse Config file:/qfs/people/ruth521/projects/mam4xx-eagles/mam4xx-git/build/DartConfiguration.tcl
UpdateCTestConfiguration  from :/qfs/people/ruth521/projects/mam4xx-eagles/mam4xx-git/build/DartConfiguration.tcl
Parse Config file:/qfs/people/ruth521/projects/mam4xx-eagles/mam4xx-git/build/DartConfiguration.tcl
Test project /qfs/people/ruth521/projects/mam4xx-eagles/mam4xx-git/build
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 5
    Start 5: conversions_unit_tests

5: Test command: /usr/bin/sh "-c" "/qfs/people/ruth521/projects/mam4xx-eagles/mam4xx-git/build/bin/test-launcher -- ./conversions_unit_tests --use-colour no"
5: Environment variables:
5:  OMP_NUM_THREADS=1
5: Test timeout computed to be: 1500
5: Calling initialize_kokkos
5: Warning: command line argument '--kokkos-num-devices' is deprecated. Use '--kokkos-map-device-id-by=mpi_rank' instead. Raised by Kokkos::initialize().
5:  ExecSpace name: Cuda
5:  ExecSpace initialized: yes
5:  active avx set:
5:  compiler id: GCC
5:  FPE support is enabled, current FPE mask: 0 (NONE)
5:  #host threads: 1
5:
5: Starting catch session on rank 0 out of 1
5:
5: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
5: conversions_unit_tests is a Catch v2.13.8 host application.
5: Run with -? for options
5:
5: -------------------------------------------------------------------------------
5: conversions
5: -------------------------------------------------------------------------------
5: /qfs/people/ruth521/projects/mam4xx-eagles/mam4xx-git/src/tests/conversions_unit_tests.cpp:22
5: ...............................................................................
5:
5: /qfs/people/ruth521/projects/mam4xx-eagles/mam4xx-git/src/tests/conversions_unit_tests.cpp:22: FAILED:
5: due to unexpected exception with message:
5:   /qfs/people/ruth521/projects/mam4xx-eagles/mam4xx-git/src/tests/
5:   atmosphere_utils.cpp:50: FAIL:
5:   FloatingPoint<Real>::equiv( psum, p0, std::numeric_limits<float>::epsilon())
5:
5: ===============================================================================
5: test cases: 1 | 1 failed
5: assertions: 1 | 1 failed
5:
5: NO RESOURCE SPECS
5: RUN: OMP_PROC_BIND=spread OMP_PLACES=threads ./conversions_unit_tests --use-colour no
5: FROM: /qfs/people/ruth521/projects/mam4xx-eagles/mam4xx-git/build/src/tests
1/1 Test #5: conversions_unit_tests ...........***Failed    0.24 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) =   0.30 sec

The following tests FAILED:
          5 - conversions_unit_tests (Failed)
Errors while running CTest
Output from these tests are in: /qfs/people/ruth521/projects/mam4xx-eagles/mam4xx-git/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.

Test 5 is the only test that fails.
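To iterate on just this test, ctest's own hint above (--output-on-failure) can be combined with Slurm. The helper below is hypothetical: the account (EAGLES), partition (dl_shared) and --gres=gpu:1 are copied from the sbatch line in the PNNL CI log and may need adjusting for interactive use, and `ctest --test-dir` requires CMake 3.20 or newer (the CI loads CMake 3.21.4). Depending on how the test launcher starts processes, running ctest inside an salloc allocation instead of srun may fit better.

```bash
#!/bin/bash
# Hypothetical helper: re-run one ctest case on a GPU node through srun and
# show its output. Set DRY_RUN=1 to print the command instead of running it.
rerun_test_on_gpu() {
  local test_regex="$1" build_dir="${2:-build}"
  local cmd=(srun -A EAGLES -p dl_shared --gres=gpu:1 --ntasks=1
             ctest --test-dir "$build_dir" -R "$test_regex" --output-on-failure)
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Example: rerun_test_on_gpu '^conversions_unit_tests$' build
```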

@jeff-cohere
Collaborator

I'm having trouble logging into PNNL's system (an "access policy" problem?). Can you paste any relevant text from the CI failure here?

Also, for your manual testing: are you using single or double precision?

@jeff-cohere
Collaborator

By the way, it might be good to rebase this branch against main.
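A minimal sketch of that rebase, assuming the GitHub remote is named origin and the target branch is main (the helper name is hypothetical):

```bash
#!/bin/bash
# Hypothetical helper: rebase the currently checked-out branch onto the latest
# main from a remote (default "origin").
rebase_onto_main() {
  local remote="${1:-origin}"
  git fetch "$remote" main || return 1   # also updates $remote/main
  git rebase "$remote/main"
}

# The rebase rewrites history, so updating an already-pushed PR branch needs a
# force push; --force-with-lease refuses if the remote branch moved unexpectedly:
#   rebase_onto_main && git push --force-with-lease
```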

@cameronrutherford
Collaborator Author

> I'm having trouble logging into PNNL's system (an "access policy" problem?). Can you paste any relevant text from the CI failure here?
>
> Also, for your manual testing: are you using single or double precision?

Manual testing used single precision only; that setting is captured in the reproduction scripts I added above.

W.r.t. GitLab: I see you have an account, so you should have access to the eagles GitLab under the username @jeff. I will paste the logs here this time, but I'd rather not have to do that going forward, as it can be quite a barrier to progress.

Single Precision Debug Mode PNNL CI Output
Running with gitlab-runner 12.7.1 (003fe500)
  on deception-gitlab-runner-gitlab-runner-57b7ff9579-lvn4w zzU3Vgs2
section_start:1670005068:prepare_executor
Using Kubernetes namespace: eagles
Using Kubernetes executor with image kfox1111/slurm:deception2 ...
section_end:1670005068:prepare_executor
section_start:1670005068:prepare_script
Waiting for pod eagles/runner-zzu3vgs2-project-298-concurrent-1vnpsk to be running, status is Pending
Waiting for pod eagles/runner-zzu3vgs2-project-298-concurrent-1vnpsk to be running, status is Pending
Waiting for pod eagles/runner-zzu3vgs2-project-298-concurrent-1vnpsk to be running, status is Pending
Running on runner-zzu3vgs2-project-298-concurrent-1vnpsk via deception-gitlab-runner-gitlab-runner-57b7ff9579-lvn4w...
section_end:1670005078:prepare_script
section_start:1670005078:get_sources
Fetching changes with git depth set to 20...
Initialized empty Git repository in /builds/e3sm/eagles/mam4xx/.git/
Created fresh repository.
From https://code.pnnl.gov/e3sm/eagles/mam4xx
 * [new ref]         refs/pipelines/3419 -> refs/pipelines/3419
Checking out a3832b6c as refs/merge-requests/64/head...

Skipping Git submodules setup
section_end:1670005078:get_sources
section_start:1670005078:restore_cache
section_end:1670005079:restore_cache
section_start:1670005079:download_artifacts
section_end:1670005079:download_artifacts
section_start:1670005079:build_script
#  NOTES:  WORKDIR is on constance/deception/newell
#          ./      is only on the Kubernetes instance
#          Comments in multi line YAML are buggy,
#          So make sure to indent at same level

export WORKDIR="$HOME/gitlab/${CI_PIPELINE_ID}/${WORKDIR_SUFFIX}"
++ export WORKDIR=/people/svceagles/gitlab/3419/mam4xx_Debug_single
++ WORKDIR=/people/svceagles/gitlab/3419/mam4xx_Debug_single
mkdir -p $WORKDIR
++ mkdir -p /people/svceagles/gitlab/3419/mam4xx_Debug_single
$ set -xv # collapsed multi-line command
cp -r . $WORKDIR
++ cp -r . /people/svceagles/gitlab/3419/mam4xx_Debug_single
cd $WORKDIR
++ cd /people/svceagles/gitlab/3419/mam4xx_Debug_single

# Unique output file for this stage
output="output"
++ output=output
echo -n > $output
++ echo -n
tail -f $output &
tailpid=$!
++ tailpid=18

# Some variables need to be exported to propogate to scripts
export HAERO_INSTALL=$HAERO_INSTALL 
++ export HAERO_INSTALL=/qfs/projects/eagles/pnnl-ci/haero_Debug_single
++ HAERO_INSTALL=/qfs/projects/eagles/pnnl-ci/haero_Debug_single
export BUILD_TYPE=$BUILD_TYPE
++ export BUILD_TYPE=Debug
++ BUILD_TYPE=Debug
export PRECISION=$PRECISION
++ export PRECISION=single
++ PRECISION=single
++ tail -f output
export SYSTEM_NAME=$SYSTEM_NAME
++ export SYSTEM_NAME=deception
++ SYSTEM_NAME=deception

# jobid used in pnnl_after_script_template to cancel job if cancelled or
# timed out by gitlab through the UI
# We use a template script name so that each pipeline stage can re-use same script configuration
jobid=$(sbatch --export=ALL -A EAGLES --gres=gpu:1 --ntasks=3 -p $SLURM_Q -o $output -e $output -t 1:00:00 $WORKDIR/.github/pnnl-ci/$SCRIPT_NAME)
+++ sbatch --export=ALL -A EAGLES --gres=gpu:1 --ntasks=3 -p dl_shared -o output -e output -t 1:00:00 /people/svceagles/gitlab/3419/mam4xx_Debug_single/.github/pnnl-ci/ci.sh
++ jobid='Submitted batch job 1356381'
export jobid=$(echo $jobid | cut -f4 -d' ')
+++ echo Submitted batch job 1356381
+++ cut -f4 '-d '
++ export jobid=1356381
++ jobid=1356381
# Unique jobid filename for this job
echo $jobid > "$WORKDIR/jobid_${jobid}"
++ echo 1356381
res=1
++ res=1

while :;
do
  if [[ "$(awk 'BEGIN{i=0}/BUILD_STATUS/{i++}END{print i}' $output)" != "0" ]]; then
    kill $tailpid
    echo 'Last tail of build $output:'
    tail -n 200 $output
    res=$(grep BUILD_STATUS $output | tail -n 1 | cut -f2 -d':')
    break
  fi
  sleep 10
done
++ :
+++ awk 'BEGIN{i=0}/BUILD_STATUS/{i++}END{print i}' output
++ [[ 0 != \0 ]]
++ sleep 10

# TODO - add more verification to ensure variables are set before proceeding
echo $BUILD_TYPE "detected for BUILD_TYPE"
+ echo Debug 'detected for BUILD_TYPE'
Debug detected for BUILD_TYPE
echo $HAERO_INSTALL "detected for HAERO install location"
+ echo /qfs/projects/eagles/pnnl-ci/haero_Debug_single 'detected for HAERO install location'
/qfs/projects/eagles/pnnl-ci/haero_Debug_single detected for HAERO install location
echo $PRECISION "detected for PRECISION"
+ echo single 'detected for PRECISION'
single detected for PRECISION

. /etc/profile.d/modules.sh
+ . /etc/profile.d/modules.sh
if [ "${MODULE_VERSION:-}" = "" ]; then
	MODULE_VERSION_STACK="3.2.10"
	MODULE_VERSION="3.2.10"
	export MODULE_VERSION
else
	MODULE_VERSION_STACK="$MODULE_VERSION"
fi
++ '[' '' = '' ']'
++ MODULE_VERSION_STACK=3.2.10
++ MODULE_VERSION=3.2.10
++ export MODULE_VERSION
export MODULE_VERSION_STACK
++ export MODULE_VERSION_STACK

module() { eval `/share/apps/modules/Modules/$MODULE_VERSION/bin/modulecmd bash $*`; }
export -f module
++ export -f module

MODULESHOME=/share/apps/modules/Modules/3.2.10
++ MODULESHOME=/share/apps/modules/Modules/3.2.10
export MODULESHOME
++ export MODULESHOME

if [ "${LOADEDMODULES:-}" = "" ]; then
  LOADEDMODULES=
  export LOADEDMODULES
fi
++ '[' '' = '' ']'
++ LOADEDMODULES=
++ export LOADEDMODULES

if [ "${MODULEPATH:-}" = "" ]; then
  MODULEPATH=`sed -n 's/[ 	#].*$//; /./H; $ { x; s/^\n//; s/\n/:/g; p; }' ${MODULESHOME}/init/.modulespath`
  export MODULEPATH
fi
++ '[' '' = '' ']'
sed -n 's/[ 	#].*$//; /./H; $ { x; s/^\n//; s/\n/:/g; p; }' ${MODULESHOME}/init/.modulespath
+++ sed -n 's/[ 	#].*$//; /./H; $ { x; s/^\n//; s/\n/:/g; p; }' /share/apps/modules/Modules/3.2.10/init/.modulespath
++ MODULEPATH='/share/apps/modules/Modules/versions:$MODULESHOME/modulefiles/environment:$MODULESHOME/modulefiles/development/mpi:$MODULESHOME/modulefiles/development/mlib:$MODULESHOME/modulefiles/development/compilers:$MODULESHOME/modulefiles/development/tools:$MODULESHOME/modulefiles/apps:$MODULESHOME/modulefiles/libs'
++ export MODULEPATH

if [ ${BASH_VERSINFO:-0} -ge 3 ] && [ -r ${MODULESHOME}/init/bash_completion ]; then
 . ${MODULESHOME}/init/bash_completion
fi
++ '[' 4 -ge 3 ']'
++ '[' -r /share/apps/modules/Modules/3.2.10/init/bash_completion ']'
++ . /share/apps/modules/Modules/3.2.10/init/bash_completion
#
# Bash commandline completion (bash 3.0 and above) for Modules 3.2.10
#
_module_avail() {
	/share/apps/modules/Modules/3.2.10/bin/modulecmd bash -t avail 2>&1 | sed '
		/:$/d;
		/:ERROR:/d;
		s#^\(.*\)/\(.\+\)(default)#\1\n\1\/\2#;
		s#/(default)##g;
		s#/*$##g;'
}

_module_not_yet_loaded() {
	comm -23  <(_module_avail|sort)  <(tr : '\n' <<<${LOADEDMODULES}|sort)
}

_module_long_arg_list() {
	local cur="$1" i

	if [[ ${COMP_WORDS[COMP_CWORD-2]} == sw* ]]
	then
		COMPREPLY=( $(compgen -W "$(_module_not_yet_loaded)" -- "$cur") )
		return
	fi
	for ((i = COMP_CWORD - 1; i > 0; i--))
	do case ${COMP_WORDS[$i]} in
	   add|load)
		COMPREPLY=( $(compgen -W "$(_module_not_yet_loaded)" -- "$cur") )
		break;;
	   rm|remove|unload|switch|swap)
		COMPREPLY=( $(IFS=: compgen -W "${LOADEDMODULES}" -- "$cur") )
		break;;
	   esac
	done
}

_module() {
	local cur="$2" prev="$3" cmds opts

	COMPREPLY=()

	cmds="add apropos avail clear display help\
	      initadd initclear initlist initprepend initrm initswitch\
	      keyword list load purge refresh rm show swap switch\
	      unload unuse update use whatis"

	opts="-c -f -h -i -l -s -t -u -v -H -V\
	      --create --force  --help  --human   --icase\
	      --long   --silent --terse --userlvl --verbose --version"

	case "$prev" in
	add|load)	COMPREPLY=( $(compgen -W "$(_module_not_yet_loaded)" -- "$cur") );;
	rm|remove|unload|switch|swap)
			COMPREPLY=( $(IFS=: compgen -W "${LOADEDMODULES}" -- "$cur") );;
	unuse)		COMPREPLY=( $(IFS=: compgen -W "${MODULEPATH}" -- "$cur") );;
	use|*-a*)	;;			# let readline handle the completion
	-u|--userlvl)	COMPREPLY=( $(compgen -W "novice expert advanced" -- "$cur") );;
	display|help|show|whatis)
			COMPREPLY=( $(compgen -W "$(_module_avail)" -- "$cur") );;
	*) if test $COMP_CWORD -gt 2
	   then
		_module_long_arg_list "$cur"
	   else
		case "$cur" in
		# The mappings below are optional abbreviations for convenience
		ls)	COMPREPLY="list";;	# map ls -> list
		r*)	COMPREPLY="rm";;	# also covers 'remove'
		sw*)	COMPREPLY="switch";;

		-*)	COMPREPLY=( $(compgen -W "$opts" -- "$cur") );;
		*)	COMPREPLY=( $(compgen -W "$cmds" -- "$cur") );;
		esac
	   fi;;
	esac
}
complete -o default -F _module module
+++ complete -o default -F _module module
module purge
+ module purge
/share/apps/modules/Modules/$MODULE_VERSION/bin/modulecmd bash $*
++ /share/apps/modules/Modules/3.2.10/bin/modulecmd bash purge
+ eval
module load cmake
+ module load cmake
/share/apps/modules/Modules/$MODULE_VERSION/bin/modulecmd bash $*
++ /share/apps/modules/Modules/3.2.10/bin/modulecmd bash load cmake
+ eval ACLOCAL_PATH=/share/apps/rocmapps/views/cmake/3.21.4/share/aclocal ';export' 'ACLOCAL_PATH;CMAKE_PREFIX_PATH=/share/apps/rocmapps/views/cmake/3.21.4' ';export' 'CMAKE_PREFIX_PATH;LOADEDMODULES=cmake/3.21.4' ';export' 'LOADEDMODULES;PATH=/share/apps/rocmapps/views/cmake/3.21.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' ';export' 'PATH;_LMFILES_=/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cmake/3.21.4' ';export' '_LMFILES_;'
ACLOCAL_PATH=/share/apps/rocmapps/views/cmake/3.21.4/share/aclocal ;export ACLOCAL_PATH;CMAKE_PREFIX_PATH=/share/apps/rocmapps/views/cmake/3.21.4 ;export CMAKE_PREFIX_PATH;LOADEDMODULES=cmake/3.21.4 ;export LOADEDMODULES;PATH=/share/apps/rocmapps/views/cmake/3.21.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin ;export PATH;_LMFILES_=/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cmake/3.21.4 ;export _LMFILES_;
++ ACLOCAL_PATH=/share/apps/rocmapps/views/cmake/3.21.4/share/aclocal
++ export ACLOCAL_PATH
++ CMAKE_PREFIX_PATH=/share/apps/rocmapps/views/cmake/3.21.4
++ export CMAKE_PREFIX_PATH
++ LOADEDMODULES=cmake/3.21.4
++ export LOADEDMODULES
++ PATH=/share/apps/rocmapps/views/cmake/3.21.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
++ export PATH
++ _LMFILES_=/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cmake/3.21.4
++ export _LMFILES_
module load gcc/9.1.0
+ module load gcc/9.1.0
/share/apps/modules/Modules/$MODULE_VERSION/bin/modulecmd bash $*
++ /share/apps/modules/Modules/3.2.10/bin/modulecmd bash load gcc/9.1.0
+ eval CC=gcc ';export' 'CC;CXX=g++' ';export' 'CXX;F77=gfortran' ';export' 'F77;F90=gfortran' ';export' 'F90;FC=gfortran' ';export' 'FC;LD_LIBRARY_PATH=/share/apps/gcc/9.1.0/lib:/share/apps/gcc/9.1.0/lib64' ';export' 'LD_LIBRARY_PATH;LOADEDMODULES=cmake/3.21.4:gcc/9.1.0' ';export' 'LOADEDMODULES;MANPATH=/share/apps/gcc/9.1.0/share/man:/usr/share/man' ';export' 'MANPATH;PATH=/share/apps/gcc/9.1.0/bin:/share/apps/rocmapps/views/cmake/3.21.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' ';export' 'PATH;PNNL_COMPILER=gcc' ';export' 'PNNL_COMPILER;PNNL_COMPILER_VERSION=9.1.0' ';export' 'PNNL_COMPILER_VERSION;_LMFILES_=/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cmake/3.21.4:/share/apps/modules/Modules/3.2.10/modulefiles/development/compilers/gcc/9.1.0' ';export' '_LMFILES_;'
CC=gcc ;export CC;CXX=g++ ;export CXX;F77=gfortran ;export F77;F90=gfortran ;export F90;FC=gfortran ;export FC;LD_LIBRARY_PATH=/share/apps/gcc/9.1.0/lib:/share/apps/gcc/9.1.0/lib64 ;export LD_LIBRARY_PATH;LOADEDMODULES=cmake/3.21.4:gcc/9.1.0 ;export LOADEDMODULES;MANPATH=/share/apps/gcc/9.1.0/share/man:/usr/share/man ;export MANPATH;PATH=/share/apps/gcc/9.1.0/bin:/share/apps/rocmapps/views/cmake/3.21.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin ;export PATH;PNNL_COMPILER=gcc ;export PNNL_COMPILER;PNNL_COMPILER_VERSION=9.1.0 ;export PNNL_COMPILER_VERSION;_LMFILES_=/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cmake/3.21.4:/share/apps/modules/Modules/3.2.10/modulefiles/development/compilers/gcc/9.1.0 ;export _LMFILES_;
++ CC=gcc
++ export CC
++ CXX=g++
++ export CXX
++ F77=gfortran
++ export F77
++ F90=gfortran
++ export F90
++ FC=gfortran
++ export FC
++ LD_LIBRARY_PATH=/share/apps/gcc/9.1.0/lib:/share/apps/gcc/9.1.0/lib64
++ export LD_LIBRARY_PATH
++ LOADEDMODULES=cmake/3.21.4:gcc/9.1.0
++ export LOADEDMODULES
++ MANPATH=/share/apps/gcc/9.1.0/share/man:/usr/share/man
++ export MANPATH
++ PATH=/share/apps/gcc/9.1.0/bin:/share/apps/rocmapps/views/cmake/3.21.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
++ export PATH
++ PNNL_COMPILER=gcc
++ export PNNL_COMPILER
++ PNNL_COMPILER_VERSION=9.1.0
++ export PNNL_COMPILER_VERSION
++ _LMFILES_=/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cmake/3.21.4:/share/apps/modules/Modules/3.2.10/modulefiles/development/compilers/gcc/9.1.0
++ export _LMFILES_
module load cuda/11.4
+ module load cuda/11.4
/share/apps/modules/Modules/$MODULE_VERSION/bin/modulecmd bash $*
++ /share/apps/modules/Modules/3.2.10/bin/modulecmd bash load cuda/11.4
+ eval CUDA_HOME=/share/apps/cuda/11.4 ';export' 'CUDA_HOME;LD_LIBRARY_PATH=/share/apps/cuda/11.4/lib:/share/apps/cuda/11.4/lib64:/share/apps/gcc/9.1.0/lib:/share/apps/gcc/9.1.0/lib64:/usr/lib64/:/share/apps/cuda/11.4/lib64/stubs:/share/apps/cuda/11.4/extras/CUPTI/lib64' ';export' 'LD_LIBRARY_PATH;LOADEDMODULES=cmake/3.21.4:gcc/9.1.0:cuda/11.4' ';export' 'LOADEDMODULES;PATH=/share/apps/cuda/11.4/bin::/share/apps/gcc/9.1.0/bin:/share/apps/rocmapps/views/cmake/3.21.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' ';export' 'PATH;_LMFILES_=/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cmake/3.21.4:/share/apps/modules/Modules/3.2.10/modulefiles/development/compilers/gcc/9.1.0:/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cuda/11.4' ';export' '_LMFILES_;'
CUDA_HOME=/share/apps/cuda/11.4 ;export CUDA_HOME;LD_LIBRARY_PATH=/share/apps/cuda/11.4/lib:/share/apps/cuda/11.4/lib64:/share/apps/gcc/9.1.0/lib:/share/apps/gcc/9.1.0/lib64:/usr/lib64/:/share/apps/cuda/11.4/lib64/stubs:/share/apps/cuda/11.4/extras/CUPTI/lib64 ;export LD_LIBRARY_PATH;LOADEDMODULES=cmake/3.21.4:gcc/9.1.0:cuda/11.4 ;export LOADEDMODULES;PATH=/share/apps/cuda/11.4/bin::/share/apps/gcc/9.1.0/bin:/share/apps/rocmapps/views/cmake/3.21.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin ;export PATH;_LMFILES_=/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cmake/3.21.4:/share/apps/modules/Modules/3.2.10/modulefiles/development/compilers/gcc/9.1.0:/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cuda/11.4 ;export _LMFILES_;
++ CUDA_HOME=/share/apps/cuda/11.4
++ export CUDA_HOME
++ LD_LIBRARY_PATH=/share/apps/cuda/11.4/lib:/share/apps/cuda/11.4/lib64:/share/apps/gcc/9.1.0/lib:/share/apps/gcc/9.1.0/lib64:/usr/lib64/:/share/apps/cuda/11.4/lib64/stubs:/share/apps/cuda/11.4/extras/CUPTI/lib64
++ export LD_LIBRARY_PATH
++ LOADEDMODULES=cmake/3.21.4:gcc/9.1.0:cuda/11.4
++ export LOADEDMODULES
++ PATH=/share/apps/cuda/11.4/bin::/share/apps/gcc/9.1.0/bin:/share/apps/rocmapps/views/cmake/3.21.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
++ export PATH
++ _LMFILES_=/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cmake/3.21.4:/share/apps/modules/Modules/3.2.10/modulefiles/development/compilers/gcc/9.1.0:/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cuda/11.4
++ export _LMFILES_
module load python/3.7.0
+ module load python/3.7.0
/share/apps/modules/Modules/$MODULE_VERSION/bin/modulecmd bash $*
++ /share/apps/modules/Modules/3.2.10/bin/modulecmd bash load python/3.7.0
+ eval LD_LIBRARY_PATH=/share/apps/python/3.7.0/lib:/share/apps/cuda/11.4/lib:/share/apps/cuda/11.4/lib64:/share/apps/gcc/9.1.0/lib:/share/apps/gcc/9.1.0/lib64:/usr/lib64/:/share/apps/cuda/11.4/lib64/stubs:/share/apps/cuda/11.4/extras/CUPTI/lib64 ';export' 'LD_LIBRARY_PATH;LOADEDMODULES=cmake/3.21.4:gcc/9.1.0:cuda/11.4:python/3.7.0' ';export' 'LOADEDMODULES;MANPATH=/share/apps/python/3.7.0/man:/share/apps/python/3.7.0/share/man:/share/apps/gcc/9.1.0/share/man:/usr/share/man' ';export' 'MANPATH;PATH=/share/apps/python/3.7.0/bin:/share/apps/cuda/11.4/bin::/share/apps/gcc/9.1.0/bin:/share/apps/rocmapps/views/cmake/3.21.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' ';export' 'PATH;_LMFILES_=/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cmake/3.21.4:/share/apps/modules/Modules/3.2.10/modulefiles/development/compilers/gcc/9.1.0:/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cuda/11.4:/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/python/3.7.0' ';export' '_LMFILES_;'
LD_LIBRARY_PATH=/share/apps/python/3.7.0/lib:/share/apps/cuda/11.4/lib:/share/apps/cuda/11.4/lib64:/share/apps/gcc/9.1.0/lib:/share/apps/gcc/9.1.0/lib64:/usr/lib64/:/share/apps/cuda/11.4/lib64/stubs:/share/apps/cuda/11.4/extras/CUPTI/lib64 ;export LD_LIBRARY_PATH;LOADEDMODULES=cmake/3.21.4:gcc/9.1.0:cuda/11.4:python/3.7.0 ;export LOADEDMODULES;MANPATH=/share/apps/python/3.7.0/man:/share/apps/python/3.7.0/share/man:/share/apps/gcc/9.1.0/share/man:/usr/share/man ;export MANPATH;PATH=/share/apps/python/3.7.0/bin:/share/apps/cuda/11.4/bin::/share/apps/gcc/9.1.0/bin:/share/apps/rocmapps/views/cmake/3.21.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin ;export PATH;_LMFILES_=/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cmake/3.21.4:/share/apps/modules/Modules/3.2.10/modulefiles/development/compilers/gcc/9.1.0:/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cuda/11.4:/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/python/3.7.0 ;export _LMFILES_;
++ LD_LIBRARY_PATH=/share/apps/python/3.7.0/lib:/share/apps/cuda/11.4/lib:/share/apps/cuda/11.4/lib64:/share/apps/gcc/9.1.0/lib:/share/apps/gcc/9.1.0/lib64:/usr/lib64/:/share/apps/cuda/11.4/lib64/stubs:/share/apps/cuda/11.4/extras/CUPTI/lib64
++ export LD_LIBRARY_PATH
++ LOADEDMODULES=cmake/3.21.4:gcc/9.1.0:cuda/11.4:python/3.7.0
++ export LOADEDMODULES
++ MANPATH=/share/apps/python/3.7.0/man:/share/apps/python/3.7.0/share/man:/share/apps/gcc/9.1.0/share/man:/usr/share/man
++ export MANPATH
++ PATH=/share/apps/python/3.7.0/bin:/share/apps/cuda/11.4/bin::/share/apps/gcc/9.1.0/bin:/share/apps/rocmapps/views/cmake/3.21.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
++ export PATH
++ _LMFILES_=/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cmake/3.21.4:/share/apps/modules/Modules/3.2.10/modulefiles/development/compilers/gcc/9.1.0:/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cuda/11.4:/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/python/3.7.0
++ export _LMFILES_

# Need to set env variables to get compiler set correctly
export CC=$(which gcc) CXX=$(which g++) FC=$(which gfortran)
++ which gcc
++ which g++
++ which gfortran
+ export CC=/share/apps/gcc/9.1.0/bin/gcc CXX=/share/apps/gcc/9.1.0/bin/g++ FC=/share/apps/gcc/9.1.0/bin/gfortran
+ CC=/share/apps/gcc/9.1.0/bin/gcc
+ CXX=/share/apps/gcc/9.1.0/bin/g++
+ FC=/share/apps/gcc/9.1.0/bin/gfortran

# Need to clone in validation submodule as we are unable to clone automatically
perl -i -p -e 's|git@(.*?):|https://\1/|g' .gitmodules || exit
+ perl -i -p -e 's|git@(.*?):|https://\1/|g' .gitmodules
git submodule update --init || exit
+ git submodule update --init
Submodule 'src/validation/mam_x_validation' (https://github.com/eagles-project/mam_x_validation.git) registered for path 'src/validation/mam_x_validation'
Cloning into 'src/validation/mam_x_validation'...
Submodule path 'src/validation/mam_x_validation': checked out 'dd074536fe11784a5e53c4de58a65959d6c9f241'

cmake \
  -DMAM4XX_HAERO_DIR=$HAERO_INSTALL \
  -DCMAKE_INSTALL_PREFIX=$(pwd)/install \
  -DCMAKE_BUILD_TYPE=$BUILD_TYPE \
  -DCMAKE_C_COMPILER=$CC \
  -DCMAKE_CXX_COMPILER=$CXX \
  -B build -S $(pwd) \
  -G "Unix Makefiles" && \

cmake --build build -- -j && \
cd build && ctest -V
++ pwd
++ pwd
+ cmake -DMAM4XX_HAERO_DIR=/qfs/projects/eagles/pnnl-ci/haero_Debug_single -DCMAKE_INSTALL_PREFIX=/people/svceagles/gitlab/3419/mam4xx_Debug_single/install -DCMAKE_BUILD_TYPE=Debug -DCMAKE_C_COMPILER=/share/apps/gcc/9.1.0/bin/gcc -DCMAKE_CXX_COMPILER=/share/apps/gcc/9.1.0/bin/g++ -B build -S /people/svceagles/gitlab/3419/mam4xx_Debug_single -G 'Unix Makefiles'
-- Configuring with build type: Debug
-- Building for GPU
-- The CXX compiler identification is unknown
CMake Error at CMakeLists.txt:32 (enable_language):
  The CMAKE_CXX_COMPILER:

    /people/svceagles/gitlab/3412/haero_Debug_single/.haero/ext/ekat/extern/kokkos/bin/nvcc_wrapper

  is not a full path to an existing compiler tool.

  Tell CMake where to find the compiler by setting either the environment
  variable "CXX" or the CMake cache entry CMAKE_CXX_COMPILER to the full path
  to the compiler, or to the compiler name if it is in the PATH.


-- Configuring incomplete, errors occurred!
See also "/people/svceagles/gitlab/3419/mam4xx_Debug_single/build/CMakeFiles/CMakeOutput.log".
See also "/people/svceagles/gitlab/3419/mam4xx_Debug_single/build/CMakeFiles/CMakeError.log".

EXIT_CODE=$?
+ EXIT_CODE=1

set +xv
+ set +xv
BUILD_STATUS:1
++ :
Last tail of build $output:
+++ awk 'BEGIN{i=0}/BUILD_STATUS/{i++}END{print i}' output
++ [[ 1 != \0 ]]
++ kill 18
++ FC=gfortran
++ export FC
++ LD_LIBRARY_PATH=/share/apps/gcc/9.1.0/lib:/share/apps/gcc/9.1.0/lib64
++ export LD_LIBRARY_PATH
++ LOADEDMODULES=cmake/3.21.4:gcc/9.1.0
++ export LOADEDMODULES
++ MANPATH=/share/apps/gcc/9.1.0/share/man:/usr/share/man
++ export MANPATH
++ PATH=/share/apps/gcc/9.1.0/bin:/share/apps/rocmapps/views/cmake/3.21.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
++ export PATH
++ PNNL_COMPILER=gcc
++ export PNNL_COMPILER
++ PNNL_COMPILER_VERSION=9.1.0
++ export PNNL_COMPILER_VERSION
++ _LMFILES_=/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cmake/3.21.4:/share/apps/modules/Modules/3.2.10/modulefiles/development/compilers/gcc/9.1.0
++ export _LMFILES_
module load cuda/11.4
+ module load cuda/11.4
/share/apps/modules/Modules/$MODULE_VERSION/bin/modulecmd bash $*
++ /share/apps/modules/Modules/3.2.10/bin/modulecmd bash load cuda/11.4
+ eval CUDA_HOME=/share/apps/cuda/11.4 ';export' 'CUDA_HOME;LD_LIBRARY_PATH=/share/apps/cuda/11.4/lib:/share/apps/cuda/11.4/lib64:/share/apps/gcc/9.1.0/lib:/share/apps/gcc/9.1.0/lib64:/usr/lib64/:/share/apps/cuda/11.4/lib64/stubs:/share/apps/cuda/11.4/extras/CUPTI/lib64' ';export' 'LD_LIBRARY_PATH;LOADEDMODULES=cmake/3.21.4:gcc/9.1.0:cuda/11.4' ';export' 'LOADEDMODULES;PATH=/share/apps/cuda/11.4/bin::/share/apps/gcc/9.1.0/bin:/share/apps/rocmapps/views/cmake/3.21.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' ';export' 'PATH;_LMFILES_=/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cmake/3.21.4:/share/apps/modules/Modules/3.2.10/modulefiles/development/compilers/gcc/9.1.0:/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cuda/11.4' ';export' '_LMFILES_;'
CUDA_HOME=/share/apps/cuda/11.4 ;export CUDA_HOME;LD_LIBRARY_PATH=/share/apps/cuda/11.4/lib:/share/apps/cuda/11.4/lib64:/share/apps/gcc/9.1.0/lib:/share/apps/gcc/9.1.0/lib64:/usr/lib64/:/share/apps/cuda/11.4/lib64/stubs:/share/apps/cuda/11.4/extras/CUPTI/lib64 ;export LD_LIBRARY_PATH;LOADEDMODULES=cmake/3.21.4:gcc/9.1.0:cuda/11.4 ;export LOADEDMODULES;PATH=/share/apps/cuda/11.4/bin::/share/apps/gcc/9.1.0/bin:/share/apps/rocmapps/views/cmake/3.21.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin ;export PATH;_LMFILES_=/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cmake/3.21.4:/share/apps/modules/Modules/3.2.10/modulefiles/development/compilers/gcc/9.1.0:/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cuda/11.4 ;export _LMFILES_;
++ CUDA_HOME=/share/apps/cuda/11.4
++ export CUDA_HOME
++ LD_LIBRARY_PATH=/share/apps/cuda/11.4/lib:/share/apps/cuda/11.4/lib64:/share/apps/gcc/9.1.0/lib:/share/apps/gcc/9.1.0/lib64:/usr/lib64/:/share/apps/cuda/11.4/lib64/stubs:/share/apps/cuda/11.4/extras/CUPTI/lib64
++ export LD_LIBRARY_PATH
++ LOADEDMODULES=cmake/3.21.4:gcc/9.1.0:cuda/11.4
++ export LOADEDMODULES
++ PATH=/share/apps/cuda/11.4/bin::/share/apps/gcc/9.1.0/bin:/share/apps/rocmapps/views/cmake/3.21.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
++ export PATH
++ _LMFILES_=/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cmake/3.21.4:/share/apps/modules/Modules/3.2.10/modulefiles/development/compilers/gcc/9.1.0:/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cuda/11.4
++ export _LMFILES_
module load python/3.7.0
+ module load python/3.7.0
/share/apps/modules/Modules/$MODULE_VERSION/bin/modulecmd bash $*
++ /share/apps/modules/Modules/3.2.10/bin/modulecmd bash load python/3.7.0
+ eval LD_LIBRARY_PATH=/share/apps/python/3.7.0/lib:/share/apps/cuda/11.4/lib:/share/apps/cuda/11.4/lib64:/share/apps/gcc/9.1.0/lib:/share/apps/gcc/9.1.0/lib64:/usr/lib64/:/share/apps/cuda/11.4/lib64/stubs:/share/apps/cuda/11.4/extras/CUPTI/lib64 ';export' 'LD_LIBRARY_PATH;LOADEDMODULES=cmake/3.21.4:gcc/9.1.0:cuda/11.4:python/3.7.0' ';export' 'LOADEDMODULES;MANPATH=/share/apps/python/3.7.0/man:/share/apps/python/3.7.0/share/man:/share/apps/gcc/9.1.0/share/man:/usr/share/man' ';export' 'MANPATH;PATH=/share/apps/python/3.7.0/bin:/share/apps/cuda/11.4/bin::/share/apps/gcc/9.1.0/bin:/share/apps/rocmapps/views/cmake/3.21.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' ';export' 'PATH;_LMFILES_=/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cmake/3.21.4:/share/apps/modules/Modules/3.2.10/modulefiles/development/compilers/gcc/9.1.0:/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cuda/11.4:/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/python/3.7.0' ';export' '_LMFILES_;'
LD_LIBRARY_PATH=/share/apps/python/3.7.0/lib:/share/apps/cuda/11.4/lib:/share/apps/cuda/11.4/lib64:/share/apps/gcc/9.1.0/lib:/share/apps/gcc/9.1.0/lib64:/usr/lib64/:/share/apps/cuda/11.4/lib64/stubs:/share/apps/cuda/11.4/extras/CUPTI/lib64 ;export LD_LIBRARY_PATH;LOADEDMODULES=cmake/3.21.4:gcc/9.1.0:cuda/11.4:python/3.7.0 ;export LOADEDMODULES;MANPATH=/share/apps/python/3.7.0/man:/share/apps/python/3.7.0/share/man:/share/apps/gcc/9.1.0/share/man:/usr/share/man ;export MANPATH;PATH=/share/apps/python/3.7.0/bin:/share/apps/cuda/11.4/bin::/share/apps/gcc/9.1.0/bin:/share/apps/rocmapps/views/cmake/3.21.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin ;export PATH;_LMFILES_=/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cmake/3.21.4:/share/apps/modules/Modules/3.2.10/modulefiles/development/compilers/gcc/9.1.0:/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cuda/11.4:/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/python/3.7.0 ;export _LMFILES_;
++ LD_LIBRARY_PATH=/share/apps/python/3.7.0/lib:/share/apps/cuda/11.4/lib:/share/apps/cuda/11.4/lib64:/share/apps/gcc/9.1.0/lib:/share/apps/gcc/9.1.0/lib64:/usr/lib64/:/share/apps/cuda/11.4/lib64/stubs:/share/apps/cuda/11.4/extras/CUPTI/lib64
++ export LD_LIBRARY_PATH
++ LOADEDMODULES=cmake/3.21.4:gcc/9.1.0:cuda/11.4:python/3.7.0
++ export LOADEDMODULES
++ MANPATH=/share/apps/python/3.7.0/man:/share/apps/python/3.7.0/share/man:/share/apps/gcc/9.1.0/share/man:/usr/share/man
++ export MANPATH
++ PATH=/share/apps/python/3.7.0/bin:/share/apps/cuda/11.4/bin::/share/apps/gcc/9.1.0/bin:/share/apps/rocmapps/views/cmake/3.21.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
++ export PATH
++ _LMFILES_=/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cmake/3.21.4:/share/apps/modules/Modules/3.2.10/modulefiles/development/compilers/gcc/9.1.0:/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/cuda/11.4:/share/apps/modules/Modules/3.2.10/modulefiles/development/tools/python/3.7.0
++ export _LMFILES_

# Need to set env variables to get compiler set correctly
export CC=$(which gcc) CXX=$(which g++) FC=$(which gfortran)
++ which gcc
++ which g++
++ which gfortran
+ export CC=/share/apps/gcc/9.1.0/bin/gcc CXX=/share/apps/gcc/9.1.0/bin/g++ FC=/share/apps/gcc/9.1.0/bin/gfortran
+ CC=/share/apps/gcc/9.1.0/bin/gcc
+ CXX=/share/apps/gcc/9.1.0/bin/g++
+ FC=/share/apps/gcc/9.1.0/bin/gfortran

# Need to clone in validation submodule as we are unable to clone automatically
perl -i -p -e 's|git@(.*?):|https://\1/|g' .gitmodules || exit
+ perl -i -p -e 's|git@(.*?):|https://\1/|g' .gitmodules
git submodule update --init || exit
+ git submodule update --init
Submodule 'src/validation/mam_x_validation' (https://github.com/eagles-project/mam_x_validation.git) registered for path 'src/validation/mam_x_validation'
Cloning into 'src/validation/mam_x_validation'...
Submodule path 'src/validation/mam_x_validation': checked out 'dd074536fe11784a5e53c4de58a65959d6c9f241'

cmake \
  -DMAM4XX_HAERO_DIR=$HAERO_INSTALL \
  -DCMAKE_INSTALL_PREFIX=$(pwd)/install \
  -DCMAKE_BUILD_TYPE=$BUILD_TYPE \
  -DCMAKE_C_COMPILER=$CC \
  -DCMAKE_CXX_COMPILER=$CXX \
  -B build -S $(pwd) \
  -G "Unix Makefiles" && \

cmake --build build -- -j && \
cd build && ctest -V
++ pwd
++ pwd
+ cmake -DMAM4XX_HAERO_DIR=/qfs/projects/eagles/pnnl-ci/haero_Debug_single -DCMAKE_INSTALL_PREFIX=/people/svceagles/gitlab/3419/mam4xx_Debug_single/install -DCMAKE_BUILD_TYPE=Debug -DCMAKE_C_COMPILER=/share/apps/gcc/9.1.0/bin/gcc -DCMAKE_CXX_COMPILER=/share/apps/gcc/9.1.0/bin/g++ -B build -S /people/svceagles/gitlab/3419/mam4xx_Debug_single -G 'Unix Makefiles'
-- Configuring with build type: Debug
-- Building for GPU
-- The CXX compiler identification is unknown
CMake Error at CMakeLists.txt:32 (enable_language):
  The CMAKE_CXX_COMPILER:

    /people/svceagles/gitlab/3412/haero_Debug_single/.haero/ext/ekat/extern/kokkos/bin/nvcc_wrapper

  is not a full path to an existing compiler tool.

  Tell CMake where to find the compiler by setting either the environment
  variable "CXX" or the CMake cache entry CMAKE_CXX_COMPILER to the full path
  to the compiler, or to the compiler name if it is in the PATH.


-- Configuring incomplete, errors occurred!
See also "/people/svceagles/gitlab/3419/mam4xx_Debug_single/build/CMakeFiles/CMakeOutput.log".
See also "/people/svceagles/gitlab/3419/mam4xx_Debug_single/build/CMakeFiles/CMakeError.log".

EXIT_CODE=$?
+ EXIT_CODE=1

set +xv
+ set +xv
BUILD_STATUS:1
Finished batch job with exit code: 1
++ echo 'Last tail of build $output:'
++ tail -n 200 output
/usr/bin/bash: line 175:    18 Terminated              tail -f $output
+++ grep BUILD_STATUS output
+++ tail -n 1
+++ cut -f2 -d:
++ res=1
++ break

echo "Finished batch job with exit code: $res"
++ echo 'Finished batch job with exit code: 1'
rm "$WORKDIR/jobid_${jobid}"
++ rm /people/svceagles/gitlab/3419/mam4xx_Debug_single/jobid_1356381
exit $res
++ exit 1
section_end:1670005090:build_script
section_start:1670005090:after_script
Running after script...
$ export WORKDIR=$HOME/gitlab/${CI_PIPELINE_ID}/${WORKDIR_SUFFIX}
$ job_ids="$WORKDIR/jobid_*"
$ for job in $job_ids # collapsed multi-line command
$ rm -rf $WORKDIR
section_end:1670005091:after_script
section_start:1670005091:upload_artifacts_on_failure
section_end:1670005092:upload_artifacts_on_failure
ERROR: Job failed: command terminated with exit code 1

@cameronrutherford
Collaborator Author

> By the way, it might be good to rebase this branch against main.

I rebase whenever I start a new development session, so this should be good to go.

@jeff-cohere
Collaborator

You might have to add the --recursive flag to your git submodule update --init command. The ekat submodule contains a kokkos submodule that has the nvcc_wrapper script that your error is complaining about.

Comment on lines 97 to 118
# Checkout custom branch where working changes are
# TODO - REMOVE ONCE MERGED INTO HAERO
git checkout origin/deception-testing-ci || exit

# Need to modify .gitmodules file before cloning
perl -i -p -e 's|git@(.*?):|https://\1/|g' .gitmodules || exit
# Update just haero submodules
git submodule update --init || exit

# Go through and repeat the process for submodules with submodules
declare -a arr=("ekat" "skywalker")
for subm in "${arr[@]}"
do
    pushd ./ext/$subm
    perl -i -p -e 's|git@(.*?):|https://\1/|g' .gitmodules || exit
    git submodule update --init || exit
    popd
done
Collaborator Author

@jeff-cohere, here is the block where I handle HAERO's submodules. There is no issue on this side of things: it initializes every HAERO submodule that matters, including the nested ones inside ekat and skywalker. I would use --recursive when initializing submodules, but that is not feasible because we cannot use SSH-based submodule URLs.

Instead, as a workaround, I use Perl to modify .gitmodules in place wherever necessary, rewriting the SSH URLs to HTTPS.
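A possible alternative to editing .gitmodules, sketched here under assumptions and not what ci.sh currently does: git's url.<base>.insteadOf setting rewrites matching URLs every time git fetches, including nested submodule clones, which would make --recursive usable. The function name configure_https_rewrite is hypothetical, the sketch assumes it is acceptable to change the CI account's global git config, and unlike the Perl one-liner (which matches any host) it only covers github.com.

```bash
#!/usr/bin/env bash
# Sketch (not the current ci.sh code): ask git to rewrite SSH-style GitHub
# URLs to HTTPS at fetch time, so .gitmodules files can stay untouched.
# Assumption: modifying the CI account's global git config is acceptable.
# Only github.com is covered; other hosts need their own insteadOf entry.
configure_https_rewrite() {
  git config --global url."https://github.com/".insteadOf "git@github.com:"
}

# Hypothetical usage in ci.sh, after the repository has been cloned:
#   configure_https_rewrite
#   git submodule update --init --recursive || exit
```

The trade-off is scope: the Perl rewrite touches only the checked-out files, while a global insteadOf entry affects every git command run by that account.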

Comment on lines +53 to +54
perl -i -p -e 's|git@(.*?):|https://\1/|g' .gitmodules || exit
git submodule update --init || exit
Collaborator Author

As with HAERO, I also have to modify mam4xx's .gitmodules file.
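For reference, here is what the rewrite does to a .gitmodules entry. This is an illustrative sed equivalent, not the code in ci.sh: Perl's non-greedy (.*?) is replaced by [^:]*, which matches the same text (everything up to the first colon after git@) and works in POSIX extended regular expressions, which have no lazy quantifiers. The function name ssh_to_https and the sample URL are hypothetical.

```bash
#!/usr/bin/env bash
# Illustrative equivalent of the rewrite in ci.sh:
#   perl -i -p -e 's|git@(.*?):|https://\1/|g' .gitmodules
# It turns "git@HOST:ORG/REPO.git" into "https://HOST/ORG/REPO.git".
# [^:]* stands in for Perl's non-greedy (.*?), which sed -E lacks.
ssh_to_https() {
  # Filter: reads text on stdin, writes the rewritten text to stdout.
  sed -E 's|git@([^:]*):|https://\1/|g'
}

# Example with a sample SSH-style submodule URL:
printf 'url = git@github.com:eagles-project/mam_x_validation.git\n' | ssh_to_https
# prints: url = https://github.com/eagles-project/mam_x_validation.git
```

The sketch is written as a filter because in-place editing with sed -i differs between GNU and BSD sed, whereas perl -i behaves the same on both.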

@cameronrutherford
Collaborator Author

> You might have to add the --recursive flag to your git submodule update --init command. The ekat submodule contains a kokkos submodule that has the nvcc_wrapper script that your error is complaining about.

I set CMAKE_C_COMPILER and CMAKE_CXX_COMPILER to the full path of GCC/G++ that I am using when invoking CMake. mam4xx seems to then override that and use the HAERO nvcc wrapper, but fails to configure successfully when using that as the compiler.

@jeff-cohere
Collaborator

> I set CMAKE_C_COMPILER and CMAKE_CXX_COMPILER to the full path of GCC/G++ that I am using when invoking CMake. mam4xx seems to then override that and use the HAERO nvcc wrapper, but fails to configure successfully when using that as the compiler.

It is true that mam4xx overrides the compilers using Haero's settings. This is another Kokkos thing. :-) It was fun trying to get this set up in the first place.

@cameronrutherford
Collaborator Author

cameronrutherford commented Dec 2, 2022

> I set CMAKE_C_COMPILER and CMAKE_CXX_COMPILER to the full path of GCC/G++ that I am using when invoking CMake. mam4xx seems to then override that and use the HAERO nvcc wrapper, but fails to configure successfully when using that as the compiler.

> It is true that mam4xx overrides the compilers using Haero's settings. This is another Kokkos thing. :-) It was fun trying to get this set up in the first place.

Thinking about this more, it seems like the nvcc wrapper that mam4xx is pointing to is located in the haero source directory, and not in the installed configuration. This leads me to believe that when the HAERO CI install is completed, the old source files are deleted, and so the pointer to this file location is invalid, as nothing is there.
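One way to check this hypothesis directly on Deception is a small diagnostic like the one below (a hypothetical helper, not part of ci.sh). It reports whether the compiler path from the CMake error exists and, if not, the deepest ancestor directory that still does. If that ancestor lies above the haero_Debug_single directory of the earlier pipeline (3412 in the log above), the HAERO source tree is gone, which would be consistent with the after_script's rm -rf $WORKDIR.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic: report whether a compiler path exists and, if it
# does not, print the deepest ancestor directory that still exists.
diagnose_compiler_path() {
  local path="$1" dir
  if [ -x "$path" ]; then
    echo "ok: $path"
    return 0
  fi
  # Walk up the directory tree until we reach a directory that exists.
  dir=$(dirname "$path")
  while [ ! -d "$dir" ] && [ "$dir" != "/" ]; do
    dir=$(dirname "$dir")
  done
  echo "missing: $path (deepest existing ancestor: $dir)"
  return 1
}

# Usage with the path from the CMake error above:
#   diagnose_compiler_path /people/svceagles/gitlab/3412/haero_Debug_single/.haero/ext/ekat/extern/kokkos/bin/nvcc_wrapper
```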

Is there a way to patch this in HAERO? Do I have to now also keep a copy of the HAERO source directory that I used to install around?

I'm also noticing that the error is with the kokkos submodule within HAERO - perhaps we need to just have an externally installed Kokkos that we can keep around that is outside of HAERO, and not using the submodule?

@jeff-cohere
Collaborator

jeff-cohere commented Dec 2, 2022

> Thinking about this more, it seems like the nvcc wrapper that mam4xx is pointing to is located in the haero source directory, and not in the installed configuration. This leads me to believe that when the HAERO CI install is completed, the old source files are deleted, and so the pointer to this file location is invalid, as nothing is there.

> Is there a way to patch this in HAERO? Do I have to now also keep a copy of the HAERO source directory that I used to install around?

That is a good idea. We already install nvcc_wrapper to ${CMAKE_INSTALL_PREFIX}/bin, but we should find a way to refer to it there and not in the Haero source directory.
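Until HAERO's configuration refers to the installed copy, a CI-side preflight could at least look in the install tree first and fail fast otherwise. The sketch below is hypothetical (resolve_nvcc_wrapper is not an existing HAERO or Kokkos function); it only encodes the lookup order suggested here, prefix/bin/nvcc_wrapper first and an optional source-tree copy as a fallback. The actual fix would belong in HAERO's CMake configuration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prefer $prefix/bin/nvcc_wrapper (where HAERO installs
# it), fall back to an optional source-tree copy, and fail if neither exists.
resolve_nvcc_wrapper() {
  local install_prefix="$1" source_copy="${2:-}" candidate
  for candidate in "$install_prefix/bin/nvcc_wrapper" "$source_copy"; do
    # Skip an empty fallback argument; accept only executable files.
    if [ -n "$candidate" ] && [ -x "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  echo "nvcc_wrapper not found under $install_prefix/bin" >&2
  return 1
}

# Hypothetical usage in ci.sh, before CMake runs:
#   NVCC_WRAPPER=$(resolve_nvcc_wrapper "$HAERO_INSTALL") || exit
```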

Can you create an issue in the Haero repo for this?

> I'm also noticing that the error is with the kokkos submodule within HAERO - perhaps we need to just have an externally installed Kokkos that we can keep around that is outside of HAERO, and not using the submodule?

The problem with this approach is that the version of Kokkos we use is a custom branch linked to EKAT. If we used an external version, we'd just be kicking the can down the road to address this problem when we integrate with EAMxx, which also relies on EKAT's version.

Sorry about the complicated environment! Most of these decisions were made for us by the EAMxx project (which itself is no picnic to build and run). We're in a tough spot where we have to work with what we have and not try to be too ambitious, because our focus in the near term has to be on porting the parameterizations. Believe me, if we were doing this from scratch, a lot of things would look different if I had any say in it.

@cameronrutherford
Collaborator Author

Single-precision tests fail in the Debug configuration but pass in Release.

If we are happy with the infrastructure, we can merge.

Otherwise, we also have to debug the failing test before merging.

@cameronrutherford cameronrutherford changed the title Draft: Add pnnl-ci script with hello world output. Add pnnl-ci script with hello world output. Dec 20, 2022
@cameronrutherford cameronrutherford changed the title Add pnnl-ci script with hello world output. Add pnnl-ci for GPU testing Dec 20, 2022
@cameronrutherford cameronrutherford changed the title Add pnnl-ci for GPU testing Add PNNL CI w/ GitLab for GPU testing Dec 20, 2022
@jeff-cohere
Collaborator

I am still unable to log into the PNNL GitLab setup (I have emailed the support folks there). @pressel, do you want to take a look at it and verify that there's a test failure in the single-precision Debug build?

Is anyone at Sandia able to connect to code.pnnl.gov?

@jeff-cohere
Collaborator

jeff-cohere commented Dec 21, 2022

Okay, I got access to the PNNL CI pipeline. Here's the error (with CI kibble removed):

...
/people/svceagles/gitlab/3430/haero_Debug_single/src/tests/conversions_unit_tests.cpp:23: FAILED:
due to unexpected exception with message:
/people/svceagles/gitlab/3430/haero_Debug_single/src/tests/atmosphere_utils.cpp:50: FAIL: FloatingPoint<Real>::equiv( psum, p0, std::numeric_limits<float>::epsilon())
...

Evidently in this environment, psum and p0 are not equivalent within machine precision in a single-precision build. @pbosler and @jaelynlitz, I don't know if this rings any bells, but maybe we need to loosen the tolerance or chase down something to which the CI environment has pronounced sensitivity?

@jaelynlitz
Collaborator

@jeff-cohere, I can test it on Deception by hand and see how much psum and p0 differ.

@jeff-cohere
Collaborator

Thanks!

@jaelynlitz
Collaborator

This is what the atmosphere_utils test prints in a single-precision GPU Debug build:
psum = 99999.992188 p0 = 100000.000000
And this is the tolerance needed for the test to pass:
tol = 65550 * std::numeric_limits<float>::epsilon(); epsilon = 0.000000 tol = 0.007814
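A quick check of these numbers, as a sketch that assumes psum and p0 are 32-bit floats and that FloatingPoint<Real>::equiv compares the absolute difference against the tolerance it is given (the numbers suggest this, but the thread does not confirm it). The "epsilon = 0.000000" line looks like a six-decimal print of ε = 2^-23, and 99999.992188 like a six-decimal print of 100000 - 2^-7.

```latex
\begin{aligned}
\varepsilon &= 2^{-23} \approx 1.19\times 10^{-7} \\
\mathrm{ulp}(10^{5}) &= 2^{16-23} = 2^{-7} = 0.0078125 \quad (\text{since } 2^{16} \le 10^{5} < 2^{17}) \\
|p_{\mathrm{sum}} - p_0| &= 100000 - 99999.9921875 = 2^{-7} \\
65536\,\varepsilon &= 2^{16}\cdot 2^{-23} = 2^{-7} \\
65550\,\varepsilon &\approx 0.0078142 > 2^{-7} \\
\frac{|p_{\mathrm{sum}} - p_0|}{p_0} &= \frac{2^{-7}}{10^{5}} \approx 7.8\times 10^{-8} < \varepsilon
\end{aligned}
```

If that reading is right, psum is exactly one float ulp below p0, so an absolute tolerance has to be about 2^16 ε, in line with the 65550 ε found above, whereas a relative comparison at ε would already pass.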

@jeff-cohere
Collaborator

Thanks, @jaelynlitz. @pbosler, this tolerance is evidently not sufficient for the compilers that Deception is using. I don't have access to my GPU machine today (Seattle is iced over), but if we can reproduce this issue on a single-precision GPU build, we may need to make this a little more permissive.

@jeff-cohere
Collaborator

I can't seem to reproduce this failure on my GPU-equipped machine (building Haero and mam4xx in single precision using CUDA). Maybe we should merge this PR and create an issue to track this particular error on Deception. Let's discuss in the new year.

Contributor

@pressel pressel left a comment

@cameronrutherford Thanks for your efforts on this. I'm fine with merging this PR and creating an issue for the single-precision failures on Deception, as @jeff-cohere suggested above.

@jeff-cohere
Collaborator

I've logged this in #93.

@jeff-cohere jeff-cohere merged commit 92f53cc into main Jan 3, 2023
@jeff-cohere jeff-cohere deleted the pnnl-ci branch January 3, 2023 20:22
@jeff-cohere jeff-cohere mentioned this pull request Jan 4, 2023
4 participants