Merge #6248
6248: Fix CUDA/HIP Jenkins pipelines  r=hkaiser a=G-071

This PR aims to fix the unstable CUDA/HIP tests.

Changes so far:
- Fix the CUDA architectures for the Jenkins nodes
- Exclude the faulty Jenkins node for now (bahram: listed as a V100 GPU node, but nvidia-smi does not report any GPUs on it)
- Allow Jenkins HIP builds with warnings (so that the tests run despite the warnings, just like the CUDA tests already do)
- I looked into #5799 again to see whether the new compilers and CUDA versions make any difference. Unfortunately they do not: with gcc/12 and cuda/12, compilation still fails with:
`'cudafe++' died due to signal 11 (Invalid memory reference)`
I adapted the guards in the failing examples accordingly (to also skip the tests with newer CUDA versions)
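
The bahram exclusion above rests on a manual check: the node is listed as a V100 node, yet nvidia-smi shows no GPUs. A minimal sketch of how such a check could be scripted follows. The function names `count_gpus` and `node_gpu_count` are hypothetical and not part of the HPX Jenkins scripts, and the sketch assumes that `nvidia-smi -L` prints one line per device beginning with `GPU <index>:`.

```shell
# Hypothetical helper: count the devices in `nvidia-smi -L` output read from stdin.
# Assumes one line per device of the form "GPU 0: <name> (UUID: ...)".
count_gpus() {
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that status.
    grep -c '^GPU [0-9]' || true
}

# Hypothetical helper: print the number of GPUs visible on the current node.
node_gpu_count() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi -L 2>/dev/null | count_gpus
    else
        # No driver tools installed: treat the node as having no usable GPU.
        echo 0
    fi
}
```

Calling `node_gpu_count` from a job running on a suspect node and getting `0` would point to the kind of mismatch described for bahram.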

The PR is a bit of a work-in-progress as I imagine it will take a few iterations to get everything right. That being said, let's see if the current changes already help and trigger the Jenkins tests!

Co-authored-by: Gregor Daiss <Gregor.Daiss+git@gmail.com>
StellarBot and G-071 committed Jun 1, 2023
2 parents d40564d + 6e28071 commit 3d95d66
Showing 7 changed files with 12 additions and 7 deletions.
2 changes: 1 addition & 1 deletion .jenkins/lsu/batch.sh
@@ -21,7 +21,7 @@ ulimit -l unlimited

set +e
ctest \
---verbose \
+${ctest_extra_args} \
-S ${src_dir}/.jenkins/lsu/ctest.cmake \
-DCTEST_CONFIGURE_EXTRA_OPTIONS="${configure_extra_options}" \
-DCTEST_BUILD_CONFIGURATION_NAME="${configuration_name_with_build_type}" \
1 change: 1 addition & 0 deletions .jenkins/lsu/entry.sh
@@ -46,6 +46,7 @@ sbatch \
--job-name="${job_name}" \
--nodes="${configuration_slurm_num_nodes}" \
--partition="${configuration_slurm_partition}" \
+--exclude="bahram" \
--time="03:00:00" \
--output="jenkins-hpx-${configuration_name_with_build_type}.out" \
--error="jenkins-hpx-${configuration_name_with_build_type}.err" \
2 changes: 2 additions & 0 deletions .jenkins/lsu/env-common.sh
@@ -11,5 +11,7 @@ if [ "${build_type}" = "Debug" ]; then
configure_extra_options+=" -DLCI_DEBUG=ON"
fi

+ctest_extra_args+=" --verbose "
+
hostname
module avail
2 changes: 1 addition & 1 deletion .jenkins/lsu/env-gcc-10-cuda-11.sh
@@ -23,4 +23,4 @@ configure_extra_options+=" -DHPX_WITH_CUDA=ON"
configure_extra_options+=" -DHPX_WITH_NETWORKING=OFF"
configure_extra_options+=" -DHPX_WITH_DISTRIBUTED_RUNTIME=OFF"
configure_extra_options+=" -DHPX_WITH_ASYNC_MPI=ON"
-configure_extra_options+=" -DCMAKE_CUDA_ARCHITECTURES='37;70'"
+configure_extra_options+=" -DCMAKE_CUDA_ARCHITECTURES='70;80'"
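
The values in `CMAKE_CUDA_ARCHITECTURES` are compute capabilities with the dot removed, so `70;80` targets devices of capability 7.0 and 8.0. As a rough sketch, such a list could be derived from what the nodes actually report instead of being hard-coded. The helper name `arch_list_from_caps` is hypothetical, and the `compute_cap` query field mentioned in the comment is an assumption that may not hold for older drivers.

```shell
# Hypothetical helper: turn compute capabilities (one per line, e.g. "7.0")
# into a CMAKE_CUDA_ARCHITECTURES-style list such as "70;80".
arch_list_from_caps() {
    # Drop dots and spaces, sort numerically without duplicates, join with ';'.
    tr -d '. ' | sort -un | paste -sd ';' -
}

# Example with canned input standing in for the output of
#   nvidia-smi --query-gpu=compute_cap --format=csv,noheader
# (assumed field name; check the installed driver before relying on it).
printf '8.0\n7.0\n7.0\n' | arch_list_from_caps   # prints 70;80
```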
4 changes: 3 additions & 1 deletion .jenkins/lsu/env-hipcc.sh
@@ -17,4 +17,6 @@ configure_extra_options+=" -DHPX_WITH_FETCH_ASIO=ON"
configure_extra_options+=" -DHPX_WITH_MAX_CPU_COUNT=128"
configure_extra_options+=" -DHPX_WITH_DEPRECATION_WARNINGS=OFF"
configure_extra_options+=" -DHPX_WITH_COMPILER_WARNINGS=ON"
-configure_extra_options+=" -DHPX_WITH_COMPILER_WARNINGS_AS_ERRORS=ON"
+configure_extra_options+=" -DHPX_WITH_COMPILER_WARNINGS_AS_ERRORS=OFF"
+
+ctest_extra_args+=" -E tests.unit.modules.algorithms.detail "
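
The `ctest_extra_args` string is built up piece by piece and then expanded unquoted in `batch.sh`, so the shell splits it into separate `ctest` arguments. The sketch below illustrates that splitting. It assumes both `env-common.sh` and `env-hipcc.sh` contribute to the same variable for a HIP build, and `count_args` is a hypothetical helper used only for the illustration.

```shell
# Accumulate arguments the way the env-*.sh scripts do.
# (POSIX spelling; the scripts themselves use the bash `+=` operator.)
ctest_extra_args=""
ctest_extra_args="${ctest_extra_args} --verbose "
ctest_extra_args="${ctest_extra_args} -E tests.unit.modules.algorithms.detail "

# Hypothetical helper: count the words an unquoted expansion would produce.
count_args() {
    # Deliberately unquoted so the shell performs word splitting.
    set -- $1
    echo "$#"
}

count_args "${ctest_extra_args}"   # prints 3
```

Note that `-E` takes a regular expression, so each `.` in `tests.unit.modules.algorithms.detail` matches any character, and every test whose name matches the pattern is excluded.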
4 changes: 2 additions & 2 deletions libs/core/async_cuda/tests/performance/synchronize.cu
@@ -6,8 +6,8 @@

#include <hpx/config.hpp>

-// NVCC fails unceremoniously with this test at least until V11.5
-#if !defined(HPX_CUDA_VERSION) || (HPX_CUDA_VERSION > 1105)
+// NVCC fails unceremoniously with this test at least until V12.1
+#if !defined(HPX_CUDA_VERSION) || (HPX_CUDA_VERSION > 1201)

#include <hpx/chrono.hpp>
#include <hpx/execution.hpp>
4 changes: 2 additions & 2 deletions libs/core/async_cuda/tests/unit/transform_stream.cu
@@ -6,8 +6,8 @@

#include <hpx/config.hpp>

-// NVCC fails unceremoniously with this test at least until V11.5
-#if !defined(HPX_CUDA_VERSION) || (HPX_CUDA_VERSION > 1105)
+// NVCC fails unceremoniously with this test at least until V12.1
+#if !defined(HPX_CUDA_VERSION) || (HPX_CUDA_VERSION > 1201)

#include <hpx/execution.hpp>
#include <hpx/init.hpp>
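
To make the effect of the updated guard in both `.cu` files concrete, the sketch below replays its logic for a few CUDA versions. The `major * 100 + minor` encoding is an assumption inferred from the constants in the diff (V11.5 alongside `1105`, V12.1 alongside `1201`), and both helper names are hypothetical.

```shell
# Hypothetical helper: encode a CUDA version the way the guard's constants
# suggest (11.5 -> 1105, 12.1 -> 1201). The formula is an assumption.
cuda_version_code() {
    echo $(( $1 * 100 + $2 ))
}

# Hypothetical helper mirroring
#   !defined(HPX_CUDA_VERSION) || (HPX_CUDA_VERSION > 1201)
# An empty argument stands for "HPX_CUDA_VERSION is not defined".
test_is_compiled() {
    if [ -z "$1" ] || [ "$1" -gt 1201 ]; then
        echo yes
    else
        echo no
    fi
}

test_is_compiled "$(cuda_version_code 11 5)"   # prints no
test_is_compiled "$(cuda_version_code 12 1)"   # prints no
test_is_compiled "$(cuda_version_code 12 2)"   # prints yes
test_is_compiled ""                            # prints yes
```

Under these assumptions the test bodies are skipped for every CUDA version up to and including 12.1, and they are still compiled when `HPX_CUDA_VERSION` is not defined at all.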