
Update cuda arch and change cuda version #5473


@aurianer (Contributor) commented Jul 28, 2021

  • Downgrade the CUDA version on rostam to avoid the errors currently on master: https://cdash.cscs.ch/viewBuildError.php?buildid=172746

  • At the same time, update the CUDA architecture, since the sc15 nodes seem to have V100 GPUs now

I created issue #5472 for fixing the problem with cuda/11.3.

EDIT:

  • Work around another bug with noexcept on variadic templates in CUDA 11.4 (@msimberg confirmed that it compiles with CUDA 11.3)
  • At the same time, update the CUDA architecture, since the sc15 nodes seem to have V100 GPUs now
  • Pin the CUDA version to 11.4 to avoid breaking master when the default CUDA package on rostam is updated
  • Work around a pointer type mismatch bug that appears with CUDA 11.3 and CUDA 11.4 but not with older versions

@aurianer (Contributor, Author)

It looks like this only fixes the gcc-9-cuda-11-release build.

@aurianer force-pushed the update_cuda_arch_and_change_cuda_version branch 2 times, most recently from eb67577 to 2d6294e on July 29, 2021 16:21
@hkaiser (Member) previously approved these changes Jul 29, 2021

LGTM, thanks!

@aurianer (Contributor, Author)

retest lsu

@aurianer force-pushed the update_cuda_arch_and_change_cuda_version branch 2 times, most recently from 3d4627a to 3e081ef on July 30, 2021 15:21
@msimberg (Contributor) commented Aug 6, 2021

retest lsu

@msimberg force-pushed the update_cuda_arch_and_change_cuda_version branch from 3e081ef to 90c298c on August 6, 2021 11:41
@aurianer force-pushed the update_cuda_arch_and_change_cuda_version branch from 90c298c to 2a0cf4c on August 6, 2021 16:45
@msimberg force-pushed the update_cuda_arch_and_change_cuda_version branch 2 times, most recently from eeabcb3 to 88acaf8 on August 11, 2021 13:36
@aurianer force-pushed the update_cuda_arch_and_change_cuda_version branch 2 times, most recently from 04776a9 to 53d0285 on August 26, 2021 14:57
@msimberg (Contributor)

retest lsu

@msimberg (Contributor)

Thanks @aurianer for updating this! It looks good, except that there were odd failures in the release build with nvcc (executables not found even though they were built). My guess is that this was just unfortunate timing with the maintenance on rostam, but let's see. I've retriggered the builds on rostam.

@hkaiser (Member) previously approved these changes Aug 31, 2021

LGTM, thanks!

@aurianer (Contributor, Author) commented Sep 1, 2021

I still have to investigate a compilation error that causes the CI failure (executables do not exist), so this is not ready to merge yet.

@msimberg marked this pull request as draft September 1, 2021 08:49
@aurianer force-pushed the update_cuda_arch_and_change_cuda_version branch from 53d0285 to b3cd2bf on September 3, 2021 09:39
@aurianer marked this pull request as ready for review September 3, 2021 11:31
Removing the variadic ... from tag_override_dispatch makes it compile.
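A minimal sketch of the shape of this change, assuming a tag-dispatch helper in the style of the tag_invoke idiom. All names below (nothrow_tag, may_throw_tag, dispatch_variadic, dispatch_fixed) are hypothetical and are not HPX's actual tag_override_dispatch. The first template computes its noexcept specification from a parameter pack, which is the kind of construct this PR reports as hitting a CUDA 11.4 bug; the second does the same dispatch with a fixed number of arguments and no pack. It is plain host C++ (C++14 or later) and is not meant to reproduce the nvcc failure itself.

```cpp
#include <utility>

// Hypothetical tags and tag_invoke overloads, for illustration only.
// They are not HPX's tag_override_dispatch or its real customization points.
struct nothrow_tag {};
struct may_throw_tag {};

// Declared noexcept: dispatching to it should also be noexcept.
constexpr int tag_invoke(nothrow_tag, int x) noexcept { return x + 1; }

// Not noexcept (and not constexpr): dispatching to it should not be noexcept.
inline int tag_invoke(may_throw_tag, int x) { return x * 2; }

// Variadic form: the noexcept specification depends on the whole parameter
// pack. This is the general kind of construct the PR reports as
// triggering a CUDA 11.4 bug (this exact code is only an illustration).
template <typename Tag, typename... Args>
constexpr auto dispatch_variadic(Tag tag, Args&&... args) noexcept(
    noexcept(tag_invoke(tag, std::forward<Args>(args)...)))
    -> decltype(tag_invoke(tag, std::forward<Args>(args)...))
{
    return tag_invoke(tag, std::forward<Args>(args)...);
}

// Workaround shape: the same dispatch with a fixed number of arguments,
// so the noexcept specification no longer involves a pack expansion.
template <typename Tag, typename Arg>
constexpr auto dispatch_fixed(Tag tag, Arg&& arg) noexcept(
    noexcept(tag_invoke(tag, std::forward<Arg>(arg))))
    -> decltype(tag_invoke(tag, std::forward<Arg>(arg)))
{
    return tag_invoke(tag, std::forward<Arg>(arg));
}

// Both forms propagate noexcept from the selected tag_invoke overload.
static_assert(noexcept(dispatch_variadic(nothrow_tag{}, 1)), "");
static_assert(!noexcept(dispatch_fixed(may_throw_tag{}, 1)), "");
```

The trade-off of the fixed-arity form is that it accepts exactly one argument after the tag, so covering several call signatures without the pack would need one overload per supported arity.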
The CUDA version should be changed manually, in a PR that addresses all the resulting problems, and not implicitly whenever the default module on rostam is upgraded.
@aurianer force-pushed the update_cuda_arch_and_change_cuda_version branch 3 times, most recently from ae487e5 to cbbc318 on September 7, 2021 18:23
Mismatch of pointers between __global__ and __host__ __device__.
@aurianer force-pushed the update_cuda_arch_and_change_cuda_version branch from cbbc318 to d5c778b on September 8, 2021 08:25
@msimberg (Contributor) commented Sep 8, 2021

retest lsu

@msimberg (Contributor) left a comment

This looks good to me, and works nicely on rostam!

I'd just like to wait for the CUDA configuration at CSCS to verify that this doesn't break with older CUDA versions. The Piz Daint maintenance is almost over and the system should be available again later today, but the CUDA configuration on Piz Daint may need changes first.

@msimberg (Contributor)

retest cscs

@msimberg merged commit c3e47ff into STEllAR-GROUP:master Sep 10, 2021
3 participants