Compute capability fix #2996

Merged
merged 4 commits into from Aug 15, 2020

Conversation

umar456
Member

@umar456 umar456 commented Aug 15, 2020

Fixes checkAndSet

Description

Fixes an issue where the device's compute capability is higher than
the maximum supported by the CUDA runtime used to build ArrayFire.
This happens, for example, when you run a Turing card with CUDA
runtime 9.0. The compute capability of Turing is 7.5 and the
maximum supported by the runtime is 7.0/7.2. Before this change,
we checked only the major compute capability, not the minor
version, when setting the max compute capability of the device.
This caused errors like:

In file src/backend/cuda/compile_module.cpp:266
NVRTC Error(5): NVRTC_ERROR_INVALID_OPTION
Log:
nvrtc: error: invalid value for --gpu-architecture (-arch)

This PR also updates the error messages for failure cases.
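
The major/minor check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ComputeCapability`, `clampToToolkit`, and `archOption` are hypothetical names, and the toolkit maximum is passed in by the caller rather than queried from the runtime. Only the `--gpu-architecture=compute_XX` option form is taken from NVRTC itself.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>

// (major, minor) compute capability. std::pair compares lexicographically,
// so the minor version takes part in the comparison automatically.
using ComputeCapability = std::pair<int, int>;

// Hypothetical helper: clamp the device capability to the highest
// capability the build-time toolkit can target.
ComputeCapability clampToToolkit(ComputeCapability device,
                                 ComputeCapability toolkitMax) {
    assert(device.first > 0 && toolkitMax.first > 0);
    return std::min(device, toolkitMax);
}

// Hypothetical helper: build the NVRTC architecture option for a capability.
std::string archOption(ComputeCapability cc) {
    return "--gpu-architecture=compute_" + std::to_string(cc.first) +
           std::to_string(cc.second);
}
```

With a 7.5 device and an assumed toolkit maximum of 7.0, `archOption(clampToToolkit({7, 5}, {7, 0}))` yields `--gpu-architecture=compute_70`. Comparing only the major version would leave 7.5 unclamped, which is the case that produced the invalid `-arch` value.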

The utility header in cuda_fp16.hpp is not included automatically
in CUDA 9. Additionally, we need to pass the
--device-as-default-execution-space flag to nvrtc for JIT and
non-JIT kernels.

  • The moduleKey is a size_t object, so it can have at most 20 digits;
    the format length for that value is updated accordingly

  • The runtime check messages are always logged (but not displayed).
    Errors are still only thrown in debug modes

  • Display the compute capability of the CUDA device along with
    its name and other stats

    example:

  Found device: Quadro T2000 (sm_75) (3.82 GB | ~3164.06 GFLOPs | 16 SMs)
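
The 20-digit bound for moduleKey mentioned in the list above can be checked with a short sketch. It assumes a 64-bit size_t, where the largest value is 18446744073709551615; `maxDecimalDigits` and `formatKey` are hypothetical names used only for illustration and are not part of ArrayFire.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <limits>
#include <string>

// Decimal digits needed to print the largest size_t value. digits10 counts
// the digits that always round-trip, so the maximum value needs one more.
constexpr int maxDecimalDigits =
    std::numeric_limits<std::size_t>::digits10 + 1;

// Hypothetical helper: format a key into a buffer sized from that bound.
std::string formatKey(std::size_t key) {
    char buf[maxDecimalDigits + 1];  // +1 for the terminating NUL
    int written = std::snprintf(buf, sizeof(buf), "%zu", key);
    assert(written > 0 && written <= maxDecimalDigits);
    return std::string(buf);
}
```

Deriving the width from `std::numeric_limits` rather than hard-coding 20 keeps the buffer correct on platforms where size_t is narrower.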

Changes to Users

Better error messages and better support for newer devices with older CUDA toolkits

Checklist

  • Rebased on latest master
  • Code compiles
  • Tests pass
  • [ ] Functions added to unified API
  • [ ] Functions documented

@umar456 umar456 added the build label Aug 15, 2020
@umar456 umar456 added this to the 3.7.3 milestone Aug 15, 2020
Member

@9prady9 9prady9 left a comment


Really like the commit messages 👍

@umar456 umar456 merged commit e62aab0 into arrayfire:master Aug 15, 2020
@tvandera

tvandera commented Mar 2, 2021

Hi,

I'm having a similar issue on this configuration:

ArrayFire v3.7.3 (CUDA, 64-bit Linux, build 59ac7b980)
Platform: CUDA Runtime 10.1, Driver: 460.32.03
[0] A100-SXM4-40GB, 40537 MB, CUDA Compute 8.0

This is the error:

In function cuda::Module common::compileModule(const string&, const std::vector<std::__cxx11::basic_string<char> >&, const std::vector<std::__cxx11::basic_string<char> >&, const std::vector<std::__cxx11::basic_string<char> >&, bool)
In file src/backend/cuda/compile_module.cpp:277
NVRTC Error(5): NVRTC_ERROR_INVALID_OPTION
Log:
nvrtc: error: invalid value for --gpu-architecture (-arch)

I do not understand what I should do to fix this. Maybe I should upgrade to CUDA 11?

@9prady9
Member

9prady9 commented Mar 2, 2021

@tvandera That is the expected outcome, given that the v3.7.3 installers aren't built with CUDA 11 support. Please check the latest 3.8 release, which has CUDA 11 support.

https://arrayfire.com/arrayfire-v3-8-release/
