
BUG: No GPU Support for Modern CUDA #3251

Closed · 4 tasks done
alexk101 opened this issue Sep 8, 2023 · 5 comments · May be fixed by #3311
Labels
bug Indicates an unexpected problem or unintended behaviour

Comments

alexk101 commented Sep 8, 2023

Issue Description

This is related to a number of GPU issues people have had, most directly to #3150. As it stands, the GPU kernels for this project do not compile with modern versions of CUDA. From my understanding, this affects all versions newer than 11.7, since that is when the sm_37 compute capability was deprecated from CUDA; building the kernels currently requires it, as referenced here:

shap/setup.py

Lines 77 to 78 in 2262893

"-arch=sm_37 "
"-gencode=arch=compute_37,code=sm_37 "

I think there are a couple of ways this could be approached. The first is a short-term solution: cap the maximum allowed CUDA version at 11.7. I don't have a machine to test this, but from my understanding of the error, that should fix it.
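The version cap could be implemented with a small check in setup.py before the gencode flags are assembled. A minimal sketch, assuming the `nvcc --version` output format shown in the traceback below; `parse_cuda_release` and `supports_sm_37` are hypothetical helpers, not shap's actual code:

```python
import re

def parse_cuda_release(version_output):
    # Hypothetical helper: extract (major, minor) from `nvcc --version`
    # output, e.g. "Cuda compilation tools, release 12.2, V12.2.91" -> (12, 2).
    m = re.search(r"release (\d+)\.(\d+)", version_output)
    return (int(m.group(1)), int(m.group(2))) if m else None

def supports_sm_37(version_output):
    # Per the report above, sm_37 (Kepler) stops being accepted after
    # CUDA 11.7, so only emit the sm_37 gencode flags for older toolkits.
    release = parse_cuda_release(version_output)
    return release is not None and release <= (11, 7)
```

The same check could either drop the sm_37 flags or abort with a clear message, depending on which approach the maintainers prefer.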

The second approach is more sustainable in the long term but will require more work: removing support for sm_37 (which is the Kepler architecture, btw). Removing the previously referenced lines would be the first step, but as I have found, the kernels themselves will still need changes for CUDA 12 and Thrust 2.0.0, since I get compile errors about a __device__-only lambda's return type being queried from host code here:

__device__ void Extend() {

I don't know which approach the maintainers would prefer, but it would be nice to have shap on modern versions of CUDA. I am willing to help with that, but I must confess that my experience with C++ is fairly limited, so I would need help from someone more informed. Familiarity with those kernels would probably also be helpful.

Additionally, I think CUDA/GPU support should be a parameter supplied by the user at build time, since the setup currently hardcodes with_cuda to True. That way, if the CUDA build fails, the entire build can fail, though this would have to be checked for in this function by monitoring stderr:

shap/setup.py

Line 67 in 2262893

def compile_cuda_module(host_args):

As it stands, even if the CUDA build fails, setup continues anyway. I didn't notice the CUDA compile errors until later because they are never propagated; they just sit at the top of the install log.
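One possible shape for that change is to let the compile step raise instead of warn. A sketch only; the `fail_on_error` flag, the `nvcc` parameter, and the trimmed-down argument list are illustrative assumptions, not shap's actual setup.py API:

```python
import subprocess

def compile_cuda_module(host_args, nvcc="nvcc", fail_on_error=True):
    # Hypothetical sketch: invoke nvcc and propagate failures to the
    # caller instead of burying them at the top of the install log.
    cmd = [nvcc, "shap/cext/_cext_gpu.cu", "-lib",
           "-o", "build/lib_cext_gpu.a", *host_args]
    try:
        subprocess.run(cmd, check=True, capture_output=True)
        return True
    except (OSError, subprocess.CalledProcessError) as exc:
        if fail_on_error:
            # Abort the whole build so the error is impossible to miss.
            raise RuntimeError(f"Error building cuda module: {exc}") from exc
        print(f"WARNING: Could not compile cuda extensions: {exc}")
        return False
```

With `fail_on_error` driven by a user-supplied build option, a CPU-only install could still opt back into the current warn-and-continue behaviour.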

Thanks, Alex

Minimal Reproducible Example

git clone https://github.com/shap/shap.git
cd shap
python setup.py install

Traceback

Modern CUDA compile error snippet:

NVCC ==>  /opt/cuda/bin/nvcc
Compiling cuda extension, calling nvcc with arguments:
['/opt/cuda/bin/nvcc', '-allow-unsupported-compiler', 'shap/cext/_cext_gpu.cu', '-lib', '-o', 'build/lib_cext_gpu.a', '-Xcompiler', '-fPIC', '--include-path', '/home/alexk101/.mambaforge/envs/general/include/python3.10', '--std', 'c++14', '--expt-extended-lambda', '--expt-relaxed-constexpr', '-arch=sm_37', '-gencode=arch=compute_37,code=sm_37', '-gencode=arch=compute_70,code=sm_70', '-gencode=arch=compute_75,code=sm_75', '-gencode=arch=compute_75,code=compute_75']
nvcc fatal   : Value 'sm_37' is not defined for option 'gpu-architecture'
Exception occurred during setup, Error building cuda module: CalledProcessError(1, ['/opt/cuda/bin/nvcc', '-allow-unsupported-compiler', 'shap/cext/_cext_gpu.cu', '-lib', '-o', 'build/lib_cext_gpu.a', '-Xcompiler', '-fPIC', '--include-path', '/home/alexk101/.mambaforge/envs/general/include/python3.10', '--std', 'c++14', '--expt-extended-lambda', '--expt-relaxed-constexpr', '-arch=sm_37', '-gencode=arch=compute_37,code=sm_37', '-gencode=arch=compute_70,code=sm_70', '-gencode=arch=compute_75,code=sm_75', '-gencode=arch=compute_75,code=compute_75'])
WARNING: Could not compile cuda extensions.

Compile errors with old architecture removed:

NVCC ==>  /opt/cuda/bin/nvcc
Compiling cuda extension, calling nvcc with arguments:
['/opt/cuda/bin/nvcc', '-allow-unsupported-compiler', 'shap/cext/_cext_gpu.cu', '-lib', '-o', 'build/lib_cext_gpu.a', '-Xcompiler', '-fPIC', '--include-path', '/home/alexk101/.mambaforge/envs/general/include/python3.10', '--std', 'c++14', '--expt-extended-lambda', '--expt-relaxed-constexpr', '-arch=sm_52', '-gencode=arch=compute_70,code=sm_70', '-gencode=arch=compute_75,code=sm_75', '-gencode=arch=compute_75,code=compute_75']
/opt/cuda/bin/../targets/x86_64-linux/include/cuda/std/detail/libcxx/include/__functional/invoke.h: In instantiation of ‘struct cuda::std::__4::__invoke_of<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(thrust::device_vector<gpu_treeshap::PathElement<ShapSplitCondition>, thrust::device_allocator<gpu_treeshap::PathElement<ShapSplitCondition> > >*, thrust::device_vector<gpu_treeshap::PathElement<ShapSplitCondition>, thrust::device_allocator<gpu_treeshap::PathElement<ShapSplitCondition> > >*), gpu_treeshap::detail::DeduplicatePaths<thrust::device_vector<gpu_treeshap::PathElement<ShapSplitCondition>, thrust::device_allocator<gpu_treeshap::PathElement<ShapSplitCondition> > >, thrust::device_allocator<int>, ShapSplitCondition>, 2> >, gpu_treeshap::PathElement<ShapSplitCondition>, gpu_treeshap::PathElement<ShapSplitCondition> >’:
/opt/cuda/bin/../targets/x86_64-linux/include/cuda/std/detail/libcxx/include/__type_traits/result_of.h:31:48:   required from ‘class cuda::std::__4::result_of<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(thrust::device_vector<gpu_treeshap::PathElement<ShapSplitCondition>, thrust::device_allocator<gpu_treeshap::PathElement<ShapSplitCondition> > >*, thrust::device_vector<gpu_treeshap::PathElement<ShapSplitCondition>, thrust::device_allocator<gpu_treeshap::PathElement<ShapSplitCondition> > >*), gpu_treeshap::detail::DeduplicatePaths<thrust::device_vector<gpu_treeshap::PathElement<ShapSplitCondition>, thrust::device_allocator<gpu_treeshap::PathElement<ShapSplitCondition> > >, thrust::device_allocator<int>, ShapSplitCondition>, 2> >(gpu_treeshap::PathElement<ShapSplitCondition>, gpu_treeshap::PathElement<ShapSplitCondition>)>’
/opt/cuda/bin/../targets/x86_64-linux/include/cub/detail/type_traits.cuh:53:61:   required by substitution of ‘template<class Invokable, class InitT, class InputT> using accumulator_t = typename cuda::std::__4::decay<typename cuda::std::__4::result_of<Invokable(InitT, InputT)>::type>::type [with Invokable = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(thrust::device_vector<gpu_treeshap::PathElement<ShapSplitCondition>, thrust::device_allocator<gpu_treeshap::PathElement<ShapSplitCondition> > >*, thrust::device_vector<gpu_treeshap::PathElement<ShapSplitCondition>, thrust::device_allocator<gpu_treeshap::PathElement<ShapSplitCondition> > >*), gpu_treeshap::detail::DeduplicatePaths<thrust::device_vector<gpu_treeshap::PathElement<ShapSplitCondition>, thrust::device_allocator<gpu_treeshap::PathElement<ShapSplitCondition> > >, thrust::device_allocator<int>, ShapSplitCondition>, 2> >; InitT = gpu_treeshap::PathElement<ShapSplitCondition>; InputT = gpu_treeshap::PathElement<ShapSplitCondition>]’
/tmp/tmpxft_000018ed_00000000-6__cext_gpu.compute_75.cudafe1.stub.c:217:849:   required from here
nvcc_internal_extended_lambda_implementation:293:146: error: static assertion failed: Attempt to use an extended __device__ lambda in a context that requires querying its return type in host code. Use a named function object, a __host__ __device__ lambda, or cuda::proclaim_return_type instead.
nvcc_internal_extended_lambda_implementation:293:146: note: ‘!(bool)__nv_extended_device_lambda_trait_helper<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(thrust::device_vector<gpu_treeshap::PathElement<ShapSplitCondition>, thrust::device_allocator<gpu_treeshap::PathElement<ShapSplitCondition> > >*, thrust::device_vector<gpu_treeshap::PathElement<ShapSplitCondition>, thrust::device_allocator<gpu_treeshap::PathElement<ShapSplitCondition> > >*), gpu_treeshap::detail::DeduplicatePaths<thrust::device_vector<gpu_treeshap::PathElement<ShapSplitCondition>, thrust::device_allocator<gpu_treeshap::PathElement<ShapSplitCondition> > >, thrust::device_allocator<int>, ShapSplitCondition>, 2> > >::value’ evaluates to false
Exception occurred during setup, Error building cuda module: CalledProcessError(1, ['/opt/cuda/bin/nvcc', '-allow-unsupported-compiler', 'shap/cext/_cext_gpu.cu', '-lib', '-o', 'build/lib_cext_gpu.a', '-Xcompiler', '-fPIC', '--include-path', '/home/alexk101/.mambaforge/envs/general/include/python3.10', '--std', 'c++14', '--expt-extended-lambda', '--expt-relaxed-constexpr', '-arch=sm_52', '-gencode=arch=compute_70,code=sm_70', '-gencode=arch=compute_75,code=sm_75', '-gencode=arch=compute_75,code=compute_75'])
WARNING: Could not compile cuda extensions.

Expected Behavior

NVCC compiles the kernels, and GPUTreeExplainer then works.

Bug report checklist

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest release of shap.
  • I have confirmed this bug exists on the master branch of shap.
  • I'd be interested in making a PR to fix this bug

Installed Versions

NVCC/CUDA=release 12.2, V12.2.91
shap=0.42.1

@alexk101 alexk101 added the bug Indicates an unexpected problem or unintended behaviour label Sep 8, 2023
@alexk101 (Author)

I have done some more research into this and found that the GPU version of shap is hosted as an entirely separate project within NVIDIA's RAPIDS ecosystem. It seems to be actively maintained, and using the header file from here fixes all compilation issues:

https://github.com/rapidsai/gputreeshap/blob/854070d9fcc2e99ab725175abbcb58c4560b8dce/GPUTreeShap/gpu_treeshap.h

Removing support for the deprecated architectures and updating the header file is all that is required. I can open a pull request with these changes, but I would appreciate feedback from a maintainer on whether they think this is reasonable.

joaomh commented Oct 4, 2023

I have the same issue here. Every time I try to run shap.explainers.GPUTree I get an error that

cuda extension was not built during install

So I checked the setup.py install:

Compiling cuda extension, calling nvcc with arguments:
['/usr/local/cuda-12/bin/nvcc', '-allow-unsupported-compiler', 'shap/cext/_cext_gpu.cu', '-lib', '-o', 'build/lib_cext_gpu.a', '-Xcompiler', '-fPIC', '--include-path', '/home/joao/anaconda3/include/python3.11', '--std', 'c++14', '--expt-extended-lambda', '--expt-relaxed-constexpr', '-arch=sm_37', '-gencode=arch=compute_37,code=sm_37', '-gencode=arch=compute_70,code=sm_70', '-gencode=arch=compute_75,code=sm_75', '-gencode=arch=compute_75,code=compute_75']
nvcc fatal   : Value 'sm_37' is not defined for option 'gpu-architecture'
Exception occurred during setup, Error building cuda module: CalledProcessError(1, ['/usr/local/cuda-12/bin/nvcc', '-allow-unsupported-compiler', 'shap/cext/_cext_gpu.cu', '-lib', '-o', 'build/lib_cext_gpu.a', '-Xcompiler', '-fPIC', '--include-path', '/home/joao/anaconda3/include/python3.11', '--std', 'c++14', '--expt-extended-lambda', '--expt-relaxed-constexpr', '-arch=sm_37', '-gencode=arch=compute_37,code=sm_37', '-gencode=arch=compute_70,code=sm_70', '-gencode=arch=compute_75,code=sm_75', '-gencode=arch=compute_75,code=compute_75'])
WARNING: Could not compile cuda extensions.

NVCC/CUDA=release 12
shap=0.42.1
GPU: RTX 4090


alexk101 commented Oct 4, 2023

@joaomh If you replace shap/shap/cext/gpu_treeshap.h with the version I mentioned in #3251 (comment) and remove lines 77-78

shap/setup.py

Lines 77 to 78 in 2262893

"-arch=sm_37 "
"-gencode=arch=compute_37,code=sm_37 "

from shap/setup.py, it should compile. I don't know why none of the maintainers have responded to this issue, since it doesn't seem to be a very difficult fix and provides a big performance benefit, but I suppose that is the nature of large open source projects.

@CloseChoice (Collaborator)

@alexk101 Thanks for reporting and looking into this so deeply. I implemented your suggestions in the linked PR; I can run code similar to the GPUTree notebook with CUDA 12.2. Seems like this is the way to go.

@connortann (Collaborator)

Closing as I believe this is addressed by #3462
