
BUG: No GPU Support for Modern CUDA #3251

Closed · 4 tasks done
alexk101 opened this issue Sep 8, 2023 · 5 comments · May be fixed by #3311
Labels
bug Indicates an unexpected problem or unintended behaviour

Comments

alexk101 commented Sep 8, 2023

Issue Description

This is related to a number of GPU issues people have had, most directly to #3150. As it stands, the GPU kernels for this project do not compile with modern versions of CUDA. From my understanding, this affects all versions newer than 11.7, since that is when the sm_37 compute capability was deprecated from CUDA; building the kernels currently requires it, as referenced here:

shap/setup.py

Lines 77 to 78 in 2262893

"-arch=sm_37 "
"-gencode=arch=compute_37,code=sm_37 "

I think there are a couple of ways this could be approached. The first is a short-term solution: cap the maximum allowed CUDA version at 11.7. I don't have a machine to test this, but from my understanding of the error, that should fix it.
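The version cap could be implemented with a small check in setup.py before the gencode flags are assembled. A minimal sketch, assuming the `nvcc --version` output format shown in the traceback below; `parse_cuda_release` and `supports_sm_37` are hypothetical helpers, not shap's actual code:

```python
import re

def parse_cuda_release(version_output):
    # Hypothetical helper: extract (major, minor) from `nvcc --version`
    # output, e.g. "Cuda compilation tools, release 12.2, V12.2.91" -> (12, 2).
    m = re.search(r"release (\d+)\.(\d+)", version_output)
    return (int(m.group(1)), int(m.group(2))) if m else None

def supports_sm_37(version_output):
    # Per the report above, sm_37 (Kepler) stops being accepted after
    # CUDA 11.7, so only emit the sm_37 gencode flags for older toolkits.
    release = parse_cuda_release(version_output)
    return release is not None and release <= (11, 7)
```

The same check could either drop the sm_37 flags or abort with a clear message, depending on which approach the maintainers prefer.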

The second approach is more sustainable in the long term but will require more work: removing support for sm_37 (which is the Kepler architecture, btw). Removing the previously referenced lines would be the first step, but as I have found, the kernels themselves will still need changes for CUDA 12 and Thrust 2.0.0, since I get compile errors about a __device__-only lambda's return type being queried from host code here:

__device__ void Extend() {

I don't know which approach the maintainers would prefer, but it would be nice to have shap on modern versions of CUDA. I am willing to help with that, but I must confess that my experience with C++ is fairly limited, so I would need help from someone more informed. Familiarity with those kernels would probably also be helpful.

Additionally, I think CUDA/GPU support should be a parameter supplied by the user at build time, since the setup currently hardcodes with_cuda to True. That way, if the CUDA build fails, the entire build can fail, though this would have to be checked for in this function by monitoring stderr:

shap/setup.py

Line 67 in 2262893

def compile_cuda_module(host_args):

As it stands, even if the CUDA build fails, setup continues anyway. I didn't notice the CUDA compile errors until later because they are never propagated; they just sit at the top of the install log.
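One possible shape for that change is to let the compile step raise instead of warn. A sketch only; the `fail_on_error` flag, the `nvcc` parameter, and the trimmed-down argument list are illustrative assumptions, not shap's actual setup.py API:

```python
import subprocess

def compile_cuda_module(host_args, nvcc="nvcc", fail_on_error=True):
    # Hypothetical sketch: invoke nvcc and propagate failures to the
    # caller instead of burying them at the top of the install log.
    cmd = [nvcc, "shap/cext/_cext_gpu.cu", "-lib",
           "-o", "build/lib_cext_gpu.a", *host_args]
    try:
        subprocess.run(cmd, check=True, capture_output=True)
        return True
    except (OSError, subprocess.CalledProcessError) as exc:
        if fail_on_error:
            # Abort the whole build so the error is impossible to miss.
            raise RuntimeError(f"Error building cuda module: {exc}") from exc
        print(f"WARNING: Could not compile cuda extensions: {exc}")
        return False
```

With `fail_on_error` driven by a user-supplied build option, a CPU-only install could still opt back into the current warn-and-continue behaviour.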

Thanks, Alex

Minimal Reproducible Example

git clone https://github.com/shap/shap.git
cd shap
python setup.py install

Traceback

Modern CUDA compile error snippet:

NVCC ==>  /opt/cuda/bin/nvcc
Compiling cuda extension, calling nvcc with arguments:
['/opt/cuda/bin/nvcc', '-allow-unsupported-compiler', 'shap/cext/_cext_gpu.cu', '-lib', '-o', 'build/lib_cext_gpu.a', '-Xcompiler', '-fPIC', '--include-path', '/home/alexk101/.mambaforge/envs/general/include/python3.10', '--std', 'c++14', '--expt-extended-lambda', '--expt-relaxed-constexpr', '-arch=sm_37', '-gencode=arch=compute_37,code=sm_37', '-gencode=arch=compute_70,code=sm_70', '-gencode=arch=compute_75,code=sm_75', '-gencode=arch=compute_75,code=compute_75']
nvcc fatal   : Value 'sm_37' is not defined for option 'gpu-architecture'
Exception occurred during setup, Error building cuda module: CalledProcessError(1, ['/opt/cuda/bin/nvcc', '-allow-unsupported-compiler', 'shap/cext/_cext_gpu.cu', '-lib', '-o', 'build/lib_cext_gpu.a', '-Xcompiler', '-fPIC', '--include-path', '/home/alexk101/.mambaforge/envs/general/include/python3.10', '--std', 'c++14', '--expt-extended-lambda', '--expt-relaxed-constexpr', '-arch=sm_37', '-gencode=arch=compute_37,code=sm_37', '-gencode=arch=compute_70,code=sm_70', '-gencode=arch=compute_75,code=sm_75', '-gencode=arch=compute_75,code=compute_75'])
WARNING: Could not compile cuda extensions.

Compile errors with old architecture removed:

NVCC ==>  /opt/cuda/bin/nvcc
Compiling cuda extension, calling nvcc with arguments:
['/opt/cuda/bin/nvcc', '-allow-unsupported-compiler', 'shap/cext/_cext_gpu.cu', '-lib', '-o', 'build/lib_cext_gpu.a', '-Xcompiler', '-fPIC', '--include-path', '/home/alexk101/.mambaforge/envs/general/include/python3.10', '--std', 'c++14', '--expt-extended-lambda', '--expt-relaxed-constexpr', '-arch=sm_52', '-gencode=arch=compute_70,code=sm_70', '-gencode=arch=compute_75,code=sm_75', '-gencode=arch=compute_75,code=compute_75']
/opt/cuda/bin/../targets/x86_64-linux/include/cuda/std/detail/libcxx/include/__functional/invoke.h: In instantiation of ‘struct cuda::std::__4::__invoke_of<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(thrust::device_vector<gpu_treeshap::PathElement<ShapSplitCondition>, thrust::device_allocator<gpu_treeshap::PathElement<ShapSplitCondition> > >*, thrust::device_vector<gpu_treeshap::PathElement<ShapSplitCondition>, thrust::device_allocator<gpu_treeshap::PathElement<ShapSplitCondition> > >*), gpu_treeshap::detail::DeduplicatePaths<thrust::device_vector<gpu_treeshap::PathElement<ShapSplitCondition>, thrust::device_allocator<gpu_treeshap::PathElement<ShapSplitCondition> > >, thrust::device_allocator<int>, ShapSplitCondition>, 2> >, gpu_treeshap::PathElement<ShapSplitCondition>, gpu_treeshap::PathElement<ShapSplitCondition> >’:
/opt/cuda/bin/../targets/x86_64-linux/include/cuda/std/detail/libcxx/include/__type_traits/result_of.h:31:48:   required from ‘class cuda::std::__4::result_of<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(thrust::device_vector<gpu_treeshap::PathElement<ShapSplitCondition>, thrust::device_allocator<gpu_treeshap::PathElement<ShapSplitCondition> > >*, thrust::device_vector<gpu_treeshap::PathElement<ShapSplitCondition>, thrust::device_allocator<gpu_treeshap::PathElement<ShapSplitCondition> > >*), gpu_treeshap::detail::DeduplicatePaths<thrust::device_vector<gpu_treeshap::PathElement<ShapSplitCondition>, thrust::device_allocator<gpu_treeshap::PathElement<ShapSplitCondition> > >, thrust::device_allocator<int>, ShapSplitCondition>, 2> >(gpu_treeshap::PathElement<ShapSplitCondition>, gpu_treeshap::PathElement<ShapSplitCondition>)>’
/opt/cuda/bin/../targets/x86_64-linux/include/cub/detail/type_traits.cuh:53:61:   required by substitution of ‘template<class Invokable, class InitT, class InputT> using accumulator_t = typename cuda::std::__4::decay<typename cuda::std::__4::result_of<Invokable(InitT, InputT)>::type>::type [with Invokable = __nv_dl_wrapper_t<__nv_dl_tag<void (*)(thrust::device_vector<gpu_treeshap::PathElement<ShapSplitCondition>, thrust::device_allocator<gpu_treeshap::PathElement<ShapSplitCondition> > >*, thrust::device_vector<gpu_treeshap::PathElement<ShapSplitCondition>, thrust::device_allocator<gpu_treeshap::PathElement<ShapSplitCondition> > >*), gpu_treeshap::detail::DeduplicatePaths<thrust::device_vector<gpu_treeshap::PathElement<ShapSplitCondition>, thrust::device_allocator<gpu_treeshap::PathElement<ShapSplitCondition> > >, thrust::device_allocator<int>, ShapSplitCondition>, 2> >; InitT = gpu_treeshap::PathElement<ShapSplitCondition>; InputT = gpu_treeshap::PathElement<ShapSplitCondition>]’
/tmp/tmpxft_000018ed_00000000-6__cext_gpu.compute_75.cudafe1.stub.c:217:849:   required from here
nvcc_internal_extended_lambda_implementation:293:146: error: static assertion failed: Attempt to use an extended __device__ lambda in a context that requires querying its return type in host code. Use a named function object, a __host__ __device__ lambda, or cuda::proclaim_return_type instead.
nvcc_internal_extended_lambda_implementation:293:146: note: ‘!(bool)__nv_extended_device_lambda_trait_helper<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(thrust::device_vector<gpu_treeshap::PathElement<ShapSplitCondition>, thrust::device_allocator<gpu_treeshap::PathElement<ShapSplitCondition> > >*, thrust::device_vector<gpu_treeshap::PathElement<ShapSplitCondition>, thrust::device_allocator<gpu_treeshap::PathElement<ShapSplitCondition> > >*), gpu_treeshap::detail::DeduplicatePaths<thrust::device_vector<gpu_treeshap::PathElement<ShapSplitCondition>, thrust::device_allocator<gpu_treeshap::PathElement<ShapSplitCondition> > >, thrust::device_allocator<int>, ShapSplitCondition>, 2> > >::value’ evaluates to false
Exception occurred during setup, Error building cuda module: CalledProcessError(1, ['/opt/cuda/bin/nvcc', '-allow-unsupported-compiler', 'shap/cext/_cext_gpu.cu', '-lib', '-o', 'build/lib_cext_gpu.a', '-Xcompiler', '-fPIC', '--include-path', '/home/alexk101/.mambaforge/envs/general/include/python3.10', '--std', 'c++14', '--expt-extended-lambda', '--expt-relaxed-constexpr', '-arch=sm_52', '-gencode=arch=compute_70,code=sm_70', '-gencode=arch=compute_75,code=sm_75', '-gencode=arch=compute_75,code=compute_75'])
WARNING: Could not compile cuda extensions.

Expected Behavior

NVCC compiles the kernels, and GPUTreeExplainer then works.

Bug report checklist

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest release of shap.
  • I have confirmed this bug exists on the master branch of shap.
  • I'd be interested in making a PR to fix this bug

Installed Versions

NVCC/CUDA=release 12.2, V12.2.91
shap=0.42.1

@alexk101 alexk101 added the bug Indicates an unexpected problem or unintended behaviour label Sep 8, 2023
@alexk101 (Author)

I have done some more research into this and found that the GPU version of shap is hosted as an entirely separate project within NVIDIA's RAPIDS ecosystem. It seems to be actively maintained, and using the header file from here fixes all compilation issues:

https://github.com/rapidsai/gputreeshap/blob/854070d9fcc2e99ab725175abbcb58c4560b8dce/GPUTreeShap/gpu_treeshap.h

Removing support for the deprecated architectures and updating the header file is all that is required. I can open a pull request with these changes, but I would appreciate feedback from a maintainer on whether they think this is reasonable.

joaomh commented Oct 4, 2023

I have the same issue here. Every time I try to run shap.explainers.GPUTree I get an error that

cuda extension was not built during install

So I checked the setup.py install:

Compiling cuda extension, calling nvcc with arguments:
['/usr/local/cuda-12/bin/nvcc', '-allow-unsupported-compiler', 'shap/cext/_cext_gpu.cu', '-lib', '-o', 'build/lib_cext_gpu.a', '-Xcompiler', '-fPIC', '--include-path', '/home/joao/anaconda3/include/python3.11', '--std', 'c++14', '--expt-extended-lambda', '--expt-relaxed-constexpr', '-arch=sm_37', '-gencode=arch=compute_37,code=sm_37', '-gencode=arch=compute_70,code=sm_70', '-gencode=arch=compute_75,code=sm_75', '-gencode=arch=compute_75,code=compute_75']
nvcc fatal   : Value 'sm_37' is not defined for option 'gpu-architecture'
Exception occurred during setup, Error building cuda module: CalledProcessError(1, ['/usr/local/cuda-12/bin/nvcc', '-allow-unsupported-compiler', 'shap/cext/_cext_gpu.cu', '-lib', '-o', 'build/lib_cext_gpu.a', '-Xcompiler', '-fPIC', '--include-path', '/home/joao/anaconda3/include/python3.11', '--std', 'c++14', '--expt-extended-lambda', '--expt-relaxed-constexpr', '-arch=sm_37', '-gencode=arch=compute_37,code=sm_37', '-gencode=arch=compute_70,code=sm_70', '-gencode=arch=compute_75,code=sm_75', '-gencode=arch=compute_75,code=compute_75'])
WARNING: Could not compile cuda extensions.

NVCC/CUDA=release 12
shap=0.42.1
GPU: RTX 4090


alexk101 commented Oct 4, 2023

@joaomh If you replace shap/shap/cext/gpu_treeshap.h with the version I mentioned in #3251 (comment) and remove lines 77-78

shap/setup.py

Lines 77 to 78 in 2262893

"-arch=sm_37 "
"-gencode=arch=compute_37,code=sm_37 "

from shap/setup.py, it should compile. I don't know why none of the maintainers have responded to this issue, since it doesn't seem to be a very difficult fix and provides a big performance benefit, but I suppose that is the nature of large open source projects.

@CloseChoice (Collaborator)

@alexk101 Thanks for reporting and looking into this so deeply. I implemented your suggestions in the linked PR; I can run code similar to the GPUTree notebook with CUDA 12.2. Seems like this is the way to go.

@connortann (Collaborator)

Closing as I believe this is addressed by #3462
