BUG: No GPU Support for Modern CUDA #3251
Comments
I have done some more research into this and found that the GPU version of shap is hosted as an entirely separate project within NVIDIA's RAPIDS ecosystem. It seems to be actively maintained, and using the header file from here fixes all compilation issues. Removing support for the deprecated architectures and updating the header file is all that is required. I can open a pull request with these changes, but I would appreciate feedback from a maintainer on whether they think this is reasonable.
I have the same issue here: every time I try to run shap.explainers.GPUTree I get an error that
So I checked the setup.py install
NVCC/CUDA=release 12
@joaomh If you replace Lines 77 to 78 in 2262893
from |
@alexk101 Thanks for reporting and looking into this so deeply. I implemented your suggestions in the linked PR, and I can now run code similar to the GPUTree notebook with CUDA 12.2. Seems like this is the way to go.
Closing as I believe this is addressed by #3462
Issue Description
This is somewhat related to a number of GPU issues people have had, and probably more directly to #3150, but as is, the GPU kernels for this project do not compile with modern versions of CUDA. From my understanding this should be all versions >11.7, as that was when the `sm_37` compute capability was deprecated from CUDA, which is required for building the kernels, as referenced here: shap/setup.py, lines 77 to 78 in 2262893.
I think that there are a couple of ways that this could be approached. The first, which I would say is a short-term solution, is capping the maximum allowed CUDA version at 11.7. I don't have a machine to test this, but from my understanding of this error, that should fix it.
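A rough sketch of what such a cap could look like, assuming the build script can read the output of `nvcc --version` (which contains a string like `release 12.2, V12.2.91`). The helper names `parse_nvcc_release` and `check_cuda_cap` are hypothetical and not part of shap's setup.py, and the 11.7 cutoff is simply the value proposed above, not a verified limit.

```python
import re


def parse_nvcc_release(version_output):
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("could not find a CUDA release number in nvcc output")
    return int(match.group(1)), int(match.group(2))


def check_cuda_cap(version_output, max_supported=(11, 7)):
    """Raise if the detected CUDA release is newer than the assumed cap.

    `max_supported` is an assumption taken from the discussion above.
    """
    release = parse_nvcc_release(version_output)
    if release > max_supported:
        raise RuntimeError(
            "CUDA %d.%d is newer than the maximum supported version %d.%d"
            % (release + max_supported)
        )
    return release
```

With the version string from this report, `check_cuda_cap("release 12.2, V12.2.91")` would raise, which makes the incompatibility visible up front instead of deep inside the compile log.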
The second approach, which is more sustainable in the long term but will require more work, is removing support for `sm_37` (which is the Kepler architecture, btw). Removing the previously referenced lines would be the first step, but as I have found, it will still require some changes to the kernels themselves for CUDA 12 and Thrust 2.0.0, as I get compile errors related to a `__device__`-only lambda's return type being queried from host code here: shap/shap/cext/gpu_treeshap.h, line 292 in 2262893.
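To illustrate the first step (dropping `sm_37` from the architecture flags), here is a minimal sketch that builds nvcc `-gencode` flags from a list of compute capabilities and skips unsupported ones. The function `gencode_flags` and the architecture list in the usage note are hypothetical; this is not how shap's setup.py is actually written, only one way the flags could be generated.

```python
def gencode_flags(archs, unsupported=("37",)):
    """Build nvcc -gencode flags, skipping architectures listed as unsupported.

    `archs` holds compute capabilities as strings such as "70" or "75".
    """
    flags = []
    for arch in archs:
        if arch in unsupported:
            # Skip architectures the installed toolkit can no longer target.
            continue
        flags.append("-gencode=arch=compute_%s,code=sm_%s" % (arch, arch))
    return flags
```

For example, `gencode_flags(["37", "70", "75"])` returns only the flags for 70 and 75. This addresses the architecture flags alone; the kernel changes mentioned above would still be needed.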
I don't know which approach the maintainers would like to take, but it would be nice to have shap on modern versions of CUDA. I am willing to help with that, but I must confess that my experience with C++ is fairly limited, so I would need some help from someone who is better informed. Familiarity with those kernels would also probably be helpful.
Additionally, I think that CUDA/GPU support should be a required parameter provided by the user when building, since as it is now, the setup simply has `with_cuda` hardcoded to True. This way, if the CUDA build fails, the entire build can fail, though this would have to be checked for in this function by monitoring stderr: shap/setup.py, line 67 in 2262893.
As it stands, even if the CUDA build fails, the setup will still continue. I didn't notice the CUDA compile errors until later because they are never propagated in the logs and just sit at the top of the install log.
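One way the failure could be surfaced is sketched below, assuming the compile step is launched as a subprocess. The function `run_compile_step` is a hypothetical stand-in, not the actual function in shap's setup.py; it checks the return code and includes the captured stderr in the raised error so the message is not buried at the top of the install log.

```python
import subprocess
import sys


def run_compile_step(cmd):
    """Run a build command and raise if it exits with a non-zero status."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Put the compiler's stderr into the exception so it reaches the user.
        raise RuntimeError(
            "command %r failed with exit code %d:\n%s"
            % (cmd, result.returncode, result.stderr)
        )
    return result.stdout


if __name__ == "__main__":
    # Stand-in for an nvcc invocation: a child process that fails on purpose.
    try:
        run_compile_step([sys.executable, "-c", "import sys; sys.exit(1)"])
    except RuntimeError as err:
        print("build aborted:", err)
```

Whether a failed CUDA build should abort the whole install or only do so when the user explicitly requested GPU support is a design decision for the maintainers.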
Thanks, Alex
Minimal Reproducible Example
Traceback
Expected Behavior
NVCC compiles the kernels and GPUTreeExplainer works then.
Bug report checklist
Installed Versions
NVCC/CUDA=release 12.2, V12.2.91
shap=0.42.1