
(cmake) Fix cuda arch selection #1091

Conversation

matthewdouglas (Contributor) commented on Feb 26, 2024:

I noticed a few problems with the way the compute capabilities would be applied when compiling.

Currently CMAKE_CUDA_ARCHITECTURES_ALL is used to create a default list. This variable is only defined in cmake>=3.23.0, but our minimum requirement is lower: cmake==3.22.1. The GitHub Actions runners for the Ubuntu builds use CMake 3.21.0 and therefore end up with empty lists:

Ubuntu, x86-64, CUDA 12.1.0

-- Building with backend cuda
-- NO_CUBLASLT := OFF
-- CUDA Version: 121 (12.1.105)
-- CUDA Compiler: /usr/local/cuda/bin/nvcc
-- CUDA Capabilities Available: 
-- CUDA Capabilities  Selected: 
-- CUDA NVCC Flags:  --use_fast_math
-- Configuring done
-- Generating done
-- Build files have been written to: /src

However, the Windows builds have a newer version of CMake set up, so it's closer to what we expect:

Windows, x86-64, CUDA 12.1

-- NO_CUBLASLT := ON
-- The CUDA compiler identification is NVIDIA 12.1.66
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.1/bin/nvcc.exe - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.1/include (found version "12.1.66") 
-- CUDA Version: 121 (12.1.66)
-- CUDA Compiler: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.1/bin/nvcc.exe
-- CUDA Capabilities Available: 50;52;53;60;61;62;70;72;75;80;86;87;89;90
-- CUDA Capabilities  Selected: 50;52;53;60;61;62;70;72;75;80;86;87;89;90

Besides this issue, there are two others resolved here:

  1. When CMAKE_CUDA_ARCHITECTURES is not user-defined, it's initialized to the nvcc default. For CUDA 12.x, this adds --generate-code=arch=compute_52,code=[sm_52] to the actual flags used when invoking nvcc.
  2. By default we're only building cubins for the "real" architectures. When a new arch is released, we'd have to rebuild to support it. Instead, by also including the "virtual" PTX build for at least the highest architecture selected, we can help ensure the PTX can run on newer cards (see the sketch after this list).
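
As a sketch of that idea (illustrative, not the exact change in this PR): since CMake 3.18, entries in CMAKE_CUDA_ARCHITECTURES can carry a -real (SASS only) or -virtual (PTX only) suffix, and an unsuffixed entry emits both, which is what yields the code=[compute_90,sm_90] pair in the flags further down:

# Sketch only. Assumes COMPUTE_CAPABILITY is a semicolon-separated list
# sorted ascending, e.g. "50;52;60;70;75;80;86;89;90".
list(GET COMPUTE_CAPABILITY -1 _highest_cc)
set(CMAKE_CUDA_ARCHITECTURES "")
foreach(_cc IN LISTS COMPUTE_CAPABILITY)
  if("${_cc}" STREQUAL "${_highest_cc}")
    # Unsuffixed: emit both sm_<cc> (SASS) and compute_<cc> (PTX),
    # so the PTX can be JIT-compiled on future architectures.
    list(APPEND CMAKE_CUDA_ARCHITECTURES "${_cc}")
  else()
    # "-real": SASS only, keeping the binary size down.
    list(APPEND CMAKE_CUDA_ARCHITECTURES "${_cc}-real")
  endif()
endforeach()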

After these changes, the generated flags look like this (Windows, CUDA 12.3, all capabilities selected):

nvcc.exe -forward-unknown-to-host-compiler -DBUILD_CUDA -Dbitsandbytes_EXPORTS -IC:\dev\bitsandbytes\csrc -IC:\dev\bitsandbytes\include -isystem "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include" -D_WINDOWS -Xcompiler=" /EHsc" --use_fast_math -Xcompiler="-O2 -Ob2" -DNDEBUG -std=c++14 "--generate-code=arch=compute_50,code=[sm_50]" "--generate-code=arch=compute_52,code=[sm_52]" "--generate-code=arch=compute_53,code=[sm_53]" "--generate-code=arch=compute_60,code=[sm_60]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_62,code=[sm_62]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_72,code=[sm_72]" "--generate-code=arch=compute_75,code=[sm_75]" "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_86,code=[sm_86]" "--generate-code=arch=compute_87,code=[sm_87]" "--generate-code=arch=compute_89,code=[sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" -Xcompiler=-MD -x cu -rdc=true -c C:\dev\bitsandbytes\csrc\kernels.cu -o CMakeFiles\bitsandbytes.dir\csrc\kernels.cu.obj -Xcompiler=-FdTARGET_COMPILE_PDB,-FS

Note that in a future PR, I'd like to consider replacing the build option -DCOMPUTE_CAPABILITY=... with the more canonical CMAKE_CUDA_ARCHITECTURES option. That can be a later change stacked on top of this one.
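
For illustration only, a hypothetical invocation under that future scheme (not part of this PR) might look like:

cmake -DCOMPUTE_BACKEND=cuda -DCMAKE_CUDA_ARCHITECTURES="75;86;90" .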

@@ -125,7 +125,7 @@ jobs:
 docker run --platform linux/$build_arch -i -w /src -v $PWD:/src $image sh -c \
 "apt-get update \
 && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends cmake \
-&& cmake -DCOMPUTE_BACKEND=cuda -DNO_CUBLASLT=${NO_CUBLASLT} . \
+&& cmake -DCOMPUTE_BACKEND=cuda -DCOMPUTE_CAPABILITY=\"50;52;60;61;70;75;80;86;89;90\" -DNO_CUBLASLT=${NO_CUBLASLT} . \
matthewdouglas (Author) commented:

I selected these for the CI to bring parity with what the old Makefile was selecting. When we get around to doing tests on GPU instances, we may want to narrow this down to "native" so we don't waste so much build time.
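
For reference, a minimal sketch of that narrowing (the native special value for CMAKE_CUDA_ARCHITECTURES requires CMake >= 3.24 and a GPU visible at configure time):

# Sketch: on a GPU-equipped test runner, build only for the locally
# detected architecture(s) instead of the full list.
if(CMAKE_VERSION VERSION_GREATER_EQUAL "3.24" AND NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
  set(CMAKE_CUDA_ARCHITECTURES native)
endif()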


The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Titus-von-Koeller (Collaborator) commented:

Thanks @matthewdouglas! Great work.

@rickardp @wkpark @akx Could one of you please do a first review of this?

rickardp (Contributor) left a comment:

Nice work! I left a few comments / questions

@@ -82,6 +83,31 @@ if(BUILD_CUDA)
 message(FATAL_ERROR "CUDA Version > 12 is not supported")
 endif()
 
+# CMake < 3.23.0 does not define CMAKE_CUDA_ARCHITECTURES_ALL.
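
The rest of the added hunk is elided above. A minimal sketch of such a fallback, assuming CUDAToolkit_VERSION is populated by find_package(CUDAToolkit) and with illustrative rather than authoritative toolkit-version cutoffs, could look like:

if(CMAKE_VERSION VERSION_LESS "3.23.0")
  # Sketch: capabilities available in every CUDA 11.x/12.x toolkit.
  set(CMAKE_CUDA_ARCHITECTURES_ALL 50 52 53 60 61 62 70 72 75 80)
  # Newer capabilities arrive with newer toolkits (cutoffs illustrative).
  if(CUDAToolkit_VERSION VERSION_GREATER_EQUAL "11.1")
    list(APPEND CMAKE_CUDA_ARCHITECTURES_ALL 86)
  endif()
  if(CUDAToolkit_VERSION VERSION_GREATER_EQUAL "11.4")
    list(APPEND CMAKE_CUDA_ARCHITECTURES_ALL 87)
  endif()
  if(CUDAToolkit_VERSION VERSION_GREATER_EQUAL "11.8")
    list(APPEND CMAKE_CUDA_ARCHITECTURES_ALL 89 90)
  endif()
endif()
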
rickardp (Contributor):

Better to just kill this code and bump the cmake requirement to 3.23.0?

matthewdouglas (Author):

I had thought about that, but considering Ubuntu 22.04 LTS as a popular host system, the default is 3.22.1. I believe this is what is on the Docker images that we're using too. I didn't want this PR to change too much outside of the scope of the issues I saw.

I know we could install a newer cmake on the Docker builds though, and GitHub's latest runner images all have CMake 3.28.x on them. So maybe a separate PR to bump up the minimum?

# Ensure we build the PTX for the latest version.
# This is similar to the "all" and "all-major" options in CMake >= 3.23.
# TODO: Consider bumping CMake requirement and using CMAKE_CUDA_ARCHITECTURES=[all | native] by default
list(REMOVE_DUPLICATES COMPUTE_CAPABILITY)
rickardp (Contributor):

The downside of removing unsupported archs is that we now remove them silently if the build agent no longer supports them (for example after a CUDA update). Maybe it would be better to fail if an architecture is specified that is not available?

matthewdouglas (Author):

I was thinking that CI/release builds should be specific and set COMPUTE_CAPABILITY (or CMAKE_CUDA_ARCHITECTURES if moving to use that directly). But otherwise, all we're doing here is adding the virtual architecture to the highest capability selected, e.g. generating both compute_90 and sm_90: arch=compute_90,code=[compute_90,sm_90]. This is for forward compatibility.

We already have a guard for CUDA >= 13, so that would require manual intervention. We'd find out ahead of time as the architectures to be dropped would be deprecated in 12.x first. That should emit warnings from nvcc.

rickardp (Contributor):

Yes, that makes sense
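
For reference, a fail-fast variant along the lines suggested above could look roughly like this sketch, erroring out instead of filtering silently:

# Sketch: reject any requested capability the detected toolkit does not
# support, rather than silently dropping it.
foreach(_cc IN LISTS COMPUTE_CAPABILITY)
  if(NOT "${_cc}" IN_LIST CMAKE_CUDA_ARCHITECTURES_ALL)
    message(FATAL_ERROR "CUDA capability ${_cc} is not supported by this toolkit")
  endif()
endforeach()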

Titus-von-Koeller merged commit 753df25 into bitsandbytes-foundation:main on Feb 27, 2024 (31 checks passed).
Titus-von-Koeller (Collaborator) commented:

Thanks @matthewdouglas, this is really helpful! Really glad to have your support with this :)

Would you be up to join our Slack channel to be able to coordinate about stuff every now and then? If yes, please send me an email at my firstname at huggingface dot co with the email that I should use for the invite.

I'm planning to merge some more workflow-related PRs these days and aim to do a release to test PyPI by the end of the week, in preparation for a full release next week.

Titus-von-Koeller (Collaborator) commented:

And thanks @rickardp for the review, very much appreciated as well!
