Build MSBuild CUDA for the same GPU architectures as CMake #6359
Merged
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Grantim approved these changes on Jul 2, 2026
The MSBuild build of MRCuda never set `CodeGeneration`, so it fell back to the default from NVIDIA's `CUDA <ver>.props`: `compute_52,sm_52` on CUDA 11.4/12.0 and `compute_75,sm_75` on CUDA 13.2. The shipped binaries therefore carried native SASS for that single old architecture plus its PTX, so every modern GPU JIT-compiled the PTX at first kernel load (a startup delay) and then ran code not tuned for its architecture.

This PR makes MSBuild target the same GPU architectures as the CMake build (`cmake/Modules/ConfigureCuda.cmake`). The change lives entirely in `platform.props`, so any CUDA project that imports it (and chains `%(AdditionalOptions)` in its `CudaCompile` blocks, as `MRCuda.vcxproj` does) inherits the architectures automatically, including CUDA projects in dependent repositories; `MRCuda.vcxproj` itself is untouched.

`platform.props` defines `MRCudaGencode` from `MRCudaVersion` and injects it via a `CudaCompile` item definition. NVIDIA's default `CodeGeneration` is left in place: it is defined unconditionally in the CUDA props, which are imported after `platform.props`, so it cannot be overridden from there. That default always supplies the oldest architecture of the corresponding CMake set, and `MRCudaGencode` adds the remaining ones plus PTX for the newest.

The only difference from the CMake fatbins is one extra embedded PTX for that oldest architecture; since the driver always JIT-picks the newest compatible PTX, behavior is identical. A custom `MRCudaVersion` (set via `CustomMRPlatform.props`) that matches no branch leaves `MRCudaGencode` empty and keeps the stock behavior.

Verified locally on the v143/CUDA 12.0 and v145/CUDA 13.2 paths:
`cuobjdump` on the produced objects shows the full CMake set (e.g. 7 SASS cubins plus sm_52/sm_89 PTX for CUDA 12.0). The previous revision of this PR (same architectures via explicit flags in `MRCuda.vcxproj`) passed full CI on all six MSBuild jobs, including msvc-2019/CUDA 11.4.

Cost, measured on `MRCudaFastWindingNumber.cu` with CUDA 12.0: compile time goes from 3.8 s → 10.9 s per `.cu` file (the architecture-independent host compile and CUDA front end run once; only PTX/SASS generation multiplies), and object size from 115 KB → ~460 KB. In exchange, there is no JIT delay at first use, and kernels are architecture-tuned on all supported GPUs, same as in the CMake-built packages.

🤖 Generated with Claude Code
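The flag composition described in this PR can be modeled in a few lines of Python to show why the MSBuild fatbin ends up with exactly one image more than the CMake one. This is a minimal sketch under stated assumptions: the architecture list is hypothetical (only the CUDA 12.0 endpoints, sm_52 as oldest and sm_89 as newest PTX, come from the `cuobjdump` check above), and the `-gencode` layout produced by `mrcuda_gencode` is an assumed illustration, not the actual text of `MRCudaGencode` in `platform.props`.

```python
# Minimal model of the fatbin contents produced by the MSBuild and CMake builds.
# Assumptions (hypothetical, for illustration only): compute capabilities are
# integers (52 == sm_52), and MRCudaGencode is one -gencode option per arch;
# the real flag text in platform.props may differ.

Image = tuple[str, int]  # ("sass", 86) or ("ptx", 89)


def msbuild_images(archs: list[int]) -> set[Image]:
    """Images embedded by the MSBuild build after this change."""
    oldest, newest = min(archs), max(archs)
    # NVIDIA's default CodeGeneration "compute_X,sm_X" for the oldest arch
    # embeds SASS plus PTX for that arch.
    images = {("sass", oldest), ("ptx", oldest)}
    # MRCudaGencode: SASS for the remaining archs, PTX for the newest one.
    images |= {("sass", a) for a in archs if a != oldest}
    images.add(("ptx", newest))
    return images


def cmake_images(archs: list[int]) -> set[Image]:
    """CMake-style fatbin: SASS for every arch, PTX only for the newest."""
    return {("sass", a) for a in archs} | {("ptx", max(archs))}


def mrcuda_gencode(archs: list[int]) -> str:
    """Hypothetical nvcc options covering everything the default does not."""
    oldest, newest = min(archs), max(archs)
    flags = [f"-gencode=arch=compute_{a},code=sm_{a}"
             for a in sorted(archs) if a != oldest]
    # code=compute_N embeds PTX (not SASS) for the newest architecture.
    flags.append(f"-gencode=arch=compute_{newest},code=compute_{newest}")
    return " ".join(flags)


if __name__ == "__main__":
    archs = [52, 61, 75, 86, 89]  # hypothetical list, not the real CMake set
    print(mrcuda_gencode(archs))
    print(sorted(msbuild_images(archs) - cmake_images(archs)))  # [('ptx', 52)]
    print(sorted(cmake_images(archs) - msbuild_images(archs)))  # []
```

Whatever the actual list, the set difference is always the oldest architecture's PTX, because the stock default emits SASS and PTX for it while `MRCudaGencode` only adds what is still missing.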
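Why the extra oldest-architecture PTX does not change which code runs can be illustrated with a simplified Python model of the driver's choice. It assumes the documented binary-compatibility rule (a cubin for sm_XY runs on devices of compute capability X.z with z >= y), that a compatible cubin is preferred over PTX, and, as this PR states, that otherwise the newest compatible PTX is JIT-compiled. `pick` and the example fatbin are hypothetical illustrations, not a CUDA API.

```python
from typing import Optional

# Simplified model of how the driver picks code for a device from a fatbin.
# Assumptions: compute capabilities are integers (89 == 8.9); arch-specific
# variants such as sm_90a are ignored.

Image = tuple[str, int]  # ("sass", 86) or ("ptx", 89)


def split(cc: int) -> tuple[int, int]:
    """Split an integer compute capability into (major, minor)."""
    return divmod(cc, 10)  # 89 -> (8, 9), 120 -> (12, 0)


def pick(images: set[Image], device: int) -> Optional[Image]:
    """Return the image a device would run, or None if nothing is usable."""
    major, minor = split(device)
    # Compatible cubins: same major version, minor not newer than the device.
    sass = [a for kind, a in images
            if kind == "sass" and split(a)[0] == major and split(a)[1] <= minor]
    if sass:
        return ("sass", max(sass))  # native code, no JIT
    # Otherwise JIT the newest PTX whose virtual arch the device supports.
    ptx = [a for kind, a in images
           if kind == "ptx" and split(a) <= split(device)]
    if ptx:
        return ("ptx", max(ptx))
    return None


if __name__ == "__main__":
    # Hypothetical fatbin: SASS for 52/86/89, PTX for 52 and 89.
    fatbin = {("sass", 52), ("ptx", 52), ("sass", 86), ("sass", 89), ("ptx", 89)}
    for device in (53, 87, 89, 90, 120):
        print(device, pick(fatbin, device))
```

Under this model, a compute capability 12.0 device gets the sm_89 PTX JIT-compiled whether or not the sm_52 PTX is present, which is the behavior the PR relies on.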
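The cost figures for `MRCudaFastWindingNumber.cu` can be sanity-checked with a few lines of arithmetic. The ratios follow directly from the measurements; the split into a fixed and a per-image cost is a crude two-point linear fit assumed here for illustration (every embedded SASS or PTX image treated as costing the same), not something measured in this PR. Image counts come from the text: the stock default embeds SASS plus PTX for sm_52, and the new build embeds 7 SASS cubins plus sm_52/sm_89 PTX.

```python
# Back-of-the-envelope check of the compile-cost numbers (CUDA 12.0).
# The fixed/per-image split is a hypothetical linear model, for illustration.

before_s, after_s = 3.8, 10.9        # compile time per .cu file, seconds
before_kb, after_kb = 115.0, 460.0   # object size, KB (after is approximate)
images_before = 2                    # sm_52 SASS + sm_52 PTX (stock default)
images_after = 7 + 2                 # 7 SASS cubins + sm_52/sm_89 PTX

time_ratio = after_s / before_s      # compile-time growth factor
size_ratio = after_kb / before_kb    # object-size growth factor

# Solve t = fixed + per_image * n from the two measurements.
per_image_s = (after_s - before_s) / (images_after - images_before)
fixed_s = before_s - per_image_s * images_before

print(f"compile time x{time_ratio:.2f}, object size x{size_ratio:.1f}")
print(f"model: ~{fixed_s:.2f} s fixed + ~{per_image_s:.2f} s per image")
```

The fit lands at roughly 1.8 s of one-time work plus about 1 s per embedded image, which is consistent with the PR's note that only PTX/SASS generation multiplies, though the equal-cost-per-image assumption is a simplification.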