Add a pass to apply fastmath attributes. #804

Merged: maleadt merged 2 commits into main from tb/fastmath on May 20, 2026

maleadt (Member) commented May 20, 2026

GPUCompiler.jl and CUDA.jl currently use CUDA's libdevice in a way that's not aligned with LLVM: we use the library whenever possible, setting flags like __CUDA_PREC_SQRT to make it behave "precisely" when fastmath=false (the default). That's a fine way of doing things, but LLVM instead treats libdevice as "the fast library", never sets __CUDA_PREC_SQRT, and uses regular LLVM intrinsics (@llvm.sqrt) for the precise versions. The mismatch surfaces when switching to LLVM's NVVMReflect pass (#785), so let's align GPUCompiler.jl with LLVM.

To that end, I'll be switching to LLVM intrinsics in CUDA.jl. However, that requires us to set the appropriate function and instruction attributes when compiling with fastmath=true, which is what this PR does.
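For readers unfamiliar with the instruction-level side of this, here is a toy illustration of what "applying fastmath" means in textual LLVM IR: floating-point instructions and calls that return a floating-point value gain the `fast` flag. This is not the pass from this PR, which works on the IR module itself rather than on strings. The helper name `add_fast_flags` and the regex approach are assumptions made only for illustration, and function-level attributes are left out.

```python
import re

# Opcodes that accept LLVM fast-math flags directly after the opcode.
FP_OPCODES = ("fadd", "fsub", "fmul", "fdiv", "frem", "fneg", "fcmp")
# Scalar floating-point return types recognised by this toy sketch.
FP_TYPES = ("half", "float", "double")

# "<result> = fadd float ..." with no `fast` flag yet.
_FP_OP = re.compile(r"(=\s*(?:%s))(?!\s+fast\b)\s+" % "|".join(FP_OPCODES))
# "<result> = call float @callee(...)" with no flags yet.
_FP_CALL = re.compile(r"(=\s*call)\s+(?=(?:%s)\s+@)" % "|".join(FP_TYPES))


def add_fast_flags(ir: str) -> str:
    """Hypothetical helper: mark FP instructions in textual IR as `fast`.

    Lines that already carry `fast`, and non-floating-point instructions,
    are left untouched.
    """
    out = []
    for line in ir.splitlines():
        line = _FP_OP.sub(r"\1 fast ", line, count=1)
        line = _FP_CALL.sub(r"\1 fast ", line, count=1)
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    example = """define float @f(float %a, float %b) {
  %s = call float @llvm.sqrt.f32(float %a)
  %r = fadd float %s, %b
  ret float %r
}"""
    print(add_fast_flags(example))
```

With the flag present, the backend is free to pick an approximate lowering for `@llvm.sqrt`; without it, the same intrinsic must be lowered precisely.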

cc @vchuravy

maleadt merged commit ea44b77 into main on May 20, 2026 (71 of 73 checks passed)
maleadt deleted the tb/fastmath branch on May 20, 2026 08:56

vchuravy (Member) commented

Nice, this aligns with the direction I am exploring in #800, where the special lowering of fdiv fast is moved to the "backend".

maleadt (Member, Author) commented May 20, 2026

Yep; I'll rebase that PR onto this.

maleadt added a commit to JuliaGPU/CUDA.jl that referenced this pull request May 21, 2026
Building on JuliaGPU/GPUCompiler.jl#805, JuliaGPU/GPUCompiler.jl#804, JuliaGPU/GPUCompiler.jl#800, avoid some of the uses of `libdevice`'s intrinsics, instead emitting vanilla LLVM IR and having GPUCompiler.jl post-process it into what we need in PTX. This has many advantages, including (potentially) better optimization, compatibility with LLVM tools like Enzyme, etc.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>