Add a pass to apply fastmath attributes. by maleadt · Pull Request #804 · JuliaGPU/GPUCompiler.jl

maleadt · 2026-05-20T08:05:17Z

GPUCompiler.jl/CUDA.jl currently use CUDA's libdevice in a way that's not aligned with LLVM: we use the library whenever possible, setting flags like __CUDA_PREC_SQRT to make it behave "precisely" when fastmath=false (the default). That's a fine way of doing things, but LLVM instead treats libdevice as "the fast library", never sets __CUDA_PREC_SQRT, and uses regular instructions (@llvm.sqrt) for the precise versions. The mismatch surfaces when switching to LLVM's NVVMReflect pass (#785), so let's align GPUCompiler.jl with LLVM.

To that end, I'll be switching to LLVM intrinsics in CUDA.jl. However, that requires us to set appropriate function/instruction attributes when compiling with fastmath=true, which is what this PR does.
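To make the idea concrete, here is a toy, text-level sketch of what "applying fastmath attributes" could look like. It is not the pass added in this PR: the real pass works on LLVM IR through the compiler, and the exact attributes it sets are not spelled out in this thread. The function name `apply_fastmath`, the opcode list, and the choice of the `fast` flag are assumptions made purely for illustration.

```python
import re

# Floating-point opcodes that accept fast-math flags in LLVM IR.
FP_OPCODES = ("fadd", "fsub", "fmul", "fdiv", "frem", "fneg")


def apply_fastmath(ir: str) -> str:
    """Toy model (hypothetical helper): add the `fast` flag to FP instructions.

    Works on IR text line by line, so it only illustrates the effect;
    a real pass would walk instructions through the LLVM API instead.
    """
    ops = "|".join(FP_OPCODES)
    out = []
    for line in ir.splitlines():
        # `%r = fdiv float ...` -> `%r = fdiv fast float ...`
        line = re.sub(rf"= ({ops}) (?!fast\b)", r"= \1 fast ", line)
        # `%r = call float @llvm.sqrt.f32(...)` -> `%r = call fast float ...`
        line = re.sub(r"= call (?=(?:half|float|double) )", "= call fast ", line)
        out.append(line)
    return "\n".join(out)
```

With flags like these on `@llvm.sqrt` and `fdiv`, a backend is free to select approximate instructions when fastmath=true, while unflagged IR keeps the precise lowering. The sketch is idempotent: running it twice does not add a second `fast`.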

cc @vchuravy

vchuravy · 2026-05-20T09:07:32Z

Nice, this aligns with the direction I am exploring in #800, where the special lowering of fdiv fast is moved to the "backend".

maleadt · 2026-05-20T09:13:50Z

Yep; I'll rebase that PR onto this.

Building on JuliaGPU/GPUCompiler.jl#805, JuliaGPU/GPUCompiler.jl#804, JuliaGPU/GPUCompiler.jl#800, avoid some of the uses of `libdevice`'s intrinsics, instead emitting vanilla LLVM IR and having GPUCompiler.jl post-process it into what we need in PTX. This has many advantages, including (potentially) better optimization, compatibility with LLVM tools like Enzyme, etc.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

maleadt added 2 commits May 20, 2026 09:49

Add a pass to apply fastmath attributes.

c2593d5

Add a test.

af1dcc3

maleadt merged commit ea44b77 into main May 20, 2026
71 of 73 checks passed

maleadt deleted the tb/fastmath branch May 20, 2026 08:56

This was referenced May 20, 2026

Add PTXFDivFastPass to lower fdiv fast to NVPTX approximate division #800

Merged

Reduce usage of libdevice, relying more on LLVM JuliaGPU/CUDA.jl#3149

Merged

PTX: add PTXRSqrtFastPass to fold afn 1/sqrt(x) to nvvm.rsqrt.approx #807

Merged

maleadt merged 2 commits into main from tb/fastmath
