PTX: lower @fastmath sqrt to NVPTX approx intrinsics (for LLVM 18) by maleadt · Pull Request #805 · JuliaGPU/GPUCompiler.jl

maleadt · 2026-05-20T13:18:54Z

Add `PTXFSqrtFastPass` alongside `PTXFDivFastPass`. It rewrites `afn`-flagged `llvm.sqrt.f{32,64}` to `llvm.nvvm.sqrt.approx{,.ftz}.f` (f32) and `rcp(rsqrt(x))` (f64), the same sequences that NVPTX's `getSqrtEstimate` emits on LLVM 21+, where per-instruction `afn` and the function-level `unsafe-fp-math` attribute are honored. Like the f32 path of `PTXFDivFastPass`, both sqrt paths are temporary backports for LLVM 18, which only consults `TargetMachine.Options.UnsafeFPMath` (unreachable through LLVM.jl).

This will allow us to get rid of most of libdevice in CUDA.jl.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
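
For readers unfamiliar with the f64 rewrite, the following is a minimal numeric sketch of the identity it relies on, sqrt(x) = 1 / (1 / sqrt(x)), written in Python rather than LLVM IR. The helper names `rsqrt`, `rcp`, and `sqrt_via_rcp_rsqrt` are hypothetical and do not come from GPUCompiler.jl. Both helpers use full-precision host arithmetic, so the sketch only illustrates the algebra, not the reduced accuracy of the hardware approximations, and it is restricted to positive finite inputs.

```python
import math


def rsqrt(x: float) -> float:
    """Reciprocal square root (hypothetical stand-in for an rsqrt-style instruction)."""
    return 1.0 / math.sqrt(x)


def rcp(x: float) -> float:
    """Reciprocal (hypothetical stand-in for an rcp-style instruction)."""
    return 1.0 / x


def sqrt_via_rcp_rsqrt(x: float) -> float:
    """Compute sqrt(x) as rcp(rsqrt(x)); valid here for positive finite x only."""
    return rcp(rsqrt(x))


def max_relative_error(samples) -> float:
    """Largest relative deviation of the rewritten form from math.sqrt."""
    return max(
        abs(sqrt_via_rcp_rsqrt(x) - math.sqrt(x)) / math.sqrt(x)
        for x in samples
    )


# Positive finite inputs spanning a wide range of magnitudes.
SAMPLES = [1e-300, 0.25, 1.0, 2.0, 3.0, 1e10, 1e300]

if __name__ == "__main__":
    print(max_relative_error(SAMPLES))
```

With exact host arithmetic the two forms agree to within a few units in the last place. The approximate GPU instructions would deviate further, which is presumably why the rewrite is applied only to `afn`-flagged calls.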

codecov · 2026-05-20T18:25:41Z

Codecov Report

❌ Patch coverage is 97.87234% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 75.95%. Comparing base (eded413) to head (afd4765).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/ptx.jl | 97.87% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #805      +/-   ##
==========================================
+ Coverage   75.69%   75.95%   +0.25%     
==========================================
  Files          25       25              
  Lines        3983     4026      +43     
==========================================
+ Hits         3015     3058      +43     
  Misses        968      968


Building on JuliaGPU/GPUCompiler.jl#805, JuliaGPU/GPUCompiler.jl#804, JuliaGPU/GPUCompiler.jl#800, avoid some of the uses of `libdevice`'s intrinsics, instead emitting vanilla LLVM IR and having GPUCompiler.jl post-process it into what we need in PTX. This has many advantages, including (potentially) better optimization, compatibility with LLVM tools like Enzyme, etc. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

maleadt changed the title ~~PTX: lower @fastmath sqrt to NVPTX approx intrinsics.~~ PTX: lower @fastmath sqrt to NVPTX approx intrinsics (for LLVM 18) May 20, 2026

maleadt merged commit f7d7418 into main May 20, 2026
36 of 37 checks passed

maleadt deleted the tb/ptx_fast_sqrt branch May 20, 2026 14:06

maleadt mentioned this pull request May 20, 2026

Reduce usage of libdevice, relying more on LLVM JuliaGPU/CUDA.jl#3149

Merged

maleadt mentioned this pull request May 21, 2026

PTX: add PTXRSqrtFastPass to fold afn 1/sqrt(x) to nvvm.rsqrt.approx #807

Merged

maleadt merged 1 commit into main from tb/ptx_fast_sqrt
