Bump downgrader, and add code_air. #804

Merged: maleadt merged 1 commit into main from tb/code_air on Jun 4, 2026

maleadt (Member) commented Jun 4, 2026

Adapts to / builds on JuliaGPU/GPUCompiler.jl#829.

Demo:

julia> function kernel(arr)
           @inbounds arr[1] = sqrt(arr[1])
           return
       end

code_llvm now shows the LLVM IR we use with our in-process LLVM back-end:

julia> Metal.code_llvm(kernel, Tuple{MtlDeviceArray{Float32,1,Metal.AS.Generic}}; debuginfo=:none)
define void @julia_kernel_6055(ptr nocapture noundef nonnull readonly align 8 dereferenceable(16) %"arr::MtlDeviceArray") local_unnamed_addr {
top:
  %"arr::MtlDeviceArray.unbox" = load ptr, ptr %"arr::MtlDeviceArray", align 8
  %0 = load float, ptr %"arr::MtlDeviceArray.unbox", align 4
  %1 = fcmp uge float %0, 0.000000e+00
  br i1 %1, label %L22, label %L20

L20:                                              ; preds = %top
  call fastcc void @julia_throw_complex_domainerror_6130() #6
  unreachable

L22:                                              ; preds = %top
  %2 = call float @llvm.sqrt.f32(float %0)
  store float %2, ptr %"arr::MtlDeviceArray.unbox", align 4
  ret void
}

... while code_air shows what comes out of the downgrader (i.e., with AIR intrinsics, typed pointers, etc.):

julia> Metal.code_air(kernel, Tuple{MtlDeviceArray{Float32,1,Metal.AS.Generic}})
; ...

define void @julia_kernel_6195({}* nocapture nonnull readonly align 8 dereferenceable(16) %"arr::MtlDeviceArray") local_unnamed_addr {
top:
  %0 = bitcast {}* %"arr::MtlDeviceArray" to {}**
  %"arr::MtlDeviceArray.unbox" = load {}*, {}** %0, align 8, !tbaa !2, !alias.scope !6, !noalias !9
  %1 = bitcast {}* %"arr::MtlDeviceArray.unbox" to float*
  %2 = load float, float* %1, align 4
  %3 = fcmp uge float %2, 0.000000e+00
  br i1 %3, label %L22, label %L20

L20:                                              ; preds = %top
  call fastcc void @julia_throw_complex_domainerror_6270()
  unreachable

L22:                                              ; preds = %top
  %4 = call float @air.sqrt.f32(float %2)
  %5 = bitcast {}* %"arr::MtlDeviceArray.unbox" to float*
  store float %4, float* %5, align 4
  ret void
}

; ...
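
Both listings keep the same guard in front of the square root: an `fcmp uge` against zero, then a branch to either the `sqrt` call or `julia_throw_complex_domainerror`. That guard appears to correspond to the domain check in Julia's own `sqrt`, which can be observed on the host without Metal. A minimal sketch, assuming only Base Julia; the helper name `sqrt_outcome` is hypothetical and not part of Metal.jl:

```julia
# Hypothetical helper (not part of Metal.jl): report what Base.sqrt does for x.
function sqrt_outcome(x::Float32)
    try
        return sqrt(x)
    catch err
        # Only translate the domain error; anything else is unexpected.
        err isa DomainError || rethrow()
        return :domain_error
    end
end

sqrt_outcome(4.0f0)    # non-negative input: the sqrt path (label L22 in the IR)
sqrt_outcome(-1.0f0)   # negative input: the throwing path (label L20 in the IR)
sqrt_outcome(NaN32)    # NaN is unordered, so `fcmp uge` holds and sqrt is called
```

In the kernel IR the throwing path ends in `unreachable`, and the only visible difference on the other path is that `@llvm.sqrt.f32` becomes `@air.sqrt.f32` after the downgrader.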

Also bumps the downgrader.

Closes #800
Closes #799

codecov (Bot) commented Jun 4, 2026
Codecov Report

❌ Patch coverage is 92.30769% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 81.38%. Comparing base (e2e8c0a) to head (9377316).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
src/compiler/compilation.jl 91.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #804      +/-   ##
==========================================
- Coverage   82.02%   81.38%   -0.64%     
==========================================
  Files          66       66              
  Lines        3165     3148      -17     
==========================================
- Hits         2596     2562      -34     
- Misses        569      586      +17     

github-actions (Bot, Contributor) left a comment

Metal Benchmarks

Benchmark suite Current: 9377316 Previous: e2e8c0a Ratio
array/accumulate/Float32/1d 1472562.5 ns 1474458 ns 1.00
array/accumulate/Float32/dims=1 1109916 ns 1109979.5 ns 1.00
array/accumulate/Float32/dims=1L 10258000 ns 10281083.5 ns 1.00
array/accumulate/Float32/dims=2 1413000 ns 1425645.5 ns 0.99
array/accumulate/Float32/dims=2L 7295938 ns 7227500 ns 1.01
array/accumulate/Int64/1d 1464333 ns 1455354 ns 1.01
array/accumulate/Int64/dims=1 1265687.5 ns 1239791.5 ns 1.02
array/accumulate/Int64/dims=1L 12438833.5 ns 12261208 ns 1.01
array/accumulate/Int64/dims=2 1565604 ns 1585813 ns 0.99
array/accumulate/Int64/dims=2L 9801208 ns 9732166 ns 1.01
array/broadcast 370959 ns 375250 ns 0.99
array/construct 5708 ns 6042 ns 0.94
array/permutedims/2d 633666.5 ns 635250 ns 1.00
array/permutedims/3d 1132833 ns 1135834 ns 1.00
array/permutedims/4d 1987500 ns 1988042 ns 1.00
array/private/copy 412583 ns 440834 ns 0.94
array/private/copyto!/cpu_to_gpu 365833 ns 366792 ns 1.00
array/private/copyto!/gpu_to_cpu 359437.5 ns 366875 ns 0.98
array/private/copyto!/gpu_to_gpu 338000 ns 341625 ns 0.99
array/private/iteration/findall/bool 1558750 ns 1562396 ns 1.00
array/private/iteration/findall/int 1694875 ns 1699000 ns 1.00
array/private/iteration/findfirst/bool 1459708 ns 1475416 ns 0.99
array/private/iteration/findfirst/int 1495187.5 ns 1508833 ns 0.99
array/private/iteration/findmin/1d 1572999.5 ns 1583083 ns 0.99
array/private/iteration/findmin/2d 1309250 ns 1312375 ns 1.00
array/private/iteration/logical 2170458 ns 2180291 ns 1.00
array/private/iteration/scalar 2693145.5 ns 2756709 ns 0.98
array/random/rand/Float32 619875 ns 611417 ns 1.01
array/random/rand/Int64 682833.5 ns 673334 ns 1.01
array/random/rand!/Float32 572833 ns 593917 ns 0.96
array/random/rand!/Int64 506875 ns 507250 ns 1.00
array/random/randn/Float32 573250 ns 599833 ns 0.96
array/random/randn!/Float32 522750 ns 548292 ns 0.95
array/reductions/mapreduce/Float32/1d 567375 ns 769083 ns 0.74
array/reductions/mapreduce/Float32/dims=1 499937.5 ns 515750 ns 0.97
array/reductions/mapreduce/Float32/dims=1L 824125 ns 804646 ns 1.02
array/reductions/mapreduce/Float32/dims=2 520541 ns 518125 ns 1.00
array/reductions/mapreduce/Float32/dims=2L 1355333 ns 1362458 ns 0.99
array/reductions/mapreduce/Int64/1d 934541.5 ns 949209 ns 0.98
array/reductions/mapreduce/Int64/dims=1 783917 ns 805625 ns 0.97
array/reductions/mapreduce/Int64/dims=1L 1482041.5 ns 1655000 ns 0.90
array/reductions/mapreduce/Int64/dims=2 964125 ns 985645.5 ns 0.98
array/reductions/mapreduce/Int64/dims=2L 2256604.5 ns 2262604.5 ns 1.00
array/reductions/reduce/Float32/1d 750125 ns 758584 ns 0.99
array/reductions/reduce/Float32/dims=1 499708.5 ns 515917 ns 0.97
array/reductions/reduce/Float32/dims=1L 791624.5 ns 780604 ns 1.01
array/reductions/reduce/Float32/dims=2 497667 ns 513708 ns 0.97
array/reductions/reduce/Float32/dims=2L 1345125 ns 1357750 ns 0.99
array/reductions/reduce/Int64/1d 933708 ns 944708 ns 0.99
array/reductions/reduce/Int64/dims=1 797166 ns 796084 ns 1.00
array/reductions/reduce/Int64/dims=1L 1589666.5 ns 1493124.5 ns 1.06
array/reductions/reduce/Int64/dims=2 1036208 ns 973542 ns 1.06
array/reductions/reduce/Int64/dims=2L 2260958.5 ns 2253375 ns 1.00
array/shared/copy 202166.5 ns 230458 ns 0.88
array/shared/copyto!/cpu_to_gpu 40250 ns 42375 ns 0.95
array/shared/copyto!/gpu_to_cpu 42042 ns 41709 ns 1.01
array/shared/copyto!/gpu_to_gpu 50250 ns 42500 ns 1.18
array/shared/iteration/findall/bool 1563292 ns 1565208 ns 1.00
array/shared/iteration/findall/int 1687291 ns 1701000 ns 0.99
array/shared/iteration/findfirst/bool 1194041 ns 1197646 ns 1.00
array/shared/iteration/findfirst/int 1214084 ns 1229167 ns 0.99
array/shared/iteration/findmin/1d 1331166 ns 1323500 ns 1.01
array/shared/iteration/findmin/2d 1314812.5 ns 1313000 ns 1.00
array/shared/iteration/logical 2020750 ns 2026750 ns 1.00
array/shared/iteration/scalar 5937.5 ns 9708 ns 0.61
integration/byval/reference 1158375 ns 1161042 ns 1.00
integration/byval/slices=1 1159791 ns 1165291 ns 1.00
integration/byval/slices=2 2087708 ns 2092583 ns 1.00
integration/byval/slices=3 7912979.5 ns 7852292 ns 1.01
integration/metaldevrt 543333 ns 548584 ns 0.99
kernel/indexing 362166 ns 367375 ns 0.99
kernel/indexing_checked 485416.5 ns 497291.5 ns 0.98
kernel/launch 13583 ns 14833 ns 0.92
kernel/rand 499042 ns 503000 ns 0.99
latency/import 1404456958 ns 1396322937.5 ns 1.01
latency/precompile 30336193396 ns 31339498708 ns 0.97
latency/ttfp 1717121209 ns 1665409542 ns 1.03
metal/synchronization/context 835.1063829787234 ns 1175 ns 0.71
metal/synchronization/stream 444.8636363636364 ns 798.0408163265306 ns 0.56

This comment was automatically generated by workflow using github-action-benchmark.

maleadt merged commit ec6df83 into main on Jun 4, 2026 (15 checks passed)
maleadt deleted the tb/code_air branch on June 4, 2026 10:54

Successfully merging this pull request may close these issues.

code_llvm is confusing
Julia 1.13 regression. KernelAbstractions printing fails to compile
