Fix KeyError in hipcompile when fetching early hostcall metadata #924

Merged: luraess merged 2 commits into main from lr/compiler on May 27, 2026
@luraess (Member) commented May 26, 2026

The global Dict side channel between link_libraries! and hipcompile could throw a KeyError when the key was missing (for example after a hash mismatch or under concurrent compilation). This was seen in Trixi.jl CI after #907.
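The failure mode can be reproduced in miniature. The sketch below uses a hypothetical global `Dict` and made-up names (`SIDE_CHANNEL`, `write_meta!`, `read_meta`); it is not AMDGPU.jl's actual code, only an illustration of how an indexing lookup throws when writer and reader disagree on the key.

```julia
# Hypothetical global side channel keyed by a job hash (illustrative only).
const SIDE_CHANNEL = Dict{UInt, Vector{Symbol}}()

# Writer: stores metadata under the hash of the job it saw.
write_meta!(job) = (SIDE_CHANNEL[hash(job)] = [:malloc]; nothing)

# Reader: plain indexing throws KeyError when the key is absent.
read_meta(job) = SIDE_CHANNEL[hash(job)]

write_meta!("kernel_a")

# The reader computes a different hash, so the entry is missing.
result = try
    read_meta("kernel_b")
catch err
    err
end
```

Here `result` holds the caught `KeyError`. A lookup with `get(SIDE_CHANNEL, key, nothing)` would avoid the exception, but it would not remove the shared mutable state between concurrent compilations.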

Replace it with task-local storage: link_libraries! writes into task_local_storage() and hipcompile reads from it on the same task. Running on the same task is guaranteed because link_libraries! is called synchronously inside GPUCompiler.compile. If the early pass was skipped, hipcompile falls back to late IR-based detection.
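The pattern described above can be sketched roughly as follows. Only `task_local_storage` is a real Base function here; the key `HOSTCALL_KEY`, the functions `early_pass!`, `late_detection`, and `fetch_hostcalls`, and the string-matching "IR scan" are assumptions made for illustration and do not reflect the code merged in this PR.

```julia
# Hypothetical key under which the early pass stores its result.
const HOSTCALL_KEY = :early_hostcall_metadata

# Hypothetical list of hostcall names the late scan looks for.
const KNOWN_HOSTCALLS = (:malloc, :print)

# Stand-in for the early pass: record metadata on the current task.
function early_pass!(hostcalls::Vector{Symbol})
    task_local_storage(HOSTCALL_KEY, hostcalls)
    return nothing
end

# Stand-in for late IR-based detection: scan the IR text for known names.
late_detection(ir::AbstractString) =
    Symbol[h for h in KNOWN_HOSTCALLS if occursin(String(h), ir)]

# Stand-in for the reader: take the early result if this task has one,
# otherwise fall back to the late scan. pop! with a default never throws.
function fetch_hostcalls(ir::AbstractString)
    meta = pop!(task_local_storage(), HOSTCALL_KEY, nothing)
    return meta === nothing ? late_detection(ir) : meta
end

early_pass!([:malloc])
from_early = fetch_hostcalls("")           # uses the task-local entry
from_late = fetch_hostcalls("call print")  # entry already consumed, so scans the IR
```

Because each task has its own storage, a compilation running on another task cannot observe or overwrite the entry, and a missing entry leads to the fallback path instead of an exception.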

@github-actions (Bot, Contributor) left a comment

AMDGPU.jl Benchmarks

| Benchmark suite | Current: 2786537 | Previous: 30db2cc | Ratio |
| --- | --- | --- | --- |
| amdgpu/synchronization/context/device | 600 ns | 600 ns | 1 |
| amdgpu/synchronization/stream/blocking | 240 ns | 250 ns | 0.96 |
| amdgpu/synchronization/stream/nonblocking | 340 ns | 340 ns | 1 |
| array/accumulate/Float32/1d | 85291 ns | 87912 ns | 0.97 |
| array/accumulate/Float32/dims=1 | 281074 ns | 377806 ns | 0.74 |
| array/accumulate/Float32/dims=1L | 136052 ns | 132962 ns | 1.02 |
| array/accumulate/Float32/dims=2 | 132082 ns | 130232 ns | 1.01 |
| array/accumulate/Float32/dims=2L | 2831838 ns | 2828730 ns | 1.00 |
| array/accumulate/Int64/1d | 93182 ns | 94792 ns | 0.98 |
| array/accumulate/Int64/dims=1 | 282853 ns | 296295 ns | 0.95 |
| array/accumulate/Int64/dims=1L | 167762 ns | 168273 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 126802 ns | 124332 ns | 1.02 |
| array/accumulate/Int64/dims=2L | 3011530 ns | 3012253 ns | 1.00 |
| array/broadcast | 134902 ns | 91621 ns | 1.47 |
| array/construct | 1720 ns | 1740 ns | 0.99 |
| array/copy | 37390 ns | 38370 ns | 0.97 |
| array/copyto!/cpu_to_gpu | 115101 ns | 114781 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 114992 ns | 115392 ns | 1.00 |
| array/copyto!/gpu_to_gpu | 61280 ns | 59901 ns | 1.02 |
| array/iteration/findall/bool | 178903 ns | 183663 ns | 0.97 |
| array/iteration/findall/int | 195812 ns | 191663 ns | 1.02 |
| array/iteration/findfirst/bool | 119832 ns | 116691 ns | 1.03 |
| array/iteration/findfirst/int | 115711 ns | 116192 ns | 1.00 |
| array/iteration/findmin/1d | 168992 ns | 170762 ns | 0.99 |
| array/iteration/findmin/2d | 155072 ns | 156362 ns | 0.99 |
| array/iteration/logical | 354645 ns | 358726 ns | 0.99 |
| array/iteration/scalar | 289054 ns | 297164 ns | 0.97 |
| array/permutedims/2d | 73441 ns | 74201 ns | 0.99 |
| array/permutedims/3d | 73481 ns | 74851 ns | 0.98 |
| array/permutedims/4d | 75531 ns | 77271 ns | 0.98 |
| array/random/rand/Float32 | 51451 ns | 51681 ns | 1.00 |
| array/random/rand/Int64 | 57051 ns | 57821 ns | 0.99 |
| array/random/rand!/Float32 | 145032 ns | 74771 ns | 1.94 |
| array/random/rand!/Int64 | 131881 ns | 93162 ns | 1.42 |
| array/random/randn/Float32 | 88551 ns | 92041 ns | 0.96 |
| array/random/randn!/Float32 | 115981 ns | 84951 ns | 1.37 |
| array/reductions/mapreduce/Float32/1d | 133162 ns | 134042 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1 | 94921 ns | 95651 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1L | 775700 ns | 778621 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2 | 97151 ns | 97701 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=2L | 300664 ns | 299124 ns | 1.01 |
| array/reductions/mapreduce/Int64/1d | 133461 ns | 134972 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1 | 95401 ns | 95752 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 784751 ns | 782442 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 96312 ns | 96952 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=2L | 305104 ns | 300554 ns | 1.02 |
| array/reductions/reduce/Float32/1d | 133132 ns | 134001 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1 | 94591 ns | 95701 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1L | 776690 ns | 775212 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 96461 ns | 97311 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2L | 296444 ns | 303524 ns | 0.98 |
| array/reductions/reduce/Int64/1d | 133292 ns | 134762 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1 | 94581 ns | 95241 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1L | 778970 ns | 782341 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 95802 ns | 96752 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2L | 295445 ns | 300534 ns | 0.98 |
| array/reverse/1d | 43991 ns | 44471 ns | 0.99 |
| array/reverse/1dL | 74741 ns | 75792 ns | 0.99 |
| array/reverse/1dL_inplace | 108561 ns | 139412 ns | 0.78 |
| array/reverse/1d_inplace | 133172 ns | 74861 ns | 1.78 |
| array/reverse/2d | 51180 ns | 52100 ns | 0.98 |
| array/reverse/2dL | 101101 ns | 102302 ns | 0.99 |
| array/reverse/2dL_inplace | 129242 ns | 179532 ns | 0.72 |
| array/reverse/2d_inplace | 135592 ns | 115342 ns | 1.18 |
| array/sorting/1d | 344244 ns | 343644 ns | 1.00 |
| integration/byval/reference | 38881 ns | 39161 ns | 0.99 |
| integration/byval/slices=1 | 39691 ns | 40220 ns | 0.99 |
| integration/byval/slices=2 | 148002 ns | 143882 ns | 1.03 |
| integration/byval/slices=3 | 237913 ns | 237113 ns | 1.00 |
| integration/volumerhs | 5027049 ns | 5034652 ns | 1.00 |
| kernel/indexing | 44150 ns | 107302 ns | 0.41 |
| kernel/indexing_checked | 131332 ns | 46721 ns | 2.81 |
| kernel/launch | 1290 ns | 1290 ns | 1 |
| kernel/rand | 169732 ns | 123912 ns | 1.37 |
| latency/import | 1476602726 ns | 1470667415 ns | 1.00 |
| latency/precompile | 11915889321 ns | 11877284792 ns | 1.00 |
| latency/ttfp | 10278643291 ns | 10304855031 ns | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@luraess luraess marked this pull request as ready for review May 27, 2026 10:01
@luraess luraess merged commit 250313f into main May 27, 2026
4 of 5 checks passed
@luraess luraess deleted the lr/compiler branch May 27, 2026 18:50