Use GPUCompiler 1.13.3 #700

Merged: vchuravy merged 2 commits into main from vc/upgrade_gpuc on May 29, 2026

@vchuravy
Member

@github-actions
Contributor

github-actions Bot commented May 28, 2026

Benchmark Results

| Benchmark | main | f95f72d... | main / f95f72d... |
|---|---|---|---|
| saxpy/default/Float32/1024 | 0.0445 ± 0.028 ms | 0.0437 ± 0.028 ms | 1.02 ± 0.92 |
| saxpy/default/Float32/1048576 | 0.186 ± 0.018 ms | 0.184 ± 0.015 ms | 1.01 ± 0.13 |
| saxpy/default/Float32/16384 | 0.0498 ± 0.021 ms | 0.05 ± 0.021 ms | 0.996 ± 0.59 |
| saxpy/default/Float32/2048 | 0.0449 ± 0.026 ms | 0.0454 ± 0.026 ms | 0.989 ± 0.82 |
| saxpy/default/Float32/256 | 0.0375 ± 0.028 ms | 0.0466 ± 0.028 ms | 0.805 ± 0.77 |
| saxpy/default/Float32/262144 | 0.0843 ± 0.023 ms | 0.0844 ± 0.023 ms | 0.999 ± 0.38 |
| saxpy/default/Float32/32768 | 0.048 ± 0.022 ms | 0.048 ± 0.022 ms | 1 ± 0.65 |
| saxpy/default/Float32/4096 | 0.0477 ± 0.024 ms | 0.0468 ± 0.025 ms | 1.02 ± 0.76 |
| saxpy/default/Float32/512 | 0.0445 ± 0.028 ms | 0.044 ± 0.028 ms | 1.01 ± 0.91 |
| saxpy/default/Float32/64 | 0.0381 ± 0.028 ms | 0.0459 ± 0.028 ms | 0.83 ± 0.79 |
| saxpy/default/Float32/65536 | 0.0482 ± 0.023 ms | 0.0504 ± 0.023 ms | 0.956 ± 0.63 |
| saxpy/default/Float64/1024 | 0.0427 ± 0.026 ms | 0.0445 ± 0.026 ms | 0.96 ± 0.82 |
| saxpy/default/Float64/1048576 | 0.266 ± 0.017 ms | 0.257 ± 0.018 ms | 1.04 ± 0.098 |
| saxpy/default/Float64/16384 | 0.0478 ± 0.021 ms | 0.0483 ± 0.021 ms | 0.988 ± 0.62 |
| saxpy/default/Float64/2048 | 0.0489 ± 0.024 ms | 0.0485 ± 0.026 ms | 1.01 ± 0.74 |
| saxpy/default/Float64/256 | 0.0383 ± 0.028 ms | 0.0447 ± 0.028 ms | 0.857 ± 0.82 |
| saxpy/default/Float64/262144 | 0.103 ± 0.019 ms | 0.109 ± 0.019 ms | 0.943 ± 0.24 |
| saxpy/default/Float64/32768 | 0.0518 ± 0.022 ms | 0.0523 ± 0.022 ms | 0.99 ± 0.59 |
| saxpy/default/Float64/4096 | 0.0369 ± 0.025 ms | 0.0442 ± 0.027 ms | 0.836 ± 0.77 |
| saxpy/default/Float64/512 | 0.0437 ± 0.028 ms | 0.0466 ± 0.028 ms | 0.938 ± 0.82 |
| saxpy/default/Float64/64 | 0.0374 ± 0.028 ms | 0.0448 ± 0.028 ms | 0.834 ± 0.81 |
| saxpy/default/Float64/65536 | 0.0544 ± 0.022 ms | 0.058 ± 0.022 ms | 0.938 ± 0.52 |
| saxpy/static workgroup=(1024,)/Float32/1024 | 0.0432 ± 0.028 ms | 0.046 ± 0.028 ms | 0.939 ± 0.84 |
| saxpy/static workgroup=(1024,)/Float32/1048576 | 0.184 ± 0.018 ms | 0.185 ± 0.015 ms | 0.997 ± 0.12 |
| saxpy/static workgroup=(1024,)/Float32/16384 | 0.0496 ± 0.021 ms | 0.0506 ± 0.021 ms | 0.98 ± 0.58 |
| saxpy/static workgroup=(1024,)/Float32/2048 | 0.0449 ± 0.026 ms | 0.0445 ± 0.027 ms | 1.01 ± 0.85 |
| saxpy/static workgroup=(1024,)/Float32/256 | 0.0435 ± 0.028 ms | 0.0449 ± 0.028 ms | 0.97 ± 0.88 |
| saxpy/static workgroup=(1024,)/Float32/262144 | 0.083 ± 0.024 ms | 0.083 ± 0.022 ms | 0.999 ± 0.39 |
| saxpy/static workgroup=(1024,)/Float32/32768 | 0.0464 ± 0.022 ms | 0.0473 ± 0.021 ms | 0.98 ± 0.64 |
| saxpy/static workgroup=(1024,)/Float32/4096 | 0.0474 ± 0.023 ms | 0.0488 ± 0.024 ms | 0.97 ± 0.67 |
| saxpy/static workgroup=(1024,)/Float32/512 | 0.044 ± 0.028 ms | 0.0453 ± 0.028 ms | 0.971 ± 0.86 |
| saxpy/static workgroup=(1024,)/Float32/64 | 0.042 ± 0.029 ms | 0.0453 ± 0.028 ms | 0.927 ± 0.86 |
| saxpy/static workgroup=(1024,)/Float32/65536 | 0.0472 ± 0.023 ms | 0.0491 ± 0.022 ms | 0.96 ± 0.64 |
| saxpy/static workgroup=(1024,)/Float64/1024 | 0.0426 ± 0.027 ms | 0.045 ± 0.026 ms | 0.947 ± 0.81 |
| saxpy/static workgroup=(1024,)/Float64/1048576 | 0.267 ± 0.017 ms | 0.255 ± 0.016 ms | 1.05 ± 0.094 |
| saxpy/static workgroup=(1024,)/Float64/16384 | 0.0467 ± 0.022 ms | 0.0473 ± 0.021 ms | 0.987 ± 0.64 |
| saxpy/static workgroup=(1024,)/Float64/2048 | 0.0493 ± 0.024 ms | 0.0473 ± 0.026 ms | 1.04 ± 0.76 |
| saxpy/static workgroup=(1024,)/Float64/256 | 0.0506 ± 0.027 ms | 0.0433 ± 0.028 ms | 1.17 ± 0.98 |
| saxpy/static workgroup=(1024,)/Float64/262144 | 0.105 ± 0.02 ms | 0.108 ± 0.018 ms | 0.977 ± 0.25 |
| saxpy/static workgroup=(1024,)/Float64/32768 | 0.0493 ± 0.023 ms | 0.0519 ± 0.021 ms | 0.949 ± 0.59 |
| saxpy/static workgroup=(1024,)/Float64/4096 | 0.0432 ± 0.024 ms | 0.0458 ± 0.026 ms | 0.944 ± 0.75 |
| saxpy/static workgroup=(1024,)/Float64/512 | 0.0459 ± 0.028 ms | 0.0467 ± 0.027 ms | 0.984 ± 0.82 |
| saxpy/static workgroup=(1024,)/Float64/64 | 0.0448 ± 0.029 ms | 0.0443 ± 0.028 ms | 1.01 ± 0.92 |
| saxpy/static workgroup=(1024,)/Float64/65536 | 0.0523 ± 0.023 ms | 0.0566 ± 0.022 ms | 0.924 ± 0.54 |
| time_to_load | 0.831 ± 0.0038 s | 0.825 ± 0.001 s | 1.01 ± 0.0047 |
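
For reading the last column of the results above: the quoted ratio uncertainties look consistent with first-order propagation of independent errors, sigma_r = |r| * sqrt((sigma_a/a)^2 + (sigma_b/b)^2). That the benchmark bot uses exactly this formula is an assumption, not something the bot states; the helper name `ratio_with_uncertainty` is hypothetical. A minimal sketch:

```python
import math


def ratio_with_uncertainty(a, sigma_a, b, sigma_b):
    """Return (a / b, propagated uncertainty).

    Assumes the two measurements are independent and that the
    first-order (relative errors added in quadrature) rule applies.
    """
    r = a / b
    sigma_r = abs(r) * math.sqrt((sigma_a / a) ** 2 + (sigma_b / b) ** 2)
    return r, sigma_r


# saxpy/default/Float32/1048576: main = 0.186 ± 0.018 ms, f95f72d = 0.184 ± 0.015 ms
r, s = ratio_with_uncertainty(0.186, 0.018, 0.184, 0.015)
print(f"{r:.3g} ± {s:.2g}")
```

Under this reading, the small-size rows (relative spread above 50% on each side) carry ratio uncertainties close to 1, so apparent changes such as 0.805 or 1.17 are well within noise, while the 1048576-element rows and `time_to_load` are the ones tight enough to compare.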

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

@vchuravy
Member Author

@maleadt didn't have time to dig in, but that failure looks new.

@maleadt
Member

maleadt commented May 28, 2026

Note that this may be a legitimate bug in KA.jl or so, because without the trap-to-ret change, PoCL was aggressive enough to optimize out the print statement too (making it fully invisible in many cases). For example, in OpenCL.jl I had to fix JuliaGPU/OpenCL.jl#436.

@maleadt
Member

maleadt commented May 28, 2026

Oh my god this is a PoCL miscompilation because of a lane potentially exiting before it hits a barrier. I do have a workaround though... I'm also having Claude investigate a fix in PoCL itself.
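
As a rough illustration of why a lane leaving before a barrier is hazardous, here is a CPU-thread analogy using Python's `threading.Barrier`. It is only an analogy: it does not model how PoCL lowers barriers, the miscompilation itself, or the workaround in f95f72d, and the names `run_workgroup` and `exits_early` are hypothetical.

```python
import threading


def run_workgroup(n_lanes, exits_early, timeout=0.2):
    """Run n_lanes threads that meet at one barrier.

    Lanes listed in exits_early return before reaching the barrier,
    mimicking a lane that bails out (e.g. on a failed bounds check).
    """
    barrier = threading.Barrier(n_lanes)
    outcome = {}

    def lane(i):
        if i in exits_early:
            outcome[i] = "exited"      # never arrives at the barrier
            return
        try:
            barrier.wait(timeout=timeout)
            outcome[i] = "passed"
        except threading.BrokenBarrierError:
            # Not all lanes arrived: the barrier timed out and broke.
            outcome[i] = "stuck"

    threads = [threading.Thread(target=lane, args=(i,)) for i in range(n_lanes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome


print(run_workgroup(4, exits_early=set()))
print(run_workgroup(4, exits_early={3}))
```

When every lane reaches the barrier, all of them pass; when one lane exits first, the remaining lanes wait for an arrival that never happens. A GPU-style compiler has to account for such exited lanes explicitly, which is the kind of case the comment above describes going wrong.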

maleadt force-pushed the vc/upgrade_gpuc branch from 9313501 to f95f72d on May 28, 2026 21:26
@maleadt
Member

maleadt commented May 29, 2026

So the workaround is benign; @vchuravy please review f95f72d. It essentially avoids some pesky bounds checks when running with --check-bounds=true; normal code is unaffected.

I'll fix this in PoCL next.

@vchuravy
Member Author

Neat! Thank you.

@vchuravy vchuravy merged commit 304b743 into main May 29, 2026
35 of 38 checks passed
@vchuravy vchuravy deleted the vc/upgrade_gpuc branch May 29, 2026 09:27