pxl-th (Member) commented Aug 7, 2024

Needed for AMDGPU backend to enable hardware atomics.

github-actions bot (Contributor) commented Aug 7, 2024

Benchmark Results

| Benchmark | main | 6e7c349... | main/6e7c349192bca3... |
|---|---|---|---|
| saxpy/default/Float16/1024 | 2.79 ± 0.19 μs | 2.79 ± 0.19 μs | 0.999 |
| saxpy/default/Float16/1048576 | 2.07 ± 0.012 ms | 2.07 ± 0.016 ms | 0.998 |
| saxpy/default/Float16/16384 | 0.0328 ± 0.00014 ms | 0.0328 ± 0.00013 ms | 1 |
| saxpy/default/Float16/2048 | 5.21 ± 0.02 μs | 5.22 ± 0.02 μs | 0.999 |
| saxpy/default/Float16/256 | 1.7 ± 0.024 μs | 1.71 ± 0.027 μs | 0.994 |
| saxpy/default/Float16/262144 | 0.516 ± 0.0012 ms | 0.516 ± 0.0015 ms | 1 |
| saxpy/default/Float16/32768 | 0.065 ± 0.00017 ms | 0.065 ± 0.00018 ms | 1 |
| saxpy/default/Float16/4096 | 10.1 ± 0.061 μs | 10.1 ± 0.06 μs | 1 |
| saxpy/default/Float16/512 | 2.08 ± 0.036 μs | 2.08 ± 0.046 μs | 1 |
| saxpy/default/Float16/64 | 1.42 ± 0.009 μs | 1.42 ± 0.01 μs | 1.01 |
| saxpy/default/Float16/65536 | 0.129 ± 0.00031 ms | 0.129 ± 0.00035 ms | 1 |
| saxpy/default/Float32/1024 | 1.04 ± 0.17 μs | 1.03 ± 0.16 μs | 1.01 |
| saxpy/default/Float32/1048576 | 0.885 ± 0.013 ms | 0.886 ± 0.014 ms | 0.999 |
| saxpy/default/Float32/16384 | 14.4 ± 0.12 μs | 14.4 ± 0.12 μs | 1 |
| saxpy/default/Float32/2048 | 1.73 ± 0.18 μs | 1.73 ± 0.18 μs | 1 |
| saxpy/default/Float32/256 | 1.24 ± 0.014 μs | 1.22 ± 0.012 μs | 1.01 |
| saxpy/default/Float32/262144 | 0.221 ± 0.00059 ms | 0.221 ± 0.0006 ms | 1 |
| saxpy/default/Float32/32768 | 28.3 ± 0.12 μs | 28.3 ± 0.14 μs | 0.999 |
| saxpy/default/Float32/4096 | 3.04 ± 0.02 μs | 3.03 ± 0.023 μs | 1 |
| saxpy/default/Float32/512 | 1.28 ± 0.013 μs | 1.26 ± 0.012 μs | 1.02 |
| saxpy/default/Float32/64 | 1.21 ± 0.012 μs | 1.2 ± 0.012 μs | 1.01 |
| saxpy/default/Float32/65536 | 0.056 ± 0.00026 ms | 0.0563 ± 0.00039 ms | 0.995 |
| saxpy/default/Float64/1024 | 1.07 ± 0.17 μs | 1.07 ± 0.16 μs | 0.998 |
| saxpy/default/Float64/1048576 | 1.01 ± 0.028 ms | 1.01 ± 0.031 ms | 0.991 |
| saxpy/default/Float64/16384 | 16.3 ± 0.6 μs | 16.5 ± 0.41 μs | 0.988 |
| saxpy/default/Float64/2048 | 1.8 ± 0.16 μs | 1.79 ± 0.16 μs | 1 |
| saxpy/default/Float64/256 | 1.34 ± 0.011 μs | 1.35 ± 0.015 μs | 0.992 |
| saxpy/default/Float64/262144 | 0.243 ± 0.0043 ms | 0.245 ± 0.0074 ms | 0.994 |
| saxpy/default/Float64/32768 | 0.0317 ± 0.00095 ms | 0.0325 ± 0.0011 ms | 0.976 |
| saxpy/default/Float64/4096 | 3.06 ± 0.048 μs | 3.07 ± 0.054 μs | 0.997 |
| saxpy/default/Float64/512 | 1.36 ± 0.015 μs | 1.37 ± 0.017 μs | 0.996 |
| saxpy/default/Float64/64 | 1.34 ± 0.02 μs | 1.34 ± 0.026 μs | 0.997 |
| saxpy/default/Float64/65536 | 0.063 ± 0.0013 ms | 0.0621 ± 0.0019 ms | 1.02 |
| saxpy/static workgroup=(1024,)/Float16/1024 | 2.06 ± 0.2 μs | 2.07 ± 0.21 μs | 0.992 |
| saxpy/static workgroup=(1024,)/Float16/1048576 | 0.163 ± 0.0072 ms | 0.197 ± 0.021 ms | 0.828 |
| saxpy/static workgroup=(1024,)/Float16/16384 | 4.28 ± 0.21 μs | 4.25 ± 0.21 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float16/2048 | 2.1 ± 0.21 μs | 2.12 ± 0.21 μs | 0.987 |
| saxpy/static workgroup=(1024,)/Float16/256 | 2.61 ± 0.045 μs | 2.63 ± 0.037 μs | 0.992 |
| saxpy/static workgroup=(1024,)/Float16/262144 | 0.044 ± 0.0024 ms | 0.0484 ± 0.0046 ms | 0.909 |
| saxpy/static workgroup=(1024,)/Float16/32768 | 6.8 ± 0.26 μs | 6.85 ± 0.25 μs | 0.993 |
| saxpy/static workgroup=(1024,)/Float16/4096 | 2.38 ± 0.03 μs | 2.41 ± 0.033 μs | 0.988 |
| saxpy/static workgroup=(1024,)/Float16/512 | 3.13 ± 0.058 μs | 3.14 ± 0.069 μs | 0.996 |
| saxpy/static workgroup=(1024,)/Float16/64 | 2.24 ± 0.023 μs | 2.25 ± 0.022 μs | 0.997 |
| saxpy/static workgroup=(1024,)/Float16/65536 | 12.9 ± 0.53 μs | 13.2 ± 1.2 μs | 0.976 |
| saxpy/static workgroup=(1024,)/Float32/1024 | 1.93 ± 0.023 μs | 1.93 ± 0.023 μs | 0.998 |
| saxpy/static workgroup=(1024,)/Float32/1048576 | 0.247 ± 0.0056 ms | 0.275 ± 0.032 ms | 0.9 |
| saxpy/static workgroup=(1024,)/Float32/16384 | 4.8 ± 0.28 μs | 4.85 ± 0.23 μs | 0.988 |
| saxpy/static workgroup=(1024,)/Float32/2048 | 2.26 ± 0.22 μs | 2.25 ± 0.22 μs | 1 |
| saxpy/static workgroup=(1024,)/Float32/256 | 2.84 ± 1.9 μs | 2.76 ± 1.6 μs | 1.03 |
| saxpy/static workgroup=(1024,)/Float32/262144 | 0.0632 ± 0.0056 ms | 0.0628 ± 0.0098 ms | 1.01 |
| saxpy/static workgroup=(1024,)/Float32/32768 | 8.26 ± 0.97 μs | 8.26 ± 1 μs | 0.999 |
| saxpy/static workgroup=(1024,)/Float32/4096 | 2.42 ± 0.19 μs | 2.4 ± 0.19 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float32/512 | 2.43 ± 0.22 μs | 2.45 ± 0.21 μs | 0.991 |
| saxpy/static workgroup=(1024,)/Float32/64 | 2.41 ± 0.047 μs | 2.42 ± 0.055 μs | 0.996 |
| saxpy/static workgroup=(1024,)/Float32/65536 | 17.7 ± 2.8 μs | 18.2 ± 3.4 μs | 0.976 |
| saxpy/static workgroup=(1024,)/Float64/1024 | 2.05 ± 0.055 μs | 2.04 ± 0.063 μs | 1 |
| saxpy/static workgroup=(1024,)/Float64/1048576 | 0.461 ± 0.061 ms | 0.542 ± 0.071 ms | 0.85 |
| saxpy/static workgroup=(1024,)/Float64/16384 | 8.32 ± 0.83 μs | 8.5 ± 0.85 μs | 0.979 |
| saxpy/static workgroup=(1024,)/Float64/2048 | 2.51 ± 0.3 μs | 2.51 ± 0.24 μs | 0.999 |
| saxpy/static workgroup=(1024,)/Float64/256 | 2.39 ± 0.042 μs | 2.41 ± 0.052 μs | 0.993 |
| saxpy/static workgroup=(1024,)/Float64/262144 | 0.117 ± 0.015 ms | 0.117 ± 0.021 ms | 0.993 |
| saxpy/static workgroup=(1024,)/Float64/32768 | 17.3 ± 2.7 μs | 16.7 ± 2.7 μs | 1.04 |
| saxpy/static workgroup=(1024,)/Float64/4096 | 3.02 ± 0.33 μs | 3.02 ± 0.31 μs | 1 |
| saxpy/static workgroup=(1024,)/Float64/512 | 2.38 ± 0.038 μs | 2.4 ± 0.042 μs | 0.992 |
| saxpy/static workgroup=(1024,)/Float64/64 | 2.37 ± 0.073 μs | 2.39 ± 0.081 μs | 0.992 |
| saxpy/static workgroup=(1024,)/Float64/65536 | 0.0321 ± 0.0044 ms | 31.5 ± 4.5 μs | 1.02 |
| time_to_load | 0.454 ± 0.0021 s | 0.466 ± 0.0024 s | 0.975 |
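
Going by its header, the last column of the results above appears to be the `main` time divided by the time for commit 6e7c349..., so values below 1 would mean the PR commit measured slower on that row. The following is a minimal Python sketch of that arithmetic, not the bot's actual implementation. The helper names `to_seconds` and `ratio` are hypothetical, and the inputs are the rounded values shown in the results, so the recomputed ratios only match the reported ones to within rounding.

```python
# Conversion factors from the units used in the results to seconds.
UNIT_TO_SECONDS = {"s": 1.0, "ms": 1e-3, "μs": 1e-6, "ns": 1e-9}


def to_seconds(value, unit):
    """Convert a reported time such as (0.0321, "ms") to seconds."""
    return value * UNIT_TO_SECONDS[unit]


def ratio(main, pr):
    """Assumed definition of the last column: main time / PR time."""
    return to_seconds(*main) / to_seconds(*pr)


# A few rows copied from the results: (main, PR commit), as (value, unit).
rows = {
    "saxpy/static workgroup=(1024,)/Float16/1048576": ((0.163, "ms"), (0.197, "ms")),
    "saxpy/static workgroup=(1024,)/Float64/65536": ((0.0321, "ms"), (31.5, "μs")),
    "time_to_load": ((0.454, "s"), (0.466, "s")),
}

for name, (main, pr) in rows.items():
    print(f"{name}: {ratio(main, pr):.3f}")
```

The second row mixes ms and μs, which is why the times are converted to a common unit before dividing. Note that the rows with the lowest ratios also carry comparatively wide error bars in the PR column.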

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

pxl-th requested a review from vchuravy August 7, 2024 10:11
vchuravy merged commit ef5fd9e into main Aug 7, 2024
vchuravy deleted the pxl-th/deps branch August 7, 2024 12:22