Member

@kshyatt kshyatt commented Sep 9, 2025

No description provided.

Contributor

@github-actions github-actions bot left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: d5523a2 | Previous: dfaabdb | Ratio |
|---|---|---|---|
| latency/precompile | 43740998178.5 ns | 43590558495.5 ns | 1.00 |
| latency/ttfp | 7294945333 ns | 7344099780 ns | 0.99 |
| latency/import | 3845870338 ns | 3861207415.5 ns | 1.00 |
| integration/volumerhs | 9610799 ns | 9615837.5 ns | 1.00 |
| integration/byval/slices=1 | 146930 ns | 147037 ns | 1.00 |
| integration/byval/slices=3 | 426056 ns | 425997.5 ns | 1.00 |
| integration/byval/reference | 145085 ns | 145256 ns | 1.00 |
| integration/byval/slices=2 | 286755 ns | 286737 ns | 1.00 |
| integration/cudadevrt | 103489 ns | 103728 ns | 1.00 |
| kernel/indexing | 14238 ns | 14435 ns | 0.99 |
| kernel/indexing_checked | 14869 ns | 14949 ns | 0.99 |
| kernel/occupancy | 668.5723270440252 ns | 666.73125 ns | 1.00 |
| kernel/launch | 2174 ns | 2251.222222222222 ns | 0.97 |
| kernel/rand | 15043 ns | 14910 ns | 1.01 |
| array/reverse/1d | 19834 ns | 20055 ns | 0.99 |
| array/reverse/2d | 25085.5 ns | 25280.5 ns | 0.99 |
| array/reverse/1d_inplace | 10585 ns | 10892 ns | 0.97 |
| array/reverse/2d_inplace | 12245 ns | 12408 ns | 0.99 |
| array/copy | 20749.5 ns | 21086 ns | 0.98 |
| array/iteration/findall/int | 157276 ns | 159097.5 ns | 0.99 |
| array/iteration/findall/bool | 139656.5 ns | 140713 ns | 0.99 |
| array/iteration/findfirst/int | 2160964 ns | 2162799.5 ns | 1.00 |
| array/iteration/findfirst/bool | 2140311.5 ns | 2141641 ns | 1.00 |
| array/iteration/scalar | 72248 ns | 74637 ns | 0.97 |
| array/iteration/logical | 234026 ns | 238771 ns | 0.98 |
| array/iteration/findmin/1d | 257743 ns | 259592 ns | 0.99 |
| array/iteration/findmin/2d | 96270.5 ns | 97114 ns | 0.99 |
| array/reductions/reduce/Int64/1d | 146998.5 ns | 148270 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1 | 43621 ns | 44696 ns | 0.98 |
| array/reductions/reduce/Int64/dims=2 | 61235.5 ns | 61857 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1L | 88705 ns | 89351 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2L | 666609 ns | 658275 ns | 1.01 |
| array/reductions/reduce/Float32/1d | 103959 ns | 104593.5 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1 | 40918 ns | 41280 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2 | 59576 ns | 59725 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 52384 ns | 52485 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 547655 ns | 544164 ns | 1.01 |
| array/reductions/mapreduce/Int64/1d | 148530.5 ns | 149156 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1 | 44006 ns | 44370 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=2 | 61378 ns | 62012 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1L | 88739 ns | 89327 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=2L | 685775 ns | 682325 ns | 1.01 |
| array/reductions/mapreduce/Float32/1d | 104777.5 ns | 105007 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1 | 40821.5 ns | 41225 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=2 | 59278 ns | 59806 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1L | 52875 ns | 52777 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 546561 ns | 546720 ns | 1.00 |
| array/broadcast | 20266 ns | 20550 ns | 0.99 |
| array/copyto!/gpu_to_gpu | 12701 ns | 13092 ns | 0.97 |
| array/copyto!/cpu_to_gpu | 215887 ns | 217526 ns | 0.99 |
| array/copyto!/gpu_to_cpu | 286532 ns | 285472 ns | 1.00 |
| array/accumulate/Int64/1d | 124860 ns | 125445.5 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 83612 ns | 84607 ns | 0.99 |
| array/accumulate/Int64/dims=2 | 158367 ns | 159195 ns | 0.99 |
| array/accumulate/Int64/dims=1L | 1720479.5 ns | 1711030.5 ns | 1.01 |
| array/accumulate/Int64/dims=2L | 968217 ns | 967534 ns | 1.00 |
| array/accumulate/Float32/1d | 109158 ns | 110309 ns | 0.99 |
| array/accumulate/Float32/dims=1 | 80566 ns | 81245 ns | 0.99 |
| array/accumulate/Float32/dims=2 | 147383.5 ns | 148275 ns | 0.99 |
| array/accumulate/Float32/dims=1L | 1618511 ns | 1620617.5 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 697938 ns | 701776.5 ns | 0.99 |
| array/construct | 1619.5 ns | 1282.2 ns | 1.26 |
| array/random/randn/Float32 | 43358 ns | 48163 ns | 0.90 |
| array/random/randn!/Float32 | 24972 ns | 25283 ns | 0.99 |
| array/random/rand!/Int64 | 27556 ns | 27992 ns | 0.98 |
| array/random/rand!/Float32 | 8714 ns | 8808.666666666666 ns | 0.99 |
| array/random/rand/Int64 | 29823 ns | 30736 ns | 0.97 |
| array/random/rand/Float32 | 13034 ns | 13312 ns | 0.98 |
| array/permutedims/4d | 60306.5 ns | 61276.5 ns | 0.98 |
| array/permutedims/2d | 54244 ns | 54680 ns | 0.99 |
| array/permutedims/3d | 54918 ns | 55949.5 ns | 0.98 |
| array/sorting/1d | 2756031.5 ns | 2759773 ns | 1.00 |
| array/sorting/by | 3342585 ns | 3355912 ns | 1.00 |
| array/sorting/2d | 1079663 ns | 1085708 ns | 0.99 |
| cuda/synchronization/stream/auto | 1067.1 ns | 1042.5 ns | 1.02 |
| cuda/synchronization/stream/nonblocking | 6984.5 ns | 7909 ns | 0.88 |
| cuda/synchronization/stream/blocking | 844.506329113924 ns | 842.8571428571429 ns | 1.00 |
| cuda/synchronization/context/auto | 1190.3 ns | 1164.4 ns | 1.02 |
| cuda/synchronization/context/nonblocking | 7423.5 ns | 7798.6 ns | 0.95 |
| cuda/synchronization/context/blocking | 916.2432432432432 ns | 891.3921568627451 ns | 1.03 |

This comment was automatically generated by workflow using github-action-benchmark.

maleadt previously requested changes Sep 9, 2025
Member

@maleadt maleadt left a comment

Removing the check doesn't seem safe. CUPTI should be versioned correctly nowadays, so we should compare the versions reported by 13.0 and 13.0.1 and update the check accordingly.
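
A minimal sketch of the kind of comparison being suggested, assuming the two CUPTI builds really do report distinct versions (which is exactly what still needs checking). It uses plain `VersionNumber` literals instead of calling `CUPTI.version()`, so it runs without a GPU. `AFFECTED_CUPTI` and `skip_nvtx_markers` are hypothetical names for illustration, not part of CUDA.jl, and the list of affected versions is an assumption:

```julia
# Hypothetical: CUPTI versions for which the NVTX marker tests are skipped.
# Which versions belong here is an assumption; it depends on what each
# toolkit release actually reports.
const AFFECTED_CUPTI = (v"13.0.0",)

# Hypothetical helper: true when the NVTX marker tests should be skipped.
skip_nvtx_markers(ver::VersionNumber) = ver in AFFECTED_CUPTI

# VersionNumber comparison is component-wise, so a patch release can be
# told apart from the base release.
println(v"13.0.0" == v"13.0.1")        # false
println(v"13.0.0" < v"13.0.1")         # true
println(skip_nvtx_markers(v"13.0.0"))  # true
println(skip_nvtx_markers(v"13.0.1"))  # false
```

In the test file the guard could then read something like `if !skip_nvtx_markers(CUPTI.version())`, where `CUPTI.version()` is the call the existing check already uses.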

@kshyatt
Member Author

kshyatt commented Sep 9, 2025

I think I found the source of the problem: CUDA_Runtime_jll needs an update, which I'm working on now.

@maleadt
Member

maleadt commented Sep 9, 2025

JuliaPackaging/Yggdrasil#12039
I'll check what the CUPTI version changes to after merging that.

Contributor

github-actions bot commented Sep 9, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic master) to apply these changes.

Suggested changes:
diff --git a/test/core/profile.jl b/test/core/profile.jl
index ef95d183c..a139b8b04 100644
--- a/test/core/profile.jl
+++ b/test/core/profile.jl
@@ -74,7 +74,7 @@ let
     @test occursin("cuCtxGetCurrent", str)
 end
 
-if CUPTI.version() != v"13.0.0" # NVIDIA/NVTX#125
+            if CUPTI.version() != v"13.0.0" # NVIDIA/NVTX#125
 
 # NVTX markers
 let
@@ -90,7 +90,7 @@ let
     @test occursin("a range", str)
 end
 
-end
+            end
 
 end
 end

@maleadt maleadt dismissed their stale review September 9, 2025 15:47

Implemented.

@maleadt maleadt merged commit d670186 into master Sep 9, 2025
2 of 3 checks passed
@maleadt maleadt deleted the ksh/cupti branch September 9, 2025 18:56