Rationalize and try to fix failing ldiv tests #2809

Open. Wants to merge 2 commits into base: master

Conversation

kshyatt
Member

@kshyatt kshyatt commented Jul 2, 2025

Trying to fix intermittently failing CI. It doesn't make sense to have these checks for only one of the in-place/out-of-place versions. Hopefully this helps stability.
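
As a rough illustration of applying the same checks to both variants, here is a CPU-only sketch that compares the out-of-place solve (`\`) with the in-place solve (`ldiv!`). It is not the PR's actual test: it uses `LinearAlgebra` and `SparseArrays` rather than CUSPARSE, and the matrix sizes and variable names are assumptions chosen for the example.

```julia
using LinearAlgebra, SparseArrays, Test

# Hypothetical CPU-only stand-in for the GPU tests: run the same checks on the
# out-of-place solve (`\`) and the in-place solve (`ldiv!`).
n, nB = 10, 2
A = sparse(rand(n, n) + n * I)   # large diagonal keeps the triangular solves well conditioned
B = rand(n, nB)

@testset "ldiv: $triangle" for triangle in (LowerTriangular, UpperTriangular)
    T = triangle(A)
    C = T \ B                    # out-of-place: allocates the result
    D = copy(B)
    ldiv!(T, D)                  # in-place: overwrites D with the solution
    @test T * C ≈ B              # same residual check for both variants
    @test T * D ≈ B
    @test C ≈ D                  # and the two variants agree with each other
end
```

Keeping the assertions identical for both code paths means a failure in only one of them points at that path rather than at a difference in how the two are tested.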

@kshyatt requested a review from maleadt July 2, 2025 17:17
@kshyatt added the "cuda libraries" (Stuff about CUDA library wrappers.) and "tests" (Adds or changes tests.) labels Jul 2, 2025
Contributor

github-actions bot commented Jul 2, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic master) to apply these changes.

Suggested changes:
diff --git a/test/libraries/cusparse/interfaces.jl b/test/libraries/cusparse/interfaces.jl
index fa25d8330..34f9d75f8 100644
--- a/test/libraries/cusparse/interfaces.jl
+++ b/test/libraries/cusparse/interfaces.jl
@@ -258,7 +258,7 @@ nB = 2
                                 end
                             end
                             @testset "\\ -- CuMatrix" begin
-                                C  = triangle(opa(A)) \ opb(B)
+                                C = triangle(opa(A)) \ opb(B)
                                 dC = triangle(opa(dA)) \ opb(dB)
                                 @test C ≈ collect(dC)
                                 if CUSPARSE.version() < v"12.0"

Contributor

@github-actions github-actions bot left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: 5db6744 | Previous: fb0a528 | Ratio |
|---|---|---|---|
| latency/precompile | 43353735535.5 ns | 43329572726 ns | 1.00 |
| latency/ttfp | 7064329610 ns | 7031277701 ns | 1.00 |
| latency/import | 3452525802 ns | 3465549489 ns | 1.00 |
| integration/volumerhs | 9609113.5 ns | 9597721 ns | 1.00 |
| integration/byval/slices=1 | 146862 ns | 147057 ns | 1.00 |
| integration/byval/slices=3 | 425687 ns | 426256 ns | 1.00 |
| integration/byval/reference | 144957 ns | 145230 ns | 1.00 |
| integration/byval/slices=2 | 286192.5 ns | 286757 ns | 1.00 |
| integration/cudadevrt | 103494 ns | 103670 ns | 1.00 |
| kernel/indexing | 14279 ns | 14524 ns | 0.98 |
| kernel/indexing_checked | 14912 ns | 15000.5 ns | 0.99 |
| kernel/occupancy | 754.2209302325581 ns | 741.6906474820144 ns | 1.02 |
| kernel/launch | 2269.8888888888887 ns | 2251.3333333333335 ns | 1.01 |
| kernel/rand | 18036 ns | 14919 ns | 1.21 |
| array/reverse/1d | 19937 ns | 19819 ns | 1.01 |
| array/reverse/2d | 24174 ns | 25017 ns | 0.97 |
| array/reverse/1d_inplace | 10330 ns | 10579 ns | 0.98 |
| array/reverse/2d_inplace | 11826 ns | 12305 ns | 0.96 |
| array/copy | 21210 ns | 21174 ns | 1.00 |
| array/iteration/findall/int | 159709.5 ns | 158557 ns | 1.01 |
| array/iteration/findall/bool | 140129 ns | 139738 ns | 1.00 |
| array/iteration/findfirst/int | 162678 ns | 157479 ns | 1.03 |
| array/iteration/findfirst/bool | 163518.5 ns | 158520.5 ns | 1.03 |
| array/iteration/scalar | 73054 ns | 73433 ns | 0.99 |
| array/iteration/logical | 215506.5 ns | 217515.5 ns | 0.99 |
| array/iteration/findmin/1d | 45938 ns | 46904 ns | 0.98 |
| array/iteration/findmin/2d | 95981 ns | 96886 ns | 0.99 |
| array/reductions/reduce/Int64/1d | 42305 ns | 43179 ns | 0.98 |
| array/reductions/reduce/Int64/dims=1 | 55110.5 ns | 47755.5 ns | 1.15 |
| array/reductions/reduce/Int64/dims=2 | 61831 ns | 62189 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1L | 89151 ns | 89100 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 86599.5 ns | 87187.5 ns | 0.99 |
| array/reductions/reduce/Float32/1d | 34392 ns | 35302 ns | 0.97 |
| array/reductions/reduce/Float32/dims=1 | 51824 ns | 41632 ns | 1.24 |
| array/reductions/reduce/Float32/dims=2 | 59683 ns | 59745 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 53027 ns | 53265 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 69910.5 ns | 70365 ns | 0.99 |
| array/reductions/mapreduce/Int64/1d | 41808 ns | 42192.5 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1 | 45499.5 ns | 55413.5 ns | 0.82 |
| array/reductions/mapreduce/Int64/dims=2 | 61577 ns | 62472 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1L | 89162 ns | 89096 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 86429 ns | 87031 ns | 0.99 |
| array/reductions/mapreduce/Float32/1d | 34423 ns | 34963.5 ns | 0.98 |
| array/reductions/mapreduce/Float32/dims=1 | 41823 ns | 42410 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=2 | 60232 ns | 60141 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 53392 ns | 53371 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 70498 ns | 70585 ns | 1.00 |
| array/broadcast | 20874 ns | 21207 ns | 0.98 |
| array/copyto!/gpu_to_gpu | 11337 ns | 11289 ns | 1.00 |
| array/copyto!/cpu_to_gpu | 215316 ns | 217320 ns | 0.99 |
| array/copyto!/gpu_to_cpu | 284614 ns | 283093 ns | 1.01 |
| array/accumulate/Int64/1d | 125256 ns | 125220 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 83748 ns | 83742 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 157761 ns | 158355 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1709202.5 ns | 1709296 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 966206 ns | 966418.5 ns | 1.00 |
| array/accumulate/Float32/1d | 109184 ns | 109589 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 80881.5 ns | 80692 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 147791 ns | 148042 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1618440 ns | 1618330 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 698196 ns | 698511 ns | 1.00 |
| array/construct | 1283.9 ns | 1255.2 ns | 1.02 |
| array/random/randn/Float32 | 47374 ns | 43548 ns | 1.09 |
| array/random/randn!/Float32 | 24964 ns | 25021 ns | 1.00 |
| array/random/rand!/Int64 | 27404 ns | 27406 ns | 1.00 |
| array/random/rand!/Float32 | 8755.666666666666 ns | 8860.333333333334 ns | 0.99 |
| array/random/rand/Int64 | 29882 ns | 38250 ns | 0.78 |
| array/random/rand/Float32 | 13077.5 ns | 13162 ns | 0.99 |
| array/permutedims/4d | 61086 ns | 61907 ns | 0.99 |
| array/permutedims/2d | 54866.5 ns | 55440 ns | 0.99 |
| array/permutedims/3d | 56006 ns | 56158 ns | 1.00 |
| array/sorting/1d | 2756544 ns | 2757412.5 ns | 1.00 |
| array/sorting/by | 3343162 ns | 3361155 ns | 0.99 |
| array/sorting/2d | 1084151 ns | 1088259 ns | 1.00 |
| cuda/synchronization/stream/auto | 1052.4 ns | 1030.8 ns | 1.02 |
| cuda/synchronization/stream/nonblocking | 7722.1 ns | 8005.9 ns | 0.96 |
| cuda/synchronization/stream/blocking | 828.0882352941177 ns | 853.1914893617021 ns | 0.97 |
| cuda/synchronization/context/auto | 1215.8 ns | 1178.1 ns | 1.03 |
| cuda/synchronization/context/nonblocking | 7555.7 ns | 7403.6 ns | 1.02 |
| cuda/synchronization/context/blocking | 913.1162790697674 ns | 914.6428571428571 ns | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
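
For reading the Ratio column: the values shown are consistent with Current divided by Previous, rounded to two decimals, so a ratio above 1 would mean the current commit took longer. The sketch below checks that reading against three rows copied from the table above. The `ratio` helper and the `rows` list are hypothetical names introduced for this example, and the formula is an inference from the numbers, not taken from the benchmark tool's documentation.

```julia
# Assumed definition: ratio = current / previous, rounded to two decimals.
ratio(current, previous) = round(current / previous; digits = 2)

# (name, current ns, previous ns) copied from the benchmark table.
rows = [
    ("kernel/rand", 18036, 14919),
    ("array/random/rand/Int64", 29882, 38250),
    ("array/reductions/mapreduce/Int64/dims=1", 45499.5, 55413.5),
]

for (name, current, previous) in rows
    println(rpad(name, 42), ratio(current, previous))
end
```

Under that assumption the three rows give 1.21, 0.78, and 0.82, matching the reported ratios.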

Labels: cuda libraries (Stuff about CUDA library wrappers.), tests (Adds or changes tests.)
2 participants