Conversation
@maleadt Can I rebase?

Of course! I didn't have time to investigate the failure though.

Force-pushed from 9f70751 to 45c622b.

They were fixed as part of #740. Now the failures should be related.
Metal Benchmarks
| Benchmark suite | Current: c2099f9 | Previous: 1d2f000 | Ratio |
|---|---|---|---|
| latency/precompile | 30595055750 ns | 25549419083 ns | 1.20 |
| latency/ttfp | 1719044667 ns | 2346831687.5 ns | 0.73 |
| latency/import | 1444265000 ns | 1427666042 ns | 1.01 |
| integration/metaldevrt | 881709 ns | 877750 ns | 1.00 |
| integration/byval/slices=1 | 1625167 ns | 1568625 ns | 1.04 |
| integration/byval/slices=3 | 20737458 ns | 8402792 ns | 2.47 |
| integration/byval/reference | 1621459 ns | 1559958 ns | 1.04 |
| integration/byval/slices=2 | 2743708.5 ns | 2629875 ns | 1.04 |
| kernel/indexing | 514083 ns | 627417 ns | 0.82 |
| kernel/indexing_checked | 500375 ns | 608750 ns | 0.82 |
| kernel/launch | 14417 ns | 12667 ns | 1.14 |
| kernel/rand | 536062.5 ns | 576167 ns | 0.93 |
| array/construct | 7000 ns | 6500 ns | 1.08 |
| array/broadcast | 527917 ns | 606708 ns | 0.87 |
| array/random/randn/Float32 | 1059583.5 ns | 1011104 ns | 1.05 |
| array/random/randn!/Float32 | 737166 ns | 753875 ns | 0.98 |
| array/random/rand!/Int64 | 539417 ns | 548708 ns | 0.98 |
| array/random/rand!/Float32 | 542958 ns | 586208.5 ns | 0.93 |
| array/random/rand/Int64 | 935917 ns | 789709 ns | 1.19 |
| array/random/rand/Float32 | 797541.5 ns | 645000 ns | 1.24 |
| array/accumulate/Int64/1d | 1290917 ns | 1260667 ns | 1.02 |
| array/accumulate/Int64/dims=1 | 1914958 ns | 1859104.5 ns | 1.03 |
| array/accumulate/Int64/dims=2 | 2317125 ns | 2179083 ns | 1.06 |
| array/accumulate/Int64/dims=1L | 12186541 ns | 11673271 ns | 1.04 |
| array/accumulate/Int64/dims=2L | 9856875 ns | 9628146 ns | 1.02 |
| array/accumulate/Float32/1d | 1070229.5 ns | 1121395.5 ns | 0.95 |
| array/accumulate/Float32/dims=1 | 1642750 ns | 1571667 ns | 1.05 |
| array/accumulate/Float32/dims=2 | 2074083 ns | 1889459 ns | 1.10 |
| array/accumulate/Float32/dims=1L | 10521791.5 ns | 9834209 ns | 1.07 |
| array/accumulate/Float32/dims=2L | 7366250 ns | 7249666.5 ns | 1.02 |
| array/reductions/reduce/Int64/1d | 1300812.5 ns | 1386875 ns | 0.94 |
| array/reductions/reduce/Int64/dims=1 | 1142750 ns | 1117250 ns | 1.02 |
| array/reductions/reduce/Int64/dims=2 | 1170125 ns | 1152958 ns | 1.01 |
| array/reductions/reduce/Int64/dims=1L | 2037687.5 ns | 2013209 ns | 1.01 |
| array/reductions/reduce/Int64/dims=2L | 4051479.5 ns | 4244083 ns | 0.95 |
| array/reductions/reduce/Float32/1d | 790042 ns | 988750 ns | 0.80 |
| array/reductions/reduce/Float32/dims=1 | 805125 ns | 843520.5 ns | 0.95 |
| array/reductions/reduce/Float32/dims=2 | 857833 ns | 857917 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 1361958.5 ns | 1326625 ns | 1.03 |
| array/reductions/reduce/Float32/dims=2L | 1839750 ns | 1810667 ns | 1.02 |
| array/reductions/mapreduce/Int64/1d | 1327916.5 ns | 1356437.5 ns | 0.98 |
| array/reductions/mapreduce/Int64/dims=1 | 1139916 ns | 1102166.5 ns | 1.03 |
| array/reductions/mapreduce/Int64/dims=2 | 1191916 ns | 1149750 ns | 1.04 |
| array/reductions/mapreduce/Int64/dims=1L | 1992813 ns | 1988375 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 3668520.5 ns | 3626916 ns | 1.01 |
| array/reductions/mapreduce/Float32/1d | 769083 ns | 1055917 ns | 0.73 |
| array/reductions/mapreduce/Float32/dims=1 | 828458.5 ns | 847396 ns | 0.98 |
| array/reductions/mapreduce/Float32/dims=2 | 859729 ns | 860979.5 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 1377875 ns | 1333042 ns | 1.03 |
| array/reductions/mapreduce/Float32/dims=2L | 1863479.5 ns | 1898125 ns | 0.98 |
| array/private/copyto!/gpu_to_gpu | 575041.5 ns | 633020.5 ns | 0.91 |
| array/private/copyto!/cpu_to_gpu | 716104.5 ns | 804354.5 ns | 0.89 |
| array/private/copyto!/gpu_to_cpu | 733000 ns | 816000 ns | 0.90 |
| array/private/iteration/findall/int | 1618125 ns | 1581312.5 ns | 1.02 |
| array/private/iteration/findall/bool | 1467937.5 ns | 1404916.5 ns | 1.04 |
| array/private/iteration/findfirst/int | 2130875 ns | 2075167 ns | 1.03 |
| array/private/iteration/findfirst/bool | 2091646 ns | 2048750 ns | 1.02 |
| array/private/iteration/scalar | 3148250 ns | 4526479 ns | 0.70 |
| array/private/iteration/logical | 2714583 ns | 2693625 ns | 1.01 |
| array/private/iteration/findmin/1d | 2641812.5 ns | 2518041 ns | 1.05 |
| array/private/iteration/findmin/2d | 1864125 ns | 1820229.5 ns | 1.02 |
| array/private/copy | 857958.5 ns | 568854 ns | 1.51 |
| array/shared/copyto!/gpu_to_gpu | 85645.5 ns | 84291 ns | 1.02 |
| array/shared/copyto!/cpu_to_gpu | 84333 ns | 82875 ns | 1.02 |
| array/shared/copyto!/gpu_to_cpu | 83833 ns | 83000 ns | 1.01 |
| array/shared/iteration/findall/int | 1615958 ns | 1585854.5 ns | 1.02 |
| array/shared/iteration/findall/bool | 1502875 ns | 1421875 ns | 1.06 |
| array/shared/iteration/findfirst/int | 1734583 ns | 1654709 ns | 1.05 |
| array/shared/iteration/findfirst/bool | 1691375 ns | 1643542 ns | 1.03 |
| array/shared/iteration/scalar | 208917 ns | 210375 ns | 0.99 |
| array/shared/iteration/logical | 2267250 ns | 2297959 ns | 0.99 |
| array/shared/iteration/findmin/1d | 2258875 ns | 2134229 ns | 1.06 |
| array/shared/iteration/findmin/2d | 1880958 ns | 1806042 ns | 1.04 |
| array/shared/copy | 222000 ns | 241812 ns | 0.92 |
| array/permutedims/4d | 2538479 ns | 2395583 ns | 1.06 |
| array/permutedims/2d | 1240667 ns | 1158833 ns | 1.07 |
| array/permutedims/3d | 1854333 ns | 1686541 ns | 1.10 |
| metal/synchronization/stream | 19875 ns | 19583 ns | 1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
I know this is still WIP, but hopefully this saves you some troubleshooting time.
device_synchronize seems to be broken due to the task-local storage call.
Edit: From a fresh session:

```julia
julia> using Metal; length(Metal.global_queues)
1  # should be 0
```

Edit 2: Adding `empty!(Metal.global_queues)` to `__init__()` seems to prevent the segfault, but that cannot be the proper solution, right?
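For context on the workaround above, here is a minimal sketch of the general pattern of resetting precompile-time global state in `__init__()`. All names besides `__init__` are illustrative, and the structure of `global_queues` is assumed to be a Dict-like cache; this is not Metal.jl's actual implementation.

```julia
# Sketch: a module-level cache that can end up populated at
# precompile time (and serialized into the package image), which
# then holds stale handles in a fresh session.
module QueueCacheExample

const global_queues = Dict{Int,Vector{Float64}}()  # hypothetical cache

# Lazily create a per-device queue (illustrative).
queue_for(dev::Int) = get!(() -> Float64[], global_queues, dev)

function __init__()
    # Handles cached while precompiling refer to resources that no
    # longer exist in this process, so drop them before first use.
    empty!(global_queues)
end

end # module
```

The cleaner fix is usually to avoid populating such caches during precompilation at all (so `__init__()` doesn't have to paper over it), which is presumably why the comment questions whether `empty!` is the proper solution.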
```diff
     the function changes, or when different types or keyword arguments are provided.
     """
-    function mtlfunction(f::F, tt::TT=Tuple{}; name=nothing, kwargs...) where {F,TT}
+    function mtlfunction(@nospecialize(f), @nospecialize(tt)=Tuple{}; name=nothing, kwargs...)
```
This despecialization breaks this inference test (Lines 45 to 53 in 1d2f000). But that seems to be on purpose, so maybe remove the test? It only shows up when commenting out the `device_synchronize` test.
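For readers unfamiliar with the trade-off being discussed: `@nospecialize` is a real Base annotation that asks the compiler not to generate a separate method instance per concrete argument type, which reduces compile time at the cost of weaker type information. A standalone sketch (not Metal.jl code):

```julia
# Minimal sketch of @nospecialize: both methods behave identically at
# runtime, but the compiler treats the annotated argument abstractly
# instead of specializing on each concrete type it sees.
specialized(x) = x + one(x)
despecialized(@nospecialize(x)) = x + one(x)

@assert specialized(1) == 2
@assert despecialized(1.0) == 2.0
```

Because the annotated method is compiled against an abstract signature, tests that assert precise inference results for calls through it can start failing, which is consistent with the breakage noted above.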
```julia
    finally
        close(cce)
    end
    @autoreleasepool begin
```
Does this `@autoreleasepool` do anything? The parent function (`@autoreleasepool function (kernel::HostKernel)(args...)`) is already annotated with one.
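For reference, a hedged sketch of how nested pools behave, assuming the usual Objective-C autorelease-pool semantics (pools nest, and an inner pool drains its objects when its own block ends, before the enclosing pool does). The function body here is illustrative, not the kernel-launch code in question:

```julia
using ObjectiveC  # provides @autoreleasepool (macOS)

# Sketch: an inner pool is not automatically redundant. It releases
# temporaries at its own `end`, bounding peak memory inside the outer
# pool's scope. It *is* redundant if the enclosed code creates no
# autoreleased objects of its own.
@autoreleasepool function outer()
    # Temporaries autoreleased here live until outer() returns...
    @autoreleasepool begin
        # ...while temporaries autoreleased here are drained already
        # at this block's end.
    end
end
```

So the review question amounts to: does the wrapped block allocate autoreleased objects that are worth draining earlier than the parent pool would?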
... by adding a proper precompilation workload and removing some overzealous specialization.

Needs to be properly validated to ensure the `@nospecialize` on crucial functions like `mtlfunction` doesn't regress launch overhead.

Before:

After:
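As a hedged sketch of what a "proper precompilation workload" typically looks like in a Julia package: the `@setup_workload`/`@compile_workload` macros below are PrecompileTools.jl's real API, but the workload body is illustrative, not this PR's actual code.

```julia
# Sketch of a PrecompileTools.jl precompilation workload.
using PrecompileTools: @setup_workload, @compile_workload

@setup_workload begin
    # Runs at precompile time; its own compiled code is not cached.
    data = rand(Float32, 16)

    @compile_workload begin
        # Calls here are compiled and cached into the package image,
        # cutting time-to-first-execution for end users.
        sum(data)
        map(x -> 2x, data)
    end
end
```

This matches the latency rows in the benchmark table: `latency/precompile` grows (the workload runs at precompile time) while `latency/ttfp` shrinks, which is the intended trade.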