Adapt to GPUCompiler 0.18 #1799
Uhh, so this speeds up CI (and local testing) by 30-50% 🤯 There are two major new changes that may explain the speed-up:
I can't imagine the latter explaining such a significant performance improvement though.
I partially reverted changes to figure out which one was responsible, and it's JuliaGPU/GPUCompiler.jl#394. So probably we weren't actually caching inference results before?
Zooming in on one of the more compilation-heavy test suites. Before:
After:
So LLVM times also improved by 25%...
The GPUCompiler timings are confusing me more than anything else. Before:
After:
I guess our
Ah, the difference isn't in GPUCompiler-related compilation, but in how GPUCompiler.jl itself is compiled by Julia. Here's the (sorted and processed) compiler trace. The relevant bits (comparing fast against slow):
-precompile(Tuple{Type{Base.Dict{GPUCompiler.CompilerJob{T, P} where P where T, Any}}})
-precompile(Tuple{Type{Base.Dict{GPUCompiler.CompilerJob{T, P} where P where T, String}}, Pair{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, String}})
+precompile(Tuple{Type{Base.Dict{GPUCompiler.CompilerJob{T, P, F} where F where P where T, Any}}})
+precompile(Tuple{Type{Base.Dict{GPUCompiler.CompilerJob{T, P, F} where F where P where T, String}}, Pair{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"...", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Int64, 1, 1}, Int64}}}, String}})
+precompile(Tuple{Type{Base.Dict{GPUCompiler.CompilerJob{T, P, F} where F where P where T, String}}, Pair{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel##...", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Float32, 1, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Main.var"...", Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceArray{Float32, 1, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}}, String}})
+precompile(Tuple{Type{Base.Dict{GPUCompiler.CompilerJob{T, P, F} where F where P where T, String}}, Pair{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel##...", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Float64, 1, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Main.var"...", Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceArray{Float32, 1, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}}, String}})
...
-precompile(Tuple{Type{GPUCompiler.CompilerJob{T, P} where P where T}, GPUCompiler.FunctionSpec, GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}})
-precompile(Tuple{Type{GPUCompiler.FunctionSpec}, Type, Type, UInt64})
-precompile(Tuple{Type{GPUCompiler.KernelError}, GPUCompiler.CompilerJob{T, P} where P where T, String, String})
+precompile(Tuple{Type{GPUCompiler.CompilerJob{T, P, F} where F where P where T}, GPUCompiler.PTXCompilerTarget, GPUCompiler.FunctionSpec{Main.var"...", Tuple{}}, CUDA.CUDACompilerParams, Symbol, Bool})
+precompile(Tuple{Type{GPUCompiler.CompilerJob{T, P, F} where F where P where T}, GPUCompiler.PTXCompilerTarget, GPUCompiler.FunctionSpec{Main.var"..."{Int64}, Tuple{}}, CUDA.CUDACompilerParams, Symbol, Bool})
+precompile(Tuple{Type{GPUCompiler.CompilerJob{T, P, F} where F where P where T}, GPUCompiler.PTXCompilerTarget, GPUCompiler.FunctionSpec{Main.var"#child##...", Tuple{CUDA.CuDeviceArray{Int64, 1, 1}, Tuple{Int16}, Tuple{Int32, Int8, Int16, Int64, Int16, Int16}}}, CUDA.CUDACompilerParams, Symbol, Bool})
+precompile(Tuple{Type{GPUCompiler.CompilerJob{T, P, F} where F where P where T}, GPUCompiler.PTXCompilerTarget, GPUCompiler.FunctionSpec{Main.var"#child##...", Tuple{CUDA.CuDeviceArray{Int64, 1, 1}, Tuple{Int32}, Tuple{Int16}}}, CUDA.CUDACompilerParams, Symbol, Bool})
+precompile(Tuple{Type{GPUCompiler.CompilerJob{T, P, F} where F where P where T}, GPUCompiler.PTXCompilerTarget, GPUCompiler.FunctionSpec{Main.var"#child##...", Tuple{CUDA.CuDeviceArray{Int64, 1, 1}, Tuple{}, NTuple{8, Int64}}}, CUDA.CUDACompilerParams,
...
-precompile(Tuple{typeof(Base.get!), Base.Dict{GPUCompiler.CompilerJob{T, P} where P where T, Array{LLVM.CallInst, 1}}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, Array{LLVM.CallInst, 1}})
+precompile(Tuple{typeof(Base.get!), Base.Dict{GPUCompiler.CompilerJob{T, P, F} where F where P where T, Array{LLVM.CallInst, 1}}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{Main.var"...", Tuple{}}}, Array{LLVM.CallInst, 1}})
+precompile(Tuple{typeof(Base.get!), Base.Dict{GPUCompiler.CompilerJob{T, P, F} where F where P where T, Array{LLVM.CallInst, 1}}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{Main.var"..."{Int64}, Tuple{}}}, Array{LLVM.CallInst, 1}})
+precompile(Tuple{typeof(Base.get!), Base.Dict{GPUCompiler.CompilerJob{T, P, F} where F where P where T, Array{LLVM.CallInst, 1}}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{Main.var"#child##...", Tuple{CUDA.CuDeviceArray{Int64, 1, 1}, Tuple{Int16}, Tuple{Int32, Int8, Int16, Int64, Int16, Int16}}}}, Array{LLVM.CallInst, 1}})
+precompile(Tuple{typeof(Base.get!), Base.Dict{GPUCompiler.CompilerJob{T, P, F} where F where P where T, Array{LLVM.CallInst, 1}}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{Main.var"#child##...", Tuple{CUDA.CuDeviceArray{Int64, 1, 1}, Tuple{Int32}, Tuple{Int16}}}}, Array{LLVM.CallInst, 1}})
...
-precompile(Tuple{typeof(Base.get!), GPUCompiler.var"..."{LLVM.ThreadSafeContext, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}}, Base.Dict{GPUCompiler.CompilerJob{T, P} where P where T, String}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}})
+precompile(Tuple{typeof(Base.get!), GPUCompiler.var"..."{LLVM.ThreadSafeContext, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{Main.var"...", Tuple{}}}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{Main.var"...", Tuple{}}}}, Base.Dict{GPUCompiler.CompilerJob{T, P, F} where F where P where T, String}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{Main.var"...", Tuple{}}}})
+precompile(Tuple{typeof(Base.get!), GPUCompiler.var"..."{LLVM.ThreadSafeContext, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{Main.var"...", Tuple{}}}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{Main.var"..."{Int64}, Tuple{}}}}, Base.Dict{GPUCompiler.CompilerJob{T, P, F} where F where P where T, String}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{Main.var"..."{Int64}, Tuple{}}}})
+precompile(Tuple{typeof(Base.get!), GPUCompiler.var"..."{LLVM.ThreadSafeContext, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{Main.var"...", Tuple{}}}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(Main.world), Tuple{}}}}, Base.Dict{GPUCompiler.CompilerJob{T, P, F} where F where P where T, String}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(Main.world), Tuple{}}}})
+precompile(Tuple{typeof(Base.get!), GPUCompiler.var"..."{LLVM.ThreadSafeContext, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{Main.var"#hello##...", Tuple{}}}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(Main.world), Tuple{}}}}, Base.Dict{GPUCompiler.CompilerJob{T, P, F} where F where P where T, String}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(Main.world), Tuple{}}}})
...
-precompile(Tuple{typeof(Base.getindex), Base.Dict{GPUCompiler.CompilerJob{T, P} where P where T, Array{LLVM.CallInst, 1}}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}})
+precompile(Tuple{typeof(Base.getindex), Base.Dict{GPUCompiler.CompilerJob{T, P, F} where F where P where T, Array{LLVM.CallInst, 1}}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{Main.var"...", Tuple{}}}})
+precompile(Tuple{typeof(Base.getindex), Base.Dict{GPUCompiler.CompilerJob{T, P, F} where F where P where T, Array{LLVM.CallInst, 1}}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{Main.var"..."{Int64}, Tuple{}}}})
+precompile(Tuple{typeof(Base.getindex), Base.Dict{GPUCompiler.CompilerJob{T, P, F} where F where P where T, Array{LLVM.CallInst, 1}}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{Main.var"#child##...", Tuple{CUDA.CuDeviceArray{Int64, 1, 1}, Tuple{Int16}, Tuple{Int32, Int8, Int16, Int64, Int16, Int16}}}}})
+precompile(Tuple{typeof(Base.getindex), Base.Dict{GPUCompiler.CompilerJob{T, P, F} where F where P where T, Array{LLVM.CallInst, 1}}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{Main.var"#child##...", Tuple{CUDA.CuDeviceArray{Int64, 1, 1}, Tuple{Int32}, Tuple{Int16}}}}})
+precompile(Tuple{typeof(Base.getindex), Base.Dict{GPUCompiler.CompilerJob{T, P, F} where F where P where T, Array{LLVM.CallInst, 1}}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{Main.var"#child##...", Tuple{CUDA.CuDeviceArray{Int64, 1, 1}, Tuple{}, NTuple{8, Int64}}}}})
+precompile(Tuple{typeof(Base.getindex), Base.Dict{GPUCompiler.CompilerJob{T, P, F} where F where P where T, Array{LLVM.CallInst, 1}}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{Main.var"#dp_5arg_kernel##...", NTuple{5, Int64}}}})
+precompile(Tuple{typeof(Base.hash), GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"...", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Int64, 1, 1}, Int64}}}, UInt64})
+precompile(Tuple{typeof(Base.hash), GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel##...", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Float32, 1, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Main.var"...", Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceArray{Float32, 1, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}}, UInt64})
+precompile(Tuple{typeof(Base.hash), GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel##...", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Float64, 1, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Main.var"...", Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceArray{Float32, 1, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}}, UInt64})
+precompile(Tuple{typeof(Base.hash), GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel##...", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Float64, 1, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Main.var"...", Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceArray{Int64, 1, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}}, UInt64})
+precompile(Tuple{typeof(Base.hash), GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel##...", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Tuple{Base.Complex{Float32}, Base.Complex{Float32}}, 1, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(Base.Math.sincos), Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceArray{Base.Complex{Float32}, 1, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}}, UInt64})
-precompile(Tuple{typeof(Base.lock), GPUCompiler.var"..."{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}}, Base.ReentrantLock})
-precompile(Tuple{typeof(Base.lock), GPUCompiler.var"..."{LLVM.ThreadSafeContext, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}}, Base.ReentrantLock})
+precompile(Tuple{typeof(Base.lock), GPUCompiler.var"..."{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"...", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Int64, 1, 1}, Int64}}}}, Base.ReentrantLock})
+precompile(Tuple{typeof(Base.lock), GPUCompiler.var"..."{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel##...", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Float32, 1, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Main.var"...", Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceArray{Float32, 1, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}}}, Base.ReentrantLock})
+precompile(Tuple{typeof(Base.lock), GPUCompiler.var"..."{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel##...", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Float64, 1, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Main.var"...", Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceArray{Float32, 1, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}}}, Base.ReentrantLock})
+precompile(Tuple{typeof(Base.lock), GPUCompiler.var"..."{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel##...", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Float64, 1, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Main.var"...", Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceArray{Int64, 1, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}}}, Base.ReentrantLock})
+precompile(Tuple{typeof(Base.lock), GPUCompiler.var"..."{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel##...", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Tuple{Base.Complex{Float32}, Base.Complex{Float32}}, 1, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(Base.Math.sincos), Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceArray{Base.Complex{Float32}, 1, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}}}, Base.ReentrantLock})
...
+precompile(Tuple{typeof(Base.setindex!), Base.Dict{Int64, Union{GPUCompiler.CompilerJob{T, P, F} where F where P where T, GPUCompiler.FunctionSpec{F, TT} where TT where F}}, GPUCompiler.FunctionSpec{Main.var"...", Tuple{}}, Int64})
+precompile(Tuple{typeof(Base.setindex!), Base.Dict{Int64, Union{GPUCompiler.CompilerJob{T, P, F} where F where P where T, GPUCompiler.FunctionSpec{F, TT} where TT where F}}, GPUCompiler.FunctionSpec{Main.var"..."{Int64}, Tuple{}}, Int64})
+precompile(Tuple{typeof(Base.setindex!), Base.Dict{Int64, Union{GPUCompiler.CompilerJob{T, P, F} where F where P where T, GPUCompiler.FunctionSpec{F, TT} where TT where F}}, GPUCompiler.FunctionSpec{Main.var"#child##...", Tuple{CUDA.CuDeviceArray{Int64, 1, 1}, Tuple{Int16}, Tuple{Int32, Int8, Int16, Int64, Int16, Int16}}}, Int64})
+precompile(Tuple{typeof(Base.setindex!), Base.Dict{Int64, Union{GPUCompiler.CompilerJob{T, P, F} where F where P where T, GPUCompiler.FunctionSpec{F, TT} where TT where F}}, GPUCompiler.FunctionSpec{Main.var"#child##...", Tuple{CUDA.CuDeviceArray{Int64, 1, 1}, Tuple{Int32}, Tuple{Int16}}}, Int64})
-precompile(Tuple{typeof(GPUCompiler.cached_compilation), Base.Dict{UInt64, Any}, GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, Type, Type, Function, Function})
-precompile(Tuple{typeof(GPUCompiler.check_ir), GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, LLVM.Module})
-precompile(Tuple{typeof(GPUCompiler.ci_cache_lookup), GPUCompiler.CodeCache, Core.MethodInstance, UInt64, UInt64})
-precompile(Tuple{typeof(GPUCompiler.classify_arguments), GPUCompiler.CompilerJob{T, P} where P where T, LLVM.FunctionType})
+precompile(Tuple{typeof(GPUCompiler.cached_compilation), Base.Dict{UInt64, Any}, GPUCompiler.CompilerJob{T, P, F} where F where P where T, typeof(CUDA.cufunction_compile), typeof(CUDA.cufunction_link)})
+precompile(Tuple{typeof(GPUCompiler.check_ir), GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"...", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Int64, 1, 1}, Int64}}}, LLVM.Module})
+precompile(Tuple{typeof(GPUCompiler.check_ir), GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel##...", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Float32, 1, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Main.var"...", Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceArray{Float32, 1, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}}, LLVM.Module})
+precompile(Tuple{typeof(GPUCompiler.check_ir), GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel##...", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Float64, 1, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Main.var"...", Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceArray{Float32, 1, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}}, LLVM.Module})
+precompile(Tuple{typeof(GPUCompiler.check_ir), GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel##...", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Float64, 1, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Main.var"...", Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceArray{Int64, 1, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}}, LLVM.Module})
...
+precompile(Tuple{typeof(GPUCompiler.isintrinsic), GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, _A} where _A, String})
+precompile(Tuple{typeof(GPUCompiler.JuliaContext), CUDA.var"..."{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"...", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Int64, 1, 1}, Int64}}}}})
+precompile(Tuple{typeof(GPUCompiler.JuliaContext), CUDA.var"..."{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel##...", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Float32, 1, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Main.var"...", Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceArray{Float32, 1, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}}}})
+precompile(Tuple{typeof(GPUCompiler.JuliaContext), CUDA.var"..."{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel##...", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Float64, 1, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Main.var"...", Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceArray{Float32, 1, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}}}})
+precompile(Tuple{typeof(GPUCompiler.JuliaContext), CUDA.var"..."{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel##...", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Float64, 1, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Main.var"...", Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceArray{Int64, 1, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}}}})
-precompile(Tuple{Type{Pair{A, B} where B where A}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, String})
+precompile(Tuple{Type{Pair{A, B} where B where A}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"...", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Int64, 1, 1}, Int64}}}, String})
+precompile(Tuple{Type{Pair{A, B} where B where A}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel##...", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Float32, 1, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Main.var"...", Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceArray{Float32, 1, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}}, String})
+precompile(Tuple{Type{Pair{A, B} where B where A}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel##...", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Float64, 1, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Main.var"...", Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceArray{Float32, 1, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}}, String})
+precompile(Tuple{Type{Pair{A, B} where B where A}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel##...", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Float64, 1, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Main.var"...", Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceArray{Int64, 1, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}}, String})
+precompile(Tuple{Type{Pair{A, B} where B where A}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel##...", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Tuple{Base.Complex{Float32}, Base.Complex{Float32}}, 1, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(Base.Math.sincos), Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceArray{Base.Complex{Float32}, 1, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}}, String})
+precompile(Tuple{Type{Pair{A, B} where B where A}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{Main.KernelObject{Float64}, Tuple{CUDA.CuDeviceArray{Float64, 1, 1}}}}, String})
+precompile(Tuple{Type{Pair{A, B} where B where A}, GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{Main.var"...", Tuple{}}}, String})
...
Essentially, by dropping the
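To make the pattern in the trace above concrete: the `+` entries are keyed on `CompilerJob{T, P, F}` with a concrete `FunctionSpec{...}` per kernel, while the `-` entries use `CompilerJob{T, P}` and appear once. The sketch below is only an illustration of that general Julia behavior, not GPUCompiler's actual code; `SpecializedJob`, `GenericJob`, and `kernels` are hypothetical stand-ins. It shows that a job type parameterized on the function type gives one concrete type per kernel, which is what Julia typically compiles separate method specializations for.

```julia
# Hypothetical stand-ins, NOT GPUCompiler's actual definitions.

# Variant A: the job type carries the kernel's function type as a parameter,
# so every distinct kernel yields a distinct concrete job type.
struct SpecializedJob{F}
    f::F
end

# Variant B: the function sits behind an abstractly typed field,
# so all kernels share a single concrete job type.
struct GenericJob
    f::Function
end

# Three different "kernels"; each function has its own type in Julia.
kernels = (sin, cos, x -> x + 1)

# Collect the distinct concrete job types each variant produces.
specialized_types = unique([typeof(SpecializedJob(k)) for k in kernels])
generic_types = unique([typeof(GenericJob(k)) for k in kernels])

println(length(specialized_types))  # 3: one job type per kernel
println(length(generic_types))      # 1: one job type for all kernels
```

Any method taking a `SpecializedJob` argument (a `Dict` lookup, `hash`, `lock`, and so on) may then be compiled once per kernel, whereas methods taking a `GenericJob` can be compiled once and reused, at the cost of less type information inside those methods.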
Also refactors the compiler instantiation functionality.