Differentiating CUDA kernels #648

@MilesCranmer

I just wanted to make an issue to track this.

So, I was able to get part of the way there with a few manual overloads, and can even differentiate through sum(x) for x a CuArray. However, it didn't work for sum(cos.(x)), which I guess compiles a special broadcast kernel. The resulting error (shown further down) looks harder to overload.

You can try this out on a GPU here: https://colab.research.google.com/drive/1H1FzBaahClBOPO-q09vGr8b5qYWc0cgX?usp=sharing

Here are the rules I manually defined within this notebook:

using CUDA, Mooncake, DifferentiationInterface, GPUArrays
using Mooncake:
    @is_primitive, @zero_adjoint, DefaultCtx, CoDual, NoPullback, NoRData,
    primal, tangent, zero_fcodual

@zero_adjoint(
    DefaultCtx,
    Tuple{typeof(Mooncake.lgetfield),CuArray,Val{:dims}}
)
@zero_adjoint(
    DefaultCtx,
    Tuple{Type{<:CuArray},UndefInitializer,Vararg}
)
@zero_adjoint(
    DefaultCtx,
    Tuple{typeof(Base.mightalias),CuArray,CuArray}
)
@zero_adjoint(
    DefaultCtx,
    Tuple{typeof(Base.unsafe_convert),Type{<:CuDeviceVector},CuArray}
)
@is_primitive(
    DefaultCtx,
    Tuple{typeof(mapreduce),typeof(identity),typeof(Base.add_sum),CuArray}
)
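# Hand-written rule for sum(x) on a CuArray, which lowers to
# mapreduce(identity, Base.add_sum, x).
function Mooncake.rrule!!(
    ::CoDual{typeof(mapreduce)},
    ::CoDual{typeof(identity)},
    ::CoDual{typeof(Base.add_sum)},
    A::CoDual{<:CuArray}
)
    # Forward pass: run the ordinary GPU reduction on the primal array.
    y = zero_fcodual(sum(primal(A)))
    # Reverse pass: every element of A receives the scalar cotangent dy;
    # none of the four arguments carries rdata.
    pullback(dy) = (tangent(A) .+= dy; ntuple(_ -> NoRData(), 4))
    return y, pullback
end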
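
For reference, here is a minimal CPU-only sketch of what these reverse passes compute, using plain Arrays rather than CuArrays and no Mooncake at all. The helper names sum_with_pullback and sumcos_with_pullback are hypothetical and exist only for illustration. The first mirrors the hand-written sum rule above; the second shows the gradient that a working rule for sum(cos.(x)) would need to reproduce.

```julia
# CPU-only illustration with plain Arrays; no CUDA or Mooncake required.
# `sum_with_pullback` and `sumcos_with_pullback` are hypothetical helper names.

# Reverse pass for y = sum(x): each element receives the cotangent dy.
function sum_with_pullback(x::AbstractVector)
    y = sum(x)
    pullback(dy) = fill(dy, length(x))
    return y, pullback
end

# Reverse pass for y = sum(cos.(x)): the chain rule gives dy * (-sin(x[i])).
function sumcos_with_pullback(x::AbstractVector)
    y = sum(cos.(x))
    pullback(dy) = -sin.(x) .* dy
    return y, pullback
end

x = [0.0, 0.5, 1.0]
_, pb_sum = sum_with_pullback(x)
_, pb_cos = sumcos_with_pullback(x)
grad_sum = pb_sum(1.0)
grad_cos = pb_cos(1.0)

# Central finite-difference check on the second coordinate.
h = 1e-6
e2 = [0.0, 1.0, 0.0]
fd = (sum(cos.(x .+ h .* e2)) - sum(cos.(x .- h .* e2))) / (2h)
println(isapprox(grad_cos[2], fd; atol=1e-6))
```

On the GPU, the broadcast inside sum(cos.(x)) is compiled into its own kernel, so the AD tool has to either differentiate through that kernel launch or be given a rule at the broadcast level that produces this same gradient.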

I then executed:

x = randn(512)
x_device = cu(x)
f(z) = sum(cos.(z))  # n.b., works with sum(z)!
prep = prepare_gradient(f, AutoMooncake(), x_device)

However, the next error looks a bit trickier, and I worry that it requires handling tasks:

Mooncake.build_rrule(Mooncake.MooncakeInterpreter(), Tuple{CUDA.var"##cufunction#1206", Base.Pairs{Symbol, Union{Nothing, Bool}, Tuple{Symbol, Symbol}, @NamedTuple{always_inline::Bool, maxthreads::Nothing}}, typeof(cufunction), GPUArrays.var"#gpu_broadcast_kernel_linear#38", Type{Tuple{KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Tuple{Base.OneTo{Int64}}, typeof(cos), Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}}}}; debug_mode=false)
Stacktrace:
  [1] build_rrule(interp::Mooncake.MooncakeInterpreter{DefaultCtx}, sig_or_mi::Type; debug_mode::Bool, silence_debug_messages::Bool)
    @ Mooncake ~/.julia/packages/Mooncake/lBHAV/src/interpreter/s2s_reverse_mode_ad.jl:1136
  [2] build_rrule
    @ ~/.julia/packages/Mooncake/lBHAV/src/interpreter/s2s_reverse_mode_ad.jl:1077 [inlined]
  [3] (::Mooncake.DynamicDerivedRule{Dict{Any, Any}})(::CoDual{CUDA.var"##cufunction#1206", Mooncake.NoFData}, ::CoDual{@Kwargs{always_inline::Bool, maxthreads::Nothing}, Mooncake.NoFData}, ::CoDual{typeof(cufunction), Mooncake.NoFData}, ::CoDual{GPUArrays.var"#gpu_broadcast_kernel_linear#38", Mooncake.NoFData}, ::CoDual{Type{Tuple{KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Tuple{Base.OneTo{Int64}}, typeof(cos), Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}}}, Mooncake.NoFData})
    @ Mooncake ~/.julia/packages/Mooncake/lBHAV/src/interpreter/s2s_reverse_mode_ad.jl:1736
  [4] cufunction
    @ ~/.julia/packages/CUDA/ja0IX/src/compiler/execution.jl:365 [inlined]
  [5] (::Tuple{Mooncake.Stack{Int32}, Base.RefValue{Tuple{Mooncake.LazyZeroRData{typeof(Core.kwcall), Nothing}, Mooncake.LazyZeroRData{@NamedTuple{always_inline::Bool, maxthreads::Nothing}, Nothing}, Mooncake.LazyZeroRData{typeof(cufunction), Nothing}, Mooncake.LazyZeroRData{GPUArrays.var"#gpu_broadcast_kernel_linear#38", Nothing}, Mooncake.LazyZeroRData{Type{Tuple{KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Tuple{Base.OneTo{Int64}}, typeof(cos), Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}}}, Nothing}}}, CoDual{Tuple{Symbol, Symbol}, Mooncake.NoFData}, Mooncake.DynamicDerivedRule{Dict{Any, Any}}, Mooncake.Stack{Tuple{Any, Any}}})(none::CoDual{typeof(Core.kwcall), Mooncake.NoFData}, none::CoDual{@NamedTuple{always_inline::Bool, maxthreads::Nothing}, Mooncake.NoFData}, none::CoDual{typeof(cufunction), Mooncake.NoFData}, none::CoDual{GPUArrays.var"#gpu_broadcast_kernel...
    @ Base.Experimental ./<missing>:0
  [6] (::Mooncake.DerivedRule{Tuple{typeof(Core.kwcall), @NamedTuple{always_inline::Bool, maxthreads::Nothing}, typeof(cufunction), GPUArrays.var"#gpu_broadcast_kernel_linear#38", Type{Tuple{KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Tuple{Base.OneTo{Int64}}, typeof(cos), Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}}}}, Tuple{CoDual{typeof(Core.kwcall), Mooncake.NoFData}, CoDual{@NamedTuple{always_inline::Bool, maxthreads::Nothing}, Mooncake.NoFData}, CoDual{typeof(cufunction), Mooncake.NoFData}, CoDual{GPUArrays.var"#gpu_broadcast_kernel_linear#38", Mooncake.NoFData}, CoDual{Type{Tuple{KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneT...
    @ Mooncake ~/.julia/packages/Mooncake/lBHAV/src/interpreter/s2s_reverse_mode_ad.jl:966
  [7] (::Mooncake.DynamicDerivedRule{Dict{Any, Any}})(::CoDual{typeof(Core.kwcall), Mooncake.NoFData}, ::CoDual{@NamedTuple{always_inline::Bool, maxthreads::Nothing}, Mooncake.NoFData}, ::CoDual{typeof(cufunction), Mooncake.NoFData}, ::CoDual{GPUArrays.var"#gpu_broadcast_kernel_linear#38", Mooncake.NoFData}, ::CoDual{Type{Tuple{KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Tuple{Base.OneTo{Int64}}, typeof(cos), Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}}}, Mooncake.NoFData})
    @ Mooncake ~/.julia/packages/Mooncake/lBHAV/src/interpreter/s2s_reverse_mode_ad.jl:1739
  [8] (::Mooncake.RRuleZeroWrapper{Mooncake.DynamicDerivedRule{Dict{Any, Any}}})(::CoDual{typeof(Core.kwcall), Mooncake.NoFData}, ::CoDual{@NamedTuple{always_inline::Bool, maxthreads::Nothing}, Mooncake.NoFData}, ::CoDual{typeof(cufunction), Mooncake.NoFData}, ::CoDual{GPUArrays.var"#gpu_broadcast_kernel_linear#38", Mooncake.NoFData}, ::CoDual{Type{Tuple{KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Tuple{Base.OneTo{Int64}}, typeof(cos), Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}}}, Mooncake.NoFData})
    @ Mooncake ~/.julia/packages/Mooncake/lBHAV/src/interpreter/s2s_reverse_mode_ad.jl:302
  [9] #_#4
    @ ~/.julia/packages/CUDA/ja0IX/src/CUDAKernels.jl:109 [inlined]
 [10] (::Tuple{Mooncake.Stack{Int32}, Base.RefValue{Tuple{Mooncake.LazyZeroRData{CUDA.CUDAKernels.var"##_#4", Nothing}, Mooncake.LazyZeroRData{Tuple{Int64}, Nothing}, Mooncake.LazyZeroRData{Nothing, Nothing}, Mooncake.LazyZeroRData{KernelAbstractions.Kernel{CUDABackend, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, GPUArrays.var"#gpu_broadcast_kernel_linear#38"}, Nothing}, Mooncake.LazyZeroRData{Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, Vararg{Any}}, Any}}}, Mooncake.RRuleZeroWrapper{Mooncake.DynamicDerivedRule{Dict{Any, Any}}}, Mooncake.RRuleZeroWrapper{Mooncake.DynamicDerivedRule{Dict{Any, Any}}}, Mooncake.RRuleZeroWrapper{Mooncake.DynamicDerivedRule{Dict{Any, Any}}}, Mooncake.DynamicDerivedRule{Dict{Any, Any}}, Mooncake.RRuleZeroWrapper{Mooncake.DynamicDerivedRule{Dict{Any, Any}}}, Mooncake.LazyDerivedRule{Tuple{CUDA.var"##launch_configuration#1014", Int64, Int64, typeof(launch_configuration), CuFunction}, Mooncake.DerivedRule{Tuple{CUDA.var"##launch_configuration#1014", Int64, Int64, typeof(launch_configuration), CuFunction}, Tuple{CoDual{CUDA.var"##launch_configuration#1014", Mooncake.NoFData}, CoDual{Int64, Mooncake.NoFData}, CoDual{Int64, Mooncake.NoFData}, CoDual{typeof(launch_configuration), Mooncake.NoFData}, CoDual{CuFunction, Mooncake.FData{@NamedTuple{handle::Ptr{Mooncake.NoTangent}, mod::Mooncake.MutableTangent{@NamedTuple{handle::Ptr{Mooncake.NoTangent}, ctx::Mooncake.Tangent{@NamedTuple{handle::Ptr{Moonc...
    @ Base.Experimental ./<missing>:0
 [11] DerivedRule
    @ ~/.julia/packages/Mooncake/lBHAV/src/interpreter/s2s_reverse_mode_ad.jl:966 [inlined]
 [12] _build_rule!(rule::Mooncake.LazyDerivedRule{Tuple{CUDA.CUDAKernels.var"##_#4", Tuple{Int64}, Nothing, KernelAbstractions.Kernel{CUDABackend, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, GPUArrays.var"#gpu_broadcast_kernel_linear#38"}, CuArray{Float32, 1, CUDA.DeviceMemory}, Vararg{Any}}, Mooncake.DerivedRule{Tuple{CUDA.CUDAKernels.var"##_#4", Tuple{Int64}, Nothing, KernelAbstractions.Kernel{CUDABackend, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, GPUArrays.var"#gpu_broadcast_kernel_linear#38"}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, Vararg{Any}}}, Tuple{CoDual{CUDA.CUDAKernels.var"##_#4", Mooncake.NoFData}, CoDual{Tuple{Int64}, Mooncake.NoFData}, CoDual{Nothing, Mooncake.NoFData}, CoDual{KernelAbstractions.Kernel{CUDABackend, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, GPUArrays.var"#gpu_broadcast_kernel_linear#38"}, Mooncake.NoFData}, CoDual}, CoDual{Nothing, Mooncake.NoFData}, Tuple{NoRData}, Tuple{NoRData, NoRData, NoRData, NoRData, Any}, true, Val{5}}}, args::Tuple{CoDual{CUDA.CUDAKernels.var"##_#4", Mooncake.NoFData}, CoDual{Tuple{Int64}, Mooncake.NoFData}, CoDual{Nothing, Mooncake.NoFData}, CoDual{KernelAbstractions.Kernel{CUDABackend, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, GPUArrays.var"#gpu_broadcast_kernel_...
    @ Mooncake ~/.julia/packages/Mooncake/lBHAV/src/interpreter/s2s_reverse_mode_ad.jl:1827
 [13] LazyDerivedRule
    @ ~/.julia/packages/Mooncake/lBHAV/src/interpreter/s2s_reverse_mode_ad.jl:1822 [inlined]
 [14] Kernel
    @ ~/.julia/packages/CUDA/ja0IX/src/CUDAKernels.jl:108 [inlined]
 [15] (::Tuple{Mooncake.Stack{Int32}, Base.RefValue{Tuple{Mooncake.LazyZeroRData{typeof(Core.kwcall), Nothing}, Mooncake.LazyZeroRData{@NamedTuple{ndrange::Tuple{Int64}}, Nothing}, Mooncake.LazyZeroRData{KernelAbstractions.Kernel{CUDABackend, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, GPUArrays.var"#gpu_broadcast_kernel_linear#38"}, Nothing}, Mooncake.LazyZeroRData{Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Tuple{Base.OneTo{Int64}}, typeof(cos), Tuple{Base.Broadcast.Extruded{CuArray{Float32, 1, CUDA.DeviceMemory}, Tuple{Bool}, Tuple{Int64}}}}}, Nothing}}}, Mooncake.LazyDerivedRule{Tuple{CUDA.CUDAKernels.var"##_#4", Tuple{Int64}, Nothing, KernelAbstractions.Kernel{CUDABackend, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, GPUArrays.var"#gpu_broadcast_kernel_linear#38"}, CuArray{Float32, 1, CUDA.DeviceMemory}, Vararg{Any}}, Mooncake.DerivedRule{Tuple{CUDA.CUDAKernels.var"##_#4", Tuple{Int64}, Nothing, KernelAbstractions.Kernel{CUDABackend, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, GPUArrays.var"#gpu_broadcast_kernel_linear#38"}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, Vararg{Any}}}, Tuple{CoDual{CUDA.CUDAKernels.var"##_#4", Mooncake.NoFData}, CoDual{Tuple{Int64}, Mooncake.NoFData}, CoDual{Nothing, Mooncake.NoFData}, CoDual{KernelAbstractions.Kernel{CUDABackend,...
    @ Base.Experimental ./<missing>:0
 [16] (::Mooncake.DerivedRule{Tuple{typeof(Core.kwcall), @NamedTuple{ndrange::Tuple{Int64}}, KernelAbstractions.Kernel{CUDABackend, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, GPUArrays.var"#gpu_broadcast_kernel_linear#38"}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Tuple{Base.OneTo{Int64}}, typeof(cos), Tuple{Base.Broadcast.Extruded{CuArray{Float32, 1, CUDA.DeviceMemory}, Tuple{Bool}, Tuple{Int64}}}}}}, Tuple{CoDual{typeof(Core.kwcall), Mooncake.NoFData}, CoDual{@NamedTuple{ndrange::Tuple{Int64}}, Mooncake.NoFData}, CoDual{KernelAbstractions.Kernel{CUDABackend, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, GPUArrays.var"#gpu_broadcast_kernel_linear#38"}, Mooncake.NoFData}, CoDual{Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Tuple{Base.OneTo{Int64}}, typeof(cos), Tuple{Base.Broadcast.Extruded{CuArray{Float32, 1, CUDA.DeviceMemory}, Tuple{Bool}, Tuple{Int64}}}}}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, Mooncake.FData{@NamedTuple{style::Mooncake.NoFData, f::Mooncake.NoFData, args::Tuple{Mooncake.FData{@NamedTuple{x::CuArray{Float32, 1, CUDA.DeviceMemory}, keeps::Mooncake.NoFData, defaults::Mooncake.NoFData}}}, axes::Mooncake.NoFData}}}}}, CoDual{Nothing, Mooncake.NoFData}, Tuple{NoRData}, NTuple{4, NoRData}, true, Val{4}})(...
    @ Mooncake ~/.julia/packages/Mooncake/lBHAV/src/interpreter/s2s_reverse_mode_ad.jl:966
 [17] (::Mooncake.DynamicDerivedRule{Dict{Any, Any}})(::CoDual{typeof(Core.kwcall), Mooncake.NoFData}, ::CoDual{@NamedTuple{ndrange::Tuple{Int64}}, Mooncake.NoFData}, ::CoDual{KernelAbstractions.Kernel{CUDABackend, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, GPUArrays.var"#gpu_broadcast_kernel_linear#38"}, Mooncake.NoFData}, ::CoDual{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, ::CoDual{Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Tuple{Base.OneTo{Int64}}, typeof(cos), Tuple{Base.Broadcast.Extruded{CuArray{Float32, 1, CUDA.DeviceMemory}, Tuple{Bool}, Tuple{Int64}}}}, Mooncake.FData{@NamedTuple{style::Mooncake.NoFData, f::Mooncake.NoFData, args::Tuple{Mooncake.FData{@NamedTuple{x::CuArray{Float32, 1, CUDA.DeviceMemory}, keeps::Mooncake.NoFData, defaults::Mooncake.NoFData}}}, axes::Mooncake.NoFData}}})
    @ Mooncake ~/.julia/packages/Mooncake/lBHAV/src/interpreter/s2s_reverse_mode_ad.jl:1739
 [18] (::Mooncake.RRuleZeroWrapper{Mooncake.DynamicDerivedRule{Dict{Any, Any}}})(::CoDual{typeof(Core.kwcall), Mooncake.NoFData}, ::CoDual{@NamedTuple{ndrange::Tuple{Int64}}, Mooncake.NoFData}, ::CoDual{KernelAbstractions.Kernel{CUDABackend, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, GPUArrays.var"#gpu_broadcast_kernel_linear#38"}, Mooncake.NoFData}, ::CoDual{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, ::CoDual{Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Tuple{Base.OneTo{Int64}}, typeof(cos), Tuple{Base.Broadcast.Extruded{CuArray{Float32, 1, CUDA.DeviceMemory}, Tuple{Bool}, Tuple{Int64}}}}, Mooncake.FData{@NamedTuple{style::Mooncake.NoFData, f::Mooncake.NoFData, args::Tuple{Mooncake.FData{@NamedTuple{x::CuArray{Float32, 1, CUDA.DeviceMemory}, keeps::Mooncake.NoFData, defaults::Mooncake.NoFData}}}, axes::Mooncake.NoFData}}})
    @ Mooncake ~/.julia/packages/Mooncake/lBHAV/src/interpreter/s2s_reverse_mode_ad.jl:302
 [19] f
    @ ./In[6]:1 [inlined]
 [20] (::Tuple{Mooncake.Stack{Int32}, Base.RefValue{Tuple{Mooncake.LazyZeroRData{typeof(f), Nothing}, Mooncake.LazyZeroRData{CuArray{Float32, 1, CUDA.DeviceMemory}, Nothing}}}, Mooncake.LazyDerivedRule{Tuple{typeof(unsafe_copyto!), CuArray{Float32, 1, CUDA.DeviceMemory}, Int64, CuArray{Float32, 1, CUDA.DeviceMemory}, Int64, Int64}, Mooncake.DerivedRule{Tuple{typeof(unsafe_copyto!), CuArray{Float32, 1, CUDA.DeviceMemory}, Int64, CuArray{Float32, 1, CUDA.DeviceMemory}, Int64, Int64}, Tuple{CoDual{typeof(unsafe_copyto!), Mooncake.NoFData}, CoDual{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, CoDual{Int64, Mooncake.NoFData}, CoDual{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, CoDual{Int64, Mooncake.NoFData}, CoDual{Int64, Mooncake.NoFData}}, CoDual{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Tuple{NoRData}, NTuple{6, NoRData}, false, Val{6}}}, CoDual{Symbol, Mooncake.NoFData}, CoDual{Symbol, Mooncake.NoFData}, CoDual{Symbol, Mooncake.NoFData}, Mooncake.RRuleZeroWrapper{Mooncake.DynamicDerivedRule{Dict{Any, Any}}}, Mooncake.RRuleZeroWrapper{Mooncake.DynamicDerivedRule{Dict{Any, Any}}}, Mooncake.LazyDerivedRule{Tuple{typeof(Base.Broadcast.throwdm), Tuple{Base.OneTo{Int64}}, Tuple{Base.OneTo{Int64}}}, Mooncake.DerivedRule{Tuple{typeof(Base.Broadcast.throwdm), Tuple{Base.OneTo{Int64}}, Tuple{Base.OneTo{Int64}}}, Tuple{CoDual{typeof(Base.Broadcast.throwdm), Mooncake.N...
    @ Base.Experimental ./<missing>:0
 [21] (::Mooncake.DerivedRule{Tuple{typeof(f), CuArray{Float32, 1, CUDA.DeviceMemory}}, Tuple{CoDual{typeof(f), Mooncake.NoFData}, CoDual{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}}, CoDual{Float32, Mooncake.NoFData}, Tuple{Float32}, Tuple{NoRData, NoRData}, false, Val{2}})(::CoDual{typeof(f), Mooncake.NoFData}, ::CoDual{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}})
    @ Mooncake ~/.julia/packages/Mooncake/lBHAV/src/interpreter/s2s_reverse_mode_ad.jl:966
 [22] prepare_gradient_cache(::Function, ::Vararg{Any}; kwargs::@Kwargs{debug_mode::Bool, silence_debug_messages::Bool})
    @ Mooncake ~/.julia/packages/Mooncake/lBHAV/src/interface.jl:509
 [23] prepare_gradient_cache
    @ ~/.julia/packages/Mooncake/lBHAV/src/interface.jl:506 [inlined]
 [24] prepare_gradient_nokwarg(::Val{true}, ::typeof(f), ::AutoMooncake{Nothing}, ::CuArray{Float32, 1, CUDA.DeviceMemory})
    @ DifferentiationInterfaceMooncakeExt ~/.julia/packages/DifferentiationInterface/sPszY/ext/DifferentiationInterfaceMooncakeExt/onearg.jl:114
 [25] #prepare_gradient#46
    @ ~/.julia/packages/DifferentiationInterface/sPszY/src/first_order/gradient.jl:11 [inlined]
 [26] prepare_gradient(::typeof(f), ::AutoMooncake{Nothing}, ::CuArray{Float32, 1, CUDA.DeviceMemory})
    @ DifferentiationInterface ~/.julia/packages/DifferentiationInterface/sPszY/src/first_order/gradient.jl:8
 [27] top-level scope
    @ In[7]:1
