
Incorrect gradient of convolution w.r.t. weights #197

@dfdx

import Random
import NNlib
import NNlib: DenseConvDims


Random.seed!(42);


function ngradient(f, xs::AbstractArray...)
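  # Central finite differences: perturb each element by ±δ/2 and
  # approximate the partial derivative from the change in f.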
  grads = zero.(xs)
  for (x, Δ) in zip(xs, grads), i in 1:length(x)
    δ = sqrt(eps())
    tmp = x[i]
    x[i] = tmp - δ/2
    y1 = f(xs...)
    x[i] = tmp + δ/2
    y2 = f(xs...)
    x[i] = tmp
    Δ[i] = (y2-y1)/δ
  end
  return grads
end


function conv_loss(x, w)
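    # Scalar loss: sum of all conv outputs, so dL/dy is an array of ones.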
    cdims = DenseConvDims(x, w; stride=1, padding=0, dilation=1)
    y = NNlib.conv(x, w, cdims)
    return sum(y)
end


x = rand(7, 7, 3, 10); w = rand(3, 3, 3, 1)
cdims = DenseConvDims(x, w; stride=1, padding=0, dilation=1)
y = NNlib.conv(x, w, cdims)
dy = ones(size(y))
    
ndx, ndw = ngradient(conv_loss, x, w)

dx = NNlib.∇conv_data(dy, w, cdims)
dw = NNlib.∇conv_filter(x, dy, cdims)

isapprox(dx, ndx, rtol=1e-5, atol=1e-5)  # true
isapprox(dw, ndw, rtol=1e-5, atol=1e-5)  # false

I recently updated NNlib from (I think) version 0.6.0 to the latest version 0.6.6, and my tests started to fail. NNlib.∇conv_filter() now differs from both the numeric approximation (ngradient) and the CUDNN implementation, and the difference is quite large (e.g. [123, 128, ...] vs [112, 115, ...]), so it is not a numerical-precision issue.
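
For this specific configuration (stride 1, no padding, no dilation, loss = sum(y), so dy is all ones) the filter gradient can also be written out by hand, giving a third, independent reference. The sketch below is only an illustration under my assumptions: ref_filter_grad is a made-up helper, and it relies on my reading that the default flipkernel=false in DenseConvDims means NNlib flips the kernel (a "true" convolution):

function ref_filter_grad(x, w)
    KX, KY, C, O = size(w)
    W, H, _, B = size(x)
    OX, OY = W - KX + 1, H - KY + 1      # output spatial size for stride 1, pad 0
    dw = zeros(eltype(w), size(w))
    for o in 1:O, c in 1:C, ky in 1:KY, kx in 1:KX
        s = zero(eltype(w))
        for b in 1:B, oy in 1:OY, ox in 1:OX
            # flipped-kernel convention: w[kx, ky, c, o] multiplies x at offset (KX-kx, KY-ky)
            s += x[ox + KX - kx, oy + KY - ky, c, b]
        end
        dw[kx, ky, c, o] = s
    end
    return dw
end

refdw = ref_filter_grad(x, w)
isapprox(refdw, ndw, rtol=1e-5, atol=1e-5)  # expected to agree with ngradient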

I tried to trace the change back through the NNlib implementation, but large portions of the code have changed, including the tests. Does anybody know what may have caused this issue?
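
If the regression really came in between 0.6.0 and 0.6.6, pinning the old release and rerunning the script above should make it easy to bisect which tag introduced it. Rough sketch (the exact set of intermediate 0.6.x releases is from memory, so adjust as needed):

import Pkg
Pkg.add(Pkg.PackageSpec(name="NNlib", version="0.6.0"))
# restart Julia, rerun the script above, and record isapprox(dw, ndw, ...)
# then repeat with the intermediate 0.6.x releases until the check starts failing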
