Back Propagation for SVD #97

GiggleLiu · 2018-10-29T16:22:50Z

Here I present a correct (but poor) implementation of back propagation (BP) for SVD. It changes the original `svd` interface a bit; I hope someone can help improve it.
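
For reference, the rule that `svd_back` below evaluates can be written out as follows. This is a transcription of the code for real inputs, not an independent derivation; the bar notation for incoming gradients and the symbol $\epsilon$ for the hard-coded `1e-12` are notation introduced here. With $A = U S V^\top$, $S = \operatorname{diag}(s)$, and incoming gradients $\bar{U}$, $\bar{s}$, $\bar{V}$:

$$
\bar{A} = U\Big[\big(F \circ (U^\top \bar{U} - \bar{U}^\top U)\big)\,S + S\,\big(F \circ (V^\top \bar{V} - \bar{V}^\top V)\big) + \operatorname{diag}(\bar{s})\Big]V^\top + (I - UU^\top)\,\bar{U}\,S^{-1}V^\top + U S^{-1}\,\bar{V}^\top (I - VV^\top)
$$

$$
F_{ij} = \frac{s_j^2 - s_i^2}{(s_j^2 - s_i^2)^2 + \epsilon}, \qquad \epsilon = 10^{-12}
$$

The regularized $F$ approximates $1/(s_j^2 - s_i^2)$ away from degeneracies and is exactly zero on the diagonal, which is presumably what "stabilized" refers to.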

using LinearAlgebra
using Flux
using Flux.Tracker: @grad, data, track, TrackedTuple
import Flux.Tracker: _forward
import LinearAlgebra: svd

"""Stabilized back propagation function for SVD."""
function svd_back(U, S, V, dU, dS, dV)
    NS = length(S)
    S2 = S.^2
    Sinv = 1 ./ S
    F = S2' .- S2
    @. F = F/(F^2 + 1e-12)

    UdU = U'*dU
    VdV = V'*dV

    Su = (F.*(UdU-UdU'))*Diagonal(S)
    Sv = Diagonal(S) * (F.*(VdV-VdV'))

    U * (Su + Sv + Diagonal(dS)) * V' +
    (I - U*U') * dU*Diagonal(Sinv) * V' +
    U*Diagonal(Sinv) * dV' * (I - V*V')
end

svd(a::TrackedArray) = track(svd, a)
# I suspect the `@grad` macro interface is less intuitive than `_forward`.
function _forward(::typeof(svd), a)
    U, S, V = svd(data(a))   # `svd` returns an `SVD` object, which makes a Julian's life shorter.
    # Returning a list won't work; one gets a zero gradient:
    # [U|>param, S|>param, V|>param],  -> (svd_back(U, S, V, dU, dS, dV),)
    (U, S, Matrix(V)), Δ -> (svd_back(U, S, V, Δ...),)
end

# This is a use case
M, N = 4, 6
K = min(M, N)
A = param(randn(M, N))
res = svd(A)
# implementing `Base.iterate(res::TrackedTuple) = ?` would make this prettier
U, S, V = res[1], res[2], res[3]

dU, dS, dV = randn(M, K), randn(K), randn(N, K)
Tracker.back!(res, (dU, dS, dV))
Tracker.grad(A)
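
One way to sanity-check the formula without Flux is to compare `svd_back` against central finite differences. The following is a minimal sketch under stated assumptions, not part of the original post: `loss`, `fd_grad`, and the test data `C`, `w`, `c` are hypothetical names introduced for illustration, and `svd_back` is repeated (minus the unused `NS`) so the block is self-contained. The loss is chosen to be invariant under simultaneous sign flips of a singular-vector pair, since `U` and `V` are only defined up to such flips.

```julia
using LinearAlgebra

# Same formula as `svd_back` in the post, repeated here for self-containment.
function svd_back(U, S, V, dU, dS, dV)
    S2 = S .^ 2
    Sinv = 1 ./ S
    F = S2' .- S2
    @. F = F / (F^2 + 1e-12)
    UdU = U' * dU
    VdV = V' * dV
    Su = (F .* (UdU - UdU')) * Diagonal(S)
    Sv = Diagonal(S) * (F .* (VdV - VdV'))
    U * (Su + Sv + Diagonal(dS)) * V' +
    (I - U * U') * dU * Diagonal(Sinv) * V' +
    U * Diagonal(Sinv) * dV' * (I - V * V')
end

# Hypothetical test loss: sum(C .* (U * Diagonal(w) * V')) + dot(c, S).
function loss(A, C, w, c)
    F = svd(A)
    sum(C .* (F.U * Diagonal(w) * F.Vt)) + dot(c, F.S)
end

# Central finite-difference gradient of a scalar function `f` at `A`.
function fd_grad(f, A; h = 1e-6)
    G = similar(A)
    for i in eachindex(A)
        Ap = copy(A); Ap[i] += h
        Am = copy(A); Am[i] -= h
        G[i] = (f(Ap) - f(Am)) / (2h)
    end
    G
end

M, N = 4, 6
# Deterministic, well-conditioned input: singular values stay near 1, 2, 3, 4.
A = [(i == j ? Float64(i) : 0.0) + 0.05 * cos(i + j^2) for i in 1:M, j in 1:N]
C = [sin(i * j) for i in 1:M, j in 1:N]
w = [1.0, -0.5, 2.0, 0.3]
c = [0.7, -1.2, 0.4, 1.5]

F = svd(A)
U, S, V = F.U, F.S, Matrix(F.V)
dU = C * V * Diagonal(w)     # derivative of the loss with respect to U
dV = C' * U * Diagonal(w)    # derivative of the loss with respect to V
g_analytic = svd_back(U, S, V, dU, c, dV)
g_numeric = fd_grad(X -> loss(X, C, w, c), A)

# Largest entrywise disagreement; it should be small if the rule is right.
println(maximum(abs.(g_analytic .- g_numeric)))
```

The fixed, near-diagonal `A` is used instead of `randn` so that the singular values are well separated and the `1e-12` regularization has a negligible effect on the comparison.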

Why do we use `Matrix(V)` here?

The following line in the file src/tracker/scalar.jl is called:

track(f::Call, xs::Tuple) = TrackedTuple(xs, Tracked{typeof(xs)}(f, zero.(xs)))

Note that the function `zero` can sometimes change the type!
Here, `svd` returns `V` as an `Adjoint`, and `zero` of an `Adjoint` gives an `Array`!

Gotcha!
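
The type change can be reproduced with the standard library alone. This is a small sketch of the behavior described above, with no Flux involved; the variable names are only for illustration.

```julia
using LinearAlgebra

F = svd(randn(4, 6))
V = F.V          # a lazy `Adjoint` wrapper around `F.Vt`
Z = zero(V)      # `zero` allocates via `similar`, which drops the wrapper

println(typeof(V))
println(typeof(Z))

# Converting first keeps the type stable under `zero`,
# which is why `_forward` returns `Matrix(V)`.
Vm = Matrix(V)
println(typeof(zero(Vm)) == typeof(Vm))
```

Because `track` builds the gradient accumulator with `zero.(xs)`, an `Adjoint` output ends up paired with a plain `Array` accumulator, so the two types no longer match.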

Some aspects can be improved

Returning `[U, S, V]` in Flux should not cause gradient tracking to fail.
Tracker should be able to propagate through the adjoint (dagger)?
Checking the return value of `_forward` is necessary, so that a readable error message can be thrown.
The usefulness of the `@grad` interface is debatable.
Julia should remove over-designed outputs for linear algebra functions like `svd`; I don't see many benefits in such a design.
`zero` and `one` should never change the type; here, this should be considered a bug.


MikeInnes · 2018-10-29T17:43:28Z

Seems good to me, perhaps good to turn it into a PR?

The `zero` issue has come up before; I have a planned fix but haven't put it together yet.

GiggleLiu · 2018-10-29T18:03:21Z

Sure, I will submit a PR soon.

CarloLucibello · 2020-12-30T19:14:03Z

Forgot to label this with "move to tracker". I labeled a few Tracker-related issues; I'm not authorized to move them to Tracker.jl, so I just closed them. Ideally, they should all be moved there and reopened (if we still care about Tracker; I honestly don't).

GiggleLiu changed the title ~~Back Propagation Failure for SVD~~ Back Propagation for SVD Oct 29, 2018

CarloLucibello closed this as completed Dec 26, 2020

DhairyaLGandhi reopened this Dec 29, 2020

DhairyaLGandhi transferred this issue from FluxML/Flux.jl Feb 12, 2021
