Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dealing with non-array/scalar structured values #8

Closed
ararslan opened this issue Apr 22, 2019 · 16 comments
Closed

Dealing with non-array/scalar structured values #8

ararslan opened this issue Apr 22, 2019 · 16 comments
Labels
Structural Tangent Related to the `Tangent` type for structured (composite) values

Comments

@ararslan
Copy link
Member

Consider the case of an eigenvalue decomposition from the eigen function, which produces both eigenvalues and eigenvectors in an Eigen object. Giles provides forward- and reverse-mode sensitivities for the decomposition, which depend on both the eigenvalues and vectors. That begs the question of how this should be expressed in ChainRules.

In a conversation with @jrevels, he said that his vision for this was to use named tuples in cases such as this, which involve structures other than arrays and scalars. However, we weren't able to come to a concrete conclusion on whether it makes more sense for a Rule for e.g. eigen to produce a named tuple upon application of the rule, or if frule/rrule should themselves should produce a named tuple of Rules.

It seems that the author of a rule can reuse computations for the eigenvalues and vectors if applying the rule yields a single named tuple. This would look something along the lines of (untested):

function frule(::typeof(eigen), X::AbstractMatrix)
    E = eigen(X)
    λ, U = E
    n = size(X, 1)
    R = Rule() do ΔX
        Y = U' * ΔX * U
        values = copy(diag(Y))
        @inbounds for j = 1:n, i = 1:n
            if i == j
                Y[i,j] = 0
            else
                Y[i,j] /= λ[j] - λ[i]
            end
        end
        vectors = U * Y
        (values=values, vectors=vectors)
    end
    return E, R
end

However, if frule/rrule returns a named tuple of rules, one can specify accessor functions much more conveniently, something like:

function frule(::typeof(eigvals), X::AbstractMatrix)
    E, R = frule(eigen, X)
    return E.values, R.values
end

To quote Jarrett directly, at what stage do we want the caller to know that they should deal with a named tuple?

@willtebbutt
Copy link
Member

I'm also not entirely sure what the correct answer is here.

The other thing we should sort out while making these choices is whether, when dealing with non-Array AbstractArray types (e.g. Diagonal, Symmetric etc), the rules should also be AbstractArray types, named tuples (in whatever way we decide), or whether it's acceptable to have both conventions coexist.

@ararslan
Copy link
Member Author

ararslan commented May 2, 2019

Something else to consider here is objects with "virtual fields," for which in some cases the actual fields may be considered "private." For example, getproperty on a Cholesky permits accessing U or L, which are not actually fields, but are computed as simple functions of the fields factors and uplo. Would the named tuple returned in this case, be it from frule/rrule or from application of a rule, reflect the fields or the properties?

@oxinabox
Copy link
Member

oxinabox commented May 10, 2019

So my attempt to summarize the discussion right now:
We should return a Rule
that when evaluated return s a NamedTuple,
where that NamedTuple contains for each element a Differentiable (or a Value)

The normal case: (Without NamedTuples)
We have say a function and 3 inputs to a rrule
its returns the value for the function and a tuple containing 3 Rules.
Each for those rules can then be given a a sensitivity for the souronding operation.
And at which point they return a Differentiable (or a value).
(Which could be a Thunk, but Thunks have no inputs, though they do close over the input to the Rule)

Now the new case is if one of the inputs was a Struct.
Now the rrule still returns the same: ie,. the function value and a tuple containing 3 Rules.
But now the struct's rule when given a sentivity return not a single Differentiable,
(or value), but rather a NamedTuple of Differentiables (or values) matching the property (fields) of the struct

This makes sense as at the end of the day a Differntiable is basically a representation of how the input can change.
So it is of semantically the same type as the input.

Coming off this, to make things nice we might want to add add and mul.
for NamedTuples
because it won't be type piracy. (since those are not + and *) so it is probably fine.
And at least in the case when they are the same NamedTuple clear.
At which point they are basically themselves Differentiables.

This also maxises our expressive power,
as logic things can be done in:

  • the top level rrule function (which can contol all lower levels via closing)
  • or in the the Rule that returns a named tuple of differnetiables
  • the individual Differnetiables of the NamedTuple can be Thunks themsevles.

Also we want to do the same for Tuples as for NamedTuple.
NamedTuples stand in for all structs,
Tuple stand in for tuples.

@ararslan
Copy link
Member Author

Also note that we might not actually need add and mul for NamedTuples if our goal is accumulation; we can extend accumulate for that.

@oxinabox oxinabox transferred this issue from JuliaDiff/ChainRules.jl Aug 2, 2019
@oxinabox
Copy link
Member

oxinabox commented Aug 27, 2019

@willtebbutt do you remember why we decided on a Rule returning a NamedTuple (of Differentials) rather than a NamedTuple of Rules?

That latter feels better to me now.

@willtebbutt
Copy link
Member

willtebbutt commented Aug 27, 2019

I think it was probably something to do with a named tuple of differentials being something that can reasonably participate immediately in whatever computations are required downstream, whereas a named tuple of rules would require further unwrapping etc. For example, if you have a heavily nested namedtuple (e.g. corresponding to the adjoint w.r.t. a struct inside a struct inside a struct etc in Zygote) you'll wind up having to repeat a two-step procedure to recursively evaluate everything as required as you work through the nested named tuple, rather than just having every already evaluated.

I think, in short, it's just simpler to have a named tuple of differentials than a named tuple of rules.

@oxinabox
Copy link
Member

The annoying case is in particular the fact that most function derviative W.R.T self
will be
Rule(i->NamedTuple()) rather than NamedTuple()

Which is byfar the most common case.
Even though it is an edge case.
And for things that do have fields (functors/closures)
then one would need

(a=Rule(i->x), b=Rule(i->x)

Also means all deferring of work now hass to be done via Thunk rather than Rule
but that is much less of a problem and is probably better.

I guess I will define const nil_structure_rule=Rule(i->NamedTuple())

@oxinabox
Copy link
Member

So in the post #30 world we do not have Rules anymore so everything we do in this space is more elegant. At least in terms of having resolves fully the question of Rule returning named tuple, vs named tuple of Rules.
It is now clearly pullback/pushforward returning a namedtuple.

Right now for svd, our only example of returning a NamedTuple,
it is quiet limitted:

Also note that we might not actually need add and mul for NamedTuples if our goal is accumulation; we can extend accumulate for that.

Ever since #16 we don't have add and mul, just + and *
so overloading those is not an option without terrible type-piracy.
So we would have to create out own-types,
Say DNamedTuple and DTuple
but they would only need to support the differential operations,
which seems reasonable really.
They would just apply them recursively.
(recursive extern was already bought up in #47 (comment) about Arrays)

To go with that, we should I think add getindex and getproperty overloads to all abstract differentials. getindex(::One, ::Any) = One() etc
so that if something was expecting a DNamedTuple but that got multiplied away by Zero()*DNamedTuple(...)=Zero() it can still have its fields looked up in accumulate.

Also we should have some macro or function to convert NamedTuples and Tuples to the Differiential type them.
Maybe we can reuse use the differential function for that.

@willtebbutt
Copy link
Member

Say DNamedTuple and DTuple but they would only need to support the differential operations, which seems reasonable really.

This seems reasonable to me. It is kind of sad that we don't have add and mul for precisely these kinds of situations though. If we start encountering lots of types that don't have + and * defined on them in precisely the manner that we need for ChainRules, we're going to have to start creating a lot of wrapper types. Or am I over-estimating the problem?

@oxinabox
Copy link
Member

I don't think we will encounter more (touch wood),
because addition and multiplication have a mathematical definition that we want to follow.

There is an interesting case around things like DateTime. If you say have a function of DateTime, like say the value of the pound stirling, the gradient (fromn the pullback) will be in terms of a Period like milliseconds. You can't add/subtract DateTimes, but you can add/subtract a Period from a DateTime.

@oxinabox
Copy link
Member

The DNamedTuple might want to take in some extra information about what type it was supposed to be (like what struct it is based on),
so that when Externing it, it can maybe get back into that struct?

@oxinabox
Copy link
Member

@willtebbutt and I have concluded That we only need 1 composite type.
and it is Composite and it does both NamedTuples and Tuples.
(for tuples it has numeric keys).

Also @willtebbutt is keen on the idea of it being parameterized byt the original object type,
e.g. SVD

@oxinabox
Copy link
Member

I am thinking
We have 3 kinds of objects*: (these do not correspond to julia types)
primal values, and their paired differential,
and scale-factors.
The core rule is that for
s a scale-factor
v a primal value of kind V
d a differential (of kind V')
Then v + sd must be a primal value in V
In the simple case:
scalar factors are scalars, subtyping Real,
primals and differentials are matrixes of the same size.
Zero() is both a scale factor and a differential for All Kinds of primals.
One() is a scalar factor and a differential for scalar primals only.
The main reason to care about One() when acting s a differential is for doing computation of partial derivatives via forward, or for the initial seed in reverse when you have gone down to a single value like a neural network loss.
So One is not like LinearAlgebra.I because when you multiply a matrix by One(), that matrix is a differential, d in core formula above,
and the core formula only has scalefactor multiplication with a differential.
So it is scalar in that case.
We don’t need to multiply two differentials by each other.
Core formula could be wrong though
Thunk is a differential that is equivelent to the differential being thunked.
(if you thunk something else then it is invalid)
Think of this like the deletation pattern, Wrapped{T<:AbstractT} <:AbstractT
*(plus and maybe differentials that escape their pairings like Wirtinger, but that hopefully will come home)

The interesting one in this framework is the Composite
which is the named-tuple like differential that matches to all stucts.
Because it is nontrival to make the addition of it to its primal (which could be any struct) work
because constructing primals its hard.
But we can do the thing that mostly works and fails when someone makes a primal that can’t be constructed,
and we can make it easy to overload that

@oxinabox
Copy link
Member

Turns out one of the cases we need this for is sum(::Tuple)
because the pullback will return a Tuple.
but you can't add Tuples,
So it should be returnings Composite

@oxinabox
Copy link
Member

I think we do want refine_differential(xs) to be a thing,
for when you are not sure on the type you are producing.
E.g. the naive implementation of sum already almost produces a Tuple or an Array
depending on the type of the input.
But actually if it is an Array then we can leave it as is,
but if it is a Tuple we should convert it the a Composite.

YingboMa added a commit that referenced this issue Jan 11, 2020
Master behavior
```julia
julia> @scalar_rule(one(x), Zero())

julia> frule(one, 1, Zero(), [1, 2])
(1, Zero())

julia> frule(one, 1, Zero(), One())
(1, Zero())
```

Desirable behavior
```julia
julia> @scalar_rule(one(x), Zero())

julia> frule(one, 1, Zero(), [1, 2])
(1, [0, 0])

julia> frule(one, 1, Zero(), One())
(1, Thunk(var"#8#10"())
)
```
YingboMa added a commit that referenced this issue Jan 12, 2020
* Eagerly evaluate scalers rules

Master behavior
```julia
julia> @scalar_rule(one(x), Zero())

julia> frule(one, 1, Zero(), [1, 2])
(1, Zero())

julia> frule(one, 1, Zero(), One())
(1, Zero())
```

Desirable behavior
```julia
julia> @scalar_rule(one(x), Zero())

julia> frule(one, 1, Zero(), [1, 2])
(1, [0, 0])

julia> frule(one, 1, Zero(), One())
(1, Thunk(var"#8#10"())
)
```

* New release

* Add tests

* Revert "Eagerly evaluate scalers rules"

This reverts commit dbe7765.

* Redefine * between ::Zero and ::Any

* Make it nicer

* Add tests and move zero(::AbstractDifferential) to the right folder
@nickrobinson251 nickrobinson251 added the Structural Tangent Related to the `Tangent` type for structured (composite) values label Jan 25, 2020
@oxinabox
Copy link
Member

closed by #59

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Structural Tangent Related to the `Tangent` type for structured (composite) values
Projects
None yet
Development

No branches or pull requests

4 participants