Replies: 2 comments 32 replies
I like this a lot, because it's trying to deal with the same things that we're trying to solve in KernelFunctions.jl by insisting that everything subtype … I'm pro this. Yay for pooling resources. I was wondering: suppose that I have a collection of things of type …
I've outlined a proposal for an AbstractArraysOfArrays (or other name) API here: JuliaStats/StatsBase.jl#518 (comment). @torfjelde and @ToucheSir, I think such a common API for nested arrays would provide a good basis for efficient batched (and possibly GPU-compatible) transformations.
The 19th of February, 2020 was a dark day in the history of Bijectors.jl. An absolute idiot decided it was a good idea to encode the "dimensionality" in the bijector type, i.e. `Bijector{N}`. Since then the code has been riddled with `N` everywhere and seemingly redundant implementations for different inputs, all of which could have been avoided if it wasn't for this idiot.

This idiot was me. And now I'm back, still an idiot, but an idiot who has been burned one too many times.
## Why do we care
In Bijectors.jl we have `logabsdetjac`, for which we need to disambiguate whether a `Matrix` is a single matrix input, i.e. the bijector is acting on the space of matrices, in which case the return-value is a `Real`, or if it's a collection of vectors (reals), in which case the return-value is a `Vector` (`Matrix`).

More about this can be found in the original issue: #35.
## Current implementation
The solution of the above issue was to include the dimensionality of the input-space in the bijector, i.e. `Bijector` became `Bijector{N}`, with `0` representing `Real`.

This works but it sucks, and leads to a lot of redundant code, e.g.
`Bijectors.jl/src/bijectors/exp_log.jl`, lines 33 to 40 at 694db6f
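For flavour, here is a hand-written sketch (not the actual excerpt from the file referenced above) of the kind of redundancy that `Bijector{N}` forces on every bijector:

```julia
# With the dimensionality baked into the type, every bijector ends up with one
# method per input "dimensionality", even when the bodies are nearly identical.
abstract type Bijector{N} end
struct Exp{N} <: Bijector{N} end

(b::Exp{0})(x::Real) = exp(x)
(b::Exp{1})(x::AbstractVector{<:Real}) = exp.(x)

logabsdetjac(b::Exp{0}, x::Real) = x
logabsdetjac(b::Exp{1}, x::AbstractVector{<:Real}) = sum(x)
# ...and yet another method so a Matrix is treated as a batch of column vectors:
logabsdetjac(b::Exp{1}, x::AbstractMatrix{<:Real}) = vec(sum(x; dims=1))
```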
And it's really annoying to deal with this when implementing a new bijector.

Aaaaand there are sooo many examples of users mistakenly applying bijectors to the "wrong" dimensionality (also, we need docs, but I'll address that separately from this issue).
## New solution?
Make `Bijector{N}` into `Bijector` again! Easy.

And introduce a simple `Batch` struct:
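Something along these lines, I imagine (just a sketch; exact fields, names, and constructors are very much up for discussion):

```julia
# A thin wrapper that marks "this is a batch of inputs" rather than a single input.
abstract type AbstractBatch end

# Batch stored as a vector of individual inputs, e.g. a Vector of Vectors.
struct VectorBatch{V<:AbstractVector} <: AbstractBatch
    value::V
end

# Batch stored as a single array, with the last dimension as the batch-dimension.
struct ArrayBatch{A<:AbstractArray} <: AbstractBatch
    value::A
end

# Unwrap the underlying collection.
value(batch::AbstractBatch) = batch.value

# Convenience constructor: pick a wrapper based on how the inputs are stored.
Batch(xs::AbstractVector{<:AbstractVector}) = VectorBatch(xs)
Batch(xs::AbstractArray{<:Real}) = ArrayBatch(xs)
```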
Batch-computation in Bijectors.jl then goes as follows, where `b` is a `Bijector` and `xs` is a "collection of inputs":

1. Wrap `xs` in a `Batch`, i.e. `batch = Batch(xs)`.
2. Define methods for `Batch` where we simply unwrap using `value(batch)`, compute, and then wrap again in `Batch(out)`, e.g. the default method of `logabsdetjac` could be `logabsdetjac(b::Bijector, xs::VectorBatch) = Batch(map(x -> logabsdetjac(b, x), value(xs)))`.

This is nice because it allows the following:
- For `VectorBatch` we have defaults using `Batch(map(..., value(xs)))`, while for `ArrayBatch` we have defaults using `mapslices`[^1], assuming the last dimension is the batch-dimension[^2].
- We can specialize directly on `value(xs)`, e.g. `Exp` (as an elementwise transformation) can be applied to any array-size by using broadcasting, and so it would make sense to implement this specifically as `(b::Exp)(xs::ArrayBatch) = Batch(b(value(xs)))` and `(b::Exp)(x) = exp.(x)`[^3].
- The output is also just a `Batch` :) E.g. take `b = Reshape(in_size, out_size)` with `length(in_size) == 1` and `length(out_size) == 2`. Then `ys = b(ArrayBatch(xs))` where `size(xs) == in_size` will be a `Batch` now containing an array of size `(out_size..., num_batches)`: `Batch{Matrix{T}}` became `Batch{Array{T, 3}}`. This needs to be supported since we don't want to assume that the input-type is the same as the output-type.

So all in all this looks really nice!
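To spell out what those defaults could look like, here's a sketch that continues the `Batch` wrapper sketch from above (the `Exp` here is a toy stand-in, not the actual Bijectors.jl implementation):

```julia
# Toy elementwise bijector for illustration.
struct Exp end
(b::Exp)(x) = exp.(x)
logabsdetjac(b::Exp, x::AbstractVector{<:Real}) = sum(x)

# Default for a batch stored as a vector of inputs:
# map over the elements and wrap the result in a Batch again.
logabsdetjac(b, xs::VectorBatch) = Batch(map(x -> logabsdetjac(b, x), value(xs)))

# Default for a batch stored as a matrix with columns as inputs:
# apply to each column (see footnote 1 re: mapslices and CUDA).
logabsdetjac(b, xs::ArrayBatch{<:AbstractMatrix}) =
    Batch(vec(mapslices(x -> logabsdetjac(b, x), value(xs); dims=1)))

# Elementwise bijectors like Exp can skip the slicing and just broadcast
# over the whole underlying array.
(b::Exp)(xs::ArrayBatch) = Batch(b(value(xs)))

# Usage:
xs = [randn(3) for _ in 1:5]       # 5 vector-valued inputs
logabsdetjac(Exp(), Batch(xs))     # Batch wrapping a 5-element Vector
```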
## Issues
Having both `(b::Exp)(xs::ArrayBatch)` and `(b::Exp)(x)` won't work. Instead we'll have to do `(b::Exp)(x::AbstractArray{<:Real})` to resolve the ambiguity. Not cool. Ways to possibly address:

1. Use "private" methods for `Bijector`, e.g. `_transform` and `_logabsdetjac`. This way we can implement generic behavior in the "non-private" methods, e.g. `logabsdetjac` (see the sketch after this list). This is a bit annoying for users when implementing their own bijectors, but it does simplify things significantly + allows further extensions than those mentioned here[^4].
2. Always require explicitly typed signatures, e.g. `(b::Exp)(x::AbstractArray{<:Real})`. I don't like this. E.g. for `NamedBijector` we don't want to do this, since the implementation really only relies on the input having `getproperty` with the corresponding symbols, i.e. if we don't specify the type we can even pass it a `ComponentArray` from ComponentArrays.jl and it will just work.
3. Use `map` as in KernelFunctions.jl. This is better than (2), but not quite to my fancy. This can get a bit nasty in Bijectors.jl IMO, since we have to overload `map(::typeof(logabsdetjac), b::Bijector, xs::Batch)`, which seems to break with what you'd expect from `map`.
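A rough sketch of option (1), reusing the `Batch` wrapper sketch from above (the names `_transform` / `_logabsdetjac` are just illustrative, not existing Bijectors.jl internals):

```julia
# The public methods own the generic "single input vs. batch" logic once:
transform(b, x)     = _transform(b, x)
logabsdetjac(b, x)  = _logabsdetjac(b, x)
transform(b, xs::VectorBatch)    = Batch(map(x -> _transform(b, x), value(xs)))
logabsdetjac(b, xs::VectorBatch) = Batch(map(x -> _logabsdetjac(b, x), value(xs)))

# A new bijector then only implements the "private" single-input methods and
# never has to think about batches (or method ambiguities):
struct MyScale
    a::Float64
end
_transform(b::MyScale, x::AbstractVector{<:Real}) = b.a .* x
_logabsdetjac(b::MyScale, x::AbstractVector{<:Real}) = length(x) * log(abs(b.a))
```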
## To discuss
How this relates to `ColVecs` and `RowVecs` in KernelFunctions.jl (can be read about here: https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/blob/wct/docs-update/docs/src/api.md#why-abstractvectors-everywhere). A couple of differences:

- KernelFunctions.jl uses `map` on batches.
- `ColVecs` and `RowVecs` are subtypes of `AbstractVector{T}`. This is a good idea, as outlined in the docs linked above. It might be a good idea to also apply it here, i.e. make `abstract type AbstractBatch{T} <: AbstractVector{T} end`, where `T` represents the type of each of the elements in the batch. This gives us a natural interpretation of certain things you expect from a batch-type, e.g. length, indexing, iteration. It also means that if we implement iteration, we can potentially interact nicely with other packages that already use `AbstractVector{T}` to represent a batch, e.g. `ColVecs`: batches using both packages will just work.
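A sketch of what that could buy us (this is an alternative to the plain wrapper sketch further up, and the names are again just placeholders):

```julia
# A batch that *is* an AbstractVector of its elements: length, indexing,
# iteration, map, collect, etc. all come from the standard array interface.
abstract type AbstractBatch{T} <: AbstractVector{T} end

struct VectorBatch{T} <: AbstractBatch{T}
    value::Vector{T}
end

Base.size(xs::VectorBatch) = size(xs.value)
Base.getindex(xs::VectorBatch, i::Int) = xs.value[i]

# Usage: anything that expects an AbstractVector of inputs now just works.
xs = VectorBatch([randn(3) for _ in 1:5])
length(xs)    # 5
xs[1]         # the first 3-element input
map(sum, xs)  # 5-element Vector of per-input sums
```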
## Footnotes

[^1]: A couple of notes on this: `mapslices` isn't supported by CUDA.jl properly (JuliaGPU/CUDA.jl#807), so maybe `mapslices` isn't the best idea.
[^2]: Actually, it might be a better idea to make these batch-types separate and allow specification of the dimensionality to iterate over, e.g. something like `ArrayBatch(value, dim)`. It might also be a good idea to separate these for other reasons, i.e. to specify orientation like `ColVecs` and `RowVecs` in KernelFunctions.jl.
[^3]: This particular example won't work because of method-ambiguity, but we'll get back to this later.
[^4]: E.g. add checks for arguments, or allow usage of `@thunk` from ChainRulesCore.jl so that we only have to define `_forward(b::MyBijector, x)` and then `_transform(b::MyBijector, x) = _forward(b, x).rv` + similarly for `_logabsdetjac`, with the default impl `forward(b::MyBijector, x) = unthunked(_forward(b, x))`. This idea was brought to my attention by @devmotion in a previous issue.