
New interface #27

Merged
merged 86 commits into from
Aug 29, 2019
Conversation

torfjelde
Member

@torfjelde torfjelde commented Jul 8, 2019

EDIT: I missed the latest suggestion from @kai in issue #6: #6 (comment). Seems like a better approach!
EDIT 2: I've now tried incorporating the suggested interface.

Overview

This is a proposal for a new interface related to discussion in issue #6. It introduces

  • Bijector and Inversed{<: Bijector} <: Bijector which represents an invertible transform and its inverse, respectively, with the following methods:
    • transform(b::Bijector, x) and transform(ib::Inversed{<: Bijector}, y): transforms from original variable space to transformed space, and vice versa.
    • inv(b::Bijector, y): simply returns the Inversed{<: Bijector}.
    • logabsdetjac(b::Bijector, x) and logabsdetjac(b::Inversed{<: Bijector}, y): computes the logabsdet of the Jacobian of the transformation and its inverse. Inverse defaults to -logabsdetjac(ib.orig, transform(ib, y))
    • [OPTIONAL] jacobian(b::Bijector, x): this is currently only used in the default impl of logabsdetjac for ADBijector, thus it only requires an impl if logabsdetjac is not implemented beyond the default.
  • Composed{<: Bijector} which allow composing bijectors
    • compose(bs...): returns the Composed object of the bijectors bs
    • ∘(b1::Bijector, b2::Bijector): results in a composition of the two bijectors (first b2, then b1)
  • UnivariateTransformed <: UnivariateDistribution{Continuous} and MultivariateTransformed <: MultivariateDistribution{Continuous}, both of which implement the corresponding Distributions.jl interfaces:
    • logpdf(td, y)
    • rand(td)
    • logpdf_with_jac(td, y): returns both logpdf(td, y) and logabsdetjac(td, y), saving repetitive computation in cases where both of these quantities are wanted. I believe this is often enough to warrant such a method.
  • transformed(d::Distribution, b::Bijector): returns a transformed distribution using b as the bijector.
    • Default transformed(d::Distribution) = transformed(d, DistributionBijector(d)) is provided, where DistributionBijector is a proxy for the link and invlink already in Bijectors.jl (logabsdetjac is provided by AD)

For the user to implement their own UserBijector, the easiest approach is then to make UserBijector{AD} <: ADBijector{AD} and then they only need to implement transform(b::UserBijector, x) and transform(ib::Inversed{UserBijector}, y).
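As a standalone sketch of this contract (the `Log` example bijector and all method bodies here are illustrative, not the package implementation — only the `Bijector`/`Inversed`/`transform`/`logabsdetjac` names follow the proposal):

```julia
# Minimal self-contained sketch of the proposed interface.
abstract type Bijector end

struct Inversed{B<:Bijector} <: Bijector
    orig::B
end

Base.inv(b::Bijector) = Inversed(b)   # inv just wraps the bijector
Base.inv(ib::Inversed) = ib.orig      # inverting twice unwraps it

# A user-defined bijector: x ↦ log(x), with explicit inverse.
struct Log <: Bijector end

transform(b::Log, x) = log(x)
transform(ib::Inversed{Log}, y) = exp(y)

# log |d/dx log(x)| = -log(x) for x > 0
logabsdetjac(b::Log, x) = -log(x)
# Default rule from the proposal: inverse's logabsdetjac via the original's.
logabsdetjac(ib::Inversed{<:Bijector}, y) = -logabsdetjac(ib.orig, transform(ib, y))

b = Log()
x, y = 2.0, log(2.0)
transform(inv(b), transform(b, x)) ≈ x          # round trip
logabsdetjac(inv(b), y) ≈ -logabsdetjac(b, x)   # default inverse rule
```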

Examples

Composition

d = Beta()
b = DistributionBijector(d)
b⁻¹ = inv(b)

id = Identity()

x = rand(d)
(b ∘ b⁻¹)(x) ≈ id(x) # => `true`

ADVI

# Usage in ADVI
d = InverseGamma()
b = DistributionBijector(d)    # (0, ∞) → ℝ
ib = inv(b)                    # ℝ → (0, ∞)
td = transformed(Normal(), ib) # x ∼ 𝓝(0, 1) then f(x) ∈ (0, ∞)
x = rand(td)                   # ∈ (0, ∞)
0 < x < Inf                    # => true ✓

logpdf(td, x) != logpdf(d, x)  # => true ✓
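For context, `logpdf(td, y)` here is just the change of variables formula; a standalone sketch using `exp` as the bijector (the `td_logpdf` helper is made up for illustration, not the package API):

```julia
using Distributions

# Change of variables: if y = f(x) with x ~ d, then
#   logpdf(td, y) = logpdf(d, f⁻¹(y)) + logabsdetjac(f⁻¹, y)
d = Normal()                 # base distribution on ℝ
finv = log                   # inverse of the bijector exp: ℝ → (0, ∞)
ladjinv(y) = -log(y)         # log |(f⁻¹)'(y)| = log |1/y|

td_logpdf(y) = logpdf(d, finv(y)) + ladjinv(y)

# Agrees with the known density of exp(Normal()):
td_logpdf(2.0) ≈ logpdf(LogNormal(), 2.0)
```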

Finite-set Stein Discrepancy (FSSD) test

This is a fun little (and accidental!) "outside-of-Turing"-application of this interface.

I have a package called KernelGoodnessOfFit which performs a goodness-of-fit test for a given model distribution q <: Distribution. The issue with the FSSD test is that it assumes the distribution to have support on the entire real line, and thus does not work for distributions such as Beta. With this interface for Bijectors we can simply plug in the UnivariateTransformed{Beta} and the test works! (in the sense that the false-negative/rejection ratio is less than the specified size of the test)

I know this is a frequentist application; please don't shoot me!

using Bijectors
using KernelGoodnessOfFit

using Random; Random.seed!(123)
using ForwardDiff
function Distributions.gradlogpdf(d::ContinuousUnivariateDistribution, x::Real)
    ForwardDiff.derivative(z -> logpdf(d, z), x)
end

p = Beta(2, 3)         # true distribution
td = transformed(p)    # transformed
q = Beta()             # model distribution

α = 0.05               # size of the test
n = 100                # number of samples
num_experiments = 100  # number of experiments to run

# Under H₀: p = q
total = 0.0            # rejection ratio
for i = 1:num_experiments
    xs = reshape(rand(p, n), 1, :)
    ys = td.transform.(xs)

    res = FSSDopt(ys, td; β_H₁ = 0.01)

    total += res.p_val ≤ α
end
total / num_experiments # => 0.03 ≤ 0.05 (✓)

# Under H₁: p ≠ q
total = 0.0            # rejection ratio
for i = 1:num_experiments
    xs = reshape(rand(q, n), 1, :)
    ys = td.transform.(xs)

    res = FSSDopt(ys, td; β_H₁ = 0.01)

    total += res.p_val ≤ α
end
total / num_experiments # => 0.83 "close" to 1.0 (✓)

It's of course not directly related to what we do in Turing, but it shows a very neat application that immediately follows from this interface.

Normalizing flows (thanks to @sharanry!)

d = MvNormal(zeros(10), ones(10))
b = PlanarLayer(10)

flow = transformed(d, b) # <= TransformedDistribution

y = rand(flow)           # samples from `MvNormal` and transforms using `b`
logpdf(flow, y)          # computes the logpdf using the inverse of `b`

# more efficiently (in certain cases), we can compute logpdf in forward pass
x = rand(d)
logpdf_forward(flow, x)

# Want to fit the flow? No problem!
using Tracker
b = PlanarLayer(10, param)                  # construct parameters using `param`
flow = transformed(d, b)
rand(flow)                                  # <= TrackedArray

x = rand(flow.dist)
Tracker.back!(logpdf_forward(flow, x), 1.0) # backprop through flow

Currently we have PlanarFlow and RadialFlow implemented, but implementing other flows is fairly straightforward.
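The forward-pass trick above is again the change of variables formula, read in the other direction; a standalone sketch with `exp` as the flow (the `my_logpdf_forward` helper is illustrative — only the `logpdf_forward` name comes from the PR):

```julia
using Distributions

# For y = b(x) with x ~ d:
#   logpdf(flow, y)         = logpdf(d, b⁻¹(y)) + logabsdetjac(b⁻¹, y)  # needs b⁻¹
#   logpdf_forward(flow, x) = logpdf(d, x) - logabsdetjac(b, x)         # inverse-free
d = Normal()
ladj(x) = x                  # log |exp'(x)| = x

my_logpdf_forward(x) = logpdf(d, x) - ladj(x)

x = 0.7
# Matches the density of exp(Normal()) evaluated at y = exp(x):
my_logpdf_forward(x) ≈ logpdf(LogNormal(), exp(x))
```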

Issues / Notes

  • Composition in mathematics is read right-to-left, but in the current implementation transform(Composed{<: Bijector}, x) is implemented by applying transforms from left-to-right; should this be changed?
  • Ideally, we'd implement the logabsdetjac "manually" rather than using AD for these known transformations.
  • Currently depend on Turing for the different backend-types; probably don't want to do this. Need to decide how to handle this.
  • Type-stability; difficult with current Composed type.
  • Should we use logdetinvjac or logdetjac? They're equivalent up to a minus sign, but inside the transformed distributions we'll mostly need the logdetinvjac (though we could always do logdetjac(td.transform, transform(td.transform, y))).
    • Default: logabsdetjac(ib::Inversed{<: Bijector}, y) = -logabsdetjac(ib.orig, transform(ib, y))
    • We can then just override logabsdetjac(b::SpecificBijector, y) where needed, and correspondingly for Inversed{<:SpecificBijector}, to get both.
  • The interface is potentially a bit verbose at the moment, can do:
    • Rename fields of the transformed distributions to d::Distribution and b::Bijector
    • transform(d::Distribution, y) = transform(d.transform, y) and similarly for the inverse

@xukai92
Member

xukai92 commented Jul 10, 2019

Some update of #6 (comment):

I think we should add a transform that returns the transformed r.v. only, and a forward that returns both the transformed r.v. and the logdetabsjacob. Why? Because in some cases we only need to sample, so the transformed r.v. is necessary while computing the logdetabsjacob would be a waste of time.
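A minimal standalone sketch of this split (the `Exp` bijector and the NamedTuple return shape are illustrative assumptions; only the `transform`/`forward` names come from the suggestion):

```julia
struct Exp end

transform(::Exp, x) = exp(x)
logabsdetjac(::Exp, x) = x   # log |d/dx exp(x)| = x

# `transform` alone suffices for sampling; `forward` returns both quantities,
# sharing any intermediate work where possible.
forward(b::Exp, x) = (rv = transform(b, x), logabsdetjac = logabsdetjac(b, x))

res = forward(Exp(), 1.5)
res.rv ≈ exp(1.5)            # transformed r.v.
res.logabsdetjac == 1.5      # jacobian term, only paid when asked for
```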

@xukai92
Member

xukai92 commented Jul 10, 2019

I will give it a try if things haven't been moved here.

@torfjelde
Member Author

@xukai92 Agreed!

@torfjelde torfjelde mentioned this pull request Jul 11, 2019
@willtebbutt
Collaborator

I'm slightly itchy about using a vector to implement the composition operation. As you point out @torfjelde it will probably create some type instability issues. Could we instead use a Tuple so that we can guarantee type stability? Or do we imagine that we're going to need to compose a large number of heterogeneous transformations?

@torfjelde
Member Author

torfjelde commented Jul 16, 2019

@willtebbutt After messing about with trying to get type-stability for VI, I think making it a Tuple is a good idea. If we also make compose(ts...) = Composed(ts...), which @xukai92 had in his original proposal, the constructor of Composed also becomes type-stable. Granted, this means we lose the "optimization" of turning b ∘ b⁻¹ = Identity(). But here's a quick test, using compose(ts...) = Composed(ts...):

@btime b1 ∘ b2
# 1.7 μs for Array
# 600 ns for Tuple with same constructor as Array
# 25 ns for `Compose(ts...)` with Tuple

So it seems worth it to leave the "optimization" up to the user.

EDIT: though it's going to need some work/thought to "unwrap" compositions of Composed to get type-stability.
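The type-stability difference between the two containers can be seen with a minimal pair of illustrative types (names made up for the sketch, not the actual implementation):

```julia
struct ComposedVec
    ts::Vector{Any}          # element types erased at compile time
end

struct ComposedTup{T<:Tuple}
    ts::T                    # element types carried in the type parameter
end

cv = ComposedVec(Any[sin, cos])
ct = ComposedTup((sin, cos))

# The compiler knows the exact type of every element of `ct.ts`, so calls
# through it can be specialised; `cv.ts[1]` is only ever known to be `Any`,
# forcing dynamic dispatch.
ct.ts isa Tuple{typeof(sin), typeof(cos)}   # true
eltype(cv.ts) == Any                        # true
```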

@willtebbutt
Collaborator

@torfjelde could you explain the difference between your tuple implementation and splatted implementation please? I had imagined something like

struct Composed{Tts<:Tuple} <: Bijector
    ts::Tts
end

so the type of each of the things being composed should be known. Alternatively, we could just not type the stuff that's contained e.g.

struct Composed{Tts} <: Bijector
    ts::Tts
end

and leave it up to the user to use their preferred iterable data structure i.e. they could choose to use either a Vector or some Tuple

@xukai92
Member

xukai92 commented Jul 16, 2019

Using Tuple sounds like a very good idea for me.

I also thought about this but chose to use a vector only because I wanted an easy way to push! to a composed transformation, plus for deep learning the computational bottleneck was on the GPU. But certainly we need this for Turing.jl.

@willtebbutt
Collaborator

I also thought about this but chose to use a vector only because I wanted an easy way to push! to a composed transformation

Fair enough @xukai92. How critical is the push! interface? Does the composition interface suffice in your opinion?

@torfjelde
Member Author

@torfjelde could you explain the difference between your tuple implementation and splatted implementation please?

@willtebbutt The difference is in the constructor. For the tuple-impl we have to do:

compose(ts...) = Composed(ts)  # <= type-stable

for it to be type-stable. If we drop type-stability from the constructor we can do

function compose(ts...) # <= NOT type-stable
    res = []
    
    for b ∈ ts
        if b isa Composed
            # "lift" the transformations
            for b_ ∈ b.ts
                push!(res, b_)
            end
        else
            if (length(res) > 0) && (res[end] == inv(b))
                # remove if inverse
                pop!(res)
            else
                push!(res, b)
            end
        end
    end

    length(res) == 0 ? IdentityBijector : Composed([res...])
end

which will

  1. "Un-wrap" composed Compositions so that we don't get nested compositions.
  2. Drop sequential application of a bijector and its inverse, i.e. b ∘ inv(b) == Identity().

If we want type-stability in the constructor, I don't see how we can implement the above. I think 1. is quite preferable to have, while 2. can always just be left to the user.

We can of course drop type-stability of the constructor but keep the Tuple implementation. We'd still benefit from type-stability, but less so. This is sort of similar to what we're currently doing for TypedVarInfo: a not type-stable constructor, but type-stable operations. I'm just thinking we might be composing functions more often than creating instances of VarInfo, so having a type-stable constructor in this case would be preferable.

Alternatively, we could just not type the stuff that's contained

Agree; that's what I did locally to try it out.

@xukai92
Member

xukai92 commented Jul 17, 2019

I also thought about this but chose to use a vector only because I wanted an easy way to push! to a composed transformation

Fair enough @xukai92. How critical is the push! interface? Does the composition interface suffice in your opinion?

It would work but just a little bit trickier as I was playing with an idea on adding or removing the transformations during training. I could build a new model each time I need to change, or I can push or pop.

@willtebbutt The difference is in the constructor. For the tuple-impl we have to do:

As I said in Slack, I feel this is a little bit of over-optimisation. If it doesn't give much of a performance gain, we'd better not do this automatic change, which users won't intuitively expect.

@xukai92
Member

xukai92 commented Jul 17, 2019

Or maybe add a flag to trigger this behaviour.

@torfjelde
Member Author

It would work but just a little bit trickier as I was playing with an idea on adding or removing the transformations during training. I could build a new model each time I need to change, or I can push or pop.

One thing to note about this though, depending on how often you add/remove, is that with the type-stable Bijector this would lead to possibly noticeable compilation overhead. But if we do the "generic" container approach, the user can then just choose to use Array instead.

@willtebbutt
Collaborator

As I said in Slack, I feel this is a little bit of over-optimisation. If it doesn't give much of a performance gain, we'd better not do this automatic change, which users won't intuitively expect.

The performance gain really depends on the context: as you say if we're working in the DL setting, then this optimisation is unnecessary. However, if we're expecting to use these in other settings then it's probably not unreasonable to ask for type stability. The other option is, again, to provide both interfaces and just expect the push! interface to work when a Vector of bijectors is provided, and code to be type-stable when a Tuple of bijectors (or some other type-stable container) is provided.

Another alternative is to have two different types, which have different performance characteristics and provide different behaviour. I don't have any good suggestions for names though.

@torfjelde : thanks for the explanation of your implementations

"Un-wrap" composed Compositions so that we don't get nested compositions.

I agree that this is generally tricky to do in a type-stable way, but it's not clear to me why we mind if the user creates nested compositions. Is there something obvious that I'm missing?

Drop sequential application of bijector and it's inverse, i.e. b ∘ inv(b) == Identity()

I agree with you that this is a nice-to-have rather than something that's especially important.

@xukai92
Member

xukai92 commented Jul 17, 2019

The performance gain really depends on the context: as you say if we're working in the DL setting, then this optimisation is unnecessary. However, if we're expecting to use these in other settings then it's probably not unreasonable to ask for type stability. The other option is, again, to provide both interfaces and just expect the push! interface to work when a Vector of bijectors is provided, and code to be type-stable when a Tuple of bijectors (or some other type-stable container) is provided.

I'm talking about optimizations for the type stable implementations, i.e. the un-wrapping and getting rid of inverse operations. I completely agree that we need type stable version.

@xukai92
Member

xukai92 commented Aug 27, 2019

Composition in mathematics is read right-to-left, but in the current implementation transform(Composed{<: Bijector}, x) is implemented by applying transforms from left-to-right; should this be changed?

Is this still true?

@torfjelde
Member Author

Composition in mathematics is read right-to-left, but in the current implementation transform(Composed{<: Bijector}, x) is implemented by applying transforms from left-to-right; should this be changed?

Is this still true?

Yes; and it has already caused some confusion I believe. At least, I'm very much for making compose(ts...) = Composed(reverse(ts)). The user should really never have to write Composed((b1, b2, b3)) anymore, so I think changing the order of Composed.ts isn't as crucial (and will require a bit more work).

@xukai92
Member

xukai92 commented Aug 27, 2019

I'm pro to stick to the mathematical convention. I think the current implementation already does so right?

@torfjelde
Member Author

I'm pro to stick to the mathematical convention. I think the current implementation already does so right?

Internally, i.e. in Composed.ts, we represent it in a left-to-right manner. So Composed.ts[1] is applied, then Composed.ts[2], and so on. This is the opposite of mathematical convention, but it's quite convenient for a recursive implementation (though not necessary). But ∘ is implemented so that b1 ∘ b2 == Composed((b2, b1)), which results in b1(b2(x)), which is what you'd expect from ∘.

Currently, compose(ts...) = Composed(ts), but maybe this should be compose(ts...) = Composed(reverse(ts)), OR we re-implement the transform of Composed to apply Composed.ts[end] first, and so on. My initial thought is that this will require an additional reverse(cb.ts) in the implementation, which is not as performant, right?

@xukai92
Member

xukai92 commented Aug 27, 2019

I feel it's better to make ∘ and compose consistent somehow. compose(ts...) = Composed(reverse(ts)) sounds good, but on the user side it would cause confusion IMO. What if I just want a transform to do 1, 2 and 3? Users have to construct things by reversing them first.

Maybe what we can do is drop compose but implement composel(ts) and composer(ts), which are simply equivalent to foldl(∘, ts) and foldr(∘, ts). In this case things are consistent and clear, indicated by l and r, and users have the choice to use whichever one is convenient.

@torfjelde
Member Author

I feel it's better to make ∘ and compose consistent somehow. compose(ts...) = Composed(reverse(ts)) sounds good, but on the user side it would cause confusion IMO. What if I just want a transform to do 1, 2 and 3? Users have to construct things by reversing them first.

Maybe what we can do is drop compose but implement composel(ts) and composer(ts), which are simply equivalent to foldl(∘, ts) and foldr(∘, ts). In this case things are consistent and clear, indicated by l and r, and users have the choice to use whichever one is convenient.

That's not a bad idea 👍 Only thing is though, composel(ts...) = foldl(∘, ts) actually leads to right-to-left application because ∘ reverses the order, right? I find this a bit confusing, as I'd expect composel to give me application from left-to-right. I'm also not certain whether the fact that the user has to reverse 1, 2, and 3 is much of a problem. If you're already putting them in a list or something, doing it in the reverse order isn't much of an issue. But I'd be fine with doing what you suggested, though maybe composel(ts...) = foldr(∘, ts) instead?

There's another thing I've been thinking about; you can do multiple dispatch on something like Composed{B1, B2, B3} where B1, B2, and B3 are specific types. If all the "standard" ways of constructing a Composed are such that we get nested compositions, it becomes difficult to do something like what I just mentioned. Because of this, I would like to have at least one way to construct a composition without nesting.
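A standalone sketch of the kind of whole-chain dispatch meant here (all types and the `simplifies_to_identity` method are illustrative):

```julia
abstract type Bijector end
struct Exp <: Bijector end
struct Log <: Bijector end
struct Composed{T<:Tuple} <: Bijector
    ts::T
end

# A method that fires only for the exact flat chain "Log then Exp":
simplifies_to_identity(::Composed{Tuple{Log, Exp}}) = true
simplifies_to_identity(::Composed) = false

simplifies_to_identity(Composed((Log(), Exp())))   # true — chain visible in the type
simplifies_to_identity(Composed((Exp(), Log())))   # false

# Nesting hides the chain from dispatch: the inner Composed's element types
# do not appear in the outer Tuple parameter.
simplifies_to_identity(Composed((Composed((Log(),)), Exp())))  # false
```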

@xukai92
Member

xukai92 commented Aug 27, 2019

I guess the way of linking composel and composer to foldl and foldr introduces extra confusion. Let's forget about foldl and foldr and simply say composel means applying things from the left and composer means applying things from the right. How does this sound?

If all the "standard" ways of constructing a Composed are such that we get nested compositions, it becomes difficult to do something like what I just mentioned.

I didn't mean to get nested compositions via composel and composer. Simply composel(ts...) = Composed(ts) and composer(ts...) = Composed(reverse(ts)).
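Under these definitions, a standalone sketch (with made-up `Shift`/`Scale` bijectors and a transform that applies `ts[1]` first, matching the internal convention discussed earlier) behaves as:

```julia
abstract type Bijector end
struct Composed{T<:Tuple} <: Bijector
    ts::T
end

# Apply ts[1] first, then ts[2], and so on.
transform(cb::Composed, x) = foldl((acc, b) -> transform(b, acc), cb.ts; init = x)

composel(ts::Bijector...) = Composed(ts)            # apply from the left
composer(ts::Bijector...) = Composed(reverse(ts))   # apply from the right

struct Shift <: Bijector; c::Float64; end
struct Scale <: Bijector; a::Float64; end
transform(b::Shift, x) = x + b.c
transform(b::Scale, x) = b.a * x

transform(composel(Shift(1.0), Scale(2.0)), 3.0)  # (3 + 1) * 2 = 8.0
transform(composer(Shift(1.0), Scale(2.0)), 3.0)  # 3 * 2 + 1 = 7.0
```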

@torfjelde
Member Author

Ah, follow 👍 Yeah that sounds good!

struct Log <: Bijector end
struct Exp <: Bijector end

(b::Log)(x) = @. log(x)
(b::Exp)(y) = @. exp(y)


We often need to apply this to just one of the dimensions. Any ideas on how to go about this?

Member Author


@xukai92 and I were discussing this earlier, and we feel batch computation is something we'll have to have a proper discussion about after this PR is merged. For now assume batch-computation is not supported :/ The difficulty is that in a lot of cases it's ambiguous whether or not something is a batch. And we want to make sure that if we say we support batch-computation, then it's supported for all bijectors, not just on a case-by-case basis.

I know you use it a lot though, so I'd suggest making a different branch where you try to accommodate batch-computation and then we can potentially use this when we're considering the approach we're going to take to support it:)

Discussion: #27 (comment)

Member


@sharanry @torfjelde Let's experiment with this feature in a different branch. I'm going to merge this PR soon.

@xukai92 changed the title from "[WIP] New interface" to "New interface" Aug 29, 2019
@xukai92
Member

xukai92 commented Aug 29, 2019

We also need to increment the version number in Project.toml. Since this is a major interface introduction, shall we increment to 0.4.0 @yebai ?

@yebai
Member

yebai commented Aug 29, 2019

We also need to increment the version number in Project.toml. Since this is a major interface introduction, shall we increment to 0.4.0 @yebai ?

Sure, 0.4 sounds good to me.

@xukai92
Member

xukai92 commented Aug 29, 2019

@torfjelde Can you increment the version please?

@torfjelde
Member Author

@torfjelde Can you increment the version please?

Done!

@yebai yebai merged commit 6ea2c37 into TuringLang:master Aug 29, 2019