Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support InverseFunctions.jl and ChangesOfVariables.jl #199

Closed
2 tasks done
oschulz opened this issue Sep 26, 2021 · 65 comments · Fixed by #212
Closed
2 tasks done

Support InverseFunctions.jl and ChangesOfVariables.jl #199

oschulz opened this issue Sep 26, 2021 · 65 comments · Fixed by #212

Comments

@oschulz
Copy link
Collaborator

oschulz commented Sep 26, 2021

To do

  • Create lightweight packages that define an interface for inverse functions and log-abs-det-jacobians
  • Use InverseFunctions and ChangesOfVariables in Bijectors (and other packages ...)

Context

Bijectors is really neat, but it's also kind of a heavy package with many dependencies (necessarily).

I have some variable transformation stuff in BAT.jl that I want to split out into a separate package, and I've recently been experimenting with some normalizing flow trafos - I'd like to make that kind of stuff compatible with Bijectors, but it would be to heavy as a dependency.

Would you guys (CC @devmotion) be interested in setting up a lightweight variable-trafo-interface package? Maybe just something like

function with_logabsdet_jacobian end

function with_logabsdet_jacobian(f::Base.ComposedFunction)(x)
    y_inner, ladj_inner = with_logabsdet_jacobian(f.inner, x)
    y, ladj_outer = with_logabsdet_jacobian(f.outer, y_inner)
    (y, ladj_inner + ladj_outer)
end


function inverse end

inverse(trafo::Base.ComposedFunction) = Base.ComposedFunction(inverse(f.inner), inverse(f.outer))

inverse(f) = inv(f)  # Until JuliaLang/julia#42421 is decided one way of the other

with_logabsdet_jacobian would be the equivalent of Bijectors.forward, while f(x) would be used to just run the transform without calculating LADJ. So the behavior would be

mytrafo(x) == y

with_logabsdet_jacobian(mytrafo, x) == (y, ladj)

inverse(trafo2∘ trafo1)(x) == inverse(trafo1)(inverse(trafo2)(x))

No autodiff or anything, that we'd leave to implementations like Bijectors.Bijector.

Long-term, it would be nice to get rid of inverse in favor of inv, if we can get inv(::ComposedFunction) into Base (just opened an issue in that regard: JuliaLang/julia#42421).

I'd volunteer to prototype a package (would be quite tiny, basically just the code above) if there's interest.

Edit: Replaced special argument type WithLADJ in proposal with function with_logabsdet_jacobian, as suggested by @devmotion

@oschulz
Copy link
Collaborator Author

oschulz commented Sep 29, 2021

I think it would be neat if we had a such a common interface that would allow combining transformations from different packages in workflows that need LADJ and inverse calculation. Could be called "TransformationTraits.jl" or so. It would like ChainRulesCore in spirit, just for LADJ instead of the gradients, even more lightweight, and with the option to define an inverse.

@willtebbutt maybe such a package/interface would be interesting for ParameterHandling.jl as well?

@oschulz
Copy link
Collaborator Author

oschulz commented Sep 29, 2021

CC @oxinabox

@willtebbutt
Copy link
Collaborator

I'd be very pro- finding a way to make the indirect deps that Bijectors introduces in ParameterHandling much lighter. A number of people have commented on Bijectors being quite a heavy dependency.

I'm not totally sure that a BijectorsCore package would necessarily help ParameterHandling though, because ParameterHandling would continue to need to depend on Bijectors in order to get the actual bijections -- in this sense, ParameterHandling is analogous to an AD in the ChainRules world, rather than a package which defines a couple of rules.

@oschulz
Copy link
Collaborator Author

oschulz commented Sep 29, 2021

[...] introduces in ParameterHandling much lighter. A number of people have commented on Bijectors being quite a heavy dependency.

Yes, I recently looked at using ParameterHandling in a package, but the transitive deps would be a bit heavy (otherwise I like the concept of ParameterHandling very much!).

BijectorsCore package would necessarily help ParameterHandling though [...]

Ah right. Still, I hope the idea of a lightweight interface-package for transformations can find some support - after all, we've been very successfull with this approach in the ecosystem. Bikeshedding-wise I wouldn't call it "BijectorsCore", it could also cover non-bijective/invertible mappings (or transformations that are in bijective in principle, but for which the inverse is hard to calculate but also not required).

@willtebbutt
Copy link
Collaborator

Ah right. Still, I hope the idea of a lightweight interface-package for transformations can find some support

Yeah -- I'm totally in favour of this. It's the kind of thing that I might even use to define the defaults in ParameterHandling so as to avoid the bijectors dep.

@devmotion
Copy link
Member

Most dependencies are quite lightweight actually, the main exemptions are NonlinearSolve and Distributions. I made some improvements and type inference fixes in Roots recently which (this was the plan) would allow us to switch back from NonlinearSolve to the much more lightweight Roots package. Another issue is Requires which eg increases loading times if you actually use the optional dependencies.

I guess a core interface package could be useful even though I am not sure if it would have to depend on Distributions as well (if yes, it would maybe be another motivation for a lightweight DistributionsBase package).

@oschulz
Copy link
Collaborator Author

oschulz commented Sep 29, 2021

I am not sure if it would have to depend on Distributions as well

What I had in mind would literally just be the outline above, so really super-lightweight and no dependencies beyond Base.

@devmotion, would you support such a super-lightweight "TransformationTraits" (or similar name) package in Bijectors? We could host it at a central place like JuliaMath or JuliaStats. It would just mean adding something like

TransformationTraits.with_logabsdet_jacobian(b::Bijector, x) = forward(b, x)
TransformationTraits.inverse(b::Bijector) = inv(b)

or maybe even making forward just an alias for with_logabsdet_jacobian (for backward compatibility).

If "TransformationTraits" is lightweight enough, we should be able to convince packages like DiffEqFlux to support it as well, e.g. for the ffjord normalizing flows and similar.

@devmotion
Copy link
Member

I guess you should ask @torfjelde, he's the Bijectors god 😄

I wonder if one should not define a struct WithLADJ (seems a bit strange at first glance) but rather make a function with_logabsdet_jacobian(f, x) part of the interface that is supposed to return a tuple of f(x) and the log absolute determinant of the Jacobian. Generally, I am also a fan of more descriptive names as you can see 😄

@oschulz
Copy link
Collaborator Author

oschulz commented Sep 30, 2021

@torfjelde, sorry for not pulling you in earlier! Would this be Ok with you?

@oschulz
Copy link
Collaborator Author

oschulz commented Sep 30, 2021

I wonder if one should not define a struct WithLADJ [...] but rather make a function with_logabsdet_jacobian(f, x)

Yes, I was torn between these options myself. I liked having the WithLADJ() argument because then a function call will stay a function call. But then, we do use op_on_func(f, args...) for all autodiff stuff, and so on.

I think you're right, better use a function than a special argument. I've updated the code above accordingly.

@oschulz
Copy link
Collaborator Author

oschulz commented Sep 30, 2021

I think the whole package would literally just be

function with_logabsdet_jacobian end

function with_logabsdet_jacobian(f::Base.ComposedFunction)(x)
    y_inner, ladj_inner = with_logabsdet_jacobian(f.inner, x)
    y, ladj_outer = with_logabsdet_jacobian(f.outer, y_inner)
    (y, ladj_inner + ladj_outer)
end

function inverse end

inverse(trafo::Base.ComposedFunction) = Base.ComposedFunction(inverse(f.inner), inverse(f.outer))

inverse(f) = inv(f)  # Until JuliaLang/julia#42421 is decided one way or the other

Maybe we should add support for with_logabsdet_jacobian(Base.Fix1(broadcast, f), Xs...) to that, it would help keep code that uses broadcasted transformations short and readable.

With the with_logabsdet_jacobian suggested by @devmotion we can be even more generic and support transformations (a, b) = f(x, y, z) as ((a, b), ladj) = with_logabsdet_jacobian(f, x, y, z).

One interesting option might be to also define inverse for elementary unary functions commonly used in variable transformations like exp, log, etc. It would be a very finite list, so it wouldn't make the package too heavy. Moving this into Base with inv could become part of the proposal in JuliaLang/julia#42421 , if supported.

In any case, the change/additions to Bijectors.jl would just be

const forward = TransformationTraits.with_logabsdet_jacobian
TransformationTraits.inverse(b::Bijector) = inv(b)   # Until JuliaLang/julia#42421 is decided one way or the other

or similar.

@torfjelde
Copy link
Member

First off, sorry for the delay in response here! Been busy with a first year evaluation for my PhD, so have been very much off the grid for the past week.

I'm very much in favour of making a lightweight interface package:)
We had a discussion about whether Bijectors.jl was too heavy or not not long ago, and how it's unfortunate that we have direct dependencies on packages such as Distributions.jl (IMO the transformations should be separate from the interop with Distributions.jl).
So personally very happy to support something like this.

@oschulz
Copy link
Collaborator Author

oschulz commented Oct 4, 2021

Thanks @torfjelde (hope your Phd eval went smoothly!). Ok, then how about I prototype a package, based on the outline above?

I would include implementations of with_logabsdet_jacobian and inverse for log, log10, log2, log1p, exp, exp2, exp10, expm1, inv, adjoint, transpose, as log and exp are often used to transform between bounded and unbounded variables and inv, adjoint and transpose are the fundamental self-inverse operations (so their implementations will be trivial). Shouldn't increase the measurable load time of the package and can't be done outside of it without committing type-piracy.

Bikeshedding: @torfjelde , @devmotion and @willtebbutt : Do you have any preference regarding package name? I was thinking "TransformationTraits.jl", "TransformationIntercase.jl", "VarTrafoTraits.jl", "VarTrafoInterface.jl" or so.

@devmotion
Copy link
Member

Hmm maybe it would be good to use a name that makes it clear which transformations the package deals with (transformation of measures/probability distributions/densities)? I think TransformationTraits etc. is a bit too general. And I think I would prefer something like XXXInterface, XXXCore or XXXBase instead of XXXTraits.

@willtebbutt
Copy link
Collaborator

No strong preference on the name :)

@oschulz
Copy link
Collaborator Author

oschulz commented Oct 4, 2021

Hmm maybe it would be good to use a name that makes it clear which transformations the package deals with (transformation of measures/probability distributions/densities)

I don't think it has to be limited to transformations of measures/probability distributions/densities - IMHO the API above would cover variable transformations in general, I would also cover integration problems and other cases where one needs the local delta-volume of a trafo, but where the target function may not be limited to positive values or may not even be scalar.

And I think I would prefer something like XXXInterface, XXXCore or XXXBase instead of XXXTraits

How about "VarTrafoInterface.jl" then? Or "VariableTransformationInterface.jl" - but that's awefully long.

@devmotion
Copy link
Member

"Variable" is also not very specific I guess 😛 Maybe just LogAbsDetJacobian.jl if its main purpose is the definition of with_logabsdet_jacobian?

@oschulz
Copy link
Collaborator Author

oschulz commented Oct 4, 2021

Hm, "LogAbsDetJacobian.jl" or "LogAbsDetJacobians.jl"? It's a bit unwieldy, though.

However, if we focus completely on with_logabsdet_jacobian in that package, then I think we should create a second package InvertibleFunctions.jl that concentrates on inverse. What do you think?

@torfjelde
Copy link
Member

It will also provides inversion for a bunch functions though, so feel like LogAbsDetJacobian is too restrictive. I'm also in agreement that this extends beyond just interactions with distributions/measures.

I'm more so leaning towards TransformTraits.jl, but maybe that is also a bit too broad. There's always InvertibleTransformTraits.jl, but it's a bit of a mouthful + we'd like to have logabsdetjac work for some "not-quite-invertible" functions/only locally invertible, etc. 😕

In conclusion: TransformTraits or TransformInterface are the ones I like the most atm 👍

@torfjelde
Copy link
Member

However, if we focus completely on with_logabsdet_jacobian in that package, then I think we should create a second package InvertibleFunctions.jl that concentrates on inverse. What do you think?

Had the same thought, but seems a bit redundant given that the only things we care about right now is invertibility and logabsdetjac. IMO start with it under the same package, and then we can separate into different packages later if the scope expands?

@oschulz
Copy link
Collaborator Author

oschulz commented Oct 4, 2021

A few more suggestions: ChangesOfVariables, VolumeElements or TransformationVolumeElements

@oschulz
Copy link
Collaborator Author

oschulz commented Oct 4, 2021

create a second package InvertibleFunctions.jl that concentrates on inverse

Had the same thought, but seems a bit redundant given that the only things we care about right now is invertibility and logabsdetjac

Yes, I also think it might be better to do both in one package, it will be very lightweight already, and there may be some interdepencencies/links between with_logabsdet_jacobian and inverse implementations.

@oschulz
Copy link
Collaborator Author

oschulz commented Oct 4, 2021

In conclusion: TransformTraits or TransformInterface are the ones I like the most atm

I'm very happy with both of those - should it be "transform" or "transformation"? I did a bit of a literature search on this terminology and ended up confused. :-)

@devmotion
Copy link
Member

Ah yes, I forgot the definitions of the inverses. I still think that Transform or Transformation are very general names and not self explanatory. I like ChangeOfVariables.jl since the name already indicates the intention and area of the interface.

@oschulz
Copy link
Collaborator Author

oschulz commented Oct 4, 2021

ChangeOfVariables.jl

I'd be happy with that - @torfjelde, fine with you?

Oh - should we name it ChangesOfVariables, using the Julia "plural" convention, just in case we ever want a ChangeOfVariables type?

@oschulz
Copy link
Collaborator Author

oschulz commented Oct 6, 2021

Any objections to ChangesOfVariables? If not, I'll set up the package. :-)

@torfjelde
Copy link
Member

Oh sorry, I forgot to respond to this!

Personally, I'm not a huge fan of ChangeOfVariables 🤷 It's too specific IMO since the package will provide an interface that goes beyond this particular use-case (in particular inversion has way more use-cases).
I can also imagine us wanting to add more functionality than just with_logabsdet_jacobian in the future, quickly outgrowing the package name.

But I really just want a package with this interface:) So if the consensus is that ChangeOfVariables is superior, then I'm of course fine with that 👍

@oschulz
Copy link
Collaborator Author

oschulz commented Oct 7, 2021

in particular inversion has way more use-cases

Me too, I need this soon. :-)

Hm, not every change of variables is necessarily strictly bijective/invertible (e.g. when changing from a periodic infinite space to the "unit cell", like in solid-state physics), and not every invertible function has a log-abs-det-jacobian (e.g. invertible discrete functions).

Proposal: I create a package ChangesOfVariables now, and we see how far it'll carry us. Certainly a change of variable is a very central concept, and having a package that directly reflects the concept in it's name should always be useful. If we do see that use cases of inversion outgrow the package name at some point, we can still create a package InvertibleFunctions (or similar) later on and move the inversion stuff there.

@devmotion
Copy link
Member

I've seen other use cases of invertible functions/inv definitions for functions in completely different contexts without log absdet (eg JuliaGaussianProcesses/GPLikelihoods.jl#43 (comment)), so I think it could be useful to keep both things separate.

@oschulz
Copy link
Collaborator Author

oschulz commented Oct 8, 2021

Ok, then I'll make two packages: InverseFunctions and ChangesOfVariables?

Since it would be more narrow in focus then, we could name the second one VolumeElementRules instead of ChangesOfVariables, and rename with_logabsdet_jacobian to ladj_rule. This would be closer to ChainRules.frule, naming-wise. Or we simply name it VolumeElements (keeping with_logabsdet_jacobian as the function name).

@oschulz
Copy link
Collaborator Author

oschulz commented Nov 6, 2021

Ok, now that we have https://github.com/JuliaMath/InverseFunctions.jl and https://github.com/JuliaMath/ChangesOfVariables.jl, we just need to use them in Bijectors. :-)

@torfjelde
Copy link
Member

Lovely! Thank you for making these.

I've been AWOL for a couple of weeks now (sorry about that), but back now and fixing up Bijectors.jl is high on my priority list so will get to this very soon:)

@oschulz oschulz changed the title Lightweight interface-only package(s) for variable transformations Support InverseFunctions.jl and ChangesOfVariables.jl Dec 6, 2021
@oschulz
Copy link
Collaborator Author

oschulz commented Dec 6, 2021

@torfjelde , I could prepare a small non-breaking PR to add add initial support for the InverseFunctions.jl and ChangesOfVariables.jl API to Bijector as it is now (similar to what I did for TransformVariables in tpapp/TransformVariables.jl#85), until you more substantial changes.

@torfjelde
Copy link
Member

@torfjelde , I could prepare a small non-breaking PR to add add initial support for the InverseFunctions.jl and ChangesOfVariables.jl API to Bijector as it is now (similar to what I did for TransformVariables in tpapp/TransformVariables.jl#85), until you more substantial changes.

That would be awesome and greatly appreciated!:) Feel free to shoot if you have any questions.

I'm still somewhat unhappy with how we're going to deal with batches of inputs, i.e. how do I efficiently compute with_logabsdet_jacobian(f, batch) where f is some composition without creating a bunch of intermediate Vector{Tuple{T, Float64}} that I then need to convert into Vector{T} and Vector{Float64} before applying the next function in the composition.

@oschulz
Copy link
Collaborator Author

oschulz commented Dec 7, 2021

That would be awesome and greatly appreciated!:) Feel free to shoot if you have any questions.

Will do!

I'm still somewhat unhappy with how we're going to deal with batches of inputs [...]

Yes, I've been fighting with that myself lately - I've been playing with StructArrays but haven't been fully satisfied with the results, and then it has to be AD-compatible. I found a way to make ArraysOfArrays.jl fully Zygote-compatible now, will to this in the next days. But efficient handling of composed and then broadcasted univariate transformations (esp. combined with parameter optimization for normalizing flows) seems a tricky business.

@torfjelde
Copy link
Member

Yes, I've been fighting with that myself lately - I've been playing with StructArrays but haven't been fully satisfied with the results, and then it has to be AD-compatible. I found a way to make ArraysOfArrays.jl fully Zygote-compatible now, will to this in the next days. But efficient handling of composed and then broadcasted univariate transformations (esp. combined with parameter optimization for normalizing flows) seems a tricky business.

This is something discussed quite a bit before: #178.

Personally I'm leaning towards generalizing ColVecs and RowVecs from KernelFunctions.jl to a more general Batch. It's similar to the idea of ArraysOfArrays.jl. This way we can provide default implementations which just does map while simultaneously allowing each function to specialize on the underlying storage for higher performance, if the implementer so desires.

ArraysOfArrays.jl could potentially provide the same, but I think there are a couple of caveats:

  1. It could be more light-weight (and it uses Requires.jl).
  2. It's objective is providing an easy way to view N + M dimensional arrays as M dimensional, but for a Batch you're really just interested in converting anything into a 1-dimensional collection.

@oschulz
Copy link
Collaborator Author

oschulz commented Dec 7, 2021

ArraysOfArrays.jl could potentially provide the same, but I think there are a couple of caveats:

It could be more light-weight (and it uses Requires.jl).

I promised @cscherrer to get rid of that. :-)

but for a Batch you're really just interested in converting anything into a 1-dimensional collection.

Why I 1-dimensional one - wouldn't you lose information about which parts belong to which sample/event/...?

@torfjelde
Copy link
Member

I promised @cscherrer to get rid of that. :-)

Nice:)

Why I 1-dimensional one - wouldn't you lose information about which parts belong to which sample/event/...?

I'm a bit uncertain what you mean/maybe I confused you. I meant that in our case, we want any representation which allows us to do batch[i] and get back what corresponds to a single input. How this is represented under the hood, the user shouldn't have to worry about (but the implementer of the transformation might want to specialize on different underlying representations, e.g. if it's represented as a higher-dim array then one might broadcast, etc.).

@cscherrer
Copy link

Thanks @oschulz for the ping :)

If there's refactoring in the works, here's my wishlist:

  • Not too Distributions-specific (though a dependency is no problem)
  • No requirement that dimensionality is constant (impossible in many cases)
  • Support for one-sided inverses (injections / surjections)
  • LADJ optional (e.g. doesn't apply for discrete measures)
  • No hard-coded type parameters. Real prevents symbolics, Int prevents static sizes
  • Callable transforms - no transform required for mapping points
  • Flattening should be optional, or maybe an intermediate form

I've also been thinking of using StrideArrays for intermediate representations, since allocations can have a lot of overhead. This is still in the pondering stage, and I guess for a lightweight dependency is a bit much. But maybe there can be an argument giving the array type? Most important IMO is not to disallow it by design.

@oschulz
Copy link
Collaborator Author

oschulz commented Dec 7, 2021

If there's refactoring in the works, here's my wishlist

That I'll happily leave to the Bijectors team, I'll be busy getting VariateTransformation registration-ready (finally) :-) Initially I'll just do something like tpapp/TransformVariables.jl#85 so that code can use all kinds of transformations without depending on specific trafo packages directly.

@oschulz
Copy link
Collaborator Author

oschulz commented Dec 7, 2021

Why I 1-dimensional one - wouldn't you lose information about which parts belong to which sample/event/...?
I'm a bit uncertain what you mean/maybe I confused you. I meant that in our case, we want any representation which allows us to do batch[i] and get back what corresponds to a single input.

Ah, now I get it - yes, of course! So there, batch could be an ArraysOfArrays.VectorOfSimilarVectors{<:Real} or a plain Vector{Vector{<:Real}} and so on, right?

but the implementer of the transformation might want to specialize on different underlying representations, e.g. if it's represented as a higher-dim array then one might broadcast

Yes, that what I do in BAT.jl at the moment, it uses ArraysOfArrays extensively (ArraysOfArrays was partially designed exactly for the batch-of-samples use case).

@devmotion
Copy link
Member

So there, batch could be an ArraysOfArrays.VectorOfSimilarVectors{<:Real} or a plain Vector{Vector{<:Real}} and so on, right?

Or even more generally just an AbstractVector collection with arbitrary possibly non-Array elements - at least that's supported by the KernelFunctions API where ColVecs and RowVecs (optimizations for the vector of vector case with data as matrices) are used currently (I'm still looking forward to replacing them with EachCol and EachRow: JuliaLang/julia#32310).

The Bijectors refactoring is discussed in #178 and worked on in #183.

@torfjelde
Copy link
Member

Or even more generally just an AbstractVector collection with arbitrary possibly non-Array elements

Exactly. We want to support arbitrary inputs.

@cscherrer I'll make a separate issue from your comment since it's very useful feedback, but here we're talking a bit more specifically about adoption of InverseFunctions.jl and ChangeOfVariables.jl rather than a general rewrite (as is being worked on in #183 as mentioned by David).

@cscherrer
Copy link

Thanks @torfjelde. We've discussed #183 , but the relationship of this issue (199) to that wasn't really clear to me. New issue sounds good :)

@oschulz
Copy link
Collaborator Author

oschulz commented Dec 8, 2021

Or even more generally just an AbstractVector collection with arbitrary possibly non-Array elements
Exactly. We want to support arbitrary inputs.

ValueShapes has that - it allows you to view a flat matrix of real numbers as a vector of NamedTuple, for example (with an ArrayOfArrays in the middle).

@devmotion
Copy link
Member

This is just one particular example - another example would be e.g. computing the kernel matrix on a vector of graphs. If the API just expects that collections of inputs are provided as an AbstractVector, the elements of the AbstractVector can be anything and it's not required that the underlying storage is an array itself.

@oschulz
Copy link
Collaborator Author

oschulz commented Dec 10, 2021

the elements of the AbstractVector can be anything and it's not required that the underlying storage is an array itself.

Oh sure - I didn't mean it should be limited in any way!

@oschulz
Copy link
Collaborator Author

oschulz commented Dec 10, 2021

@oschulz: I could prepare a small non-breaking PR to add add initial support for the InverseFunctions.jl

Not so small after all, but here it is: #212

@oschulz
Copy link
Collaborator Author

oschulz commented Dec 20, 2021

@willtebbutt with ChangesOfVariables and InverseFunctions support in LogExpFunctions and Bijectors not, ParameterHandling may be able to drop the explicit dependency on Bijectors now.

@devmotion
Copy link
Member

I removed it already a while ago: JuliaGaussianProcesses/ParameterHandling.jl#42

@oschulz
Copy link
Collaborator Author

oschulz commented Dec 20, 2021

I removed it already a while ago: JuliaGaussianProcesses/ParameterHandling.jl#42

Oh right, sorry, should have checked first - I just had this stuck as a "to do after" in my mind. Great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants