
Basic rewrite of the package #25

Closed

wants to merge 29 commits into from

Conversation

@theogf (Member) commented Feb 12, 2021

Hey, following #24, I am making a first attempt at large changes (I don't believe this could be done through small changes, and I don't think anyone is using this package at the moment anyway).

Here are the current main changes:

  • Introduction of a collection of variational distributions with specific parametrizations. This makes it possible to change the variational parameters in-place and avoids passing a vector around.
  • The problem of having to deal with a vector of variational parameters is gone: with the reparametrization trick, we take gradients through the samples rather than of a flat parameter vector. I implemented both approaches, one with a vector of parameters and one without.
  • Samples (and parametrized samples) are stored in place, so there is no additional allocation.
  • optimize! is now a loop over step!, which has to be defined for each algorithm.
  • Instead of grad!, there is now gradlogπ! and eventually gradentropy. The first is computed via sampling; the second mostly has closed-form solutions, but a generic fallback with samples can be made.
  • I had entirely removed the VariationalObjective object, which right now objectively does not bring anything (I guess if we start to consider more divergences we could add such objects back); in the end I left it and set the default behavior to use the ELBO.
  • I removed Tracker and made it an optional dependency via Requires (does anyone still use Tracker?).
  • [NEW] Similarly to AbstractMCMC, step! now takes a state as an argument, which is initialized via init for each algorithm. This allows using the right preallocations for each method! (See the sketch after this list.)
  • [NEW] I added a basic, ugly version of BBVI.
  • [NEW] Started to adapt the framework to be compatible with Optimisers.jl (avoiding a dependency on Flux for optimisers).
  • More things to come.
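
To make the init/step!/optimize! design above concrete, here is a minimal sketch; MyAlgorithm, its fields, and the state layout are hypothetical illustrations, not the PR's exact API:

abstract type VariationalAlgorithm end

struct MyAlgorithm <: VariationalAlgorithm
    n_samples::Int  # samples drawn per step
    max_iters::Int  # number of optimization steps
end

# `init` builds the algorithm-specific state once, including preallocated buffers.
init(alg::MyAlgorithm, q, logπ) = (samples = zeros(length(q), alg.n_samples),)

# `step!` updates q (and the buffers in the state) in place and returns the state.
function step!(alg::MyAlgorithm, q, logπ, state, opt)
    # Sample into state.samples, estimate gradients, update the parameters of q...
    return state
end

# `optimize!` is then just a loop over `step!`.
function optimize!(alg::MyAlgorithm, q, logπ, opt)
    state = init(alg, q, logπ)
    for _ in 1:alg.max_iters
        state = step!(alg, q, logπ, state, opt)
    end
    return q
end

Keeping the buffers in the state rather than in the algorithm struct lets each method preallocate exactly what it needs, as in AbstractMCMC.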

@yebai (Member) commented Feb 14, 2021

Thanks @theogf, it looks like a great PR in progress. Just a clarification question: does DSVI = ADVI?

@theogf (Member, Author) commented Feb 15, 2021

> Thanks @theogf, it looks like a great PR in progress. Just a clarification question: does DSVI = ADVI?

That's a good question. After checking the ADVI paper, ADVI is DSVI with bijectors (which answers the question I had, @torfjelde). However, the previous implementation of ADVI did not reflect what was done in the paper.

Reference: Automatic Differentiation Variational Inference https://arxiv.org/abs/1603.00788

@theogf changed the title from "[WIP] Basic rewrite of the package with addition of DSVI" to "[WIP] Basic rewrite of the package" on Feb 17, 2021
@theogf (Member, Author) commented Feb 18, 2021

Here is a larger discussion on having q(θ) and/or q.

So I implemented ADVI and a basic version of BBVI. Interestingly, the first one relies on q and the second one on q(θ).
This leads me to think that the choice between q(θ) and q depends on the algorithm used and should be treated as such.

The approach I took for BBVI is to use the (amazing) state approach. Since we work with our own distributions (though this is not a restriction), I defined a to_vec(q) and a to_dist(q, θ) to jump between the two representations. This way we create an initial θ in init, update it during the run, and finally return to_dist(q, θ) at the end.

So basically I would argue for keeping only the q approach and dealing with q(θ) internally when needed.
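
For concreteness, a minimal sketch of what the to_vec / to_dist pair could look like for a diagonal Gaussian (the struct and its fields are illustrative, not necessarily the PR's types):

# Hypothetical diagonal-Gaussian variational distribution.
struct DiagMvNormal{T<:Real}
    μ::Vector{T}  # mean
    σ::Vector{T}  # per-dimension scale
end

# Flatten the distribution into a single parameter vector θ...
to_vec(q::DiagMvNormal) = vcat(q.μ, q.σ)

# ...and rebuild a distribution of the same type from θ.
function to_dist(q::DiagMvNormal, θ::AbstractVector)
    n = length(q.μ)
    return DiagMvNormal(θ[1:n], θ[n+1:2n])
end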

@coveralls commented Feb 18, 2021

Pull Request Test Coverage Report for Build 589667488

  • 100 of 153 (65.36%) changed or added relevant lines in 14 files are covered.
  • 20 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-5.8%) to 53.695%

Changes Missing Coverage:

File                                 Covered Lines   Changed/Added Lines   %
src/interface.jl                     19              21                    90.48%
src/ad.jl                            1               4                     25.0%
src/distributions/cholmvnormal.jl    7               10                    70.0%
src/algorithms/advi.jl               23              27                    85.19%
src/compat/zygote.jl                 0               6                     0.0%
src/distributions/diagmvnormal.jl    8               14                    57.14%
src/compat/reversediff.jl            0               7                     0.0%
src/compat/tracker.jl                0               10                    0.0%
src/distributions/distributions.jl   7               19                    36.84%

Files with Coverage Reduction:

File                New Missed Lines   %
src/optimisers.jl   20                 0%

Totals:
Change from base Build 387007800: -5.8%
Covered Lines: 109
Relevant Lines: 203

💛 - Coveralls

@theogf (Member, Author) commented Feb 18, 2021

Since (most) tests are passing, I am going to mark it ready for review while I work on the tests and remove some unnecessary bits.

One part I am quite unhappy about is the way gradients are computed. Every algorithm requires a different approach, and I could not find a global one. This means a huge amount of copy/pasted code to make it work with the different AD backends...
I would be happy to hear some suggestions.

@theogf marked this pull request as ready for review February 18, 2021 17:38
@theogf changed the title from "[WIP] Basic rewrite of the package" to "Basic rewrite of the package" on Feb 18, 2021
@theogf (Member, Author) commented Feb 22, 2021

I solved the gradient issue by going back to grad! as a very generic in-place gradient-computing function.
Tests sometimes fail because of their randomness; I don't know if there is a better approach here.
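
As a point of reference, here is a minimal sketch of such a generic in-place gradient function, with a ForwardDiff-based fallback (the exact signature in the PR may differ):

using ForwardDiff

# Write the gradient of `f` at `θ` into the preallocated buffer `out`.
# Each AD backend would provide its own method of this function.
function grad!(out::AbstractVector, f, θ::AbstractVector)
    ForwardDiff.gradient!(out, f, θ)
    return out
end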

src/algorithms/bbvi.jl (outdated review thread, resolved)
@torfjelde (Member) left a comment:

Good stuff! There's quite a bit that needs addressing though.

I've left some comments on what for sure needs changing. I still need to look through again and think about how we can get to a complete solution here.

Some more general comments:

  1. I would really, really like to not reimplement a bunch of distributions in this package. Most of these very particular distributions could probably go into DistributionsAD.jl if we need them.
  2. There's a big focus on mutating operations here. It seems a bit unnecessary given that it's a matter of decreasing memory usage by a constant factor of 2, no? Or am I missing something here? Also, you've mentioned Optimisers.jl; isn't that moving towards non-mutating state for optimisers?

But yeah, after you've responded to my comments, I'll have to go through again 👍

src/ad.jl (outdated review thread, resolved)
src/algorithms/advi.jl (outdated review thread, resolved)
Comment on lines 55 to 56
update_mean!(q, vec(mean(Δ, dims = 2)), opt)
update_cov!(alg, q, Δ, state, opt)
Member:

Too general IMO. ADVI can be used with non-normal distributions as the underlying distribution, in which case there is no such thing as a covariance.

Member Author:

Well, actually not, and that's where I am trying to be more precise. The ADVI approach I implemented follows the ADVI paper, which states:

[screenshot from the ADVI paper defining the variational approximation as a Gaussian]

So the underlying distribution is always a Gaussian.
I will add a reference to the paper in the docs.

If we want to train something different, it will have to be a different algorithm.
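
For reference, the Gaussian reparametrization trick the paper builds on, as a minimal standalone sketch (the function name is illustrative):

using LinearAlgebra, Random

# Reparametrized sample from N(μ, L L'): z = μ + L ε with ε ~ N(0, I).
# Gradients w.r.t. μ and L flow through this deterministic map rather than
# through the random sampling itself.
function reparam_sample(rng::AbstractRNG, μ::AbstractVector, L::LowerTriangular)
    ε = randn(rng, length(μ))
    return μ + L * ε
end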

Member:

I'm fully aware that the original paper only had this in mind, but it's unnecessarily restrictive and is mainly just a consequence of the limited tools they had available at the time.

It's called "Automatic Differentiation Variational Inference", which really refers to the fact that we use AD + the reparameterization trick.

IMO it just comes down to: what does it cost us to allow any reparameterization with any valid base distribution, and just let Gaussian be the default? That just seems overall superior, no?

Member Author:

Well, in theory, yes, that sounds amazing. The problem is how you formulate the reparametrization trick for non-Gaussian distributions. It's not always possible, right?

Member:

Normalizing flows; an affine transformation is just an instance :)

Member:

IMO, we don't even need all these implementations of different Gaussians. We just make them different affine transformations, e.g. introduce an Affine <: Bijector and define different evaluations depending on whether the scaling matrix is a Cholesky factor, etc.
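
A minimal sketch of that idea (standalone here rather than subtyping the actual Bijectors.jl Bijector, whose parametrization varies across versions; the Affine type and its methods are illustrative):

using LinearAlgebra

# Hypothetical affine bijector: x ↦ shift + scale * x.
struct Affine{Ts<:AbstractVector,TL}
    shift::Ts
    scale::TL  # e.g. a Diagonal or a LowerTriangular (Cholesky factor)
end

(b::Affine)(x::AbstractVector) = b.shift + b.scale * x

# The log-abs-det-Jacobian only depends on the scale; specialized evaluations
# can be added per scale type (Diagonal, LowerTriangular, dense, ...).
logabsdetjac(b::Affine, x::AbstractVector) = sum(log ∘ abs, diag(b.scale))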

Member:

The above is the reasoning behind the current impl btw. Though I 100% agree we should make it more convenient to instantiate the different approaches.

nsamples(alg::ADVI) = alg.samples_per_step
niters(alg::ADVI) = alg.max_iters

function compats(::ADVI)
Member:

Is this specifying which distributions it's compatible with?
IMO this shouldn't be here; it will be difficult to keep track of + too restrictive (e.g. ADVI works for any distribution for which we can use the reparam trick, e.g. any TransformedDistribution) + it will be incorrect as soon as someone decides to extend functionality in a different package/user code. Instead we should let methods fail according to missing implementations.

Member Author:

See the previous point.

src/algorithms/advi.jl (outdated review thread, resolved)
src/interface.jl (outdated)
Comment on lines 72 to 85
## Verify that the algorithm can work with the corresponding variational distribution
function check_compatibility(alg, q)
    if !compat(alg, q)
        throw(ArgumentError("Algorithm $(alg) cannot work with distributions of type $(typeof(q)), compatible distributions are: $(compats(alg))"))
    end
end

function compat(alg::VariationalInference, q)
    return q isa compats(alg)
end

function compats(::Any)
    return ()
end
Member:

Suggested change: remove the whole check_compatibility / compat / compats block above.

Member Author:

I think my solution is awesome :D

end
include("gradients.jl")
include("interface.jl")
# include("optimisers.jl") # Relying on Tracker...
Member:

Can we not add Tracker to test/Project.toml?

Member Author:

I think this is currently outdated; I need to have another look.

Member Author:

Yeah, the problem is that it only works with Tracker, and I had deleted it for a while.

Member:

I am confused. Isn't it using ForwardDiff? o.O

Member Author:

See here:

)::Array{typeof(Tracker.data(Δ)), 1}

src/utils.jl (outdated review thread, resolved)
Comment on lines 2 to 3
abstract type AbstractPosteriorMvNormal{T} <:
              Distributions.ContinuousMultivariateDistribution end
Member:

Why is this needed vs. AbstractMvNormal or whatever it's called?

Member Author:

I was not aware of AbstractMvNormal until recently. It is still practical to have AbstractPosteriorMvNormal to define things like mean(d::AbstractPosteriorMvNormal) = d.μ.

Comment on lines 33 to 35
function to_vec(q::CholMvNormal)
    vcat(q.μ, vec(q.Γ))
end
Member:

Can we not make use of something similar to Flux.params here since we already depend on Functors.jl?

Member Author:

Well you need to tell that to ForwardDiff.jl, ReverseDiff.jl and co :D

Member:

There are no ties between Flux.params and the AD framework used anymore :)
It uses Functors.jl to define what's trainable and what isn't. So Flux.params should just work for the above distributions. Then you just vec and vcat.

Member Author:

Ah, sorry, I misunderstood you; I thought you were complaining about the vcat.
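
For concreteness, a minimal sketch of the Functors.jl approach discussed above (the struct mirrors the CholMvNormal from the diff; flatten_params is an illustrative helper, not an existing API):

using Functors

struct CholMvNormal{Tμ,TΓ}
    μ::Tμ  # mean
    Γ::TΓ  # Cholesky factor of the covariance
end
@functor CholMvNormal  # declares μ and Γ as children/trainable parameters

# Flatten all children into a single parameter vector, Flux.params-style.
function flatten_params(q)
    children, _ = Functors.functor(q)
    return mapreduce(vec, vcat, children)
end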

@trappmartin (Member) commented:

Is there a timeframe for this PR?

@theogf (Member, Author) commented Sep 17, 2021

> Is there a timeframe for this PR?

Not really. I think the problem is that it tries to do too many things at the same time.
Also, there are some things where we are not entirely sure how to proceed.

@torfjelde What do you think about having the variational distributions defined as bijected distributions in another PR?

@ParadaCarleton (Member) commented:

> > Is there a timeframe for this PR?
>
> Not really. I think the problem is that it tries to do too many things at the same time. Also, there are some things where we are not entirely sure how to proceed.
>
> @torfjelde What do you think about having the variational distributions defined as bijected distributions in another PR?

What's missing?

@yebai (Member) commented Jun 10, 2023

Closed in favour of #45

@yebai closed this Jun 10, 2023
@yebai deleted the tg/rework_advi branch June 10, 2023 12:15