
Implement natural gradient optimization for q_diag=True #878

Open
Joshuaalbert opened this issue Nov 12, 2018 · 5 comments

@Joshuaalbert

Description

Running an SVGP with the NatGrad optimizer fails because meanvarsqrt_to_natural expects a full covariance shape.

MVCE

import gpflow as gp
import numpy as np

N = 100  # any number of data points reproduces the failure
X = np.random.uniform(size=[N, 1])
Y = X
m = gp.models.SVGP(X, Y, gp.kernels.RBF(1), gp.likelihoods.Gaussian(), q_diag=True)
gp.train.NatGradOptimizer(1.).minimize(m, vars_list=[[m.q_mu, m.q_sqrt]])  # fails inside meanvarsqrt_to_natural
@hughsalimbeni
Contributor

The nat grads update isn't implemented for q_diag=True. I'm not sure how this would work, but I'd like to think about it. In any case, the documentation should state that nat grads are only applicable to the full q_sqrt case.

@hughsalimbeni
Contributor

I've not liked q_diag=True in the past as it has odd properties, but it does save memory and also avoids a big matmul.

@Joshuaalbert
Author

Just noticed this issue of mine hanging around. q_diag=True can be very useful when your prior is close to the posterior, so that the Bayesian update adds little information. In this case, choosing the transformation that decorrelates your prior space, y = L.x + m with prior P(y) = N[m, L.L^T], leads to a posterior P(x | d) = P(L.x + m | d) that is "nearly" decorrelated, i.e. P(x | d) \approx N[q_mean, diag(q_sqrt)^2] to very good approximation. This depends on your prior being similar enough to the posterior, which can be verified e.g. with HMC to get actual samples of P(x | d). In this parametrisation the natural gradients are super simple, since the Fisher information matrix is diagonal:

for q_mean it is

1 / diag(q_sqrt)^2

and for q_sqrt.unconstrained it is

2 diag((d q_sqrt / d q_sqrt.unconstrained)^2) / diag(q_sqrt)^2
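A minimal sketch of the elementwise update these Fisher blocks imply (my own illustration, not GPflow code; it assumes a softplus link q_sqrt = softplus(u) from the unconstrained parameter, and all names are hypothetical):

import numpy as np

def natgrad_step(q_mean, u, grad_mean, grad_u, gamma=0.1):
    # q = N[q_mean, diag(q_sqrt)^2] with q_sqrt = softplus(u). The Fisher is
    # diagonal, so multiplying by its inverse is an elementwise division.
    q_sqrt = np.log1p(np.exp(u))                 # softplus
    d_sqrt_du = 1.0 / (1.0 + np.exp(-u))         # d q_sqrt / d u (sigmoid)
    fisher_mean = 1.0 / q_sqrt**2                # Fisher block for q_mean
    fisher_u = 2.0 * d_sqrt_du**2 / q_sqrt**2    # Fisher block for u
    q_mean_new = q_mean - gamma * grad_mean / fisher_mean
    u_new = u - gamma * grad_u / fisher_u
    return q_mean_new, u_new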

@hughsalimbeni
Contributor

Yes, you're right that it would be simple to implement the nat grads directly via the Fisher. The nat grad code uses autodiff, however, which is probably still fine but would require a new transform.
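In the diagonal case the mapping between the (mean, sqrt-variance) parametrisation and the natural parameters is itself elementwise, so such a transform would reduce to a few vector operations. A standalone sketch of the two directions (illustrative function names, not GPflow's actual transform interface):

import numpy as np

# Natural parameters of N[mu, diag(s)^2], elementwise:
# theta_1 = mu / s^2 and theta_2 = -1 / (2 s^2).

def meanvarsqrt_to_natural_diag(mu, s):
    var = s**2
    return mu / var, -0.5 / var

def natural_to_meanvarsqrt_diag(theta_1, theta_2):
    var = -0.5 / theta_2
    return theta_1 * var, np.sqrt(var)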

@st--
Member

st-- commented Dec 3, 2019

@Joshuaalbert @hughsalimbeni I'm assuming this issue would apply the same to gpflow 2.0?

@st-- st-- added the bug label Apr 9, 2020
@st-- st-- self-assigned this May 7, 2020
@st-- st-- removed the bug label May 28, 2020
@st-- st-- changed the title NatGrads fails with q_diag=True Implement natural gradient optimization for q_diag=True May 28, 2020
st-- added a commit that referenced this issue Jun 4, 2020
…1489)

As discussed in #878, GPflow's NaturalGradient optimizer does not implement the diagonal covariance parametrization (`q_diag=True`). This PR clarifies this in the documentation and adds extra shape checks.
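A shape check along these lines (an illustrative sketch, not the PR's actual code) turns the obscure downstream failure into an immediate, descriptive error; it relies on the full parametrization having a rank-3 q_sqrt of shape [L, M, M], whereas q_diag=True stores a rank-2 array:

def _assert_full_q_sqrt(q_sqrt):
    # NaturalGradient only supports the full covariance parametrization.
    if len(q_sqrt.shape) != 3:
        raise NotImplementedError(
            "Natural gradients are not implemented for q_diag=True; "
            "use a full q_sqrt of shape [L, M, M]."
        )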