
Implement natural gradient optimization for q_diag=True #878

Open
Joshuaalbert opened this issue Nov 12, 2018 · 5 comments

@Joshuaalbert

Description

Running an SVGP with the NatGrad optimizer fails because meanvarsqrt_to_natural expects a full covariance shape.

MVCE

import gpflow as gp
import numpy as np

N = 100  # any number of data points reproduces the failure
X = np.random.uniform(size=[N, 1])
Y = X
m = gp.models.SVGP(X, Y, gp.kernels.RBF(1), gp.likelihoods.Gaussian(), q_diag=True)
gp.train.NatGradOptimizer(1.).minimize(m, vars_list=[[m.q_mu, m.q_sqrt]])  # fails inside meanvarsqrt_to_natural
@hughsalimbeni
Contributor

The nat grads update isn't implemented for q_diag=True. I'm not sure how this would work, but I'd like to think about it. In any case, the documentation should state that nat grads are only applicable to the full q_sqrt case.

@hughsalimbeni
Contributor

I've not liked q_diag=True in the past as it has odd properties, but it does save memory and also avoids a big matmul.

@Joshuaalbert
Author

Just noticed this issue of mine hanging around. q_diag=True can be very useful when your prior is close to the posterior, so that the Bayesian update adds little information. In this case, choosing the transformation that decorrelates your prior space, y = L.x + m with prior P(y) = N[m, L.L^T], leads to a posterior P(x | d) = P(L.x + m | d) that is "nearly" decorrelated, i.e. P(x | d) \approx N[q_mean, diag(q_sqrt)^2] to very good approximation. This depends on your prior being similar enough to the posterior, which can be verified e.g. with HMC to get actual samples of P(x | d). In this parametrisation the natural gradients are super simple, since the Fisher information matrix is diagonal:

for q_mean it is

1 / diag(q_sqrt)^2

and for q_sqrt.unconstrained it is

2 diag((d q_sqrt / d q_sqrt.unconstrained)^2) / diag(q_sqrt)^2
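A minimal sketch of the elementwise update these Fisher blocks imply (my own illustration, not GPflow code; it assumes a softplus link q_sqrt = softplus(u) from the unconstrained parameter, and all names are hypothetical):

import numpy as np

def natgrad_step(q_mean, u, grad_mean, grad_u, gamma=0.1):
    # q = N[q_mean, diag(q_sqrt)^2] with q_sqrt = softplus(u). The Fisher is
    # diagonal, so multiplying by its inverse is an elementwise division.
    q_sqrt = np.log1p(np.exp(u))                 # softplus
    d_sqrt_du = 1.0 / (1.0 + np.exp(-u))         # d q_sqrt / d u (sigmoid)
    fisher_mean = 1.0 / q_sqrt**2                # Fisher block for q_mean
    fisher_u = 2.0 * d_sqrt_du**2 / q_sqrt**2    # Fisher block for u
    q_mean_new = q_mean - gamma * grad_mean / fisher_mean
    u_new = u - gamma * grad_u / fisher_u
    return q_mean_new, u_new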

@hughsalimbeni
Contributor

Yes, you're right that it would be simple to implement the nat grads directly via the Fisher. The nat grad code uses autodiff, however, which is probably still fine but would require a new transform.
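In the diagonal case the mapping between the (mean, sqrt-variance) parametrisation and the natural parameters is itself elementwise, so such a transform would reduce to a few vector operations. A standalone sketch of the two directions (illustrative function names, not GPflow's actual transform interface):

import numpy as np

# Natural parameters of N[mu, diag(s)^2], elementwise:
# theta_1 = mu / s^2 and theta_2 = -1 / (2 s^2).

def meanvarsqrt_to_natural_diag(mu, s):
    var = s**2
    return mu / var, -0.5 / var

def natural_to_meanvarsqrt_diag(theta_1, theta_2):
    var = -0.5 / theta_2
    return theta_1 * var, np.sqrt(var)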

@st--
Member

st-- commented Dec 3, 2019

@Joshuaalbert @hughsalimbeni I'm assuming this issue would apply the same to gpflow 2.0?

@st-- st-- added the bug label Apr 9, 2020
@st-- st-- self-assigned this May 7, 2020
@st-- st-- removed the bug label May 28, 2020
@st-- st-- changed the title NatGrads fails with q_diag=True Implement natural gradient optimization for q_diag=True May 28, 2020
st-- added a commit that referenced this issue Jun 4, 2020
…1489)

As discussed in #878, GPflow's NaturalGradient optimizer does not implement the diagonal covariance parametrization (`q_diag=True`). This PR clarifies this in the documentation and adds extra shape checks.
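A shape check along these lines (an illustrative sketch, not the PR's actual code) turns the obscure downstream failure into an immediate, descriptive error; it relies on the full parametrization having a rank-3 q_sqrt of shape [L, M, M], whereas q_diag=True stores a rank-2 array:

def _assert_full_q_sqrt(q_sqrt):
    # NaturalGradient only supports the full covariance parametrization.
    if len(q_sqrt.shape) != 3:
        raise NotImplementedError(
            "Natural gradients are not implemented for q_diag=True; "
            "use a full q_sqrt of shape [L, M, M]."
        )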