Hessian diagonal computation #445
EDIT: THIS PROPOSED SOLUTION DOES NOT WORK IN GENERAL - see dougalm's answer below for an explanation.

Just provided a solution for this over at #417 - small world. Below is an example showing how to compute the gradient and just the diagonal of the Hessian (using `elementwise_grad`).
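The example itself was lost in this copy of the thread; a minimal sketch of the composition being proposed (which, per the edit above, turns out not to work in general):

```python
from autograd import elementwise_grad

# Proposed (flawed) recipe: take elementwise_grad of the scalar loss to get
# the gradient, then apply elementwise_grad again, hoping for the Hessian
# diagonal.
nabla = elementwise_grad(loss)       # gradient of a scalar-valued loss
diag_hess = elementwise_grad(nabla)  # NOT the diagonal in general (see below)
```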
PS: you can check out the entire list of built-in differential operators here.
Hey, thanks a lot for the answer. I thought about doing that, but I find the results surprising. Here is the updated example using the approach you're proposing:

```python
from autograd import elementwise_grad
import autograd.numpy as np
y_true = np.array([
[1, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 0, 1, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 1]
], dtype=float)
y_pred = np.array([
[1, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 0, 1, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 1]
], dtype=float)
weights = np.array([1, 1, 1, 1, 1], dtype=float)
def softmax(x, axis=1):
z = np.exp(x)
return z / np.sum(z, axis=axis, keepdims=True)
def loss(y_pred):
y_true = np.array([
[1, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 0, 1, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 1]
], dtype=float)
ys = np.sum(y_true, axis=0)
y_true = np.divide(y_true, ys)
ln_p = np.log(softmax(y_pred))
wll = np.sum(y_true * ln_p, axis=0)
loss = -np.dot(weights, wll)
return loss
nabla_g = elementwise_grad(loss)
diag_hess_g = elementwise_grad(nabla_g)
print(diag_hess_g(y_pred))
```

This returns:

```
[[7.54474768e-17 2.77555756e-17 2.77555756e-17 2.77555756e-17
  2.77555756e-17]
 [2.77555756e-17 7.54474768e-17 2.77555756e-17 2.77555756e-17
  2.77555756e-17]
 [2.77555756e-17 2.77555756e-17 2.77555756e-17 7.54474768e-17
  2.77555756e-17]
 [2.77555756e-17 2.77555756e-17 7.54474768e-17 2.77555756e-17
  2.77555756e-17]
 [6.93889390e-18 6.93889390e-18 6.93889390e-18 6.93889390e-18
  1.88618692e-17]
 [6.93889390e-18 6.93889390e-18 6.93889390e-18 6.93889390e-18
  1.88618692e-17]
 [6.93889390e-18 6.93889390e-18 6.93889390e-18 6.93889390e-18
  1.88618692e-17]]
```

which isn't the result I'm expecting (I provided the expected output in my initial post); it's basically a matrix full of zeros. I'm not saying you're wrong, but I'm pretty sure of my expected output. First of all, I did the same thing with TensorFlow and it worked fine. Secondly, when I used it in my machine learning routine it worked very well. On the other hand, the output obtained by composing `elementwise_grad` twice did not.
Here is the code I'm using to extract the diagonal part:

```python
from autograd import hessian

hess = hessian(loss)
H = hess(y_pred)  # shape (n, p, n, p)
diag = np.array([
    H[i, j, i, j]
    for j in range(H.shape[1])
    for i in range(H.shape[0])
])
diag = diag.reshape(y_pred.shape, order='F')
```
Side-question (from looking a bit closer at your original code): are you trying to differentiate the function `loss` with respect to `weights`? If you want to differentiate with respect to `weights`, you can write your function as shown below.
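The snippet itself was lost in this copy; a sketch of what was presumably meant - the same loss rewritten to take `weights` as its argument (closing over the `y_true`, `y_pred`, and `softmax` definitions above is my assumption):

```python
from autograd import grad, hessian

def loss_w(weights):
    # identical to loss() above, but weights is now the argument
    ys = np.sum(y_true, axis=0)
    y_norm = y_true / ys
    ln_p = np.log(softmax(y_pred))
    wll = np.sum(y_norm * ln_p, axis=0)
    return -np.dot(weights, wll)

g = grad(loss_w)(weights)     # shape (5,)
H = hessian(loss_w)(weights)  # shape (5, 5); note this particular loss is
                              # linear in weights, so H is identically zero
```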
Since your weights have length 5, your gradient will also have length 5 and your Hessian will be a (5, 5) matrix. Then - I assume you're computing the Hessian to perform Newton's method to tune the weights (?) - you're good to go (i.e., you won't need to take the 'diagonal of the Hessian').
Hey, nope, I'm trying to differentiate w.r.t. `y_pred`.
Unfortunately, I don't think it's possible to compute the diagonal of the Hessian other than by taking N separate Hessian-vector products, which is equivalent in cost to instantiating the full Hessian and then taking the diagonal. People resort to all sorts of tricks to estimate the trace of the Hessian (e.g. https://arxiv.org/abs/1802.03451) precisely because it's expensive to evaluate the diagonal. Autograd's `elementwise_grad` is only valid for functions whose Jacobian is diagonal: it computes a vector-Jacobian product with a vector of ones, which gives the sum of each row of the Jacobian, and that only equals the diagonal when the function is truly elementwise. That caveat was in the docstring of an earlier version of `elementwise_grad`.
Sorry, I meant to say "sum of each column". I was thinking of forward mode (which we should probably be using in `elementwise_grad`).
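To make the caveat concrete, here is a small sketch (my own, not from the thread) showing that composing `elementwise_grad` twice returns the column sums of the Hessian rather than its diagonal:

```python
import autograd.numpy as np
from autograd import elementwise_grad, hessian

def f(x):
    return np.sum(x[0] * x)  # not elementwise: Hessian has off-diagonal terms

x = np.array([1.0, 2.0, 3.0])
H = hessian(f)(x)  # [[2, 1, 1], [1, 0, 0], [1, 0, 0]]

print(np.diag(H))                                # true diagonal: [2. 0. 0.]
print(elementwise_grad(elementwise_grad(f))(x))  # [4. 1. 1.]
print(H.sum(axis=0))  # column sums: [4. 1. 1.] (= row sums, H is symmetric)
```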
How stupid of a work-around (not invoking `hessian`) would this be? I'm assuming pretty stupid, but for example:
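The snippet in this comment was also lost; a plausible reconstruction of this kind of work-around - pulling out each diagonal entry with a separate Hessian-vector product against a one-hot vector (my sketch, using autograd's `hessian_vector_product`):

```python
from autograd import hessian_vector_product
import autograd.numpy as np

hvp = hessian_vector_product(loss)  # hvp(x, v) returns H @ v, shaped like x

def hessian_diagonal(x):
    # One HVP per entry: same asymptotic cost as the full Hessian,
    # but the (n*p, n*p) array is never materialized.
    diag = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        e = np.zeros_like(x)
        e[idx] = 1.0
        diag[idx] = hvp(x, e)[idx]  # (H @ e)[idx] == H[idx, idx]
    return diag

print(hessian_diagonal(y_pred))
```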
compare to the "incorrect" answer provided by composing `elementwise_grad` twice:
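For reference, the composition being compared against (reconstructed from the earlier comment):

```python
diag_hess_g = elementwise_grad(elementwise_grad(loss))
print(diag_hess_g(y_pred))  # column sums of the Hessian, not its diagonal
```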
Hello,

I've been playing around with `autograd` and I'm having a blast. However, I'm having some difficulty with extracting the diagonal of the Hessian. This is my current code:
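The block itself was lost in this copy of the thread; presumably it was the `softmax`/`loss` setup reproduced in the second comment above, ending with a call to autograd's `hessian`:

```python
from autograd import hessian

# loss and y_pred as defined in the example above
hess = hessian(loss)
H = hess(y_pred)  # shape (n, p, n, p) for an (n, p) input
```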
I understand that `hessian` is simply `jacobian` called twice and that `hess` is an `n * p * n * p` array. I can extract the diagonal manually and obtain my expected output; I've checked this numerically and it's fine. The problem is that this still requires computing the full Hessian before accessing the diagonal part, which is really expensive. Is there any better way to proceed? I think this is a common use case in machine learning optimization that could deserve a dedicated convenience function.