In the function dataloader_hv_product() in the hessian class, lines 86-87 read:
'''
THv = [torch.randn(p.size()).to(device) for p in self.params]  # accumulate result
'''
I am wondering why it uses random initialization instead of zero initialization. (Although in practice, with a large amount of data, the effect of this initialization is approximately zero.)
Thanks a lot to the authors for releasing the code for their excellent work. But I do agree with @YiifeiWang that it is more appropriate to use zero initialization here: THv accumulates a sum of per-batch Hessian-vector products, so any nonzero starting value is carried into the final result as spurious noise. In my own experiments, when using random initialization, the power iteration converges poorly and the returned top eigenvalues vary a lot across runs. Please correct me if my understanding is wrong.
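The effect of the initialization can be checked on a toy quadratic, where the exact Hessian is known in closed form. This is a minimal sketch, not the repository's code: the toy data, the parameter w, and the probe vector v are all hypothetical, and the batch-size weighting mimics how a batch-mean loss is averaged into a full-dataset Hessian-vector product.

```python
import torch

# Toy setup: per-batch loss L(w) = 0.5 * mean_i (x_i . w)^2, so the
# Hessian of the full-data average loss is H = X^T X / N and Hv can be
# checked exactly against the batch-accumulated result.
torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)
batches = [torch.randn(4, 3), torch.randn(4, 3)]  # two hypothetical data batches
v = torch.randn(3)  # probe vector, as in power iteration

# Accumulate the Hessian-vector product batch by batch.
# Zero initialization is essential: THv collects a weighted sum, so a
# random starting value would be added verbatim to the final result.
THv = torch.zeros(3)
num_data = 0
for x in batches:
    loss = 0.5 * (x @ w).pow(2).mean()
    (g,) = torch.autograd.grad(loss, w, create_graph=True)
    (Hv,) = torch.autograd.grad(g, w, grad_outputs=v)  # v^T H = Hv (H symmetric)
    THv += Hv * x.size(0)  # undo the per-batch mean before summing
    num_data += x.size(0)
THv /= num_data  # average over the full dataset

# Exact Hessian of the averaged loss for comparison.
X = torch.cat(batches)
H = X.t() @ X / num_data
assert torch.allclose(THv, H @ v, atol=1e-5)
```

With zero initialization the accumulated THv matches H @ v exactly; starting THv from torch.randn instead would shift the result by that random offset divided by num_data, which only vanishes as the dataset grows.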