In the function dataloader_hv_product() in the hessian class, lines 86-87 read:
'''
THv = [torch.randn(p.size()).to(device) for p in self.params]  # accumulate result
'''
I am wondering why it uses random initialization instead of zero initialization. (Although in practice, with a large amount of data, the effect of this initialization is approximately zero.)
Thanks a lot to the authors for releasing the code for their excellent work. But I do agree with @YiifeiWang that it is more appropriate to use zero initialization here: THv accumulates a sum of per-batch Hessian-vector products, so any nonzero starting value is carried into the final result as spurious noise. In my own experiments, when using random initialization, the power iteration converges poorly and the returned top eigenvalues vary a lot across runs. Please correct me if my understanding is wrong.
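The effect of the initialization can be checked on a toy quadratic, where the exact Hessian is known in closed form. This is a minimal sketch, not the repository's code: the toy data, the parameter w, and the probe vector v are all hypothetical, and the batch-size weighting mimics how a batch-mean loss is averaged into a full-dataset Hessian-vector product.

```python
import torch

# Toy setup: per-batch loss L(w) = 0.5 * mean_i (x_i . w)^2, so the
# Hessian of the full-data average loss is H = X^T X / N and Hv can be
# checked exactly against the batch-accumulated result.
torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)
batches = [torch.randn(4, 3), torch.randn(4, 3)]  # two hypothetical data batches
v = torch.randn(3)  # probe vector, as in power iteration

# Accumulate the Hessian-vector product batch by batch.
# Zero initialization is essential: THv collects a weighted sum, so a
# random starting value would be added verbatim to the final result.
THv = torch.zeros(3)
num_data = 0
for x in batches:
    loss = 0.5 * (x @ w).pow(2).mean()
    (g,) = torch.autograd.grad(loss, w, create_graph=True)
    (Hv,) = torch.autograd.grad(g, w, grad_outputs=v)  # v^T H = Hv (H symmetric)
    THv += Hv * x.size(0)  # undo the per-batch mean before summing
    num_data += x.size(0)
THv /= num_data  # average over the full dataset

# Exact Hessian of the averaged loss for comparison.
X = torch.cat(batches)
H = X.t() @ X / num_data
assert torch.allclose(THv, H @ v, atol=1e-5)
```

With zero initialization the accumulated THv matches H @ v exactly; starting THv from torch.randn instead would shift the result by that random offset divided by num_data, which only vanishes as the dataset grows.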