Hessian-Free Optimization (Truncated Newton)

A Hessian-free optimizer based on TensorFlow

Hessian-free optimization (HF), also known as truncated Newton, is a family of optimization algorithms designed for optimizing non-linear functions with large numbers of independent variables. Unlike Newton's method, which optimizes its local quadratic approximation by inverting the curvature matrix, HF performs a sub-optimization using the linear conjugate gradient (CG) algorithm, which only needs curvature-matrix-vector products and never forms the curvature matrix explicitly, let alone inverts it. HF therefore belongs to the broad class of approximate Newton methods that remain practical for high-dimensional optimization problems such as neural network training.
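
As an illustration of the matrix-free idea, here is a minimal, unpreconditioned linear CG loop written against an abstract matrix-vector product. The names are illustrative and are not the repository's API, and the simple residual-based stopping test stands in for Martens' relative-progress criterion:

import numpy as np

def cg_solve(Av, b, x0=None, max_iter=50, tol=1e-6):
    # Solves A x = b for symmetric positive-definite A, touching A only
    # through the callable Av(v) -> A @ v (the matrix is never formed).
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - Av(x)                # residual
    d = r.copy()                 # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ad = Av(d)
        alpha = rs_old / (d @ Ad)
        x += alpha * d
        r -= alpha * Ad
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:    # simplified early-termination test
            break
        d = r + (rs_new / rs_old) * d
        rs_old = rs_new
    return x

# Tiny check: A is accessed only through a lambda.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(cg_solve(lambda v: A @ v, b))   # close to np.linalg.solve(A, b)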

The optimizer implements the following Hessian-free optimization features from Martens (2010) and Martens and Sutskever (2011):

  • Gauss-Newton approximation
  • Early termination
  • Tikhonov damping
  • Preconditioner for conjugate gradient method
  • Levenberg-Marquardt heuristic for adapting the damping coefficient (sketched below)

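The Levenberg-Marquardt heuristic adjusts the damping coefficient from the reduction ratio ρ, the actual change in the objective divided by the change predicted by the local quadratic model. A minimal sketch follows; the thresholds 1/4 and 3/4 and the 3/2 factor are the values suggested in Martens (2010), and the repository's exact constants may differ:

def update_damping(lam, rho):
    # rho = (f(theta + delta) - f(theta)) / (q(delta) - q(0)),
    # where q is the quadratic model minimized by CG. rho near 1 means the
    # model predicts the objective well; rho small or negative means it does not.
    if rho < 0.25:
        lam *= 3.0 / 2.0       # poor agreement: increase damping
    elif rho > 0.75:
        lam *= 2.0 / 3.0       # good agreement: decrease damping
    return lam
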
Main computational features

Computing the Hessian-vector product:

# self.W: list of weight tensors
# grads: gradients of the loss with respect to self.W
# self.damp_pl: Tikhonov damping coefficient
# vec: list of tensors with the same shapes as self.W, to be multiplied by the Hessian
grad_v = [tf.reduce_sum(g * v) for g, v in zip(grads, vec)]   # scalar grad(L) . vec
Hv = tf.gradients(grad_v, self.W, stop_gradients=vec)         # d(grad(L) . vec)/dW = H vec
Hv = [hv + self.damp_pl * v for hv, v in zip(Hv, vec)]        # Tikhonov damping: (H + damp * I) vec
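
This identity, Hv obtained by differentiating the scalar grad·v a second time, can be sanity-checked against a finite-difference approximation of the directional derivative of the gradient. The check below is generic NumPy on a toy quadratic and is independent of the repository:

import numpy as np

# f(theta) = 0.5 * theta^T A theta, so grad(theta) = A theta and H = A.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
grad = lambda theta: A @ theta

theta = np.array([0.3, -1.2])
v = np.array([1.0, 2.0])
eps = 1e-6

hv_fd = (grad(theta + eps * v) - grad(theta)) / eps   # finite-difference Hv
print(np.allclose(hv_fd, A @ v, atol=1e-4))           # True: matches the exact Hv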

Computing the Jacobian-vector product (R-operator):

# f: the function (tensor) whose Jacobian is used; in the Gauss-Newton product below this is the network output
# x: parameters with respect to which the Jacobian is computed
# vec: list of tensors with the same shapes as x, to be multiplied by the Jacobian
# Each element of r is (J vec)_k = sum_i vec_i . d f_k / d x_i,
# at the cost of one tf.gradients call per element of f.
r = [tf.reduce_sum([tf.reduce_sum(v * tf.gradients(fi, x)[i])
     for i, v in enumerate(vec)])
     for fi in tf.unstack(f)]
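
The loop above costs one tf.gradients call per output element. A generic TensorFlow alternative (not taken from this repository) expresses the same Jacobian-vector product with two gradient calls through an auxiliary tensor u, using the identity Jv = d(vec . J'u)/du:

import tensorflow as tf

def rop_via_double_gradients(f, x, vec):
    # f: output tensor, x: list of parameter tensors,
    # vec: list of tensors with the same shapes as x.
    u = tf.zeros_like(f)                          # dummy tensor; its value is irrelevant because jt_u is linear in u
    jt_u = tf.gradients(f, x, grad_ys=u)          # J^T u
    return tf.gradients(jt_u, u, grad_ys=vec)[0]  # d(vec . J^T u)/du = J vec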

Computing the product of the Gauss-Newton matrix (J'HJ) and a vector:

# self.output: activations of the output layer
# self.loss: objective function evaluated on self.output
# self.W: weights
# self.damp_pl: Tikhonov damping coefficient
Jv = self.__Rop(self.output, self.W, vec)        # forward Jacobian-vector product J vec (R-operator above)
Jv = tf.reshape(tf.stack(Jv), [-1, 1])
# H J vec, where H is the Hessian of the loss with respect to the output:
HJv = tf.gradients(tf.matmul(tf.transpose(tf.gradients(self.loss,
    self.output)[0]), Jv), self.output, stop_gradients=Jv)[0]
# J' (H J vec), pulled back to the weights:
JHJv = tf.gradients(tf.matmul(tf.transpose(HJv), self.output), self.W,
    stop_gradients=HJv)
JHJv = [gv + self.damp_pl * v for gv, v in zip(JHJv, vec)]   # add Tikhonov damping
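
To show how the damped curvature-vector product, the CG sub-solver, and the Levenberg-Marquardt heuristic fit together, here is a self-contained toy Gauss-Newton loop on a small nonlinear least-squares problem. It reuses the cg_solve and update_damping sketches above and assumes nothing about the repository's API (the Jacobian is written out analytically only because the problem is tiny):

import numpy as np

def residuals(theta, x, y):
    # Toy model: y ~ theta[0] * exp(theta[1] * x)
    return theta[0] * np.exp(theta[1] * x) - y

def jacobian(theta, x):
    # Analytic Jacobian of the residuals, shape (n_samples, n_params).
    e = np.exp(theta[1] * x)
    return np.stack([e, theta[0] * x * e], axis=1)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(1.5 * x) + 0.01 * rng.normal(size=x.size)

theta, lam = np.array([1.0, 1.0]), 1.0
for _ in range(20):
    r, J = residuals(theta, x, y), jacobian(theta, x)
    grad = J.T @ r                                  # gradient of 0.5 * ||r||^2
    Gv = lambda v: J.T @ (J @ v) + lam * v          # damped Gauss-Newton product
    delta = cg_solve(Gv, -grad)                     # CG sub-optimization
    new_r = residuals(theta + delta, x, y)
    actual = 0.5 * (new_r @ new_r - r @ r)          # actual change in the loss
    predicted = grad @ delta + 0.5 * delta @ (J.T @ (J @ delta))
    lam = update_damping(lam, actual / predicted)   # LM damping adjustment
    if actual < 0:                                  # accept only improving steps
        theta = theta + delta
print(theta)   # should approach the generating parameters (2.0, 1.5)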

Tested on:

  • Python 3.6.4
  • TensorFlow 1.5.0
  • the XOR and MNIST datasets
