Hessian-Free Optimization (Truncated Newton)

A Hessian-free optimizer based on TensorFlow

Hessian-free optimization (HF), also known as truncated Newton, is a family of optimization algorithms designed for optimizing non-linear functions with large numbers of independent variables. Unlike Newton's method, which optimizes its local quadratic approximation by inverting the curvature matrix, HF performs a sub-optimization using the linear conjugate gradient (CG) algorithm, which only needs curvature-matrix-vector products and never forms the curvature matrix explicitly, let alone inverts it. HF therefore belongs to the broad class of approximate Newton methods that remain practical for high-dimensional optimization problems such as neural network training.
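
As an illustration of the matrix-free idea, here is a minimal, unpreconditioned linear CG loop written against an abstract matrix-vector product. The names are illustrative and are not the repository's API, and the simple residual-based stopping test stands in for Martens' relative-progress criterion:

import numpy as np

def cg_solve(Av, b, x0=None, max_iter=50, tol=1e-6):
    # Solves A x = b for symmetric positive-definite A, touching A only
    # through the callable Av(v) -> A @ v (the matrix is never formed).
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - Av(x)                # residual
    d = r.copy()                 # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ad = Av(d)
        alpha = rs_old / (d @ Ad)
        x += alpha * d
        r -= alpha * Ad
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:    # simplified early-termination test
            break
        d = r + (rs_new / rs_old) * d
        rs_old = rs_new
    return x

# Tiny check: A is accessed only through a lambda.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(cg_solve(lambda v: A @ v, b))   # close to np.linalg.solve(A, b)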

The optimizer implements the following Hessian-free optimization features from Martens (2010) and Martens and Sutskever (2011):

  • Gauss-Newton approximation
  • Early termination
  • Tikhonov damping
  • Preconditioner for conjugate gradient method
  • Levenberg-Marquardt heuristic for adapting the damping coefficient (sketched below)

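The Levenberg-Marquardt heuristic adjusts the damping coefficient from the reduction ratio ρ, the actual change in the objective divided by the change predicted by the local quadratic model. A minimal sketch follows; the thresholds 1/4 and 3/4 and the 3/2 factor are the values suggested in Martens (2010), and the repository's exact constants may differ:

def update_damping(lam, rho):
    # rho = (f(theta + delta) - f(theta)) / (q(delta) - q(0)),
    # where q is the quadratic model minimized by CG. rho near 1 means the
    # model predicts the objective well; rho small or negative means it does not.
    if rho < 0.25:
        lam *= 3.0 / 2.0       # poor agreement: increase damping
    elif rho > 0.75:
        lam *= 2.0 / 3.0       # good agreement: decrease damping
    return lam
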
Main computational features

Computing the Hessian-vector product:

# self.W: list of weight tensors
# grads: gradients of the loss with respect to self.W
# self.damp_pl: Tikhonov damping coefficient
# vec: list of tensors with the same shapes as self.W, to be multiplied by the Hessian
grad_v = [tf.reduce_sum(g * v) for g, v in zip(grads, vec)]   # scalar grad(L) . vec
Hv = tf.gradients(grad_v, self.W, stop_gradients=vec)         # d(grad(L) . vec)/dW = H vec
Hv = [hv + self.damp_pl * v for hv, v in zip(Hv, vec)]        # Tikhonov damping: (H + damp * I) vec
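
This identity, Hv obtained by differentiating the scalar grad·v a second time, can be sanity-checked against a finite-difference approximation of the directional derivative of the gradient. The check below is generic NumPy on a toy quadratic and is independent of the repository:

import numpy as np

# f(theta) = 0.5 * theta^T A theta, so grad(theta) = A theta and H = A.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
grad = lambda theta: A @ theta

theta = np.array([0.3, -1.2])
v = np.array([1.0, 2.0])
eps = 1e-6

hv_fd = (grad(theta + eps * v) - grad(theta)) / eps   # finite-difference Hv
print(np.allclose(hv_fd, A @ v, atol=1e-4))           # True: matches the exact Hv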

Computing the Jacobian-vector product (R-operator):

# f: the function (tensor) whose Jacobian is used; in the Gauss-Newton product below this is the network output
# x: parameters with respect to which the Jacobian is computed
# vec: list of tensors with the same shapes as x, to be multiplied by the Jacobian
# Each element of r is (J vec)_k = sum_i vec_i . d f_k / d x_i,
# at the cost of one tf.gradients call per element of f.
r = [tf.reduce_sum([tf.reduce_sum(v * tf.gradients(fi, x)[i])
     for i, v in enumerate(vec)])
     for fi in tf.unstack(f)]
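
The loop above costs one tf.gradients call per output element. A generic TensorFlow alternative (not taken from this repository) expresses the same Jacobian-vector product with two gradient calls through an auxiliary tensor u, using the identity Jv = d(vec . J'u)/du:

import tensorflow as tf

def rop_via_double_gradients(f, x, vec):
    # f: output tensor, x: list of parameter tensors,
    # vec: list of tensors with the same shapes as x.
    u = tf.zeros_like(f)                          # dummy tensor; its value is irrelevant because jt_u is linear in u
    jt_u = tf.gradients(f, x, grad_ys=u)          # J^T u
    return tf.gradients(jt_u, u, grad_ys=vec)[0]  # d(vec . J^T u)/du = J vec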

Computing the product of the Gauss-Newton matrix (J'HJ) and a vector:

# self.output: activations of the output layer
# self.loss: objective function evaluated on self.output
# self.W: weights
# self.damp_pl: Tikhonov damping coefficient
Jv = self.__Rop(self.output, self.W, vec)        # forward Jacobian-vector product J vec (R-operator above)
Jv = tf.reshape(tf.stack(Jv), [-1, 1])
# H J vec, where H is the Hessian of the loss with respect to the output:
HJv = tf.gradients(tf.matmul(tf.transpose(tf.gradients(self.loss,
    self.output)[0]), Jv), self.output, stop_gradients=Jv)[0]
# J' (H J vec), pulled back to the weights:
JHJv = tf.gradients(tf.matmul(tf.transpose(HJv), self.output), self.W,
    stop_gradients=HJv)
JHJv = [gv + self.damp_pl * v for gv, v in zip(JHJv, vec)]   # add Tikhonov damping
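
To show how the damped curvature-vector product, the CG sub-solver, and the Levenberg-Marquardt heuristic fit together, here is a self-contained toy Gauss-Newton loop on a small nonlinear least-squares problem. It reuses the cg_solve and update_damping sketches above and assumes nothing about the repository's API (the Jacobian is written out analytically only because the problem is tiny):

import numpy as np

def residuals(theta, x, y):
    # Toy model: y ~ theta[0] * exp(theta[1] * x)
    return theta[0] * np.exp(theta[1] * x) - y

def jacobian(theta, x):
    # Analytic Jacobian of the residuals, shape (n_samples, n_params).
    e = np.exp(theta[1] * x)
    return np.stack([e, theta[0] * x * e], axis=1)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(1.5 * x) + 0.01 * rng.normal(size=x.size)

theta, lam = np.array([1.0, 1.0]), 1.0
for _ in range(20):
    r, J = residuals(theta, x, y), jacobian(theta, x)
    grad = J.T @ r                                  # gradient of 0.5 * ||r||^2
    Gv = lambda v: J.T @ (J @ v) + lam * v          # damped Gauss-Newton product
    delta = cg_solve(Gv, -grad)                     # CG sub-optimization
    new_r = residuals(theta + delta, x, y)
    actual = 0.5 * (new_r @ new_r - r @ r)          # actual change in the loss
    predicted = grad @ delta + 0.5 * delta @ (J.T @ (J @ delta))
    lam = update_damping(lam, actual / predicted)   # LM damping adjustment
    if actual < 0:                                  # accept only improving steps
        theta = theta + delta
print(theta)   # should approach the generating parameters (2.0, 1.5)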

Tested on:

  • Python 3.6.4
  • TensorFlow 1.5.0
  • the XOR and MNIST datasets
