Limited-memory BFGS (L-BFGS)

Overview

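Assume the objective is f(w) = ℓ(w) + r(w), where ℓ(w) is the loss on the data, evaluated by the worker nodes, and r(w) is the regularizer, evaluated by the server nodes.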

Implementation

The iteration is implemented in LBFGSLearner::RunScheduler:

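PrepareData();
InitServer();
InitWorker();
for (int k = 0; k < max_num_epochs; ++k) {
  PushGradient(k);               // workers push ∂ℓ(w_k) to the servers
  B = PrepareCalcDirection(k);   // servers update s and y, and return B
  CalcDirection(k, B);           // servers compute the direction p_k
  α = 1;                         // always try the full step first
  while (wolfe_condition_not_satisfied) {
    PushGradient(k);
    LineSearch(k, α);            // evaluate the objective at w_k + αp_k
    α *= ρ;                      // shrink the step, 0 < ρ < 1
  }
}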

The Wolfe conditions, with constants 0 < c₁ < c₂ < 1:

f(w_k + αp_k) ≤ f(w_k) + c₁α p_k^T ∂f(w_k)
p_k^T ∂f(w_k + αp_k) ≥ c₂ p_k^T ∂f(w_k)
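The two inequalities can be checked from four scalars, which is why the line search only has to return function values and directional derivatives. The following is a minimal single-machine sketch, not the code in LBFGSLearner: the names WolfeSatisfied and Backtrack, the max_steps cap, and the convention of returning 0 on failure are assumptions made for illustration. It views the line search as a one-dimensional problem φ(α) = f(w_k + αp_k) with derivative φ'(α) = p_k^T ∂f(w_k + αp_k).

```cpp
#include <functional>

// Returns true if the step alpha satisfies both Wolfe conditions.
//   f0 = f(w_k)                g0 = p_k^T df(w_k)
//   f1 = f(w_k + alpha p_k)    g1 = p_k^T df(w_k + alpha p_k)
bool WolfeSatisfied(double f0, double g0, double f1, double g1,
                    double alpha, double c1, double c2) {
  const bool sufficient_decrease = f1 <= f0 + c1 * alpha * g0;
  const bool curvature = g1 >= c2 * g0;
  return sufficient_decrease && curvature;
}

// Backtracking search: start at alpha = 1 and multiply by rho (0 < rho < 1)
// until both conditions hold. Returns 0 if no tried step is accepted
// (the cap and the failure value are assumptions of this sketch).
double Backtrack(const std::function<double(double)>& phi,
                 const std::function<double(double)>& dphi,
                 double c1, double c2, double rho, int max_steps) {
  const double f0 = phi(0.0);
  const double g0 = dphi(0.0);
  double alpha = 1.0;
  for (int i = 0; i < max_steps; ++i) {
    if (WolfeSatisfied(f0, g0, phi(alpha), dphi(alpha), alpha, c1, c2)) {
      return alpha;
    }
    alpha *= rho;
  }
  return 0.0;
}
```

Values such as c1 = 1e-4 and c2 = 0.9 are common textbook choices for quasi-Newton methods; this page does not state which constants the learner uses.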

Method	Worker nodes	Server nodes
PrepareData	1. find the unique feature IDs 2. count the occurrences of each feature 3. push both to the servers Return: number of examples read	-
InitServer	-	1. initialize w₀ Return: r(w₀)
InitWorker	1. pull w₀ from the servers 2. calculate ∂ℓ(w₀) Return: ℓ(w₀)	-
PushGradient(k)	1. push ∂ℓ(w_k) to the servers	-
PrepareCalcDirection(k)	-	1. ∂f(w_k) = ∂ℓ(w_k) + ∂r(w_k) 2. add w_k - w_{k-1} to s 3. add ∂f(w_k) - ∂f(w_{k-1}) to y Return: B = [s, y, ∂f]^T [s, y, ∂f]
CalcDirection(k, B)	-	1. calculate p_k by the two-loop recursion Return: p_k^T∂ℓ(w_k)
LineSearch(k, α)	1. pull p_k from the servers if not yet pulled 2. w_{k+1} = w_k + αp_k 3. calculate ∂ℓ(w_{k+1}) Return: [ℓ(w_{k+1}), ∂ℓ(w_{k+1})^T p_k]	1. w_{k+1} = w_k + αp_k Return: [r(w_{k+1}), ∂r(w_{k+1})^T p_k]

Note: the actual implementation may differ slightly from the steps above in order to save computation and memory.
