Limited BFGS
Mu Li edited this page Feb 11, 2016
·
3 revisions
Assume the objective f(w) = ℓ(w) + r(w)
Iterating implemented in LBFGSLearner::RunScheduler
PrepareData();
InitServer();
InitWorker();
for (int k = 0; k < max_num_epochs; ++k) {
PushGradient(k);
PrepareCalcDirection(k);
CalcDirection(k);
α = 1;
while (wolfe_condition_not_satisfied) {
PushGradient(k);
LineSearch(k, α);
α *= ρ;
}
}
The wolfe condition
- f(wk+αpk) ≤ f(wk)+c1αpkT∂f(wk)
- pkT∂f(wk+αpk) ≥ c2pkT∂f(wk)
method | Worker nodes | Server nodes |
---|---|---|
PrepareData | 1. find the unique feature IDs 2. count the feature appearance 3. push all to severs Return: number of examples read |
- |
InitServer | - | 1. initialize w0 Return: r(w0) |
InitWorker | 1. pull w0 from servers 2. calculate ∂ℓ(w0) Return: ℓ(w0) |
- |
PushGradient(k) | 1. push ∂ℓ(wk) to servers | - |
PrepareCalcDirection(k) | - | 1. ∂f(wk)=∂ℓ(wk)+∂r(wk) 2. add ∂f(wk) - ∂f(wk-1) to s 3. add wk - wk-1 to y Return: B = [s,y,∂f]T[s,y,∂f] |
CalcDirection(k, B) | - | 1. calculate pk by the two-loop Return pkT∂ℓ(wk) |
LineSearch(k,α) | 1. pull pk from server if hasn't yet 2. wk+1 = wk+αpk 3. calculate ∂ℓ(wk+1) Return: [ℓ(wk+1), ∂ℓ(wk+1)Tpk] |
1. wk+1 = wk+αpk Return: [r(wk+1), ∂r(wk+1)Tpk] |
the actual implementation may be slightly different in order to save the computation and memory.