function [x, cost, info, options] = rlbfgs(problem, x0, options)
% Riemannian limited memory BFGS solver for smooth objective functions.
%
% function [x, cost, info, options] = rlbfgs(problem)
% function [x, cost, info, options] = rlbfgs(problem, x0)
% function [x, cost, info, options] = rlbfgs(problem, x0, options)
% function [x, cost, info, options] = rlbfgs(problem, [], options)
%
%
% This is a Riemannian limited memory BFGS solver (quasi-Newton method),
% which aims to minimize the cost function in the given problem structure.
% It requires access to the gradient of the cost function.
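%
% As a quick illustration, here is a minimal usage sketch (it assumes
% Manopt is on the path; the sphere manifold and the quadratic cost
% below are illustrative choices, not part of this solver):
%
%   n = 100;
%   A = randn(n); A = (A + A')/2;          % random symmetric matrix
%   problem.M = spherefactory(n);          % unit sphere in R^n
%   problem.cost  = @(x) x'*(A*x);         % quadratic form on the sphere
%   problem.egrad = @(x) 2*A*x;            % Euclidean gradient of the cost
%   [x, xcost, info] = rlbfgs(problem);    % run with default options
%   semilogy([info.iter], [info.gradnorm], '.-');   % convergence plot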
%
% Parameter options.memory can be used to specify the number of iterations
% the algorithm remembers and uses to approximate the inverse Hessian of
% the cost. Default value is 30.
% For unlimited memory, set options.memory = Inf.
%
%
% For a description of the algorithm and theorems offering convergence
% guarantees, see the references below.
%
% The initial iterate is x0 if it is provided. Otherwise, a random point on
% the manifold is picked. To specify options whilst not specifying an
% initial iterate, give x0 as [] (the empty matrix).
%
% The two outputs 'x' and 'cost' are the last reached point on the manifold
% and its cost.
%
% The output 'info' is a struct-array which contains information about the
% iterations:
% iter (integer)
% The iteration number. The initial guess is 0.
% cost (double)
% The corresponding cost value.
% gradnorm (double)
% The (Riemannian) norm of the gradient.
% time (double)
% The total elapsed time in seconds to reach the corresponding cost.
% stepsize (double)
% The size of the step from the previous to the new iterate.
% accepted (Boolean)
% true if the step is accepted by the cautious update check, false otherwise.
% And possibly additional information logged by options.statsfun.
% For example, type [info.gradnorm] to obtain a vector of the successive
% gradient norms reached at each iteration.
%
% The options structure is used to overwrite the default values. All
% options have a default value and are hence optional. To force an option
% value, pass an options structure with a field options.optionname, where
% optionname is one of the following and the default value is indicated
% between parentheses:
%
% tolgradnorm (1e-6)
% The algorithm terminates if the norm of the gradient drops below
% this. For well-scaled problems, a rule of thumb is that you can
% expect to reduce the gradient norm by 8 orders of magnitude
% (sqrt(eps)) compared to the gradient norm at a "typical" point (a
% rough initial iterate for example). Further decrease is sometimes
% possible, but inexact floating point arithmetic will eventually
% limit the final accuracy. If tolgradnorm is set too low, the
% algorithm may end up iterating forever (or at least until another
% stopping criterion triggers).
% maxiter (1000)
% The algorithm terminates if maxiter iterations were executed.
% maxtime (Inf)
% The algorithm terminates if maxtime seconds elapsed.
% minstepsize (1e-10)
% The minimum norm of the tangent vector that points from the current
% point to the next point. If the norm is less than minstepsize, the
% program will terminate.
% memory (30)
% The number of previous iterations the program remembers. This is used
% to approximate the inverse Hessian at the current point. Because of
% the difficulty of maintaining a coordinate representation of this
% operator, a recursive method is used. The number of steps in the
% recursion is at most options.memory. This parameter can take any
% integer value >= 0, or Inf, which is taken to be options.maxiter. If
% options.maxiter has value Inf, then it will take value 10000 and a
% warning will be displayed.
% strict_inc_func (@(t) 1e-4*t)
% The cautious step needs a real function that has value 0 at t = 0
% and is strictly increasing. See details in Wen Huang's paper
% "A Riemannian BFGS Method without Differentiated Retraction for
% Nonconvex Optimization Problems".
% statsfun (none)
% Function handle to a function that will be called after each
% iteration to provide the opportunity to log additional statistics.
% They will be returned in the info struct. See the generic Manopt
% documentation about solvers for further information. statsfun is
% called with the point x that was reached last. A sketch of a typical
% statsfun is given after this list of options.
% stopfun (none)
% Function handle to a function that will be called at each iteration
% to provide the opportunity to specify additional stopping criteria.
% See the generic Manopt documentation about solvers for further
% information.
% verbosity (2)
% Integer number used to tune the amount of output the algorithm
% generates during execution (mostly as text in the command window).
% The higher, the more output. 0 means silent. 3 and above includes a
% display of the options structure at the beginning of the execution.
% debug (false)
% Set to true to allow the algorithm to perform additional
% computations for debugging purposes. If a debugging test fails, you
% will be informed of it, usually via the command window. Be aware
% that these additional computations appear in the algorithm timings
% too, and may interfere with operations such as counting the number
% of cost evaluations, etc. (the debug calls get storedb too).
% storedepth (2)
% Maximum number of different points x of the manifold for which a
% store structure will be kept in memory in the storedb for caching.
% If memory usage is an issue, you may try to lower this number.
% Profiling may then help to investigate if a performance hit was
% incurred as a result.
% ls_initial_scale (@(gradnorm) 1/gradnorm)
% A function handle that takes as input a real number (a gradient norm)
% and outputs a real number used to scale the initial inverse Hessian
% approximation, and hence the first search direction proposed to the
% line-search. The default makes the method invariant to positive
% scaling of the cost function.
%
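% As mentioned in the statsfun entry above, extra statistics can be
% logged at each iteration. Below is a sketch; it assumes the Manopt
% helper statsfunhelper is available and that xref is some point of
% interest chosen by the user (both are assumptions for this example):
%
%   options.statsfun = statsfunhelper('dist_to_xref', ...
%                               @(problem, x) problem.M.dist(x, xref));
%   [x, xcost, info] = rlbfgs(problem, [], options);
%   plot([info.iter], [info.dist_to_xref]);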
%
% Please cite the Manopt paper as well as the research paper:
% @InBook{Huang2016,
% title = {A {R}iemannian {BFGS} Method for Nonconvex Optimization Problems},
% author = {Huang, W. and Absil, P.-A. and Gallivan, K.A.},
% year = {2016},
% publisher = {Springer International Publishing},
% editor = {Karas{\"o}zen, B{\"u}lent and Manguo{\u{g}}lu, Murat and Tezer-Sezgin, M{\"u}nevver and G{\"o}ktepe, Serdar and U{\u{g}}ur, {\"O}m{\"u}r},
% address = {Cham},
% booktitle = {Numerical Mathematics and Advanced Applications ENUMATH 2015},
% pages = {627--634},
% doi = {10.1007/978-3-319-39929-4_60}
% }
%
% We point out that, at the moment, this implementation of RLBFGS can be
% slower than the implementation in ROPTLIB by Wen Huang et al. referenced
% above. For the purpose of comparing to their work, please use their
% implementation.
%
% This file is part of Manopt: www.manopt.org.
% Original author: Changshuo Liu, July 19, 2017.
% Contributors: Nicolas Boumal
% Change log:
%
% Nov. 27, 2017 (Wen Huang):
% Changed the default strict_inc_func to @(t) 1e-4*t from @(t) t.
%
% Jan. 18, 2018 (NB):
% Corrected a bug related to the way the line search hint was defined
% by default.
%
% Aug. 2, 2018 (NB):
% Using the new storedb.remove features to keep storedb lean, and
% reduced the default value of storedepth from 30 to 2 as a result.
%
% Aug. 2, 2022 (sfrcorne):
% Added option ls_initial_scale, with default setting to ensure
% the method is invariant to positive scaling of the cost function.

% Verify that the problem description is sufficient for the solver.
if ~canGetCost(problem)
warning('manopt:getCost', ...
'No cost provided. The algorithm will likely abort.');
end
if ~canGetGradient(problem) && ~canGetApproxGradient(problem)
% Note: we do not give a warning if an approximate gradient is
% explicitly given in the problem description, as in that case the user
% seems to be aware of the issue.
warning('manopt:getGradient:approx', ...
['No gradient provided. Using an FD approximation instead (slow).\n' ...
'It may be necessary to increase options.tolgradnorm.\n' ...
'To disable this warning: warning(''off'', ''manopt:getGradient:approx'')']);
problem.approxgrad = approxgradientFD(problem);
end
% This solver uses linesearch_hint as a line search algorithm. By
% default, try a step size of 1, so that if the BFGS approximation of
% the Hessian (or inverse Hessian) is good, then the iteration is close
% to a Newton step.
if ~canGetLinesearch(problem)
problem.linesearch = @(x, d) 1;
end
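% As a sketch of how a user can override this hint mechanism before
% calling the solver (purely illustrative; the constant is hypothetical):
%
%   problem.linesearch = @(x, d) 0.5;  % initial step size guess used by
%                                      % linesearch_hint at (x, d)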
% Local defaults for the program
localdefaults.minstepsize = 1e-10;
localdefaults.maxiter = 1000;
localdefaults.tolgradnorm = 1e-6;
localdefaults.memory = 30;
localdefaults.strict_inc_func = @(t) 1e-4*t;
localdefaults.ls_max_steps = 25;
localdefaults.ls_initial_scale = @(gradnorm) 1/gradnorm;
localdefaults.storedepth = 2;
% Merge global and local defaults, then merge w/ user options, if any.
localdefaults = mergeOptions(getGlobalDefaults(), localdefaults);
if ~exist('options', 'var') || isempty(options)
options = struct();
end
options = mergeOptions(localdefaults, options);
% To make sure memory in range [0, Inf)
options.memory = max(options.memory, 0);
if options.memory == Inf
if isinf(options.maxiter)
options.memory = 10000;
warning('rlbfgs:memory', ['options.memory and options.maxiter' ...
' are both Inf; options.memory has been changed to 10000.']);
else
options.memory = options.maxiter;
end
end
M = problem.M;
timetic = tic();
% Create a random starting point if no starting point is provided.
if ~exist('x0', 'var')|| isempty(x0)
xCur = M.rand();
else
xCur = x0;
end
% Create a store database and get a key for the current x
storedb = StoreDB(options.storedepth);
key = storedb.getNewKey();
% __________Initialization of variables______________
% Number of iterations since the last restart
k = 0;
% Total number of BFGS iterations
iter = 0;
% This cell stores step vectors which point from x_{t} to x_{t+1} for t
% indexing the last iterations, capped at options.memory.
% That is, it stores up to options.memory of the most recent step
% vectors. However, the implementation below does not need step vectors
% in their respective tangent spaces at the x_{t}'s. Rather, it requires
% them transported to the current point's tangent space by a vector
% transport. For details regarding the requirements on this
% transporter, see the reference paper by Huang et al.
% In this implementation, those step vectors are iteratively
% transported to the current point's tangent space after every
% iteration. Thus, at every iteration, vectors in sHistory are in the
% current point's tangent space.
sHistory = cell(1, options.memory);
% This cell stores the differences for latest t's of the gradient at
% x_{t+1} and the gradient at x_{t}, transported to x_{t+1}'s tangent
% space. The memory is also capped at options.memory.
yHistory = cell(1, options.memory);
% rhoHistory{t} stores the reciprocal of the inner product between
% sHistory{t} and yHistory{t}.
rhoHistory = cell(1, options.memory);
% Scaling of direction given by getDirection for acceptable step
alpha = 1;
% Norm of the step
stepsize = 1;
% Stores whether the step is accepted by the cautious update check.
accepted = true;
% Query the cost function and its gradient
[xCurCost, xCurGradient] = getCostGrad(problem, xCur, storedb, key);
xCurGradNorm = M.norm(xCur, xCurGradient);
% Initial scaling of the inverse Hessian approximation. After the first
% accepted update, this is replaced by a Barzilai-Borwein-type scaling
% <sk, yk> / <yk, yk> (see the update in the main loop below).
scaleFactor = options.ls_initial_scale(xCurGradNorm);
% Line-search statistics for recording in info.
lsstats = [];
% Flag to control restarting scheme to avoid infinite loops (see below)
ultimatum = false;
% Save stats in a struct array info, and preallocate.
stats = savestats();
info(1) = stats;
info(min(10000, options.maxiter+1)).iter = [];
if options.verbosity >= 2
fprintf(' iter cost val grad. norm alpha\n');
end
% Main iteration
while true
% Display iteration information
if options.verbosity >= 2
fprintf('%5d %+.16e %.8e %.4e\n', ...
iter, xCurCost, xCurGradNorm, alpha);
end
% Start timing this iteration
timetic = tic();
% Run standard stopping criterion checks
[stop, reason] = stoppingcriterion(problem, xCur, options, ...
info, iter+1);
% If none triggered, run specific stopping criterion check
if ~stop
if stats.stepsize < options.minstepsize
% To avoid an infinite loop and to push the search further
% in case the BFGS approximation of the Hessian is off
% towards the end, we erase the memory by setting k = 0.
% In this way, the next step is a steepest-descent step.
% If even steepest descent does not work, then it is
% hopeless and we terminate.
if ~ultimatum
if options.verbosity >= 2
fprintf(['stepsize is too small, restarting ' ...
'the bfgs procedure at the current point.\n']);
end
k = 0;
ultimatum = true;
else
stop = true;
reason = sprintf(['Last stepsize smaller than ' ...
'minimum allowed; options.minstepsize = %g.'], ...
options.minstepsize);
end
else
% We are not in trouble: lift the ultimatum if it was on.
ultimatum = false;
end
end
if stop
if options.verbosity >= 1
fprintf([reason '\n']);
end
break;
end
% Compute BFGS direction
p = getDirection(M, xCur, xCurGradient, sHistory,...
yHistory, rhoHistory, scaleFactor, min(k, options.memory));
% Execute line-search
[stepsize, xNext, newkey, lsstats] = ...
linesearch_hint(problem, xCur, p, xCurCost, ...
M.inner(xCur, xCurGradient, p), ...
options, storedb, key);
% Record the BFGS step-multiplier alpha which was effectively
% selected. Toward convergence, we hope to see alpha = 1.
alpha = stepsize/M.norm(xCur, p);
step = M.lincomb(xCur, alpha, p);
% Query cost and gradient at the candidate new point.
[xNextCost, xNextGrad] = getCostGrad(problem, xNext, storedb, newkey);
% Compute sk and yk
sk = M.transp(xCur, xNext, step);
yk = M.lincomb(xNext, 1, xNextGrad, ...
-1, M.transp(xCur, xNext, xCurGradient));
% Computation of the BFGS step is invariant under scaling of sk and
% yk by a common factor. For numerical reasons, we scale sk and yk
% so that sk is a unit norm vector.
norm_sk = M.norm(xNext, sk);
sk = M.lincomb(xNext, 1/norm_sk, sk);
yk = M.lincomb(xNext, 1/norm_sk, yk);
inner_sk_yk = M.inner(xNext, sk, yk);
inner_sk_sk = M.norm(xNext, sk)^2; % ensures nonnegativity
% If the cautious step is accepted (which is the intended
% behavior), we record sk, yk and rhok and need to do some
% housekeeping. If the cautious step is rejected, these are not
% recorded. In all cases, xNext is the next iterate: the notion of
% accept/reject here is limited to whether or not we keep track of
% sk, yk, rhok to update the BFGS operator.
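% Concretely, with theta = options.strict_inc_func, the pair (sk, yk)
% is stored only if <sk, yk> / <sk, sk> >= theta(||grad f(xCur)||),
% which enforces a sufficiently positive curvature estimate along sk
% (see the reference by Huang et al. above).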
cap = options.strict_inc_func(xCurGradNorm);
if inner_sk_sk ~= 0 && (inner_sk_yk / inner_sk_sk) >= cap
accepted = true;
rhok = 1/inner_sk_yk;
scaleFactor = inner_sk_yk / M.norm(xNext, yk)^2;
% Time to store the vectors sk, yk and the scalar rhok.
% Remember: we need to transport all vectors to the most
% current tangent space.
% If we are out of memory
if k >= options.memory
% sk and yk are stored from index 1 to the end, with the
% most recent pair in the rightmost occupied cell. When
% the memory is full, shift every entry one slot to the
% left and store the newest sk, yk, rhok in the last slot.
for i = 2 : options.memory
sHistory{i} = M.transp(xCur, xNext, sHistory{i});
yHistory{i} = M.transp(xCur, xNext, yHistory{i});
end
if options.memory > 1
sHistory = sHistory([2:end, 1]);
yHistory = yHistory([2:end, 1]);
rhoHistory = rhoHistory([2:end, 1]);
end
if options.memory > 0
sHistory{options.memory} = sk;
yHistory{options.memory} = yk;
rhoHistory{options.memory} = rhok;
end
% If we are not out of memory
else
for i = 1:k
sHistory{i} = M.transp(xCur, xNext, sHistory{i});
yHistory{i} = M.transp(xCur, xNext, yHistory{i});
end
sHistory{k+1} = sk;
yHistory{k+1} = yk;
rhoHistory{k+1} = rhok;
end
k = k + 1;
% The cautious step is rejected: we do not store sk, yk, rhok but
% we still need to transport stored vectors to the new tangent
% space.
else
accepted = false;
for i = 1 : min(k, options.memory)
sHistory{i} = M.transp(xCur, xNext, sHistory{i});
yHistory{i} = M.transp(xCur, xNext, yHistory{i});
end
end
% Update variables to new iterate.
storedb.removefirstifdifferent(key, newkey);
xCur = xNext;
key = newkey;
xCurGradient = xNextGrad;
xCurGradNorm = M.norm(xNext, xNextGrad);
xCurCost = xNextCost;
% iter is the number of iterations we have accomplished.
iter = iter + 1;
% Make sure we don't use too much memory for the store database
% (this is independent of the BFGS memory).
storedb.purge();
% Log statistics for freshly executed iteration
stats = savestats();
info(iter+1) = stats;
end
% Housekeeping before we return
info = info(1:iter+1);
x = xCur;
cost = xCurCost;
if options.verbosity >= 1
fprintf('Total time is %f [s] (excludes statsfun)\n', ...
info(end).time);
end
% Routine in charge of collecting the current iteration stats
function stats = savestats()
stats.iter = iter;
stats.cost = xCurCost;
stats.gradnorm = xCurGradNorm;
if iter == 0
stats.stepsize = NaN;
stats.time = toc(timetic);
stats.accepted = NaN;
else
stats.stepsize = stepsize;
stats.time = info(iter).time + toc(timetic);
stats.accepted = accepted;
end
stats.linesearch = lsstats;
stats = applyStatsfun(problem, xCur, storedb, key, options, stats);
end
end
% BFGS step, see Wen's paper for details. This function takes in a tangent
% vector g, and applies an approximate inverse Hessian P to it to get Pg.
% Then, -Pg is returned.
%
% Theory requires the transporter to be isometric and to satisfy the
% locking condition (see paper), but these properties do not seem to be
% crucial in practice. If your manifold provides M.isotransp, it may be
% good to do M.transp = M.isotransp; after loading M with a factory.
%
% This implementation operates in the tangent space of the most recent
% point since all vectors in sHistory and yHistory have been transported
% there.
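%
% In Euclidean notation, this is the classical L-BFGS two-loop recursion:
%
%   q = g;
%   for i = k, k-1, ..., 1:  a(i) = rho(i)*<s(i), q>;  q = q - a(i)*y(i);
%   r = scaleFactor*q;
%   for i = 1, 2, ..., k:    b    = rho(i)*<y(i), r>;  r = r + (a(i) - b)*s(i);
%   dir = -r;
%
% with all inner products and linear combinations taken in the tangent
% space of the current point, as implemented below.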
function dir = getDirection(M, xCur, xCurGradient, sHistory, yHistory, ...
rhoHistory, scaleFactor, k)
q = xCurGradient;
inner_s_q = zeros(1, k);
for i = k : -1 : 1
inner_s_q(1, i) = rhoHistory{i} * M.inner(xCur, sHistory{i}, q);
q = M.lincomb(xCur, 1, q, -inner_s_q(1, i), yHistory{i});
end
r = M.lincomb(xCur, scaleFactor, q);
for i = 1 : k
omega = rhoHistory{i} * M.inner(xCur, yHistory{i}, r);
r = M.lincomb(xCur, 1, r, inner_s_q(1, i)-omega, sHistory{i});
end
dir = M.lincomb(xCur, -1, r);
end