-
Notifications
You must be signed in to change notification settings - Fork 108
Expand file tree
/
Copy pathconjugategradient.m
More file actions
445 lines (379 loc) · 17.8 KB
/
Copy pathconjugategradient.m
File metadata and controls
445 lines (379 loc) · 17.8 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
function [x, cost, info, options] = conjugategradient(problem, x, options)
% Conjugate gradient minimization algorithm for Manopt.
%
% function [x, cost, info, options] = conjugategradient(problem)
% function [x, cost, info, options] = conjugategradient(problem, x0)
% function [x, cost, info, options] = conjugategradient(problem, x0, options)
% function [x, cost, info, options] = conjugategradient(problem, [], options)
%
% Apply the conjugate gradient minimization algorithm to the problem
% defined in the problem structure, starting at x0 if it is provided
% (otherwise, at a random point on the manifold). To specify options whilst
% not specifying an initial guess, give x0 as [] (the empty matrix).
%
% The outputs x and cost are the best reached point on the manifold and its
% cost. The struct-array info contains information about the iterations:
% iter : the iteration number (0 for the initial guess)
% cost : cost value
% time : elapsed time in seconds
% gradnorm : Riemannian norm of the gradient
% stepsize : norm of the last tangent vector retracted
% beta : value of the beta parameter (see options.beta_type)
% linesearch : information logged by options.linesearch
% And possibly additional information logged by options.statsfun.
% For example, type [info.gradnorm] to obtain a vector of the successive
% gradient norms reached.
%
% The options structure is used to overwrite the default values. All
% options have a default value and are hence optional. To force an option
% value, pass an options structure with a field options.optionname, where
% optionname is one of the following and the default value is indicated
% between parentheses:
%
% tolgradnorm (1e-6)
% The algorithm terminates if the norm of the gradient drops below this.
% maxiter (1000)
% The algorithm terminates if maxiter iterations have been executed.
% maxtime (Inf)
% The algorithm terminates if maxtime seconds elapsed.
% minstepsize (1e-10)
% The algorithm terminates if the linesearch returns a displacement
% vector (to be retracted) smaller in norm than this value.
% beta_type ('H-S')
% Conjugate gradient beta rule used to construct the new search
% direction, based on a linear combination of the previous search
% direction and the new (preconditioned) gradient. Possible values
% for this parameter are:
% 'S-D', 'steep' for beta = 0 (preconditioned steepest descent)
% 'F-R' for Fletcher-Reeves's rule
% 'P-R' for Polak-Ribiere's modified rule
% 'H-S' for Hestenes-Stiefel's modified rule
% 'H-Z' for Hager-Zhang's modified rule
% 'L-S' for Sato's Liu-Storey rule
% 'P-R-SATO' for Sato's variation of modified PR rule
% 'H-S-SATO' for Sato's variation of modified HS rule
% See Hager and Zhang 2006, "A survey of nonlinear conjugate gradient
% methods" for a description of these rules in the Euclidean case and
% for an explanation of how to adapt them to the preconditioned case.
% The adaption to the Riemannian case is straightforward: see in code
% for details. Modified rules take the max between 0 and the computed
% beta value, which provides automatic restart, except for H-Z and L-S
% which use a different modification. Sato's Liu-Storey rule is
% described in Sato 2021, "Riemannian conjugate gradient methods:
% General framework and specific algorithms with convergence analyses"
% orth_value (Inf)
% Following Powell's restart strategy (Math. prog. 1977), restart CG
% (that is, make a -preconditioned- gradient step) if two successive
% -preconditioned- gradients are "too" parallel. See for example
% Hager and Zhang 2006, "A survey of nonlinear conjugate gradient
% methods", page 12. An infinite value disables this strategy. See in
% code formula for the specific criterion used.
% linesearch (@linesearch_adaptive or @linesearch_hint)
% Function handle to a line search function. The options structure is
% passed to the line search too, so you can pass it parameters. See
% each line search's documentation for info. Another available line
% search in manopt is @linesearch, in /manopt/linesearch/linesearch.m
% If the problem structure includes a line search hint, then the
% default line search used is @linesearch_hint.
% statsfun (none)
% Function handle to a function that will be called after each
% iteration to provide the opportunity to log additional statistics.
% They will be returned in the info struct. See the generic Manopt
% documentation about solvers for further information.
% stopfun (none)
% Function handle to a function that will be called at each iteration
% to provide the opportunity to specify additional stopping criteria.
% See the generic Manopt documentation about solvers for further
% information.
% verbosity (3)
% Integer number used to tune the amount of output the algorithm
% generates during execution (mostly as text in the command window).
% The higher, the more output. 0 means silent.
% storedepth (2)
% Maximum number of different points x of the manifold for which a
% store structure will be kept in memory in the storedb. If the
% caching features of Manopt are not used, this is irrelevant. For
% the CG algorithm, a store depth of 2 should always be sufficient.
%
%
% In most of the examples bundled with the toolbox (see link below), the
% solver can be replaced by the present one if need be.
%
% See also: steepestdescent trustregions manopt/solvers/linesearch manopt/examples
% An explicit, general listing of this algorithm, with preconditioning,
% can be found in the following paper:
% @Article{boumal2015lowrank,
% Title = {Low-rank matrix completion via preconditioned optimization on the {G}rassmann manifold},
% Author = {Boumal, N. and Absil, P.-A.},
% Journal = {Linear Algebra and its Applications},
% Year = {2015},
% Pages = {200--239},
% Volume = {475},
% Doi = {10.1016/j.laa.2015.02.027},
% }
% This file is part of Manopt: www.manopt.org.
% Original author: Bamdev Mishra, Dec. 30, 2012.
% Contributors: Nicolas Boumal, Nick Vannieuwenhoven, Ivan Bioli
% Change log:
%
% March 14, 2013, NB:
% Added preconditioner support : see Section 8 in
% https://www.math.lsu.edu/~hozhang/papers/cgsurvey.pdf
%
% Sept. 13, 2013, NB:
% Now logging beta parameter too.
%
% Nov. 7, 2013, NB:
% The search direction is no longer normalized before it is passed
% to the linesearch. This way, it is up to the designers of the
% linesearch to decide whether they want to use the norm of the
% search direction in their algorithm or not. There are reasons
% against it, but practical evidence that it may help too, so we
% allow it. The default linesearch_adaptive used does exploit the
% norm information. The base linesearch does not. You may select it
% by setting options.linesearch = @linesearch;
%
% Nov. 29, 2013, NB:
% Documentation improved: options are now explicitly described.
% Removed the Daniel rule for beta: it was not appropriate for
% preconditioned CG and I could not find a proper reference for it.
%
% April 3, 2015 (NB):
% Works with the new StoreDB class system.
%
% Aug. 2, 2018 (NB):
% Now using storedb.remove() to keep the cache lean.
%
% Feb. 7, 2022 (NV):
% Added support for Liu-Storey rule (L-S).
%
% Nov. 7, 2023 (IB):
% Fixed Liu-Storey rule (L-S) + added H-S-SATO and P-R-SATO.
M = problem.M;
% Verify that the problem description is sufficient for the solver.
if ~canGetCost(problem)
warning('manopt:getCost', ...
'No cost provided. The algorithm will likely abort.');
end
if ~canGetGradient(problem) && ~canGetApproxGradient(problem)
warning('manopt:getGradient:approx', ...
['No gradient provided. Using an FD approximation instead (slow).\n' ...
'It may be necessary to increase options.tolgradnorm.\n' ...
'To disable this warning: warning(''off'', ''manopt:getGradient:approx'')']);
problem.approxgrad = approxgradientFD(problem);
end
% Set local defaults here
localdefaults.minstepsize = 1e-10;
localdefaults.maxiter = 1000;
localdefaults.tolgradnorm = 1e-6;
localdefaults.storedepth = 20;
% Changed by NB : H-S has the "auto restart" property.
% See Hager-Zhang 2005/2006 survey about CG methods.
% The auto restart comes from the 'max(0, ...)', not so much from the
% reason stated in Hager-Zhang I think. P-R also has auto restart.
localdefaults.beta_type = 'H-S';
localdefaults.orth_value = Inf; % by BM as suggested in Nocedal and Wright
% Depending on whether the problem structure specifies a hint for
% line-search algorithms, choose a default line-search that works on
% its own (typical) or that uses the hint.
if ~canGetLinesearch(problem)
localdefaults.linesearch = @linesearch_adaptive;
else
localdefaults.linesearch = @linesearch_hint;
end
% Merge global and local defaults, then merge w/ user options, if any.
localdefaults = mergeOptions(getGlobalDefaults(), localdefaults);
if ~exist('options', 'var') || isempty(options)
options = struct();
end
options = mergeOptions(localdefaults, options);
timetic = tic();
% If no initial point x is given by the user, generate one at random.
if ~exist('x', 'var') || isempty(x)
x = M.rand();
end
% Create a store database and generate a key for the current x
storedb = StoreDB(options.storedepth);
key = storedb.getNewKey();
% Compute cost-related quantities for x
[cost, grad] = getCostGrad(problem, x, storedb, key);
gradnorm = M.norm(x, grad);
Pgrad = getPrecon(problem, x, grad, storedb, key);
gradPgrad = M.inner(x, grad, Pgrad);
% Iteration counter (at any point, iter is the number of fully executed
% iterations so far)
iter = 0;
% Save stats in a struct array info and preallocate.
stats = savestats();
info(1) = stats;
info(min(10000, options.maxiter+1)).iter = [];
if options.verbosity >= 2
fprintf(' iter\t cost val\t grad. norm\n');
end
% Compute a first descent direction (not normalized)
desc_dir = M.lincomb(x, -1, Pgrad);
% Start iterating until stopping criterion triggers
while true
% Display iteration information
if options.verbosity >= 2
fprintf('%5d\t%+.16e\t%.8e\n', iter, cost, gradnorm);
end
% Start timing this iteration
timetic = tic();
% Run standard stopping criterion checks
[stop, reason] = stoppingcriterion(problem, x, options, info, iter+1);
% Run specific stopping criterion check
if ~stop && abs(stats.stepsize) < options.minstepsize
stop = true;
reason = sprintf(['Last stepsize smaller than minimum ' ...
'allowed; options.minstepsize = %g.'], ...
options.minstepsize);
end
if stop
if options.verbosity >= 1
fprintf([reason '\n']);
end
break;
end
% The line search algorithms require the directional derivative of the
% cost at the current point x along the search direction.
df0 = M.inner(x, grad, desc_dir);
% If we didn't get a descent direction: restart, i.e., switch to the
% negative gradient. Equivalent to resetting the CG direction to a
% steepest descent step, which discards the past information.
if df0 >= 0
% Or we switch to the negative gradient direction.
if options.verbosity >= 3
fprintf(['Conjugate gradient info: got an ascent direction '...
'(df0 = %2e), reset to the (preconditioned) '...
'steepest descent direction.\n'], df0);
end
% Reset to negative gradient: this discards the CG memory.
desc_dir = M.lincomb(x, -1, Pgrad);
df0 = -gradPgrad;
end
% Execute line search
[stepsize, newx, newkey, lsstats] = options.linesearch( ...
problem, x, desc_dir, cost, df0, options, storedb, key);
% Compute the new cost-related quantities for newx
[newcost, newgrad] = getCostGrad(problem, newx, storedb, newkey);
newgradnorm = M.norm(newx, newgrad);
Pnewgrad = getPrecon(problem, newx, newgrad, storedb, newkey);
newgradPnewgrad = M.inner(newx, newgrad, Pnewgrad);
% Apply the CG scheme to compute the next search direction.
%
% This paper https://www.math.lsu.edu/~hozhang/papers/cgsurvey.pdf
% by Hager and Zhang lists many known beta rules. The rules defined
% here can be found in that paper (or are provided with additional
% references), adapted to the Riemannian setting.
%
if strcmpi(options.beta_type, 'steep') || ...
strcmpi(options.beta_type, 'S-D') % Gradient Descent
beta = 0;
desc_dir = M.lincomb(newx, -1, Pnewgrad);
else
oldgrad = M.transp(x, newx, grad);
orth_grads = M.inner(newx, oldgrad, Pnewgrad) / newgradPnewgrad;
% Powell's restart strategy (see page 12 of Hager and Zhang's
% survey on conjugate gradient methods, for example)
if abs(orth_grads) >= options.orth_value
beta = 0;
desc_dir = M.lincomb(x, -1, Pnewgrad);
else % Compute the CG modification
old_desc_dir = desc_dir;
desc_dir = M.transp(x, newx, desc_dir);
switch upper(options.beta_type)
case 'F-R' % Fletcher-Reeves
beta = newgradPnewgrad / gradPgrad;
case 'P-R' % Polak-Ribiere+
% vector grad(new) - transported grad(current)
diff = M.lincomb(newx, 1, newgrad, -1, oldgrad);
ip_diff = M.inner(newx, Pnewgrad, diff);
beta = ip_diff / gradPgrad;
beta = max(0, beta);
case 'P-R-SATO' % Polak-Ribiere+ from Sato's paper
Poldgrad = M.transp(x, newx, Pgrad);
numo = newgradPnewgrad - M.inner(newx, newgrad, Poldgrad);
betaPRP = numo / gradPgrad;
betaFR = newgradPnewgrad / gradPgrad;
beta = max(0, min(betaPRP, betaFR));
case 'H-S' % Hestenes-Stiefel+
diff = M.lincomb(newx, 1, newgrad, -1, oldgrad);
ip_diff = M.inner(newx, Pnewgrad, diff);
beta = ip_diff / M.inner(newx, diff, desc_dir);
beta = max(0, beta);
case 'H-S-SATO' % Hestenes-Stiefel+ from Sato's paper
Poldgrad = M.transp(x, newx, Pgrad);
numo = newgradPnewgrad - M.inner(newx, newgrad, Poldgrad);
deno = M.inner(newx, newgrad, desc_dir) - M.inner(x, grad, old_desc_dir);
betaHS = numo / deno;
betaDY = newgradPnewgrad / deno;
beta = max(min(betaHS, betaDY), 0);
case 'H-Z' % Hager-Zhang+
diff = M.lincomb(newx, 1, newgrad, -1, oldgrad);
Poldgrad = M.transp(x, newx, Pgrad);
Pdiff = M.lincomb(newx, 1, Pnewgrad, -1, Poldgrad);
deno = M.inner(newx, diff, desc_dir);
numo = M.inner(newx, diff, Pnewgrad);
numo = numo - 2*M.inner(newx, diff, Pdiff)*...
M.inner(newx, desc_dir, newgrad) / deno;
beta = numo / deno;
% Robustness (see Hager-Zhang paper mentioned above)
desc_dir_norm = M.norm(newx, desc_dir);
eta_HZ = -1 / ( desc_dir_norm * min(0.01, gradnorm) );
beta = max(beta, eta_HZ);
case 'L-S' % Liu-Storey+ from Sato
Poldgrad = M.transp(x, newx, Pgrad);
numo = newgradPnewgrad - M.inner(newx, newgrad, Poldgrad);
deno = -1*M.inner(x, grad, old_desc_dir);
betaLS = numo / deno;
betaCD = newgradPnewgrad / deno;
beta = max(0, min(betaLS, betaCD));
otherwise
error(['Unknown options.beta_type. ' ...
'Should be steep, S-D, F-R, P-R, H-S, H-Z, ' ...
'L-S, P-R-SATO or H-S-SATO.']);
end
desc_dir = M.lincomb(newx, -1, Pnewgrad, beta, desc_dir);
end
end
% Transfer iterate info.
storedb.removefirstifdifferent(key, newkey);
x = newx;
key = newkey;
cost = newcost;
grad = newgrad;
Pgrad = Pnewgrad;
gradnorm = newgradnorm;
gradPgrad = newgradPnewgrad;
% iter is the number of iterations we have accomplished.
iter = iter + 1;
% Make sure we don't use too much memory for the store database.
storedb.purge();
% Log statistics for freshly executed iteration.
stats = savestats();
info(iter+1) = stats;
end
info = info(1:iter+1);
if options.verbosity >= 1
fprintf('Total time is %f [s] (excludes statsfun)\n', info(end).time);
end
% Routine in charge of collecting the current iteration stats
function stats = savestats()
stats.iter = iter;
stats.cost = cost;
stats.gradnorm = gradnorm;
if iter == 0
stats.stepsize = nan;
stats.time = toc(timetic);
stats.linesearch = [];
stats.beta = 0;
else
stats.stepsize = stepsize;
stats.time = info(iter).time + toc(timetic);
stats.linesearch = lsstats;
stats.beta = beta;
end
stats = applyStatsfun(problem, x, storedb, key, options, stats);
end
end