[WIP] BCG enhancements #121
Conversation
Merge the new BCG structure into main before it diverges too much.
Write the problem in barycentric coordinates in closed form if the objective function is quadratic. I need to verify that the code works well for an active set with dense atoms (right now all my testing is with MaybeHotVectors) and run more tests. Next step is to implement Nesterov's AGD.
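For context, a minimal sketch of that closed-form reduction, assuming a quadratic f(x) = 0.5 xᵀ Q x + bᵀ x and an active set collected as the columns of a vertex matrix V (the function name reduce_quadratic is illustrative; reduced_linear and reduced_hessian follow the docstring quoted later in this thread):

```julia
using LinearAlgebra

# With x = V * λ, where the columns of V are the active-set atoms,
#   f(V * λ) = 0.5 * λ' * (V' * Q * V) * λ + (V' * b)' * λ,
# so the problem in barycentric coordinates is again a quadratic,
# now over the probability simplex.
function reduce_quadratic(V::AbstractMatrix, Q::AbstractMatrix, b::AbstractVector)
    reduced_hessian = V' * Q * V  # k × k, with k atoms in the active set
    reduced_linear = V' * b       # length-k linear term
    return reduced_linear, reduced_hessian
end
```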
src/blended_cg.jl
Outdated
end

#In case the active set atoms are not MaybeHotVectors
function build_reduced_problem(active_set::AbstractVector{<:Array}, weights, gradient, hessian)
Is it for all possible types of array (except maybe hot) or just dense arrays?
This was just for dense arrays for now. I was thinking of adding sparse arrays afterwards.
ok perfect
src/blended_cg.jl
Outdated
n = length(atoms[1])
k = length(atoms)
#Construct the matrix of vertices.
vertex_matrix = zeros(n, k)
if the atoms are sparse, this should be spzeros(...) to create a sparse matrix
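A sketch of what that could look like (the toy atoms and sizes are illustrative):

```julia
using SparseArrays

atoms = [sparsevec([i], [1.0], 10) for i in 1:3]  # toy sparse atoms
n = length(atoms[1])
k = length(atoms)
vertex_matrix = spzeros(n, k)  # sparse n × k matrix instead of zeros(n, k)
for (j, atom) in enumerate(atoms)
    vertex_matrix[:, j] = atom  # columns stay sparse
end
```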
return number_of_steps
end

function simplex_gradient_descent_over_probability_simplex(
add a docstring on top of the function, like:
"""
this function does this, the arguments are ...
"""
function simplex_gradient_descent_over_probability_simplex(
examples/blended_cg.jl
Outdated
f(x) = norm(x - xp)^2
function grad!(storage, x)
    @. storage = 2 * (x - xp)
end
hessian = Matrix(1.0I, n, n)
The Hessian should probably be 2I here:
- hessian = Matrix(1.0I, n, n)
+ hessian = Matrix(2I, n, n)
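The suggestion checks out: grad! above stores 2 * (x - xp), so the Hessian of f is the constant matrix 2I. A quick numerical sanity check (toy size, illustrative names):

```julia
using LinearAlgebra

n = 3
xp = randn(n)
f(x) = norm(x - xp)^2
x = randn(n)
h = 1e-4
e1 = [1.0, 0.0, 0.0]
# second finite difference along the first coordinate ≈ Hessian entry (1, 1)
(f(x + h * e1) - 2 * f(x) + f(x - h * e1)) / h^2  # ≈ 2.0
```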
Added Nesterov's AGD and fixed some bugs. Many tests remain, and I still need to incorporate Mathieu's comments above.
Reused some computations and fixed some bugs. Also added some function headers. Still testing...
Add an example over the probability simplex that utilizes MaybeHotVectors, and an example over the K-Sparse LMO that uses sparse arrays.
examples/blended_cg.jl
Outdated
const xp = xpi # ./ total;

f(x) = norm(x - xp)^2
"""
what is the triple quote for?
This is a typo, I was commenting out the first test because I only wanted to check that the second test would work!
src/blended_cg.jl
Outdated
#Construct the matrix of vertices.
vertex_matrix = zeros(n, k)
#reduced_linear = zeros(k)
remove the commented-out lines once they are unused
equal to the cardinality of the active set, the objective
function is:
    f(λ) = reduced_linear^T λ + 0.5 * λ^T reduced_hessian λ
"""
the docstring should specify in which case the function returns (nothing, nothing)
src/blended_cg.jl
Outdated
"""
function update_simplex_gradient_descent!(

function minimize_over_convex_hull(
This function acts by modifying the weights of the active set; by convention we should have a ! in its name.
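For reference, the naming convention referred to here, illustrated with a standard-library pair:

```julia
# Julia convention: a trailing `!` marks functions that mutate one of
# their arguments (compare `sort` vs. `sort!`).
v = [3, 1, 2]
w = sort(v)  # returns a sorted copy; v is unchanged
sort!(v)     # sorts v in place
```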
storage
end
#Solve using gradient descent.
if !accelerated || L_reduced / mu_reduced == 1.0
L_reduced / mu_reduced == 1.0
do we want to check for exactly 1.0 here? Floating-point errors could occur.
In the case where L_reduced / mu_reduced == 1.0 there is no benefit from using an accelerated algorithm: acceleration improves the convergence rate of standard gradient descent from O((L_reduced / mu_reduced) log(1/epsilon)) to O(sqrt(L_reduced / mu_reduced) log(1/epsilon)). When L_reduced / mu_reduced == 1.0 these two guarantees coincide, so we just use gradient descent. Moreover, the definition of the step size in the accelerated algorithm depends on the condition number, and when L_reduced / mu_reduced == 1.0 the step sizes blow up.
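One way to reconcile both points would be to compare against a small tolerance rather than exactly 1.0, falling back to plain gradient descent whenever the condition number is numerically 1. A sketch (cond_tol and the toy values are illustrative, not part of the PR):

```julia
# Fall back to plain gradient descent when the condition number is
# (numerically) 1: acceleration brings no rate improvement there and
# its step sizes degenerate.
L_reduced, mu_reduced = 2.0, 2.0  # toy values with condition number 1
accelerated = true
cond_tol = sqrt(eps(Float64))

use_plain_gradient_descent = !accelerated || L_reduced / mu_reduced <= 1 + cond_tol
```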
Co-authored-by: Mathieu Besançon <mathieu.besancon@gmail.com>
Incorporate the rest of Mathieu's comments from Github.
Modified tests to account for the new structure of the code.
Reorder BCG to run in a more natural way, where there are two steps:
That way we avoid recomputing quantities: for example, the strong Wolfe gap in the main algorithm and c = [fast_dot(gradient, a) for a in active_set.atoms] inside the simplex descent algorithm compute the same inner products. Writing BCG this way lets us make the code faster when the objective function is quadratic (like in the CINDy case!), since we can take many inner simplex steps while reusing quantities.
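Roughly, the kind of reuse meant here, as a sketch with toy data (fast_dot and the comprehension are from the code above; local_gap and the stand-in definitions are illustrative):

```julia
using LinearAlgebra

fast_dot(a, b) = dot(a, b)       # stand-in for the package's fast_dot
atoms = [randn(5) for _ in 1:4]  # toy active-set atoms
gradient = randn(5)

# compute the atom–gradient inner products once...
c = [fast_dot(gradient, a) for a in atoms]
# ...use them for the gap over the active set...
local_gap = maximum(c) - minimum(c)
# ...and pass `c` to the inner simplex descent steps instead of
# recomputing the same inner products there.
```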
Things that I still need to complete: