
[WIP] BCG enhancements #121

Merged
20 commits merged into master from bcg-enhancements on Mar 12, 2021

Conversation

alejandro-carderera (Collaborator) commented on Mar 4, 2021

Reorder BCG to run in a more natural way, with two steps:

  • Minimize over the convex hull of the active set until a given tolerance in the strong Wolfe gap has been reached (these are simplex steps).
  • Once that tolerance has been reached, perform a lazy FW step, which will potentially add a vertex and increase the cardinality of the active set; otherwise, increase the tolerance to which we compute the solution over the convex hull.

That way we avoid recomputing quantities: for example, the strong Wolfe gap in the main algorithm, and then c = [fast_dot(gradient, a) for a in active_set.atoms] inside the simplex descent algorithm, which computes the same quantities as the strong Wolfe gap computation. Writing BCG this way also makes the code faster when the objective function is a quadratic (as in the CINDy case!), since many inner simplex steps can be made while reusing these quantities.
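
A minimal sketch of the reuse described above (not the PR's actual code: `local_gap_and_inner_products` is a hypothetical helper and `dot` stands in for the package's `fast_dot`). The inner products ⟨∇f(x), aᵢ⟩ feed both the gap over the active set and the simplex-descent step, so they only need to be computed once:

```julia
using LinearAlgebra

function local_gap_and_inner_products(gradient, atoms)
    c = [dot(gradient, a) for a in atoms]   # computed once
    local_gap = maximum(c) - minimum(c)     # gap over the active set
    return c, local_gap                     # c can be passed on to the simplex step
end

gradient = [1.0, -2.0, 0.5]
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
c, gap = local_gap_and_inner_products(gradient, atoms)  # c = [1.0, -2.0], gap = 3.0
```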

Things that I still need to complete:

  • Add the enhancements for the quadratic variants.
  • Modularize things and reduce the clutter so that the code is better ordered.
  • Delete the previous code (right now it is there to make sure that the new code behaves like the old code).
  • Use the objective function class.
  • Find a good way to output things both inside the FW steps and inside the simplex steps. Right now it is a bit dirty.

Write the problem in barycentric coordinates in closed form when the objective function is quadratic. I still need to verify that the code works well for an active set with dense atoms (right now all my testing is with MaybeHotVectors) and to perform more tests with the code.
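
For reference, a sketch of the closed-form reduction in barycentric coordinates, under the assumption that the objective is exactly quadratic, f(x) = ⟨linear, x⟩ + ½ xᵀ hessian x, and that column j of `vertex_matrix` holds atom j (the function name is illustrative, not the PR's API). This matches the docstring formula quoted later in the review:

```julia
using LinearAlgebra

# With x = vertex_matrix * λ, the objective in barycentric coordinates is
#   f(λ) = reduced_linear' * λ + 0.5 * λ' * reduced_hessian * λ
function reduced_quadratic(vertex_matrix, linear, hessian)
    reduced_linear = vertex_matrix' * linear
    reduced_hessian = vertex_matrix' * hessian * vertex_matrix
    return reduced_linear, reduced_hessian
end
```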

The next step is to implement Nesterov's AGD.
end

#In case the active set atoms are not MaybeHotVectors
function build_reduced_problem(active_set::AbstractVector{<:Array}, weights, gradient, hessian)
Member:

Is it for all possible types of array (except maybe hot) or just dense arrays?

Collaborator Author:

This was just for dense arrays for now. I was thinking of adding sparse arrays afterwards.

Member:

ok perfect

n = length(atoms[1])
k = length(atoms)
#Construct the matrix of vertices.
vertex_matrix = zeros(n, k)
Member:

if the atoms are sparse, this should be spzeros(...) to create a sparse matrix
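
A sketch of what the reviewer suggests (a hypothetical helper, not the PR's code): when the atoms are sparse vectors, build the vertex matrix with `spzeros` so it stays sparse.

```julia
using SparseArrays

function build_vertex_matrix(atoms::AbstractVector{<:SparseVector})
    n, k = length(atoms[1]), length(atoms)
    vertex_matrix = spzeros(eltype(first(atoms)), n, k)  # sparse storage instead of zeros(n, k)
    for (j, atom) in enumerate(atoms)
        vertex_matrix[:, j] = atom
    end
    return vertex_matrix
end
```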

return number_of_steps
end

function simplex_gradient_descent_over_probability_simplex(
Member:

add docstring on top of the function, like:

"""
this function does this, the arguments are ...
"""
function simplex_gradient_descent_over_probability_simplex(

f(x) = norm(x - xp)^2
function grad!(storage, x)
@. storage = 2 * (x - xp)
end
hessian = Matrix(1.0I, n, n)
Member:

The Hessian should probably be 2I here

Suggested change
hessian = Matrix(1.0I, n, n)
hessian = Matrix(2I, n, n)
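
A quick numerical check of the suggestion (a standalone sketch, not part of the PR): for f(x) = ‖x − xp‖², the Hessian is 2I, so a central finite difference along any coordinate should return roughly 2.

```julia
using LinearAlgebra

xp = randn(3)
f(x) = norm(x - xp)^2
x0, h = randn(3), 1e-4
e1 = [1.0, 0.0, 0.0]
# second difference along e1 approximates the (1,1) entry of the Hessian
(f(x0 + h * e1) - 2 * f(x0) + f(x0 - h * e1)) / h^2   # ≈ 2.0, consistent with 2I
```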

Added Nesterov's AGD and fixed some bugs.

Many tests remaining, and I need to incorporate the comments from Mathieu above.
Reused some computations and fixed some bugs. Also added some function headers. Still testing...
Add an example over the probability simplex that utilizes MaybeHotVectors, and an example over the K-Sparse LMO that uses sparse arrays.
const xp = xpi # ./ total;

f(x) = norm(x - xp)^2
"""
Member:

what is the triple quote for?

Collaborator Author:

This is a typo; I was commenting out the first test because I only wanted to check that the second test would work!


#Construct the matrix of vertices.
vertex_matrix = zeros(n, k)
#reduced_linear = zeros(k)
Member:

remove commented-out lines once they are unused

equal to the cardinality of the active set, the objective
function is:
f(λ) = reduced_linear^T λ + 0.5 * λ^T reduced_hessian λ
"""
Member:

docstring should specify in which case the function returns (nothing, nothing)

"""
function update_simplex_gradient_descent!(

function minimize_over_convex_hull(
Member:

This function acts by modifying the weights on the active set, by convention we should have a ! on its name
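
For context, a generic illustration of the Julia convention the reviewer refers to (not code from this PR): a trailing `!` signals that the function mutates one of its arguments in place.

```julia
# Mutates `weights` in place, hence the `!` suffix.
function rescale_weights!(weights::Vector{Float64}, factor::Float64)
    weights .*= factor
    return weights
end

w = [0.25, 0.75]
rescale_weights!(w, 2.0)   # w is now [0.5, 1.5]
```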

storage
end
#Solve using gradient descent.
if !accelerated || L_reduced / mu_reduced == 1.0
Member:

Do we want to check for exactly 1.0 in L_reduced / mu_reduced == 1.0? Floating-point errors could occur.

Collaborator Author:

In the case where L_reduced / mu_reduced == 1.0 there is no benefit from using an accelerated algorithm: acceleration improves the convergence rate from the O(L_reduced / mu_reduced * log(1/ε)) of standard gradient descent to O(sqrt(L_reduced / mu_reduced) * log(1/ε)). When L_reduced / mu_reduced == 1.0 these two guarantees are equal, so we just use gradient descent. Moreover, the step sizes in the accelerated algorithm are defined in terms of the condition number, and when L_reduced / mu_reduced == 1.0 they blow up.
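
A minimal sketch of how such a guard could look with a tolerance instead of an exact equality check (illustrative names, not the PR's final code):

```julia
function choose_inner_solver(L_reduced, mu_reduced; accelerated=true, rtol=sqrt(eps(Float64)))
    condition_number = L_reduced / mu_reduced
    if !accelerated || isapprox(condition_number, 1.0; rtol=rtol)
        return :gradient_descent        # O(κ log(1/ε)) rate; safe when κ ≈ 1
    else
        return :accelerated_descent     # O(√κ log(1/ε)) rate; step sizes need κ > 1
    end
end

choose_inner_solver(2.0, 2.0)     # :gradient_descent
choose_inner_solver(100.0, 1.0)   # :accelerated_descent
```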

matbesancon and others added 10 commits March 12, 2021 12:20
Co-authored-by: Mathieu Besançon <mathieu.besancon@gmail.com> (×7)
Incorporate the rest of Mathieu's comments from Github.
Modified tests to account for the new structure of the code.
@alejandro-carderera alejandro-carderera merged commit 37d1431 into master Mar 12, 2021
@alejandro-carderera alejandro-carderera deleted the bcg-enhancements branch March 12, 2021 12:26