[WIP] BCG enhancements #121
Conversation
Merge the new BCG structure into main before it diverges too much.
Write the problem in barycentric coordinates in closed form if the objective function is quadratic. I need to verify that the code works well for an active set with dense atoms (right now all my testing is with MaybeHotVectors) and run more tests. Next step is to implement Nesterov's AGD.
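For context, a minimal sketch of that closed-form reduction, assuming a quadratic f(x) = 0.5 xᵀ Q x + bᵀ x and an active set collected as the columns of a vertex matrix V (the function name reduce_quadratic is illustrative; reduced_linear and reduced_hessian follow the docstring quoted later in this thread):

```julia
using LinearAlgebra

# With x = V * λ, where the columns of V are the active-set atoms,
#   f(V * λ) = 0.5 * λ' * (V' * Q * V) * λ + (V' * b)' * λ,
# so the problem in barycentric coordinates is again a quadratic,
# now over the probability simplex.
function reduce_quadratic(V::AbstractMatrix, Q::AbstractMatrix, b::AbstractVector)
    reduced_hessian = V' * Q * V  # k × k, with k atoms in the active set
    reduced_linear = V' * b       # length-k linear term
    return reduced_linear, reduced_hessian
end
```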
src/blended_cg.jl
Outdated
end

#In case the active set atoms are not MaybeHotVectors
function build_reduced_problem(active_set::AbstractVector{<:Array}, weights, gradient, hessian)
Is it for all possible types of array (except maybe hot) or just dense arrays?
This was just for dense arrays for now. I was thinking of adding sparse arrays afterwards.
ok perfect
src/blended_cg.jl
Outdated
n = length(atoms[1])
k = length(atoms)
#Construct the matrix of vertices.
vertex_matrix = zeros(n, k)
if the atoms are sparse, this should be spzeros(...) to create a sparse matrix
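A sketch of what that could look like (the toy atoms and sizes are illustrative):

```julia
using SparseArrays

atoms = [sparsevec([i], [1.0], 10) for i in 1:3]  # toy sparse atoms
n = length(atoms[1])
k = length(atoms)
vertex_matrix = spzeros(n, k)  # sparse n × k matrix instead of zeros(n, k)
for (j, atom) in enumerate(atoms)
    vertex_matrix[:, j] = atom  # columns stay sparse
end
```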
return number_of_steps
end

function simplex_gradient_descent_over_probability_simplex(
add a docstring on top of the function, like:
"""
this function does this, the arguments are ...
"""
function simplex_gradient_descent_over_probability_simplex(
examples/blended_cg.jl
Outdated
f(x) = norm(x - xp)^2
function grad!(storage, x)
    @. storage = 2 * (x - xp)
end
hessian = Matrix(1.0I, n, n)
The Hessian should probably be 2I here:
- hessian = Matrix(1.0I, n, n)
+ hessian = Matrix(2I, n, n)
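The suggestion checks out: grad! above stores 2 * (x - xp), so the Hessian of f is the constant matrix 2I. A quick numerical sanity check (toy size, illustrative names):

```julia
using LinearAlgebra

n = 3
xp = randn(n)
f(x) = norm(x - xp)^2
x = randn(n)
h = 1e-4
e1 = [1.0, 0.0, 0.0]
# second finite difference along the first coordinate ≈ Hessian entry (1, 1)
(f(x + h * e1) - 2 * f(x) + f(x - h * e1)) / h^2  # ≈ 2.0
```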
Added Nesterov's AGD and fixed some bugs. Many tests remain, and I still need to incorporate Mathieu's comments above.
Reused some computations and fixed some bugs. Also added some function headers. Still testing...
Add an example over the probability simplex that utilizes MaybeHotVectors, and an example over the K-Sparse LMO that uses sparse arrays.
examples/blended_cg.jl
Outdated
const xp = xpi # ./ total;

f(x) = norm(x - xp)^2
"""
what is the triple quote for?
This is a typo, I was commenting out the first test because I only wanted to check that the second test would work!
src/blended_cg.jl
Outdated
#Construct the matrix of vertices.
vertex_matrix = zeros(n, k)
#reduced_linear = zeros(k)
remove the commented-out lines once they are unused
equal to the cardinality of the active set, the objective
function is:
    f(λ) = reduced_linear^T λ + 0.5 * λ^T reduced_hessian λ
"""
the docstring should specify in which case the function returns (nothing, nothing)
src/blended_cg.jl
Outdated
"""
function update_simplex_gradient_descent!(

function minimize_over_convex_hull(
This function acts by modifying the weights of the active set; by convention we should have a ! in its name.
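For reference, the naming convention referred to here, illustrated with a standard-library pair:

```julia
# Julia convention: a trailing `!` marks functions that mutate one of
# their arguments (compare `sort` vs. `sort!`).
v = [3, 1, 2]
w = sort(v)  # returns a sorted copy; v is unchanged
sort!(v)     # sorts v in place
```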
storage
end
#Solve using gradient descent.
if !accelerated || L_reduced / mu_reduced == 1.0
L_reduced / mu_reduced == 1.0
do we want to check for exactly 1.0 here? Floating-point errors could occur.
In the case where L_reduced / mu_reduced == 1.0 there is no benefit from using an accelerated algorithm: acceleration improves the convergence rate of standard gradient descent from O((L_reduced / mu_reduced) log(1/epsilon)) to O(sqrt(L_reduced / mu_reduced) log(1/epsilon)). When L_reduced / mu_reduced == 1.0 these two guarantees coincide, so we just use gradient descent. Moreover, the definition of the step size in the accelerated algorithm depends on the condition number, and when L_reduced / mu_reduced == 1.0 the step sizes blow up.
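One way to reconcile both points would be to compare against a small tolerance rather than exactly 1.0, falling back to plain gradient descent whenever the condition number is numerically 1. A sketch (cond_tol and the toy values are illustrative, not part of the PR):

```julia
# Fall back to plain gradient descent when the condition number is
# (numerically) 1: acceleration brings no rate improvement there and
# its step sizes degenerate.
L_reduced, mu_reduced = 2.0, 2.0  # toy values with condition number 1
accelerated = true
cond_tol = sqrt(eps(Float64))

use_plain_gradient_descent = !accelerated || L_reduced / mu_reduced <= 1 + cond_tol
```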
Co-authored-by: Mathieu Besançon <mathieu.besancon@gmail.com>
Incorporate the rest of Mathieu's comments from Github.
Modified tests to account for the new structure of the code.
Reorder BCG to run in a more natural way, where there are two steps:
That way we avoid recomputing quantities: for example, the strong Wolfe gap in the main algorithm and c = [fast_dot(gradient, a) for a in active_set.atoms] inside the simplex descent algorithm compute the same inner products. Writing BCG this way lets us make the code faster when the objective function is quadratic (like in the CINDy case!), since we can take many inner simplex steps while reusing quantities.
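Roughly, the kind of reuse meant here, as a sketch with toy data (fast_dot and the comprehension are from the code above; local_gap and the stand-in definitions are illustrative):

```julia
using LinearAlgebra

fast_dot(a, b) = dot(a, b)       # stand-in for the package's fast_dot
atoms = [randn(5) for _ in 1:4]  # toy active-set atoms
gradient = randn(5)

# compute the atom–gradient inner products once...
c = [fast_dot(gradient, a) for a in atoms]
# ...use them for the gap over the active set...
local_gap = maximum(c) - minimum(c)
# ...and pass `c` to the inner simplex descent steps instead of
# recomputing the same inner products there.
```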
Things that I still need to complete: