# Local Descent

In general, local descent methods provide a framework for iteratively minimizing a multivariate function until it converges to a local minimum, as judged by some convergence criterion. Gradient descent is the most familiar example.

A common approach is to improve the solution incrementally. At each iteration, take a step along a descent direction (e.g., the negative gradient) at the current point, and stop once a termination or convergence condition is met.

$x_{k+1} = x_k + α_k d_k$

where $x_k$ is the current point, $α_k$ is the step size, and $d_k$ is the descent direction.
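Here is a minimal sketch of this update rule, using the simplest possible choices: the negative gradient for $d_k$ and a constant step size $α_k = α$. The function name `fixed_step_descent` and its keyword defaults are hypothetical and chosen only for illustration. The later sections replace the constant step with smarter choices.

```julia
using LinearAlgebra

# Generic local descent loop implementing x_{k+1} = x_k + α_k d_k.
# Assumptions: d_k is the negative gradient and α_k is a fixed constant α.
function fixed_step_descent(grad_f, x0; α=0.1, tol=1e-8, maxiter=1_000)
    x = copy(x0)
    for k in 1:maxiter
        g = grad_f(x)
        norm(g) < tol && break    # convergence criterion: gradient is (nearly) zero
        d = -g                    # descent direction d_k
        x = x + α * d             # x_{k+1} = x_k + α_k d_k
    end
    return x
end

# Example: f(x) = x₁² + x₂² has gradient 2x and its minimum at the origin
grad_f(x) = 2 .* x
x_min = fixed_step_descent(grad_f, [2.0, 3.0])
println(x_min)
```

For this function, each step multiplies $x$ by $1 - 2α = 0.8$. A fixed $α$ works here, but a value that is too large (here, $α \ge 1$) makes the iterates stop converging, which is what motivates choosing the step size adaptively.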

## Line Search

In this first form of local descent, we use a line search to find the optimal step size $α_k$ and then compute the next point $x_{k+1}$:

$\min_{α_k} \; f(x_k + α_k d_k)$

Note that this is a univariate problem. Only $α_k$ varies, even though $f$ itself is multivariate.
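Because $φ(α) = f(x_k + α d_k)$ depends only on $α$, any univariate minimizer can perform an exact line search. The sketch below substitutes golden-section search, which is not the method used elsewhere in this notebook. It assumes that the bracket $[0, 1]$ contains the minimizer and that $φ$ is unimodal on it. `golden_section` is a hypothetical helper name.

```julia
# Golden-section search for the minimizer of a unimodal φ on [a, b].
function golden_section(φ, a, b; tol=1e-8)
    ρ = (sqrt(5) - 1) / 2          # inverse golden ratio ≈ 0.618
    c = b - ρ * (b - a)
    d = a + ρ * (b - a)
    while b - a > tol
        if φ(c) < φ(d)
            b = d                   # minimizer lies in [a, d]
        else
            a = c                   # minimizer lies in [c, b]
        end
        c = b - ρ * (b - a)
        d = a + ρ * (b - a)
    end
    return (a + b) / 2
end

f(x) = sum(abs2, x)                  # f(x) = x₁² + x₂²
x = [2.0, 3.0]
dir = -2 .* x                        # steepest-descent direction at x
φ(α) = f(x + α * dir)                # univariate restriction of f along dir
α_star = golden_section(φ, 0.0, 1.0)
println(α_star)                      # close to 0.5, the step that lands on the minimizer
```

Each iteration shrinks the bracket by a factor of about 0.618, so roughly 40 iterations reach the `1e-8` tolerance.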

### Line Search via Backtracking

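In this type of algorithm, we look for an acceptable step size for a directional descent method rather than the exact minimizer along $d$. We start with a large step and shrink it until it is good enough.

To decide what counts as good enough, we use the two Wolfe conditions: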

1. Sufficient decrease (Armijo condition)
    1. $f(x + t\,d) \le f(x) + α\,t\,\nabla f(x)^\top d$, where $x$ is the design point, $d$ is the descent direction, $t$ is the step size, and $α \in (0, 1)$ sets how much decrease is required
2. Curvature condition
    1. $|\nabla f(x + t\,d)^\top d| \le -σ\,\nabla f(x)^\top d$, where $σ$ controls how shallow the directional derivative must become at the new point, typically with $α < σ < 1$

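Together, these are the strong Wolfe conditions. The backtracking implementation below starts from a step size of $t = 1$ and shrinks it by a factor $β$ until both conditions hold or the iteration budget $n$ is exhausted.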
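To see how each condition behaves on its own, the sketch below checks them separately for a few step sizes on $f(x) = x_1^2 + x_2^2$. `strong_wolfe` is a hypothetical helper. It uses $α = 10^{-4}$, a common textbook choice that differs from the `α=0.5` default in the code below.

```julia
using LinearAlgebra

# Report each strong Wolfe condition separately for step size t.
# α: sufficient-decrease parameter, σ: curvature parameter (α < σ < 1).
function strong_wolfe(f, grad_f, x, d, t; α=1e-4, σ=0.9)
    slope0 = dot(grad_f(x), d)     # directional derivative at t = 0 (negative for descent)
    x_new = x + t * d
    armijo = f(x_new) <= f(x) + α * t * slope0
    curvature = abs(dot(grad_f(x_new), d)) <= -σ * slope0
    return (armijo=armijo, curvature=curvature)
end

f(x) = sum(abs2, x)
grad_f(x) = 2 .* x
x = [2.0, 3.0]
d = -grad_f(x)

for t in (1.0, 0.5, 0.01)
    println(t, " => ", strong_wolfe(f, grad_f, x, d, t))
end
```

The three steps show distinct behavior. The step $t = 1$ overshoots, so neither condition holds. The step $t = 0.5$ lands on the minimizer, so both hold. The step $t = 0.01$ decreases $f$ but leaves the slope nearly as steep as before, so only the curvature condition fails.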

```julia
using Plots
using LinearAlgebra
using OptiMize: complex_step, plot_min_of_f
```

```julia
function line_search_backtracking(f, df, x, d, n; α=0.5, β=0.8, σ=0.9)
    t = 1.0                      # initial (largest) step size to try

    # Quantities at the current point do not change during the search
    fx = f(x)
    slope = dot(df(x), d)        # directional derivative along d (negative for a descent direction)

    for i in 1:n
        # Candidate point for the current step size
        x_k = x + t * d

        # Evaluate both strong Wolfe conditions
        armijo_condition = f(x_k) <= fx .+ α * t * slope
        curvature_condition = abs(dot(df(x_k), d)) <= -σ * slope

        # Accept the step size once both conditions hold
        if armijo_condition && curvature_condition
            return t
        end

        # Otherwise shrink the step size by the factor β
        t *= β
    end

    # Budget exhausted without meeting both conditions; return the current step size anyway
    return t
end
```

```julia
# Define the function and its derivative
f(x) = x.^2
df(x) = complex_step(f, x)

# Initial guess and search direction
x0 = [4.]
d = -df(x0) # Search direction

# Perform the line search
step_size = line_search_backtracking(f, df, x0, d, 10)

println("Optimal step size: $step_size")
```

```
Optimal step size: 0.40960000000000013
```
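The search above returns only the step size for a single iteration. The following sketch plugs a backtracking step size into the full descent loop $x_{k+1} = x_k + α_k d_k$. To stay self-contained, it uses a simplified, Armijo-only backtracking helper (`backtracking_armijo`, a hypothetical name) instead of `line_search_backtracking`, so it runs without `OptiMize`. The test function, an elongated quadratic with its minimum at $(1, -2)$, and all parameter values are illustrative assumptions.

```julia
using LinearAlgebra

# Armijo-only backtracking: shrink t by β until sufficient decrease holds.
function backtracking_armijo(f, g, x, d; t=1.0, c=1e-4, β=0.5, maxiter=50)
    fx, slope = f(x), dot(g(x), d)
    for _ in 1:maxiter
        f(x + t * d) <= fx + c * t * slope && return t
        t *= β
    end
    return t
end

# Steepest descent with a backtracking step size at every iteration.
function descent_with_line_search(f, g, x0; tol=1e-6, maxiter=1_000)
    x = copy(x0)
    for k in 1:maxiter
        gx = g(x)
        norm(gx) < tol && return x, k - 1    # converged: point and number of steps taken
        d = -gx                               # steepest-descent direction
        t = backtracking_armijo(f, g, x, d)
        x = x + t * d                         # x_{k+1} = x_k + t_k d_k
    end
    return x, maxiter
end

f(x) = (x[1] - 1)^2 + 10 * (x[2] + 2)^2
g(x) = [2 * (x[1] - 1), 20 * (x[2] + 2)]

x_min, iters = descent_with_line_search(f, g, [4.0, 3.0])
println("x* ≈ $x_min after $iters iterations")
```

Because this quadratic is ten times steeper along $x_2$ than along $x_1$, the full step $t = 1$ is always rejected. The accepted steps alternate between smaller values as the search adapts to the local geometry.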


## Trust Region Methods

A primary issue with descent-direction methods is that a poorly chosen step can cause them to converge prematurely or to overshoot. Trust region methods avoid this by restricting each step to a region around the current point. They contract or expand that region depending on how well the model's predicted improvement matched the actual improvement.

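$\min_{x'} \hat{f}(x') \quad \text{s.t.} \quad \|x' - x\| \le δ$

1. Here, $x$ is the current design point, $x'$ is the next design point, $\hat{f}$ is a local model of $f$ around $x$ (for example, a second-order Taylor expansion), and $δ$ is the trust region's radius. This is itself a constrained optimization problem.

1. $δ$ is expanded or contracted depending on the model's predictive performance.

1. That performance is measured by the ratio of actual to predicted improvement: $η = \frac{f(x) - f(x')}{f(x) - \hat{f}(x')}$

1. We then iteratively adjust $δ$ based on $η$ using scale factors $γ$. When $η$ falls below a lower threshold, the model was unreliable and $δ$ shrinks. When $η$ exceeds an upper threshold, the model was reliable and $δ$ grows.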
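To make the ratio concrete, here is a sketch of the bookkeeping for a single trust-region step. It assumes a second-order Taylor model for $\hat{f}$, a non-quadratic test function chosen so that the model is only approximate, and a hypothetical trial step `s`. It also uses the same thresholds and scale factors as the defaults of `trust_region_descent`.

```julia
using LinearAlgebra

f(x) = x[1]^4 + x[2]^2                     # non-quadratic, so the model is inexact
grad(x) = [4 * x[1]^3, 2 * x[2]]
hess(x) = [(12 * x[1]^2) 0.0; 0.0 2.0]

x = [1.0, 1.0]
s = [-0.5, -0.5]                           # hypothetical trial step, x' = x + s

# Second-order Taylor model f̂(x + s) = f(x) + ∇f(x)ᵀs + ½ sᵀ∇²f(x)s
f_model(s) = f(x) + dot(grad(x), s) + 0.5 * dot(s, hess(x) * s)

η = (f(x) - f(x + s)) / (f(x) - f_model(s))   # actual / predicted improvement

# Radius update with thresholds η1 < η2 and scale factors γ1 < 1 < γ2
η1, η2, γ1, γ2 = 0.25, 0.5, 0.5, 2.0
δ = 1.0
δ_new = η < η1 ? γ1 * δ : (η > η2 ? γ2 * δ : δ)
println("η = $η, new radius = $δ_new")
```

For this step, the actual improvement (1.6875) exceeds the predicted improvement (1.25), so $η = 1.35$ and the radius doubles.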

```julia
function trust_region_descent(f, df, d2f, x0, δ, n; η1=0.25, η2=0.5, γ1=0.5, γ2=2.0, tol=1e-8)
    x = copy(x0)

    for i in 1:n
        gradient = df(x)
        hessian = d2f(x)

        # Stop once the gradient (approximately) vanishes
        norm(gradient) < tol && break

        # Approximately solve the trust region subproblem
        #   minimize m(s) = f(x) + gradient' * s + 0.5 * s' * hessian * s  subject to ||s|| <= δ
        # by increasing a damping term λ until s = -(hessian + λI) \ gradient fits in the region
        # (assumes hessian + λI stays nonsingular along the way)
        λ = 0.0
        s = -((hessian + λ * I) \ gradient)
        while norm(s) > δ
            λ = max(2λ, 1e-8)
            s = -((hessian + λ * I) \ gradient)
        end

        # Ratio of actual to predicted reduction (both positive when the step improves f)
        actual_reduction = f(x) - f(x + s)
        predicted_reduction = -(dot(gradient, s) + 0.5 * dot(s, hessian * s))
        rho = actual_reduction / predicted_reduction

        # Contract the region if the model was poor, expand it if the model was good
        if rho < η1
            δ *= γ1
        elseif rho > η2
            δ *= γ2
        end

        # Only accept steps that achieved enough of the predicted reduction
        if rho > η1
            x += s
        end
    end

    return x
end
```

```julia
# Define the function to minimize, its gradient, and Hessian
# TODO: solve this with the differentiation functions (update them to differentiate m×n matrices)
f(x) = x[1]^2 + x[2]^2
df(x) = [2x[1], 2x[2]]
d2f(x) = [2 0; 0 2]

# Initial guess and initial trust region radius
x0 = [2.0, 3.0]
Δ = 0.5

# Run the trust region descent algorithm for at most 10 iterations
result = trust_region_descent(f, df, d2f, x0, Δ, 10)

println("Optimal solution: $result")
```

```
Optimal solution: [0.0, 0.0]
```
