# Trust-Region (TR)


### Overview
TR methods define a region around the current iterate within which they trust a _model function_ to be a good representation of the objective function, and choose the step to be the approximate minimizer of the model in this region.
Thus, the step length and the direction are chosen together by the minimization over the region.
In general, the _model function_ $m_k$ used at the iterate $x_k$ is a quadratic model derived from the Taylor-series expansion of the objective function $f$ around $x_k$, i.e.  

$$f(x_k + p) = f_k + g_k^T p + \frac12 p^T \nabla^2 f(x_k + tp) p $$  

where the scalar $t$ lies in $(0,1)$ and $g_k = \nabla f(x_k)$.
For ease of computation, we generally approximate the Hessian $\nabla^2 f(x_k + tp)$ with a symmetric matrix $B_k$. Then our model function $m_k$ becomes  

$$m_k(p) = f_k + g^T_k p + \frac12p^TB_kp, $$  
and, by the definition of $t$, our introduction of $B_k$, and our truncation of the Taylor series at second order, we expect $|f(x_k + p)-m_k(p)| = O(||p||^2)$.
When $B_k = \nabla^2f(x_k)$, as in a TR Newton method, the error is $O(||p||^3)$, which is very small when $p$ is small; here, however, we are concerned with quasi-Newton methods. To obtain the step that takes us to the iterate $x_{k+1}$, we seek a solution of the subproblem  

$$ \min_{p\in\mathbb{R}^n} m_k(p) = f_k + g^T_k p + \frac12 p^T B_k p \quad \text{s.t.} \quad ||p|| \leq \Delta_k, $$  
where $\Delta_k$ is the chosen trust-region radius at the $k$th step.
We refer to the solution of our minimization problem as $p_k^*$.
Note that when $||\cdot||$ is not given a subscript, we are referring to the Euclidean norm.
Thus, our constraint $||p|| \leq \Delta_k$ is equivalent to $p^Tp \leq \Delta_k^2$.
- When $B_k$ is positive definite (for example $B_k = I$, which gives the steepest-descent direction, or the exact Hessian of a strictly convex objective function, which gives the Newton direction) and $||B_k^{-1} g_k|| \leq \Delta_k$, we have an analytic solution: the unconstrained minimizer of $m_k$, i.e. $p_k^* = -B_k^{-1} g_k.$  
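
As a small illustration of the model and of the full step, the sketch below builds $m_k$ for a hypothetical two-variable quadratic objective and checks whether the unconstrained minimizer of the model lies inside the trust region. The objective `f`, its gradient `grad`, the choice of $B_k$ as the exact Hessian, the point `xk`, and the radius `Δk` are all assumptions made for this example.

```julia
using LinearAlgebra

# Hypothetical strictly convex quadratic objective, chosen only for illustration.
A = [4.0 1.0; 1.0 3.0]            # symmetric positive definite
b = [1.0, 2.0]
f(x) = 0.5 * dot(x, A * x) - dot(b, x)
grad(x) = A * x - b

xk = [2.0, 1.0]
fk, gk, Bk = f(xk), grad(xk), A   # assumption: Bk is the exact Hessian

# Quadratic model m_k(p) = f_k + g_k' p + 1/2 p' B_k p
m(p) = fk + dot(gk, p) + 0.5 * dot(p, Bk * p)

Δk = 5.0
p_full = -(Bk \ gk)               # unconstrained minimizer of m_k

# The full step solves the subproblem only if Bk is positive definite
# and the step lies inside the trust region.
full_step_ok = isposdef(Bk) && norm(p_full) <= Δk
```

Because this `f` is itself quadratic and `Bk` is its exact Hessian, the model reproduces `f` exactly here; with a quasi-Newton `Bk` the two would generally differ.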

### Selection of TR Radius, $\Delta_k$   
We use the information from iteration $k-1$ to determine the trust-region radius for the $k$th iteration.  
Specifically, we compute the ratio of the _actual reduction_ to the _predicted reduction_, given by  

$$ \rho_k = \frac{f(x_k) - f(x_k + p_k^*)}{m_k(0) - m_k(p_k^*)}. $$  
Since $p^*_k$ is determined over a region that includes $p = \vec{0}$, and $p_k^*$ is a minimizer of $m_k$ over the region $||p|| \leq \Delta_k$, we have $m_k(p_k^*) \leq m_k(0)$, so our _predicted reduction_ will always be nonnegative.  

From this we infer that if $\rho_k$ is negative, the numerator must be negative, i.e. $f(x_k + p^*_k) > f(x_k)$. If we require our TR scheme to be _monotone_, i.e. $f(x_k) < f(x_{k-1})$, we must then reject the trial point $x_k + p^*_k$ and solve the subproblem again with a smaller $\Delta_k$.
Conversely, we can accept any positive value of $\rho_k$ with the assurance that our scheme remains monotone. If $\rho_k$ is close to $1$, then $m_k$ did a satisfactory job of approximating $f$ over our trust region and we may increase the trust-region radius at the next iteration.
If $\rho_k$ is small and positive, we should shrink $\Delta_k$ at the next iteration.
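
The acceptance and radius rules above can be sketched as two small helpers. The thresholds `0.25` and `0.75`, the shrink factor `0.25`, the growth factor `2`, and the names `reduction_ratio` and `update_radius` are assumptions for illustration; they are common choices rather than values fixed by the discussion above.

```julia
using LinearAlgebra

# Ratio of actual to predicted reduction for a trial step p.
# `f` is the objective and `m` the model built at x; both are supplied by the caller.
reduction_ratio(f, m, x, p) = (f(x) - f(x + p)) / (m(zero(p)) - m(p))

# Hypothetical radius update: shrink on poor agreement, grow on good agreement
# when the step reached the boundary, otherwise leave the radius unchanged.
function update_radius(rho, Δ, ΔM, p)
    if rho < 0.25
        return 0.25 * Δ
    elseif rho > 0.75 && norm(p) ≈ Δ
        return min(2 * Δ, ΔM)
    else
        return Δ
    end
end
```

The radius is only enlarged when the step lies on the boundary of the region, since an interior step shows that the current radius was not the limiting factor.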

The algorithm `tr(..)` given below conforms to the selection process described here, but we need a means of solving for $p^*_k$ before we can implement it.  

### Trust-Region Subproblem  
As in a line search, we can approximate $p_k^*$ and still preserve global convergence, as long as the approximate step gives a _sufficient reduction_ in the model function. One simple procedure is to calculate the Cauchy point, which is inexpensive and robust, and does a satisfactory job. This is done below, in the implementation sketch of `tr(..)`. Note that there are many improvements on the Cauchy point method that build upon it while exploiting properties of $B_k$. 
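
For reference, the Cauchy point is the minimizer of $m_k$ along the steepest-descent direction $-g_k$, restricted to the trust region. In its standard textbook form, stated here as background with the same symbols as above (the label $p_k^C$ is introduced only for this note), it reads

$$ p_k^C = -\tau_k \frac{\Delta_k}{||g_k||} g_k, \qquad \tau_k = \begin{cases} 1 & \text{if } g_k^T B_k g_k \leq 0, \\ \min\left( \frac{||g_k||^3}{\Delta_k \, g_k^T B_k g_k},\ 1 \right) & \text{otherwise.} \end{cases} $$

When $g_k^T B_k g_k \leq 0$ the model keeps decreasing along $-g_k$, so the step runs all the way to the boundary of the region.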

    using LinearAlgebra

    # Takes a radius Δk, a max radius ΔM, an objective function f,
    # and a starting guess x for x^*.
    function tr(Δk, ΔM, f, x)

        while # TODO stop criterion
            # TODO calculate gk, Bk, fk, m = (p) -> ...

            # Cauchy point: approximate p*_k as p
            gk_norm = norm(gk)
            gBg = gk' * Bk * gk   # curvature of mk along -gk
            if gBg <= 0           # mk is not convex along -gk, so step to the boundary
                tau = 1
            else
                tau = min(gk_norm^3 / (Δk * gBg), 1)
            end
            p = -tau * Δk / gk_norm * gk

            # Handle radius selection
            rho = (f(x) - f(x + p)) / (m(zero(p)) - m(p))
            if rho <= 0.25
                Δk = 0.25 * Δk
            elseif rho > 0.75 && norm(p) ≈ Δk  # else we don't increase Δk
                Δk = min(2 * Δk, ΔM)
            end

            # Update x if it makes sense (the acceptance threshold can be reduced);
            # otherwise reject the step and repeat the iteration at the reduced Δk
            if rho > 0.25
                x += p
            end
        end
        x # the final iterate: an approximate stationary point, not necessarily a global minimizer
    end
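
The sketch above leaves the stop criterion and the computation of $g_k$, $B_k$ and $m_k$ open. One way to fill those gaps is shown below as a minimal runnable variant; it is an illustration under stated assumptions, not the definitive form of `tr(..)`. The name `tr_cauchy`, the caller-supplied functions `grad` and `hess`, the gradient-norm tolerance `tol`, and the iteration cap `maxiter` are all hypothetical additions.

```julia
using LinearAlgebra

# Hypothetical runnable variant of the sketch above.
# Assumptions: `grad(x)` returns the gradient, `hess(x)` returns a symmetric
# matrix used as B_k, and iteration stops once norm(g_k) <= tol or after
# `maxiter` iterations.
function tr_cauchy(f, grad, hess, x, Δ, ΔM; tol = 1e-6, maxiter = 10_000)
    x = float.(x)                      # work on a floating-point copy
    for _ in 1:maxiter
        fk, gk, Bk = f(x), grad(x), hess(x)
        gk_norm = norm(gk)
        gk_norm <= tol && break        # assumed stop criterion

        # Cauchy point
        gBg = dot(gk, Bk * gk)
        tau = gBg <= 0 ? 1.0 : min(gk_norm^3 / (Δ * gBg), 1.0)
        p = -(tau * Δ / gk_norm) * gk

        # Actual reduction over predicted reduction m(0) - m(p)
        predicted = -(dot(gk, p) + 0.5 * dot(p, Bk * p))
        rho = (fk - f(x + p)) / predicted

        # Accept the step only if the reduction is large enough
        accept = rho > 0.25

        # Radius update, using the radius that produced p
        if rho <= 0.25
            Δ = 0.25 * Δ
        elseif rho > 0.75 && norm(p) ≈ Δ
            Δ = min(2 * Δ, ΔM)
        end

        if accept
            x = x + p
        end
    end
    return x
end

# Usage on a hypothetical strictly convex quadratic with known minimizer A \ b.
A = [4.0 1.0; 1.0 3.0]
b = [1.0, 2.0]
fq(x) = 0.5 * dot(x, A * x) - dot(b, x)
xmin = tr_cauchy(fq, x -> A * x - b, x -> A, [2.0, 1.0], 1.0, 10.0)
```

With only the Cauchy point as the step, the iteration behaves like steepest descent with a safeguarded step length, so it can be slow on badly conditioned problems; the quadratic above is well conditioned and is used only to exercise the code.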
