# Symbolic verification for the FR method (using the Wolfram Language)

This notebook provides a symbolic verification of the technical statements of Lemma 2.2 and Lemma 2.3 of the paper, which can be stated as follows.

### Lemma 2.2

Let $f\in\mathcal{F}_{\mu,L}$, and let $x_{k-1},d_{k-1}\in\mathbb{R}^{n}$
and $x_{k}$, $d_{k}$ be generated by the FR method (i.e., $\eta=0$). For any $c_{k-1}\in\mathbb{R}$ such that $\frac{\|d_{k-1}\|^{2}}{\|\nabla f(x_{k-1})\|^{2}}=c_{k-1}$,
where $c_{k-1}>1$, it holds that: 
\begin{equation}
0\leq\beta_{k-1}\leq\frac{1}{c_{k-1}}\frac{\left(1-q+2\sqrt{(c_{k-1}-1)q}\right)^{2}}{4q},\label{eq:case_2_beta_FR}
\end{equation}
where $q \triangleq \frac{\mu}{L}$.
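
To get a feel for the size of this bound, its right-hand side can be evaluated for concrete values. The following is a minimal sketch: the helper name `betaBound` and the sample values $c_{k-1}=2$, $q=1/4$ are chosen here for illustration only and do not come from the paper.

```wolfram
(* Hypothetical helper, not part of the original notebook:
   right-hand side of the bound in Lemma 2.2, with c = c[k-1] and q = μ/L *)
betaBound[c_, q_] := (1 - q + 2 Sqrt[(c - 1) q])^2/(4 q c);

(* Exact value for the illustrative choice c = 2, q = 1/4 *)
betaBound[2, 1/4]      (* 49/32 *)

(* Numerical values for a few ratios q, with c = 2 kept fixed *)
Table[{q, N[betaBound[2, q]]}, {q, {1/100, 1/10, 1/4, 1/2}}]
```

The exact arithmetic for the first call is $(3/4+1)^{2}/2=49/32$.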

### Proof

First, note that $\beta_{k-1}\geq0$ by definition. The upper bound is obtained from the following
weighted sum of equalities and inequalities:

- relation between $\nabla f(x_{k-1})$ and $d_{k-1}$ with weight
$\lambda_{1}=\gamma_{k-1}(L+\mu)-\frac{2\sqrt{\beta_{k-1}}}{\sqrt{(c_{k-1}-1)c_{k-1}}}$: 
$$
0=\left\langle \nabla f(x_{k-1});\,d_{k-1}\right\rangle -\|\nabla f(x_{k-1})\|^{2},
$$
- optimality condition of the line search with weight $\lambda_{2}=\frac{2}{c_{k-1}}-\gamma_{k-1}(L+\mu)$: 

$$0=\left\langle \nabla f(x_{k});\,d_{k-1}\right\rangle ,$$

- definition of $\beta_{k-1}$ with weight $\lambda_{3}=\frac{\sqrt{c_{k-1}-1}}{\sqrt{\beta_{k-1}c_{k-1}}}$: 
$$0=\|\nabla f(x_{k})\|^{2}-\beta_{k-1}\|\nabla f(x_{k-1})\|^{2},$$

- initial condition on the ratio $\frac{\|d_{k-1}\|^{2}}{\|\nabla f(x_{k-1})\|^{2}}$
with weight $\lambda_{4}=-\gamma_{k-1}^{2}L \mu +\frac{\sqrt{\beta_{k-1}}}{c_{k-1}\sqrt{(c_{k-1}-1)c_{k-1}}}$: 
$$
0=\|d_{k-1}\|^{2}-c_{k-1}\|\nabla f(x_{k-1})\|^{2},
$$

- smoothness and strong convexity of $f$ between $x_{k-1}$ and $x_{k}$,
with weight $\lambda_{5}=L-\mu$: 

$$\begin{aligned}0\geq & -f(x_{k-1})+f(x_{k})+\langle\nabla f(x_{k});\,x_{k-1}-x_{k}\rangle+\tfrac{1}{2L}\|\nabla f(x_{k-1})-\nabla f(x_{k})\|^{2}\\
 & \quad+\tfrac{\mu}{2(1-\mu/L)}\|x_{k-1}-x_{k}-\tfrac{1}{L}(\nabla f(x_{k-1})-\nabla f(x_{k}))\|^{2}\\
= & f(x_{k})-f(x_{k-1})+\gamma_{k-1}\langle\nabla f(x_{k});\,d_{k-1}\rangle+\tfrac{1}{2L}\|\nabla f(x_{k-1})-\nabla f(x_{k})\|^{2}\\
 & \quad+\tfrac{\mu}{2(1-\mu/L)}\|\gamma_{k-1}d_{k-1}-\tfrac{1}{L}(\nabla f(x_{k-1})-\nabla f(x_{k}))\|^{2},
\end{aligned}$$

- smoothness and strong convexity of $f$ between $x_{k}$ and $x_{k-1}$, with weight $\lambda_{6}=\lambda_{5}$: 

$$\begin{aligned}0\geq & -f(x_{k})+f(x_{k-1})+\langle\nabla f(x_{k-1});\,x_{k}-x_{k-1}\rangle+\tfrac{1}{2L}\|\nabla f(x_{k-1})-\nabla f(x_{k})\|^{2}\\
 & \quad+\tfrac{\mu}{2(1-\mu/L)}\|x_{k-1}-x_{k}-\tfrac{1}{L}(\nabla f(x_{k-1})-\nabla f(x_{k}))\|^{2}\\
= & f(x_{k-1})-f(x_{k})-\gamma_{k-1}\langle\nabla f(x_{k-1});\,d_{k-1}\rangle+\tfrac{1}{2L}\|\nabla f(x_{k-1})-\nabla f(x_{k})\|^{2}\\
 & \quad+\tfrac{\mu}{2(1-\mu/L)}\|\gamma_{k-1}d_{k-1}-\tfrac{1}{L}(\nabla f(x_{k-1})-\nabla f(x_{k}))\|^{2}.
\end{aligned}$$


The weighted sum can be written as: 
$$\begin{aligned}
0 & \geq\lambda_{1}\left[\left\langle \nabla f(x_{k-1});\,d_{k-1}\right\rangle -\|\nabla f(x_{k-1})\|^{2}\right]+\lambda_{2}\left[\left\langle \nabla f(x_{k});\,d_{k-1}\right\rangle \right]\\
 & +\lambda_{3}\left[\|\nabla f(x_{k})\|^{2}-\beta_{k-1}\|\nabla f(x_{k-1})\|^{2}\right]+\lambda_{4}\left[\|d_{k-1}\|^{2}-c_{k-1}\|\nabla f(x_{k-1})\|^{2}\right]\\
 & +\lambda_{5}\Big[f(x_{k})-f(x_{k-1})+\gamma_{k-1}\langle\nabla f(x_{k});\,d_{k-1}\rangle+\tfrac{1}{2L}\|\nabla f(x_{k-1})-\nabla f(x_{k})\|^{2}\\
 & \quad\quad+\tfrac{\mu}{2(1-\mu/L)}\|\gamma_{k-1}d_{k-1}-\tfrac{1}{L}(\nabla f(x_{k-1})-\nabla f(x_{k}))\|^{2}\Big]\\
 & +\lambda_{6}\Big[f(x_{k-1})-f(x_{k})-\gamma_{k-1}\langle\nabla f(x_{k-1});\,d_{k-1}\rangle+\tfrac{1}{2L}\|\nabla f(x_{k-1})-\nabla f(x_{k})\|^{2}\\
 & \quad\quad+\tfrac{\mu}{2(1-\mu/L)}\|\gamma_{k-1}d_{k-1}-\tfrac{1}{L}(\nabla f(x_{k-1})-\nabla f(x_{k}))\|^{2}\Big],
\end{aligned}$$

which can be reformulated exactly as follows (expanding both expressions
shows that all terms match; this is done symbolically below):

$$\begin{aligned}
0\geq & \|\nabla f(x_{k})\|^{2}-\nu(\beta_{k-1},\gamma_{k-1},c_{k-1},\mu,L)\|\nabla f(x_{k-1})\|^{2}\\
 & +\left\Vert \sqrt[4]{\frac{\beta_{k-1}}{(c_{k-1}-1)c_{k-1}^{3}}}d_{k-1}-\sqrt[4]{\frac{\beta_{k-1}c_{k-1}}{c_{k-1}-1}}\nabla f(x_{k-1})+\sqrt[4]{\frac{c_{k-1}-1}{\beta_{k-1}c_{k-1}}}\nabla f(x_{k})\right\Vert ^{2},
\end{aligned}$$

where 
$$
\nu(\beta_{k-1},\gamma_{k-1},c_{k-1},\mu,L)=2\sqrt{1-\frac{1}{c_{k-1}}}\sqrt{\beta_{k-1}}-c_{k-1}\gamma_{k-1}^{2}L\mu+\gamma_{k-1}(L+\mu)-1.
$$

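The remaining steps of the proof are provided in the main text. Below, we verify this reformulation symbolically. In the code, the vectors $\nabla f(x_{k-1})$, $\nabla f(x_{k})$, and $d_{k-1}$ are represented by the commuting scalar symbols `g[k-1]`, `g[k]`, and `d[k-1]`, so that products stand for inner products; this is sufficient because both sides are quadratic forms in these vectors, and each degree-two monomial in the symbols corresponds to exactly one inner product.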
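
The term matching can also be followed by hand. The sketch below uses a shorthand introduced only for this computation: $g_{k-1}=\nabla f(x_{k-1})$, $g_{k}=\nabla f(x_{k})$, $d=d_{k-1}$, and $\beta,\gamma,c$ without the index $k-1$. Since $\lambda_{5}=\lambda_{6}=L-\mu$, the function values cancel and the two interpolation inequalities together contribute

$$
\|g_{k-1}-g_{k}\|^{2}+L\mu\gamma^{2}\|d\|^{2}-\gamma(L+\mu)\langle g_{k-1};\,d\rangle+\gamma(L+\mu)\langle g_{k};\,d\rangle.
$$

Collecting the coefficient of each inner product in the weighted sum (left) and comparing it with the coefficient obtained by expanding the target expression (right) gives

$$\begin{aligned}
\|g_{k}\|^{2}: & \quad1+\lambda_{3}=1+\sqrt{\tfrac{c-1}{\beta c}},\\
\|d\|^{2}: & \quad\lambda_{4}+L\mu\gamma^{2}=\sqrt{\tfrac{\beta}{(c-1)c^{3}}},\\
\|g_{k-1}\|^{2}: & \quad1-\lambda_{1}-\beta\lambda_{3}-c\lambda_{4}=-\nu+\sqrt{\tfrac{\beta c}{c-1}},\\
\langle g_{k-1};\,d\rangle: & \quad\lambda_{1}-\gamma(L+\mu)=-\tfrac{2\sqrt{\beta}}{\sqrt{(c-1)c}},\\
\langle g_{k};\,d\rangle: & \quad\lambda_{2}+\gamma(L+\mu)=\tfrac{2}{c},\\
\langle g_{k-1};\,g_{k}\rangle: & \quad-2=-2.
\end{aligned}$$

The right-hand sides are the squares and pairwise products of the three fourth-root coefficients inside the squared norm, plus the terms $\|g_{k}\|^{2}-\nu\|g_{k-1}\|^{2}$.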

### Symbolic verification

In [101]:
(*Clear memory and all the variables*)
ClearAll["Global`*"];
Remove["Global`*"];
SetOptions[EvaluationNotebook[], 
  CellEpilog :> SelectionMove[EvaluationNotebook[], Next, Cell]];
SetOptions[$FrontEnd, "FileChangeProtection" -> None];
(*You may get the warning:
"Remove::rmnsm: There are no symbols matching "Global`*"."
if you run this block twice,
but that is fine
*)

In [107]:
(* For notational convenience, in the code we let c[k-1] ≜ c, β[k-1] ≜ β, γ[k-1] ≜ γ*)

(* System of NCGM *)
(* ============== *)

d[k] = g[k] + β d[k - 1];

x[k] = x[k - 1] - γ d[k - 1];

(* Constraints in consideration *)
(* ============================ *)

constraint1 = g[k - 1] d[k - 1] - g[k - 1]^2 (* == 0*);

constraint2 = g[k] d[k - 1] (* == 0*);

constraint3 = g[k]^2 - β g[k - 1]^2 (* == 0*);

constraint4 = d[k - 1]^2 - c g[k - 1]^2 (* == 0*);

constraint5 = 
  f[k] - f[k - 1] + γ g[k] d[k - 1] + (g[k - 1] - g[k])^2/(
   2 L) + (μ (γ d[k - 1] - (g[k - 1] - g[k])/L)^2)/(
   2 (1 - μ/L)) (* <= 0*);
   
constraint6 = 
  f[k - 1] - f[k] - γ g[k - 1] d[k - 1] + (g[k - 1] - g[k])^2/(
   2 L) + (μ (γ d[k - 1] - (g[k - 1] - g[k])/L)^2)/(
   2 (1 - μ/L)) (* <= 0*);
   
(* Weight λ"i" for constraint"i" *)   
(* ============================ *)
   
λ1 = -((2 Sqrt[β])/(Sqrt[-1 + c] Sqrt[c])) + γ (L + μ);
    
λ2 = -γ (L + μ) + 2/c;

λ3 = Sqrt[c - 1]/(Sqrt[β] Sqrt[c]);

λ4 = 
  Sqrt[β]/(c (Sqrt[-1 + c] Sqrt[c])) - L γ^2 μ;
  
λ5 = L - μ;

λ6 = L - μ;


(* Weighted sum *)   
(* ==============*)

WeightedSum = ((λ1*constraint1) + (λ2*
       constraint2) + (λ3*constraint3) + (λ4*
       constraint4) + (λ5*constraint5) + (λ6*
       constraint6) // FullSimplify //Expand);
       
(* Target expression *)   
(* ================*)       
       
ν = (2 Sqrt[c - 1] Sqrt[β])/Sqrt[
   c] + (-c L γ^2 μ + γ (L + μ) - 1);
   
a2 = -((c β)/(-1 + c))^(1/4); 

a3 = 1/((c β)/(-1 + c))^(
 1/4); 
 
a4 = (β/((-1 + c) c^3))^(1/4);

positiveTerm1 = (a4 d[k - 1] + a2 g[k - 1] + a3 g[k])^2;

restTerm2 = -ν g[k - 1]^2 + g[k]^2;

SimplifiedTerm = 
 Assuming[β > 0 && c > 1, 
   Simplify[(positiveTerm1 + restTerm2)]]//Expand;       

(* Check that both expressions match: the identity holds if TermDiff evaluates to 0 *)
TermDiff = Assuming[β > 0 && c > 1 , 
 FullSimplify[WeightedSum - SimplifiedTerm]]

### Lemma 2.3

Let $f\in\mathcal{F}_{\mu,L}$, and let $x_{k-1},d_{k-1}\in\mathbb{R}^{n}$
and $x_{k}$, $d_{k}$ be generated by the FR method (i.e., $\eta=0$). For any $c_{k-1}\in\mathbb{R}$ such that $\frac{\|d_{k-1}\|^{2}}{\|\nabla f(x_{k-1})\|^{2}}=c_{k-1}$,
where $c_{k-1}>1$, it holds that: 
\begin{equation}
\frac{\|d_{k}\|^{2}}{\|\nabla f(x_{k})\|^{2}}\leq c_{k}\triangleq1+\frac{\left(1-q+2\sqrt{(c_{k-1}-1)q}\right)^{2}}{4q},\label{eq:FR_angle}
\end{equation}
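with $q\triangleq\frac{\mu}{L}$. Equivalently, $\|d_{k}-\nabla f(x_{k})\|\leq\epsilon\|d_{k}\|$
holds with $\epsilon=\sqrt{1-\frac{1}{c_{k}}}$; the equivalence uses $\langle\nabla f(x_{k});\,d_{k}\rangle=\|\nabla f(x_{k})\|^{2}$, which gives $\|d_{k}-\nabla f(x_{k})\|^{2}=\|d_{k}\|^{2}-\|\nabla f(x_{k})\|^{2}$.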
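
As a quick numerical illustration of these two quantities, the sketch below evaluates $c_{k}$ and $\epsilon$ for one sample input. The helper names `ckBound` and `epsBound` and the values $c_{k-1}=2$, $q=1/4$ are assumptions made here for illustration; they are not part of the original notebook.

```wolfram
(* Hypothetical helpers, not part of the original notebook *)

(* Right-hand side of the bound in Lemma 2.3, with c = c[k-1] and q = μ/L *)
ckBound[c_, q_] := 1 + (1 - q + 2 Sqrt[(c - 1) q])^2/(4 q);

(* Corresponding ε = Sqrt[1 - 1/c[k]] *)
epsBound[c_, q_] := Sqrt[1 - 1/ckBound[c, q]];

ckBound[2, 1/4]        (* 65/16 *)
N[epsBound[2, 1/4]]    (* approximately 0.868 *)
```

Here $c_{k}=1+49/16=65/16$ and $\epsilon=\sqrt{49/65}=7/\sqrt{65}$.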


### Proof

The proof consists of the following weighted sum of inequalities: 

- optimality condition of the line search with weight $\lambda_{1}=2\beta_{k-1}$:

$$
0=\langle\nabla f(x_{k});d_{k-1}\rangle,$$

- the quality of the search direction with weight $\lambda_{2}=\beta_{k-1}^{2}$:

$$0=\|d_{k-1}\|^{2}-c_{k-1}\|\nabla f(x_{k-1})\|^{2},$$

- definition of $\beta_{k-1}$ with weight $\lambda_{3}=-c_{k-1}\beta_{k-1}$:

$$\begin{aligned}0=\|\nabla f(x_{k})\|^{2}-\beta_{k-1}\|\nabla f(x_{k-1})\|^{2}.\end{aligned}$$

The weighted sum can be written as 
$$
\begin{aligned}0= & \lambda_{1}\left[\langle\nabla f(x_{k});d_{k-1}\rangle\right]+\lambda_{2}\left[\|d_{k-1}\|^{2}-c_{k-1}\|\nabla f(x_{k-1})\|^{2}\right]+\lambda_{3}\left[\|\nabla f(x_{k})\|^{2}-\beta_{k-1}\|\nabla f(x_{k-1})\|^{2}\right],\end{aligned}
$$  
and can be reformulated exactly as follows (expanding both expressions
shows that all terms match; this is done symbolically below):
$$
\begin{aligned}0=\|d_{k}\|^{2}-(1+c_{k-1}\beta_{k-1})\|\nabla f(x_{k})\|^{2}.
\end{aligned}
$$

The remaining steps of the proof are provided in the main text. Below, we verify this reformulation symbolically.
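
The expansion behind this reformulation is short enough to write out by hand. With the FR update $d_{k}=\nabla f(x_{k})+\beta_{k-1}d_{k-1}$ (the first assignment in the code below),

$$
\|d_{k}\|^{2}-(1+c_{k-1}\beta_{k-1})\|\nabla f(x_{k})\|^{2}=2\beta_{k-1}\langle\nabla f(x_{k});d_{k-1}\rangle+\beta_{k-1}^{2}\|d_{k-1}\|^{2}-c_{k-1}\beta_{k-1}\|\nabla f(x_{k})\|^{2}.
$$

In the weighted sum, the two terms in $\|\nabla f(x_{k-1})\|^{2}$ cancel because $-\lambda_{2}c_{k-1}-\lambda_{3}\beta_{k-1}=-c_{k-1}\beta_{k-1}^{2}+c_{k-1}\beta_{k-1}^{2}=0$, which leaves exactly the same three terms. As a sketch of how the lemma then follows (assuming $\nabla f(x_{k})\neq0$; the full argument is in the main text), dividing the equality by $\|\nabla f(x_{k})\|^{2}$ and using the upper bound on $\beta_{k-1}$ from Lemma 2.2 gives

$$
\frac{\|d_{k}\|^{2}}{\|\nabla f(x_{k})\|^{2}}=1+c_{k-1}\beta_{k-1}\leq1+\frac{\left(1-q+2\sqrt{(c_{k-1}-1)q}\right)^{2}}{4q}=c_{k}.
$$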

### Symbolic verification

In [151]:
(*Clear memory and all the variables*)
ClearAll["Global`*"];
Remove["Global`*"];
SetOptions[EvaluationNotebook[], 
  CellEpilog :> SelectionMove[EvaluationNotebook[], Next, Cell]];
SetOptions[$FrontEnd, "FileChangeProtection" -> None];
(*You may get the warning:
"Remove::rmnsm: There are no symbols matching "Global`*"."
if you run this block twice,
but that is fine
*)

In [157]:
(* For notational convenience, in the code we let c[k-1] ≜ c, β[k-1] ≜ β, γ[k-1] ≜ γ*)

(* System of NCGM *)
(* ============== *)

d[k] = g[k] + β d[k - 1];

x[k] = x[k - 1] - γ d[k - 1];

(* Constraints in consideration *)
(* ============================ *)

constraint1 = g[k] d[k - 1] (* == 0*);

constraint2 = d[k - 1]^2 - c g[k - 1]^2 (* == 0*);

constraint3 = g[k]^2 - β g[k - 1]^2 (* == 0*);

   
(* Weight λ"i" for constraint"i" *)   
(* ============================= *)
   
λ1 = 2 β;
    
λ2 = β^2;

λ3 = -c β;

(* Weighted sum *)   
(* ==============*)

WeightedSum = ( (λ1*constraint1) + (λ2*constraint2) + (λ3*constraint3)) // FullSimplify;

(* Target expression *)   
(* ================*) 

(* d[k] is replaced by its definition g[k] + β d[k - 1] when this is evaluated *)
SimplifiedTerm = Expand[d[k]^2 - (1 + c β) g[k]^2];
 
(* Check that both expressions match: the identity holds if TermDiff evaluates to 0 *)

TermDiff = Assuming[β > 0 && c > 1 , 
 FullSimplify[WeightedSum - SimplifiedTerm]] 

#### Verification of an algebraic identity

We check that, for $c_{k-1}>1$ and $0<\mu<L$,

$$
\left(\sqrt{\frac{c_{k-1}-1}{c_{k-1}}}+\sqrt{\frac{(\mu+L)^{2}}{4c_{k-1}\mu L}-\frac{1}{c_{k-1}}}\right)^{2}
 =1+\frac{(L-\mu)}{c_{k-1}}\sqrt{\frac{(c_{k-1}-1)}{\mu L}}+\frac{\mu^{2}-6\mu L+L^{2}}{4c_{k-1}\mu L}.
$$ 
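
A short hand computation shows why the identity above holds when $c_{k-1}>1$ and $0<\mu<L$. The argument of the second square root is a perfect square over a positive denominator:

$$
\frac{(\mu+L)^{2}}{4c_{k-1}\mu L}-\frac{1}{c_{k-1}}=\frac{(L-\mu)^{2}}{4c_{k-1}\mu L},\qquad\text{so}\qquad\sqrt{\frac{(\mu+L)^{2}}{4c_{k-1}\mu L}-\frac{1}{c_{k-1}}}=\frac{L-\mu}{2\sqrt{c_{k-1}\mu L}}.
$$

Expanding the square on the left-hand side then gives

$$\begin{aligned}
\frac{c_{k-1}-1}{c_{k-1}}+\frac{(L-\mu)}{c_{k-1}}\sqrt{\frac{c_{k-1}-1}{\mu L}}+\frac{(L-\mu)^{2}}{4c_{k-1}\mu L} & =1+\frac{(L-\mu)}{c_{k-1}}\sqrt{\frac{c_{k-1}-1}{\mu L}}+\frac{(L-\mu)^{2}-4\mu L}{4c_{k-1}\mu L},
\end{aligned}$$

and $(L-\mu)^{2}-4\mu L=\mu^{2}-6\mu L+L^{2}$. As an illustrative check with values chosen here, $c_{k-1}=2$, $L=4$, $\mu=1$ gives $49/32$ on both sides.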


In [184]:
(*Clear memory and all the variables*)
ClearAll["Global`*"];
Remove["Global`*"];
SetOptions[EvaluationNotebook[], 
  CellEpilog :> SelectionMove[EvaluationNotebook[], Next, Cell]];
SetOptions[$FrontEnd, "FileChangeProtection" -> None];
(*You may get the warning:
"Remove::rmnsm: There are no symbols matching "Global`*"."
if you run this block twice,
but that is fine
*)

In [190]:
βMax1 = 
  1 + ((L - μ) Sqrt[(-1 + c)/(L μ)])/c + (
   L^2 - 6 L μ + μ^2)/(4 c L μ);
   
βMax2 = (Sqrt[c - 1]/Sqrt[c] + 
    Sqrt[(μ + L)^2/(4 c μ L) - 1/c])^2;
    
Assuming[c > 1 && μ > 0 && μ < L, 
 Simplify[βMax1 - βMax2]]