In [1]:
!pip install sympy



# Symbolic verification for FR method (using Sympy)

This notebook provides a symbolic verification of the technical statements of Lemma 2.2 and Lemma 2.3 of the paper, which can be stated as follows.

### Lemma 2.2

 Let $f\in\mathcal{F}_{\mu,L}$, and let $x_{k-1},d_{k-1}\in\mathbb{R}^{n}$
and $x_{k}$, $d_{k}$ be generated by the FR method (i.e., $\eta=0$). For any $c_{k-1}\in\mathbb{R}$ such that $\frac{\|d_{k-1}\|^{2}}{\|\nabla f(x_{k-1})\|^{2}}=c_{k-1}$,
where $c_{k-1}>1$, it holds that: 
\begin{equation}
0\leq\beta_{k-1}\leq\frac{1}{c_{k-1}}\frac{\left(1-q+2\sqrt{(c_{k-1}-1)q}\right)^{2}}{4q},\label{eq:case_2_beta_FR}
\end{equation}
where $q \triangleq \frac{\mu}{L}$.
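
To get a feel for the bound, its right-hand side can be evaluated numerically. The following is a minimal sketch: the function name `beta_upper_bound` and the sample values of $q$ and $c_{k-1}$ are illustrative assumptions and do not come from the paper.

```python
import math

def beta_upper_bound(q, c_prev):
    """Right-hand side of the Lemma 2.2 bound on beta_{k-1}.

    q stands for mu/L and c_prev for c_{k-1}; both names are chosen here
    for illustration only.
    """
    if not (0 < q < 1 and c_prev > 1):
        raise ValueError("expected 0 < q < 1 and c_prev > 1")
    return (1 - q + 2 * math.sqrt((c_prev - 1) * q)) ** 2 / (4 * q * c_prev)

# assumed sample values: q = 1/4 and c_{k-1} = 2
print(beta_upper_bound(0.25, 2.0))  # 1.53125
```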

### Proof

First, note that $\beta_{k-1}\geq0$ by definition. The upper bound is obtained from the following
weighted sum of equalities and inequalities:

- relation between $\nabla f(x_{k-1})$ and $d_{k-1}$ with weight
$\lambda_{1}=\gamma_{k-1}(L+\mu)-\frac{2\sqrt{\beta_{k-1}}}{\sqrt{(c_{k-1}-1)c_{k-1}}}$: 
$$
0=\left\langle \nabla f(x_{k-1});\,d_{k-1}\right\rangle -\|\nabla f(x_{k-1})\|^{2},
$$
- optimality condition of the line search with weight $\lambda_{2}=\frac{2}{c_{k-1}}-\gamma_{k-1}(L+\mu)$: 

$$0=\left\langle \nabla f(x_{k});\,d_{k-1}\right\rangle ,$$

- definition of $\beta_{k-1}$ with weight $\lambda_{3}=\frac{\sqrt{c_{k-1}-1}}{\sqrt{\beta_{k-1}c_{k-1}}}$: 
$$0=\|\nabla f(x_{k})\|^{2}-\beta_{k-1}\|\nabla f(x_{k-1})\|^{2},$$

- initial condition on the ratio $\frac{\|d_{k-1}\|^{2}}{\|\nabla f(x_{k-1})\|^{2}}$
with weight $\lambda_{4}=-\gamma_{k-1}^{2}L \mu +\frac{\sqrt{\beta_{k-1}}}{c_{k-1}\sqrt{(c_{k-1}-1)c_{k-1}}}$: 
$$
0=\|d_{k-1}\|^{2}-c_{k-1}\|\nabla f(x_{k-1})\|^{2},
$$

- smoothness and strong convexity of $f$ between $x_{k-1}$ and $x_{k}$,
with weight $\lambda_{5}=L-\mu$: 

$$\begin{aligned}0\geq & -f(x_{k-1})+f(x_{k})+\langle\nabla f(x_{k});\,x_{k-1}-x_{k}\rangle+\tfrac{1}{2L}\|\nabla f(x_{k-1})-\nabla f(x_{k})\|^{2}\\
 & \quad+\tfrac{\mu}{2(1-\mu/L)}\|x_{k-1}-x_{k}-\tfrac{1}{L}(\nabla f(x_{k-1})-\nabla f(x_{k}))\|^{2}\\
= & f(x_{k})-f(x_{k-1})+\gamma_{k-1}\langle\nabla f(x_{k});\,d_{k-1}\rangle+\tfrac{1}{2L}\|\nabla f(x_{k-1})-\nabla f(x_{k})\|^{2}\\
 & \quad+\tfrac{\mu}{2(1-\mu/L)}\|\gamma_{k-1}d_{k-1}-\tfrac{1}{L}(\nabla f(x_{k-1})-\nabla f(x_{k}))\|^{2},
\end{aligned}$$

- smoothness and strong convexity of $f$ between $x_{k}$ and $x_{k-1}$, with weight $\lambda_{6}=\lambda_{5}$: 

$$\begin{aligned}0\geq & -f(x_{k})+f(x_{k-1})+\langle\nabla f(x_{k-1});\,x_{k}-x_{k-1}\rangle+\tfrac{1}{2L}\|\nabla f(x_{k-1})-\nabla f(x_{k})\|^{2}\\
 & \quad+\tfrac{\mu}{2(1-\mu/L)}\|x_{k-1}-x_{k}-\tfrac{1}{L}(\nabla f(x_{k-1})-\nabla f(x_{k}))\|^{2}\\
= & f(x_{k-1})-f(x_{k})-\gamma_{k-1}\langle\nabla f(x_{k-1});\,d_{k-1}\rangle+\tfrac{1}{2L}\|\nabla f(x_{k-1})-\nabla f(x_{k})\|^{2}\\
 & \quad+\tfrac{\mu}{2(1-\mu/L)}\|\gamma_{k-1}d_{k-1}-\tfrac{1}{L}(\nabla f(x_{k-1})-\nabla f(x_{k}))\|^{2}.
\end{aligned}$$


The weighted sum can be written as: 
$$\begin{aligned}
0 & \geq\lambda_{1}\left[\left\langle \nabla f(x_{k-1});\,d_{k-1}\right\rangle -\|\nabla f(x_{k-1})\|^{2}\right]+\lambda_{2}\left[\left\langle \nabla f(x_{k});\,d_{k-1}\right\rangle \right]\\
 & +\lambda_{3}\left[\|\nabla f(x_{k})\|^{2}-\beta_{k-1}\|\nabla f(x_{k-1})\|^{2}\right]+\lambda_{4}\left[\|d_{k-1}\|^{2}-c_{k-1}\|\nabla f(x_{k-1})\|^{2}\right]\\
 & +\lambda_{5}\Big[f(x_{k})-f(x_{k-1})+\gamma_{k-1}\langle\nabla f(x_{k});\,d_{k-1}\rangle+\tfrac{1}{2L}\|\nabla f(x_{k-1})-\nabla f(x_{k})\|^{2}\\
 & \quad\quad+\tfrac{\mu}{2(1-\mu/L)}\|\gamma_{k-1}d_{k-1}-\tfrac{1}{L}(\nabla f(x_{k-1})-\nabla f(x_{k}))\|^{2}\Big]\\
 & +\lambda_{6}\Big[f(x_{k-1})-f(x_{k})-\gamma_{k-1}\langle\nabla f(x_{k-1});\,d_{k-1}\rangle+\tfrac{1}{2L}\|\nabla f(x_{k-1})-\nabla f(x_{k})\|^{2}\\
 & \quad\quad+\tfrac{\mu}{2(1-\mu/L)}\|\gamma_{k-1}d_{k-1}-\tfrac{1}{L}(\nabla f(x_{k-1})-\nabla f(x_{k}))\|^{2}\Big],
\end{aligned}$$

which can be reformulated exactly as follows (expand both expressions and
observe that all terms match; this is done symbolically below):

$$\begin{aligned}
0\geq & \|\nabla f(x_{k})\|^{2}-\nu(\beta_{k-1},\gamma_{k-1},c_{k-1},\mu,L)\|\nabla f(x_{k-1})\|^{2}\\
 & +\left\Vert \sqrt[4]{\frac{\beta_{k-1}}{(c_{k-1}-1)c_{k-1}^{3}}}d_{k-1}-\sqrt[4]{\frac{\beta_{k-1}c_{k-1}}{c_{k-1}-1}}\nabla f(x_{k-1})+\sqrt[4]{\frac{c_{k-1}-1}{\beta_{k-1}c_{k-1}}}\nabla f(x_{k})\right\Vert ^{2},
\end{aligned}$$

where 
$$
\nu(\beta_{k-1},\gamma_{k-1},c_{k-1},\mu,L)=2\sqrt{1-\frac{1}{c_{k-1}}}\sqrt{\beta_{k-1}}-c_{k-1}\gamma_{k-1}^{2}L\mu+\gamma_{k-1}(L+\mu)-1.
$$
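
As a hand check of one matching term, consider the coefficient of $\|d_{k-1}\|^{2}$ in the weighted sum. Only $\lambda_{4}$ and the last squared norm of the two smoothness and strong convexity inequalities contribute, and since $\lambda_{5}=\lambda_{6}=L-\mu$ the $\gamma_{k-1}^{2}L\mu$ terms cancel:

$$\begin{aligned}
\lambda_{4}+(\lambda_{5}+\lambda_{6})\tfrac{\mu}{2(1-\mu/L)}\gamma_{k-1}^{2} & =-\gamma_{k-1}^{2}L\mu+\frac{\sqrt{\beta_{k-1}}}{c_{k-1}\sqrt{(c_{k-1}-1)c_{k-1}}}+2(L-\mu)\tfrac{\mu L}{2(L-\mu)}\gamma_{k-1}^{2}\\
 & =\sqrt{\frac{\beta_{k-1}}{(c_{k-1}-1)c_{k-1}^{3}}},
\end{aligned}$$

which is the square of the coefficient of $d_{k-1}$ inside the norm of the reformulation. The remaining coefficients can be compared in the same way.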

The remaining parts are provided in the main text. Below, we verify this equivalence symbolically.

### Symbolic verification

In [2]:
import sympy as sm

# create symbols for the problem parameters:
q = sm.Symbol('q', positive=True) # mu/L
gamma = sm.Symbol('gamma_{k-1}')
beta = sm.Symbol('beta_{k-1}', positive=True)
L = sm.Symbol('L', positive=True)
mu = L * q

# to enforce c > 1, we define c = 1 + cm with cm > 0.
cm = sm.Symbol('cm', positive=True)
c = 1 + cm

# create symbols for the "primal" variables; each vector is modeled by a scalar
# symbol, so a product such as gk*dk stands for the inner product <g_{k-1}; d_{k-1}>:
xk = sm.Symbol('x_{k-1}')
gk = sm.Symbol('g_{k-1}')
fk = sm.Symbol('f_{k-1}')
dk = sm.Symbol('d_{k-1}')
gk1 = sm.Symbol('g_{k}')
fk1 = sm.Symbol('f_{k}')

dk1 = gk1 + beta * dk # define d_{k} using previous symbols
xk1 = xk - gamma * dk # define x_{k} using previous symbols

# equalities and inequalities
constraint1 = gk*dk - gk**2 # == 0
constraint2 = gk1 * dk # == 0
constraint3 = gk1**2 - beta * gk**2 # == 0
constraint4 = dk**2 - c * gk**2 # == 0
# 1/(2*L) keeps the coefficients exact (1/2/L would introduce the float 0.5)
constraint5 = fk1 - fk + gamma * gk1 * dk + 1/(2*L) * (gk - gk1)**2 + mu/(2*(1-q)) * (gamma * dk - 1/L * (gk - gk1))**2 # <= 0
constraint6 = fk - fk1 - gamma * gk * dk + 1/(2*L) * (gk - gk1)**2 + mu/(2*(1-q)) * (gamma * dk - 1/L * (gk - gk1))**2 # <= 0

# multipliers
lambda1 = gamma*(mu+L) - 2*sm.sqrt(beta)/sm.sqrt((c-1)*c)
lambda2 = (2/c) - gamma*(mu+L)
lambda3 = sm.sqrt(c-1)/sm.sqrt(beta*c)
lambda4 = - (gamma**2)* mu * L + sm.sqrt(beta)/(c*sm.sqrt((c-1)*c))
lambda5 = L-mu
lambda6 = L-mu

# weighted sum
weightedSum = lambda1 * constraint1 + lambda2 * constraint2 + lambda3 * constraint3 + lambda4 * constraint4 + lambda5*constraint5 + lambda6*constraint6
weightedSum = sm.expand(weightedSum)

# target expression

nu = 2*(sm.sqrt(c-1)/sm.sqrt(c))*sm.sqrt(beta) - c*(gamma**2)*mu*L + gamma*(mu+L) - 1
coef1 = sm.root(beta/((c-1)*(c**3)),4)
coef2 = -sm.root((beta*c)/((c-1)),4)
coef3 = (sm.root((c-1)/(beta*c),4))
target = sm.expand(gk1 ** 2 - nu * gk**2 + ( (coef1*dk) + (coef2*gk) + (coef3*gk1) )**2)

# Verify that target - weightedSum == 0
sm.simplify(target-weightedSum)

0
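
The scalar symbols above stand in for vectors, so it can be reassuring to spot-check the same identity with actual inner products. The sketch below uses only the standard library and draws random vectors and parameters; the helper names `dot`, `lemma22_gap`, and `max_gap`, as well as the sampling ranges, are assumptions made for illustration. It is a numerical sanity check, not a replacement for the symbolic verification.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def lemma22_gap(g_prev, g_cur, d, f_prev, f_cur, beta, gamma, c, L, mu):
    """Weighted sum minus target expression, computed with true inner products."""
    diff = [a - b for a, b in zip(g_prev, g_cur)]       # grad f(x_{k-1}) - grad f(x_k)
    step = [gamma * a - b / L for a, b in zip(d, diff)]  # gamma*d_{k-1} - diff/L
    common = dot(diff, diff) / (2 * L) + mu / (2 * (1 - mu / L)) * dot(step, step)

    constraints = [
        dot(g_prev, d) - dot(g_prev, g_prev),
        dot(g_cur, d),
        dot(g_cur, g_cur) - beta * dot(g_prev, g_prev),
        dot(d, d) - c * dot(g_prev, g_prev),
        f_cur - f_prev + gamma * dot(g_cur, d) + common,
        f_prev - f_cur - gamma * dot(g_prev, d) + common,
    ]
    root = math.sqrt((c - 1) * c)
    weights = [
        gamma * (L + mu) - 2 * math.sqrt(beta) / root,
        2 / c - gamma * (L + mu),
        math.sqrt(c - 1) / math.sqrt(beta * c),
        -gamma**2 * L * mu + math.sqrt(beta) / (c * root),
        L - mu,
        L - mu,
    ]
    weighted_sum = sum(w * con for w, con in zip(weights, constraints))

    nu = (2 * math.sqrt(1 - 1 / c) * math.sqrt(beta)
          - c * gamma**2 * L * mu + gamma * (L + mu) - 1)
    a1 = (beta / ((c - 1) * c**3)) ** 0.25
    a2 = (beta * c / (c - 1)) ** 0.25
    a3 = ((c - 1) / (beta * c)) ** 0.25
    combo = [a1 * x - a2 * y + a3 * z for x, y, z in zip(d, g_prev, g_cur)]
    target = dot(g_cur, g_cur) - nu * dot(g_prev, g_prev) + dot(combo, combo)
    return weighted_sum - target

def max_gap(trials=100, dim=3, seed=0):
    """Largest absolute gap over random draws (ranges are arbitrary choices)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        vecs = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
        L = rng.uniform(1.0, 5.0)
        mu = rng.uniform(0.1, 0.9) * L
        gap = lemma22_gap(*vecs, rng.uniform(-1, 1), rng.uniform(-1, 1),
                          beta=rng.uniform(0.1, 3.0), gamma=rng.uniform(-1, 1),
                          c=rng.uniform(1.1, 5.0), L=L, mu=mu)
        worst = max(worst, abs(gap))
    return worst

print(max_gap())  # expected to be at the level of floating-point rounding
```

The check does not impose the constraints themselves: the identity between the weighted sum and the target holds for arbitrary vectors, which is why random draws suffice here.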

### Lemma 2.3

Let $f\in\mathcal{F}_{\mu,L}$, and let $x_{k-1},d_{k-1}\in\mathbb{R}^{n}$
and $x_{k}$, $d_{k}$ be generated by the FR method (i.e., $\eta=0$). For any $c_{k-1}\in\mathbb{R}$ such that $\frac{\|d_{k-1}\|^{2}}{\|\nabla f(x_{k-1})\|^{2}}=c_{k-1}$,
where $c_{k-1}>1$, it holds that: 
\begin{equation}
\frac{\|d_{k}\|^{2}}{\|\nabla f(x_{k})\|^{2}}\leq c_{k}\triangleq1+\frac{\left(1-q+2\sqrt{(c_{k-1}-1)q}\right)^{2}}{4q},\label{eq:FR_angle}
\end{equation}
with $q\triangleq\frac{\mu}{L}$. Equivalently, $\|d_{k}-\nabla f(x_{k})\|\leq\epsilon\|\nabla f(x_{k})\|$
holds with $\epsilon=\sqrt{1-\frac{1}{c_{k}}}$.
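
Because the bound $c_{k}$ has the same form as the assumption on $c_{k-1}$, it can be chained from one iteration to the next. The following minimal sketch evaluates the recursion and the associated $\epsilon$; the function names `next_c` and `epsilon`, the starting value, and the value of $q$ are illustrative assumptions.

```python
import math

def next_c(c_prev, q):
    """c_k from Lemma 2.3, given c_{k-1} > 1 and q = mu/L in (0, 1)."""
    return 1 + (1 - q + 2 * math.sqrt((c_prev - 1) * q)) ** 2 / (4 * q)

def epsilon(c_k):
    """epsilon = sqrt(1 - 1/c_k), as in the statement of the lemma."""
    return math.sqrt(1 - 1 / c_k)

# assumed sample values: q = 1/4 and a starting ratio c = 2
c, q = 2.0, 0.25
for k in range(1, 4):
    c = next_c(c, q)
    print(k, c, epsilon(c))
```

Note that $c_{k}=1+c_{k-1}\bar{\beta}$, where $\bar{\beta}$ denotes the upper bound on $\beta_{k-1}$ from Lemma 2.2, which is how the two lemmas connect.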


### Proof

The proof consists of the following weighted sum of inequalities: 

- optimality condition of the line search with weight $\lambda_{1}=2\beta_{k-1}$:

$$
0=\langle\nabla f(x_{k});d_{k-1}\rangle,$$

- the quality of the search direction with weight $\lambda_{2}=\beta_{k-1}^{2}$:

$$0=\|d_{k-1}\|^{2}-c_{k-1}\|\nabla f(x_{k-1})\|^{2},$$

- definition of $\beta_{k-1}$ with weight $\lambda_{3}=-c_{k-1}\beta_{k-1}$:

$$\begin{aligned}0=\|\nabla f(x_{k})\|^{2}-\beta_{k-1}\|\nabla f(x_{k-1})\|^{2}.\end{aligned}$$

The weighted sum can be written as 
$$
\begin{aligned}0= & \lambda_{1}\left[\langle\nabla f(x_{k});d_{k-1}\rangle\right]+\lambda_{2}\left[\|d_{k-1}\|^{2}-c_{k-1}\|\nabla f(x_{k-1})\|^{2}\right]+\lambda_{3}\left[\|\nabla f(x_{k})\|^{2}-\beta_{k-1}\|\nabla f(x_{k-1})\|^{2}\right],\end{aligned}
$$  
and can be reformulated exactly as follows (expand both expressions and
observe that all terms match; this is done symbolically below):
$$
\begin{aligned}0=\|d_{k}\|^{2}-(1+c_{k-1}\beta_{k-1})\|\nabla f(x_{k})\|^{2}.
\end{aligned}
$$
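
This match can also be checked by hand. The $\|\nabla f(x_{k-1})\|^{2}$ terms cancel because $-\lambda_{2}c_{k-1}-\lambda_{3}\beta_{k-1}=-c_{k-1}\beta_{k-1}^{2}+c_{k-1}\beta_{k-1}^{2}=0$, and the remaining terms complete a square once $d_{k}=\nabla f(x_{k})+\beta_{k-1}d_{k-1}$ is used:

$$\begin{aligned}
2\beta_{k-1}\langle\nabla f(x_{k});d_{k-1}\rangle+\beta_{k-1}^{2}\|d_{k-1}\|^{2}-c_{k-1}\beta_{k-1}\|\nabla f(x_{k})\|^{2} & =\|\nabla f(x_{k})+\beta_{k-1}d_{k-1}\|^{2}-(1+c_{k-1}\beta_{k-1})\|\nabla f(x_{k})\|^{2}\\
 & =\|d_{k}\|^{2}-(1+c_{k-1}\beta_{k-1})\|\nabla f(x_{k})\|^{2}.
\end{aligned}$$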

The remaining parts are provided in the main text. Below, we verify this equivalence symbolically.

### Symbolic verification

In [3]:
import sympy as sm

# create symbols for the problem parameters:
q = sm.Symbol('q') # mu/L
gamma = sm.Symbol('gamma_{k-1}')
beta = sm.Symbol('beta_{k-1}')
c = sm.Symbol('c_{k-1}')

# create symbols for the "primal" variables:
gk = sm.Symbol('g_{k-1}')
dk = sm.Symbol('d_{k-1}')
gk1 = sm.Symbol('g_{k}')

dk1 = gk1 + beta * dk # define d_{k} using previous symbols

# equalities
constraint1 = gk1 * dk # == 0
constraint2 = dk**2 - c * gk**2 # == 0
constraint3 = gk1**2 - beta * gk**2 # == 0

lambda1 = 2*beta
lambda2 = beta**2 
lambda3 = -c * beta 

# weighted sum
weightedSum = lambda1 * constraint1 + lambda2 * constraint2 + lambda3 * constraint3

# the weighted sum should be equal to the target reformulation
ck1 = (1 + c * beta)

target = dk1 ** 2 - ck1 * gk1**2 

# Verify that target - weightedSum == 0
sm.simplify(target-weightedSum)

0
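
As a complementary illustration, one can build concrete vectors that satisfy the three equalities and confirm that the ratio $\|d_{k}\|^{2}/\|\nabla f(x_{k})\|^{2}$ equals $1+c_{k-1}\beta_{k-1}$. The sketch below works in $\mathbb{R}^{2}$; the function name `ratio_under_constraints`, the planar construction, and the sample values are assumptions chosen for illustration.

```python
import math

def norm2(v):
    return sum(x * x for x in v)

def ratio_under_constraints(beta, c, theta):
    """Return ||d_k||^2 / ||g_k||^2 for 2-D vectors satisfying the three equalities."""
    g_prev = (1.0, 0.0)                         # stands in for grad f(x_{k-1})
    scale_d = math.sqrt(c * norm2(g_prev))      # ||d_{k-1}||^2 = c ||g_prev||^2
    scale_g = math.sqrt(beta * norm2(g_prev))   # ||g_k||^2 = beta ||g_prev||^2
    d_prev = (scale_d * math.cos(theta), scale_d * math.sin(theta))
    # rotate by 90 degrees so that <g_k; d_{k-1}> = 0 (line-search optimality)
    g_cur = (-scale_g * math.sin(theta), scale_g * math.cos(theta))
    # FR direction: d_k = g_k + beta * d_{k-1}
    d_cur = tuple(g + beta * d for g, d in zip(g_cur, d_prev))
    return norm2(d_cur) / norm2(g_cur)

# assumed sample values
beta, c = 0.5, 2.0
print(ratio_under_constraints(beta, c, theta=0.3), 1 + c * beta)
```

The two printed numbers should agree up to floating-point rounding, independently of the angle `theta`.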