# Summary
This document corresponds to Exercise 1 of [this file](https://github.com/PerformanceEstimation/Learning-Performance-Estimation/blob/main/Course.pdf).

The goal of this exercise is to compute the worst-case value of the ratio $\frac{\|x_{k+1}-x_\star\|^2}{\|x_k-x_\star\|^2}$, where $x_\star=\textrm{argmin}_x f(x)$ for some $L$-smooth and $\mu$-strongly convex function $f$, and where $x_{k+1}=x_k-\gamma_k \nabla f(x_k)$ is obtained by taking a gradient step (with stepsize $\gamma_k$) from $x_k$.

If [PEPit](https://pypi.org/project/PEPit/) is not already installed, please execute the following cell.

In [None]:
!pip install pepit

Exercises 1.1 to 1.6 are presented in the document, along with their corrections, and do not involve any numerics.

### Exercise 1.7
Complete the code for computing the worst-case behavior of the ratio $\frac{\|x_{k+1}-x_\star\|^2}{\|x_k-x_\star\|^2}$ (we use $k=0$ without loss of generality and for readability below).

As seen in the exercise file, this is done by looking for the worst-case value of $\|x_{1}-x_\star\|^2$ (this is often referred to as the "performance measure" in the performance estimation framework, and corresponds to the objective function of the problem of computing the worst-case ratio), when $\|x_0-x_\star\|^2 =1$ (which is often referred to as an "initial condition", as it quantifies the quality of the "initial" iterate).
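The roles of the performance measure and the initial condition can be checked by hand on a single function before involving any SDP. The sketch below is not PEPit code: it runs one gradient step on the one-dimensional quadratic $f(x)=\frac{\lambda}{2}x^2$ (a hypothetical test function, with $x_\star=0$), so it only yields a lower bound on the worst-case ratio. The helper name `ratio_on_quadratic` is introduced here for illustration.

```python
def ratio_on_quadratic(curvature, gamma, x0=1.0):
    """One gradient step on f(x) = curvature/2 * x**2, whose minimizer is xs = 0.

    Returns ||x1 - xs||^2 / ||x0 - xs||^2 for this particular function.
    """
    xs = 0.0
    gradient_at_x0 = curvature * x0      # f'(x) = curvature * x
    x1 = x0 - gamma * gradient_at_x0     # gradient step
    return (x1 - xs) ** 2 / (x0 - xs) ** 2

mu, L, gamma = 0.1, 1.0, 1.0

# Both quadratics belong to the class of L-smooth mu-strongly convex functions,
# so the larger of the two ratios is a lower bound on the worst case.
lower_bound = max(ratio_on_quadratic(mu, gamma), ratio_on_quadratic(L, gamma))
print("lower bound on the worst-case ratio:", lower_bound)
```

The ratio does not depend on the scale of `x0`, which is why normalizing $\|x_0-x_\star\|^2=1$ loses no generality. Comparing this lower bound with the value returned by PEPit indicates whether quadratics are already worst-case functions for the chosen parameters.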

To see how to specify such things within the PEPit framework, we refer to [the documentation](https://pepit.readthedocs.io/), which contains numerous examples. In particular, one can check [this example](https://pepit.readthedocs.io/en/0.1.0/_modules/PEPit/examples/composite_convex_minimization/proximal_gradient.html#wc_proximal_gradient), which is a bit more complex than what is asked in the exercise, but closely related.


In [None]:
from PEPit import PEP
from PEPit.functions import SmoothStronglyConvexFunction

def wc_gradient(L, mu, gamma, verbose=1):
    # It is intended to compute the worst-case convergence of gradient descent in terms of the distance to 
    # an optimal solution: || x_{k+1} - x_\star ||^2 / || x_k - x_\star ||^2.
    # Note that we use k = 0 in the code below for readability.
    
    # Instantiate PEP
    problem = PEP()

    # Declare an L-smooth mu-strongly convex function
    f = problem.declare_function(SmoothStronglyConvexFunction, mu=mu, L=L)

    # Start by defining its unique optimal point xs = x_\star
    xs = f.stationary_point()

    # Then define the point x0 of the algorithm
    x0 = problem.set_initial_point()    
    
    # Perform one iteration of gradient descent
    x = # TODO complete this line. Hint: use f.gradient() to call the gradient of f at a given point.

    # Set the "performance metric" to the distance between x1 and xs
    # TO COMPLETE (use "problem.set_performance_metric" to specify the objective function of the SDP)
    problem.set_performance_metric( ) # TODO complete this line
    
    
    # Set the "initial condition" to the distance between x0 and xs
    # TO COMPLETE (use "problem.set_initial_condition" or "problem.add_constraint" to specify the
    # constraint || x0 - xs ||^2 == 1).
    problem.set_initial_condition( ) # TODO complete this line

    # Solve the PEP
    pepit_tau = problem.solve(verbose=verbose)
    
    # Return the worst-case convergence rate output by the SDP solver
    return pepit_tau

Once the previous code is completed, one can test it for a few values of the problem and algorithmic parameters.

In [None]:
mu = .1
L = 1
gamma = 1
verbose = 1

pepit_tau = wc_gradient(L=L, mu=mu, gamma=gamma, verbose=verbose)

### Exercise 1.8: optimal stepsize and range of acceptable stepsizes?

Compute numerical values of the worst-case ratio for a few values of $\mu$ and $\gamma_k$ (with $L=1$), and try to infer rules for the stepsize choices.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import time

nb_test = 20

mu = .1
L = 1
gamma = np.linspace(0., 2., num=nb_test)
verbose = 0

pepit_taus = list()

for i in range(nb_test):
    t0 = time.process_time()
    pepit_tau = wc_gradient(L=L, mu=mu, gamma=gamma[i], verbose=verbose)
    pepit_taus.append(pepit_tau)
    t1 = time.process_time() - t0
    print(f"{i + 1} / {nb_test} done (elapsed time: {t1:.2f} [s])")
    
plt.plot(gamma, pepit_taus, '-')

plt.xlabel('Step size')
plt.ylabel('||x_1-x_*||^2 / ||x_0-x_*||^2')

plt.show()

### Exercise 1.9: variations (performance measures)
Update the previous code to compute the worst-case ratio $\frac{\|\nabla f(x_{k+1})\|^2}{\|\nabla f(x_k)\|^2}$ and experiment with it.

A good practice in PEPit is to limit the number of function-value and gradient evaluations. Each evaluation (i) adds a point to the discrete representation of the worst-case function and (ii) thereby adds "interpolation inequalities" to the problem, making it numerically harder to solve.
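The habit of evaluating each gradient once and reusing it can be illustrated outside PEPit. The sketch below is a plain-Python analogy, not PEPit's internal mechanism: `CountingGradient` and the two helper functions are hypothetical names, and the gradient $g(x)=0.5\,x$ is an arbitrary test case. In PEPit, the analogue of the counter is the number of points added to the discrete representation.

```python
class CountingGradient:
    """Hypothetical wrapper that counts how often a gradient is evaluated."""

    def __init__(self, grad):
        self.grad = grad
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.grad(x)


def gradient_ratio_wasteful(oracle, x0, gamma):
    # Evaluates the gradient at x0 twice: once for the step, once for the ratio.
    x1 = x0 - gamma * oracle(x0)
    return oracle(x1) ** 2 / oracle(x0) ** 2


def gradient_ratio_frugal(oracle, x0, gamma):
    # Stores the gradient at x0 and reuses it in the ratio.
    g0 = oracle(x0)
    x1 = x0 - gamma * g0
    g1 = oracle(x1)
    return g1 ** 2 / g0 ** 2


wasteful_oracle = CountingGradient(lambda x: 0.5 * x)
frugal_oracle = CountingGradient(lambda x: 0.5 * x)

ratio_wasteful = gradient_ratio_wasteful(wasteful_oracle, 1.0, 1.0)
ratio_frugal = gradient_ratio_frugal(frugal_oracle, 1.0, 1.0)

print("evaluations:", wasteful_oracle.calls, "vs", frugal_oracle.calls)
```

Both versions return the same ratio; only the number of evaluations differs. The same pattern applies when writing the PEPit code: store the result of a gradient call in a variable and reuse it in the step, the performance measure, and the initial condition.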


Compute the worst-case ratios:

Perform numerical experiments for a few values of the parameters:

### Exercise 1.10: variations (performance measures)
Update the previous code to compute the worst-case ratio $\frac{f(x_{k+1})-f_\star}{f(x_k)-f_\star}$ and experiment with it.

As before, limit the number of gradient and function-value evaluations as much as possible. Note, though, that for certain classes of functions, PEPit detects when the gradient (or function value) at a point has already been evaluated and does not add redundant points to the discrete representation.

### Exercise 1.11: variations (number of iterations)
Update the previous PEPit code to compute the worst-case ratio $\frac{\|x_{N}-x_\star\|^2}{\|x_0-x_\star\|^2}$ and experiment with it.

### Exercise 1.12: dimension of the numerical worst-case

This question does not require numerical experiments.

### Exercise 1.13: identify low-dimensional counter examples
The following code updates the previous one to compute the worst-case ratio $\frac{\|x_{N}-x_\star\|^2}{\|x_0-x_\star\|^2}$. It can help identify low-dimensional worst-case examples.

In [None]:
from PEPit import PEP
from PEPit.functions import SmoothStronglyConvexFunction

def wc_gradient(L, mu, gamma, n, verbose=1):
    # It is intended to compute a worst-case guarantee of gradient descent in terms of the distance to 
    # an optimal solution: || x_N - x_\star ||^2 / || x_0 - x_\star ||^2.
    
    # Instantiate PEP
    problem = PEP()

    # Declare an L-smooth mu-strongly convex function
    f = problem.declare_function(SmoothStronglyConvexFunction, mu=mu, L=L)

    # Start by defining its unique optimal point xs = x_\star
    xs = f.stationary_point()
    fs = f(xs)

    # Then define the point x0 of the algorithm
    x0 = problem.set_initial_point()

    # Gradient descent
    x = x0
    list_x = list() # store all x's
    list_f = list() # store all f's
    for i in range(n):
        gx,fx = f.oracle(x)
        list_x.append(x)
        list_f.append(fx)
        x = x - gamma * gx

    # Set the "performance metric" to the distance between xN and xs
    problem.set_performance_metric( (x-xs)**2 )
    
    # Set the "initial condition" to the distance between x0 and xs
    problem.set_initial_condition( (x0-xs)**2 == 1) 

    # Solve the PEP with dimension_reduction_heuristic set to "trace" to use the trace heuristic
    pepit_tau = problem.solve(verbose=verbose, dimension_reduction_heuristic="trace")
    
    # INFO: for recovering points in the discrete representation of the function, you can use, e.g.:
    # (x-xs).eval(), (x0-xs).eval()
    # which correspond to the values of x_n-x_* and x_0-x_* obtained in the worst-case scenario.
    
    list_x_solved = list()
    list_f_solved = list()
    for i in range(n):
        list_x_solved.append((list_x[i]-xs).eval())
        list_f_solved.append((list_f[i]-fs).eval())
        
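    # Return the worst-case value output by the SDP solver, together with the
    # recovered iterates (x_k - x_*) and function values (f(x_k) - f_*)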
    return pepit_tau, list_x_solved, list_f_solved

In [None]:
L = 1
mu = .1
gamma = 1/L 
n = 10

# compute a low-dimensional worst-case example
pepit_tau, list_x_solved, list_f_solved = wc_gradient(L, mu, gamma, n, verbose=1)

What is the dimension of the output? (Hint: how many nonzero eigenvalues does $G\succcurlyeq 0$ have? Check the output message of PEPit.)
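Counting nonzero eigenvalues of a Gram matrix can be tried on a hand-built example. The sketch below is a toy under stated assumptions: the three vectors, the helper name `numerical_rank`, and the relative tolerance `1e-8` are hypothetical choices, and the matrix is not taken from PEPit's output.

```python
import numpy as np


def numerical_rank(G, tol=1e-8):
    """Count the eigenvalues of the symmetric matrix G that exceed a relative tolerance."""
    eigenvalues = np.linalg.eigvalsh(G)
    threshold = tol * max(eigenvalues.max(), 1.0)
    return int(np.sum(eigenvalues > threshold))


# Hypothetical vectors (playing the role of x0 - xs, g0, g1) that all lie on
# the same line of R^3: their Gram matrix should have one nonzero eigenvalue.
direction = np.array([1.0, 0.0, 0.0])
P = np.column_stack([1.0 * direction, 0.5 * direction, 0.25 * direction])
G = P.T @ P  # Gram matrix: G[i, j] is the inner product of vectors i and j

r = numerical_rank(G)
print("number of nonzero eigenvalues:", r)
```

A count of `r` means the vectors can be embedded in $\mathbb{R}^r$ without changing any inner product, which is the sense in which the worst-case example is $r$-dimensional.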

In PEPit, if $G$ has $r$ nonzero eigenvalues, one can use the first $r$ coordinates of the output $x_k$'s and $g_k$'s to try to represent a worst-case function, as follows. What do you observe? (Hint: try different values of the stepsize in $(0,2/L)$.)

In [None]:
import matplotlib.pyplot as plt

# if there is only 1 nonzero eigenvalue, plot the 1-dimensional WC function:
first_coordinate_x = list()
for i in range(n):
    first_coordinate_x.append(list_x_solved[i][0])
    
plt.plot(first_coordinate_x,list_f_solved,'.',label='iterates')
plt.plot(first_coordinate_x[0],list_f_solved[0],'.',label='x0')
plt.plot(0,0,'.',label='xs')
plt.xlabel('first coordinate of x_k - x_*')
plt.ylabel('f(x_k) - f_*')
plt.legend()
plt.show()

### Exercise 1.14: Variations (no strong convexity)
What is the value of the worst-case ratio $\frac{\|x_{N}-x_\star\|^2}{\|x_0-x_\star\|^2}$ when $\mu=0$? Can you deduce convergence of gradient descent from it? Can you extract/deduce simple counter-examples from the numerics?

Update the previous code for computing worst-case ratios $\frac{f(x_N)-f_\star}{\|x_0-x_\star\|^2}$ when $\mu=0$; can you deduce the apparent dependence on $N$?

### Exercise 1.15: Variations (no strong convexity & alternate performance measure)
Update the previous code for computing worst-case ratios $\frac{\|\nabla f(x_N)\|^2}{\|x_0-x_\star\|^2}$ when $\mu=0$; can you deduce the apparent dependence on $N$?

### Exercise 1.16: Variations (no strong convexity & alternate performance measure)
Update the previous code for computing worst-case ratios $\frac{\|x_N-x_\star\|^2}{\|\nabla f(x_0)\|^2}$ when $\mu=0$; how does it depend on $N$?

### Exercise 1.17: learning outcomes

This question does not require any numerical experiment.