# Summary
This document corresponds to Exercise 1 of [this file](https://github.com/PerformanceEstimation/Learning-Performance-Estimation/blob/main/Exercises/Course.pdf).

The first step is to install [PEPit](https://pypi.org/project/PEPit/) and its dependencies:

In [None]:
!pip install pepit

Second, complete the code below so that it computes the worst-case value of the ratio $\frac{\|x_{k+1}-x_\star\|^2}{\|x_k-x_\star\|^2}$ (we use $k=0$ without loss of generality and for readability).

In [None]:
from PEPit import PEP
from PEPit.functions import SmoothStronglyConvexFunction

def wc_gradient(L, mu, gamma, verbose=1):
    # Compute the worst-case convergence of gradient descent in terms of the distance to
    # an optimal solution: ||x_{k+1} - x_\star||^2 / ||x_k - x_\star||^2.
    # Note that we use k = 0 in the code below for readability.
    
    # Instantiate PEP
    problem = PEP()

    # Declare an L-smooth, mu-strongly convex function
    f = problem.declare_function(SmoothStronglyConvexFunction, mu=mu, L=L)

    # Start by defining its unique optimal point xs = x_\star
    xs = f.stationary_point()

    # Then define the point x0 of the algorithm
    x0 = problem.set_initial_point()    
    
    # Perform one iteration of gradient descent
    x1 = x0 - gamma * f.gradient(x0)

    # Set the "performance metric" to the distance between x1 and xs
    # TO COMPLETE (use "problem.set_performance_metric" to specify the objective function of the SDP)
    problem.set_performance_metric( ) # complete this line
    
    # Set the "initial condition" to the distance between x0 and xs
    # TO COMPLETE (use "problem.set_initial_condition" or "problem.add_constraint" to specify the
    # constraint || x0 - xs ||^2 == 1).
    problem.set_initial_condition( ) # complete this line

    # Solve the PEP
    pepit_tau = problem.solve(verbose=verbose)
    
    # Return the worst-case convergence rate output by the SDP solver
    return pepit_tau

Once the previous code is completed, one can test it for a few values of the problem and algorithmic parameters.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import time

nb_test = 20

mu = .1
L = 1
gamma = np.linspace(0., 2., num=nb_test)
verbose = 0

pepit_taus = list()

for i in range(nb_test):
    t0 = time.process_time()
    pepit_tau = wc_gradient(L=L, mu=mu, gamma=gamma[i], verbose=verbose)
    pepit_taus.append(pepit_tau)
    t1 = time.process_time() - t0
    print(f"{i + 1} / {nb_test} done [elapsed time: {t1:.2f} s]")
    
plt.plot(gamma, pepit_taus, '-')

plt.xlabel('Step size')
plt.ylabel('||x_1-x_*||^2 / ||x_0-x_*||^2')

plt.show()
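
As a point of comparison for the curve above, the sketch below evaluates $\max\{(1-\gamma L)^2, (1-\gamma\mu)^2\}$, the contraction factor of one gradient step on the quadratic $f(x)=\tfrac12(\mu x_1^2+L x_2^2)$. This quadratic is $L$-smooth and $\mu$-strongly convex, so the value is a lower bound on what the PEP should return once `wc_gradient` is completed. The helper name `quadratic_contraction` is hypothetical and not part of PEPit.

```python
import numpy as np

def quadratic_contraction(L, mu, gamma):
    # One gradient step on f(x) = 0.5 * (mu * x_1^2 + L * x_2^2) scales the
    # two coordinates of x - x_star by (1 - gamma * mu) and (1 - gamma * L).
    # The worst unit-norm starting point picks the larger squared factor.
    gamma = np.asarray(gamma, dtype=float)
    return np.maximum((1 - gamma * L) ** 2, (1 - gamma * mu) ** 2)

# Same parameter grid as in the experiment above
gammas = np.linspace(0., 2., num=20)
reference = quadratic_contraction(L=1, mu=.1, gamma=gammas)
```

Plotting `reference` against `gammas` with a dashed line on the same axes makes any gap with the PEPit values visible.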

### Variations (performance measures)
Update the previous code to compute the worst-case ratios $\frac{\|\nabla f(x_{k+1})\|^2}{\|\nabla f(x_k)\|^2}$ and $\frac{f(x_{k+1})-f_\star}{f(x_k)-f_\star}$, and experiment with them.

### Variations (number of iterations)
Update the previous code to compute the worst-case ratio $\frac{\|x_{N}-x_\star\|^2}{\|x_0-x_\star\|^2}$ after $N$ iterations, and experiment with it.
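
A single concrete function gives a lower bound against which the $N$-step numerics can be checked. The sketch below runs gradient descent explicitly on a diagonal quadratic whose curvatures lie in $[\mu, L]$. The function name `gd_distance_ratio` and the choice of test problem are illustrative assumptions, not part of the exercise or of PEPit.

```python
import numpy as np

def gd_distance_ratio(eigenvalues, gamma, n, x0):
    # f(x) = 0.5 * sum_i eigenvalues[i] * x_i^2, minimized at x_star = 0
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    for _ in range(n):
        grad = eigenvalues * x      # gradient of the diagonal quadratic
        x = x - gamma * grad        # gradient step
    # ||x_n - x_star||^2 / ||x_0 - x_star||^2 with x_star = 0
    return float(x @ x) / float(x0 @ x0)

# Hypothetical instance: mu = 0.1, L = 1, three steps with gamma = 1
ratio = gd_distance_ratio(eigenvalues=[.1, 1.], gamma=1., n=3, x0=[1., 0.])
```

Up to solver accuracy, the value returned by the PEP for the same `L`, `mu`, `gamma` and number of iterations should be at least as large as this ratio.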

## Identifying low-dimensional counterexamples
Update the previous code to compute the worst-case ratio $\frac{\|x_{N}-x_\star\|^2}{\|x_0-x_\star\|^2}$ when $\mu=0$. What can you deduce about the convergence of gradient descent from it? Can you extract or deduce simple counterexamples from the numerics?
The following code may help illustrate what can go wrong in this type of worst-case analysis, by trying to identify low-dimensional worst-case examples.

In [33]:
from PEPit import PEP
from PEPit.functions import SmoothStronglyConvexFunction

def wc_gradient(L, mu, gamma, n, verbose=1):
    # Compute a worst-case guarantee for n steps of gradient descent in terms of the distance to
    # an optimal solution: ||x_N - x_\star||^2 / ||x_0 - x_\star||^2.
    
    # Instantiate PEP
    problem = PEP()

    # Declare an L-smooth, mu-strongly convex function (merely convex when mu = 0)
    f = problem.declare_function(SmoothStronglyConvexFunction, mu=mu, L=L)

    # Start by defining its unique optimal point xs = x_\star
    xs = f.stationary_point()

    # Then define the point x0 of the algorithm
    x0 = problem.set_initial_point()

    # Gradient descent
    x = x0
    for i in range(n):
        g = f.gradient(x)
        x = x - gamma * g

    # Set the "performance metric" to the squared distance between xN and xs
    problem.set_performance_metric((x - xs)**2)

    # Set the "initial condition" ||x0 - xs||^2 == 1
    problem.set_initial_condition((x0 - xs)**2 == 1)

    # Solve the PEP
    pepit_tau = problem.solve(verbose=verbose, dimension_reduction_heuristic="trace")
    
    # Return the worst-case value and the evaluated vectors xN - xs, x0 - xs and the last gradient
    return pepit_tau, (x-xs).eval(), (x0-xs).eval(), g.eval()
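
One way to read the vectors returned above is to look at the dimension of the space they span. The sketch below stacks such vectors and counts the singular values above a relative tolerance. It assumes each evaluated vector can be flattened to a NumPy array of the same length; the helper name `numerical_rank` and the tolerance are illustrative choices, and the data used here are synthetic stand-ins, not PEPit output.

```python
import numpy as np

def numerical_rank(vectors, tol=1e-6):
    # Stack the vectors as rows of a matrix (flattening each one first)
    M = np.vstack([np.ravel(v) for v in vectors])
    # Singular values measure how many independent directions are present
    s = np.linalg.svd(M, compute_uv=False)
    if s.size == 0 or s[0] == 0:
        return 0
    # Count singular values that are non-negligible relative to the largest
    return int(np.sum(s > tol * s[0]))

# Synthetic stand-ins for the evaluated vectors
a = np.array([1., 0., 0.])
b = np.array([.5, 0., 0.])
c = np.array([0., 2., 0.])
rank_ab = numerical_rank([a, b])
rank_abc = numerical_rank([a, b, c])
```

A rank of 1 for the returned vectors would suggest looking for a one-dimensional worst-case example.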

### Variations (no strong convexity)
Update the previous code to compute the worst-case ratio $\frac{f(x_N)-f_\star}{\|x_0-x_\star\|^2}$ when $\mu=0$. Can you deduce the apparent dependence on $N$?
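
To read off an apparent power-law dependence on $N$, one option is a least-squares fit in log-log scale. The sketch below assumes the worst-case values have already been collected for several values of $N$. The helper name `loglog_slope` is hypothetical, and the data are synthetic, chosen only to exercise the helper; they are not PEPit results.

```python
import numpy as np

def loglog_slope(Ns, values):
    # Fit log(values) ~ slope * log(Ns) + intercept; a slope close to -p
    # suggests that the values decay like 1 / N^p.
    slope, intercept = np.polyfit(np.log(Ns), np.log(values), 1)
    return slope

# Synthetic data decaying like 1 / N^2, used only to exercise the helper
Ns = np.arange(1, 11)
synthetic = 3. / Ns ** 2
slope = loglog_slope(Ns, synthetic)
```

Replacing `synthetic` with the values computed for each $N$ gives an estimate of the exponent to compare with a plot in log-log scale.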