## Implementing a simple gradient descent algorithm in ``Julia``

Given a differentiable convex function $f$, our goal is to solve $\textrm{minimize}\;f(x)$, where $x\in \mathbf{R}^n$ is the decision variable. To solve this problem, we consider the gradient descent algorithm, which implements the following iteration scheme:

$$
x_{n+1}  =  x_{n}-\gamma_{n}{\nabla f(x_{n})},\qquad (1)
$$

where ${\nabla f(x_{n})}$ denotes the gradient of $f$ evaluated at the iterate $x_{n}$, and $n$ is our iteration counter (not to be confused with the dimension of $x$). As our step size rule, we pick a sequence that is square-summable but not summable; for example, $\gamma_{n}=1/n$ will do the job.
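
Before building the full solver, here is a minimal sketch of iteration (1) with $\gamma_n = 1/n$ in plain Julia. The quadratic test problem and the names `Q`, `c`, `grad_f`, and `gd_sketch` are illustrative assumptions and do not appear elsewhere in this tutorial.

```julia
using LinearAlgebra

# Hypothetical test problem (not from the text): f(x) = 0.5*x'Qx - c'x,
# whose gradient is Qx - c and whose minimizer is Q \ c = [1.0, 2.0].
Q = [1.0 0.0; 0.0 0.5]
c = [1.0, 1.0]
grad_f(x) = Q * x - c

# Run iteration (1) with the step size γ_n = 1/n for a fixed number of steps.
function gd_sketch(grad, x1, num_iters)
    x = copy(x1)
    for n in 1:num_iters
        γ_n = 1 / n
        x = x - γ_n * grad(x)
    end
    return x
end

x_approx = gd_sketch(grad_f, zeros(2), 1000)
println("approximate minimizer: ", x_approx)
println("distance to [1, 2]: ", norm(x_approx - [1.0, 2.0]))
```

On this toy problem the iterates approach the minimizer, but only slowly in the second coordinate: the diminishing step size is a safe choice, not a fast one.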

We will go through the following steps:
1. Load the packages
2. Create the types
3. Write the functions


### 1. Load the packages
Let us load the necessary packages that we are going to use.

In [1]:
## Load the packages to be used
# -----------------------------
using ProximalOperators, LinearAlgebra

### 2. Create the types

Next, we define a few Julia types that we need in order to write an optimization solver in `Julia`.

#### 2.1. ``GD_problem``

This type contains information about the problem instance: the function $f$ we want to minimize, an initial point $x_0$, and the initial step size $\gamma_0$.

In [2]:
struct GD_problem 
    
    # problem structure, contains information regarding the problem
    
    f # the objective function
    x0 # the initial condition
    γ # the stepsize
    
end

**Usage of ``GD_problem``.** For example, a user may wish to solve a simple least-squares problem using gradient descent, and can do so by creating a problem instance as follows. A list of functions that can be used in this way can be found in the documentation of ``ProximalOperators``: [https://kul-forbes.github.io/ProximalOperators.jl/latest](https://kul-forbes.github.io/ProximalOperators.jl/latest).

In [3]:
# create a problem instance
# ------------------------

A = randn(6,5)

b = randn(6)

m, n = size(A)

# randomized initial point:

x0 = randn(n)

f = LeastSquares(A, b)

γ = 1.0

# create GD_problem

problem = GD_problem(f, x0, γ)

GD_problem(description : Least squares penalty
domain      : n/a
expression  : n/a
parameters  : n/a, [1.8897906311257238, -0.08306021415622838, 1.5446871907852175, -0.3901442030510987, -0.13333660370434405], 1.0)
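
To make the objective concrete, the sketch below writes a least-squares function and its gradient by hand and checks the gradient against finite differences. It assumes that ``LeastSquares(A, b)`` with its default scaling represents $f(x)=\tfrac{1}{2}\|Ax-b\|^2$, which is worth confirming in the ``ProximalOperators`` documentation; the data and the names `A_demo`, `b_demo`, `ls_value`, `ls_gradient`, and `fd_gradient` are hypothetical.

```julia
using LinearAlgebra

# Small fixed data so the sketch is reproducible (values are illustrative only).
A_demo = [1.0 2.0; 3.0 4.0; 5.0 6.0]
b_demo = [1.0, 0.0, 1.0]

# Assumed form of the objective: f(x) = 0.5 * ‖Ax - b‖², with gradient Aᵀ(Ax - b).
ls_value(x) = 0.5 * norm(A_demo * x - b_demo)^2
ls_gradient(x) = A_demo' * (A_demo * x - b_demo)

# Approximate the gradient with central finite differences.
function fd_gradient(f, x; h = 1e-6)
    g = similar(x)
    for i in eachindex(x)
        e = zeros(length(x))
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2h)
    end
    return g
end

x_demo = [0.5, -0.25]
gradient_error = norm(ls_gradient(x_demo) - fd_gradient(ls_value, x_demo))
println("analytic vs finite-difference gradient error: ", gradient_error)
```

A small error here suggests the analytic gradient $A^\top(Ax-b)$ matches the objective, which is the quantity the solver drives toward zero.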

#### 2.2. ``GD_setting``

This type contains the parameters required to run our algorithm:

* the initial step size $\gamma$,
* the maximum number of iterations $\textrm{maxit}$,
* the tolerance $\textrm{tol}$ (i.e., if $\| \nabla{f(x)} \| \leq \textrm{tol}$, we take that $x$ to be an optimal solution and terminate our algorithm; the solver below measures this in the infinity norm),
* whether to print information about the iterates, controlled by the boolean variable $\textrm{verbose}$, and
* how frequently to print such information, controlled by the variable $\textrm{freq}$.

The user may specify values for these parameters. If none are specified, a default set of values is used. We achieve this by writing a simple keyword constructor for ``GD_setting``.

In [4]:
struct GD_setting
    
    # user settings to solve the problem using Gradient Descent
    
    γ # the step size
    maxit # maximum number of iterations
    tol # tolerance, i.e., if ||∇f(x)|| ≤ tol, we take x to be an optimal solution
    verbose # whether to print information about the iterates
    freq # how often to print information about the iterates

    # constructor for the structure, so if user does not specify any particular values, 
    # then we create a GD_setting object with default values
    function GD_setting(; γ = 1, maxit = 1000, tol = 1e-8, verbose = false, freq = 10)
        new(γ, maxit, tol, verbose, freq)
    end
    
end

**Usage of ``GD_setting``.** For the previously described least squares problem, we create the following setting instance.

In [5]:
setting = GD_setting(verbose = true, tol = 1e-2, maxit = 1000, freq = 100)

GD_setting(1, 1000, 0.01, true, 100)
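
As an alternative sketch, `Base.@kwdef` can generate a keyword constructor with defaults, which avoids writing the inner constructor by hand. The type name `SettingSketch` and the concrete field types are assumptions made for illustration; they are not part of the code above.

```julia
# Hypothetical typed counterpart of GD_setting; Base.@kwdef generates the
# keyword constructor with defaults, so no inner constructor is needed.
Base.@kwdef struct SettingSketch
    γ::Float64 = 1.0       # initial step size
    maxit::Int = 1000      # maximum number of iterations
    tol::Float64 = 1e-8    # stopping tolerance on the gradient norm
    verbose::Bool = false  # whether to print progress
    freq::Int = 10         # print every `freq` iterations
end

default_setting = SettingSketch()
custom_setting = SettingSketch(verbose = true, tol = 1e-2, freq = 100)
println(default_setting)
println(custom_setting)
```

Any field not passed as a keyword keeps its default, mirroring the behaviour of the hand-written constructor.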

#### 2.3. ``GD_state``
Now we define the type named ``GD_state`` that describes the state of our algorithm at iteration number $n$. The state consists of

* the current iterate $x_n$,
* the gradient of $f$ at the current iterate: ${\nabla{f}(x_n)}$,
* the stepsize at iteration $n$: $\gamma_n$, and
* iteration number: $n$.

In [6]:
mutable struct GD_state # contains information regarding one iteration of the sequence
    
    x::Any # iterate x_n
    ∇f_x::Any # the gradient ∇f(x_n)
    γ::Any # stepsize
    n::Any # iteration counter
    
end

Once the user has supplied the problem information by creating a ``GD_problem`` instance (as we did earlier for the least-squares problem), we need a way to construct the initial value of type `GD_state`. We create the initial state from the problem instance by writing a constructor function.

In [7]:
function GD_state(problem::GD_problem) 
    # a constructor for the struct GD_state: it takes the problem data and creates a state containing
    # the initial iterate, its gradient, the step size, and the iteration counter, so that we can start our gradient descent scheme
    
    # unpack information from problem, which is of type GD_problem
    x0 = problem.x0
    f = problem.f
    γ = problem.γ
    ∇f_x, f_x = gradient(f, x0)
    n = 1
    
    return GD_state(x0, ∇f_x, γ, n)
    
end

GD_state
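
Fields annotated with `::Any` accept any value but give the compiler little to work with. The following sketch shows a parametric variant with concrete field types; the name `StateSketch` and the choice of `Vector{T}` are assumptions for illustration, not a replacement for the ``GD_state`` used in the rest of this tutorial.

```julia
# Hypothetical parametric variant of GD_state: concrete field types let the
# compiler specialize code that reads state.x, state.γ, and so on.
mutable struct StateSketch{T<:Real}
    x::Vector{T}     # iterate x_n
    ∇f_x::Vector{T}  # gradient at x_n
    γ::T             # step size γ_n
    n::Int           # iteration counter
end

state_sketch = StateSketch([1.0, 2.0], [0.1, -0.2], 1.0, 1)

# Fields of a mutable struct can be reassigned in place.
state_sketch.γ = 1 / (state_sketch.n + 1)
state_sketch.n += 1
println(typeof(state_sketch), " with γ = ", state_sketch.γ, ", n = ", state_sketch.n)
```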

### 3. Write the functions 

Now that we are done defining the types, we can now focus on writing the functions that will implement our gradient descent scheme. 

#### 3.1. ``GD_iteration!``
First, we need a function that will take the problem information and the state of our algorithm at iteration number $n$, and then compute the next state for iteration number $n+1$ according to (1). 

In [8]:
function GD_iteration!(problem::GD_problem, state::GD_state)
    
    # this is the main iteration function: it takes the problem information and the previous state,
    # and creates the new state using the gradient descent algorithm
    
    # unpack the current state information
    x_n = state.x
    ∇f_x_n = state.∇f_x
    γ_n = state.γ
    n = state.n
    
    # compute the next state
    x_n_plus_1 = x_n - γ_n*∇f_x_n
    
    # now load the computed values in the state
    state.x = x_n_plus_1
    state.∇f_x, f_x = gradient(problem.f, x_n_plus_1) # note that f_x is not used anywhere
    state.γ = 1/(n+1)
    state.n = n+1
    
    # done computing, return the new state
    return state
    
end

GD_iteration! (generic function with 1 method)

#### 3.2. ``GD_solver``
Now we are in a position to write the main solver function named ``GD_solver`` that will be used by the end user. Internally, this function will take the problem information and the problem setting, and then it will

* create the initial state,
* keep updating the state using ``GD_iteration!`` function until we reach the termination criterion or the maximum number of iterations,
* print state of the algorithm if ``verbose`` is ``true`` at the specified frequency, and 
* return the final state.

In [9]:
## The solver function

function GD_solver(problem::GD_problem, setting::GD_setting)
    
    # this is the function that the end user will call to solve a particular problem; internally it uses the previously defined types and functions to run the gradient descent scheme
    # create the initial state
    state = GD_state(problem)
    
    ## time to run the loop
    while (state.n < setting.maxit) && (norm(state.∇f_x, Inf) > setting.tol)
        # compute a new state
        state =  GD_iteration!(problem, state)
        # print information if verbose = true
        if setting.verbose == true
            if mod(state.n, setting.freq) == 0
                @info "iteration = $(state.n) |  
                obj val = $(problem.f(state.x)) | 
                gradient norm = $(norm(state.∇f_x, Inf))"
            end
        end
    end
    
    # print information regarding the final state
    
    @info "final iteration = $(state.n) | 
    final obj val = $(problem.f(state.x)) | 
    final gradient norm = $(norm(state.∇f_x, Inf))"
    return state
    
end

GD_solver (generic function with 1 method)
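
The stopping logic can also be seen in isolation. The sketch below mirrors the loop structure (iteration budget plus an infinity-norm test on the gradient) without ``ProximalOperators``; `solver_sketch`, `target`, and the toy objective are hypothetical names and data chosen for illustration.

```julia
using LinearAlgebra

# Sketch of the solver loop: stop when the iteration budget is used up or the
# infinity norm of the gradient falls below tol. `grad` is any function that
# returns ∇f(x); the names here are hypothetical, not part of the code above.
function solver_sketch(grad, x1; maxit = 1000, tol = 1e-2)
    x = copy(x1)
    g = grad(x)
    n = 1
    while n < maxit && norm(g, Inf) > tol
        x = x - (1 / n) * g   # step (1) with γ_n = 1/n
        g = grad(x)
        n += 1
    end
    return x, g, n
end

# Illustrative problem: f(x) = 0.5*‖x - target‖², so ∇f(x) = x - target.
target = [1.0, -2.0, 3.0]
x_final, g_final, n_final = solver_sketch(x -> x - target, zeros(3))
println("stopped at n = ", n_final, " with gradient norm ", norm(g_final, Inf))
```

The short-circuiting `&&` skips the norm computation once the iteration budget is exhausted; with scalar Boolean operands it gives the same result as `&`.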

**Usage of ``GD_solver``.** For the previously created ``problem`` and ``setting``, we run our ``GD_solver`` function as follows.


In [10]:
# Run the solver on the previously created problem and setting

In [11]:
final_state_GD = GD_solver(problem, setting)

┌ Info: final iteration = 18 | 
│     final obj val = 1.8840226555101893 | 
│     final gradient norm = 0.0017729146193610212
└ @ Main In[9]:25


GD_state([-0.40875893212646097, 0.5866586203422339, 0.18948627216649985, 0.27917741589861966, -0.48461136196842236], [-0.0016798892487610573, 0.0011736858771698166, -0.001489579860942003, -0.0017729146193610212, 0.0005631929687850423], 0.05555555555555555, 18)

In [12]:
println("objective value found by our gradient descent $(f(final_state_GD.x))")

println("real objective value $(f(pinv(A)*b)) ")

objective value found by our gradient descent 1.8840226555101893
real objective value 1.8840206671818387 


So gradient descent does a decent job: the objective value it finds matches the optimal value to within about $2\times 10^{-6}$.
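
The reference solution can be cross-checked in more than one way. The sketch below, on small fixed data chosen for illustration (`A_ref` and `b_ref` are not the random `A` and `b` used above), compares the pseudoinverse solution with Julia's backslash solve and confirms that the least-squares gradient vanishes there.

```julia
using LinearAlgebra

# Fixed illustrative data with full column rank (not the random A, b from above).
A_ref = [1.0 0.0; 1.0 1.0; 1.0 2.0; 1.0 3.0]
b_ref = [1.0, 2.0, 2.0, 4.0]

x_pinv = pinv(A_ref) * b_ref   # pseudoinverse solution, as in the text
x_qr = A_ref \ b_ref           # least-squares solve for a rectangular matrix

# At a least-squares minimizer the gradient Aᵀ(Ax - b) vanishes.
residual_gradient = A_ref' * (A_ref * x_qr - b_ref)

println("difference between the two solutions: ", norm(x_pinv - x_qr))
println("gradient norm at the solution: ", norm(residual_gradient, Inf))
```

For a matrix with full column rank both approaches return the same minimizer up to rounding; the backslash solve avoids forming the pseudoinverse explicitly.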