# gradless

This is an implementation of gradient descent designed to work without access to the exact gradient. It uses Spall's simultaneous perturbation stochastic approximation (SPSA) to approximate the missing gradient.

SPSA is particularly useful for optimization problems where the objective function itself is noisy, so that the exact gradient cannot be evaluated, for example when the model at hand is evaluated by simulation rather than by exact computation. This contrasts with more typical applications of stochastic gradient descent, where the gradient can be computed but noise is introduced through subsampling of the data (e.g. minibatching) or by Monte Carlo integration (e.g. in variational inference).

My principal aim in writing this library is to have a structured, easy-to-modify framework for use in a research problem with a noisy objective function. This is still in development (so if you must use it, use with caution and skepticism).

## Background

A good overview of SPSA can be found [here](https://www.jhuapl.edu/SPSA/PDF-SPSA/Spall_An_Overview.PDF), but the general idea is reasonably straightforward. Given a perturbation size $c_t$ and a vector of perturbations $\delta$, we first perturb all model parameters simultaneously, once forward and once backward:

$$\theta^+ = \theta + c_t \delta$$
$$\theta^- = \theta - c_t \delta$$

The perturbation $\delta$ is often sampled from a shifted and rescaled Bernoulli distribution:

$$b_1, b_2, \ldots, b_m \sim \mathrm{Bernoulli}(p = 0.5)$$
$$\delta_i = 2b_i - 1$$

so that each $\delta_i$ is $+1$ or $-1$ with equal probability and gives the direction in which the $i$-th model parameter is moved in the forward perturbation.
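
The sampling and perturbation steps above can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions, not the gradless API: the helper names `sample_delta` and `perturb`, the seed, and the example values of `theta` and `c_t` are all hypothetical.

```python
import random

def sample_delta(m, rng):
    """Draw m independent directions delta_i = 2*b_i - 1 with b_i ~ Bernoulli(0.5)."""
    return [2 * rng.randint(0, 1) - 1 for _ in range(m)]

def perturb(theta, delta, c_t):
    """Return the forward and backward perturbed parameter vectors."""
    theta_plus = [th + c_t * d for th, d in zip(theta, delta)]
    theta_minus = [th - c_t * d for th, d in zip(theta, delta)]
    return theta_plus, theta_minus

rng = random.Random(0)        # hypothetical seed, only for reproducibility
theta = [1.0, -2.0, 0.5]      # hypothetical current parameters
c_t = 0.1                     # hypothetical perturbation size

delta = sample_delta(len(theta), rng)
theta_plus, theta_minus = perturb(theta, delta, c_t)
print(delta, theta_plus, theta_minus)
```

Every parameter moves at once, and the two perturbed vectors always differ by exactly $2 c_t \delta$.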

We then evaluate the cost function at the two perturbed parameter vectors:

$$y^+ = F(\theta^+, X)$$
$$y^- = F(\theta^-, X)$$

The gradient is approximated by the slope of the line between the points $(\theta^+, y^+)$ and $(\theta^-, y^-)$, with the division taken elementwise over the components of $\delta$:

$$\hat{g}= \frac{y^+-y^-}{\theta^+ - \theta^-}= \frac{y^+-y^-}{2 c_t \delta}$$
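
To make the estimator concrete, here is a minimal sketch of a single SPSA gradient estimate, tried out on a toy noisy quadratic whose true gradient is $2\theta$. It is not taken from gradless: the function `spsa_gradient`, the cost `noisy_quadratic`, the noise level, and all numeric values are assumptions chosen for illustration, and the data $X$ is assumed to be baked into the cost function.

```python
import random

def spsa_gradient(F, theta, c_t, rng):
    """One SPSA gradient estimate from two evaluations of the cost F."""
    delta = [2 * rng.randint(0, 1) - 1 for _ in theta]
    theta_plus = [th + c_t * d for th, d in zip(theta, delta)]
    theta_minus = [th - c_t * d for th, d in zip(theta, delta)]
    y_plus = F(theta_plus)
    y_minus = F(theta_minus)
    # Elementwise division by 2 * c_t * delta_i
    return [(y_plus - y_minus) / (2 * c_t * d) for d in delta]

rng = random.Random(0)  # hypothetical seed

def noisy_quadratic(theta):
    """Toy cost: sum of squares plus Gaussian noise (true gradient is 2 * theta)."""
    return sum(th ** 2 for th in theta) + rng.gauss(0.0, 0.01)

theta = [1.0, -2.0, 0.5]
n_draws = 20000

# A single estimate is very noisy; averaging many draws shows it is centred
# on the true gradient for this toy problem.
avg = [0.0] * len(theta)
for _ in range(n_draws):
    g_hat = spsa_gradient(noisy_quadratic, theta, c_t=0.1, rng=rng)
    avg = [a + g / n_draws for a, g in zip(avg, g_hat)]

print("average SPSA estimate:", avg)
print("true gradient:        ", [2 * th for th in theta])
```

Any one estimate can point far from the true gradient, because the same scalar difference $y^+ - y^-$ is shared across all coordinates. In practice the estimates are used inside an iterative update rather than averaged like this.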

A major advantage of this approximation is that, in its simplest form, only two evaluations of the cost function are required, regardless of the dimensionality of the model. This is in contrast to the finite-differences approximation, which requires each model parameter to be perturbed separately, so its cost grows with the number of parameters.
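
The difference in cost can be checked by counting evaluations. The sketch below compares a central finite-differences gradient against one SPSA estimate on a 10-parameter noise-free quadratic. The wrapper `CountingObjective`, both gradient functions, and the test problem are hypothetical helpers written for this comparison, not part of gradless.

```python
import random

class CountingObjective:
    """Wrap a cost function and count how many times it is evaluated."""
    def __init__(self, F):
        self.F = F
        self.calls = 0

    def __call__(self, theta):
        self.calls += 1
        return self.F(theta)

def central_difference_gradient(F, theta, h):
    """Perturb each parameter separately: two evaluations per parameter."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += h
        minus[i] -= h
        grad.append((F(plus) - F(minus)) / (2 * h))
    return grad

def spsa_gradient(F, theta, c_t, rng):
    """Perturb all parameters at once: two evaluations in total."""
    delta = [2 * rng.randint(0, 1) - 1 for _ in theta]
    y_plus = F([th + c_t * d for th, d in zip(theta, delta)])
    y_minus = F([th - c_t * d for th, d in zip(theta, delta)])
    return [(y_plus - y_minus) / (2 * c_t * d) for d in delta]

def quadratic(theta):
    """Noise-free toy cost with gradient 2 * theta."""
    return sum(th ** 2 for th in theta)

theta = [0.1 * i for i in range(10)]  # m = 10 parameters

fd_obj = CountingObjective(quadratic)
fd_grad = central_difference_gradient(fd_obj, theta, h=1e-3)

spsa_obj = CountingObjective(quadratic)
spsa_grad = spsa_gradient(spsa_obj, theta, c_t=1e-3, rng=random.Random(0))

print(fd_obj.calls, spsa_obj.calls)  # prints: 20 2
```

The trade-off is accuracy per estimate: finite differences recovers the gradient of this noise-free quadratic almost exactly, while a single SPSA estimate is only correct on average.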

## How to use