# Stochastic Methods

Stochastic methods use randomness to explore a design space, which helps the search jump out of local minima and improves its chances of reaching a global minimum.


## Noisy Descent 

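$x^{k+1} = x^k + \alpha g^{k} + \epsilon^{k}$

This augments gradient descent with additive zero-mean Gaussian noise $\epsilon^{k}$, where $g^{k}$ is the descent direction (the negative gradient in plain gradient descent) and $\alpha$ is the step size.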
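The update above can be sketched in a few lines of Julia. This is a minimal illustration rather than a reference implementation: the function name `noisy_descent`, the default step size, and the noise schedule $\sigma^{k} = 1/k$ are assumptions made for the example.

```julia
using Random

# Noisy gradient descent sketch.
# Assumptions: ∇f returns the gradient, α is a fixed step size, and the
# noise standard deviation shrinks as σ(k) = 1 / k.
function noisy_descent(∇f, x, k_max; α=0.1, σ=(k -> 1 / k))
    for k in 1:k_max
        g = -∇f(x)                    # descent direction (negative gradient)
        ϵ = σ(k) * randn(length(x))   # zero-mean Gaussian noise
        x = x + α * g + ϵ
    end
    return x
end

Random.seed!(1)
# Toy objective f(x) = x₁² + x₂², whose gradient is 2x
x_final = noisy_descent(x -> 2x, [3.0, -2.0], 200)
println("x = $x_final")
```

Because the noise shrinks over time, the early iterations explore widely while the late iterations behave almost like plain gradient descent.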

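## Mesh Adaptive Direct Search

Like noisy descent, mesh adaptive direct search injects randomness into the search, but instead of perturbing a descent step it polls along a randomly generated set of positive spanning directions. The directions are built from a matrix whose diagonal entries are $\pm 1 / \sqrt{\alpha^{k}}$, where $\alpha^{k}$ is the current step size.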
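One way such a set of directions might be generated is sketched below. The function name `rand_positive_spanning_set` and the exact construction (a random triangular integer matrix, shuffled, plus one extra direction that cancels the sum of the others) are assumptions for illustration, and the sketch assumes $\alpha \le 1$ so that $1/\sqrt{\alpha}$ rounds to a positive integer.

```julia
using LinearAlgebra: Diagonal
using Random: randperm, seed!

# Sketch: build n + 1 random directions that positively span R^n.
# Assumption: α ≤ 1, so δ = round(1 / sqrt(α)) is at least 1.
function rand_positive_spanning_set(α, n)
    δ = round(Int, 1 / sqrt(α))
    # Diagonal entries are ±δ, chosen at random
    L = Matrix(Diagonal(δ * rand([1, -1], n)))
    # Off-diagonal entries of the triangle are integers in -δ+1 : δ-1
    for i in 1:n-1, j in i+1:n
        L[i, j] = rand(-δ+1:δ-1)
    end
    # Shuffle rows and columns, then append the negated sum of the columns
    D = L[randperm(n), :]
    D = D[:, randperm(n)]
    D = hcat(D, -sum(D, dims=2))
    return [D[:, i] for i in 1:n+1]
end

seed!(0)
directions = rand_positive_spanning_set(0.25, 2)
println(directions)
```

The final direction is the negated sum of the others, so the directions always sum to zero. Together with the linear independence of the first $n$ columns, this is what makes the set positively spanning.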

## Simulated Annealing

Inspired by annealing in metallurgy, where a material is heated and then slowly cooled, simulated annealing uses a temperature term to control the amount of stochasticity in the randomized search. Common schedules for lowering the temperature are:

1. Logarithmic Annealing (guaranteed to converge asymptotically, but slow)

$t^{k} = t^{1} \ln(2) / \ln(k + 1)$

2. Exponential Annealing (faster)

$t^{k+1} = \gamma t^{k}$ where $\gamma \in (0, 1)$

3. Fast Annealing 

$t^{k} = t^{1} / k$
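The three schedules can be written as one-line Julia functions, which makes it easy to compare how quickly each one cools. The function names are illustrative, and the exponential schedule is written in the closed form $t^{k} = t^{1} \gamma^{k-1}$ that follows from the recurrence above.

```julia
# t1 is the initial temperature, k ≥ 1 the iteration, γ ∈ (0, 1) the decay factor
logarithmic_annealing(t1, k) = t1 * log(2) / log(k + 1)
exponential_annealing(t1, k, γ) = t1 * γ^(k - 1)   # closed form of t^{k+1} = γ t^k
fast_annealing(t1, k) = t1 / k

# Print the temperature under each schedule at a few iterations
for k in (1, 10, 100)
    println("k = $k: log = ", logarithmic_annealing(10.0, k),
            ", exp = ", exponential_annealing(10.0, k, 0.9),
            ", fast = ", fast_annealing(10.0, k))
end
```

All three start at $t^{1}$ when $k = 1$ and differ only in how fast the temperature decays afterwards.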

In [10]:
function simulated_annealing(f, x, T, t, k_max)
    y = f(x)
    x_best, y_best = x, y
    for k in 1:k_max
        # Propose a new point by sampling a step from the transition distribution T
        x_new = x + rand(T)

        # Evaluate new point
        y_new = f(x_new)

        # Calculate the difference between the new and old point
        delta_y = y_new - y

        # Accept if the new point is better; otherwise accept with probability exp(-delta_y / t(k))
        if delta_y <= 0 || rand() < exp(-delta_y / t(k))
            x, y = x_new, y_new
        end

        # If the new point is the best so far, update the best point
        if y_new < y_best
            x_best, y_best = x_new, y_new
        end
    end

    return x_best, y_best
end

simulated_annealing (generic function with 1 method)
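The acceptance rule in the loop above can be isolated to see how the temperature controls the search. The helper name `accept_prob` is an assumption for this illustration: it returns the probability of accepting a move that changes the objective by `Δy` at temperature `t`.

```julia
# Probability of accepting a move with objective change Δy at temperature t:
# improvements are always accepted, worse moves with probability exp(-Δy / t)
accept_prob(Δy, t) = Δy <= 0 ? 1.0 : exp(-Δy / t)

# The same uphill move (Δy = 1) becomes less likely to be accepted as t falls
for t in (10.0, 1.0, 0.1)
    println("t = $t: p = ", accept_prob(1.0, t))
end
```

At high temperature almost every move is accepted, so the search can climb out of local minima. As the temperature falls, the rule approaches a greedy search that accepts only improvements.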

In [11]:
ackey = (x) -> -20 * exp(-0.2 * sqrt(0.5 * (x[1]^2 + x[2]^2))) - exp(0.5 * (cos(2 * π * x[1]) + cos(2 * π * x[2]))) + 20 + exp(1)
x0 = [15., 15.]
T = 2 # rand(T) then returns a 2-element vector of uniform samples in [0, 1)
t(k) = 1 / k

x, y = simulated_annealing(ackey, x0, T, t, 100)
println("x = $x, y = $y")

x = [15.0, 15.0], y = 19.00425863264272


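As one can see, the search never improves on the starting point, and $y \approx 19$ is far from the Ackley function's global minimum of 0 at the origin. Part of the reason is that `T = 2` makes `rand(T)` return two uniform samples in $[0, 1)$, so every proposal steps away from the origin. Making the annealing more adaptive and using a zero-mean proposal with a larger spread would help the algorithm capture the global minimum, but let's see how another method fares below.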
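As a rough sketch of that suggestion, the variant below takes a proposal function instead of a distribution object and draws zero-mean Gaussian steps. The names `ackley2d`, `anneal_with_proposal`, `propose`, and `t_fast`, together with the spread `σ = 3.0`, the initial temperature of 10, and the 1000 iterations, are all assumptions chosen for illustration and are not tuned.

```julia
using Random

# Ackley function in two dimensions (same formula as the cell above)
ackley2d(x) = -20 * exp(-0.2 * sqrt(0.5 * (x[1]^2 + x[2]^2))) -
              exp(0.5 * (cos(2π * x[1]) + cos(2π * x[2]))) + 20 + exp(1)

# Simulated annealing that draws each step from a user-supplied proposal function
function anneal_with_proposal(f, x, propose, t, k_max)
    y = f(x)
    x_best, y_best = x, y
    for k in 1:k_max
        x_new = x + propose()
        y_new = f(x_new)
        Δy = y_new - y
        # Accept improvements always, worse points with probability exp(-Δy / t(k))
        if Δy <= 0 || rand() < exp(-Δy / t(k))
            x, y = x_new, y_new
        end
        if y_new < y_best
            x_best, y_best = x_new, y_new
        end
    end
    return x_best, y_best
end

Random.seed!(42)
σ = 3.0                      # assumed proposal spread
propose() = σ * randn(2)     # zero-mean Gaussian step, can move in any direction
t_fast(k) = 10.0 / k         # fast annealing with an assumed initial temperature of 10

x_best, y_best = anneal_with_proposal(ackley2d, [15.0, 15.0], propose, t_fast, 1000)
println("x = $x_best, y = $y_best")
```

Because the steps are symmetric around zero, proposals can move toward the origin as well as away from it, which the uniform $[0, 1)$ steps could not do.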

## Cross-Entropy Method

Closely related to minimizing KL divergence (cross-entropy): we maintain a proposal distribution, sample from it, and refit it to the best (elite) samples by maximum likelihood, so that the proposal gradually concentrates on regions where the objective is low.

In [12]:
using Distributions

function cross_entropy_method(f, P, k_max; m=100, m_elite=10)
    for k in 1:k_max
        # Draw m samples from P (an n×m matrix, where n is the dimension of the input space)
        samples = rand(P, m)
        # Evaluate the samples and sort them
        order = sortperm([f(samples[:, i]) for i in 1:m])
        # Update the distribution based on the elite samples
        P = fit(typeof(P), samples[:, order[1:m_elite]])
    end
    return P
end

cross_entropy_method (generic function with 1 method)
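For a Gaussian proposal, the refit step inside the loop amounts to taking the sample mean and covariance of the elite samples. The sketch below walks through a single iteration by hand using only the standard library. The toy objective `f`, the starting mean `[3.0, -1.0]`, and the variable names are assumptions for illustration.

```julia
using Random, Statistics

Random.seed!(0)
f(x) = sum(abs2, x)                        # hypothetical toy objective with minimum at the origin
m, m_elite = 100, 10

# Draw m samples from N([3, -1], I); each column is one sample
samples = randn(2, m) .+ [3.0, -1.0]

# Rank the samples by objective value and keep the m_elite best
order = sortperm([f(samples[:, i]) for i in 1:m])
elite = samples[:, order[1:m_elite]]

# Maximum-likelihood Gaussian fit to the elite samples
μ_new = vec(mean(elite, dims=2))
Σ_new = cov(elite, dims=2, corrected=false)

println("μ_new = $μ_new")
println("Σ_new = $Σ_new")
```

Repeating this step moves the mean toward low-objective regions and shrinks the covariance, which is how the proposal homes in on a minimum.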

In [13]:
import Random: seed!
import LinearAlgebra: norm

seed!(42)

μ = [0.5, 1.5]
Σ = [1.0 0.2; 0.2 2.0]
P = MvNormal(μ, Σ)
k_max = 10

P = cross_entropy_method(ackey, P, k_max)
@show P.μ

P.μ = [-7.818623595598141e-8, 2.253462157183674e-7]


2-element Vector{Float64}:
 -7.818623595598141e-8
  2.253462157183674e-7

It turns out this method works very well: fitting a proposal distribution to the elite samples drives its mean to the minimum of the objective function at the origin. Below we can see how this can be done more efficiently using gradient descent with the gradient of the log-likelihood.

## Natural Evolution Strategies

Similar to the cross-entropy method, a natural evolution strategy minimizes a function by optimizing a proposal distribution parameterized by $\theta$. Instead of fitting the distribution to elite samples, we minimize the expected value of the function under the proposal, $\mathbb{E}_{x \sim p(\cdot \mid \theta)}[f(x)]$, by gradient descent on $\theta$.
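The gradient needed for this descent can be estimated from samples using the log-derivative identity. The derivation below is a sketch that assumes $p(x \mid \theta)$ is differentiable in $\theta$ and that differentiation and integration can be interchanged. The samples $x^{i}$ and the sample count $m$ are notation introduced here.

$\nabla_\theta \mathbb{E}_{x \sim p(\cdot \mid \theta)}[f(x)] = \int f(x) \nabla_\theta p(x \mid \theta) \, dx = \int f(x) p(x \mid \theta) \nabla_\theta \log p(x \mid \theta) \, dx$

$= \mathbb{E}_{x \sim p(\cdot \mid \theta)}[f(x) \nabla_\theta \log p(x \mid \theta)] \approx \frac{1}{m} \sum_{i=1}^{m} f(x^{i}) \nabla_\theta \log p(x^{i} \mid \theta)$

The second equality uses $\nabla_\theta p = p \nabla_\theta \log p$, and the final step replaces the expectation with an average over $m$ samples $x^{i}$ drawn from $p(\cdot \mid \theta)$.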

In [None]:
# Assumption: the proposal is an isotropic Gaussian N(θ, σ²I) with fixed σ,
# so θ is the mean and ∇_θ log p(x | θ) = (x - θ) / σ²
∇logp(x, θ; σ=1.0) = (x - θ) / σ^2

function natural_evolution_strategy(f, θ, k_max; m=100, α=0.01, σ=1.0)
    for k in 1:k_max
        # Sample m points from the proposal N(θ, σ²I)
        samples = [θ + σ * randn(length(θ)) for i in 1:m]
        # Evaluate the samples and the gradient of the log-likelihood at each one
        fx = [f(x) for x in samples]
        logp = [∇logp(x, θ; σ=σ) for x in samples]
        # Average f(x) ∇logp(x, θ) to estimate the gradient of E[f(x)], then step with learning rate α
        θ -= α * sum(fx .* logp) / m
    end
    return θ
end

In [None]:
θ = natural_evolution_strategy(ackey, μ, 100)
@show θ