<img src="../imgs/logo.png" width="20%" align="right" style="margin:0px 20px">


# Evolutionary Computation

## 5.3 Deep Neuroevolution

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" align="left" src="https://i.creativecommons.org/l/by-sa/4.0/80x15.png" /></a>&nbsp;| Dennis G. Wilson | <a href="https://d9w.github.io/evolution/">https://d9w.github.io/evolution/</a>

# Deep Neuroevolution

Artificial neural networks are used today in many applications, from phone apps to automatic piloting systems to search engines. These machine learning models contain many parameters and are usually optimized with stochastic gradient descent. However, evolution strategies can also be a great tool for optimizing neural network parameters, especially when no clear gradient tells the training which direction to take. This is often the case in reinforcement learning (RL), so we'll look at a classic RL task in this section.

Because of the success of deep learning, where neural network architectures are "deep" by having many layers, this field is sometimes called deep neuroevolution. However, remember from tutorial 4 that researchers were evolving neural networks long before the advent of deep learning.

In today's notebook, I'll use some Python RL environments and interact with them from Julia through PyCall.

In [None]:
using PyCall
using Conda

In [None]:
Conda.add("gym")

I'll also select a random seed. This means that whenever I generate random numbers, they'll follow a defined sequence. Finally, I'll import all of the code from the CMA-ES notebook, which I've put in a separate file.

In [None]:
import Random
Random.seed!(1234);
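
To see what seeding does, here is a minimal sketch. It uses local generators rather than the global one, and the seed `42` and the names `rng_a` and `rng_b` are illustrative choices that are not part of the notebook's code. Two generators created with the same seed produce the same sequence of numbers.

```julia
import Random

# Two independent generators created with the same (arbitrary) seed
rng_a = Random.MersenneTwister(42)
rng_b = Random.MersenneTwister(42)

# Draw three uniform random numbers from each generator
a = rand(rng_a, 3)
b = rand(rng_b, 3)

# Same seed, same sequence
println(a == b)
```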

In [None]:
include("cmaes.jl");

Let's make a simple neural network. Remember the model of a neuron, which multiplies its inputs by synaptic weights and then adds a bias. We'll construct a network with two internal, or hidden, layers. These layers will be fully connected: every neuron connects to every neuron in the next layer.

<img src="../imgs/neuron_model.jpeg" width="50%">
<img src="../imgs/cnn.png" width="50%">
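
As a small worked example of this neuron model, the sketch below computes the output of one neuron and then of a two-neuron layer. All values and the names `w`, `b`, `x`, `W` and `bvec` are made up for illustration.

```julia
using LinearAlgebra

# A single neuron: weighted sum of the inputs plus a bias
w = [0.5, -1.0, 2.0]   # synaptic weights
b = 0.1                # bias
x = [1.0, 2.0, 3.0]    # inputs
y = dot(w, x) + b      # 0.5*1 - 1*2 + 2*3 + 0.1

# A fully connected layer stacks one weight row and one bias per neuron
W = [0.5 -1.0 2.0;
     1.0  0.0 0.0]
bvec = [0.1, -1.0]
layer_out = W * x .+ bvec

println(y)
println(layer_out)
```

The first row of `W` reuses the weights of the single neuron, so the first entry of `layer_out` matches `y`.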

In [None]:
struct FCLayer
    w::Array{Float64}
    b::Array{Float64}
end

struct SimpleANN
    l1::FCLayer
    l2::FCLayer
    out::FCLayer
end

We can write a construction method which just uses zeros as all weights and biases. We'll fill these with the genetic information later.

In [None]:
function SimpleANN(input::Int, N1::Int, N2::Int, output::Int)
    l1 = FCLayer(zeros(N1, input), zeros(N1))
    l2 = FCLayer(zeros(N2, N1), zeros(N2))
    out = FCLayer(zeros(output, N2), zeros(output))
    SimpleANN(l1, l2, out)
end

Finally, we'll use our network to compute: we pass an input into the first layer and record the activation of the output layer.

In [None]:
ann = SimpleANN(5, 64, 64, 4);

In [None]:
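function compute(inputs::Array{Float64}, ann::SimpleANN)
    # Each layer multiplies by its weight matrix and adds its bias vector.
    # Note: no activation function is applied between layers, so the whole
    # network is an affine map of the inputs.
    x = ann.l1.w * inputs .+ ann.l1.b
    x = ann.l2.w * x .+ ann.l2.b
    x = ann.out.w * x .+ ann.out.b
    x
end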

Since all weights and biases are zeros, if we pass in zeros we should also get out zeros.

In [None]:
compute(zeros(5), ann)
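
To check the forward pass on non-zero values, here is a stand-alone sketch with hand-picked weights. `DemoLayer` and `forward` are hypothetical names that mirror `FCLayer` and one step of `compute`; they are not part of the notebook's code.

```julia
# Hypothetical stand-in for FCLayer, so this example runs on its own
struct DemoLayer
    w::Matrix{Float64}
    b::Vector{Float64}
end

# One layer step: weights times input, plus bias
forward(l::DemoLayer, x) = l.w * x .+ l.b

# 2 inputs -> 3 hidden neurons -> 1 output
hidden = DemoLayer([1.0 0.0; 0.0 1.0; 1.0 1.0], [0.0, 0.0, 1.0])
output = DemoLayer([1.0 1.0 1.0], [0.5])

x = [2.0, 3.0]
h = forward(hidden, x)   # hidden activations
y = forward(output, h)   # network output

println(h)
println(y)
```

The weight matrix of each layer has one row per neuron and one column per input, which is why the constructor above uses `zeros(N1, input)`.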

Now that we have an ANN, let's test it. We'll evaluate individuals in the CartPole environment, where they must balance a pole on a cart to keep it upright. The actions our agent can take are to move the cart either right or left.

In [None]:
gym = pyimport("gym")

In [None]:
env = gym.make("CartPole-v1")
n_in = 4
n_out = 2;

We'll run an entire episode, which terminates when the pole falls past a certain angle from the vertical, when the cart leaves the track, or when the 500-step limit of CartPole-v1 is reached.

In [None]:
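function play_env(ann::SimpleANN; render=false)
    # Assumes the classic gym API (before gym 0.26): reset() returns only the
    # observation and step() returns (obs, reward, done, info).
    env = gym.make("CartPole-v1")
    env.seed(0)  # fixed seed: every evaluation starts from the same state
    obs = env.reset()
    total_reward = 0.0
    done = false

    while !done
        # Convert in case gym returns Float32 observations.
        # Julia indices start at 1, gym actions at 0, hence the -1.
        action = argmax(compute(Float64.(obs), ann)) - 1
        obs, reward, done, _ = env.step(action)
        if render
            env.render()
        end
        total_reward += reward
    end
    env.close()
    env = nothing
    Base.GC.gc()  # force garbage collection so the released environment is cleaned up
    total_reward
end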

With our zero network, the agent won't last very long, as it always takes the constant action 0 (push the cart to the left).

In [None]:
ann = SimpleANN(n_in, 5, 5, n_out)
play_env(ann; render=true)
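
The constant action comes from how `argmax` breaks ties. The sketch below isolates the action selection used in `play_env`; `select_action` is a hypothetical helper name introduced only for this example.

```julia
# Map network outputs to a gym action: Julia indices start at 1,
# CartPole actions are 0 (left) and 1 (right)
select_action(outputs) = argmax(outputs) - 1

# The second output is larger, so the agent pushes right
println(select_action([0.2, 0.7]))

# With all-zero outputs, argmax returns the first index, so the action is always 0
println(select_action([0.0, 0.0]))
```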

Let's write a new constructor for our network which takes in genes and sets all of the network parameters. We'll then optimize these genes.

In [None]:
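function SimpleANN(genes::Array{Float64})
    ann = SimpleANN(n_in, 5, 5, n_out)
    # Parameter arrays in the order in which the genes fill them
    layers = [ann.l1.w, ann.l1.b, ann.l2.w, ann.l2.b, ann.out.w, ann.out.b]
    L = 1  # index of the array currently being filled
    j = 1  # linear (column-major) position inside that array
    for i in eachindex(genes)
        if j > length(layers[L])
            # Current array is full: move on to the next one
            L += 1
            j = 1
        end
        layers[L][j] = genes[i]
        j += 1
    end
    ann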
end
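
The same unpacking idea can be sketched on a tiny example. `unpack!` is a hypothetical helper written for this illustration, not the notebook's constructor; it copies consecutive slices of a flat gene vector into each parameter array, which fills matrices in column-major order just like the linear indexing above.

```julia
# Copy consecutive chunks of `genes` into each array, in order
function unpack!(arrays, genes)
    pos = 1
    for a in arrays
        n = length(a)
        copyto!(a, 1, genes, pos, n)  # fills `a` in column-major order
        pos += n
    end
    arrays
end

w = zeros(2, 3)   # weights of a layer with 3 inputs and 2 neurons
b = zeros(2)      # biases of that layer
unpack!([w, b], collect(1.0:8.0))

println(w)
println(b)
```

The first six genes fill `w` column by column and the last two fill `b`.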

The objective function then simply creates an ANN from the genes and evaluates its performance on one episode of the CartPole benchmark. Because our CMA-ES minimizes, we'll return the negative of the total reward.

In [None]:
function objective(genes::Array{Float64})
    ann = SimpleANN(genes)
    -play_env(ann)
end

Let's see how many genes we have now:

In [None]:
N = n_in*5 + 5 + 5*5 + 5 + 5*n_out + n_out
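
The same count can be derived from the layer sizes alone. In this sketch, `layer_sizes` and `n_params` are illustrative names, and the sizes are those used above (4 inputs, two hidden layers of 5 neurons, 2 outputs).

```julia
# Layer sizes: inputs, hidden 1, hidden 2, outputs
layer_sizes = [4, 5, 5, 2]

# Each layer has (inputs * neurons) weights plus one bias per neuron
n_params = sum(layer_sizes[i] * layer_sizes[i + 1] + layer_sizes[i + 1]
               for i in 1:length(layer_sizes) - 1)

println(n_params)
```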

Now we can try a random individual; maybe it will do better!

In [None]:
ann = SimpleANN(randn(N))
play_env(ann; render=true)

Let's use the CMAES function we defined in the last notebook and optimize for just a few steps.

In [None]:
c = CMAES(N=N, µ=10, λ=30, τ=sqrt(N), τ_c=N^2, τ_σ=sqrt(N))
for i in 1:5
    step!(c, objective)
    println(i, " ", maximum(.-c.F_λ))
end

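You may notice that the best fitness sometimes goes down from one generation to the next. Remember that CMA-ES is not elitist: the best individual of a generation is not guaranteed to survive. We should keep an external archive of the best result found so far.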

In [None]:
best = nothing
best_fit = -Inf
c = CMAES(N=N, µ=10, λ=30, τ=sqrt(N), τ_c=N^2, τ_σ=sqrt(N))
for i in 1:20
    step!(c, objective)
    bestind = argmin(c.F_λ)
    maxfit = -c.F_λ[bestind]
    println(i, " ", maxfit)
    if maxfit > best_fit
        best = copy(c.offspring[bestind])
        best_fit = maxfit
    end
    if best_fit >= 500  # CartPole-v1 episodes are capped at 500 steps
        break
    end
end
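
The archive logic can be tried without an environment or optimizer. The sketch below uses made-up per-generation costs (negated rewards, as a minimizer would see them); `track_best` and `generations` are hypothetical names for this illustration.

```julia
# Keep the best fitness seen across generations of a non-elitist search
function track_best(generations)
    best_fit = -Inf
    best_gen = 0
    for (g, costs) in enumerate(generations)
        i = argmin(costs)   # best individual of this generation
        fit = -costs[i]     # undo the negation to get the reward
        if fit > best_fit
            best_fit = fit
            best_gen = g
        end
    end
    best_fit, best_gen
end

# Made-up costs: the best of generation 3 is worse than the best of generation 2
generations = [[-20.0, -35.0], [-90.0, -60.0], [-40.0, -75.0]]
best_fit, best_gen = track_best(generations)

println(best_fit, " found in generation ", best_gen)
```

Without the archive, only the last generation's best individual would be kept, even though an earlier generation found a better one.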

Finally, we can see how the CMA-ES optimized individual does on this benchmark.

In [None]:
ann = SimpleANN(best)
play_env(ann; render=true)

<div class="alert alert-success">
    <b>Exercise</b>
    <br/>
    We were sort of cheating before. This neural network only learned how to do well on one episode, the one which comes from seeding the environment with 0. Test it on an environment with a different seed. Does it still do well? Finally, re-run the evolution, but don't use a fixed seed, or change it every time. What is the impact of a stochastic fitness on evolution?
</div>