<img src="../imgs/logo.png" width="20%" align="right" style="margin:0px 20px">


# Evolutionary Computation

## 5.3 Deep Neuroevolution

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" align="left" src="https://i.creativecommons.org/l/by-sa/4.0/80x15.png" /></a>&nbsp;| Dennis G. Wilson | <a href="https://d9w.github.io/evolution/">https://d9w.github.io/evolution/</a>

# Deep Neuroevolution

Artificial neural networks are commonly used today in many applications, from phone apps to automatic piloting systems to search engines. These machine learning models contain many parameters and are usually optimized with stochastic gradient descent. However, evolutionary strategies can also be a great tool for optimizing neural network parameters, especially when there isn't a clear direction the training of the network should take. This is the case for reinforcement learning, so we'll look at an RL task in this section: playing a level of Sonic the Hedgehog.

Because of the success of deep learning, where neural network architectures are "deep" by having many layers, this field is sometimes called deep neuroevolution. However, remember from tutorial 4 that researchers were evolving neural networks long before the advent of deep learning.

In today's notebook, I'll be using Python RL environments from Gym Retro and interacting with them from Julia through PyCall.

In [1]:
using PyCall
using Conda
using Flux
include("cmaes.jl");

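Our network is a small convolutional neural network with two convolutional layers, each holding a weight array and a bias vector. We can write a construction method which just uses zeros as all weights and biases. We'll fill these with the genetic information later.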

In [2]:
struct convol
    w ::AbstractArray{Float64}
    b ::AbstractArray{Float64}
end

struct my_CNN
    c1 :: convol
    c2 :: convol
end

function my_CNN(f1::Int,c1_in::Int,c1_out::Int,f2::Int,c2_in::Int,c2_out::Int)
    c1 = convol(zeros(f1,f1,c1_in,c1_out),zeros(c1_out))
    c2 = convol(zeros(f2,f2,c2_in,c2_out),zeros(c2_out))
    my_CNN(c1,c2)
end

my_CNN

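Finally, we'll use our network to compute. The input passes through the first convolution, a max pooling layer, the second convolution and a mean pooling layer. The result is flattened and thresholded at 0.5, so the output is a binary vector with one entry per button.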

In [3]:
function compute_cnn(ann,inputs)
    y = Flux.Conv(ann.c1.w,ann.c1.b,σ;stride=3)(inputs)
    y = MaxPool((2,2),stride=2)(y)
    y = Flux.Conv(ann.c2.w,ann.c2.b,σ;stride=3)(y)
    y = MeanPool((2,2),stride=4)(y)
    y = flatten(y)
    y = (y .> 0.5) .* 1 
end

compute_cnn (generic function with 1 method)
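To see how many outputs this stack produces, the layer sizes can be worked out by hand. The sketch below assumes an input frame of 224 x 320 pixels with 3 colour channels and no padding in any layer; both are assumptions, not values printed by the notebook. The helper out_size is a name introduced here for illustration.

```julia
# Output length along one dimension for a convolution or pooling window
# without padding: floor((n - k) / s) + 1.
out_size(n, k, s) = fld(n - k, s) + 1

# ASSUMPTION: a frame of 224 x 320 pixels with 3 colour channels.
h, w = 224, 320

h, w = out_size(h, 12, 3), out_size(w, 12, 3)   # Conv 12x12, stride 3
h, w = out_size(h, 2, 2),  out_size(w, 2, 2)    # MaxPool 2x2, stride 2
h, w = out_size(h, 8, 3),  out_size(w, 8, 3)    # Conv 8x8, stride 3
h, w = out_size(h, 2, 4),  out_size(w, 2, 4)    # MeanPool 2x2, stride 4

n_outputs = h * w * 1   # the second convolution has a single output channel
println((h, w, n_outputs))   # → (3, 4, 12)
```

Under these assumptions the flattened output has 12 entries, which would fit an action vector with one binary entry per button on a 12-button controller.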

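Since all weights and biases are zero, every pre-activation is zero and each sigmoid unit outputs exactly 0.5. The final threshold is strict (y .> 0.5), so whatever we pass in, we should get out zeros.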
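The reason the zero network outputs zeros can be checked without Flux. This is a minimal sketch: sigmoid is written out by hand here and stands in for Flux's σ.

```julia
# Logistic sigmoid, written out so the snippet needs no packages.
sigmoid(x) = 1 / (1 + exp(-x))

# With zero weights and biases, every pre-activation is 0 whatever the input.
activation = sigmoid(0.0)

# Pooling a constant map of 0.5 still gives 0.5, and the strict
# comparison used at the end of compute_cnn rejects it.
button = (activation > 0.5) * 1

println((activation, button))   # → (0.5, 0)
```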

Now that we have an ANN, let's test it. We'll evaluate individuals on the first act of Green Hill Zone in Sonic the Hedgehog, loaded through Gym Retro. At each step, the agent receives the current screen as input and chooses which controller buttons to press.

In [4]:
retro = pyimport_conda("retro","gym")

PyObject <module 'retro' from 'C:\\Users\\Kinza\\.julia\\conda\\3\\lib\\site-packages\\retro\\__init__.py'>

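We'll run an entire episode. It ends when the environment reports that it is done, when Sonic's x position reaches the screen_x_end value reported by the environment, or when the fitness has not improved for 500 consecutive steps. On top of the environment's reward, the fitness gains 1 each time Sonic reaches a new furthest x position and 100000 for reaching the end.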

In [11]:
function play_env(ann; render=false)
    env = retro.make("SonicTheHedgehog-Genesis","GreenHillZone.Act1")
    ob = env.reset()
    total_reward = 0.0
    done = false
    
    #inx, iny, inc = env.observation_space.shape
    #inx = floor(Int,inx/8)
    #iny = floor(Int,iny/8)
    
    max_fitness = 0
    fitness = 0
    counter = 0
    xpos = 0
    xpos_max = 0
    frame = 0
    while !done
        if render
            frame+=1
            env.render()
        end
        
        ob = Flux.unsqueeze(ob,4)
        action = compute_cnn(ann,ob)
        println("action = ",action)
        
        ob, reward, done, info = env.step(action)
    
        fitness += reward

        xpos = info["x"]
        xpos_end = info["screen_x_end"]


        if xpos > xpos_max
            fitness += 1
            xpos_max = xpos
        end

        if xpos == xpos_end && xpos > 500
            fitness += 100000
            done = true
        end

        if fitness > max_fitness
            max_fitness = fitness
            counter = 0
        else
            counter += 1
        end

        if done || counter == 500
            done = true
        end

    end
    
    env.close()
    fitness
end

play_env (generic function with 1 method)

With our zero network, the episode won't get far: no button is ever pressed, so unless the game itself moves Sonic, the fitness stops improving and the episode is cut off after 500 steps without progress.
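The stopping rule in play_env can be tried in isolation. The sketch below replays only the fitness bookkeeping on made-up data: steps_until_stop, rewards and xs are hypothetical names introduced here, and the inputs are not real emulator output.

```julia
# Hypothetical stand-in for the bookkeeping in play_env. `rewards` and `xs`
# are made-up per-step rewards and x positions.
function steps_until_stop(rewards, xs; patience=500)
    fitness, max_fitness, xpos_max, counter = 0.0, 0.0, 0, 0
    for (step, (r, x)) in enumerate(zip(rewards, xs))
        fitness += r
        if x > xpos_max            # bonus for a new furthest position
            fitness += 1
            xpos_max = x
        end
        if fitness > max_fitness   # progress: reset the stagnation counter
            max_fitness = fitness
            counter = 0
        else
            counter += 1
        end
        if counter == patience     # no progress for `patience` steps
            return (step, fitness)
        end
    end
    return (length(rewards), fitness)
end

# An agent that never moves: zero reward and x fixed at 0 for 1000 steps.
println(steps_until_stop(zeros(1000), zeros(Int, 1000)))   # → (500, 0.0)
```

An agent whose x position grows at every step never triggers the counter and runs until the data ends.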

Let's write a new constructor for our network which takes in genes and sets all of the network parameters. We'll then optimize these genes.

In [6]:
function my_CNN(genes::Array{Float64})
    ann = my_CNN(12,3,4,8,4,1)
    layers = [ann.c1.w, ann.c1.b, ann.c2.w, ann.c2.b]
    L = 1
    j = 1
    for i in eachindex(genes)
        if j > length(layers[L])
            L += 1
            j = 1
        end
        layers[L][j] = genes[i]
        j += 1
    end
    ann
end

function objective(genes::Array{Float64})
    ann = my_CNN(genes)
    -play_env(ann;render=false)
end

objective (generic function with 1 method)

The objective function is then just to create an ANN and evaluate its performance on one episode of the Sonic level. Because CMA-ES is minimizing, we'll return the negative.
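Negating the fitness does not change which individual is best. A minimal check, using three illustrative episode returns:

```julia
rewards = [261.0, 823.0, 305.0]   # illustrative episode returns
costs = -rewards                  # what a minimizing optimizer would see

# The individual with the lowest cost is the one with the highest reward.
println(argmin(costs) == argmax(rewards))   # → true
```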

Let's see how many genes we have now:

In [7]:
N = 12*12*3*4+4+8*8*4*1+1

1989
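The same count can be derived from the parameter shapes, and the gene vector can be cut into arrays with reshape. This is a sketch of an alternative to the index loop in the my_CNN(genes) constructor, not the method used above; unpack and shapes are hypothetical names introduced here.

```julia
# Parameter shapes for my_CNN(12,3,4,8,4,1): weights then bias, per layer.
shapes = [(12, 12, 3, 4), (4,), (8, 8, 4, 1), (1,)]
N = sum(prod, shapes)

# Hypothetical helper: cut a flat gene vector into arrays of the given shapes.
# reshape fills in column-major order, the same order as linear indexing.
function unpack(genes::Vector{Float64}, shapes)
    params = Array{Float64}[]
    offset = 0
    for s in shapes
        n = prod(s)
        push!(params, reshape(genes[offset+1:offset+n], s))
        offset += n
    end
    params
end

params = unpack(collect(1.0:N), shapes)
println(N)                # → 1989
println(size.(params))
```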

Now we can try a random individual, maybe it will do better!

Let's use the CMAES function we defined in the last notebook and optimize for just a few steps.

We might notice that the best fitness of a generation can go down from one generation to the next. Remember that CMA-ES is not elitist! We should keep an external archive of the best results.

In [10]:
best = nothing
best_fit = -Inf
c = CMAES(N=N, µ=20, λ=20, τ=sqrt(N), τ_c=N^2, τ_σ=sqrt(N))
i=0
while best_fit <000 
    i+=1
    start = time()
    step!(c, objective)
    bestind = argmin(c.F_λ)
    maxfit = -c.F_λ[bestind]
    print("generation = ",i, ", fitness = ", maxfit)
    if maxfit > best_fit
        best = copy(c.offspring[bestind])
        best_fit = maxfit
    end
    println(", elapsed time = ",time()-start)
end
println("Done")

generation = 1 fitness = 261.0 elapsed time = 178.49200010299683
generation = 2 fitness = 823.0 elapsed time = 273.9390001296997
generation = 3 fitness = 305.0 elapsed time = 424.01399993896484
generation = 4 fitness = 210.0 elapsed time = 364.4279999732971
generation = 5 fitness = 261.0 elapsed time = 436.0789999961853
generation = 6 fitness = 180.0 elapsed time = 324.42199993133545
generation = 7 fitness = 502.0 elapsed time = 517.5539999008179
generation = 8 fitness = 176.0 elapsed time = 283.10199999809265
generation = 9 fitness = 489.0 elapsed time = 535.4729998111725
generation = 10 fitness = 199.0 elapsed time = 568.6829998493195
generation = 11 fitness = 222.0 elapsed time = 385.01599979400635
generation = 12 fitness = 499.0 elapsed time = 364.5420000553131
generation = 13 fitness = 320.0 elapsed time = 372.9069998264313
generation = 14 fitness = 259.0 elapsed time = 436.710000038147
generation = 15 fitness = 198.0 elapsed time = 349.6819999217987
generation = 16 fitness = 216.

InterruptException: InterruptException:
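The log above shows the best fitness of each generation going up and down. A running maximum over those values shows what an external archive would hold after each generation. The numbers below are copied from generations 1 to 15 of the log.

```julia
# Best-of-generation fitness for generations 1 to 15, copied from the log above.
gen_best = [261.0, 823.0, 305.0, 210.0, 261.0, 180.0, 502.0, 176.0,
            489.0, 199.0, 222.0, 499.0, 320.0, 259.0, 198.0]

# Running maximum: the value an external archive would report each generation.
archive = accumulate(max, gen_best)

println(archive[end], " found at generation ", argmax(gen_best))
# → 823.0 found at generation 2
```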

Finally, we can see how the CMA-ES optimized individual does on this benchmark.

In [9]:
ann = my_CNN(best)
play_env(ann; render=true)

MethodError: MethodError: no method matching my_CNN(::Nothing)
Closest candidates are:
  my_CNN(::Any, !Matched::Any) at In[2]:7
  my_CNN(!Matched::convol, !Matched::convol) at In[2]:7
  my_CNN(!Matched::Int64, !Matched::Int64, !Matched::Int64, !Matched::Int64, !Matched::Int64, !Matched::Int64) at In[2]:12
  ...

<div class="alert alert-success">
    <b>Exercise</b>
    <br/>
    We were sort of cheating before. This neural network only learned how to do well on one instance, the one which comes from seeding the environment with 0. Test it on an environment with a different seed. Does it still do well? Finally, re-run the evolution without a fixed random seed, or with a seed that changes every time. What is the impact of a stochastic fitness on evolution?
</div>