# Evolutionary methods with `multiprocessing`

*At least let me have a little fun!*

Context on evolutionary algorithms from the Four Competences (on whiteboard)
- [more context here](https://towardsdatascience.com/daniel-c-dennetts-four-competences-779648bdbabc?source=friends_link&sk=15fe38a0971a25c0ddb028aec05109a4)
- **neuroevolution** = using evolutionary methods to find the parameters (or even architecture) of a neural network

Computational evolutionary methods
- general, black box optimization
- gradient free
- work on challenging cost functions (non-linear, discontinuous, etc.)
- can be parallelized

Sample inefficient
- because they learn from a weak learning signal (fitness / total episode reward)
- don't learn from state / reward transitions that occur during an episode

If you want a tool for your toolbelt, [CMA-ES](https://en.wikipedia.org/wiki/CMA-ES) is a good choice
- adapts the covariance matrix of its sampling distribution
- works well up to roughly 1,000 - 10,000 parameters
- used in World Models
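
To make "adapts the covariance matrix" concrete, here is a minimal sketch that refits a sampling distribution to the best members of each generation. It is a simplified elite-based update, not CMA-ES itself (which also tracks evolution paths and controls the step size). The toy `fitness` function, `target`, `n_elite` and every other name are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

#  toy objective (assumption): higher is better, maximum at `target`
target = np.array([1.0, -0.5])

def fitness(x):
    return -np.sum((x - target) ** 2)

mean = np.zeros(2)
cov = np.eye(2)
pop_size, n_elite = 64, 16

for generation in range(30):
    #  generate: sample a population from the current distribution
    pop = rng.multivariate_normal(mean, cov, size=pop_size)
    #  test: score every member
    scores = np.array([fitness(x) for x in pop])
    #  select: keep the n_elite highest scoring members
    elite = pop[np.argsort(scores)[-n_elite:]]
    #  adapt: refit mean and covariance to the elite
    #  (the small diagonal term keeps the covariance positive definite)
    mean = elite.mean(axis=0)
    cov = np.cov(elite, rowvar=False) + 1e-6 * np.eye(2)
```

The distribution both moves (the mean) and changes shape (the covariance) as the search progresses, in contrast to the fixed identity covariance used later in this notebook.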

## Generate, test & select

Evolutionary improvement occurs through a **generate, test & select loop**
- substrate independent

Our algorithm will:
- generate a population of parameters (neural network weights and biases)
- test these parameters in the `MountainCarContinuous-v0` environment
- select the best performing set of parameters to use in the next generate step

Let's first set up the code for the forward pass of a neural net. We aren't going to do any backprop, so we can do it all in `numpy`:

In [None]:
!pip install gym pygame -q

import gym
import numpy as np
from evolution import make_env, initialize_parameters, episode

#  you can use either 'CartPole-v0' or 'MountainCarContinuous-v0' - the last assignment wins
env_id = 'CartPole-v0'
env_id = 'MountainCarContinuous-v0'
h_size = 10
env, forward, i_size, o_size = make_env(env_id)
params = initialize_parameters(i_size, h_size, o_size)

We end up with a dictionary of parameters with random weights:

In [None]:
params.keys()

We can use the `forward` function to select an action using these parameters & a randomly sampled observation:

In [None]:
action = forward(env.observation_space.sample(), params)
action
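
The `evolution` module is not shown in this notebook, so the following is only a sketch of what a `numpy` forward pass could look like. It assumes a single hidden layer with `tanh` activations. The names `init_params_sketch` and `forward_sketch`, and the dictionary keys, are hypothetical and may differ from the real `initialize_parameters` and `forward`.

```python
import numpy as np

def init_params_sketch(i_size, h_size, o_size, rng=None):
    """Hypothetical stand-in for initialize_parameters (one hidden layer)."""
    if rng is None:
        rng = np.random.default_rng()
    return {
        'w1': rng.normal(0, 1, size=(i_size, h_size)),
        'b1': np.zeros(h_size),
        'w2': rng.normal(0, 1, size=(h_size, o_size)),
        'b2': np.zeros(o_size),
    }

def forward_sketch(obs, params):
    """Observation -> tanh hidden layer -> tanh output in [-1, 1]."""
    hidden = np.tanh(obs @ params['w1'] + params['b1'])
    return np.tanh(hidden @ params['w2'] + params['b2'])

#  MountainCarContinuous-v0 has a 2-dimensional observation and a 1-dimensional action
params_sketch = init_params_sketch(2, 10, 1, rng=np.random.default_rng(0))
action_sketch = forward_sketch(np.array([-0.5, 0.0]), params_sketch)
```

The `tanh` output suits a continuous action bounded in [-1, 1]; a discrete environment such as CartPole would need something like an `argmax` over the outputs instead.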

Below is the machinery for saving and loading parameters - this is so you can run `python render.py` while your agent is learning:

In [None]:
from evolution import save_params, load_params

params = initialize_parameters(i_size, h_size, o_size)
save_params(params, env_id, agent_id=1)
params = load_params(env_id, agent_id=1)
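
How `save_params` and `load_params` store their files is not shown here. One possible approach, sketched below, pickles the parameter dictionary to a path built from `env_id` and `agent_id`. The function names, the `root` argument and the file layout are all assumptions, not the `evolution` module's actual behaviour.

```python
import pickle
import tempfile
from pathlib import Path

def save_params_sketch(params, env_id, agent_id, root='results'):
    """Hypothetical: write params to <root>/<env_id>/agent_<agent_id>.pkl."""
    path = Path(root) / env_id / f'agent_{agent_id}.pkl'
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open('wb') as f:
        pickle.dump(params, f)
    return path

def load_params_sketch(env_id, agent_id, root='results'):
    """Hypothetical: read params back from the same location."""
    path = Path(root) / env_id / f'agent_{agent_id}.pkl'
    with path.open('rb') as f:
        return pickle.load(f)

#  round trip in a temporary directory so nothing is left on disk
with tempfile.TemporaryDirectory() as tmp:
    save_params_sketch({'w': [1.0, 2.0]}, 'CartPole-v0', agent_id=1, root=tmp)
    restored = load_params_sketch('CartPole-v0', agent_id=1, root=tmp)
```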

You should now be able to run the following in a shell (running it inside the notebook will break your kernel):

```bash
$ python render.py --env_id CartPole-v0 --agent_id 1

$ python render.py --env_id MountainCarContinuous-v0 --agent_id 1
```

![](assets/car.png)

## The components of an evolutionary algorithm

Above we have outlined the code we need to do the test step of generate/test/select.  Now we will outline the code for the generate & select steps.

In the first generation, we **generate** a population with random weights:

In [None]:
pop_size = 32
pop = [initialize_parameters(i_size, h_size, o_size) for _ in range(pop_size)]

**Test** the population in the environment:

In [None]:
from functools import partial

results = list(map(partial(episode, env_id=env_id), pop))

print(np.mean(results))
assert len(results) == pop_size

**Select** the best performing parameters:

In [None]:
best = pop[np.argmax(results)]

We are now on the second generation - instead of sampling parameters randomly using `initialize_parameters`, we use the best parameters selected from the last generation.

We still sample a new generation randomly, but now we use the results of the first generation to create the distribution to sample from.

Below we create a new generation, using the best performing member to estimate the mean.  We use an identity covariance matrix.

In [None]:
np.eye(4)

In [None]:
num_w = 4
[np.random.normal(0, 1) for _ in range(num_w)]

np.random.multivariate_normal([0, 0, 0, 0], np.eye(4))

In [None]:
def sample_params(best):
    p = {}
    for k, v in best.items():
        #  sample new parameters from a flat multivariate normal
        flat = v.flatten()
        new = np.random.multivariate_normal(flat, np.eye(flat.shape[0]), size=1)
        #  reshape the weight or bias
        p[k] = new.reshape(v.shape)
    return p

pop = [sample_params(best) for _ in range(pop_size)]
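
Because the covariance matrix is the identity, every weight is perturbed independently with unit variance. Adding standard normal noise directly therefore samples from the same distribution without building an n-by-n matrix for each array. A sketch, where `sample_params_fast` is a hypothetical name and `sigma` is an added assumption (`sigma=1.0` corresponds to the identity covariance):

```python
import numpy as np

def sample_params_fast(best, sigma=1.0, rng=None):
    """Sample new parameters from N(best, sigma^2 * I), one array per key."""
    if rng is None:
        rng = np.random.default_rng()
    return {k: v + sigma * rng.standard_normal(v.shape) for k, v in best.items()}

#  toy parameter dictionary (assumption) standing in for `best`
best_sketch = {'w1': np.zeros((2, 10)), 'b1': np.zeros(10)}
child = sample_params_fast(best_sketch, rng=np.random.default_rng(0))
```

Lowering `sigma` makes each generation search closer to the current best; raising it explores more widely.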

In [None]:
results = list(map(partial(episode, env_id=env_id), pop))
print(np.mean(results))

## Practical

Take the components above and put them together:
- implement an evolutionary method using `map` (single core)
- implement an evolutionary method using `pool.map` (multi-core)

```
initialize a population of parameters

for generation in generations
    test parameters in the environment
    select the best
    generate new parameters using the best
```

In [None]:
generations = 64
pop_size = 32

#  cartpole broken - the last assignment wins, so MountainCar goes last
env_id = 'CartPole-v0'
env_id = 'MountainCarContinuous-v0'

h_size = 10
env, forward, i_size, o_size = make_env(env_id)
params = initialize_parameters(i_size, h_size, o_size)

pop = [initialize_parameters(i_size, h_size, o_size) for _ in range(pop_size)]
for generation in range(generations):
    #  test
    results = list(map(partial(episode, env_id=env_id), pop))
    print(np.mean(results))
    #  select
    best = pop[np.argmax(results)]
    pop = [sample_params(best) for _ in range(pop_size)]
    save_params(best, env_id, agent_id=generation)
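
For the multi-core version, one possible shape is to write the loop once and pass in the mapping function, so the built-in `map` and `pool.map` are interchangeable. The sketch below uses a toy fitness function in place of `episode` so that it runs without `gym`; `toy_episode`, `sample_around` and `evolve` are hypothetical names. Swapping in `partial(episode, env_id=env_id)` should work in the same way, assuming `episode` can be pickled (it is imported from a module, which usually suffices).

```python
from functools import partial
from multiprocessing import Pool

import numpy as np


def toy_episode(params, target):
    """Hypothetical stand-in for `episode`: higher return is better."""
    return -float(np.sum((params['w'] - target) ** 2))


def sample_around(best, rng):
    """Sample a child from a unit-variance normal centred on `best`."""
    return {k: v + rng.standard_normal(v.shape) for k, v in best.items()}


def evolve(map_fn, generations=20, pop_size=32, seed=0):
    """Generate, test & select loop; `map_fn` is `map` or `pool.map`."""
    rng = np.random.default_rng(seed)
    test = partial(toy_episode, target=3.0)
    pop = [{'w': rng.standard_normal(4)} for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        results = list(map_fn(test, pop))      #  test
        best = pop[int(np.argmax(results))]    #  select
        history.append(max(results))
        pop = [sample_around(best, rng) for _ in range(pop_size)]  #  generate
    return best, history


if __name__ == '__main__':
    #  multi-core: each call to pool.map spreads the population over 2 workers
    with Pool(processes=2) as pool:
        best, history = evolve(pool.map)
    print(history[0], history[-1])
```

Only the test step is parallelized, since it is the expensive part; selection and sampling stay in the parent process. On platforms that start workers by spawning a fresh interpreter, the function passed to `pool.map` must be importable from a module rather than defined in a notebook cell.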