Commit

Added an example that uses the OpenAI Gym's LunarLander-v2 environment.
CodeReclaimers committed Jan 28, 2017
1 parent ee83edb commit a53f53f
Showing 3 changed files with 449 additions and 0 deletions.
62 changes: 62 additions & 0 deletions examples/openai-lander/config
@@ -0,0 +1,62 @@
# neat-python configuration for the LunarLander-v2 environment on OpenAI Gym

[NEAT]
pop_size = 200
max_fitness_threshold = 1000.0
reset_on_extinction = 0

[DefaultGenome]
num_inputs = 8
num_hidden = 0
num_outputs = 4
initial_connection = full
feed_forward = True
compatibility_disjoint_coefficient = 1.0
compatibility_weight_coefficient = 1.0
conn_add_prob = 0.15
conn_delete_prob = 0.1
node_add_prob = 0.15
node_delete_prob = 0.1
activation_default = tanh
activation_options = tanh clamped gauss hat sin
activation_mutate_rate = 0.05
aggregation_default = sum
aggregation_options = sum
aggregation_mutate_rate = 0.0
bias_init_mean = 0.0
bias_init_stdev = 1.0
bias_replace_rate = 0.02
bias_mutate_rate = 0.8
bias_mutate_power = 0.4
bias_max_value = 30.0
bias_min_value = -30.0
response_init_mean = 1.0
response_init_stdev = 0.0
response_replace_rate = 0.0
response_mutate_rate = 0.1
response_mutate_power = 0.01
response_max_value = 30.0
response_min_value = -30.0

weight_max_value = 30
weight_min_value = -30
weight_init_mean = 0.0
weight_init_stdev = 1.0
weight_mutate_rate = 0.8
weight_replace_rate = 0.02
weight_mutate_power = 0.4
enabled_default = True
enabled_mutate_rate = 0.01

[DefaultSpeciesSet]
compatibility_threshold = 3.0

[DefaultStagnation]
species_fitness_func = mean
max_stagnation = 15
species_elitism = 5

[DefaultReproduction]
elitism = 2
survival_threshold = 0.2
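
The num_inputs/num_outputs values above mirror the LunarLander-v2 observation and action spaces; a quick sanity check (a minimal sketch, assuming a local gym install with this environment registered):

import gym

env = gym.make('LunarLander-v2')
assert env.observation_space.shape == (8,)  # matches num_inputs = 8
assert env.action_space.n == 4              # matches num_outputs = 4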

185 changes: 185 additions & 0 deletions examples/openai-lander/evolve.py
@@ -0,0 +1,185 @@
# Evolve a control/reward estimation network for the OpenAI Gym
# LunarLander-v2 environment (https://gym.openai.com/envs/LunarLander-v2).
# This is a work in progress, and currently takes ~100 generations to
# find a network that can land with a score >= 200 at least a couple of
# times. It has yet to solve the environment, which could have something
# to do with me being totally clueless in regard to reinforcement learning. :)

from __future__ import print_function

import gym
import gym.wrappers

import matplotlib.pyplot as plt

import neat
import numpy as np
import os
import pickle
import random

import visualize

env = gym.make('LunarLander-v2')

print("action space: {0!r}".format(env.action_space))
print("observation space: {0!r}".format(env.observation_space))

# Limit episodes to 400 time steps to cut down on training time.
# 400 steps is more than enough time to land with a winning score.
print(env.spec.tags.get('wrapper_config.TimeLimit.max_episode_steps'))
env.spec.tags['wrapper_config.TimeLimit.max_episode_steps'] = 400
print(env.spec.tags.get('wrapper_config.TimeLimit.max_episode_steps'))

env = gym.wrappers.Monitor(env, 'results', force=True)

# Discount factor applied to future rewards, and the score range used to
# normalize discounted rewards into [-1, 1] below.
discounted_reward = 0.9
min_reward = -200
max_reward = 200

score_range = []

def eval_fitness_shared(genomes, config):
    nets = []
    for gid, g in genomes:
        nets.append((g, neat.nn.FeedForwardNetwork.create(g, config)))
        g.fitness = []

    episodes = []
    scores = []
    for genome, net in nets:
        observation = env.reset()
        episode_data = []
        j = 0
        total_score = 0.0
        while 1:
            if net is not None:
                output = net.activate(observation)
                action = np.argmax(output)
            else:
                action = env.action_space.sample()

            observation, reward, done, info = env.step(action)
            total_score += reward
            episode_data.append((j, observation, action, reward))

            if done:
                break

            j += 1

        episodes.append(episode_data)
        scores.append(total_score)
        genome.fitness = total_score

    if scores:
        score_range.append((min(scores), np.mean(scores), max(scores)))

    # Compute discounted rewards. D is an N x N upper-triangular matrix whose
    # (t, k) entry is discounted_reward ** (k - t) for k >= t, so that
    # np.dot(D, rewards)[t] is the discounted sum of rewards from step t onward.
    discounted_rewards = []
    for episode in episodes:
        rewards = np.array([reward for j, observation, action, reward in episode])
        N = len(episode)
        D = np.sum((np.eye(N, k=i) * discounted_reward ** i for i in range(N)))
        discounted_rewards.append(np.dot(D, rewards))

    print(min(map(np.min, discounted_rewards)), max(map(np.max, discounted_rewards)))

    # Normalize rewards: map [min_reward, max_reward] linearly onto [-1, 1].
    for i in range(len(discounted_rewards)):
        discounted_rewards[i] = 2 * (discounted_rewards[i] - min_reward) / (max_reward - min_reward) - 1.0

    print(min(map(np.min, discounted_rewards)), max(map(np.max, discounted_rewards)))

    # Penalize each genome by how badly its output for the chosen action
    # predicts the normalized discounted reward on a random sample of episodes.
    episode_filter = [random.randint(0, len(episodes)-1) for _ in range(10)]
    for genome, net in nets:
        reward_error = []
        for i in episode_filter:
            episode = episodes[i]
            discount_reward = discounted_rewards[i]
            for (j, observation, action, reward), dr in zip(episode, discount_reward):
                #test_set.append((observation, action, reward, dr))
                output = net.activate(observation)
                reward_error.append((output[action] - dr)**2)

        print(genome.fitness, np.mean(reward_error))
        genome.fitness -= 100 * np.mean(reward_error)

def run():
    # Load the config file, which is assumed to live in
    # the same directory as this script.
    local_dir = os.path.dirname(__file__)
    config_path = os.path.join(local_dir, 'config')
    config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                         neat.DefaultSpeciesSet, neat.DefaultStagnation,
                         config_path)

    pop = neat.Population(config)
    stats = neat.StatisticsReporter()
    pop.add_reporter(stats)
    pop.add_reporter(neat.StdOutReporter())
    # Checkpoint every 10 generations or 900 seconds.
    pop.add_reporter(neat.Checkpointer(10, 900))

    # Run until the winner from a generation is able to solve the environment.
    while 1:
        winner = pop.run(eval_fitness_shared, 1)

        visualize.plot_stats(stats, ylog=False, view=False, filename="fitness.svg")

        if score_range:
            S = np.array(score_range).T
            plt.plot(S[0], 'r-')
            plt.plot(S[1], 'b-')
            plt.plot(S[2], 'g-')
            plt.grid()
            plt.savefig("score-ranges.svg")
            plt.close()

        mfs = sum(stats.get_fitness_mean()[-5:]) / 5.0
        print("Average mean fitness over last 5 generations: {0}".format(mfs))

        mfs = sum(stats.get_fitness_stat(min)[-5:]) / 5.0
        print("Average min fitness over last 5 generations: {0}".format(mfs))

        winner_net = neat.nn.FeedForwardNetwork.create(winner, config)

        for k in range(100):
            observation = env.reset()
            score = 0
            while 1:
                output = winner_net.activate(observation)
                observation, reward, done, info = env.step(np.argmax(output))
                score += reward
                env.render()
                if done:
                    break
            print(k, score)
            if score < 200:
                break
        else:
            print("Solved.")
            break

    winner = stats.best_genome()
    print(winner)

    # Save the winner.
    with open('winner.pickle', 'wb') as f:
        pickle.dump(winner, f)

    visualize.plot_stats(stats, ylog=False, view=True, filename="fitness.svg")
    visualize.plot_species(stats, view=True, filename="speciation.svg")

    visualize.draw_net(config, winner, True)

    visualize.draw_net(config, winner, view=True, filename="winner-net.gv")
    visualize.draw_net(config, winner, view=True, filename="winner-net-enabled.gv",
                       show_disabled=False)
    visualize.draw_net(config, winner, view=True, filename="winner-net-enabled-pruned.gv",
                       show_disabled=False, prune_unused=True)


if __name__ == '__main__':
    run()

8 comments on commit a53f53f

@evolvingfridge
Contributor

That's a great example!
Can the next example be Doom, please? :)

@CodeReclaimers
Owner Author

I would probably need to get HyperNEAT implemented for that, because NEAT itself is terrible for directly evolving networks with large numbers of inputs. There are probably some other easier things that could be done, though, like using NEAT to evolve the structure of a tensorflow network or something. :)

@evolvingfridge
Contributor

Would it be easier to use numba and numpy arrays as data structures to speed up the code?
Speed comparison:
https://www.ibm.com/developerworks/community/blogs/jfp/entry/How_To_Compute_Mandelbrodt_Set_Quickly?lang=en

@abrahamrhoffman

@CodeReclaimers I am very much hoping I can port this code to Tensorflow to integrate with an RL agent I am building. What do you see as the future of this project? In order to use it in a production environment, I need the distributed part of Tensorflow. GPU/CUDA is also a must. As @D0pa suggested, performing vector/matrix multiplication is a lot faster when you perform the ops outside of Python. Numpy would be a good intermediate step.

@evolvingfridge
Contributor

@abrahamrhoffman, I just meant using numpy array objects only as the data structures; porting the code to tensorflow, theano, numba, Cython, etc. would then be left up to the user and much simpler. This is based on all the comparisons I could find: all of them use numpy array objects. Additionally, sharing data between processes is much more efficient when using numpy objects.

@abrahamrhoffman

@D0pa that makes a lot of sense to me. +1

@CodeReclaimers
Owner Author

Disclaimer: I haven't tried any optimization yet other than running under pypy, so everything below is just a guess on my part. When evaluating some nontrivial networks, pypy gave a ~10x speedup over Python 2.7 for me on a fairly old machine, but it has the drawback that you probably can't run OpenAI Gym or other simulation environments under it.

The networks produced by DefaultGenome and neat.nn probably won't be helped too much by fast numpy or GPU-based matrix/vector multiplies, because they are small networks with no regular structure (and in the most general case you can also have different aggregation and activation functions at each node). Using Numba, Cython, or maybe a sparse matrix/vector lib might be a good option for those in cases where you need to apply the network to a large set of inputs, though.
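
A rough sketch of the kind of batch evaluation described above (this assumes the genome has already been flattened into a single sparse input-to-output weight matrix W and bias vector b, which only holds for the simplest feed-forward case with a single activation function; batch_activate and the shapes below are made up for illustration):

import numpy as np
from scipy import sparse

def batch_activate(W, b, observations):
    # observations: (num_samples, num_inputs); W: sparse (num_outputs, num_inputs).
    # One sparse matrix multiply replaces num_samples separate activate() calls.
    return np.tanh(W.dot(observations.T).T + b)

# Example with shapes matching the lander config (8 inputs, 4 outputs):
W = sparse.random(4, 8, density=0.5, format='csr')
b = np.zeros(4)
outputs = batch_activate(W, b, np.random.randn(1000, 8))
actions = np.argmax(outputs, axis=1)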

For larger, more regular networks like those normally used in the common deep learning packages, I think it should be possible to come up with an indirect coding scheme (along the lines of Compositional Pattern-Producing Networks with HyperNEAT) for the network structure so that a fairly simple NEAT network generates the structure for a larger conventional network that is then trained using Tensorflow or some other package.
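
A toy illustration of that indirect-coding idea (the cppn function below is just a hand-written stand-in for a small NEAT-evolved network; in a real HyperNEAT-style setup it would be the evolved net queried at substrate coordinates):

import numpy as np

def cppn(x1, y1, x2, y2):
    # Stand-in for querying a small evolved network at a coordinate pair;
    # returns the weight connecting the input at (x1, y1) to the output at (x2, y2).
    return np.sin(3.0 * x1) * np.cos(3.0 * y2) * np.exp(-(x2 - x1) ** 2)

def build_weight_matrix(num_in, num_out):
    # Query the CPPN once per (input, output) pair to fill a dense weight matrix.
    in_coords = np.linspace(-1.0, 1.0, num_in)
    out_coords = np.linspace(-1.0, 1.0, num_out)
    W = np.zeros((num_out, num_in))
    for i, xo in enumerate(out_coords):
        for j, xi in enumerate(in_coords):
            W[i, j] = cppn(xi, 0.0, xo, 1.0)
    return W

# e.g. weights for a 784 -> 128 dense layer generated by a tiny network:
W = build_weight_matrix(num_in=784, num_out=128)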

In the long run, there are some bottlenecks in the core NEAT algorithm and the Default* classes that I may reimplement in C (for example, the genome distance computation), but it would still be great to have at least a few examples that show how to make use of numba, Tensorflow, etc. where the application requires big, regular networks for handling large structured inputs like screen images.

My intent is to gradually get examples working for as many of the OpenAI Gym problems as possible, just to see if there's some NEAT-based approach to those problems that might result in simpler or more efficient networks than you might get if you tried structuring them manually. I honestly suspect NEAT isn't actually a good fit for all of them, but it will be fun to try. :)

@evolvingfridge
Contributor

@CodeReclaimers, Thank you very much for such a great answer! I think your answer should be posted on the issues page; it would be nice to see where others are hitting bottlenecks. For my project I am using training and validation data sets of 1-8GB in size, and I am hoping to use larger data sets. The main bottleneck for me was just transferring data for each process/genome, and it seems like I solved that. I am a newbie at python profiling, and this is my first project where I am actually facing an issue with processing speed. This week I will run the new version on an ipyparallel cluster with a large data set and share results on where I am hitting bottlenecks.
I am HYPED for a HyperNEAT implementation :)
