Commit
Added an example that uses the OpenAI Gym's LunarLander-v2 environment.
1 parent ee83edb, commit a53f53f
Showing 3 changed files with 449 additions and 0 deletions.
@@ -0,0 +1,62 @@
# neat-python configuration for the LunarLander-v2 environment on OpenAI Gym

[NEAT]
pop_size = 200
max_fitness_threshold = 1000.0
reset_on_extinction = 0

[DefaultGenome]
num_inputs = 8
num_hidden = 0
num_outputs = 4
initial_connection = full
feed_forward = True
compatibility_disjoint_coefficient = 1.0
compatibility_weight_coefficient = 1.0
conn_add_prob = 0.15
conn_delete_prob = 0.1
node_add_prob = 0.15
node_delete_prob = 0.1
activation_default = tanh
activation_options = tanh clamped gauss hat sin
activation_mutate_rate = 0.05
aggregation_default = sum
aggregation_options = sum
aggregation_mutate_rate = 0.0
bias_init_mean = 0.0
bias_init_stdev = 1.0
bias_replace_rate = 0.02
bias_mutate_rate = 0.8
bias_mutate_power = 0.4
bias_max_value = 30.0
bias_min_value = -30.0
response_init_mean = 1.0
response_init_stdev = 0.0
response_replace_rate = 0.0
response_mutate_rate = 0.1
response_mutate_power = 0.01
response_max_value = 30.0
response_min_value = -30.0

weight_max_value = 30
weight_min_value = -30
weight_init_mean = 0.0
weight_init_stdev = 1.0
weight_mutate_rate = 0.8
weight_replace_rate = 0.02
weight_mutate_power = 0.4
enabled_default = True
enabled_mutate_rate = 0.01

[DefaultSpeciesSet]
compatibility_threshold = 3.0

[DefaultStagnation]
species_fitness_func = mean
max_stagnation = 15
species_elitism = 5

[DefaultReproduction]
elitism = 2
survival_threshold = 0.2
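The `compatibility_disjoint_coefficient`, `compatibility_weight_coefficient` and `compatibility_threshold` settings above control how genomes are grouped into species. As a rough illustration, the sketch below computes a simplified NEAT compatibility distance over connection genes only, loosely modeled on the connection-gene part of neat-python's `DefaultGenome.distance`. It ignores node genes and the enabled flag, and it represents a genome as a hypothetical dict from connection key to weight, so treat it as an approximation rather than the library's code.

```python
# Simplified NEAT compatibility distance (connection genes only).
# Hypothetical genome model: dict mapping (input_node, output_node) -> weight.
from typing import Dict, Tuple

ConnKey = Tuple[int, int]


def compatibility_distance(a: Dict[ConnKey, float],
                           b: Dict[ConnKey, float],
                           disjoint_coeff: float = 1.0,
                           weight_coeff: float = 1.0) -> float:
    if not a and not b:
        return 0.0
    # Genes present in only one of the two genomes count as disjoint.
    disjoint = len(a.keys() ^ b.keys())
    # Matching genes contribute the absolute difference of their weights.
    weight_diff = sum(abs(a[k] - b[k]) for k in a.keys() & b.keys())
    # Normalize by the size of the larger genome.
    return (weight_coeff * weight_diff + disjoint_coeff * disjoint) / max(len(a), len(b))


def same_species(a: Dict[ConnKey, float],
                 b: Dict[ConnKey, float],
                 threshold: float = 3.0) -> bool:
    # Roughly mirrors compatibility_threshold = 3.0 in [DefaultSpeciesSet].
    return compatibility_distance(a, b) < threshold
```

With both coefficients at 1.0, one disjoint gene costs as much as a weight difference of 1.0, so two small genomes need several structural differences or large weight changes before they exceed the threshold of 3.0.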
@@ -0,0 +1,185 @@
# Evolve a control/reward estimation network for the OpenAI Gym
# LunarLander-v2 environment (https://gym.openai.com/envs/LunarLander-v2).
# This is a work in progress, and currently takes ~100 generations to
# find a network that can land with a score >= 200 at least a couple of
# times. It has yet to solve the environment, which could have something
# to do with me being totally clueless in regard to reinforcement learning. :)

from __future__ import print_function

import gym
import gym.wrappers

import matplotlib.pyplot as plt

import neat
import numpy as np
import os
import pickle
import random

import visualize

env = gym.make('LunarLander-v2')

print("action space: {0!r}".format(env.action_space))
print("observation space: {0!r}".format(env.observation_space))

# Limit episodes to 400 time steps to cut down on training time.
# 400 steps is more than enough time to land with a winning score.
print(env.spec.tags.get('wrapper_config.TimeLimit.max_episode_steps'))
env.spec.tags['wrapper_config.TimeLimit.max_episode_steps'] = 400
print(env.spec.tags.get('wrapper_config.TimeLimit.max_episode_steps'))

env = gym.wrappers.Monitor(env, 'results', force=True)

discounted_reward = 0.9
min_reward = -200
max_reward = 200

score_range = []

def eval_fitness_shared(genomes, config):
    nets = []
    for gid, g in genomes:
        nets.append((g, neat.nn.FeedForwardNetwork.create(g, config)))
        g.fitness = []

    episodes = []
    scores = []
    for genome, net in nets:
        observation = env.reset()
        episode_data = []
        j = 0
        total_score = 0.0
        while 1:
            if net is not None:
                output = net.activate(observation)
                action = np.argmax(output)
            else:
                action = env.action_space.sample()

            observation, reward, done, info = env.step(action)
            total_score += reward
            episode_data.append((j, observation, action, reward))

            if done:
                break

            j += 1

        episodes.append(episode_data)
        scores.append(total_score)
        genome.fitness = total_score

    if scores:
        score_range.append((min(scores), np.mean(scores), max(scores)))

    # Compute discounted rewards.
    discounted_rewards = []
    for episode in episodes:
        rewards = np.array([reward for j, observation, action, reward in episode])
        N = len(episode)
        # D has discounted_reward**i on its i-th superdiagonal, so
        # np.dot(D, rewards)[t] is the discounted return from step t.
        D = sum(np.eye(N, k=i) * discounted_reward ** i for i in range(N))
        discounted_rewards.append(np.dot(D, rewards))

    print(min(map(np.min, discounted_rewards)), max(map(np.max, discounted_rewards)))

    # Normalize rewards from [min_reward, max_reward] to [-1, 1].
    for i in range(len(discounted_rewards)):
        discounted_rewards[i] = 2 * (discounted_rewards[i] - min_reward) / (max_reward - min_reward) - 1.0

    print(min(map(np.min, discounted_rewards)), max(map(np.max, discounted_rewards)))

    episode_filter = [random.randint(0, len(episodes)-1) for _ in range(10)]
    for genome, net in nets:
        reward_error = []
        for i in episode_filter:
            episode = episodes[i]
            discount_reward = discounted_rewards[i]
            for (j, observation, action, reward), dr in zip(episode, discount_reward):
                #test_set.append((observation, action, reward, dr))
                output = net.activate(observation)
                reward_error.append((output[action] - dr)**2)

        print(genome.fitness, np.mean(reward_error))
        genome.fitness -= 100 * np.mean(reward_error)


def run():
    # Load the config file, which is assumed to live in
    # the same directory as this script.
    local_dir = os.path.dirname(__file__)
    config_path = os.path.join(local_dir, 'config')
    config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                         neat.DefaultSpeciesSet, neat.DefaultStagnation,
                         config_path)

    pop = neat.Population(config)
    stats = neat.StatisticsReporter()
    pop.add_reporter(stats)
    pop.add_reporter(neat.StdOutReporter())
    # Checkpoint every 10 generations or 900 seconds.
    pop.add_reporter(neat.Checkpointer(10, 900))

    # Run until the winner from a generation is able to solve the environment.
    while 1:
        winner = pop.run(eval_fitness_shared, 1)

        visualize.plot_stats(stats, ylog=False, view=False, filename="fitness.svg")

        if score_range:
            S = np.array(score_range).T
            plt.plot(S[0], 'r-')
            plt.plot(S[1], 'b-')
            plt.plot(S[2], 'g-')
            plt.grid()
            plt.savefig("score-ranges.svg")
            plt.close()

        mfs = sum(stats.get_fitness_mean()[-5:]) / 5.0
        print("Average mean fitness over last 5 generations: {0}".format(mfs))

        mfs = sum(stats.get_fitness_stat(min)[-5:]) / 5.0
        print("Average min fitness over last 5 generations: {0}".format(mfs))

        winner_net = neat.nn.FeedForwardNetwork.create(winner, config)

        for k in range(100):
            observation = env.reset()
            score = 0
            while 1:
                output = winner_net.activate(observation)
                observation, reward, done, info = env.step(np.argmax(output))
                score += reward
                env.render()
                if done:
                    break
            print(k, score)
            if score < 200:
                break
        else:
            # Reached only if all 100 test episodes scored >= 200.
            print("Solved.")
            break

    winner = stats.best_genome()
    print(winner)

    # Save the winner.
    with open('winner.pickle', 'wb') as f:
        pickle.dump(winner, f)

    visualize.plot_stats(stats, ylog=False, view=True, filename="fitness.svg")
    visualize.plot_species(stats, view=True, filename="speciation.svg")

    visualize.draw_net(config, winner, True)

    visualize.draw_net(config, winner, view=True, filename="winner-net.gv")
    visualize.draw_net(config, winner, view=True, filename="winner-net-enabled.gv",
                       show_disabled=False)
    visualize.draw_net(config, winner, view=True, filename="winner-net-enabled-pruned.gv",
                       show_disabled=False, prune_unused=True)


if __name__ == '__main__':
    run()
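In `eval_fitness_shared`, the matrix `D` built from `np.eye(N, k=i) * discounted_reward ** i` puts γ^i on its i-th superdiagonal, so `np.dot(D, rewards)[t]` equals r_t + γ r_{t+1} + γ² r_{t+2} + …. The same values can be computed in O(N) time and memory with the backward recursion G_t = r_t + γ G_{t+1}. The sketch below (function names are hypothetical, not part of the script) implements both forms so they can be compared, plus the same linear mapping of [min_reward, max_reward] onto [-1, 1] that the script applies.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.9):
    """Backward recursion G_t = r_t + gamma * G_{t+1}, with G_N = 0."""
    rewards = np.asarray(rewards, dtype=float)
    returns = np.zeros_like(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def discounted_returns_matrix(rewards, gamma=0.9):
    """Matrix form, as in the script: D[t, t + i] = gamma ** i."""
    rewards = np.asarray(rewards, dtype=float)
    n = len(rewards)
    D = sum(np.eye(n, k=i) * gamma ** i for i in range(n))
    return np.dot(D, rewards)


def normalize(returns, min_reward=-200.0, max_reward=200.0):
    """Linearly map [min_reward, max_reward] onto [-1, 1]."""
    return 2 * (returns - min_reward) / (max_reward - min_reward) - 1.0
```

For a 400-step episode the matrix form allocates a 400×400 array per episode, while the recursion needs only one pass over the rewards.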
a53f53f
That's a great example!
Could the next example be Doom, please? :)
a53f53f
I would probably need to get HyperNEAT implemented for that, because NEAT itself is terrible for directly evolving networks with large numbers of inputs. There are probably some other easier things that could be done, though, like using NEAT to evolve the structure of a tensorflow network or something. :)
a53f53f
Would it be easier to use numba and numpy arrays as data structures to speed up the code?
Speed comparison:
https://www.ibm.com/developerworks/community/blogs/jfp/entry/How_To_Compute_Mandelbrodt_Set_Quickly?lang=en
a53f53f
@CodeReclaimers, I am very much hoping I can port this code to TensorFlow to integrate it with an RL agent I am building. What do you see as the future of this project? To use it in a production environment, I need the distributed part of TensorFlow. GPU/CUDA support is also a must. As @D0pa suggested, vector/matrix multiplication is a lot faster when the ops are performed outside of Python. NumPy would be a good intermediate step.
a53f53f
@abrahamrhoffman, I just meant using only numpy array objects; then porting the code to TensorFlow, Theano, Numba, Cython, etc. would be simpler for users. All of the comparisons I could find use numpy array objects. Additionally, sharing data between processes is much more efficient with numpy objects.
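As a hedged illustration of the point about sharing numpy data between processes: since Python 3.8, `multiprocessing.shared_memory` lets one process place an array's buffer in shared memory and others attach to it by name, so workers can read it without pickling a copy. The helper names `to_shared` and `attach` below are hypothetical, and the sketch is not part of neat-python.

```python
# Sketch: share a NumPy array between processes without copying it to each
# worker (requires Python 3.8+; the array must be non-empty).
import numpy as np
from multiprocessing import shared_memory


def to_shared(array):
    """Copy `array` once into a new shared-memory block; return (shm, view)."""
    shm = shared_memory.SharedMemory(create=True, size=array.nbytes)
    view = np.ndarray(array.shape, dtype=array.dtype, buffer=shm.buf)
    view[:] = array  # the only copy of the data
    return shm, view


def attach(name, shape, dtype):
    """Attach to an existing block by name (e.g. inside a worker process)."""
    shm = shared_memory.SharedMemory(name=name)
    return shm, np.ndarray(shape, dtype=dtype, buffer=shm.buf)
```

A worker only needs `shm.name`, the shape and the dtype. Every process should drop its array views and call `close()` when done, and the creator should also call `unlink()` to free the block.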
a53f53f
@D0pa that makes a lot of sense to me. +1
a53f53f
Disclaimer: I haven't tried any optimization yet other than running under pypy, so everything below is just a guess on my part. When evaluating some nontrivial networks, pypy gave a ~10x speedup over Python 2.7 for me on a fairly old machine, but it has the drawback that you probably can't run OpenAI Gym or other simulation environments under it.
The networks produced by DefaultGenome and neat.nn probably won't be helped too much by fast numpy or GPU-based matrix/vector multiplies, because they are small networks with no regular structure (and in the most general case you can also have different aggregation and activation functions at each node). Using Numba, Cython, or maybe a sparse matrix/vector lib might be a good option for those in cases where you need to apply the network to a large set of inputs, though.
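One way to apply a small, irregular network to a large set of inputs is to keep the loop over nodes but vectorize across the batch, so each node's sum aggregation becomes one NumPy operation over all samples. The sketch below assumes a hypothetical node list, already in topological order, of `(node_id, bias, activation, [(source_id, weight), ...])`. This is loosely similar in spirit to what `neat.nn.FeedForwardNetwork` evaluates per node, but it is not that class's API and it ignores the response multiplier and non-sum aggregations.

```python
import numpy as np


def batch_activate(input_ids, output_ids, node_evals, inputs):
    """Evaluate an irregular feed-forward net on a whole batch at once.

    inputs: array-like of shape (batch, len(input_ids)).
    node_evals: hypothetical list of (node_id, bias, activation, links),
    in topological order, where links is [(source_id, weight), ...].
    Returns an array of shape (batch, len(output_ids)).
    """
    inputs = np.asarray(inputs, dtype=float)
    # Each node's value is a vector holding one entry per sample.
    values = {nid: inputs[:, k] for k, nid in enumerate(input_ids)}
    for node_id, bias, activation, links in node_evals:
        # Sum aggregation over incoming links, vectorized across the batch.
        total = np.full(inputs.shape[0], bias, dtype=float)
        for src, weight in links:
            total += weight * values[src]
        values[node_id] = activation(total)
    return np.stack([values[nid] for nid in output_ids], axis=1)
```

The Python-level work scales with the number of connections, not with the number of samples, which is what makes this approach useful for large input sets even when the network itself has no regular structure.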
For larger, more regular networks like those normally used in the common deep learning packages, I think it should be possible to come up with an indirect coding scheme (along the lines of Compositional Pattern-Producing Networks with HyperNEAT) for the network structure so that a fairly simple NEAT network generates the structure for a larger conventional network that is then trained using Tensorflow or some other package.
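A rough sketch of the indirect-encoding idea: the neurons of a larger, regular layer pair are placed at coordinates on a "substrate", and a small function (in HyperNEAT, an evolved CPPN; here a hand-written, hypothetical stand-in called `toy_cppn`) is queried with each pair of coordinates to produce a connection weight. The result is an ordinary dense weight matrix that another package could then train. Zeroing outputs whose magnitude falls below a cutoff follows a common HyperNEAT convention for deciding which connections are expressed; the cutoff value here is an arbitrary assumption.

```python
import numpy as np


def toy_cppn(x1, y1, x2, y2):
    # Hypothetical stand-in for an evolved CPPN: a Gaussian of the distance
    # between two neurons, modulated by a sine of their horizontal offset.
    g = np.exp(-((x2 - x1) ** 2 + (y2 - y1) ** 2))
    return g * np.sin(3.0 * (x2 - x1)) + 0.5 * g


def grid(n):
    """n x n neurons evenly spaced on [-1, 1]^2; returns (n*n, 2) coordinates."""
    xs = np.linspace(-1.0, 1.0, n)
    gx, gy = np.meshgrid(xs, xs)
    return np.column_stack([gx.ravel(), gy.ravel()])


def substrate_weights(src_coords, dst_coords, cppn, cutoff=0.2):
    """Query `cppn` for every (source, destination) neuron pair.

    Returns a (len(dst_coords), len(src_coords)) weight matrix; outputs with
    magnitude below `cutoff` are treated as "no connection" (weight 0).
    """
    x1, y1 = src_coords[None, :, 0], src_coords[None, :, 1]  # shape (1, S)
    x2, y2 = dst_coords[:, None, 0], dst_coords[:, None, 1]  # shape (D, 1)
    w = cppn(x1, y1, x2, y2)  # broadcasts to (D, S)
    return np.where(np.abs(w) >= cutoff, w, 0.0)
```

Because the CPPN is queried by coordinates, the same small function can generate weight matrices for substrates of any resolution, for example `substrate_weights(grid(8), grid(4), toy_cppn)` gives a 16×64 matrix.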
In the long run, there are some bottlenecks in the core NEAT algorithm and the Default* classes that I may implement in C (for example, the genome distance computation), but it would still be great to have at least a few examples that show how to make use of numba, Tensorflow, etc. where the application requires big, regular networks for handling large structured inputs like screen images.
My intent is to gradually get examples working for as many of the OpenAI Gym problems as possible, just to see if there's some NEAT-based approach to those problems that might result in more simple/efficient networks than you might get if you tried structuring them manually. I honestly suspect NEAT isn't actually a good fit for all of them, but it will be fun to try. :)
a53f53f
@CodeReclaimers, thank you very much for such a great answer! I think your answer should be posted on the issues page; it would be nice to see where others are hitting bottlenecks. For my project I am using training and validation data sets of 1-8 GB and hope to use larger ones. The main bottleneck for me was just transferring data to each process/genome, and it seems like I have solved that. I am a newbie at Python profiling; this is my first project where I am actually facing issues with processing speed. This week I will run a new version on an ipyparallel cluster with a large data set and share results on where I am hitting bottlenecks.
I am HYPED for the HyperNEAT implementation :)