Update OpenAI Lander example #252

Merged
20 changes: 11 additions & 9 deletions examples/openai-lander/evolve.py
@@ -16,7 +16,7 @@
 import neat
 import visualize
 
-NUM_CORES = 8
+NUM_CORES = multiprocessing.cpu_count()
 
 env = gym.make('LunarLander-v2')
 
@@ -86,21 +86,22 @@ def __init__(self, num_workers):
     def simulate(self, nets):
         scores = []
         for genome, net in nets:
-            observation = env.reset()
+            observation_init_vals, observation_init_info = env.reset()
             step = 0
             data = []
             while 1:
                 step += 1
                 if step < 200 and random.random() < 0.2:
                     action = env.action_space.sample()
                 else:
-                    output = net.activate(observation)
+                    output = net.activate(observation_init_vals)
Isn't this wrong? Shouldn't you have named this observation? Now it just feeds the initial observation every time through the loop, and the observation never changes. Same issue below!

@markste-in (Aug 13, 2023):

I think you are right. I created another PR (#274) ... maybe have a look at it and feel free to comment if you find something.
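For context, a minimal sketch of how the inner loop could keep a single `observation` variable that `env.step()` overwrites on every iteration. The `simulate_episode` helper and its `epsilon`/`max_random_steps` parameters are illustrative assumptions, and the five-value `step()` signature assumes a newer Gym/Gymnasium release; this is not necessarily what #274 merges.

```python
import random

import numpy as np

def simulate_episode(net, env, epsilon=0.2, max_random_steps=200):
    """Hypothetical helper: run one episode, always feeding the *current*
    observation to the network instead of the value captured at reset()."""
    observation, info = env.reset()  # newer Gym/Gymnasium reset() -> (obs, info)
    step = 0
    data = []
    while True:
        step += 1
        if step < max_random_steps and random.random() < epsilon:
            action = env.action_space.sample()
        else:
            output = net.activate(observation)  # current state, not the reset value
            action = np.argmax(output)

        # newer Gym/Gymnasium step() -> (obs, reward, terminated, truncated, info)
        observation, reward, terminated, truncated, info = env.step(action)
        data.append(np.hstack((observation, action, reward)))

        if terminated or truncated:
            break

    return np.array(data)
```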

                     action = np.argmax(output)
 
-                observation, reward, done, info = env.step(action)
+                # Note: done has been deprecated.
+                observation, reward, terminated, done, info = env.step(action)
                 data.append(np.hstack((observation, action, reward)))
 
-                if done:
+                if terminated:

See comment for line 223 -> same topic

                     break
 
             data = np.array(data)
@@ -202,7 +203,7 @@ def run():
             solved = True
             best_scores = []
             for k in range(100):
-                observation = env.reset()
+                observation_init_vals, observation_init_info = env.reset()
                 score = 0
                 step = 0
                 while 1:
@@ -211,14 +212,15 @@
                     # determine the best action given the current state.
                     votes = np.zeros((4,))
                     for n in best_networks:
-                        output = n.activate(observation)
+                        output = n.activate(observation_init_vals)
                         votes[np.argmax(output)] += 1
 
                     best_action = np.argmax(votes)
-                    observation, reward, done, info = env.step(best_action)
+                    # Note: done has been deprecated.
+                    observation, reward, terminated, done, info = env.step(best_action)
                     score += reward
                     env.render()
-                    if done:
+                    if terminated:

I would use the truncated state, since it seems to behave more like the old "done".
According to the docstring, truncated means:

truncated (bool): whether a truncation condition outside the scope of the MDP is satisfied.
               Typically a timelimit, but could also be used to indicate agent physically going out of bounds.
               Can be used to end the episode prematurely before a `terminal state` is reached.


It looks like the correct action here would have been `if terminated or truncated:`.
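As a reference, a small self-contained sketch of an episode loop that applies that check, assuming a Gym/Gymnasium version whose `reset()` returns `(obs, info)` and whose `step()` returns five values; the `random_policy` function here is only a placeholder for the ensemble vote over `best_networks` in the diff above.

```python
import gym  # assumes gym >= 0.26 (or gymnasium), where step() returns five values

env = gym.make('LunarLander-v2')

def random_policy(observation):
    # Placeholder for np.argmax(votes) over best_networks in the example.
    return env.action_space.sample()

observation, info = env.reset()
score = 0.0
while True:
    action = random_policy(observation)
    observation, reward, terminated, truncated, info = env.step(action)
    score += reward
    # `terminated`: a terminal state of the MDP was reached.
    # `truncated`: the episode was cut off (e.g. by the time limit).
    # Treating either flag as the old `done` ends the episode in both cases.
    if terminated or truncated:
        break

env.close()
print("episode score:", score)
```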

                         break
 
                 ec.episode_score.append(score)