Update OpenAI Lander example #252

Merged

Conversation

Warosaurus
Contributor

This PR updates the OpenAI Lander example. It addresses changes made in the upstream lander code to make this example work again.

     score += reward
     env.render()
-    if done:
+    if terminated:


I would use the `truncated` state, since it seems to behave more like the old `done`.
According to the docstring, `truncated` means:

truncated (bool): whether a truncation condition outside the scope of the MDP is satisfied.
               Typically a timelimit, but could also be used to indicate agent physically going out of bounds.
               Can be used to end the episode prematurely before a `terminal state` is reached.


It looks like the correct condition here would have been `if terminated or truncated`.
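The point about combining both flags can be illustrated with a minimal sketch. It does not use the real lander environment: `StubEnv` is a hypothetical stand-in that only mimics a five-value `step()` return (observation, reward, terminated, truncated, info), and `run_episode`, `crash_at`, and `time_limit` are names invented for this illustration.

```python
class StubEnv:
    """Hypothetical stand-in for an environment with a five-value step() return."""

    def __init__(self, crash_at=None, time_limit=5):
        self.crash_at = crash_at      # step at which a terminal state is reached, if any
        self.time_limit = time_limit  # step at which the episode is cut off
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0], {}  # (observation, info)

    def step(self, action):
        self.t += 1
        terminated = self.crash_at is not None and self.t >= self.crash_at
        truncated = not terminated and self.t >= self.time_limit
        return [float(self.t)], 1.0, terminated, truncated, {}


def run_episode(env):
    """Run one episode and stop on either end-of-episode flag."""
    observation, _info = env.reset()
    score = 0.0
    steps = 0
    while True:
        action = 0  # placeholder policy, assumed for the sketch
        observation, reward, terminated, truncated, _info = env.step(action)
        score += reward
        steps += 1
        # The old single `done` flag corresponds to either condition being true.
        if terminated or truncated:
            break
    return steps, score
```

With `StubEnv(crash_at=3)` the loop ends through `terminated`; with the default `StubEnv()` it ends only through `truncated`. A loop that checked `terminated` alone would never exit in the second case, and one that checked `truncated` alone would run past the terminal state in the first.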

     data.append(np.hstack((observation, action, reward)))

-    if done:
+    if terminated:


See the comment on line 223; the same applies here.

@coveralls

Coverage Status

Coverage decreased (-0.05%) to 95.16% when pulling 36dcd31 on Warosaurus:update_openai_lander_example into 4928381 on CodeReclaimers:master.


     step = 0
     data = []
     while 1:
         step += 1
         if step < 200 and random.random() < 0.2:
             action = env.action_space.sample()
         else:
-            output = net.activate(observation)
+            output = net.activate(observation_init_vals)

Isn't this wrong? Shouldn't this have been named `observation`? As written, it feeds the initial observation to the network on every pass through the loop, so the network's input never changes. Same issue below!


I think you are right. I created another PR; maybe have a look at it, and feel free to comment if you find something:
#274
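The stale-observation problem raised in this thread can be sketched as follows. This is not the example's actual code: `RecordingNet` and `CounterEnv` are hypothetical stand-ins (only the `activate()` method name comes from the diff above), and the fix shown, rebinding `observation` from each `step()` result, is an assumption about what the corrected loop should do.

```python
class RecordingNet:
    """Hypothetical stand-in for a network; records every input it is given."""

    def __init__(self):
        self.seen = []

    def activate(self, inputs):
        self.seen.append(list(inputs))
        return [0.0]


class CounterEnv:
    """Hypothetical environment whose observation is simply the step count."""

    def reset(self):
        self.t = 0
        return [0.0], {}  # (observation, info)

    def step(self, action):
        self.t += 1
        truncated = self.t >= 3  # cut the episode off after three steps
        return [float(self.t)], 0.0, False, truncated, {}


def rollout(net, env):
    """Feed the network the latest observation on every pass through the loop."""
    # Bind the reset value to the same name the loop reads and updates.
    observation, _info = env.reset()
    while True:
        output = net.activate(observation)
        action = 0  # placeholder; a real policy would derive this from `output`
        # Rebinding `observation` here is what keeps the network input fresh.
        observation, _reward, terminated, truncated, _info = env.step(action)
        if terminated or truncated:
            break
    return net.seen
```

Calling `rollout(RecordingNet(), CounterEnv())` records a different input on each step. If the reset value were stored under a separate name and that name were passed to `activate()` inside the loop, every recorded input would be the initial observation.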

@markste-in

Follow up:
#274

5 participants