Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CartPole] Add sutton_barto_reward argument #958

Merged
merged 15 commits into from
Mar 12, 2024
Merged

Conversation

Kallinteris-Andreas
Copy link
Collaborator

@Kallinteris-Andreas Kallinteris-Andreas commented Mar 8, 2024

Description

Adds sutton_barto_reward environment parameter to CartPole to support the reward function in Sutton and Barto, Reinforcement Learning an introduction. This parameter is turned off by default.

Fixes #790

Type of change

  • Documentation only change (no code changed)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@Kallinteris-Andreas Kallinteris-Andreas changed the title cartpole v2 add cartpole v2 Mar 8, 2024
@Kallinteris-Andreas
Copy link
Collaborator Author

@pseudo-rnd-thoughts
I have changed the CartPole code and the documentation, can you review it and if it passes review I can move into updating the vector and jax versions

@RedTachyon
Copy link
Member

Why do we want this to be a v2, as opposed to just adding the optional argument to the constructor and letting users use it?

And in any case, please write the reward computation a bit more explicitly, converting booleans to floats and manipulating them is clever and all that, but unreadable and annoying to maintain in the future. A ternary operator would probably work nicely here.

I didn't participate in the original issue, and I'm overall not convinced this is necessary since both rewards are perfectly valid (albeit a bit different), but I won't oppose adding it as an optional setting. However, if we make it the v2 version, there are two problems:

  • The already messy fact that we have v0 and v1 concurrently, becomes even more messy due to having v0, v1 and v2
  • We are sorta officially stating that v2 is the "correct" version, which I'm not fully convinced of (though I might need to read the original thread in more detail)

@Kallinteris-Andreas
Copy link
Collaborator Author

As far as I can tell, new arguments have not been added mid-environment version

Also I believe that the previous version was technical incorrect (even if it worked.)

@pseudo-rnd-thoughts
Copy link
Member

Environment versions have been used to indicate that a bug in the previous version has been fixed, which would change the dynamic, so users should be alerted to this change.

Therefore, as this is an addition that can't affect previous users, this should not be a version change, particularly as a feature is turned off by default

@Kallinteris-Andreas Kallinteris-Andreas changed the title add cartpole v2 [CartPole] Add sutton_barto_reward argument Mar 11, 2024
@Kallinteris-Andreas
Copy link
Collaborator Author

Now it simply adds the new argument. I have redone the pull request. And does not bump the environment version

Copy link
Member

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the two comments

gymnasium/envs/classic_control/cartpole.py Show resolved Hide resolved
gymnasium/envs/classic_control/cartpole.py Outdated Show resolved Hide resolved
gymnasium/envs/classic_control/cartpole.py Outdated Show resolved Hide resolved
@Kallinteris-Andreas
Copy link
Collaborator Author

I cannot understand why the testing is failing.

@Kallinteris-Andreas Kallinteris-Andreas marked this pull request as ready for review March 12, 2024 12:24
@Kallinteris-Andreas Kallinteris-Andreas merged commit c57f6d8 into main Mar 12, 2024
16 checks passed
@Kallinteris-Andreas
Copy link
Collaborator Author

Kallinteris-Andreas commented Mar 12, 2024

I realized after merging that we need to update the vector and jax versions also

what is the vectorized version's reward

reward = np.ones_like(terminated, dtype=np.float32)

@Kallinteris-Andreas Kallinteris-Andreas deleted the cartpole-v2 branch March 23, 2024 14:36
@Kallinteris-Andreas Kallinteris-Andreas restored the cartpole-v2 branch March 23, 2024 14:37
@Kallinteris-Andreas Kallinteris-Andreas deleted the cartpole-v2 branch March 23, 2024 14:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug Report] CartPole's reward function constantly returns 1 (even when it falls)
3 participants