[CartPole] Add `sutton_barto_reward` argument #958

Kallinteris-Andreas · 2024-03-08T08:40:33Z

Description

Adds a `sutton_barto_reward` environment parameter to CartPole to support the reward function from Sutton and Barto, Reinforcement Learning: An Introduction. The parameter is turned off by default.

Fixes #790

Type of change

Documentation only change (no code changed)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Checklist:

I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Kallinteris-Andreas · 2024-03-08T09:28:31Z

@pseudo-rnd-thoughts
I have changed the CartPole code and the documentation. Can you review it? If it passes review, I can move on to updating the vector and JAX versions.

RedTachyon · 2024-03-08T10:31:10Z

Why do we want this to be a v2, as opposed to just adding the optional argument to the constructor and letting users use it?

And in any case, please write the reward computation a bit more explicitly. Converting booleans to floats and manipulating them is clever and all that, but unreadable and annoying to maintain in the future. A ternary operator would probably work nicely here.

I didn't participate in the original issue, and I'm overall not convinced this is necessary since both rewards are perfectly valid (albeit a bit different), but I won't oppose adding it as an optional setting. However, if we make it the v2 version, there are two problems:

The already messy fact that we have v0 and v1 concurrently becomes even messier with v0, v1 and v2
We are sorta officially stating that v2 is the "correct" version, which I'm not fully convinced of (though I might need to read the original thread in more detail)
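
The ternary form suggested above could look roughly like the following. This is a minimal sketch, not the merged code: the helper name `cartpole_reward` is hypothetical, it ignores steps taken after the episode has already terminated, and it assumes the reward values documented for CartPole (+1 per step by default; 0 per step and -1 on the terminating step for the Sutton and Barto variant).

```python
def cartpole_reward(terminated: bool, sutton_barto_reward: bool = False) -> float:
    """Reward for a single step (hypothetical helper, not the merged code)."""
    if sutton_barto_reward:
        # Sutton and Barto variant: 0 while the pole is up, -1 on the failing step.
        return -1.0 if terminated else 0.0
    # Default reward: +1 for every step, including the terminating one.
    return 1.0
```

Each branch names its reward values directly, so no boolean-to-float arithmetic is needed.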

Kallinteris-Andreas · 2024-03-08T14:01:54Z

As far as I can tell, new arguments have not been added to an environment mid-version.

Also, I believe the previous version was technically incorrect (even if it worked).

pseudo-rnd-thoughts · 2024-03-11T11:41:00Z

Environment versions have been used to indicate that a bug in the previous version has been fixed, which would change the dynamic, so users should be alerted to this change.

Therefore, as this is an addition that can't affect previous users, it should not be a version change, particularly as the feature is turned off by default.

Kallinteris-Andreas · 2024-03-11T20:42:13Z

I have redone the pull request. It now simply adds the new argument and does not bump the environment version.

pseudo-rnd-thoughts

With the two comments

gymnasium/envs/classic_control/cartpole.py

Kallinteris-Andreas · 2024-03-12T11:41:31Z

I cannot understand why the testing is failing.

Kallinteris-Andreas · 2024-03-12T15:10:44Z

I realized after merging that we also need to update the vector and JAX versions.

What is the vectorized version's reward?

Gymnasium/gymnasium/envs/classic_control/cartpole.py

Line 469 in c57f6d8

reward = np.ones_like(terminated, dtype=np.float32)
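
For the question above, one way the vectorized reward could honor the same flag is sketched below. This is an assumption, not the code that was later merged: the function name `vector_cartpole_reward` is hypothetical, and the sketch only covers the per-step reward for a batch of sub-environments, not the reset handling of terminated ones.

```python
import numpy as np


def vector_cartpole_reward(
    terminated: np.ndarray, sutton_barto_reward: bool = False
) -> np.ndarray:
    """Per-environment rewards for one batched step (hypothetical helper)."""
    if sutton_barto_reward:
        # -1 where a sub-environment just terminated, 0 everywhere else.
        return np.where(terminated, -1.0, 0.0).astype(np.float32)
    # Default behaviour, matching the referenced line: +1 for every sub-environment.
    return np.ones_like(terminated, dtype=np.float32)
```

With `terminated = np.array([False, True, False])`, the default path returns all ones, while `sutton_barto_reward=True` gives -1 only at the terminated index.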

Kallinteris-Andreas added 3 commits March 8, 2024 10:38

Update cartpole.py

3831ebc

Update __init__.py

e1bf278

fixes

5f83cee

Kallinteris-Andreas changed the title ~~cartpole v2~~ add cartpole v2 Mar 8, 2024

Update cartpole.py

4d8a5dc

Kallinteris-Andreas added 2 commits March 11, 2024 20:41

Update __init__.py

0656ce6

Update cartpole.py

97e10fb

Kallinteris-Andreas changed the title ~~add cartpole v2~~ [CartPole] Add sutton_barto_reward argument Mar 11, 2024

Kallinteris-Andreas added 5 commits March 11, 2024 20:49

Update cartpole.py

22c8fd3

Update __init__.py

2eb6182

Update cartpole.py

98b9929

pre-commit

dc2f139

float rewards

0da2a1c

Kallinteris-Andreas requested a review from pseudo-rnd-thoughts March 11, 2024 20:42

pseudo-rnd-thoughts approved these changes Mar 12, 2024

Kallinteris-Andreas added 2 commits March 12, 2024 10:39

.

80d2618

pre-commit

d534c22

pseudo-rnd-thoughts and others added 2 commits March 12, 2024 11:43

Update cartpole.py

6cf72af

pre-commit

6947ad2

Kallinteris-Andreas marked this pull request as ready for review March 12, 2024 12:24

Kallinteris-Andreas merged commit c57f6d8 into main Mar 12, 2024
16 checks passed

Kallinteris-Andreas deleted the cartpole-v2 branch March 23, 2024 14:36

Kallinteris-Andreas restored the cartpole-v2 branch March 23, 2024 14:37

Kallinteris-Andreas deleted the cartpole-v2 branch March 23, 2024 14:37
