Add Gym 0.26 support #780

carlosluis · 2022-02-19T23:31:05Z

EDIT: Please use the Gymnasium branch instead of this one: #1327
(the new branch is compatible with gym 0.21, gym 0.26 and gymnasium)

See comment #780 (comment) to use this PR

To install SB3 with gym 0.26+ support:

pip install git+https://github.com/carlosluis/stable-baselines3@fix_tests

(or as a requirement: git+https://github.com/carlosluis/stable-baselines3@fix_tests#egg=stable_baselines3[extra,tests,docs] and "sb3_contrib @ git+https://github.com/Stable-Baselines-Team/stable-baselines3-contrib@feat/new-gym-version" in a setup.py see DLR-RM/rl-baselines3-zoo#256)

Note: if you want to use gymnasium, you can do:

import sys
import gymnasium
sys.modules["gym"] = gymnasium

before any stable-baselines3 import.

for native gymnasium support, you can take a look at #1327 or install it with:

pip install git+https://github.com/DLR-RM/stable-baselines3@feat/gymnasium-support

Description

Gym 0.26 has been released and with it breaking changes. The objective of this PR is to fix all the failing tests.

Moving from Gym 0.26.2 to gymanisum 0.26.2 (which are the same) is part of this PR.

Missing:

update changelog
update notebooks and check it works with gym 0.26 (Update HER + update to gym 0.26+ Stable-Baselines-Team/rl-colab-notebooks#9)
update RL Zoo and check it works with gym 0.26 (Add Gym 0.26 support rl-baselines3-zoo#256, not ready, gym needs to be fixed first)
update tutorials (Update to gym 0.26 araffin/rl-tutorial-jnrr19#13)
prepare docker image
~~push docker images~~
update SB3 contrib (Add support for Gym 0.26 Stable-Baselines-Team/stable-baselines3-contrib#71)
Check test environments [Feature Request] Pass all gym Environments in this repo through the env checkers during testing #1156
Use shimmy to automatically convert gym envs, see Gymnasium Integration thu-ml/tianshou#789
(note: waiting for Add check when registering gym compatibility envs to avoid gymnasium warning Farama-Foundation/Shimmy#29 and proper decoupling with gymnasium)

Motivation and Context

Gym release notes

Types of changes

Bug fix (non-breaking change which fixes an issue)

closes #840 #871
closes #271
closes #1156

Checklist:

I've read the CONTRIBUTION guide (required)
I have updated the changelog accordingly (required).
My change requires a change to the documentation.
I have updated the tests accordingly (required for a bug fix or a new feature).
I have updated the documentation accordingly.
I have reformatted the code using make format (required)
I have checked the codestyle using make check-codestyle and make lint (required)
I have ensured make pytest and make type both pass. (required)
I have checked that the documentation builds using make doc (required)

Status

There's still 28 failing tests when running locally, need to investigate further

araffin · 2022-02-21T16:16:58Z

it seems that the failures come from pygame not being installed somehow, probably an issue from gym...

carlosluis · 2022-02-21T19:55:46Z

it seems that the failures come from pygame not being installed somehow, probably an issue from gym...

Only a subset of the failures, those are fixed now.

carlosluis · 2022-02-21T22:24:16Z

@RedTachyon I'm currently investigating a bug which may be related to openai/gym#2422.

Here's a stack trace of the type of errors:

tests/test_save_load.py:216: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
stable_baselines3/dqn/dqn.py:258: in learn
    return super(DQN, self).learn(
stable_baselines3/common/off_policy_algorithm.py:354: in learn
    rollout = self.collect_rollouts(
stable_baselines3/common/off_policy_algorithm.py:584: in collect_rollouts
    actions, buffer_actions = self._sample_action(learning_starts, action_noise, env.num_envs)
stable_baselines3/common/off_policy_algorithm.py:415: in _sample_action
    unscaled_action, _ = self.predict(self._last_obs, deterministic=False)
stable_baselines3/dqn/dqn.py:238: in predict
    action = np.array([self.action_space.sample() for _ in range(n_batch)])
stable_baselines3/dqn/dqn.py:238: in <listcomp>
    action = np.array([self.action_space.sample() for _ in range(n_batch)])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = Discrete(10)

    def sample(self) -> int:
>       return self.start + self.np_random.randint(self.n)
E       AttributeError: 'numpy.random._generator.Generator' object has no attribute 'randint'

I've been digging into the code and the problem seems to be that this line

def sample(self) -> int:
        return self.start + self.np_random.randint(self.n)

is being called with self.np_random of type numpy.random._generator.Generator, instead of gym.utils.seeding.RandomNumberGenerator. The former doesn't have a method randint so the code fails.

If it's any help, this is only happening after loading a model, so the problem may be related to it (I haven't checked much on that front yet, wanted to rule out something obvious with the changes on seeding first). Here's an example test were it fails

stable-baselines3/tests/test_save_load.py

Line 216 in 58a9806

model.learn(total_timesteps=64, reset_num_timesteps=False)

It seems discrete.py in Gym assumes np_random will always be of some type but doesn't enforce it explicitly? Or there's something funny happening with the loading...

RedTachyon · 2022-02-22T10:14:52Z

I think I found the issue - the custom RNG class inherits from the numpy Generator for compatibility purposes, but when it gets pickled/unpickled, it defaults to the numpy behavior which creates a new Generator.

I think I see two main ways to solve it - either in __setstate__ of Space make sure that a RNG is set instead of a Generator; or in the RNG code do some custom code to pickle/unpickle the proper class

carlosluis · 2022-02-22T18:42:54Z

I think I found the issue - the custom RNG class inherits from the numpy Generator for compatibility purposes, but when it gets pickled/unpickled, it defaults to the numpy behavior which creates a new Generator.

I think I see two main ways to solve it - either in __setstate__ of Space make sure that a RNG is set instead of a Generator; or in the RNG code do some custom code to pickle/unpickle the proper class

I see, thanks for checking. I wonder what this means for SB3 adopting Gym 0.22, since the problem seems to be on Gym's side. I'm guessing the fix (whenever it comes) won't be available for SB3 until the next release by Gym?

araffin · 2022-02-23T15:59:22Z

won't be available for SB3 until the next release by Gym?

openai/gym#2640 must be fixed first too

jkterry1 · 2022-02-23T19:52:53Z

We're going to do a 0.22.1 release in the new future that will fix all of these things, some of the relevant fixes here have already been merged

stable_baselines3/common/envs/bit_flipping_env.py

carlosluis · 2022-03-03T20:10:09Z

Update on this:

Confirmed that Fix RandomNumberGenerator pickling openai/gym#2639 solves the bug I mentioned before, so in the next Gym release those tests will pass.
There's three remaining tests failing, two separate bugs

Mean reward is below desired threshold.

FAILED tests/test_identity.py::test_discrete[env2-A2C] - AssertionError: Mean reward below threshold: 88...
FAILED tests/test_identity.py::test_discrete[env2-PPO] - AssertionError: Mean reward below threshold: 86...

Not sure exactly how gym's updates are causing this lower mean reward, need to check this further

Warning not raised in HER test

FAILED tests/test_her.py::test_save_load_replay_buffer[True-False] - assert 0 == 1

Caused because the trajectory truncation is not raising a warning in these lines:

stable-baselines3/tests/test_her.py

Lines 258 to 261 in cdaa9ab

    
           if truncate_last_trajectory: 
        
               assert len(recwarn) == 1 
        
               warning = recwarn.pop(UserWarning) 
        
               assert "The last trajectory in the replay buffer will be truncated" in str(warning.message)

EDIT: CI also shows about 3 tests failing with EOFError, and I also observed those locally, but they seem to be also solved when using gym master (tested locally).

carlosluis · 2022-03-08T20:00:51Z

Regarding the "mean reward below threshold" problem, after some further investigations the root cause is the change in seeding behaviour in gym.

SB3 sets seeds here:

stable-baselines3/stable_baselines3/common/base_class.py

Line 576 in e88eb1c

self.env.seed(seed)

Before gym 0.22.0, the default seed() method was not actually setting any seed [source]. In gym 0.22 the default seed() does set the seed [source].

The consequence of it is, for instance, that the result of

stable-baselines3/stable_baselines3/common/evaluation.py

Line 82 in e88eb1c

observations = env.reset()

is not consistent between gym==0.21.0 and gym >= 0.22.0. The solution I say makes the most sense is to simply adjust the threshold value in the test.

araffin · 2022-03-08T20:44:50Z

Before gym 0.22.0, the default seed() method was not actually setting any seed [source]. In gym 0.22 the default seed() does set the seed [source].

I see...

is not consistent between gym==0.21.0 and gym >= 0.22.0.

is there any way to make it consistent?

The solution I say makes the most sense is to simply adjust the threshold value in the test.

The whole point of those performance tests is to detect any change that may injure performance/change results. (they fail easily as soon as any minor change is made) So we should not change the threshold but rather fix the underlying change/issue, or in the current case, I would prefer changing the seed if we cannot have consistent behavior with previous version.

btw, are all those warnings (see below) due to gym only?

gym/utils/seeding.py:47: 1 warning
tests/test_callbacks.py: 4153 warnings
tests/test_cnn.py: 377 warnings
tests/test_custom_policy.py: 106 warnings
tests/test_deterministic.py: 238 warnings
tests/test_dict_env.py: 14116 warnings
tests/test_envs.py: 2 warnings
tests/test_her.py: 1566 warnings
tests/test_identity.py: 18153 warnings
tests/test_logger.py: 300 warnings
tests/test_monitor.py: 3000 warnings
tests/test_predict.py: 109 warnings
tests/test_run.py: 222 warnings
tests/test_save_load.py: 6597 warnings
tests/test_spaces.py: 1110 warnings
tests/test_train_eval_mode.py: 31 warnings
tests/test_utils.py: 12 warnings
tests/test_vec_envs.py: 216 warnings
tests/test_vec_monitor.py: 4000 warnings
tests/test_vec_normalize.py: 6547 warnings
  /opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/gym/utils/seeding.py:47: DeprecationWarning: WARN: Function `rng.randint(low, [high, size, dtype])` is marked as deprecated and will be removed in the future. Please use `rng.integers(low, [high, size, dtype])` instead.

same for

tests/test_envs.py::test_non_default_action_spaces[new_action_space4]
  /home/runner/work/stable-baselines3/stable-baselines3/stable_baselines3/common/env_checker.py:272: UserWarning: We recommend you to use a symmetric and normalized Box action space (range=[-1, 1]) cf https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html
    warnings.warn(

tests/test_utils.py: 16 warnings
  /opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/gym/utils/seeding.py:138: DeprecationWarning: WARN: Function `hash_seed(seed, max_bytes)` is marked as deprecated and will be removed in the future. 
    deprecation(

tests/test_utils.py: 16 warnings
  /opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/gym/utils/seeding.py:175: DeprecationWarning: WARN: Function `_bigint_from_bytes(bytes)` is marked as deprecated and will be removed in the future. 
    deprecation(

?

carlosluis · 2022-03-08T20:47:56Z

Similarly, the HER test failure was also caused by the change in RNG. The failing test required that after training for 200 steps, the env had to be left in the middle of a trajectory. After the change in gym, the RNG gods decided that after 200 steps the env had just finished an episode, i.e., self.current_idx == 0 in

stable-baselines3/stable_baselines3/her/her_replay_buffer.py

Line 523 in e88eb1c

if current_idx > 0:

==> warning is never raised ==> test assertion fails.

A simple change in the seed fixes the test.

…nto fix_tests

araffin · 2023-03-29T23:34:34Z

Closing this one as I just released an alpha version of the Gymnasium branch on PyPi:

# sb3 contrib installs sb3 too
pip install "sb3_contrib>=2.0.0a1" --upgrade

Documentation is available here: https://stable-baselines3.readthedocs.io/en/feat-gymnasium-support/

rfali · 2023-05-16T19:09:16Z

@araffin if I understand correctly, SB3 directly jumped from supporting gym==0.21 to gymnasium==0.28.1?

araffin · 2023-05-16T19:33:53Z

SB3 2.x (master version) supports gymnasium 0.28.1 and gym 0.21/0.26 via shimmy

rfali · 2023-05-16T19:44:57Z

@araffin is there any documentation/reference on how to use SB3 with gym 0.21/0.26 via shimmy? I saw the install requirements of master branch and it only mentioned gymnasium (does not include gym).

araffin · 2023-05-17T07:25:00Z

is there any documentation/reference on how to use SB3 with gym 0.21/0.26 via shimmy?

It is automatic if you use make_vec_env (and if you pass the env to the constructor), otherwise take a look at shimmy documentation.

stable-baselines3/stable_baselines3/common/env_util.py

Lines 99 to 101 in fd0cd82

    
           env = env_id(**env_kwargs) 
        
           # Patch to support gym 0.21/0.26 and gymnasium 
        
           env = _patch_env(env)

https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/vec_env/dummy_vec_env.py#L30

Patch env is defined here:
https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/vec_env/patch_gym.py#L15

and yes gym is an optional dependency.

carlosluis added 3 commits February 19, 2022 22:48

Remove references to GoalEnv

ee71299

Fix env tests

65343f5

Fix bug in test creating invalid box space

4d794f3

Miffyli added the Maintainers on vacation Maintainers are on vacation so they can recharge their batteries, we will be back soon ;) label Feb 20, 2022

araffin mentioned this pull request Feb 21, 2022

Gym fixes - Follow up from #705 #734

Merged

11 tasks

araffin removed the Maintainers on vacation Maintainers are on vacation so they can recharge their batteries, we will be back soon ;) label Feb 21, 2022

Add classic_control extra packages from gym

513ed08

araffin mentioned this pull request Feb 21, 2022

Pin gym version #782

Merged

14 tasks

Merge branch 'master' into fix_tests

861b612

araffin mentioned this pull request Feb 23, 2022

[Bug] pip install fails on atari-py #779

Closed

3 tasks

araffin reviewed Feb 25, 2022

View reviewed changes

stable_baselines3/common/envs/bit_flipping_env.py Show resolved Hide resolved

RedTachyon mentioned this pull request Feb 25, 2022

Fix RandomNumberGenerator pickling openai/gym#2639

Merged

carlosluis added 4 commits February 28, 2022 21:52

Merge branch 'master' into fix_tests

2277012

Change back to gym 0.22 for testing

e5195a0

Fix failing set_env test

435f5fb

Fix test failiing due to deprectation of env.seed

f64346a

Merge branch 'master' into fix_tests

9efc37b

Adjust mean reward threshold in failing test

daaa84c

Fix her test failing due to rng

e62edde

vamsianumula mentioned this pull request Feb 28, 2023

Slime env compatible with SB3 LucaInis/Slime-Enviroment-GIT-#5

Merged

elliottower mentioned this pull request Mar 2, 2023

[Question] recommended way to convert gymnasium/gym v26 envs to SB3 VecEnvs #1356

Closed

4 tasks

Merge branch 'master' into fix_tests

51ffad2

This was referenced Mar 7, 2023

[Question] Wrappers for Atari games Farama-Foundation/Gymnasium#364

Closed

Update sb3_vector_wrapper to fix compatibility issues Farama-Foundation/SuperSuit#206

Closed

elliottower mentioned this pull request Mar 7, 2023

Update pettingzoo stable-baselines3 training script & compatibility wrapper google-deepmind/meltingpot#117

Closed

araffin added 5 commits March 11, 2023 22:52

Merge branch 'master' into fix_tests

00b9bbb

Merge branch 'fix_tests' of github.com:carlosluis/stable-baselines3 i…

8a149f4

…nto fix_tests

Forks don't have access to private variables

b82cacd

Merge branch 'master' into fix_tests

07d2171

Fix linter warnings

d0f5e8a

araffin mentioned this pull request Mar 13, 2023

Update doc about Gymnasium support #1382

Merged

16 tasks

araffin and others added 9 commits March 14, 2023 12:55

[ci skip] Merge branch 'master' into fix_tests

d5c79b0

Merge branch 'master' into fix_tests

e91b436

Merge branch 'fix_tests' of github.com:carlosluis/stable-baselines3 i…

5ec7e39

…nto fix_tests

Fix env checker for GoalEnv

986e6c0

Merge branch 'master' into fix_tests

2be728d

Update env checker (more info) and fix dtype

331853a

Use micromamab for Docker

68861b6

Update dependencies

9f0d5d8

Clarify VecEnv doc

6617e6e

araffin closed this Mar 29, 2023

dancju mentioned this pull request May 4, 2023

fix Poetry dependency AI4Finance-Foundation/FinRL#997

Merged

araffin mentioned this pull request May 24, 2023

[Feature Request] Gymnasium: Add options argument to VecEnv.reset() #1523

Closed

1 task

araffin mentioned this pull request Aug 22, 2023

Fix type hints for callbacks, utils and VecTranspose #1648

Merged

16 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Gym 0.26 support #780

Add Gym 0.26 support #780

carlosluis commented Feb 19, 2022 •

edited by araffin

araffin commented Feb 21, 2022

carlosluis commented Feb 21, 2022

carlosluis commented Feb 21, 2022

RedTachyon commented Feb 22, 2022

carlosluis commented Feb 22, 2022

araffin commented Feb 23, 2022

jkterry1 commented Feb 23, 2022

carlosluis commented Mar 3, 2022 •

edited

carlosluis commented Mar 8, 2022

araffin commented Mar 8, 2022

carlosluis commented Mar 8, 2022

araffin commented Mar 29, 2023

rfali commented May 16, 2023

araffin commented May 16, 2023

rfali commented May 16, 2023

araffin commented May 17, 2023

Add Gym 0.26 support #780

Add Gym 0.26 support #780

Conversation

carlosluis commented Feb 19, 2022 • edited by araffin

Description

Motivation and Context

Types of changes

Checklist:

Status

araffin commented Feb 21, 2022

carlosluis commented Feb 21, 2022

carlosluis commented Feb 21, 2022

RedTachyon commented Feb 22, 2022

carlosluis commented Feb 22, 2022

araffin commented Feb 23, 2022

jkterry1 commented Feb 23, 2022

carlosluis commented Mar 3, 2022 • edited

carlosluis commented Mar 8, 2022

araffin commented Mar 8, 2022

carlosluis commented Mar 8, 2022

araffin commented Mar 29, 2023

rfali commented May 16, 2023

araffin commented May 16, 2023

rfali commented May 16, 2023

araffin commented May 17, 2023

carlosluis commented Feb 19, 2022 •

edited by araffin

carlosluis commented Mar 3, 2022 •

edited