
[Bug]: Callback returns aren't handled properly by either OffPolicyAlgorithm or OnPolicyAlgorithm #1706

Closed
5 tasks done
iwishiwasaneagle opened this issue Oct 3, 2023 · 8 comments · Fixed by #1707
Assignees
Labels
enhancement New feature or request

Comments

@iwishiwasaneagle
Contributor

🐛 Bug

OffPolicyAlgorithm and OnPolicyAlgorithm evaluate callback return values incorrectly using is False, which only works if the returned value is the singleton boolean. This was fine for the default callbacks, which use the singleton False, but it breaks down if the returned value is the result of a NumPy or torch operation such as np.any(np.ones(10) < 0), which yields np.bool_(False) rather than False.
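The failure can be demonstrated in isolation, without any SB3 code: NumPy's boolean scalar is a different object from the built-in False singleton, so an identity check misses it while a truthiness check does not. A minimal sketch:

```python
import numpy as np

# A NumPy comparison/reduction yields np.bool_, not the built-in bool
result = np.any(np.ones(10) < 0)

# Identity check (as used in the training loops) fails to detect falsiness
print(result is False)  # False: np.bool_(False) is not the False singleton
# Truthiness check handles it correctly
print(not result)       # True
```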

To Reproduce

import gymnasium
import numpy as np
from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import BaseCallback, EvalCallback

class AlwaysFailCallback(BaseCallback):
    def _on_step(self):
        return np.bool_(0)

env_id = "Pendulum-v1"
model = SAC("MlpPolicy", env_id, policy_kwargs=dict(net_arch=[32]))
eval_callback = EvalCallback(
    gymnasium.make(env_id),
    callback_after_eval=AlwaysFailCallback(),
    n_eval_episodes=1,
    eval_freq=1,
    warn=False,
)
model.learn(3, callback=eval_callback)

assert eval_callback.n_calls == 1

Relevant log output / Error message

Eval num_timesteps=1, episode_reward=-1035.10 +/- 0.00
Episode length: 200.00 +/- 0.00
New best mean reward!
Eval num_timesteps=2, episode_reward=-1251.05 +/- 0.00
Episode length: 200.00 +/- 0.00
Eval num_timesteps=3, episode_reward=-1185.56 +/- 0.00
Episode length: 200.00 +/- 0.00

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[17], line 21
     12 eval_callback = EvalCallback(
     13     gymnasium.make(env_id),
     14     callback_after_eval=AlwaysFailCallback(),
   (...)
     17     warn=False,
     18 )
     19 model.learn(3, callback=eval_callback)
---> 21 assert eval_callback.n_calls == 1

AssertionError:

System Info

  • OS: Linux-5.15.131-1-MANJARO-x86_64-with-glibc2.35 # 1 SMP PREEMPT Thu Sep 7 11:05:18 UTC 2023
  • Python: 3.10.10
  • Stable-Baselines3: 2.0.0
  • PyTorch: 2.0.1+cu117
  • GPU Enabled: False
  • Numpy: 1.23.5
  • Cloudpickle: 2.2.1
  • Gymnasium: 0.28.1

Checklist

  • My issue does not relate to a custom gym environment. (Use the custom gym env template instead)
  • I have checked that there is no similar issue in the repo
  • I have read the documentation
  • I have provided a minimal and working example to reproduce the bug
  • I've used the markdown code blocks for both code and stack traces.
@iwishiwasaneagle
Contributor Author

Just FYI for anyone hitting this issue: wrap your callback's return value with bool, e.g. return bool(x).
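As a concrete illustration of this workaround (sketched with a standalone stand-in function rather than an actual SB3 callback), casting with bool() restores the singleton that the is False check expects:

```python
import numpy as np

def on_step():
    # Hypothetical stand-in for a callback's _on_step: cast the NumPy
    # scalar to a true Python bool before returning it
    return bool(np.bool_(0))

value = on_step()
print(value is False)  # True: bool() returns the False singleton
```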

@qgallouedec qgallouedec self-assigned this Oct 3, 2023
@qgallouedec
Collaborator

I agree that not x should be preferred to x is False (after all, who am I to give my opinion? It's written in PEP8), however, I wouldn't say it's a bug insofar as _on_step is supposed to return a bool:

def _on_step(self) -> bool:

In this respect, the example you give is a misuse.

>>> isinstance(np.bool_(0), bool)
False

@qgallouedec qgallouedec removed the bug Something isn't working label Oct 3, 2023
@araffin
Member

araffin commented Oct 3, 2023

Hello,
this is intended and that's for legacy reasons (callback used to return None, and the not ... wouldn't work), but it's probably time for a breaking change, as the numpy bool not being a true bool is a known limitation

@araffin
Member

araffin commented Oct 4, 2023

The full story:

@araffin
Member

araffin commented Oct 4, 2023

I agree that not x should be preferred to x is False (after all, who am I to give my opinion? It's written in PEP8), however, I wouldn't say it's a bug insofar as _on_step is supposed to return a bool:

If we allow that change, there will be some weird behavior allowed, like returning {} to stop training.

@iwishiwasaneagle
Contributor Author

I agree that not x should be preferred to x is False (after all, who am I to give my opinion? It's written in PEP8), however, I wouldn't say it's a bug insofar as _on_step is supposed to return a bool:

def _on_step(self) -> bool:

In this respect, the example you give is a misuse.

>>> isinstance(np.bool_(0), bool)
False

Ah okay, my bad. So it was intended to be ONLY True or False?

@iwishiwasaneagle
Contributor Author

Hello, this is intended and that's for legacy reasons (callback used to return None, and the not ... wouldn't work), but it's probably time for a breaking change, as the numpy bool not being a true bool is a known limitation

What about something like if x is None or not x (assuming None is supposed to have the same effect as False)?
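A quick sketch of this proposed condition (the function name is illustrative, not SB3 API) shows it treats None, Python bools, and NumPy bools uniformly:

```python
import numpy as np

def should_stop(x):
    # Proposed replacement for `x is False`: stop when the callback
    # returns None or any falsy value, NumPy booleans included
    return x is None or not x

print(should_stop(None))         # True
print(should_stop(np.bool_(0)))  # True
print(should_stop(False))        # True
print(should_stop(True))         # False
```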

@iwishiwasaneagle
Contributor Author

I agree that not x should be preferred to x is False (after all, who am I to give my opinion? It's written in PEP8), however, I wouldn't say it's a bug insofar as _on_step is supposed to return a bool:

If we allow that change, there will be some weird behavior allowed, like returning {} to stop training.

Well, technically an empty sequence is supposed to be falsy, so, other than being an incorrect type, it's not technically wrong.
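The truthiness point can be verified directly: an empty dict is falsy, so under a not x check it would stop training, whereas the current identity check ignores it entirely:

```python
x = {}             # an empty container is falsy...
print(not x)       # True: a `not x` check would treat this as "stop"
print(x is False)  # False: the current identity check ignores it
```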

@araffin araffin added the enhancement New feature or request label Oct 4, 2023