
[Bug]: Callback returns aren't handled properly by either OffPolicyAlgorithm or OnPolicyAlgorithm #1706

Closed
5 tasks done
iwishiwasaneagle opened this issue Oct 3, 2023 · 8 comments · Fixed by #1707
Assignees
Labels
enhancement New feature or request

Comments

@iwishiwasaneagle
Contributor

🐛 Bug

OffPolicyAlgorithm and OnPolicyAlgorithm evaluate callback return values incorrectly using is False, which only works if the returned value is the singleton boolean. This was fine for the default callbacks, which use the singleton False, but it breaks down if the returned value is the result of a NumPy or torch operation such as np.any(np.ones(10) < 0), which yields np.bool_(False) rather than False.
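The failure can be demonstrated in isolation, without any SB3 code: NumPy's boolean scalar is a different object from the built-in False singleton, so an identity check misses it while a truthiness check does not. A minimal sketch:

```python
import numpy as np

# A NumPy comparison/reduction yields np.bool_, not the built-in bool
result = np.any(np.ones(10) < 0)

# Identity check (as used in the training loops) fails to detect falsiness
print(result is False)  # False: np.bool_(False) is not the False singleton
# Truthiness check handles it correctly
print(not result)       # True
```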

To Reproduce

import gymnasium
import numpy as np
from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import BaseCallback, EvalCallback

class AlwaysFailCallback(BaseCallback):
    def _on_step(self):
        return np.bool_(0)

env_id = "Pendulum-v1"
model = SAC("MlpPolicy", env_id, policy_kwargs=dict(net_arch=[32]))
eval_callback = EvalCallback(
    gymnasium.make(env_id),
    callback_after_eval=AlwaysFailCallback(),
    n_eval_episodes=1,
    eval_freq=1,
    warn=False,
)
model.learn(3, callback=eval_callback)

assert eval_callback.n_calls == 1

Relevant log output / Error message

Eval num_timesteps=1, episode_reward=-1035.10 +/- 0.00
Episode length: 200.00 +/- 0.00
New best mean reward!
Eval num_timesteps=2, episode_reward=-1251.05 +/- 0.00
Episode length: 200.00 +/- 0.00
Eval num_timesteps=3, episode_reward=-1185.56 +/- 0.00
Episode length: 200.00 +/- 0.00

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[17], line 21
     12 eval_callback = EvalCallback(
     13     gymnasium.make(env_id),
     14     callback_after_eval=AlwaysFailCallback(),
   (...)
     17     warn=False,
     18 )
     19 model.learn(3, callback=eval_callback)
---> 21 assert eval_callback.n_calls == 1

AssertionError:

System Info

  • OS: Linux-5.15.131-1-MANJARO-x86_64-with-glibc2.35 # 1 SMP PREEMPT Thu Sep 7 11:05:18 UTC 2023
  • Python: 3.10.10
  • Stable-Baselines3: 2.0.0
  • PyTorch: 2.0.1+cu117
  • GPU Enabled: False
  • Numpy: 1.23.5
  • Cloudpickle: 2.2.1
  • Gymnasium: 0.28.1

Checklist

  • My issue does not relate to a custom gym environment. (Use the custom gym env template instead)
  • I have checked that there is no similar issue in the repo
  • I have read the documentation
  • I have provided a minimal and working example to reproduce the bug
  • I've used the markdown code blocks for both code and stack traces.
@iwishiwasaneagle
Contributor Author

Just FYI for anyone hitting this issue: wrap your callback's return value with bool, e.g. return bool(x).
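As a concrete illustration of this workaround (sketched with a standalone stand-in function rather than an actual SB3 callback), casting with bool() restores the singleton that the is False check expects:

```python
import numpy as np

def on_step():
    # Hypothetical stand-in for a callback's _on_step: cast the NumPy
    # scalar to a true Python bool before returning it
    return bool(np.bool_(0))

value = on_step()
print(value is False)  # True: bool() returns the False singleton
```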

@qgallouedec qgallouedec self-assigned this Oct 3, 2023
@qgallouedec
Collaborator

I agree that not x should be preferred to x is False (after all, who am I to give my opinion? It's written in PEP8), however, I wouldn't say it's a bug insofar as _on_step is supposed to return a bool:

def _on_step(self) -> bool:

In this respect, the example you give is a misuse.

>>> isinstance(np.bool_(0), bool)
False

@qgallouedec qgallouedec removed the bug Something isn't working label Oct 3, 2023
@araffin
Member

araffin commented Oct 3, 2023

Hello,
this is intended and that's for legacy reasons (callback used to return None, and the not ... wouldn't work), but it's probably time for a breaking change, as the numpy bool not being a true bool is a known limitation

@araffin
Member

araffin commented Oct 4, 2023

The full story:

@araffin
Member

araffin commented Oct 4, 2023

I agree that not x should be preferred to x is False (after all, who am I to give my opinion? It's written in PEP8), however, I wouldn't say it's a bug insofar as _on_step is supposed to return a bool:

If we allow that change, there will be some weird behavior allowed, like returning {} to stop training.

@iwishiwasaneagle
Contributor Author

I agree that not x should be preferred to x is False (after all, who am I to give my opinion? It's written in PEP8), however, I wouldn't say it's a bug insofar as _on_step is supposed to return a bool:

def _on_step(self) -> bool:

In this respect, the example you give is a misuse.

>>> isinstance(np.bool_(0), bool)
False

Ah okay, my bad. So it was intended to be ONLY True or False?

@iwishiwasaneagle
Contributor Author

Hello, this is intended and that's for legacy reasons (callback used to return None, and the not ... wouldn't work), but it's probably time for a breaking change, as the numpy bool not being a true bool is a known limitation

What about something like if x is None or not x (assuming None is supposed to have the same effect as False)?
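A quick sketch of this proposed condition (the function name is illustrative, not SB3 API) shows it treats None, Python bools, and NumPy bools uniformly:

```python
import numpy as np

def should_stop(x):
    # Proposed replacement for `x is False`: stop when the callback
    # returns None or any falsy value, NumPy booleans included
    return x is None or not x

print(should_stop(None))         # True
print(should_stop(np.bool_(0)))  # True
print(should_stop(False))        # True
print(should_stop(True))         # False
```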

@iwishiwasaneagle
Contributor Author

I agree that not x should be preferred to x is False (after all, who am I to give my opinion? It's written in PEP8), however, I wouldn't say it's a bug insofar as _on_step is supposed to return a bool:

If we allow that change, there will be some weird behavior allowed, like returning {} to stop training.

Well, technically an empty sequence is supposed to be falsy, so, other than being an incorrect type, it's not technically wrong.
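The truthiness point can be verified directly: an empty dict is falsy, so under a not x check it would stop training, whereas the current identity check ignores it entirely:

```python
x = {}             # an empty container is falsy...
print(not x)       # True: a `not x` check would treat this as "stop"
print(x is False)  # False: the current identity check ignores it
```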

@araffin araffin added the enhancement New feature or request label Oct 4, 2023