
[Bug] Allow noop_max to be 0 and make ClipReward actually clip rewards #271

Closed
RaghuSpaceRajan opened this issue Dec 23, 2020 · 5 comments · Fixed by #1286 or #1327
Labels
enhancement New feature or request

Comments

@RaghuSpaceRajan

In atari_wrappers.py, allow noop_max to be 0 to make it possible to have a completely deterministic Atari environment.

Also in the same file, having ClipRewardEnv bin, rather than clip, rewards causes problems when there is a small amount of reward noise in the environment: e.g., an actual reward of 0.0 becomes 0.01 after injecting noise, and this gets "clipped" to 1.0. I know that many libraries implement it like it is done here (i.e. binning), but is there a reason we can't change it here?
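For illustration, a minimal sketch of the two behaviours (hand-written here, not the actual SB3 code; the current ClipRewardEnv roughly returns the sign of the reward):

```python
import numpy as np

# Current behaviour (roughly): bin the reward by its sign.
def bin_reward(reward: float) -> float:
    return float(np.sign(reward))

# Proposed behaviour: actually clip the reward to [-1, 1].
def clip_reward(reward: float) -> float:
    return float(np.clip(reward, -1.0, 1.0))

# With a small amount of injected noise, the two diverge:
noisy_reward = 0.0 + 0.01
print(bin_reward(noisy_reward))   # 1.0  -> the noise is amplified to a full reward
print(clip_reward(noisy_reward))  # 0.01 -> the noise stays negligible
```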

I have a patched fork with the fix here. Shall I create a pull request?

@RaghuSpaceRajan RaghuSpaceRajan added the bug Something isn't working label Dec 23, 2020
@araffin araffin added enhancement New feature or request and removed bug Something isn't working labels Dec 24, 2020
@araffin
Member

araffin commented Dec 24, 2020

Hello,
what you described sounds more like a feature request than a bug, hence I would ask you to fill out the feature request template ;)

In atari_wrappers.py, allow noop_max to be 0 to make it possible to have a completely deterministic Atari environment.

What prevents you from re-defining a `class AtariWrapper(gym.Wrapper)` that does not include the NoopResetEnv wrapper?

I would agree we could simply avoid wrapping it if noop_max=0 (I would accept such a PR).
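A rough sketch of what that conditional wrapping could look like (simplified; `maybe_wrap_noops` is an illustrative helper, not the exact AtariWrapper constructor code):

```python
import gym
from stable_baselines3.common.atari_wrappers import NoopResetEnv

def maybe_wrap_noops(env: gym.Env, noop_max: int = 30) -> gym.Env:
    # Only apply the no-op reset wrapper when noop_max > 0,
    # so that noop_max=0 keeps the reset fully deterministic.
    if noop_max > 0:
        env = NoopResetEnv(env, noop_max=noop_max)
    return env
```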

Also in the same file, having ClipRewardEnv bin, and not clip rewards, causes problems when there's a small amount of reward noise in the environments,

Same here, what prevents you from defining a custom gym.Wrapper (that you can pass to the make_vec_env function)?
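For example (a sketch, assuming a recent SB3 where make_vec_env lives in stable_baselines3.common.env_util; the name TrueClipRewardEnv is just for illustration), such a wrapper could be passed via the wrapper_class argument:

```python
import gym
import numpy as np
from stable_baselines3.common.env_util import make_vec_env

class TrueClipRewardEnv(gym.RewardWrapper):
    """Clip (rather than bin) rewards to [-1, 1]."""

    def reward(self, reward: float) -> float:
        return float(np.clip(reward, -1.0, 1.0))

# Every sub-environment in the vectorized env gets wrapped with TrueClipRewardEnv.
vec_env = make_vec_env("BreakoutNoFrameskip-v4", n_envs=4, wrapper_class=TrueClipRewardEnv)
```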

We usually keep those Atari wrappers as-is to have consistent comparison between algorithms.

@Miffyli
Collaborator

Miffyli commented Dec 24, 2020

I concur with @araffin. I'd rather keep the current wrappers as-is for consistency (despite the unclear "ClipReward" naming), but would support noop_max=0.

@RaghuSpaceRajan
Author

Hey,

I would call these two logical bugs, because my expectation is that noop_max can be 0 and that ClipReward matches its semantics and actually clips. Having said that, the feature request template is easier to fill out ;), so I have filled it in further below.

What prevents you from re-defining a `class AtariWrapper(gym.Wrapper)` that does not include the NoopResetEnv wrapper?
I would agree we could simply avoid wrapping it if noop_max=0 (I would accept such a PR).

I presume you mean that instead of line 238 there, I put in an if statement that does not wrap with NoopResetEnv in case noop_max = 0. That does sound like a better solution to me than what I proposed.
But by 're-define', do you mean I should define a new class, separate from AtariWrapper, or redefine AtariWrapper itself in the PR?

Same here, what prevents you from defining a custom gym.Wrapper (that you can pass to the make_vec_env function)?

Here, I suppose you mean that I should define a new class, separate from AtariWrapper, and use it for my purposes. The reason I'd rather have it in AtariWrapper is that it's going to be used much more frequently by users, and I feel it is a logical bug. I ran into learning problems in the presence of reward noise because of the binning behaviour in another library; we're trying to move to SB3, where everything seems cleaner and more accurate, so I thought it'd be best to have it "correctly done" here. :) However, it is true that reward noise may not always be present and that both binning and clipping can be useful, and I can understand why you'd want to keep the current behaviour for consistency.

We usually keep those Atari wrappers as-is to have consistent comparison between algorithms.
I'd rather keep the current wrappers as is for consistency (despite the unclear naming "ClipReward")

Do you mean consistency with OpenAI Baselines' results? If so, they have already allowed noop_max = 0 in OpenAI Gym. Regarding ClipReward, I can understand if you want to keep the binning behaviour, but then shouldn't we at least rename ClipReward to BinReward or, even better, SignumReward? That would be in keeping with the goal of Stable Baselines to be more stable, and, I presume, more semantically correct and easier to understand and use.

🚀 Feature

In atari_wrappers.py, allow noop_max to be 0.

Also in the same file, I propose we rename ClipRewardEnv to SignumRewardEnv and define ClipRewardEnv with the semantically correct code.

Motivation

Allow noop_max to be 0 to make it possible to have a completely deterministic Atari environment. This is needed to have vanilla Atari environments, which are useful for seeing agents' performance in the absence of confounding factors.
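As a usage sketch, assuming the proposed noop_max=0 support is in place, a deterministic preprocessed environment would then look like:

```python
import gym
from stable_baselines3.common.atari_wrappers import AtariWrapper

# noop_max=0 would skip the random no-op resets, removing that source of stochasticity.
env = AtariWrapper(gym.make("BreakoutNoFrameskip-v4"), noop_max=0)
```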

Having ClipRewardEnv bin, and not clip rewards, is misleading semantically.

Checklist

  • I have checked that there is no similar issue in the repo

@araffin
Member

araffin commented Jan 6, 2021

I presume you mean that instead of line 238 there, I put in an if statement that does not wrap with NoopResetEnv in case noop_max = 0. That does sound like a better solution to me than what I proposed.

sounds good ;)

But by 're-define', do you mean I should define a new class, separate from AtariWrapper, or redefine AtariWrapper itself in the PR?

I meant defining a new class for your own purposes (not part of the PR).

because it's going to be used much more frequently by users and I feel it is a logical bug.

AtariWrapper is usually meant to be used "as-is" in order to have comparable results (we have a link in the doc that describes Atari preprocessing in detail).

Do you mean consistency with OpenAI baseline's results?

Consistency with many codebases and results out there, and also avoiding breaking changes.
I would keep the ClipRewardEnv name.

I presume to be more semantically correct and easier to understand and use?

that's what the docstring is meant for ;) (and looking at https://stable-baselines3.readthedocs.io/en/master/common/atari_wrappers.html I find it quite explicit)

small amount of reward noise in the environments

btw, why would you do that?

@RaghuSpaceRajan
Author

sounds good ;)

Done here.

we have a link in the doc that describes Atari preprocessing in detail

You mean this link, right: https://stable-baselines3.readthedocs.io/en/master/common/atari_wrappers.html
I have read those docs, and they seemed most similar to the OpenAI Gym Atari wrapper, so I assumed comparability with OpenAI Baselines was meant. However, I have also seen other Atari wrappers that do things a bit differently and, given how variable RL is (and some of your tweets ;)), I was not sure how comparable the others are.

that's what the docstring is meant for ;) (and looking at https://stable-baselines3.readthedocs.io/en/master/common/atari_wrappers.html I find it quite explicit)

I had read the docstring, and I'd argue these things tend to get forgotten (or at the very least require extra bits in one's brain to remember ;)) when we're in the middle of debugging code 40 layers deep in the stack. That's exactly what happened to me with another library that "clipped". One could also make a case that it's fine for docstrings to be semantically unclear, since one can always read the code to know "exactly" what's happening. And that's kind of how RL is at the moment in many instances, isn't it? ;)

small amount of reward noise in the environments

btw, why would you do that?

We're developing a library where we try to inject different kinds of difficulty into environments to analyse / benchmark an agent's robustness to them.
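For context, a minimal hypothetical sketch of that kind of noise injection, which is exactly the case where sign-binning misbehaves:

```python
import gym
import numpy as np

class RewardNoiseWrapper(gym.RewardWrapper):
    """Hypothetical wrapper adding small Gaussian noise to every reward."""

    def __init__(self, env: gym.Env, noise_std: float = 0.01):
        super().__init__(env)
        self.noise_std = noise_std

    def reward(self, reward: float) -> float:
        # A true zero reward becomes e.g. +/-0.01; np.sign() would then turn it into +/-1.
        return float(reward + np.random.normal(0.0, self.noise_std))
```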
