[DRAFT] Add functionality of FireReset to AtariPreprocessing wrapper #805

balazsgyenes · 2023-11-30T14:50:08Z

Description

Breakout requires an agent to use FIRE to launch the next "ball" after losing a life or losing the game.
Without this feature, the agent can get stuck if it hasn't learned to press FIRE after losing a life (or losing the game), and the FIRE action otherwise does nothing.

This PR integrates the functionality of Stable-Baselines3's FireResetEnv wrapper into Gymnasium's AtariPreprocessing wrapper.

Fixes #781

Type of change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
This change requires a documentation update

Checklist:

I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Factor life tracking and reactions into subfunctions.

balazsgyenes · 2023-11-30T15:03:01Z

Note: I'm not quite done with this feature yet, and have not tested and fully documented the changes. Sorry for the delay, I will do it as soon as I can. In the meantime, feedback is very much appreciated.

In addition, I noticed another potential bug that I wanted to ask about before addressing. The wrapper currently takes the max over the last two observations of the N frame skip frames. However, if the wrapped environment is done (either due to truncated or terminated), no further frames are saved, and the obs_buffer can be in any state (2 old observations from the previous step, 1 new / 1 old, 2 new observations). This has probably never caused any issues because Atari environments are typically not trained with time limits (I think?), and the final observation is ignored anyway if terminated=True. However, it is incorrect if truncated=True.

A potential fix would use a deque object from the collections library to save every observation, with only the last 2 observations retained. This would likely come with a performance penalty, since e.g. for frame_skip=4, half the observations are not even generated. @pseudo-rnd-thoughts, do you think this bug should be addressed?

pseudo-rnd-thoughts · 2024-01-29T18:01:47Z

Reading through openai/baselines#240 then I believe that FireReset is not necessary as a feature as it is implemented in ale-py automatically. Therefore, SB3 and CleanRL etc who use it should be able to remove it from the environment stack

Draft implementation of fire on reset.

9a8a20f

Factor life tracking and reactions into subfunctions.

pseudo-rnd-thoughts closed this Jan 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[DRAFT] Add functionality of FireReset to AtariPreprocessing wrapper #805

[DRAFT] Add functionality of FireReset to AtariPreprocessing wrapper #805

balazsgyenes commented Nov 30, 2023

balazsgyenes commented Nov 30, 2023

pseudo-rnd-thoughts commented Jan 29, 2024

[DRAFT] Add functionality of FireReset to AtariPreprocessing wrapper #805

[DRAFT] Add functionality of FireReset to AtariPreprocessing wrapper #805

Conversation

balazsgyenes commented Nov 30, 2023

Description

Type of change

Checklist:

balazsgyenes commented Nov 30, 2023

pseudo-rnd-thoughts commented Jan 29, 2024