Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Frame stack #142

Merged
merged 17 commits into from May 6, 2022
Merged

Frame stack #142

merged 17 commits into from May 6, 2022

Conversation

jackyoung96
Copy link
Contributor

frame_stack.py modifying (same as stablebaseline3)
Original: fill observation stack by zero observations
Modify: fill observation stack by the copy of the first observation

aec_mock_test modifying (Reflecting modified frame stack)

@benblack769
Copy link
Contributor

@jackyoung96 Is this meant as a replacement for #141 ? If so, can you close that PR?

@benblack769
Copy link
Contributor

This PR looks pretty good. I understand what it is supposed to do, and the changes are simple and minimal.

Just a couple of additional things to make this production-ready:

@jackyoung96
Copy link
Contributor Author

@jackyoung96 Is this meant as a replacement for #141 ? If so, can you close that PR?

Yes it is. Please close it.

This PR looks pretty good. I understand what it is supposed to do, and the changes are simple and minimal.

Just a couple of additional things to make this production-ready:

I will. Thanks!

@jackyoung96
Copy link
Contributor Author

Could you check it? Thank you!

@jackyoung96
Copy link
Contributor Author

I add the stack_dim0 flag for stack_frame wrapper.

Pytorch uses channel first order and Tensorflow uses channel last order, so we can choose where the stacked dimension locate by stack_dim0 flag.

@jjshoots
Copy link
Member

Hi @jackyoung96, can you fix linting and clear the test fails? Thanks!

@jackyoung96
Copy link
Contributor Author

Hi @jackyoung96, can you fix linting and clear the test fails? Thanks!

Sure, I will ASAP

@jackyoung96
Copy link
Contributor Author

@benblack769 Sorry for the late commit for the lint test failure. I totally forgot about it. I finally push the final version!

@jjshoots
Copy link
Member

@jackyoung96 Thanks, we will likely merge this once the Atari work is done :)

@jackyoung96
Copy link
Contributor Author

Oh... there are still issues.... sorry for this.

@jackyoung96
Copy link
Contributor Author

This commit should be worked!

@jjshoots
Copy link
Member

jjshoots commented May 2, 2022

Hey @benblack769, sorry to drag you in, but do you know anything about the above error? I can't quite understand what's the error either.

@benblack769
Copy link
Contributor

I belive it is just a flaky error that comes up from time to time. I think perhaps legal moves are just chosen by chance in Mahjog so the game doesn't end quickly enough for the test to pass in 1/1000 games or something. A bit silly.

@benblack769
Copy link
Contributor

Just try rerunning the test

@jjshoots
Copy link
Member

jjshoots commented May 4, 2022

Hi @jackyoung96, tests are still failing, do you think that can be fixed?

@jjshoots
Copy link
Member

jjshoots commented May 4, 2022

@BolunDai0216

@BolunDai0216
Copy link
Contributor

BolunDai0216 commented May 4, 2022

@BolunDai0216

I think the tests are failing partially because of this line:

When creating reset(), seed now needs to be added as an argument, i.e., def reset(seed=None):

def reset(self, seed=None):

@jackyoung96 Would it be possible to change your reset() in frame_stack_v2 so that it matches frame_stack_v1 argument-wise and see if that solves the issue?

@jackyoung96
Copy link
Contributor Author

@BolunDai0216 I fixed it and commit. I hope it can solve the issue.

Copy link
Member

@jjshoots jjshoots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks pretty good to me, just some general comments about code cleanliness some minor improvement. Once the comments are addressed, I'll get someone else to review it before merging.

Thanks for your contribution!

@@ -8,17 +8,15 @@ def change_obs_space(space, num_indicators):
if isinstance(space, Box):
ndims = len(space.shape)
if ndims == 1:
pad_space = np.ones((num_indicators,), dtype=space.dtype)
pad_space = np.max(space.high) * np.ones((num_indicators,), dtype=space.dtype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It unify the high and low range of the indicator channel with the original observation channels.
Because sometimes the values original observation channel are in range of [0,255] (Before normalized)

Copy link
Member

@jjshoots jjshoots May 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to instead do np.min(space.high)? This is in the very rare but potentially possible case of space.high = [1, 255, 255], pad_space will then not be contained within the observation space.

You could implement it to check that all space.high are the same values np.all(space.high[0] == space.high), and if they aren't then raise a warning. Doing this within the wrapper function would incur overhead cost though, so I think it's better to put this check in the wrapper init, and then just do np.min(space.high) everywhere else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! Thanks

README.md Outdated
Comment on lines 47 to 49
`frame_stack_v1(env, num_frames=4, stack_dim0=False)` stacks the most recent frames. For vector games observed via plain vectors (1D arrays), the output is just concatenated to a longer 1D array. 2D or 3D arrays are stacked to be taller 3D arrays. Stacked dimension can be set to dim=0 (default dim=-1) by stack_dim0=True. At the start of the game, frames that don't yet exist are filled with 0s. `num_frames=1` is analogous to not using this function.

`frame_stack_v2(env, num_frames=4, stack_dim0=False)` stacks the most recent frames. For vector games observed via plain vectors (1D arrays), the output is just concatenated to a longer 1D array. 2D or 3D arrays are stacked to be taller 3D arrays. Stacked dimension can be set to dim=0 (default dim=-1) by stack_dim0=True. At the start of the game, frames that don't yet exist are filled with the copies of the first frame. `num_frames=1` is analogous to not using this function.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I understand the idea behind stack_dim0, could we instead make it accept a stack_dim argument, and then add an assert that makes sure it's not any other value other than 0 or -1. Right now, while I'm sure this stack_dim=0 implementation works, somewhere down the line if there is an API change that allows stacking in arbitrary dimensions, the arguments would be different and will just be slightly less elegant. So I think it's best to get it right the first time so we don't need to change the API later in the future.

Also, do keep arguments (stack_dim0=True) in a code line.


return tile_shape, new_shape


def stack_obs_space(obs_space, stack_size):
def stack_obs_space(obs_space, stack_size, stack_dim0=False):
"""
obs_space_dict: Dictionary of observations spaces of agents
stack_size: Number of frames in the observation stack
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a comment here describing what stack_dim0 (or stack_dim possibly) does.

README.md Outdated
Comment on lines 47 to 49
`frame_stack_v1(env, num_frames=4, stack_dim0=False)` stacks the most recent frames. For vector games observed via plain vectors (1D arrays), the output is just concatenated to a longer 1D array. 2D or 3D arrays are stacked to be taller 3D arrays. Stacked dimension can be set to dim=0 (default dim=-1) by stack_dim0=True. At the start of the game, frames that don't yet exist are filled with 0s. `num_frames=1` is analogous to not using this function.

`frame_stack_v2(env, num_frames=4, stack_dim0=False)` stacks the most recent frames. For vector games observed via plain vectors (1D arrays), the output is just concatenated to a longer 1D array. 2D or 3D arrays are stacked to be taller 3D arrays. Stacked dimension can be set to dim=0 (default dim=-1) with `stack_dim0=True`. At the start of the game, frames that don't yet exist are filled with the copies of the first frame. `num_frames=1` is analogous to not using this function.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I'm sure that it works, can we change stack_dim0 to just become stack_dim, and then assert it to be either 0 or -1? This way if there are new API changes in the future that allow for arbitrary stacking dimensions, the arguments between the new API and this one won't be any different.

Also, do keep arguments (like stack_dim0=True) inside code lines.

return new_obs
elif ndims == 3 or ndims == 2:
obs = obs if ndims == 3 else np.expand_dims(obs, 2)
old_shaped3 = obs.shape[2]
new_obs = np.pad(obs, [(0, 0), (0, 0), (0, num_indicators)])
new_obs[:, :, old_shaped3 + indicator_num] = 1.0
new_obs[:, :, old_shaped3 + indicator_num] = np.max(space.high)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for this change as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reason with #142 (comment)

Comment on lines 72 to 80
for _ in range(stack_size):
self.stack = stack_obs(
self.stack,
obs,
self.old_obs_space,
stack_size,
stack_dim0
)
self.reset_flag = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a for loop necessary here? Is it not possible to simply do something like tile_shape * obs? This is just to make things run a little bit faster.

I'm ok if a completely new function was created just for this functionality as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your right. Actually, this is how frame_stack_v0 was implemented before, so I left it as it is. If you want, we can modify it, but I think it makes the code (very x 3) little faster 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it makes the code a lot less readable then we can just keep the original implementation, but if it's simple to implement, I think changing it would be better. Small optimizations like this add up a lot in the long run (especially for python). 😄

@jjshoots
Copy link
Member

jjshoots commented May 6, 2022

Hi @jackyoung96, tests are still failing, though I think it's just linting.

@jjshoots jjshoots merged commit 9439cb9 into Farama-Foundation:master May 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants