New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Frame stack #142
Frame stack #142
Conversation
aec_mock_test modifying (Reflecting modified frame stack)
@jackyoung96 Is this meant as a replacement for #141 ? If so, can you close that PR? |
This PR looks pretty good. I understand what it is supposed to do, and the changes are simple and minimal. Just a couple of additional things to make this production-ready:
|
Yes it is. Please close it.
I will. Thanks! |
Could you check it? Thank you! |
I add the stack_dim0 flag for stack_frame wrapper. Pytorch uses channel first order and Tensorflow uses channel last order, so we can choose where the stacked dimension locate by stack_dim0 flag. |
Hi @jackyoung96, can you fix linting and clear the test fails? Thanks! |
Sure, I will ASAP |
@benblack769 Sorry for the late commit for the lint test failure. I totally forgot about it. I finally push the final version! |
@jackyoung96 Thanks, we will likely merge this once the Atari work is done :) |
Oh... there are still issues.... sorry for this. |
This commit should be worked! |
Hey @benblack769, sorry to drag you in, but do you know anything about the above error? I can't quite understand what's the error either. |
I belive it is just a flaky error that comes up from time to time. I think perhaps legal moves are just chosen by chance in Mahjog so the game doesn't end quickly enough for the test to pass in 1/1000 games or something. A bit silly. |
Just try rerunning the test |
Hi @jackyoung96, tests are still failing, do you think that can be fixed? |
I think the tests are failing partially because of this line:
When creating
@jackyoung96 Would it be possible to change your |
@BolunDai0216 I fixed it and commit. I hope it can solve the issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks pretty good to me, just some general comments about code cleanliness some minor improvement. Once the comments are addressed, I'll get someone else to review it before merging.
Thanks for your contribution!
supersuit/utils/agent_indicator.py
Outdated
@@ -8,17 +8,15 @@ def change_obs_space(space, num_indicators): | |||
if isinstance(space, Box): | |||
ndims = len(space.shape) | |||
if ndims == 1: | |||
pad_space = np.ones((num_indicators,), dtype=space.dtype) | |||
pad_space = np.max(space.high) * np.ones((num_indicators,), dtype=space.dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason for this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It unify the high and low range of the indicator channel with the original observation channels.
Because sometimes the values original observation channel are in range of [0,255] (Before normalized)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to instead do np.min(space.high)
? This is in the very rare but potentially possible case of space.high = [1, 255, 255]
, pad_space
will then not be contained within the observation space.
You could implement it to check that all space.high
are the same values np.all(space.high[0] == space.high)
, and if they aren't then raise a warning. Doing this within the wrapper function would incur overhead cost though, so I think it's better to put this check in the wrapper init, and then just do np.min(space.high)
everywhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it! Thanks
README.md
Outdated
`frame_stack_v1(env, num_frames=4, stack_dim0=False)` stacks the most recent frames. For vector games observed via plain vectors (1D arrays), the output is just concatenated to a longer 1D array. 2D or 3D arrays are stacked to be taller 3D arrays. Stacked dimension can be set to dim=0 (default dim=-1) by stack_dim0=True. At the start of the game, frames that don't yet exist are filled with 0s. `num_frames=1` is analogous to not using this function. | ||
|
||
`frame_stack_v2(env, num_frames=4, stack_dim0=False)` stacks the most recent frames. For vector games observed via plain vectors (1D arrays), the output is just concatenated to a longer 1D array. 2D or 3D arrays are stacked to be taller 3D arrays. Stacked dimension can be set to dim=0 (default dim=-1) by stack_dim0=True. At the start of the game, frames that don't yet exist are filled with the copies of the first frame. `num_frames=1` is analogous to not using this function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I understand the idea behind stack_dim0
, could we instead make it accept a stack_dim
argument, and then add an assert that makes sure it's not any other value other than 0 or -1. Right now, while I'm sure this stack_dim=0
implementation works, somewhere down the line if there is an API change that allows stacking in arbitrary dimensions, the arguments would be different and will just be slightly less elegant. So I think it's best to get it right the first time so we don't need to change the API later in the future.
Also, do keep arguments (stack_dim0=True
) in a code line.
|
||
return tile_shape, new_shape | ||
|
||
|
||
def stack_obs_space(obs_space, stack_size): | ||
def stack_obs_space(obs_space, stack_size, stack_dim0=False): | ||
""" | ||
obs_space_dict: Dictionary of observations spaces of agents | ||
stack_size: Number of frames in the observation stack |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need a comment here describing what stack_dim0
(or stack_dim
possibly) does.
README.md
Outdated
`frame_stack_v1(env, num_frames=4, stack_dim0=False)` stacks the most recent frames. For vector games observed via plain vectors (1D arrays), the output is just concatenated to a longer 1D array. 2D or 3D arrays are stacked to be taller 3D arrays. Stacked dimension can be set to dim=0 (default dim=-1) by stack_dim0=True. At the start of the game, frames that don't yet exist are filled with 0s. `num_frames=1` is analogous to not using this function. | ||
|
||
`frame_stack_v2(env, num_frames=4, stack_dim0=False)` stacks the most recent frames. For vector games observed via plain vectors (1D arrays), the output is just concatenated to a longer 1D array. 2D or 3D arrays are stacked to be taller 3D arrays. Stacked dimension can be set to dim=0 (default dim=-1) with `stack_dim0=True`. At the start of the game, frames that don't yet exist are filled with the copies of the first frame. `num_frames=1` is analogous to not using this function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I'm sure that it works, can we change stack_dim0
to just become stack_dim
, and then assert it to be either 0 or -1? This way if there are new API changes in the future that allow for arbitrary stacking dimensions, the arguments between the new API and this one won't be any different.
Also, do keep arguments (like stack_dim0=True
) inside code lines.
supersuit/utils/agent_indicator.py
Outdated
return new_obs | ||
elif ndims == 3 or ndims == 2: | ||
obs = obs if ndims == 3 else np.expand_dims(obs, 2) | ||
old_shaped3 = obs.shape[2] | ||
new_obs = np.pad(obs, [(0, 0), (0, 0), (0, num_indicators)]) | ||
new_obs[:, :, old_shaped3 + indicator_num] = 1.0 | ||
new_obs[:, :, old_shaped3 + indicator_num] = np.max(space.high) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for this change as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same reason with #142 (comment)
for _ in range(stack_size): | ||
self.stack = stack_obs( | ||
self.stack, | ||
obs, | ||
self.old_obs_space, | ||
stack_size, | ||
stack_dim0 | ||
) | ||
self.reset_flag = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is a for loop necessary here? Is it not possible to simply do something like tile_shape * obs
? This is just to make things run a little bit faster.
I'm ok if a completely new function was created just for this functionality as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your right. Actually, this is how frame_stack_v0 was implemented before, so I left it as it is. If you want, we can modify it, but I think it makes the code (very x 3) little faster 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it makes the code a lot less readable then we can just keep the original implementation, but if it's simple to implement, I think changing it would be better. Small optimizations like this add up a lot in the long run (especially for python). 😄
Hi @jackyoung96, tests are still failing, though I think it's just linting. |
frame_stack.py modifying (same as stablebaseline3)
Original: fill observation stack by zero observations
Modify: fill observation stack by the copy of the first observation
aec_mock_test modifying (Reflecting modified frame stack)