Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed wrappers.vector.RecordEpisodeStatistics episode length computation from new autoreset api #1018

Merged

Conversation

TimSchneider42
Copy link
Contributor

Description

Fixed wrappers.vector.RecordEpisodeStatistics episode length computation.

Fixes #1017

Type of change

Please delete options that are not relevant.

  • Documentation only change (no code changed)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@@ -56,16 +60,41 @@ def test_record_episode_statistics(num_envs, env_id="CartPole-v1", num_steps=100
data_equivalence(wrapper_vector_reward, vector_wrapper_reward)
data_equivalence(wrapper_vector_terminated, vector_wrapper_terminated)
data_equivalence(wrapper_vector_truncated, vector_wrapper_truncated)
data_equivalence(wrapper_vector_info, vector_wrapper_info)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this isn't possible due to the episode time taken will vary between the two

This is the reason for the sequent code

A comment would be helpful

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strangely, this test did not fail for me. I will move it back to where it was originally. Not sure why I moved it in the first place.

wrapper_vector_time = wrapper_vector_info["episode"].pop("t")
vector_wrapper_time = vector_wrapper_info["episode"].pop("t")
assert wrapper_vector_time.shape == vector_wrapper_time.shape
assert wrapper_vector_time.dtype == vector_wrapper_time.dtype

data_equivalence(wrapper_vector_info, vector_wrapper_info)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we pop t, why can we not just do data_equivalence for the two infos?
Why did this test fail previously?

Copy link
Member

@pseudo-rnd-thoughts pseudo-rnd-thoughts Apr 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I messed up on the testing of this, it should be assert data_equivalence(...)

The new testing code should be

        assert data_equivalence(wrapper_vector_obs, vector_wrapper_obs)
        assert data_equivalence(wrapper_vector_reward, vector_wrapper_reward)
        assert data_equivalence(wrapper_vector_terminated, vector_wrapper_terminated)
        assert data_equivalence(wrapper_vector_truncated, vector_wrapper_truncated)

        if "episode" in wrapper_vector_info:
            assert "episode" in vector_wrapper_info

            wrapper_vector_time_taken = wrapper_vector_info["episode"].pop("t")
            vector_wrapper_time_taken = vector_wrapper_info["episode"].pop("t")

            assert wrapper_vector_time_taken.shape == vector_wrapper_time_taken.shape
            assert wrapper_vector_time_taken.dtype == vector_wrapper_time_taken.dtype

            vector_wrapper_info["episode"].pop("_l")
            vector_wrapper_info["episode"].pop("_r")
            vector_wrapper_info["episode"].pop("_t")

        assert data_equivalence(wrapper_vector_info, vector_wrapper_info)

This should now test it correctly.

I added max_episode_steps to the make_vec such that episode ends for this to actually be tested (update to main, this was previously broken)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that is why it did not fail when I moved the info dict comparison! Do you want me to fix the test, or are you going to do it?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update the test to my suggested code above, I think it is simplier than your code and achieves the same thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally agree that your version is simpler. For some reason, I thought the test did not consider the episode lengths at all, but after looking at it more carefully, I now understand what it actually does. I just changed the code accordingly.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it does that implicitly with the info check at the end

@pseudo-rnd-thoughts pseudo-rnd-thoughts changed the title Fixed wrappers.vector.RecordEpisodeStatistics episode length computation. Fixed wrappers.vector.RecordEpisodeStatistics episode length computation from new autoreset api Apr 16, 2024
TimSchneider42 added a commit to TimSchneider42/Gymnasium that referenced this pull request Apr 16, 2024
@TimSchneider42
Copy link
Contributor Author

I moved the info dict comparison back to its original place

TimSchneider42 added a commit to TimSchneider42/Gymnasium that referenced this pull request Apr 16, 2024
@@ -118,10 +120,13 @@ def step(
infos, dict
), f"`vector.RecordEpisodeStatistics` requires `info` type to be `dict`, its actual type is {type(infos)}. This may be due to usage of other wrappers in the wrong order."

self.episode_returns += rewards
self.episode_lengths += 1
self.episode_returns[self.prev_dones] = 0

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative is self.episode_returns = np.where(self.prev_dones, self.episode_returns + rewards, 0)
However as this recreates the array each time I don't think this is a good idea

wrapper_vector_time = wrapper_vector_info["episode"].pop("t")
vector_wrapper_time = vector_wrapper_info["episode"].pop("t")
assert wrapper_vector_time.shape == vector_wrapper_time.shape
assert wrapper_vector_time.dtype == vector_wrapper_time.dtype

data_equivalence(wrapper_vector_info, vector_wrapper_info)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it does that implicitly with the info check at the end

@pseudo-rnd-thoughts pseudo-rnd-thoughts merged commit 6a8a267 into Farama-Foundation:main Apr 18, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug Report] wrappers.vector.RecordEpisodeStatistics computes episode length incorrectly
2 participants