Fixed `wrappers.vector.RecordEpisodeStatistics` episode length computation from new autoreset api #1018

TimSchneider42 · 2024-04-15T12:38:34Z

Description

Fixed wrappers.vector.RecordEpisodeStatistics episode length computation.

Fixes #1017

Type of change

Please delete options that are not relevant.

Documentation only change (no code changed)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Checklist:

I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

pseudo-rnd-thoughts · 2024-04-16T09:24:40Z

tests/wrappers/vector/test_record_episode_statistics.py

@@ -56,16 +60,41 @@ def test_record_episode_statistics(num_envs, env_id="CartPole-v1", num_steps=100
        data_equivalence(wrapper_vector_reward, vector_wrapper_reward)
        data_equivalence(wrapper_vector_terminated, vector_wrapper_terminated)
        data_equivalence(wrapper_vector_truncated, vector_wrapper_truncated)
+        data_equivalence(wrapper_vector_info, vector_wrapper_info)


I believe this isn't possible, because the episode time taken will vary between the two.

This is the reason for the subsequent code.

A comment would be helpful

Strangely, this test did not fail for me. I will move it back to where it was originally. Not sure why I moved it in the first place.

pseudo-rnd-thoughts · 2024-04-16T09:27:17Z

tests/wrappers/vector/test_record_episode_statistics.py

            wrapper_vector_time = wrapper_vector_info["episode"].pop("t")
            vector_wrapper_time = vector_wrapper_info["episode"].pop("t")
            assert wrapper_vector_time.shape == vector_wrapper_time.shape
            assert wrapper_vector_time.dtype == vector_wrapper_time.dtype

-        data_equivalence(wrapper_vector_info, vector_wrapper_info)


Once we pop t, why can we not just do data_equivalence for the two infos?
Why did this test fail previously?

I messed up the testing of this: it should be assert data_equivalence(...), because without the assert the result of the comparison is ignored.

The new testing code should be

    assert data_equivalence(wrapper_vector_obs, vector_wrapper_obs)
    assert data_equivalence(wrapper_vector_reward, vector_wrapper_reward)
    assert data_equivalence(wrapper_vector_terminated, vector_wrapper_terminated)
    assert data_equivalence(wrapper_vector_truncated, vector_wrapper_truncated)

    if "episode" in wrapper_vector_info:
        assert "episode" in vector_wrapper_info

        wrapper_vector_time_taken = wrapper_vector_info["episode"].pop("t")
        vector_wrapper_time_taken = vector_wrapper_info["episode"].pop("t")
        assert wrapper_vector_time_taken.shape == vector_wrapper_time_taken.shape
        assert wrapper_vector_time_taken.dtype == vector_wrapper_time_taken.dtype

        vector_wrapper_info["episode"].pop("_l")
        vector_wrapper_info["episode"].pop("_r")
        vector_wrapper_info["episode"].pop("_t")

    assert data_equivalence(wrapper_vector_info, vector_wrapper_info)

This should now test it correctly.
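The idea behind this test can be sketched without Gymnasium. The wall-clock entry `t` differs between two otherwise identical runs, so it is popped and checked only for shape and dtype, and everything else is compared exactly. This is a minimal sketch under stated assumptions: `infos_match` and the sample dictionaries are hypothetical, only the `"episode"` entry is compared, and `np.array_equal` stands in for Gymnasium's `data_equivalence`.

```python
import numpy as np


def infos_match(info_a: dict, info_b: dict) -> bool:
    """Compare the "episode" statistics of two infos, ignoring wall-clock values.

    Hypothetical helper for illustration; it is not part of Gymnasium.
    """
    # Both infos must agree on whether an episode finished at all.
    if ("episode" in info_a) != ("episode" in info_b):
        return False

    # Work on shallow copies so the callers' dictionaries are left untouched.
    episode_a = dict(info_a.get("episode", {}))
    episode_b = dict(info_b.get("episode", {}))

    if "episode" in info_a:
        # Elapsed time is not reproducible, so only its shape and dtype are checked.
        time_a = episode_a.pop("t")
        time_b = episode_b.pop("t")
        if time_a.shape != time_b.shape or time_a.dtype != time_b.dtype:
            return False

    # Every remaining statistic (returns, lengths) must match exactly.
    if episode_a.keys() != episode_b.keys():
        return False
    return all(np.array_equal(episode_a[k], episode_b[k]) for k in episode_a)


# Two made-up runs that differ only in the measured time.
run_a = {"episode": {"r": np.array([10.0, 0.0]), "l": np.array([10, 0]),
                     "t": np.array([0.0123, 0.0], dtype=np.float32)}}
run_b = {"episode": {"r": np.array([10.0, 0.0]), "l": np.array([10, 0]),
                     "t": np.array([0.0456, 0.0], dtype=np.float32)}}
# A third made-up run whose episode length disagrees.
run_c = {"episode": {"r": np.array([10.0, 0.0]), "l": np.array([11, 0]),
                     "t": np.array([0.0123, 0.0], dtype=np.float32)}}

print(infos_match(run_a, run_b))
print(infos_match(run_a, run_c))
```

Because the episode lengths remain in the dictionary after `t` is removed, a wrong length makes the final comparison fail without needing a dedicated assertion.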

I added max_episode_steps to the make_vec call so that episodes actually end and this is really tested (update to main; this was previously broken).

Oh, that is why it did not fail when I moved the info dict comparison! Do you want me to fix the test, or are you going to do it?

Could you update the test to my suggested code above? I think it is simpler than your code and achieves the same thing.

I totally agree that your version is simpler. For some reason, I thought the test did not consider the episode lengths at all, but after looking at it more carefully, I now understand what it actually does. I just changed the code accordingly.

Yeah, it does that implicitly with the info check at the end

TimSchneider42 · 2024-04-16T09:50:34Z

I moved the info dict comparison back to its original place

pseudo-rnd-thoughts · 2024-04-18T15:46:23Z

gymnasium/wrappers/vector/common.py

@@ -118,10 +120,13 @@ def step(
            infos, dict
        ), f"`vector.RecordEpisodeStatistics` requires `info` type to be `dict`, its actual type is {type(infos)}. This may be due to usage of other wrappers in the wrong order."

-        self.episode_returns += rewards
-        self.episode_lengths += 1
+        self.episode_returns[self.prev_dones] = 0


An alternative is self.episode_returns = np.where(self.prev_dones, 0, self.episode_returns + rewards)
However, as this recreates the array on every step, I don't think this is a good idea.
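A small sketch of the two update styles discussed here, assuming `prev_dones` marks the sub-environments that finished on the previous step and are therefore reset to zero on this one. The array names mirror the diff, but the standalone functions, the accumulation line for the unfinished entries, and the sample values are illustrative assumptions, not the wrapper's actual code.

```python
import numpy as np


def update_in_place(episode_returns, rewards, prev_dones):
    """Reset finished sub-environments and accumulate the rest in the same array."""
    episode_returns[prev_dones] = 0
    # Assumed accumulation step: only sub-environments still running gain reward.
    episode_returns[~prev_dones] += rewards[~prev_dones]
    return episode_returns


def update_with_where(episode_returns, rewards, prev_dones):
    """Produce the same values, but np.where allocates a new array on every call."""
    return np.where(prev_dones, 0, episode_returns + rewards)


returns = np.array([5.0, 2.0, 7.0])
rewards = np.array([1.0, 1.0, 1.0])
prev_dones = np.array([False, True, False])

# Call the allocating variant first, because the in-place one mutates `returns`.
fresh = update_with_where(returns, rewards, prev_dones)
same = update_in_place(returns, rewards, prev_dones)

print(fresh is returns)  # False: a new array was created
print(same is returns)   # True: the original buffer was reused
```

Both variants yield the same numbers; the difference is only whether the statistics buffer is reused or reallocated on each step.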

pseudo-rnd-thoughts · 2024-04-18T15:47:47Z

tests/wrappers/vector/test_record_episode_statistics.py

            wrapper_vector_time = wrapper_vector_info["episode"].pop("t")
            vector_wrapper_time = vector_wrapper_info["episode"].pop("t")
            assert wrapper_vector_time.shape == vector_wrapper_time.shape
            assert wrapper_vector_time.dtype == vector_wrapper_time.dtype

-        data_equivalence(wrapper_vector_info, vector_wrapper_info)


Yeah, it does that implicitly with the info check at the end

pseudo-rnd-thoughts reviewed Apr 16, 2024

pseudo-rnd-thoughts changed the title ~~Fixed wrappers.vector.RecordEpisodeStatistics episode length computation.~~ Fixed wrappers.vector.RecordEpisodeStatistics episode length computation from new autoreset api Apr 16, 2024

TimSchneider42 added a commit to TimSchneider42/Gymnasium that referenced this pull request Apr 16, 2024

Fixed wrappers.vector.RecordEpisodeStatistics episode length computat…

760aabc

…ion from new autoreset api Farama-Foundation#1018

TimSchneider42 force-pushed the record_eps_stat_bug branch from 927e504 to 760aabc Compare April 16, 2024 09:49

TimSchneider42 added a commit to TimSchneider42/Gymnasium that referenced this pull request Apr 16, 2024

Fixed wrappers.vector.RecordEpisodeStatistics episode length computat…

0190ead

…ion from new autoreset api Farama-Foundation#1018

TimSchneider42 force-pushed the record_eps_stat_bug branch from 760aabc to 0190ead Compare April 16, 2024 11:24

Fixed wrappers.vector.RecordEpisodeStatistics episode length computat…

c0bc85c

…ion from new autoreset api

TimSchneider42 force-pushed the record_eps_stat_bug branch from 0190ead to c0bc85c Compare April 18, 2024 15:21

pseudo-rnd-thoughts approved these changes Apr 18, 2024

pseudo-rnd-thoughts merged commit 6a8a267 into Farama-Foundation:main Apr 18, 2024
11 checks passed
