Faster EpisodeStatisticsRecorder, converted from gym PR #31

DavidSlayback · 2022-09-29T19:11:34Z

Description

(Re-PR for gymnasium, this was reviewed and approved on the old gym repo PR #3101)

This addresses two issues I've run into with RecordEpisodeStatistics

Silent list conversion in vectorized environments. Currently, RecordEpisodeStatistics converts terminated and truncated into lists, but obs and reward may still be arrays, and the new vectorized info interface uses arrays. I particularly run into this in my code if I do something like dones.any(). This PR leaves the underlying details of terminated/truncated alone by using NumPy array functionality.
Speed. Instead of looping through a potentially large batch of environments, this PR uses NumPy functions with their underlying vectorization to achieve the same task.

I considered building this with Jumpy, but I'm not sure on its status in the project, and it would also eat the cost of creating new arrays wherever I'm doing in-place modifications.

I tested to make sure that this replicates the exact functionality of the original wrapper. To prove my speedup, I need to come up with some decent micro benchmarks; a couple attempts on Colab were fairly noisy.

Type of change

Existing feature enhancement (non-breaking change which adds functionality)

Checklist:

I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
[N/A] I have commented my code, particularly in hard-to-understand areas
[N/A] I have made corresponding changes to the documentation
My changes generate no new warnings
[N/A] I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

pseudo-rnd-thoughts · 2022-09-30T08:23:27Z

Thanks for the PR and for moving it to Gymnasium.

DavidSlayback · 2022-09-30T09:19:10Z

Hmm...I'll take a look at the fail later today and fix it, it was a quick paste with a couple import changes.

DavidSlayback · 2022-09-30T13:45:57Z

Fixed, needed to store time as a vector (which seems a bit silly, it's the same time for all environments in a batch) and use float64 arrays for everything (to pass vector_list_info)

DavidSlayback · 2022-09-30T14:14:04Z

Weird, I never touched that file. Blacked it

gianlucadecola

Nice job! LGTM

gianlucadecola · 2022-09-30T22:28:55Z

gymnasium/wrappers/record_episode_statistics.py

+                    "t": np.full(
+                        (self.num_envs,), round(time.perf_counter() - self.t0, 6)
+                    ),


Extremely minor, I think this could be the same as

np.full(self.num_envs, round(time.perf_counter() - self.t0, 6))

since self.num_envs is an integer it will be evaluated as the same shape without needing to wrap it in a tuple

RedTachyon · 2022-10-02T23:56:41Z

gymnasium/wrappers/record_episode_statistics.py

@@ -96,8 +67,8 @@ def __init__(self, env: gym.Env, deque_size: int = 100):
    def reset(self, **kwargs):
        """Resets the environment using kwargs and resets the episode returns and lengths."""
        observations = super().reset(**kwargs)
-        self.episode_returns = np.zeros(self.num_envs, dtype=np.float32)


Why this change? Pretty much all the floating point math in gymnasium works on float32, this change will either add an extra cast somewhere along the line, or require an explicit cast by user by breaking their code (numpy doesn't care, but pytorch does)

Solely for passing the test for the VectorList wrapper, it checks each element with assert isinstance(..., float), which only passed for float64 (not float32 or int). I agree I'd prefer to use float32, wasn't sure whether I could change pre-existing tests

Can you point me to the specific test? File/line?

Gymnasium/tests/wrappers/test_vector_list_info.py

Line 56 in 9154d2d

assert isinstance(list_info[i]["episode"][stats], float)

Is it the case that list_info[i]["episode"][stats] used to be a list of floats, and now is a numpy array? Or am I misunderstanding something here?

RedTachyon · 2022-10-03T00:01:41Z

@pseudo-rnd-thoughts @gianlucadecola Just to be sure - did you go through the logic in both versions of the code to be sure that it's the same thing? It does seem cleaner which alone is sufficient reason for me to like this PR, but I'd need to sit down with it for a bit to make sure, and it might not be necessary.

Also I see now that there's a message in the meantime about having to switch to float64 -- this is very weird and I'd be curious to see the reason for it, but I also hope that there's a different solution than using float64

pseudo-rnd-thoughts

@RedTachyon I believe that the implementation is the same. Could you check my suggestion on modifying "t" to be the episode length not time since initialisation. I suspect that very few people already use it so I think we can change it.

@DavidSlayback Looking at the code, given we are cleaning up the code, I have spotted a couple of other places that could be improved.

pseudo-rnd-thoughts · 2022-10-03T09:00:29Z

gymnasium/wrappers/record_episode_statistics.py

@@ -96,8 +67,8 @@ def __init__(self, env: gym.Env, deque_size: int = 100):
    def reset(self, **kwargs):
        """Resets the environment using kwargs and resets the episode returns and lengths."""
        observations = super().reset(**kwargs)


This should be obs, info = super().reset(**kwargs)

pseudo-rnd-thoughts · 2022-10-03T10:27:18Z

gymnasium/wrappers/record_episode_statistics.py

+                    "r": np.where(dones, self.episode_returns, 0),
+                    "l": np.where(dones, self.episode_lengths, 0),
+                    "t": np.full(
+                        (self.num_envs,), round(time.perf_counter() - self.t0, 6)


I'm not sure this will matter much but should be use np.round rather than the python round function given the value gets passed directly to a numpy array.

Secondly, this "t" value doesn't seem much help as t0 is never updated to my eyes. So "t" seems to be the number of milli (or nano) seconds since the wrapper was initialised.

This would break backward compatibility but would we want to change "t" to be the number of milli (or nano) seconds that the episode has taken. This would require us to have a perf_counter for every sub-environment but this shouldn't be too bad

pseudo-rnd-thoughts · 2022-10-03T10:28:16Z

gymnasium/wrappers/record_episode_statistics.py

        return (
            observations,
            rewards,
-            terminateds if self.is_vector_env else terminateds[0],
-            truncateds if self.is_vector_env else truncateds[0],
+            terminateds,


Could you update terminateds to termnations and trucnateds to truncations as this is better english

pseudo-rnd-thoughts · 2022-10-03T10:29:39Z

gymnasium/wrappers/record_episode_statistics.py

-                self.episode_count += 1
-                self.episode_returns[i] = 0
-                self.episode_lengths[i] = 0
+        dones = truncateds | terminateds


For ease of reading, could you update to dones = np.logical_or(termination, truncation) as | is not clear what is means in this

pseudo-rnd-thoughts · 2022-10-03T10:33:11Z

gymnasium/wrappers/record_episode_statistics.py

+            }
+            if self.is_vector_env:
+                episode_info["_episode"] = np.where(dones, True, False)
+            infos = {**infos, **episode_info}


Could we change this to

if "episode" or "_episode" in infos: # raise warning else: infos["episode"] = {...} if self.is_vector_env: infos["_episodes"] = ...

This seems better than the old way as it is a single element we are adding to the dictionary

DavidSlayback · 2022-10-03T14:01:26Z

Made the requested changes.

I changed the storage arrays to float32 and int32. I had to adjust the test for vector_list_info to use np.isscalar. Basically, python isn't a big fan of float32. Getting a single value from a float64 array returns a python float, but from a float32 array, you get a numpy scalar.
t0 is renamed to episode_start_times and tracks the start time for each episode. Pyright didn't like it being an Optional[np.ndarray], hence the hack in the __init__. Updated wrapper documentation to reflect the change
Changed some of the wording for readability

pseudo-rnd-thoughts

There will be another Gym release with some minor bug fixes, hopefully, released in the next few days.
To keep Gymnasium equivalent to Gym until launch Im going to wait for merge this

Faster EpisodeStatisticsRecorder, converted from gym PR

0b6e0a3

DavidSlayback added 3 commits September 30, 2022 09:40

Save time as a vector, use floats

1f648c9

Fixed tests, needed float64

3a24031

Black formatting

e0e2381

Update cliffwalking.py

8e7734a

gianlucadecola approved these changes Sep 30, 2022

View reviewed changes

RedTachyon reviewed Oct 2, 2022

View reviewed changes

pseudo-rnd-thoughts requested changes Oct 3, 2022

View reviewed changes

Made requested changes

6967f49

pseudo-rnd-thoughts approved these changes Oct 4, 2022

View reviewed changes

pseudo-rnd-thoughts merged commit 8c75993 into Farama-Foundation:main Oct 10, 2022

DavidSlayback deleted the feature/faster_episode_recorder branch October 25, 2022 14:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Faster EpisodeStatisticsRecorder, converted from gym PR #31

Faster EpisodeStatisticsRecorder, converted from gym PR #31

DavidSlayback commented Sep 29, 2022 •

edited

Loading

pseudo-rnd-thoughts commented Sep 30, 2022

DavidSlayback commented Sep 30, 2022

DavidSlayback commented Sep 30, 2022

DavidSlayback commented Sep 30, 2022

gianlucadecola left a comment

gianlucadecola Sep 30, 2022

RedTachyon Oct 2, 2022

DavidSlayback Oct 3, 2022

RedTachyon Oct 3, 2022

DavidSlayback Oct 3, 2022

RedTachyon Oct 3, 2022

RedTachyon commented Oct 3, 2022

pseudo-rnd-thoughts left a comment

pseudo-rnd-thoughts Oct 3, 2022

pseudo-rnd-thoughts Oct 3, 2022

pseudo-rnd-thoughts Oct 3, 2022

pseudo-rnd-thoughts Oct 3, 2022

pseudo-rnd-thoughts Oct 3, 2022

DavidSlayback commented Oct 3, 2022

pseudo-rnd-thoughts left a comment

Faster EpisodeStatisticsRecorder, converted from gym PR #31

Faster EpisodeStatisticsRecorder, converted from gym PR #31

Conversation

DavidSlayback commented Sep 29, 2022 • edited Loading

Description

Type of change

Checklist:

pseudo-rnd-thoughts commented Sep 30, 2022

DavidSlayback commented Sep 30, 2022

DavidSlayback commented Sep 30, 2022

DavidSlayback commented Sep 30, 2022

gianlucadecola left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RedTachyon commented Oct 3, 2022

pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

DavidSlayback commented Oct 3, 2022

pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

DavidSlayback commented Sep 29, 2022 •

edited

Loading