Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster EpisodeStatisticsRecorder, converted from gym PR #31

Conversation

DavidSlayback
Copy link
Contributor

@DavidSlayback DavidSlayback commented Sep 29, 2022

Description

(Re-PR for gymnasium, this was reviewed and approved on the old gym repo PR #3101)

This addresses two issues I've run into with RecordEpisodeStatistics

  1. Silent list conversion in vectorized environments. Currently, RecordEpisodeStatistics converts terminated and truncated into lists, but obs and reward may still be arrays, and the new vectorized info interface uses arrays. I particularly run into this in my code if I do something like dones.any(). This PR leaves the underlying details of terminated/truncated alone by using NumPy array functionality.

  2. Speed. Instead of looping through a potentially large batch of environments, this PR uses NumPy functions with their underlying vectorization to achieve the same task.

I considered building this with Jumpy, but I'm not sure on its status in the project, and it would also eat the cost of creating new arrays wherever I'm doing in-place modifications.

I tested to make sure that this replicates the exact functionality of the original wrapper. To prove my speedup, I need to come up with some decent micro benchmarks; a couple attempts on Colab were fairly noisy.

Type of change

  • Existing feature enhancement (non-breaking change which adds functionality)

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • [N/A] I have commented my code, particularly in hard-to-understand areas
  • [N/A] I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • [N/A] I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@pseudo-rnd-thoughts
Copy link
Member

Thanks for the PR and for moving it to Gymnasium.

@DavidSlayback
Copy link
Contributor Author

Hmm...I'll take a look at the fail later today and fix it, it was a quick paste with a couple import changes.

@DavidSlayback
Copy link
Contributor Author

Fixed, needed to store time as a vector (which seems a bit silly, it's the same time for all environments in a batch) and use float64 arrays for everything (to pass vector_list_info)

@DavidSlayback
Copy link
Contributor Author

Weird, I never touched that file. Blacked it

Copy link
Contributor

@gianlucadecola gianlucadecola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job! LGTM

Comment on lines 95 to 97
"t": np.full(
(self.num_envs,), round(time.perf_counter() - self.t0, 6)
),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extremely minor, I think this could be the same as

np.full(self.num_envs, round(time.perf_counter() - self.t0, 6))

since self.num_envs is an integer it will be evaluated as the same shape without needing to wrap it in a tuple

@@ -96,8 +67,8 @@ def __init__(self, env: gym.Env, deque_size: int = 100):
def reset(self, **kwargs):
"""Resets the environment using kwargs and resets the episode returns and lengths."""
observations = super().reset(**kwargs)
self.episode_returns = np.zeros(self.num_envs, dtype=np.float32)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change? Pretty much all the floating point math in gymnasium works on float32, this change will either add an extra cast somewhere along the line, or require an explicit cast by user by breaking their code (numpy doesn't care, but pytorch does)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solely for passing the test for the VectorList wrapper, it checks each element with assert isinstance(..., float), which only passed for float64 (not float32 or int). I agree I'd prefer to use float32, wasn't sure whether I could change pre-existing tests

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you point me to the specific test? File/line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert isinstance(list_info[i]["episode"][stats], float)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it the case that list_info[i]["episode"][stats] used to be a list of floats, and now is a numpy array? Or am I misunderstanding something here?

@RedTachyon
Copy link
Member

@pseudo-rnd-thoughts @gianlucadecola Just to be sure - did you go through the logic in both versions of the code to be sure that it's the same thing? It does seem cleaner which alone is sufficient reason for me to like this PR, but I'd need to sit down with it for a bit to make sure, and it might not be necessary.

Also I see now that there's a message in the meantime about having to switch to float64 -- this is very weird and I'd be curious to see the reason for it, but I also hope that there's a different solution than using float64

Copy link
Member

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RedTachyon I believe that the implementation is the same. Could you check my suggestion on modifying "t" to be the episode length not time since initialisation. I suspect that very few people already use it so I think we can change it.

@DavidSlayback Looking at the code, given we are cleaning up the code, I have spotted a couple of other places that could be improved.

@@ -96,8 +67,8 @@ def __init__(self, env: gym.Env, deque_size: int = 100):
def reset(self, **kwargs):
"""Resets the environment using kwargs and resets the episode returns and lengths."""
observations = super().reset(**kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be obs, info = super().reset(**kwargs)

"r": np.where(dones, self.episode_returns, 0),
"l": np.where(dones, self.episode_lengths, 0),
"t": np.full(
(self.num_envs,), round(time.perf_counter() - self.t0, 6)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this will matter much but should be use np.round rather than the python round function given the value gets passed directly to a numpy array.

Secondly, this "t" value doesn't seem much help as t0 is never updated to my eyes. So "t" seems to be the number of milli (or nano) seconds since the wrapper was initialised.

This would break backward compatibility but would we want to change "t" to be the number of milli (or nano) seconds that the episode has taken. This would require us to have a perf_counter for every sub-environment but this shouldn't be too bad

return (
observations,
rewards,
terminateds if self.is_vector_env else terminateds[0],
truncateds if self.is_vector_env else truncateds[0],
terminateds,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update terminateds to termnations and trucnateds to truncations as this is better english

self.episode_count += 1
self.episode_returns[i] = 0
self.episode_lengths[i] = 0
dones = truncateds | terminateds
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For ease of reading, could you update to dones = np.logical_or(termination, truncation) as | is not clear what is means in this

}
if self.is_vector_env:
episode_info["_episode"] = np.where(dones, True, False)
infos = {**infos, **episode_info}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we change this to

if "episode" or "_episode" in infos:
    # raise warning
else:
    infos["episode"] = {...}
    if self.is_vector_env:
        infos["_episodes"] = ...

This seems better than the old way as it is a single element we are adding to the dictionary

@DavidSlayback
Copy link
Contributor Author

Made the requested changes.

  1. I changed the storage arrays to float32 and int32. I had to adjust the test for vector_list_info to use np.isscalar. Basically, python isn't a big fan of float32. Getting a single value from a float64 array returns a python float, but from a float32 array, you get a numpy scalar.

  2. t0 is renamed to episode_start_times and tracks the start time for each episode. Pyright didn't like it being an Optional[np.ndarray], hence the hack in the __init__. Updated wrapper documentation to reflect the change

  3. Changed some of the wording for readability

Copy link
Member

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will be another Gym release with some minor bug fixes, hopefully, released in the next few days.
To keep Gymnasium equivalent to Gym until launch Im going to wait for merge this

@pseudo-rnd-thoughts pseudo-rnd-thoughts merged commit 8c75993 into Farama-Foundation:main Oct 10, 2022
@DavidSlayback DavidSlayback deleted the feature/faster_episode_recorder branch October 25, 2022 14:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants