-
-
Notifications
You must be signed in to change notification settings - Fork 762
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster EpisodeStatisticsRecorder, converted from gym PR #31
Faster EpisodeStatisticsRecorder, converted from gym PR #31
Conversation
Thanks for the PR and for moving it to Gymnasium. |
Hmm...I'll take a look at the fail later today and fix it, it was a quick paste with a couple import changes. |
Fixed, needed to store time as a vector (which seems a bit silly, it's the same time for all environments in a batch) and use float64 arrays for everything (to pass vector_list_info) |
Weird, I never touched that file. Blacked it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job! LGTM
"t": np.full( | ||
(self.num_envs,), round(time.perf_counter() - self.t0, 6) | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extremely minor, I think this could be the same as
np.full(self.num_envs, round(time.perf_counter() - self.t0, 6))
since self.num_envs is an integer it will be evaluated as the same shape without needing to wrap it in a tuple
@@ -96,8 +67,8 @@ def __init__(self, env: gym.Env, deque_size: int = 100): | |||
def reset(self, **kwargs): | |||
"""Resets the environment using kwargs and resets the episode returns and lengths.""" | |||
observations = super().reset(**kwargs) | |||
self.episode_returns = np.zeros(self.num_envs, dtype=np.float32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change? Pretty much all the floating point math in gymnasium works on float32, this change will either add an extra cast somewhere along the line, or require an explicit cast by user by breaking their code (numpy doesn't care, but pytorch does)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solely for passing the test for the VectorList wrapper, it checks each element with assert isinstance(..., float)
, which only passed for float64 (not float32 or int). I agree I'd prefer to use float32, wasn't sure whether I could change pre-existing tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you point me to the specific test? File/line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert isinstance(list_info[i]["episode"][stats], float) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the case that list_info[i]["episode"][stats]
used to be a list of floats, and now is a numpy array? Or am I misunderstanding something here?
@pseudo-rnd-thoughts @gianlucadecola Just to be sure - did you go through the logic in both versions of the code to be sure that it's the same thing? It does seem cleaner which alone is sufficient reason for me to like this PR, but I'd need to sit down with it for a bit to make sure, and it might not be necessary. Also I see now that there's a message in the meantime about having to switch to float64 -- this is very weird and I'd be curious to see the reason for it, but I also hope that there's a different solution than using float64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RedTachyon I believe that the implementation is the same. Could you check my suggestion on modifying "t" to be the episode length not time since initialisation. I suspect that very few people already use it so I think we can change it.
@DavidSlayback Looking at the code, given we are cleaning up the code, I have spotted a couple of other places that could be improved.
@@ -96,8 +67,8 @@ def __init__(self, env: gym.Env, deque_size: int = 100): | |||
def reset(self, **kwargs): | |||
"""Resets the environment using kwargs and resets the episode returns and lengths.""" | |||
observations = super().reset(**kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be obs, info = super().reset(**kwargs)
"r": np.where(dones, self.episode_returns, 0), | ||
"l": np.where(dones, self.episode_lengths, 0), | ||
"t": np.full( | ||
(self.num_envs,), round(time.perf_counter() - self.t0, 6) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this will matter much but should be use np.round
rather than the python round
function given the value gets passed directly to a numpy array.
Secondly, this "t" value doesn't seem much help as t0
is never updated to my eyes. So "t" seems to be the number of milli (or nano) seconds since the wrapper was initialised.
This would break backward compatibility but would we want to change "t" to be the number of milli (or nano) seconds that the episode has taken. This would require us to have a perf_counter
for every sub-environment but this shouldn't be too bad
return ( | ||
observations, | ||
rewards, | ||
terminateds if self.is_vector_env else terminateds[0], | ||
truncateds if self.is_vector_env else truncateds[0], | ||
terminateds, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you update terminateds
to termnations
and trucnateds
to truncations
as this is better english
self.episode_count += 1 | ||
self.episode_returns[i] = 0 | ||
self.episode_lengths[i] = 0 | ||
dones = truncateds | terminateds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For ease of reading, could you update to dones = np.logical_or(termination, truncation)
as |
is not clear what is means in this
} | ||
if self.is_vector_env: | ||
episode_info["_episode"] = np.where(dones, True, False) | ||
infos = {**infos, **episode_info} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we change this to
if "episode" or "_episode" in infos:
# raise warning
else:
infos["episode"] = {...}
if self.is_vector_env:
infos["_episodes"] = ...
This seems better than the old way as it is a single element we are adding to the dictionary
Made the requested changes.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There will be another Gym release with some minor bug fixes, hopefully, released in the next few days.
To keep Gymnasium equivalent to Gym until launch Im going to wait for merge this
Description
(Re-PR for gymnasium, this was reviewed and approved on the old gym repo PR #3101)
This addresses two issues I've run into with RecordEpisodeStatistics
Silent list conversion in vectorized environments. Currently, RecordEpisodeStatistics converts terminated and truncated into lists, but obs and reward may still be arrays, and the new vectorized
info
interface uses arrays. I particularly run into this in my code if I do something likedones.any()
. This PR leaves the underlying details of terminated/truncated alone by using NumPy array functionality.Speed. Instead of looping through a potentially large batch of environments, this PR uses NumPy functions with their underlying vectorization to achieve the same task.
I considered building this with Jumpy, but I'm not sure on its status in the project, and it would also eat the cost of creating new arrays wherever I'm doing in-place modifications.
I tested to make sure that this replicates the exact functionality of the original wrapper. To prove my speedup, I need to come up with some decent micro benchmarks; a couple attempts on Colab were fairly noisy.
Type of change
Checklist:
pre-commit
checks withpre-commit run --all-files
(seeCONTRIBUTING.md
instructions to set it up)