stable_baselines3.common.vec_env
Vectorized Environments are a method for stacking multiple independent environments into a single environment. Instead of training an RL agent on 1 environment per step, it allows us to train it on n
environments per step. Because of this, actions
passed to the environment are now a vector (of dimension n
). It is the same for observations
, rewards
and end of episode signals (dones
). In the case of non-array observation spaces such as Dict
or Tuple
, where different sub-spaces may have different shapes, the sub-observations are vectors (of dimension n
).
Name | Box |
Discrete |
Dict |
Tuple |
Multi Processing |
---|---|---|---|---|---|
DummyVecEnv | ✔️ |
|
|
|
|
SubprocVecEnv | ✔️ |
|
|
|
|
Note
Vectorized environments are required when using wrappers for frame-stacking or normalization.
Note
When using vectorized environments, the environments are automatically reset at the end of each episode. Thus, the observation returned for the i-th environment when done[i]
is true will in fact be the first observation of the next episode, not the last observation of the episode that has just terminated. You can access the "real" final observation of the terminated episode—that is, the one that accompanied the done
event provided by the underlying environment—using the terminal_observation
keys in the info dicts returned by the VecEnv
.
Warning
When defining a custom VecEnv
(for instance, using gym3 ProcgenEnv
), you should provide terminal_observation
keys in the info dicts returned by the VecEnv
(cf. note above).
Warning
When using SubprocVecEnv
, users must wrap the code in an if __name__ == "__main__":
if using the forkserver
or spawn
start method (default on Windows). On Linux, the default start method is fork
which is not thread safe and can create deadlocks.
For more information, see Python's multiprocessing guidelines.
If you want to alter or augment a VecEnv
without redefining it completely (e.g. stack multiple frames, monitor the VecEnv
, normalize the observation, ...), you can use VecEnvWrapper
for that. They are the vectorized equivalents (i.e., they act on multiple environments at the same time) of gym.Wrapper
.
You can find below an example for extracting one key from the observation:
import numpy as np
from stable_baselines3.common.vec_env.base_vec_env import VecEnv, VecEnvStepReturn, VecEnvWrapper
class VecExtractDictObs(VecEnvWrapper):
"""
A vectorized wrapper for filtering a specific key from dictionary observations.
Similar to Gym's FilterObservation wrapper:
https://github.com/openai/gym/blob/master/gym/wrappers/filter_observation.py
:param venv: The vectorized environment
:param key: The key of the dictionary observation
"""
def __init__(self, venv: VecEnv, key: str):
self.key = key
super().__init__(venv=venv, observation_space=venv.observation_space.spaces[self.key])
def reset(self) -> np.ndarray:
obs = self.venv.reset()
return obs[self.key]
def step_async(self, actions: np.ndarray) -> None:
self.venv.step_async(actions)
def step_wait(self) -> VecEnvStepReturn:
obs, reward, done, info = self.venv.step_wait()
return obs[self.key], reward, done, info
env = DummyVecEnv([lambda: gym.make("FetchReach-v1")])
# Wrap the VecEnv
env = VecExtractDictObs(env, key="observation")
VecEnv
DummyVecEnv
SubprocVecEnv
VecFrameStack
stable_baselines3.common.vec_env.stacked_observations.StackedObservations
stable_baselines3.common.vec_env.stacked_observations.StackedDictObservations
VecNormalize
VecVideoRecorder
VecCheckNan
VecTransposeImage
VecMonitor
VecExtractDictObs