Deploying to gh-pages from @ eec83c4 🚀

Farama-Foundation · May 21, 2024 · 79bb682
1 parent 09ea5cd
commit 79bb682

Showing 6 changed files with 95 additions and 69 deletions.
diff --git a/main/.buildinfo b/main/.buildinfo
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 706a91de4d1858040471873fbceaaf75
+config: 730e18b4156cd095352d5eed186741d8
 tags: d77d1c0d9ca2f4c8421862c7c5a0d620
diff --git a/main/_downloads/315c4c52fb68082a731b192d944e2ede/tutorials_python.zip b/main/_downloads/315c4c52fb68082a731b192d944e2ede/tutorials_python.zip
diff --git a/main/_downloads/a5659940aa3f8f568547d47752a43172/tutorials_jupyter.zip b/main/_downloads/a5659940aa3f8f568547d47752a43172/tutorials_jupyter.zip
diff --git a/main/_modules/gymnasium/vector/vector_env/index.html b/main/_modules/gymnasium/vector/vector_env/index.html
@@ -504,32 +504,45 @@ <h1>Source code for gymnasium.vector.vector_env</h1><div class="highlight"><pre>
 <span class="w">    </span><span class="sd">&quot;&quot;&quot;Base class for vectorized environments to run multiple independent copies of the same environment in parallel.</span>
 
 <span class="sd">    Vector environments can provide a linear speed-up in the steps taken per second through sampling multiple</span>
-<span class="sd">    sub-environments at the same time. To prevent terminated environments waiting until all sub-environments have</span>
-<span class="sd">    terminated or truncated, the vector environments automatically reset sub-environments after they terminate or truncated (within the same step call).</span>
-<span class="sd">    As a result, the step&#39;s observation and info are overwritten by the reset&#39;s observation and info.</span>
-<span class="sd">    To preserve this data, the observation and info for the final step of a sub-environment is stored in the info parameter,</span>
-<span class="sd">    using `&quot;final_observation&quot;` and `&quot;final_info&quot;` respectively. See :meth:`step` for more information.</span>
+<span class="sd">    sub-environments at the same time. Gymnasium contains two generalised vector environments, :class:`AsyncVectorEnv`</span>
+<span class="sd">    and :class:`SyncVectorEnv`, along with several custom vector environment implementations.</span>
+<span class="sd">    Both :func:`reset` and :func:`step` batch the `observations`, `rewards`, `terminations`, `truncations` and</span>
+<span class="sd">    `info` of each sub-environment (:func:`reset` returns only `observations` and `info`); see the example below.</span>
+<span class="sd">    The `rewards`, `terminations` and `truncations` are packaged into NumPy arrays of shape `(num_envs,)`. For</span>
+<span class="sd">    `observations` (and `actions`), the batching process depends on the type of observation (and action) space and is</span>
+<span class="sd">    generally optimised for neural network inputs/outputs. `info` is kept as a single dictionary in which each key</span>
+<span class="sd">    gives the data for all sub-environments.</span>
+
+<span class="sd">    For creating environments, :func:`make_vec` is the vector environment equivalent of :func:`make`. It accepts</span>
+<span class="sd">    several additional arguments for modifying the environment (for example, wrappers), the number of environments,</span>
+<span class="sd">    the vectorizer type and the vectorizer arguments.</span>
 
-<span class="sd">    The vector environments batches `observations`, `rewards`, `terminations`, `truncations` and `info` for each</span>
-<span class="sd">    sub-environment. In addition, :meth:`step` expects to receive a batch of actions for each parallel environment.</span>
-
-<span class="sd">    Gymnasium contains two generalised Vector environments: :class:`AsyncVectorEnv` and :class:`SyncVectorEnv` along with</span>
-<span class="sd">    several custom vector environment implementations.</span>
-
-<span class="sd">    The Vector Environments have the additional attributes for users to understand the implementation</span>
-
-<span class="sd">    - :attr:`num_envs` - The number of sub-environment in the vector environment</span>
-<span class="sd">    - :attr:`observation_space` - The batched observation space of the vector environment</span>
-<span class="sd">    - :attr:`single_observation_space` - The observation space of a single sub-environment</span>
-<span class="sd">    - :attr:`action_space` - The batched action space of the vector environment</span>
-<span class="sd">    - :attr:`single_action_space` - The action space of a single sub-environment</span>
+<span class="sd">    Note:</span>
+<span class="sd">        The info parameter of :meth:`reset` and :meth:`step` was originally implemented before v0.25 as a list</span>
+<span class="sd">        of dictionaries, one per sub-environment. However, in v0.25+ this was changed to a dictionary with a NumPy</span>
+<span class="sd">        array for each key. To use the old info style, use the :class:`DictInfoToList` wrapper.</span>
 
 <span class="sd">    Examples:</span>
 <span class="sd">        &gt;&gt;&gt; import gymnasium as gym</span>
 <span class="sd">        &gt;&gt;&gt; envs = gym.make_vec(&quot;CartPole-v1&quot;, num_envs=3, vectorization_mode=&quot;sync&quot;, wrappers=(gym.wrappers.TimeAwareObservation,))</span>
 <span class="sd">        &gt;&gt;&gt; envs = gym.wrappers.vector.ClipReward(envs, min_reward=0.2, max_reward=0.8)</span>
 <span class="sd">        &gt;&gt;&gt; envs</span>
 <span class="sd">        &lt;ClipReward, SyncVectorEnv(CartPole-v1, num_envs=3)&gt;</span>
+<span class="sd">        &gt;&gt;&gt; envs.num_envs</span>
+<span class="sd">        3</span>
+<span class="sd">        &gt;&gt;&gt; envs.action_space</span>
+<span class="sd">        MultiDiscrete([2 2 2])</span>
+<span class="sd">        &gt;&gt;&gt; envs.observation_space</span>
+<span class="sd">        Box([[-4.80000019e+00 -3.40282347e+38 -4.18879032e-01 -3.40282347e+38</span>
+<span class="sd">           0.00000000e+00]</span>
+<span class="sd">         [-4.80000019e+00 -3.40282347e+38 -4.18879032e-01 -3.40282347e+38</span>
+<span class="sd">           0.00000000e+00]</span>
+<span class="sd">         [-4.80000019e+00 -3.40282347e+38 -4.18879032e-01 -3.40282347e+38</span>
+<span class="sd">           0.00000000e+00]], [[4.80000019e+00 3.40282347e+38 4.18879032e-01 3.40282347e+38</span>
+<span class="sd">          5.00000000e+02]</span>
+<span class="sd">         [4.80000019e+00 3.40282347e+38 4.18879032e-01 3.40282347e+38</span>
+<span class="sd">          5.00000000e+02]</span>
+<span class="sd">         [4.80000019e+00 3.40282347e+38 4.18879032e-01 3.40282347e+38</span>
+<span class="sd">          5.00000000e+02]], (3, 5), float64)</span>
 <span class="sd">        &gt;&gt;&gt; observations, infos = envs.reset(seed=123)</span>
 <span class="sd">        &gt;&gt;&gt; observations</span>
 <span class="sd">        array([[ 0.01823519, -0.0446179 , -0.02796401, -0.03156282,  0.        ],</span>
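The batching rules in the updated docstring (rewards, terminations and truncations packed per sub-environment, `info` kept as one dictionary keyed by name) can be illustrated with a minimal pure-Python sketch. `batch_step_results` is a hypothetical helper, not part of Gymnasium's API; it uses plain lists in place of NumPy arrays so it runs without dependencies, and the `_<key>` boolean mask is an assumption modelled on how Gymnasium's vector infos mark which sub-environments set a key.

```python
from typing import Any


def batch_step_results(results: list[tuple[Any, float, bool, bool, dict[str, Any]]]):
    """Batch per-sub-environment step results (hypothetical helper, not Gymnasium's API).

    Each element of `results` is one sub-environment's
    (observation, reward, terminated, truncated, info) tuple. Plain lists of
    length num_envs stand in for Gymnasium's NumPy arrays of shape (num_envs,).
    """
    num_envs = len(results)
    observations = [result[0] for result in results]
    rewards = [result[1] for result in results]
    terminations = [result[2] for result in results]
    truncations = [result[3] for result in results]

    # `info` stays one dictionary: each key maps to a per-sub-environment list,
    # and an assumed boolean mask "_<key>" records which sub-environments set it.
    infos: dict[str, list[Any]] = {}
    for i, result in enumerate(results):
        for key, value in result[4].items():
            if key not in infos:
                infos[key] = [None] * num_envs
                infos[f"_{key}"] = [False] * num_envs
            infos[key][i] = value
            infos[f"_{key}"][i] = True
    return observations, rewards, terminations, truncations, infos
```

Missing entries are filled with `None` here for simplicity; how Gymnasium fills them may differ.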
@@ -538,7 +551,8 @@ <h1>Source code for gymnasium.vector.vector_env</h1><div class="highlight"><pre>
 <span class="sd">        &gt;&gt;&gt; infos</span>
 <span class="sd">        {}</span>
 <span class="sd">        &gt;&gt;&gt; _ = envs.action_space.seed(123)</span>
-<span class="sd">        &gt;&gt;&gt; observations, rewards, terminations, truncations, infos = envs.step(envs.action_space.sample())</span>
+<span class="sd">        &gt;&gt;&gt; actions = envs.action_space.sample()</span>
+<span class="sd">        &gt;&gt;&gt; observations, rewards, terminations, truncations, infos = envs.step(actions)</span>
 <span class="sd">        &gt;&gt;&gt; observations</span>
 <span class="sd">        array([[ 0.01734283,  0.15089367, -0.02859527, -0.33293587,  1.        ],</span>
 <span class="sd">               [ 0.02909703, -0.16717631,  0.04740972,  0.3319138 ,  1.        ],</span>
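The note on the pre-v0.25 info style contrasts a list with one dictionary per sub-environment against a single dictionary of per-key arrays. The sketch below converts the batched form back into the old per-sub-environment form, roughly the job the :class:`DictInfoToList` wrapper performs. `dict_info_to_list` is a hypothetical name, lists stand in for NumPy arrays, and treating keys that start with `_` as boolean masks is an assumption about the batched layout.

```python
from typing import Any


def dict_info_to_list(infos: dict[str, list[Any]], num_envs: int) -> list[dict[str, Any]]:
    """Convert a batched info dict into one dict per sub-environment (hypothetical helper)."""
    per_env: list[dict[str, Any]] = [{} for _ in range(num_envs)]
    for key, values in infos.items():
        if key.startswith("_"):
            continue  # assumed to be a "_<key>" presence mask, not data
        # Without a mask, treat the key as present in every sub-environment.
        mask = infos.get(f"_{key}", [True] * num_envs)
        for i in range(num_envs):
            if mask[i]:
                per_env[i][key] = values[i]
    return per_env
```

In real code, the :class:`DictInfoToList` wrapper is the supported route; this sketch only shows the shape of the conversion.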
@@ -553,17 +567,18 @@ <h1>Source code for gymnasium.vector.vector_env</h1><div class="highlight"><pre>
 <span class="sd">        {}</span>
 <span class="sd">        &gt;&gt;&gt; envs.close()</span>
 
-<span class="sd">    Note:</span>
-<span class="sd">        The info parameter of :meth:`reset` and :meth:`step` was originally implemented before v0.25 as a list</span>
-<span class="sd">        of dictionary for each sub-environment. However, this was modified in v0.25+ to be a</span>
-<span class="sd">        dictionary with a NumPy array for each key. To use the old info style, utilise the :class:`DictInfoToList` wrapper.</span>
+<span class="sd">    To avoid having to wait for all sub-environments to terminate before resetting, implementations autoreset</span>
+<span class="sd">    sub-environments on episode end (`terminated or truncated` is `True`). As a result, when adding observations</span>
+<span class="sd">    to a replay buffer, you need to know which observations (and infos) of each sub-environment are the first</span>
+<span class="sd">    observation from an autoreset. We recommend using an additional variable to store this information.</span>
 
-<span class="sd">    Note:</span>
-<span class="sd">        All parallel environments should share the identical observation and action spaces.</span>
-<span class="sd">        In other words, a vector of multiple different environments is not supported.</span>
+<span class="sd">    Vector environments have the following additional attributes to help users understand the implementation:</span>
 
-<span class="sd">    Note:</span>
-<span class="sd">        :func:`make_vec` is the equivalent function to :func:`make` for vector environments.</span>
+<span class="sd">    - :attr:`num_envs` - The number of sub-environments in the vector environment</span>
+<span class="sd">    - :attr:`observation_space` - The batched observation space of the vector environment</span>
+<span class="sd">    - :attr:`single_observation_space` - The observation space of a single sub-environment</span>
+<span class="sd">    - :attr:`action_space` - The batched action space of the vector environment</span>
+<span class="sd">    - :attr:`single_action_space` - The action space of a single sub-environment</span>
 <span class="sd">    &quot;&quot;&quot;</span>
 
     <span class="n">metadata</span><span class="p">:</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Any</span><span class="p">]</span> <span class="o">=</span> <span class="p">{}</span>
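The docstring recommends an additional variable to mark which observations come from an autoreset, and the `step` note says the reset happens on the step after `terminated or truncated` is `True`. The sketch below shows that bookkeeping for a replay buffer. `ToyVectorEnv` and `collect_transitions` are hypothetical stand-ins, not Gymnasium classes; the zero reward and `False` flags on the autoreset step are assumptions of the toy environment.

```python
class ToyVectorEnv:
    """Hypothetical stand-in for a vector environment (not a Gymnasium class).

    Sub-environment i counts up from 0 and terminates when its count reaches
    episode_lengths[i]. The reset happens on the step *after* termination:
    that step ignores the action, returns the new episode's first observation
    (0) and, as an assumption, a reward of 0.0 with terminated/truncated False.
    """

    def __init__(self, episode_lengths: list[int]):
        self.num_envs = len(episode_lengths)
        self.episode_lengths = episode_lengths
        self.counts = [0] * self.num_envs
        self.needs_reset = [False] * self.num_envs

    def reset(self):
        self.counts = [0] * self.num_envs
        self.needs_reset = [False] * self.num_envs
        return list(self.counts), {}

    def step(self, actions):
        rewards, terminations, truncations = [], [], []
        for i in range(self.num_envs):
            if self.needs_reset[i]:
                # Autoreset step: the action is ignored and a new episode starts.
                self.counts[i] = 0
                self.needs_reset[i] = False
                rewards.append(0.0)
                terminations.append(False)
            else:
                self.counts[i] += 1
                done = self.counts[i] >= self.episode_lengths[i]
                self.needs_reset[i] = done
                rewards.append(1.0)
                terminations.append(done)
            truncations.append(False)
        return list(self.counts), rewards, terminations, truncations, {}


def collect_transitions(envs, num_steps: int):
    """Collect (obs, action, reward, terminated, next_obs) tuples for a replay buffer."""
    buffer = []
    observations, _ = envs.reset()
    # The additional variable: True for sub-environments that ended on the
    # previous step, so their next step is an autoreset, not a real transition.
    autoreset = [False] * envs.num_envs
    for _ in range(num_steps):
        actions = [0] * envs.num_envs  # placeholder policy
        next_observations, rewards, terminations, truncations, _ = envs.step(actions)
        for i in range(envs.num_envs):
            if not autoreset[i]:
                buffer.append(
                    (observations[i], actions[i], rewards[i], terminations[i], next_observations[i])
                )
        autoreset = [term or trunc for term, trunc in zip(terminations, truncations)]
        observations = next_observations
    return buffer
```

With `ToyVectorEnv([2, 3])` and four steps, the buffer holds six transitions, and every stored `next_obs` equals `obs + 1`. Without the `autoreset` check, the two autoreset steps would add transitions whose `next_obs` jumps back to 0 from the end of the previous episode.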
@@ -628,9 +643,8 @@ <h1>Source code for gymnasium.vector.vector_env</h1><div class="highlight"><pre>
 <span class="sd">            Batch of (observations, rewards, terminations, truncations, infos)</span>
 
 <span class="sd">        Note:</span>
-<span class="sd">            As the vector environments autoreset for a terminating and truncating sub-environments,</span>
-<span class="sd">            the returned observation and info is not the final step&#39;s observation or info which is instead stored in</span>
-<span class="sd">            info as `&quot;final_observation&quot;` and `&quot;final_info&quot;`.</span>
+<span class="sd">            As the vector environments autoreset terminating and truncating sub-environments, the reset occurs on</span>
+<span class="sd">            the next step after `terminated or truncated` is `True`.</span>
 
 <span class="sd">        Example:</span>
 <span class="sd">            &gt;&gt;&gt; import gymnasium as gym</span>