Update locals in callbacks (#787)

* Added update_locals function for callbacks
* corrected update locals in td3/sac
* Callbacks are updated before trajectory end in PG
* variable availability dqn
* dqn doc
* title underline
* updated ddpg callback info
* removed a dot from ddpg
* make it compile
* ddpg compile
* correct return
* td3
* clarity
* sac callback variables
* completed callbacks
* fix compilation issues
* changelog
* Update stable_baselines/common/callbacks.py
  Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Update stable_baselines/common/callbacks.py
  Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Update stable_baselines/common/callbacks.py
  Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Update stable_baselines/common/callbacks.py
  Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* fixed sphinx docstring compilation

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Stable-Baselines-Team · Jun 30, 2020 · b3b217c
1 parent d288b6d
commit b3b217c

Showing 21 changed files with 560 additions and 12 deletions.
diff --git a/docs/misc/changelog.rst b/docs/misc/changelog.rst
@@ -16,25 +16,27 @@ Breaking Changes:
 New Features:
 ^^^^^^^^^^^^^
 - Added momentum parameter to A2C for the embedded RMSPropOptimizer (@kantneel)
-- ActionNoise is now an abstract base class and implements ``__call__``, ``NormalActionNoise`` and ``OrnsteinUhlenbeckActionNoise`` have return types (@solliet)
+- ActionNoise is now an abstract base class and implements ``__call__``, ``NormalActionNoise`` and ``OrnsteinUhlenbeckActionNoise`` have return types (@PartiallyTyped)
 - HER now passes info dictionary to compute_reward, allowing for the computation of rewards that are independent of the goal (@tirafesi)
 
 Bug Fixes:
 ^^^^^^^^^^
 - Fixed DDPG sampling empty replay buffer when combined with HER  (@tirafesi)
 - Fixed a bug in ``HindsightExperienceReplayWrapper``, where the openai-gym signature for ``compute_reward`` was not matched correctly (@johannes-dornheim)
-- Fixed SAC/TD3 checking time to update on learn steps instead of total steps (@solliet)
-- Added ``**kwarg`` pass through for ``reset`` method in ``atari_wrappers.FrameStack`` (@solliet)
-- Fix consistency in ``setup_model()`` for SAC, ``target_entropy`` now uses ``self.action_space`` instead of ``self.env.action_space`` (@solliet)
+- Fixed SAC/TD3 checking time to update on learn steps instead of total steps (@PartiallyTyped)
+- Added ``**kwarg`` pass through for ``reset`` method in ``atari_wrappers.FrameStack`` (@PartiallyTyped)
+- Fix consistency in ``setup_model()`` for SAC, ``target_entropy`` now uses ``self.action_space`` instead of ``self.env.action_space`` (@PartiallyTyped)
 - Fix reward threshold in ``test_identity.py``
 - Partially fix tensorboard indexing for PPO2 (@enderdead)
 - Fixed potential bug in ``DummyVecEnv`` where ``copy()`` was used instead of ``deepcopy()``
 - Fixed a bug in ``GAIL`` where the dataloader was not available after saving, causing an error when using ``CheckpointCallback``
 - Fixed a bug in ``SAC`` where any convolutional layers were not included in the target network parameters.
 - Fixed ``render()`` method for ``VecEnvs``
 - Fixed ``seed()`` method for ``SubprocVecEnv``
+- Fixed a bug where ``callback.locals`` did not have the correct values (@PartiallyTyped)
 - Fixed a bug in the ``close()`` method of ``SubprocVecEnv``, causing wrappers further down in the wrapper stack to not be closed. (@NeoExtended)
 
+
 Deprecations:
 ^^^^^^^^^^^^^
 
@@ -56,7 +58,7 @@ Documentation:
 - Added imitation baselines project
 - Updated install instructions
 - Added Slime Volleyball project (@hardmaru)
-
+- Added a table of the variables accessible from the ``on_step`` function of the callbacks for each algorithm (@PartiallyTyped)
 
 Release 2.10.0 (2020-03-11)
 ---------------------------
@@ -101,7 +103,7 @@ New Features:
 - Added ``unwrap_vec_normalize`` and ``sync_envs_normalization`` in the ``vec_env`` module
   to synchronize two VecNormalize environment
 - Added a seeding method for vectorized environments. (@NeoExtended)
-- Added extend method to store batches of experience in ReplayBuffer. (@solliet)
+- Added extend method to store batches of experience in ReplayBuffer. (@PartiallyTyped)
 
 
 Bug Fixes:
@@ -711,5 +713,5 @@ Thanks to @bjmuld @iambenzo @iandanforth @r7vme @brendenpetersen @huvar @abhiskk
 @XMaster96 @kantneel @Pastafarianist @GerardMaggiolino @PatrickWalter214 @yutingsz @sc420 @Aaahh @billtubbs
 @Miffyli @dwiel @miguelrass @qxcv @jaberkow @eavelardev @ruifeng96150 @pedrohbtp @srivatsankrishnan @evilsocket
 @MarvineGothic @jdossgollin @SyllogismRXS @rusu24edward @jbulow @Antymon @seheevic @justinkterry @edbeeching
-@flodorner @KuKuXia @NeoExtended @solliet @mmcenta @richardwu @tirafesi @caburu @johannes-dornheim @kvenkman @aakash94
+@flodorner @KuKuXia @NeoExtended @PartiallyTyped @mmcenta @richardwu @tirafesi @caburu @johannes-dornheim @kvenkman @aakash94
 @enderdead @hardmaru
diff --git a/docs/modules/a2c.rst b/docs/modules/a2c.rst
@@ -75,3 +75,53 @@ Parameters
 .. autoclass:: A2C
   :members:
   :inherited-members:
+
+
+Callbacks - Accessible Variables
+--------------------------------
+
+Depending on initialization parameters and timestep, different variables are accessible.
+Variables accessible "From timestep X" are variables that can be accessed when
+``self.timestep==X`` in the ``on_step`` function.
+
+      +--------------------------------+-----------------------------------------------------+
+      |Variable                        |                                         Availability|
+      +================================+=====================================================+
+      |- self                          |From timestep 1                                      |
+      |- total_timesteps               |                                                     |
+      |- callback                      |                                                     |
+      |- log_interval                  |                                                     |
+      |- tb_log_name                   |                                                     |
+      |- reset_num_timesteps           |                                                     |
+      |- new_tb_log                    |                                                     |
+      |- writer                        |                                                     |
+      |- t_start                       |                                                     |
+      |- mb_obs                        |                                                     |
+      |- mb_rewards                    |                                                     |
+      |- mb_actions                    |                                                     |
+      |- mb_values                     |                                                     |
+      |- mb_dones                      |                                                     |
+      |- mb_states                     |                                                     |
+      |- ep_infos                      |                                                     |
+      |- actions                       |                                                     |
+      |- values                        |                                                     |
+      |- states                        |                                                     |
+      |- clipped_actions               |                                                     |
+      |- obs                           |                                                     |
+      |- rewards                       |                                                     |
+      |- dones                         |                                                     |
+      |- infos                         |                                                     |
+      +--------------------------------+-----------------------------------------------------+
+      |- info                          |From timestep 2                                      |
+      |- maybe_ep_info                 |                                                     |
+      +--------------------------------+-----------------------------------------------------+
+      |- update                        |From timestep ``n_steps+1``                          |
+      |- rollout                       |                                                     |
+      |- masks                         |                                                     |
+      |- true_reward                   |                                                     |
+      +--------------------------------+-----------------------------------------------------+
+      |- value_loss                    |From timestep ``2 * n_steps+1``                      |
+      |- policy_entropy                |                                                     |
+      |- n_seconds                     |                                                     |
+      |- fps                           |                                                     |
+      +--------------------------------+-----------------------------------------------------+
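Because the table above shows that some variables (for example ``value_loss``) only exist after later timesteps, a callback should not assume a key is present in its locals. The following is a minimal sketch of that defensive pattern. ``LossLogger`` and ``available_locals`` are hypothetical names, and the class is a self-contained stand-in that only mimics a ``locals`` dictionary and an ``update_locals`` method; it is not the ``stable_baselines`` callback base class.

```python
from typing import Any, Dict, Iterable, List


def available_locals(local_vars: Dict[str, Any], names: Iterable[str]) -> Dict[str, Any]:
    """Return only the requested variables that already exist."""
    return {name: local_vars[name] for name in names if name in local_vars}


class LossLogger:
    """Hypothetical stand-in for a callback (assumption: not the library class)."""

    def __init__(self) -> None:
        self.locals: Dict[str, Any] = {}
        self.seen: List[List[str]] = []

    def update_locals(self, locals_: Dict[str, Any]) -> None:
        # Merge the newest local variables into the stored dictionary.
        self.locals.update(locals_)

    def on_step(self) -> bool:
        # Read variables defensively: late variables may not exist yet.
        found = available_locals(self.locals, ("rewards", "value_loss"))
        self.seen.append(sorted(found))
        return True  # keep training


cb = LossLogger()
cb.update_locals({"rewards": [1.0]})   # early step: no loss computed yet
cb.on_step()
cb.update_locals({"value_loss": 0.5})  # later step: loss now present
cb.on_step()
print(cb.seen)
```

Filtering on key presence keeps the same ``on_step`` body valid at every timestep, whichever row of the table a variable belongs to.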
diff --git a/docs/modules/acer.rst b/docs/modules/acer.rst
@@ -70,3 +70,49 @@ Parameters
 .. autoclass:: ACER
   :members:
   :inherited-members:
+
+
+Callbacks - Accessible Variables 
+--------------------------------
+
+Depending on initialization parameters and timestep, different variables are accessible.
+Variables accessible from "timestep X" are variables that can be accessed when
+``self.timestep==X`` from the ``on_step`` function.
+
+    +--------------------------------+-----------------------------------------------------+
+    |Variable                        |                                         Availability|
+    +================================+=====================================================+
+    |- self                          | From timestep 1                                     |
+    |- total_timesteps               |                                                     |
+    |- callback                      |                                                     |
+    |- log_interval                  |                                                     |
+    |- tb_log_name                   |                                                     |
+    |- reset_num_timesteps           |                                                     |
+    |- new_tb_log                    |                                                     |
+    |- writer                        |                                                     |
+    |- episode_stats                 |                                                     |
+    |- buffer                        |                                                     |
+    |- t_start                       |                                                     |
+    |- enc_obs                       |                                                     |
+    |- mb_obs                        |                                                     |
+    |- mb_actions                    |                                                     |
+    |- mb_mus                        |                                                     |
+    |- mb_dones                      |                                                     |
+    |- mb_rewards                    |                                                     |
+    |- actions                       |                                                     |
+    |- states                        |                                                     |
+    |- mus                           |                                                     |
+    |- clipped_actions               |                                                     |
+    |- obs                           |                                                     |
+    |- rewards                       |                                                     |
+    |- dones                         |                                                     |
+    +--------------------------------+-----------------------------------------------------+
+    |- steps                         | From timestep ``n_steps+1``                         |
+    |- masks                         |                                                     |
+    +--------------------------------+-----------------------------------------------------+
+    |- names_ops                     | From timestep ``2 * n_steps+1``                     |
+    |- values_ops                    |                                                     |
+    +--------------------------------+-----------------------------------------------------+
+    |- samples_number                | After replay_start steps,  when replay_ratio > 0 and|
+    |                                | buffer is not None                                  |
+    +--------------------------------+-----------------------------------------------------+
diff --git a/docs/modules/acktr.rst b/docs/modules/acktr.rst
@@ -71,3 +71,64 @@ Parameters
 .. autoclass:: ACKTR
   :members:
   :inherited-members:
+
+
+
+
+Callbacks - Accessible Variables 
+--------------------------------
+
+Depending on initialization parameters and timestep, different variables are accessible.
+Variables accessible from "timestep X" are variables that can be accessed when
+``self.timestep==X`` from the ``on_step`` function.
+
+    +--------------------------------+-----------------------------------------------------+
+    |Variable                        |                                         Availability|
+    +================================+=====================================================+
+    |- self                          |From timestep 1                                      |
+    |- total_timesteps               |                                                     |
+    |- callback                      |                                                     |
+    |- log_interval                  |                                                     |
+    |- tb_log_name                   |                                                     |
+    |- reset_num_timesteps           |                                                     |
+    |- new_tb_log                    |                                                     |
+    |- writer                        |                                                     |
+    |- tf_vars                       |                                                     |
+    |- is_uninitialized              |                                                     |
+    |- new_uninitialized_vars        |                                                     |
+    |- t_start                       |                                                     |
+    |- coord                         |                                                     |
+    |- enqueue_threads               |                                                     |
+    |- old_uninitialized_vars        |                                                     |
+    |- mb_obs                        |                                                     |
+    |- mb_rewards                    |                                                     |
+    |- mb_actions                    |                                                     |
+    |- mb_values                     |                                                     |
+    |- mb_dones                      |                                                     |
+    |- mb_states                     |                                                     |
+    |- ep_infos                      |                                                     |
+    |- _                             |                                                     |
+    |- actions                       |                                                     |
+    |- values                        |                                                     |
+    |- states                        |                                                     |
+    |- clipped_actions               |                                                     |
+    |- obs                           |                                                     |
+    |- rewards                       |                                                     |
+    |- dones                         |                                                     |
+    |- infos                         |                                                     |
+    +--------------------------------+-----------------------------------------------------+
+    |- info                          |From timestep 2                                      |
+    |- maybe_ep_info                 |                                                     |
+    +--------------------------------+-----------------------------------------------------+
+    |- update                        |From timestep ``n_steps+1``                          |
+    |- rollout                       |                                                     |
+    |- returns                       |                                                     |
+    |- masks                         |                                                     |
+    |- true_reward                   |                                                     |
+    +--------------------------------+-----------------------------------------------------+
+    |- policy_loss                   |From timestep ``2*n_steps+1``                        |
+    |- value_loss                    |                                                     |
+    |- policy_entropy                |                                                     |
+    |- n_seconds                     |                                                     |
+    |- fps                           |                                                     |
+    +--------------------------------+-----------------------------------------------------+
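The availability thresholds in these tables follow a simple pattern: rows tied to rollouts become accessible at ``k * n_steps + 1`` for ``k = 0, 1, 2``. The sketch below works through that arithmetic. ``first_available_timestep`` is a hypothetical helper, the value ``n_steps = 20`` is only an example, and reading ``k`` as the number of completed rollouts is an interpretation of the table rather than something the source states. The "From timestep 2" row (``info``, ``maybe_ep_info``) does not follow this formula.

```python
def first_available_timestep(rollouts_completed: int, n_steps: int) -> int:
    """Timestep given by the table's pattern k * n_steps + 1 (hypothetical helper)."""
    if rollouts_completed < 0 or n_steps <= 0:
        raise ValueError("rollouts_completed must be >= 0 and n_steps > 0")
    return rollouts_completed * n_steps + 1


n_steps = 20  # example value only (assumption)
groups = (
    ("self, obs, rewards, ...", 0),
    ("update, rollout, masks, ...", 1),
    ("policy_loss, value_loss, fps, ...", 2),
)
for names, k in groups:
    print(f"{names}: from timestep {first_available_timestep(k, n_steps)}")
```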