Update locals in callbacks (#787)
* Added update_locals function for callbacks

* corrected update locals in td3/sac

* Callbacks are updated before trajectory end in PG

* variable availability dqn

* dqn doc

* title underline

* updated ddpg callback info

* removed a dot from ddpg

* make it compile

* ddpg compile

* correct return

* td3

* clarity

* sac callback variables

* completed callbacks

* fix compilation issues

* changelog

* Update stable_baselines/common/callbacks.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Update stable_baselines/common/callbacks.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Update stable_baselines/common/callbacks.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Update stable_baselines/common/callbacks.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* fixed sphinx docstring compilation

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
m-rph and araffin committed Jun 30, 2020
1 parent d288b6d commit b3b217c
Showing 21 changed files with 560 additions and 12 deletions.
16 changes: 9 additions & 7 deletions docs/misc/changelog.rst
@@ -16,25 +16,27 @@ Breaking Changes:
New Features:
^^^^^^^^^^^^^
- Added momentum parameter to A2C for the embedded RMSPropOptimizer (@kantneel)
- ActionNoise is now an abstract base class and implements ``__call__``, ``NormalActionNoise`` and ``OrnsteinUhlenbeckActionNoise`` have return types (@solliet)
- ActionNoise is now an abstract base class and implements ``__call__``, ``NormalActionNoise`` and ``OrnsteinUhlenbeckActionNoise`` have return types (@PartiallyTyped)
- HER now passes info dictionary to compute_reward, allowing for the computation of rewards that are independent of the goal (@tirafesi)

Bug Fixes:
^^^^^^^^^^
- Fixed DDPG sampling empty replay buffer when combined with HER (@tirafesi)
- Fixed a bug in ``HindsightExperienceReplayWrapper``, where the openai-gym signature for ``compute_reward`` was not matched correctly (@johannes-dornheim)
- Fixed SAC/TD3 checking time to update on learn steps instead of total steps (@solliet)
- Added ``**kwarg`` pass through for ``reset`` method in ``atari_wrappers.FrameStack`` (@solliet)
- Fix consistency in ``setup_model()`` for SAC, ``target_entropy`` now uses ``self.action_space`` instead of ``self.env.action_space`` (@solliet)
- Fixed SAC/TD3 checking time to update on learn steps instead of total steps (@PartiallyTyped)
- Added ``**kwarg`` pass through for ``reset`` method in ``atari_wrappers.FrameStack`` (@PartiallyTyped)
- Fix consistency in ``setup_model()`` for SAC, ``target_entropy`` now uses ``self.action_space`` instead of ``self.env.action_space`` (@PartiallyTyped)
- Fix reward threshold in ``test_identity.py``
- Partially fix tensorboard indexing for PPO2 (@enderdead)
- Fixed potential bug in ``DummyVecEnv`` where ``copy()`` was used instead of ``deepcopy()``
- Fixed a bug in ``GAIL`` where the dataloader was not available after saving, causing an error when using ``CheckpointCallback``
- Fixed a bug in ``SAC`` where any convolutional layers were not included in the target network parameters.
- Fixed ``render()`` method for ``VecEnvs``
- Fixed ``seed()`` method for ``SubprocVecEnv``
- Fixed a bug where ``callback.locals`` did not have the correct values (@PartiallyTyped)
- Fixed a bug in the ``close()`` method of ``SubprocVecEnv``, causing wrappers further down in the wrapper stack to not be closed. (@NeoExtended)


Deprecations:
^^^^^^^^^^^^^

@@ -56,7 +58,7 @@ Documentation:
- Added imitation baselines project
- Updated install instructions
- Added Slime Volleyball project (@hardmaru)

- Added a table of the variables accessible from the ``on_step`` function of the callbacks for each algorithm (@PartiallyTyped)

Release 2.10.0 (2020-03-11)
---------------------------
@@ -101,7 +103,7 @@ New Features:
- Added ``unwrap_vec_normalize`` and ``sync_envs_normalization`` in the ``vec_env`` module
to synchronize two VecNormalize environments
- Added a seeding method for vectorized environments. (@NeoExtended)
- Added extend method to store batches of experience in ReplayBuffer. (@solliet)
- Added extend method to store batches of experience in ReplayBuffer. (@PartiallyTyped)


Bug Fixes:
@@ -711,5 +713,5 @@ Thanks to @bjmuld @iambenzo @iandanforth @r7vme @brendenpetersen @huvar @abhiskk
@XMaster96 @kantneel @Pastafarianist @GerardMaggiolino @PatrickWalter214 @yutingsz @sc420 @Aaahh @billtubbs
@Miffyli @dwiel @miguelrass @qxcv @jaberkow @eavelardev @ruifeng96150 @pedrohbtp @srivatsankrishnan @evilsocket
@MarvineGothic @jdossgollin @SyllogismRXS @rusu24edward @jbulow @Antymon @seheevic @justinkterry @edbeeching
@flodorner @KuKuXia @NeoExtended @solliet @mmcenta @richardwu @tirafesi @caburu @johannes-dornheim @kvenkman @aakash94
@flodorner @KuKuXia @NeoExtended @PartiallyTyped @mmcenta @richardwu @tirafesi @caburu @johannes-dornheim @kvenkman @aakash94
@enderdead @hardmaru
50 changes: 50 additions & 0 deletions docs/modules/a2c.rst
@@ -75,3 +75,53 @@ Parameters
.. autoclass:: A2C
:members:
:inherited-members:


Callbacks - Accessible Variables
--------------------------------

Depending on initialization parameters and timestep, different variables are accessible.
Variables accessible "From timestep X" are variables that can be accessed when
``self.timestep==X`` in the ``on_step`` function.

+--------------------------------+-----------------------------------------------------+
|Variable | Availability|
+================================+=====================================================+
|- self |From timestep 1 |
|- total_timesteps | |
|- callback | |
|- log_interval | |
|- tb_log_name | |
|- reset_num_timesteps | |
|- new_tb_log | |
|- writer | |
|- t_start | |
|- mb_obs | |
|- mb_rewards | |
|- mb_actions | |
|- mb_values | |
|- mb_dones | |
|- mb_states | |
|- ep_infos | |
|- actions | |
|- values | |
|- states | |
|- clipped_actions | |
|- obs | |
|- rewards | |
|- dones | |
|- infos | |
+--------------------------------+-----------------------------------------------------+
|- info |From timestep 2 |
|- maybe_ep_info | |
+--------------------------------+-----------------------------------------------------+
|- update                        |From timestep ``n_steps+1``                          |
|- rollout | |
|- masks | |
|- true_reward | |
+--------------------------------+-----------------------------------------------------+
|- value_loss                    |From timestep ``2 * n_steps+1``                      |
|- policy_entropy | |
|- n_seconds | |
|- fps | |
+--------------------------------+-----------------------------------------------------+
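
As a rough illustration of why availability matters, the sketch below uses a hypothetical stand-in class (``LocalsCallbackSketch``) and a toy loop (``fake_training_loop``); neither is part of stable-baselines. It assumes a callback keeps the training loop's local variables in a ``locals`` dictionary that is refreshed through ``update_locals``, and shows ``on_step`` reading ``value_loss`` defensively, since that variable only appears after ``2 * n_steps`` steps.

```python
class LocalsCallbackSketch:
    """Hypothetical stand-in for a callback; NOT the stable-baselines class."""

    def __init__(self):
        self.locals = {}
        self.n_calls = 0
        self.seen_value_loss = []

    def update_locals(self, locals_):
        # Refresh the stored view of the training loop's local variables.
        self.locals.update(locals_)

    def on_step(self):
        self.n_calls += 1
        # ``value_loss`` does not exist during the first rollouts,
        # so use ``get`` instead of indexing the dictionary directly.
        value_loss = self.locals.get("value_loss")
        if value_loss is not None:
            self.seen_value_loss.append((self.n_calls, value_loss))
        return True


def fake_training_loop(callback, n_steps=2, total_steps=6):
    """Toy loop (assumption): ``value_loss`` is defined only after 2 * n_steps steps."""
    for step in range(1, total_steps + 1):
        rewards = float(step)
        if step > 2 * n_steps:
            value_loss = 0.5
        # Hand the current local variables to the callback before each step.
        callback.update_locals(locals())
        callback.on_step()


callback = LocalsCallbackSketch()
fake_training_loop(callback)
print(callback.seen_value_loss)
```

Reading with ``dict.get`` keeps the callback from raising ``KeyError`` during the early timesteps, when the later rows of the table are not yet populated.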
46 changes: 46 additions & 0 deletions docs/modules/acer.rst
@@ -70,3 +70,49 @@ Parameters
.. autoclass:: ACER
:members:
:inherited-members:


Callbacks - Accessible Variables
--------------------------------

Depending on initialization parameters and timestep, different variables are accessible.
Variables accessible from "timestep X" are variables that can be accessed when
``self.timestep==X`` from the ``on_step`` function.

+--------------------------------+-----------------------------------------------------+
|Variable | Availability|
+================================+=====================================================+
|- self | From timestep 1 |
|- total_timesteps | |
|- callback | |
|- log_interval | |
|- tb_log_name | |
|- reset_num_timesteps | |
|- new_tb_log | |
|- writer | |
|- episode_stats | |
|- buffer | |
|- t_start | |
|- enc_obs | |
|- mb_obs | |
|- mb_actions | |
|- mb_mus | |
|- mb_dones | |
|- mb_rewards | |
|- actions | |
|- states | |
|- mus | |
|- clipped_actions | |
|- obs | |
|- rewards | |
|- dones | |
+--------------------------------+-----------------------------------------------------+
|- steps                         | From timestep ``n_steps+1``                         |
|- masks | |
+--------------------------------+-----------------------------------------------------+
|- names_ops                     | From timestep ``2 * n_steps+1``                     |
|- values_ops | |
+--------------------------------+-----------------------------------------------------+
|- samples_number | After replay_start steps, when replay_ratio > 0 and|
| | buffer is not None |
+--------------------------------+-----------------------------------------------------+
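
The condition in the last row can be written as a small predicate. This is only a sketch: the function name and its arguments are hypothetical, and it assumes that "after ``replay_start`` steps" means the buffer holds at least ``replay_start`` steps.

```python
def samples_number_available(steps_in_buffer, replay_start, replay_ratio, has_buffer=True):
    """Hypothetical helper: True when ``samples_number`` should exist in the locals.

    Assumption: "after replay_start steps" is read as
    ``steps_in_buffer >= replay_start``.
    """
    return bool(has_buffer and replay_ratio > 0 and steps_in_buffer >= replay_start)


# No replay at all (replay_ratio == 0): the variable is never created.
print(samples_number_available(5000, replay_start=1000, replay_ratio=0))
# Replay enabled, but the buffer is still warming up.
print(samples_number_available(500, replay_start=1000, replay_ratio=4))
# Replay enabled and enough steps collected.
print(samples_number_available(1000, replay_start=1000, replay_ratio=4))
```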
61 changes: 61 additions & 0 deletions docs/modules/acktr.rst
@@ -71,3 +71,64 @@ Parameters
.. autoclass:: ACKTR
:members:
:inherited-members:




Callbacks - Accessible Variables
--------------------------------

Depending on initialization parameters and timestep, different variables are accessible.
Variables accessible from "timestep X" are variables that can be accessed when
``self.timestep==X`` from the ``on_step`` function.

+--------------------------------+-----------------------------------------------------+
|Variable | Availability|
+================================+=====================================================+
|- self |From timestep 1 |
|- total_timesteps | |
|- callback | |
|- log_interval | |
|- tb_log_name | |
|- reset_num_timesteps | |
|- new_tb_log | |
|- writer | |
|- tf_vars | |
|- is_uninitialized | |
|- new_uninitialized_vars | |
|- t_start | |
|- coord | |
|- enqueue_threads | |
|- old_uninitialized_vars | |
|- mb_obs | |
|- mb_rewards | |
|- mb_actions | |
|- mb_values | |
|- mb_dones | |
|- mb_states | |
|- ep_infos | |
|- _ | |
|- actions | |
|- values | |
|- states | |
|- clipped_actions | |
|- obs | |
|- rewards | |
|- dones | |
|- infos | |
+--------------------------------+-----------------------------------------------------+
|- info |From timestep 2 |
|- maybe_ep_info | |
+--------------------------------+-----------------------------------------------------+
|- update |From timestep ``n_steps+1`` |
|- rollout | |
|- returns | |
|- masks | |
|- true_reward | |
+--------------------------------+-----------------------------------------------------+
|- policy_loss |From timestep ``2*n_steps+1`` |
|- value_loss | |
|- policy_entropy | |
|- n_seconds | |
|- fps | |
+--------------------------------+-----------------------------------------------------+
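
To make the thresholds in the table concrete, the following sketch computes the first timestep at which each group of variables becomes available for a given ``n_steps``. The helper name and the group labels are hypothetical and used only for illustration; the formulas are taken from the table above.

```python
def first_available_timestep(n_steps):
    """Hypothetical helper mapping each table row to its first timestep."""
    return {
        "initial": 1,                # self, total_timesteps, ..., infos
        "episode_info": 2,           # info, maybe_ep_info
        "rollout": n_steps + 1,      # update, rollout, returns, masks, true_reward
        "losses": 2 * n_steps + 1,   # policy_loss, value_loss, policy_entropy, ...
    }


def is_available(group, timestep, n_steps):
    """Return True if variables of ``group`` can be read at ``timestep``."""
    return timestep >= first_available_timestep(n_steps)[group]


# With n_steps=20, rollout variables appear at timestep 21 and losses at 41.
print(first_available_timestep(20))
print(is_available("losses", 40, n_steps=20))
```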
