
SB3 v1.6.0: Recurrent PPO (PPO LSTM), better defaults for learning from pixels with SAC/TD3

Released by @araffin on 12 Jul 20:55 (commit c1f1c3d)

SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib

Breaking Changes:

  • Changed the way policy "aliases" are handled ("MlpPolicy", "CnnPolicy", ...): the former register_policy
    helper and policy_base parameter were removed in favor of a policy_aliases static attribute (@Gregwar)
  • SB3 now requires PyTorch >= 1.11
  • Changed the default network architecture when using CnnPolicy or MultiInputPolicy with SAC or DDPG/TD3:
    share_features_extractor is now set to False by default and net_arch defaults to [256, 256] (instead of
    net_arch=[] as before); see the sketch after this list to restore the previous behavior
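
To restore the pre-1.6.0 behavior, pass the old values explicitly via policy_kwargs. A minimal sketch, assuming a pixel-based environment such as CarRacing-v0 (any image-observation env works; the buffer size is illustrative):

```python
from stable_baselines3 import SAC

# Sketch: explicitly restore the pre-1.6.0 defaults for CnnPolicy
model = SAC(
    "CnnPolicy",
    "CarRacing-v0",  # example pixel-based environment
    buffer_size=50_000,  # smaller replay buffer for image observations (illustrative)
    policy_kwargs=dict(
        share_features_extractor=True,  # new default: False
        net_arch=[],  # new default: [256, 256]
    ),
)
```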

SB3-Contrib

  • Added Recurrent PPO (RecurrentPPO, aka PPO LSTM); a usage sketch follows
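
A minimal usage sketch for the new recurrent policy, assuming sb3-contrib v1.6.0 is installed (pip install sb3-contrib):

```python
from sb3_contrib import RecurrentPPO

# "MlpLstmPolicy" is the recurrent counterpart of "MlpPolicy"
model = RecurrentPPO("MlpLstmPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=5_000)
```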

Bug Fixes:

  • Fixed saving and loading large policies greater than 2GB (@jkterry1, @ycheng517)
  • Fixed final goal selection strategy that did not sample the final achieved goal (@qgallouedec)
  • Fixed a bug with special characters in the tensorboard log name (@quantitative-technologies)
  • Fixed a bug in the seeding function of DummyVecEnv and SubprocVecEnv, where a None value was not checked (@ScheiklP)
  • Fixed a bug where EvalCallback would crash when trying to synchronize VecNormalize stats when observation normalization was disabled
  • Added a check for unbounded actions
  • Fixed issues due to newer versions of protobuf (tensorboard) and sphinx
  • Fixed exception causes all over the codebase (@cool-RR)
  • Prohibited the simultaneous use of optimize_memory_usage and handle_timeout_termination due to a bug
    (@MWeltevrede); see the sketch after this list
  • Fixed a bug in the kl_divergence check that would fail when using numpy arrays with the MultiCategorical distribution
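
For the optimize_memory_usage change above, a minimal sketch of the configuration that is still accepted, assuming DQN on CartPole (the constraint applies to the replay buffer of all off-policy algorithms):

```python
from stable_baselines3 import DQN

# optimize_memory_usage=True now requires disabling timeout handling in the
# replay buffer; combining it with the default handle_timeout_termination=True
# raises a ValueError
model = DQN(
    "MlpPolicy",
    "CartPole-v1",
    optimize_memory_usage=True,
    replay_buffer_kwargs=dict(handle_timeout_termination=False),
)
```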

Others:

  • Upgraded to Python 3.7+ syntax using pyupgrade
  • Removed redundant double-check for nested observations from BaseAlgorithm._wrap_env (@TibiGG)

Documentation:

  • Added link to gym doc and gym env checker
  • Fixed a typo in the PPO doc (@bcollazo)
  • Added link to PPO ICLR blog post
  • Added remark about breaking Markov assumption and timeout handling
  • Added doc about MLflow integration via a custom logger (@git-thor); a sketch follows this list
  • Updated the Hugging Face integration doc
  • Added copy button for code snippets
  • Added doc about EnvPool and Isaac Gym support
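
For the MLflow doc entry above, a minimal sketch of the custom-logger approach, assuming mlflow is installed; MLflowOutputFormat is an illustrative name, not part of SB3:

```python
import mlflow
import numpy as np

from stable_baselines3 import SAC
from stable_baselines3.common.logger import KVWriter, Logger

class MLflowOutputFormat(KVWriter):
    """Forward scalar metrics to MLflow (illustrative sketch)."""

    def write(self, key_values, key_excluded, step=0):
        for key, value in key_values.items():
            # Only log plain numeric scalars
            if isinstance(value, np.ScalarType) and not isinstance(value, str):
                mlflow.log_metric(key, value, step)

with mlflow.start_run():
    model = SAC("MlpPolicy", "Pendulum-v1", verbose=1)
    # Route all logger output to MLflow (add HumanOutputFormat to keep stdout)
    model.set_logger(Logger(folder=None, output_formats=[MLflowOutputFormat()]))
    model.learn(total_timesteps=1_000)
```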