
SB3 v1.6.0: Recurrent PPO (PPO LSTM), better defaults for learning from pixels with SAC/TD3

Released by @araffin on 12 Jul 20:55 (commit c1f1c3d)

SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib

Breaking Changes:

  • Changed the way policy "aliases" are handled ("MlpPolicy", "CnnPolicy", ...): the former register_policy
    helper and policy_base parameter were removed in favor of a policy_aliases static attribute (@Gregwar)
  • SB3 now requires PyTorch >= 1.11
  • Changed the default network architecture when using CnnPolicy or MultiInputPolicy with SAC or DDPG/TD3:
    share_features_extractor is now set to False by default and net_arch defaults to [256, 256] (instead of
    net_arch=[] as before); see the sketch after this list to restore the previous behavior
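
To restore the pre-1.6.0 behavior, pass the old values explicitly via policy_kwargs. A minimal sketch, assuming a pixel-based environment such as CarRacing-v0 (any image-observation env works; the buffer size is illustrative):

```python
from stable_baselines3 import SAC

# Sketch: explicitly restore the pre-1.6.0 defaults for CnnPolicy
model = SAC(
    "CnnPolicy",
    "CarRacing-v0",  # example pixel-based environment
    buffer_size=50_000,  # smaller replay buffer for image observations (illustrative)
    policy_kwargs=dict(
        share_features_extractor=True,  # new default: False
        net_arch=[],  # new default: [256, 256]
    ),
)
```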

SB3-Contrib

  • Added Recurrent PPO (RecurrentPPO, aka PPO LSTM); a usage sketch follows
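
A minimal usage sketch for the new recurrent policy, assuming sb3-contrib v1.6.0 is installed (pip install sb3-contrib):

```python
from sb3_contrib import RecurrentPPO

# "MlpLstmPolicy" is the recurrent counterpart of "MlpPolicy"
model = RecurrentPPO("MlpLstmPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=5_000)
```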

Bug Fixes:

  • Fixed saving and loading large policies greater than 2GB (@jkterry1, @ycheng517)
  • Fixed final goal selection strategy that did not sample the final achieved goal (@qgallouedec)
  • Fixed a bug with special characters in the tensorboard log name (@quantitative-technologies)
  • Fixed a bug in the seeding function of DummyVecEnv and SubprocVecEnv, where a None value was not checked (@ScheiklP)
  • Fixed a bug where EvalCallback would crash when trying to synchronize VecNormalize stats when observation normalization was disabled
  • Added a check for unbounded actions
  • Fixed issues due to newer versions of protobuf (tensorboard) and sphinx
  • Fixed exception causes all over the codebase (@cool-RR)
  • Prohibited the simultaneous use of optimize_memory_usage and handle_timeout_termination due to a bug
    (@MWeltevrede); see the sketch after this list
  • Fixed a bug in the kl_divergence check that would fail when using numpy arrays with the MultiCategorical distribution
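
For the optimize_memory_usage change above, a minimal sketch of the configuration that is still accepted, assuming DQN on CartPole (the constraint applies to the replay buffer of all off-policy algorithms):

```python
from stable_baselines3 import DQN

# optimize_memory_usage=True now requires disabling timeout handling in the
# replay buffer; combining it with the default handle_timeout_termination=True
# raises a ValueError
model = DQN(
    "MlpPolicy",
    "CartPole-v1",
    optimize_memory_usage=True,
    replay_buffer_kwargs=dict(handle_timeout_termination=False),
)
```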

Others:

  • Upgraded to Python 3.7+ syntax using pyupgrade
  • Removed redundant double-check for nested observations from BaseAlgorithm._wrap_env (@TibiGG)

Documentation:

  • Added link to gym doc and gym env checker
  • Fixed a typo in the PPO doc (@bcollazo)
  • Added link to PPO ICLR blog post
  • Added remark about breaking Markov assumption and timeout handling
  • Added doc about MLflow integration via a custom logger (@git-thor); a sketch follows this list
  • Updated the Hugging Face integration doc
  • Added copy button for code snippets
  • Added doc about EnvPool and Isaac Gym support
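
For the MLflow doc entry above, a minimal sketch of the custom-logger approach, assuming mlflow is installed; MLflowOutputFormat is an illustrative name, not part of SB3:

```python
import mlflow
import numpy as np

from stable_baselines3 import SAC
from stable_baselines3.common.logger import KVWriter, Logger

class MLflowOutputFormat(KVWriter):
    """Forward scalar metrics to MLflow (illustrative sketch)."""

    def write(self, key_values, key_excluded, step=0):
        for key, value in key_values.items():
            # Only log plain numeric scalars
            if isinstance(value, np.ScalarType) and not isinstance(value, str):
                mlflow.log_metric(key, value, step)

with mlflow.start_run():
    model = SAC("MlpPolicy", "Pendulum-v1", verbose=1)
    # Route all logger output to MLflow (add HumanOutputFormat to keep stdout)
    model.set_logger(Logger(folder=None, output_formats=[MLflowOutputFormat()]))
    model.learn(total_timesteps=1_000)
```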