- Random noise for better exploration (Ornstein–Uhlenbeck process)
- The initialization of weights (torch.nn.init.xavier_normal_)
- The architecture was not big enough (just play with it a bit)
- The activation function (ELU)
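The fixes above can be sketched in PyTorch as follows. This is an illustrative sketch, not the project's actual code: the network sizes, OU parameters (`theta`, `sigma`), and class names are assumptions; only the use of `torch.nn.init.xavier_normal_`, ELU activations, and an Ornstein–Uhlenbeck process come from the list above.

```python
import numpy as np
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Hypothetical DDPG actor: Xavier-initialized linear layers with ELU."""

    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),  # actions in [-1, 1]
        )
        # Weight initialization fix: xavier_normal_ on every linear layer.
        for layer in self.net:
            if isinstance(layer, nn.Linear):
                nn.init.xavier_normal_(layer.weight)
                nn.init.zeros_(layer.bias)

    def forward(self, obs):
        return self.net(obs)

class OUNoise:
    """Ornstein–Uhlenbeck process: temporally correlated exploration noise
    added to the deterministic action. Parameters are common defaults,
    not values from the original project."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.state = np.full(size, mu)

    def reset(self):
        self.state[:] = self.mu

    def sample(self):
        # dx = theta * (mu - x) + sigma * N(0, 1): mean-reverting random walk.
        dx = self.theta * (self.mu - self.state) \
            + self.sigma * np.random.randn(*self.state.shape)
        self.state = self.state + dx
        return self.state
```

At act time the noisy action would be something like `action = actor(obs) + torch.as_tensor(noise.sample())`, with the noise reset at the start of each episode.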
simple_spread environment
- PyTorch | SIGMOID
- PyTorch Lightning | LIGHTNINGDATAMODULE
- PyTorch Lightning | MANAGING DATA
- RealPython | How to Use Generators and yield in Python
- Other implementations: variant 1, variant 2, variant 3
- DDPG | OpenAI
- Environments | OpenAI
- Optimization | Pytorch-Lightning
- Adam Grad - page 36 (Training NNs from Stanford's course)
- Kullback–Leibler divergence (YouTube video) - great
- Deep-Reinforcement-Learning-Hands-On-Second-Edition (page 512)
- 1 - Deep Deterministic Policy Gradient (DDPG): Theory and Implementation | Medium
- 2 - DDPG implementation | Medium
- Policy Gradient Algorithms | Lilian Weng's Blog
- PyTorch | DATASETS & DATALOADERS
- PyTorch | SAVING AND LOADING MODELS
- GitHub | zzzxxxttt / pytorch_simple_RL
- A Gentle Introduction to Cross-Entropy for Machine Learning
- Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, Lowe 2020
- Deterministic Policy Gradient Algorithms, Silver 2014 (paper)
- Policy Gradient Methods for Reinforcement Learning with Function Approximation, Sutton 1999 (paper)
- Continuous Control with Deep Reinforcement Learning, Lillicrap 2016 (paper)
- Policy Gradient Algorithms, Weng 2018