A comprehensive implementation of Proximal Policy Optimization (PPO) algorithms in PyTorch, featuring both theoretical foundations and practical demonstrations.
- Clean, modular PyTorch implementation of PPO
- Support for continuous and discrete action spaces
- Implementations of key PPO components:
  - Clipped surrogate objective
  - Value function estimation
  - Generalized Advantage Estimation (GAE) (sketched below)
  - Policy and value function updates
- Multiple environment demonstrations:
  - CartPole-v1
  - LunarLander-v2
- Real-time visualization of agent performance
- Training progress tracking and plotting
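Of the components listed above, GAE is the least self-explanatory, so a minimal sketch of the computation is shown here. The function name, signature, and tensor layout are illustrative assumptions, not the API exposed by `ppo.py`:

```python
import torch

def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Compute advantages and value targets for one rollout of length T.

    rewards, values, dones: 1-D float tensors of length T from the rollout.
    last_value: critic estimate for the state after the final step (bootstrap).
    """
    T = rewards.shape[0]
    advantages = torch.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        not_done = 1.0 - dones[t]
        # TD error: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Recursive GAE: delta_t + gamma * lambda * A_{t+1}
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    returns = advantages + values  # regression targets for the critic
    return advantages, returns
```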
- Clone the repository:

  ```bash
  git clone https://github.com/ai-in-pm/Proximal-Policy-Optimization-Algorithms.git
  cd Proximal-Policy-Optimization-Algorithms
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Run a demo:

  ```bash
  # Run CartPole demo
  python demonstrations/cartpole_demo.py

  # Run LunarLander demo
  python demonstrations/lunar_lander_demo.py
  ```
```
.
├── ppo.py              # Core PPO implementation
├── demonstrations/     # Example implementations
│   ├── cartpole_demo.py
│   ├── lunar_lander_demo.py
│   └── README.md
├── requirements.txt    # Project dependencies
└── README.md           # This file
```
- Actor-Critic Architecture (a minimal sketch follows this list)
  - Actor (Policy) network outputs action distributions
  - Critic (Value) network estimates state values
- PPO Algorithm (the loss terms are sketched below)
  - Clipped surrogate objective for stable updates
  - Value function loss with clipping
  - Entropy bonus for exploration
  - Generalized Advantage Estimation (GAE)
- Key Features
  - Modular design for easy extension
  - Configurable hyperparameters
  - Support for different environments
  - Training progress visualization
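A minimal discrete-action actor-critic along these lines might look as follows. This is an illustrative sketch only; the class name, layer sizes, and method signatures are assumptions rather than the structure used in `ppo.py`:

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class ActorCritic(nn.Module):
    """Separate actor and critic MLPs operating on the same observation."""

    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        # Actor: observation -> logits of a categorical action distribution
        self.actor = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions),
        )
        # Critic: observation -> scalar state-value estimate
        self.critic = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs):
        dist = Categorical(logits=self.actor(obs))
        value = self.critic(obs).squeeze(-1)
        return dist, value
```

For a continuous-action environment, the categorical head would be replaced by a parameterized Gaussian (e.g. `torch.distributions.Normal`) over actions.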
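The loss terms listed under "PPO Algorithm" combine into a single objective. The sketch below shows one common way to assemble them; the function and argument names are assumptions (the default coefficients mirror the hyperparameters listed below), not the exact code in `ppo.py`:

```python
import torch

def ppo_loss(new_log_probs, old_log_probs, advantages,
             new_values, old_values, returns, entropy,
             clip_eps=0.2, value_coef=1.0, entropy_coef=0.01):
    # Probability ratio pi_theta(a|s) / pi_theta_old(a|s)
    ratio = torch.exp(new_log_probs - old_log_probs)

    # Clipped surrogate objective (negated because we minimize)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()

    # Value loss, clipped around the value estimates recorded at rollout time
    values_clipped = old_values + torch.clamp(new_values - old_values,
                                              -clip_eps, clip_eps)
    value_loss = torch.max((new_values - returns) ** 2,
                           (values_clipped - returns) ** 2).mean()

    # Entropy bonus: subtracting it rewards higher-entropy (more exploratory) policies
    return policy_loss + value_coef * value_loss - entropy_coef * entropy.mean()
```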
- Learning rate: 3e-4
- Discount factor (gamma): 0.99
- GAE parameter (lambda): 0.95
- Clipping parameter (epsilon): 0.2
- Value function coefficient: 1.0
- Entropy coefficient: 0.01
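Collected in one place, these defaults might be expressed as a plain dictionary. The key names below are illustrative and may not match the argument names actually used by `ppo.py` or the demo scripts:

```python
ppo_hyperparams = {
    "learning_rate": 3e-4,
    "gamma": 0.99,         # discount factor
    "gae_lambda": 0.95,    # GAE parameter
    "clip_epsilon": 0.2,   # clipping parameter
    "value_coef": 1.0,     # value function coefficient
    "entropy_coef": 0.01,  # entropy coefficient
}
```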
The implementation has been tested on the following environments:

- CartPole-v1
  - Achieves optimal performance (500 steps) within 500 episodes
  - Stable learning across different random seeds
- LunarLander-v2
  - Achieves landing within 1000 episodes
  - Demonstrates stable control and smooth landing
Contributions are welcome! Please feel free to submit a Pull Request. See CONTRIBUTING.md for guidelines.
This project is licensed under the MIT License - see the LICENSE file for details.
- Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347.
- Gymnasium Documentation: https://gymnasium.farama.org/
- PyTorch Documentation: https://pytorch.org/docs/