Proximal Policy Optimization (PPO) Implementation

A comprehensive implementation of Proximal Policy Optimization (PPO) algorithms in PyTorch, featuring both theoretical foundations and practical demonstrations.

🌟 Features

  • Clean, modular PyTorch implementation of PPO
  • Support for continuous and discrete action spaces
  • Implementations of key PPO components:
    • Clipped surrogate objective
    • Value function estimation
    • Generalized Advantage Estimation (GAE), sketched in code after this list
    • Policy and value function updates
  • Multiple environment demonstrations:
    • CartPole-v1
    • LunarLander-v2
  • Real-time visualization of agent performance
  • Training progress tracking and plotting
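
As a reference for the GAE feature listed above, the recursion has the standard form delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) and A_t = delta_t + gamma * lambda * A_{t+1}. The snippet below is a minimal illustrative sketch of that recursion in PyTorch; the function and argument names are chosen for clarity here and do not necessarily match the code in ppo.py.

import torch

def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for a single rollout.

    rewards, dones: 1-D float tensors of length T
    values:         1-D float tensor of length T + 1 (includes bootstrap value)
    """
    T = rewards.shape[0]
    advantages = torch.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        not_done = 1.0 - dones[t]
        # TD error: r_t + gamma * V(s_{t+1}) * (1 - done) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        # Recursive accumulation of the GAE advantage
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    # Returns used as value-function targets
    returns = advantages + values[:-1]
    return advantages, returns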

📋 Requirements

# Install dependencies
pip install -r requirements.txt

🚀 Quick Start

  1. Clone the repository:
git clone https://github.com/ai-in-pm/Proximal-Policy-Optimization-Algorithms.git
cd Proximal-Policy-Optimization-Algorithms
  2. Install dependencies:
pip install -r requirements.txt
  3. Run a demo:
# Run CartPole demo
python demonstrations/cartpole_demo.py

# Run LunarLander demo
python demonstrations/lunar_lander_demo.py

🏗️ Project Structure

.
├── ppo.py              # Core PPO implementation
├── demonstrations/     # Example implementations
│   ├── cartpole_demo.py
│   ├── lunar_lander_demo.py
│   └── README.md
├── requirements.txt    # Project dependencies
└── README.md          # This file

💻 Implementation Details

Core Components

  1. Actor-Critic Architecture

    • Actor (Policy) network outputs action distributions
    • Critic (Value) network estimates state values
  2. PPO Algorithm (see the loss sketch after this list)

    • Clipped surrogate objective for stable updates
    • Value function loss with clipping
    • Entropy bonus for exploration
    • Generalized Advantage Estimation (GAE)
  3. Key Features

    • Modular design for easy extension
    • Configurable hyperparameters
    • Support for different environments
    • Training progress visualization
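
The sketch below shows how these components fit together for a discrete-action environment such as CartPole: a small actor-critic model plus a loss combining the clipped surrogate objective, a clipped value-function loss, and an entropy bonus. It is a minimal, self-contained illustration, not the exact code in ppo.py; the class and function names here are illustrative.

import torch
import torch.nn as nn
from torch.distributions import Categorical

class ActorCritic(nn.Module):
    """Minimal actor-critic for a discrete action space (illustrative)."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.actor = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                   nn.Linear(hidden, act_dim))
        self.critic = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                    nn.Linear(hidden, 1))

    def forward(self, obs):
        dist = Categorical(logits=self.actor(obs))  # action distribution
        value = self.critic(obs).squeeze(-1)        # state-value estimate
        return dist, value

def ppo_loss(dist, value, actions, old_log_probs, old_values, advantages,
             returns, clip_eps=0.2, vf_coef=1.0, ent_coef=0.01):
    """Clipped surrogate + clipped value loss + entropy bonus for one minibatch."""
    log_probs = dist.log_prob(actions)
    ratio = torch.exp(log_probs - old_log_probs)  # pi_new / pi_old
    # Clipped surrogate objective: take the pessimistic (minimum) bound
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -torch.min(surr1, surr2).mean()
    # Value loss, clipped around the old value prediction
    value_clipped = old_values + torch.clamp(value - old_values, -clip_eps, clip_eps)
    value_loss = torch.max((value - returns) ** 2,
                           (value_clipped - returns) ** 2).mean()
    # Entropy bonus encourages exploration
    entropy = dist.entropy().mean()
    return policy_loss + vf_coef * value_loss - ent_coef * entropy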

Hyperparameters

  • Learning rate: 3e-4
  • Discount factor (gamma): 0.99
  • GAE parameter (lambda): 0.95
  • Clipping parameter (epsilon): 0.2
  • Value function coefficient: 1.0
  • Entropy coefficient: 0.01
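
One convenient way to keep these defaults together is a small configuration dictionary, for example as below. The key names are illustrative and may differ from the argument names actually used in ppo.py.

# Illustrative defaults mirroring the values listed above
ppo_config = {
    "learning_rate": 3e-4,
    "gamma": 0.99,        # discount factor
    "gae_lambda": 0.95,   # GAE parameter
    "clip_eps": 0.2,      # PPO clipping parameter
    "vf_coef": 1.0,       # value function loss coefficient
    "ent_coef": 0.01,     # entropy bonus coefficient
}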

📊 Results

The implementation has been tested on the following environments:

  1. CartPole-v1

    • Reaches the maximum episode length of 500 steps within roughly 500 training episodes
    • Stable learning across different random seeds
  2. LunarLander-v2

    • Learns a successful landing within roughly 1000 training episodes
    • Demonstrates stable control and smooth landing

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. See CONTRIBUTING.md for guidelines.

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

📚 References

  1. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347.
  2. Gymnasium Documentation: https://gymnasium.farama.org/
  3. PyTorch Documentation: https://pytorch.org/docs/
