Training a PPO agent to balance a pendulum in a fully observable environment.

Ezgii/PPO-on-pendulum


Project description

An implementation of the PPO algorithm written in Python using PyTorch. The actor and critic networks are simple MLPs, each with one hidden layer of 64 units. The environment is fully observable, i.e. obs = [cos(angle), sin(angle), angular velocity].
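The architecture described above can be sketched as follows. This is an illustration under the stated description (one hidden layer of 64 units, a 3-dimensional observation, a 1-dimensional torque action); the activation function and the Gaussian policy head are assumptions, not necessarily the repository's exact code.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Policy network: obs -> Gaussian over torque (head is an assumption)."""
    def __init__(self, obs_dim=3, act_dim=1, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )
        # Learned, state-independent log standard deviation (common choice).
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs):
        mean = self.net(obs)
        return torch.distributions.Normal(mean, self.log_std.exp())

class Critic(nn.Module):
    """Value network: obs -> scalar state value V(s)."""
    def __init__(self, obs_dim=3, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs):
        return self.net(obs).squeeze(-1)
```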

(Demo video: balancing_pendulum.mov)

Environment

OpenAI's Gym is a framework for training reinforcement learning agents. It provides a set of environments and a standardized interface for interacting with them.
In this project, I used the Pendulum environment from Gym.
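The observation layout of the Pendulum environment can be illustrated with a small stdlib-only sketch. The helper function below is hypothetical (Gym constructs this vector internally); it just makes the fully observable encoding explicit.

```python
import math

# Pendulum's state (angle theta, angular velocity theta_dot) is fully
# observable: Gym exposes it as [cos(theta), sin(theta), theta_dot].
# In Gym itself the rollout starts with: env = gym.make("Pendulum-v0")
# (the version suffix varies between Gym releases).
def pendulum_obs(theta, theta_dot):
    return [math.cos(theta), math.sin(theta), theta_dot]

# Upright and at rest: pendulum_obs(0.0, 0.0) -> [1.0, 0.0, 0.0]
```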

Installation

Using conda (recommended)

  1. Install Anaconda

  2. Create the env
    conda create -n a1 python=3.8

  3. Activate the env
    conda activate a1

  4. Install PyTorch (steps from the PyTorch installation guide):

  • if you don't have an NVIDIA GPU or don't want to bother with CUDA installation:
    conda install pytorch torchvision torchaudio cpuonly -c pytorch

  • if you have an NVIDIA GPU and want to use it:
    install CUDA, then install PyTorch with CUDA support:
    conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch

  5. Other dependencies
    conda install -c conda-forge matplotlib gym opencv pyglet

Using pip

python3 -m pip install -r requirements.txt

How to run the code

In a terminal, run:

python3 main.py

Results

Loss functions and Learning curve:

figure1
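The actor loss tracked in curves like these is PPO's clipped surrogate objective. Here is a minimal sketch of it; the clipping parameter eps=0.2 is a common default, not necessarily this repository's setting.

```python
import torch

def ppo_actor_loss(new_logp, old_logp, advantages, eps=0.2):
    """PPO clipped surrogate loss (to be minimized).

    ratio = pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
    Clipping keeps the policy update close to the old policy.
    """
    ratio = torch.exp(new_logp - old_logp)
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```

When the new and old policies coincide, the ratio is 1 everywhere and the loss reduces to the negative mean advantage.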

Value grid:

figure2
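A value grid like this can be produced by evaluating the critic over a mesh of pendulum states. The sketch below is hypothetical: the grid resolution and ranges are assumptions (Pendulum clips |theta_dot| at 8), and `critic` stands in for the trained value network.

```python
import math
import torch

def value_grid(critic, n_theta=50, n_vel=50):
    """Evaluate critic(obs) over a (theta, theta_dot) mesh.

    Returns a (n_theta, n_vel) tensor of state values.
    """
    thetas = torch.linspace(-math.pi, math.pi, n_theta)
    vels = torch.linspace(-8.0, 8.0, n_vel)
    th, vel = torch.meshgrid(thetas, vels, indexing="ij")
    # Build the fully observable encoding [cos, sin, angular velocity].
    obs = torch.stack([th.cos(), th.sin(), vel], dim=-1)
    with torch.no_grad():
        return critic(obs.reshape(-1, 3)).reshape(n_theta, n_vel)
```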
