Ezgii/PPO-on-pendulum: Training a PPO agent to balance a pendulum in a fully observable environment.

Project description

An implementation of the PPO (Proximal Policy Optimization) algorithm, written in Python using PyTorch. The actor and critic networks are each a simple MLP with one hidden layer of size 64. The environment is fully observable, i.e. obs = [cos(angle), sin(angle), angular velocity].
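As an illustration of the objective PPO optimizes, here is a minimal sketch of the clipped surrogate in plain Python. It assumes the clipped variant of PPO with a clip range of 0.2; neither is stated in this README, and the function name and arguments are hypothetical. In the actual project the same quantity would presumably be computed on PyTorch tensors.

```python
import math

def clipped_surrogate(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped objective over a batch (a quantity to maximize).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. clip_eps=0.2 is an assumed default.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The clipping removes the incentive to move the probability ratio further than `clip_eps` away from 1 in the direction that would improve the objective, which keeps each policy update close to the policy that collected the data.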

balancing_pendulum.mov

Environment

OpenAI's Gym is a framework for training reinforcement learning agents. It provides a set of environments and a standardized interface for interacting with them.
This project uses the Pendulum environment from Gym.
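To make the "standardized interface" concrete, the sketch below shows a toy pendulum exposing Gym-style `reset` and `step` methods and returning the observation layout [cos(angle), sin(angle), angular velocity]. `ToyPendulum` is a hypothetical stand-in, not the environment used in this repository and not Gym's own code; the dynamics constants and reward shape are assumptions chosen for illustration. It follows the classic Gym convention where `step` returns four values; newer Gym releases and Gymnasium return five values from `step` and an `(obs, info)` pair from `reset`.

```python
import math

class ToyPendulum:
    """Hypothetical stand-in exposing a Gym-style reset/step interface."""

    def __init__(self, dt=0.05, max_torque=2.0, max_speed=8.0):
        self.dt, self.max_torque, self.max_speed = dt, max_torque, max_speed
        self.theta, self.theta_dot = math.pi, 0.0  # start hanging down

    def _obs(self):
        # Fully observable state: [cos(angle), sin(angle), angular velocity]
        return [math.cos(self.theta), math.sin(self.theta), self.theta_dot]

    def reset(self):
        self.theta, self.theta_dot = math.pi, 0.0
        return self._obs()

    def step(self, torque):
        u = max(-self.max_torque, min(torque, self.max_torque))
        # Simplified dynamics with assumed constants (g=10, m=1, l=1)
        self.theta_dot += (15.0 * math.sin(self.theta) + 3.0 * u) * self.dt
        self.theta_dot = max(-self.max_speed, min(self.theta_dot, self.max_speed))
        self.theta += self.theta_dot * self.dt
        # Angle error wrapped to [-pi, pi); 0 means upright
        err = (self.theta + math.pi) % (2 * math.pi) - math.pi
        reward = -(err ** 2 + 0.1 * self.theta_dot ** 2 + 0.001 * u ** 2)
        return self._obs(), reward, False, {}

env = ToyPendulum()
obs = env.reset()
for _ in range(200):
    # Zero torque is a placeholder; a trained actor would choose the action
    obs, reward, done, info = env.step(0.0)
```

Encoding the angle as a cosine/sine pair avoids the discontinuity at the wrap-around point, so nearby physical states always map to nearby observations.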

Installation

Using conda (recommended)

1. Install Anaconda.
2. Create the environment:
   conda create -n a1 python=3.8
3. Activate the environment:
   conda activate a1
4. Install PyTorch (steps from the PyTorch installation guide):
   - If you don't have an NVIDIA GPU, or don't want to bother with a CUDA installation:
     conda install pytorch torchvision torchaudio cpuonly -c pytorch
   - If you have an NVIDIA GPU and want to use it, install CUDA, then install PyTorch with CUDA support:
     conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
5. Install the other dependencies:
   conda install -c conda-forge matplotlib gym opencv pyglet

Using pip

python3 -m pip install -r requirements.txt
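After either installation route, a quick check that the dependencies can be found may save a failed run. The sketch below uses only the standard library. The module list is an assumption derived from the conda commands above (the contents of requirements.txt are not shown here), and `missing_modules` is a hypothetical helper name.

```python
from importlib.util import find_spec

def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]

# Import names assumed from the install commands; opencv is imported as cv2.
REQUIRED = ["torch", "matplotlib", "gym", "cv2", "pyglet"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("All dependencies found" if not missing else f"Missing: {missing}")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages such as PyTorch.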

How to run the code

In a terminal, run:

python3 main.py

Results

Loss functions and Learning curve:

Value grid:
