An implementation of the PPO (Proximal Policy Optimization) algorithm written in Python using PyTorch. The actor and critic networks are each a simple MLP with one hidden layer of size 64. The environment is fully observable, i.e. obs = [cos(angle), sin(angle), angular velocity].
balancing_pendulum.mov
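The network layout described above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the `make_mlp` helper, the Tanh activation, and the choice to have the actor output a single action mean are assumptions made for the example. The only details taken from the description are the 3-dimensional observation and the single hidden layer of size 64.

```python
import torch
import torch.nn as nn

OBS_DIM = 3   # [cos(angle), sin(angle), angular velocity]
ACT_DIM = 1   # Pendulum takes a single torque value
HIDDEN = 64   # one hidden layer of size 64


def make_mlp(in_dim: int, out_dim: int, hidden: int = HIDDEN) -> nn.Sequential:
    """Hypothetical helper: an MLP with exactly one hidden layer."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden),
        nn.Tanh(),  # assumption: the activation function is not stated in the README
        nn.Linear(hidden, out_dim),
    )


# Assumption: the actor outputs the mean of the action distribution,
# and the critic outputs a scalar state-value estimate.
actor = make_mlp(OBS_DIM, ACT_DIM)
critic = make_mlp(OBS_DIM, 1)

# A batch containing one observation (pendulum at angle 0, at rest).
obs = torch.tensor([[1.0, 0.0, 0.0]])
with torch.no_grad():
    action_mean = actor(obs)
    value = critic(obs)
```

Both networks map a batch of shape (N, 3) to a batch of shape (N, 1). Keeping the actor and critic as separate networks means they share no parameters.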
OpenAI's Gym is a framework for training reinforcement
learning agents. It provides a set of environments and a
standardized interface for interacting with them.
In this project, I used the Pendulum environment from Gym.
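To make the observation format concrete, here is a small sketch in plain Python that builds obs = [cos(angle), sin(angle), angular velocity] from a raw pendulum state. The function names `encode_observation` and `decode_angle` are hypothetical and exist only for this illustration; Gym produces the observation itself, so this is not needed to run the project.

```python
import math


def encode_observation(angle: float, angular_velocity: float) -> list:
    """Build [cos(angle), sin(angle), angular velocity] from a raw state."""
    return [math.cos(angle), math.sin(angle), angular_velocity]


def decode_angle(obs: list) -> float:
    """Recover the angle in (-pi, pi] from the cos and sin components."""
    return math.atan2(obs[1], obs[0])


# Angles that differ by a full turn give (numerically) the same observation,
# so the agent never sees a jump when the angle wraps around.
a = encode_observation(0.1, 0.5)
b = encode_observation(0.1 + 2 * math.pi, 0.5)
```

Encoding the angle as a cos/sin pair keeps the observation continuous across the wrap-around point, which a raw angle value would not be.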
-
Create the env
conda create -n a1 python=3.8
-
Activate the env
conda activate a1
-
Install PyTorch (steps from the PyTorch installation guide):
-
If you don't have an NVIDIA GPU or don't want to bother with a CUDA installation:
conda install pytorch torchvision torchaudio cpuonly -c pytorch
-
If you have an NVIDIA GPU and want to use it:
Install CUDA
Install PyTorch with CUDA:
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
-
Install the other dependencies:
conda install -c conda-forge matplotlib gym opencv pyglet
python3 -m pip install -r requirements.txt
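After the install steps, a quick sanity check like the one below can confirm that the packages are importable and whether PyTorch can see a GPU. This is an optional sketch, not part of the repository; the function names `check_install` and `pick_device` are hypothetical, and the package list is simply taken from the install commands above (opencv is imported as `cv2`).

```python
import importlib.util

# Import names of the packages installed in the steps above.
PACKAGES = ["torch", "gym", "matplotlib", "cv2", "pyglet"]


def check_install(packages=PACKAGES) -> dict:
    """Map each package name to True if it can be found, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


def pick_device():
    """Return "cuda" or "cpu", or None if PyTorch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the check also works without PyTorch
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    for name, found in check_install().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
    print("device:", pick_device())
```

With the cpuonly install, `pick_device()` is expected to report "cpu"; with a working CUDA install it should report "cuda".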
In a terminal, run:
python3 main.py