RL QWOP


Forked from here.

A friend of mine is really good at the video game QWOP, where you control a sprinter's legs to run a 100-meter dash (you can try it out yourself). I'm not very good at the game, but I wanted to beat my friend, so I trained a computer to beat the game for me using a reinforcement learning algorithm called Proximal Policy Optimization.

Requirements

QWOP Environment

I modified the internals of QWOP so that it can be run as a gym environment. Agents can send key commands to the game, observe the state of the game, and are rewarded according to the runner's velocity. The game is hosted on a local Node.js server. To get it running, run the following:

cd game
npm i
node server

The ./gym-qwop folder contains Python classes that model QWOP as an OpenAI Gym environment. There are three versions of the environment: qwop-v0, frame-qwop-v0, and multi-frame-qwop-v0. Run pip install ./gym-qwop/ to install them.

The default environment (qwop-v0) encodes the state of the game as the position and angle of each of the runner's limbs. frame-qwop-v0 encodes the state as the pixel data of the current frame of the game. multi-frame-qwop-v0 uses three sequential frames as the state.
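
As a quick sanity check that the environment is wired up, the snippet below runs a random agent against qwop-v0 through the standard gym loop. This is a minimal sketch: the module name gym_qwop and the exact observation shapes are assumptions, and the Node server from the previous section must already be running.

```python
import gym
import gym_qwop  # assumed module name; importing it registers the qwop-v0 environments

env = gym.make("qwop-v0")
obs = env.reset()

done = False
episode_return = 0.0
while not done:
    action = env.action_space.sample()           # random Q/W/O/P key combinations
    obs, reward, done, info = env.step(action)   # reward is the runner's velocity
    episode_return += reward

print("episode return:", episode_return)
env.close()
```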

Since the environment implements the gym interface, existing implementations of many popular RL algorithms can be used to train an agent. Running python run-gym.py will train a model using one of OpenAI's implementations.
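
For illustration, here is a rough sketch of the same idea using Stable-Baselines3 as a stand-in library (run-gym.py itself calls one of OpenAI's implementations, and the gym_qwop module name is an assumption):

```python
import gym
import gym_qwop  # assumed module name; registers the qwop-v0 environments
from stable_baselines3 import PPO

env = gym.make("qwop-v0")
model = PPO("MlpPolicy", env, verbose=1)   # MLP policy over limb positions and angles
model.learn(total_timesteps=200_000)
model.save("ppo_qwop")
```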

Proximal Policy Optimization

The ./RLQWOP directory contains a custom implementation of the Proximal Policy Optimization algorithm. My implementation borrows heavily from OpenAI's Spinning Up implementation of PPO, with a modified strategy for parallelizing training using OpenMPI.

In my implementation, several actor processes maintain their own copies of the model and gather experience from locally simulated environments. Each actor adds its experiences to a shared replay buffer, and a single learner process performs gradient updates using the experiences stored in that buffer. After each update, the learner distributes the new parameters back to every actor.
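
The sketch below shows this actor/learner communication pattern with mpi4py. It is a simplified, synchronous approximation of the description above, not the code in ./RLQWOP: gather_rollout and update_policy are placeholders, and the gather call stands in for the shared replay buffer.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
LEARNER = 0

params = np.zeros(10)  # stand-in for the flattened policy parameters


def gather_rollout(params):
    """Placeholder: run the local QWOP environment with the current policy."""
    return np.random.randn(5)


def update_policy(params, batches):
    """Placeholder: one PPO update on the pooled experience."""
    return params + 1e-3 * np.mean(batches)


for epoch in range(100):
    # Every process collects experience from its own simulated environment.
    rollout = gather_rollout(params)

    # Experiences are pooled at the learner (the role of the shared replay buffer).
    batches = comm.gather(rollout, root=LEARNER)

    if rank == LEARNER:
        params = update_policy(params, np.stack(batches))

    # The learner distributes the updated parameters back to every actor.
    params = comm.bcast(params, root=LEARNER)
```

Launching with, e.g., mpiexec -n 4 python sketch.py gives one learner rank and three additional actor ranks.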

Running python run.py will train a model using my implementation.
