RL QWOP


Forked from here.

A friend of mine is really good at the video game QWOP, where you control a sprinter's legs to run a 100-meter dash (you can try it out yourself). I'm not very good at the game, but I wanted to beat my friend, so I trained a computer to beat the game for me using a reinforcement learning algorithm called Proximal Policy Optimization.

Requirements

QWOP Environment

I modified the internals of QWOP so that it can be run as a gym environment. Agents can send key commands to the game, observe the state of the game, and are rewarded according to the runner's velocity. The game is hosted on a local Node.js server. To get it running, run the following:

cd game
npm i
node server

The ./gym-qwop folder contains Python classes that model QWOP as an OpenAI Gym environment. There are three versions of the environment: qwop-v0, frame-qwop-v0, and multi-frame-qwop-v0. Run pip install ./gym-qwop/ to install them.

The default environment (qwop-v0) encodes the state of the game as the position and angle of each of the runner's limbs. frame-qwop-v0 encodes the state as the pixel data of the current frame of the game. multi-frame-qwop-v0 uses three sequential frames as the state.
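
As a quick sanity check that the environment is wired up, the snippet below runs a random agent against qwop-v0 through the standard gym loop. This is a minimal sketch: the module name gym_qwop and the exact observation shapes are assumptions, and the Node server from the previous section must already be running.

```python
import gym
import gym_qwop  # assumed module name; importing it registers the qwop-v0 environments

env = gym.make("qwop-v0")
obs = env.reset()

done = False
episode_return = 0.0
while not done:
    action = env.action_space.sample()           # random Q/W/O/P key combinations
    obs, reward, done, info = env.step(action)   # reward is the runner's velocity
    episode_return += reward

print("episode return:", episode_return)
env.close()
```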

Since the environment implements the gym interface, existing implementations of many popular RL algorithms can be used to train an agent. Running python run-gym.py will train a model using one of OpenAI's implementations.
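
For illustration, here is a rough sketch of the same idea using Stable-Baselines3 as a stand-in library (run-gym.py itself calls one of OpenAI's implementations, and the gym_qwop module name is an assumption):

```python
import gym
import gym_qwop  # assumed module name; registers the qwop-v0 environments
from stable_baselines3 import PPO

env = gym.make("qwop-v0")
model = PPO("MlpPolicy", env, verbose=1)   # MLP policy over limb positions and angles
model.learn(total_timesteps=200_000)
model.save("ppo_qwop")
```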

Proximal Policy Optimization

The ./RLQWOP directory contains a custom implementation of the Proximal Policy Optimization algorithm. My implementation borrows heavily from OpenAI's Spinning Up implementation of PPO, with a modified strategy for parallelizing training using OpenMPI.

In my implementation, several actor processes maintain their own copies of the model and gather experience from locally simulated environments. Each actor adds its experiences to a shared replay buffer, and a single learner process performs gradient updates using the experiences stored in that buffer. After each update, the learner distributes the new parameters back to every actor.
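
The sketch below shows this actor/learner communication pattern with mpi4py. It is a simplified, synchronous approximation of the description above, not the code in ./RLQWOP: gather_rollout and update_policy are placeholders, and the gather call stands in for the shared replay buffer.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
LEARNER = 0

params = np.zeros(10)  # stand-in for the flattened policy parameters


def gather_rollout(params):
    """Placeholder: run the local QWOP environment with the current policy."""
    return np.random.randn(5)


def update_policy(params, batches):
    """Placeholder: one PPO update on the pooled experience."""
    return params + 1e-3 * np.mean(batches)


for epoch in range(100):
    # Every process collects experience from its own simulated environment.
    rollout = gather_rollout(params)

    # Experiences are pooled at the learner (the role of the shared replay buffer).
    batches = comm.gather(rollout, root=LEARNER)

    if rank == LEARNER:
        params = update_policy(params, np.stack(batches))

    # The learner distributes the updated parameters back to every actor.
    params = comm.bcast(params, root=LEARNER)
```

Launching with, e.g., mpiexec -n 4 python sketch.py gives one learner rank and three additional actor ranks.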

Running python run.py will train a model using my implementation.
