dqn

Implementing Deep Q-Networks (DQN) from scratch, using PyTorch. I wrote a Medium post (in the Towards Data Science publication) describing my process, learnings, and results: https://towardsdatascience.com/learnings-from-reproducing-dqn-for-atari-games-1630d35f01a9.
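At the core of any DQN implementation is the temporal-difference update from the original paper. Below is a minimal PyTorch sketch of that update, assuming a replay-buffer batch of (obs, action, reward, next_obs, done) tensors; the function and variable names are illustrative, not necessarily this repository's exact API.

```python
# Minimal sketch of the core DQN update (illustrative names, not this repo's API).
# Assumes `batch` is a tuple of tensors sampled from a replay buffer.
import torch
import torch.nn as nn

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    obs, actions, rewards, next_obs, dones = batch

    # Q-values of the actions that were actually taken
    q_values = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Bootstrapped targets from the frozen target network (no gradients)
    with torch.no_grad():
        next_q_values = target_net(next_obs).max(dim=1).values
        targets = rewards + gamma * (1 - dones) * next_q_values

    # Huber loss, as used in the DQN paper
    return nn.functional.smooth_l1_loss(q_values, targets)
```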

Installation

  1. I use the Poetry package manager. If you don't already have Poetry installed, see their docs for instructions (https://python-poetry.org/docs/master/). E.g., on macOS it amounts to running `curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/install-poetry.py | python -` in a terminal.
  2. Install this dqn repository: `git clone` it, then navigate to the root directory and run `poetry install`.
  3. To get the Atari envs working, you'll also need to follow these short instructions to download and import the Atari ROMs: https://github.com/openai/atari-py#roms (a quick verification sketch follows this list).
  4. Verify the installation by running the unit tests: run `pytest` in the root directory.
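Once the ROMs are imported, a quick way to confirm the Atari setup is to construct and step an Atari env in Python. This is a sketch under the assumption of the classic gym + atari-py API; the env id below is just one common choice, not a requirement of this repo.

```python
# Smoke test for the Atari ROM import (assumes gym and atari-py were installed
# via `poetry install`; the env id is an illustrative choice).
import gym

env = gym.make("PongNoFrameskip-v4")
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
env.close()
print("Atari env stepped successfully")
```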

Results

I tested this DQN implementation on classic benchmarks (CartPole and FrozenLake) as well as Atari games (Pong and Freeway). Here is a summary of the results; check out my Medium post for full details.

CartPole

(Left) Mean of 10 training runs on CartPole. Error ribbons, indicating 1 standard error, are in red. (Middle) A representative training run, where x-axis is number of env steps, y-axis is mean episode return over 100 evaluation episodes. (Right) Gameplay of a fully trained agent, whose goal is to move the cart so the pole stays balanced without toppling. (Image and gif source: author)

FrozenLake

(Left) Mean of 10 training runs on FrozenLake. Error ribbons, indicating 1 standard error, are in red. (Middle) A representative training run, where x-axis is number of env steps, y-axis is mean episode return over 100 evaluation episodes. (Right) Gameplay of a fully trained agent, whose goal is to navigate from the start position S to the goal position G by walking through frozen spaces F without falling into hole spaces H. The catch is that the floor is slippery and the actual step direction can be randomly rotated 90° from the intended direction. The agent’s input direction for every step is indicated at the top of the screen. (Image and gif source: author)

Pong

(Top) Three training runs on Pong, where x-axis is number of env steps and y-axis is episode return of a single evaluation episode. (Bottom) Gameplay of fully trained agent (green player), whose goal is to hit the ball past the opponent’s paddle. Here, I added a small amount of stochasticity (10% chance of random action) to show how the agent deals with a more varied range of scenarios. Without the added stochasticity, the agent beats the opponent in a very similar way each time. (Image and gif source: author)
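The added stochasticity is plain epsilon-random action selection at evaluation time. A minimal sketch, assuming a trained Q-network and a gym env (the names are illustrative, not this repository's API):

```python
# Epsilon-random action selection at evaluation time (illustrative names).
import random
import torch

def select_action(q_net, obs, env, epsilon=0.1):
    # With probability epsilon, take a uniformly random action; otherwise act greedily.
    if random.random() < epsilon:
        return env.action_space.sample()
    with torch.no_grad():
        q_values = q_net(torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0))
    return int(q_values.argmax(dim=1).item())
```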

Freeway

(Top) Three training runs on Freeway, where x-axis is number of env steps and y-axis is episode return of a single evaluation episode. (Bottom) Gameplay of fully trained agent (left-side player), whose goal is to direct the chicken across the road as quickly as possible while avoiding cars. (Image and gif source: author)
