AdamStelmaszczyk/dqn

TensorFlow & Keras implementation of DQN with HER (Hindsight Experience Replay)

Hardware

If TensorFlow finds a GPU, you will see "Creating TensorFlow device (/device:GPU:0)" at the beginning of the log, and the code will use 1 GPU + 1 CPU. Otherwise, it will use 1 CPU.

A Tesla K40 paired with an Intel i5 (Haswell) gives about 80 steps/s during training. 1M training steps plus 200k evaluation steps (20k evaluation steps every 100k training steps) take about 3.5 hours on the K40.
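The step counts above fit together as follows. This is a quick arithmetic check, not code from the repository. It assumes the 80 steps/s rate applies to training steps and leaves evaluation time out, because the evaluation speed is not stated.

```python
# Arithmetic check of the training schedule quoted above.
# Assumption: 80 steps/s is the training throughput; evaluation time is not modelled.


def eval_steps_total(train_steps, eval_every, eval_steps):
    """Total evaluation steps when `eval_steps` are run every `eval_every` training steps."""
    return (train_steps // eval_every) * eval_steps


def training_hours(train_steps, steps_per_sec):
    """Wall-clock hours for `train_steps` at a constant `steps_per_sec`."""
    return train_steps / steps_per_sec / 3600


if __name__ == "__main__":
    print(eval_steps_total(1_000_000, 100_000, 20_000))  # 200000
    print(round(training_hours(1_000_000, 80), 2))       # 3.47
```

At 80 steps/s the 1M training steps alone come to about 3.47 hours, close to the quoted 3.5 hours. That suggests the evaluation steps, which do no gradient updates, are comparatively cheap, but this is an inference from the numbers rather than a measurement.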

I'd recommend about 10 GB of RAM to train safely. With REPLAY_BUFFER_SIZE = 100000 and 4 stacked frames per observation, the replay buffer alone already uses 84 * 84 * 4 * 100000 bytes ≈ 2.6 GiB of RAM.
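The memory figure can be reproduced as below. This sketch assumes one byte per pixel (uint8), which is what the 84 * 84 * 4 * 100000 product implies, and ignores any per-transition bookkeeping the real replay_buffer.py may add.

```python
# Replay-buffer memory estimate for stacked 84x84 Atari frames.
# Assumption: 1 byte per pixel (uint8); overhead beyond the raw frames is ignored.

REPLAY_BUFFER_SIZE = 100_000  # value quoted in this README


def replay_buffer_bytes(size, height=84, width=84, frames=4, bytes_per_pixel=1):
    """Bytes needed for `size` observations of `frames` stacked height x width frames."""
    return size * height * width * frames * bytes_per_pixel


if __name__ == "__main__":
    n = replay_buffer_bytes(REPLAY_BUFFER_SIZE)
    print(n)                    # 2822400000
    print(round(n / 2**30, 2))  # 2.63 (GiB)
```

In decimal units this is about 2.8 GB; the 2.6 figure is in binary GiB. Storing frames as float32 instead of uint8 would need four times as much (bytes_per_pixel=4).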

Install

Clone this repo: git clone https://github.com/AdamStelmaszczyk/dqn.git
Install conda for dependency management.
Create the dqn conda environment: conda create -yn dqn python=3 tensorflow tensorflow-gpu opencv psutil
Activate the dqn conda environment: source activate dqn. Run all the following commands in the activated dqn environment.
Install OpenAI Gym: pip install gym[atari]

An automatic build on Travis runs the same steps.
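To confirm the installation, a small check like the one below can be run inside the activated environment. It is a minimal sketch that is not part of the repository; it only tests that the packages installed above can be found on the import path (opencv is imported as cv2), not that TensorFlow sees a GPU.

```python
# Check that the packages installed above are importable in the active environment.
# Not part of the repository; uses only the standard library.
import importlib.util

# Import names of the installed packages (opencv's import name is cv2).
REQUIRED_MODULES = ("tensorflow", "cv2", "psutil", "gym")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules found.")
```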

Uninstall

Deactivate the conda environment: conda deactivate
Remove the dqn conda environment: conda env remove -yn dqn

Usage

The main entry point is run.py.

usage: run.py [-h] [--debug] [--env ENV] [--eval] [--images] [--model MODEL]
              [--name NAME] [--play] [--seed SEED] [--test] [--view]
              [--weights]

optional arguments:
  -h, --help     show this help message and exit
  --debug        load debug files and run fit_batch with them (default: False)
  --env ENV      Atari game name (default: Breakout)
  --eval         run evaluation with log only (default: False)
  --images       save images during evaluation (default: False)
  --model MODEL  model filename to load (default: None)
  --name NAME    name for saved files (default: 10-23-22-04)
  --play         play with WSAD + Space (default: False)
  --seed SEED    pseudo random number generator seed (default: None)
  --test         run tests (default: False)
  --view         view evaluation in a window (default: False)
  --weights      print model weights (default: False)
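The "(default: ...)" suffixes in this help text are what Python's argparse.ArgumentDefaultsHelpFormatter produces. The sketch below is a hypothetical parser that would print equivalent help; it is not the repository's run.py. Flag names, help strings, and defaults are copied from the output above, while the int type for --seed and the month-day-hour-minute timestamp behind the --name default are assumptions.

```python
# Hypothetical parser matching the help text above (NOT the repository's run.py).
import argparse
import time


def build_parser():
    parser = argparse.ArgumentParser(
        prog="run.py",
        # Appends "(default: ...)" to every help string, as in the output above.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("--debug", action="store_true",
                        help="load debug files and run fit_batch with them")
    parser.add_argument("--env", default="Breakout", help="Atari game name")
    parser.add_argument("--eval", action="store_true", help="run evaluation with log only")
    parser.add_argument("--images", action="store_true", help="save images during evaluation")
    parser.add_argument("--model", help="model filename to load")
    # Assumed timestamp format, chosen to match the shape of "10-23-22-04".
    parser.add_argument("--name", default=time.strftime("%m-%d-%H-%M"),
                        help="name for saved files")
    parser.add_argument("--play", action="store_true", help="play with WSAD + Space")
    # Assumed integer seed; the help text does not state the type.
    parser.add_argument("--seed", type=int, help="pseudo random number generator seed")
    parser.add_argument("--test", action="store_true", help="run tests")
    parser.add_argument("--view", action="store_true", help="view evaluation in a window")
    parser.add_argument("--weights", action="store_true", help="print model weights")
    return parser


if __name__ == "__main__":
    build_parser().print_help()
```

With this parser, "python run.py --env Pong" would leave every boolean flag False and only override the game name.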

Train

python run.py --env Pong

There are 60 games you can choose from:

AirRaid, Alien, Amidar, Assault, Asterix, Asteroids, Atlantis, BankHeist, BattleZone, BeamRider, Berzerk, Bowling, Boxing, Breakout, Carnival, Centipede, ChopperCommand, CrazyClimber, DemonAttack, DoubleDunk, ElevatorAction, Enduro, FishingDerby, Freeway, Frostbite, Gopher, Gravitar, Hero, IceHockey, Jamesbond, JourneyEscape, Kangaroo, Krull, KungFuMaster, MontezumaRevenge, MsPacman, NameThisGame, Phoenix, Pitfall, Pong, Pooyan, PrivateEye, Qbert, Riverraid, RoadRunner, Robotank, Seaquest, Skiing, Solaris, SpaceInvaders, StarGunner, Tennis, TimePilot, Tutankham, UpNDown, Venture, VideoPinball, WizardOfWor, YarsRevenge, Zaxxon
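A game name from this list still has to be turned into a Gym environment ID. The helper below is a hypothetical sketch that validates the name against the 60 games above and appends the common "NoFrameskip-v4" suffix used with OpenAI Baselines-style Atari wrappers; the suffix this repository actually uses should be checked in run.py and atari_wrappers.py.

```python
# Hypothetical helper: validate a game name and build a Gym Atari environment ID.
# Assumption: the "NoFrameskip-v4" suffix; the repository may use a different one.

GAMES = frozenset("""
AirRaid Alien Amidar Assault Asterix Asteroids Atlantis BankHeist BattleZone BeamRider
Berzerk Bowling Boxing Breakout Carnival Centipede ChopperCommand CrazyClimber DemonAttack DoubleDunk
ElevatorAction Enduro FishingDerby Freeway Frostbite Gopher Gravitar Hero IceHockey Jamesbond
JourneyEscape Kangaroo Krull KungFuMaster MontezumaRevenge MsPacman NameThisGame Phoenix Pitfall Pong
Pooyan PrivateEye Qbert Riverraid RoadRunner Robotank Seaquest Skiing Solaris SpaceInvaders
StarGunner Tennis TimePilot Tutankham UpNDown Venture VideoPinball WizardOfWor YarsRevenge Zaxxon
""".split())


def gym_env_id(game, suffix="NoFrameskip-v4"):
    """Return the Gym ID for `game`, raising ValueError for names not in the list."""
    if game not in GAMES:
        raise ValueError(f"unknown game {game!r}; expected one of {len(GAMES)} names")
    return game + suffix
```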

Play using the same observations as DQN

python run.py --play

Keys:

W - up
S - down
A - left
D - right
SPACE - fire button (the actual action depends on the game)
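Combined key presses map naturally onto the Arcade Learning Environment's action names such as UPRIGHTFIRE. The sketch below is hypothetical and not the repository's play code: it composes a name from the WSAD + Space keys, lets opposite directions cancel, and falls back to NOOP when the game's action set (for example the list returned by env.unwrapped.get_action_meanings() in Gym's Atari environments) lacks that action.

```python
# Hypothetical mapping from pressed keys to an ALE action index (not the repo's code).
# Assumptions: opposite keys cancel out; unsupported combinations fall back to NOOP.


def action_name(pressed):
    """Compose an ALE-style action name such as 'UPRIGHTFIRE' from pressed keys."""
    up, down = "w" in pressed, "s" in pressed
    left, right = "a" in pressed, "d" in pressed
    vertical = "UP" if up and not down else "DOWN" if down and not up else ""
    horizontal = "LEFT" if left and not right else "RIGHT" if right and not left else ""
    fire = "FIRE" if "space" in pressed else ""
    return (vertical + horizontal + fire) or "NOOP"


def action_index(pressed, meanings):
    """Index of the composed action in `meanings`, or of NOOP if the game lacks it."""
    name = action_name(pressed)
    return meanings.index(name) if name in meanings else meanings.index("NOOP")
```

Because each game exposes only a subset of the 18 ALE actions, the same keys can trigger different actions in different games, which is why the fire button's effect varies.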

Generate GIFs

Generate images: python run.py --images --model=PONG_MODEL.h5 --env Pong.
We will use the convert tool, which is part of ImageMagick (see the ImageMagick installation instructions).
Convert images from episode 1 to GIF: convert -layers optimize-frame 1_*.png 1.gif
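One caveat with the 1_*.png glob: the shell sorts matches lexicographically, so 1_10.png would come before 1_2.png unless frame numbers are zero-padded. The sketch below assumes frames are named "<episode>_<frame>.png" (inferred from the pattern above, not confirmed by the source), sorts them numerically, and builds the same convert command as an argument list.

```python
# Build the convert command with frames in numeric order.
# Assumption: frame files are named "<episode>_<frame>.png".
import re
from pathlib import Path

FRAME_RE = re.compile(r"^(\d+)_(\d+)\.png$")


def episode_frames(names, episode):
    """Return the frame file names of `episode`, sorted by frame number."""
    frames = []
    for name in names:
        match = FRAME_RE.match(name)
        if match and int(match.group(1)) == episode:
            frames.append((int(match.group(2)), name))
    return [name for _, name in sorted(frames)]


def convert_command(frames, output):
    """Argument list mirroring: convert -layers optimize-frame <frames> <output>."""
    return ["convert", "-layers", "optimize-frame", *frames, output]


if __name__ == "__main__":
    names = [p.name for p in Path(".").glob("*.png")]
    cmd = convert_command(episode_frames(names, 1), "1.gif")
    # Pass cmd to subprocess.run(cmd, check=True) to actually create the GIF.
    print(cmd)
```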

Best scores observed using the same hyperparameters as in the code

Pong: 21 after 0.5M steps

Breakout: 419 after 2M steps

SpaceInvaders: 1370 after 6.5M steps

BeamRider: 7111 after 5.5M steps

Seaquest: 8040 after 6.5M steps

Links