Deep-Q-Network-AtariBreakoutGame

Playing the Atari Breakout game with Reinforcement Learning (Deep Q-Learning)

Overview

This project implements the Deep Q-Learning algorithm described in this paper.

Installation Dependencies:

  • Python 3.x
  • Numpy
  • OpenCV-Python
  • PyGame
  • PyTorch

How To Run

  • git clone https://github.com/SnnGnc/Deep-Q-Network-AtariBreakoutGame.git
  • cd brekout
  • To train the model: python dqn.py train
  • To test the pre-trained model: python dqn.py test

What is Deep Q-Learning and how does it work?

For anyone curious about reinforcement learning, I highly recommend reading Demystifying Deep Reinforcement Learning.

DQN Algorithm

[Figure: pseudocode of the Deep Q-Learning algorithm with experience replay.]
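
To make the algorithm concrete, below is a minimal sketch of a DQN training loop in PyTorch with experience replay and epsilon-greedy exploration. The `env` interface (`reset()`/`step()`), the `q_net.num_actions` attribute, and all hyperparameter values are illustrative assumptions, not the repository's actual code.

```python
import random
from collections import deque

import torch
import torch.nn.functional as F


def train_dqn(env, q_net, optimizer, num_steps=100_000, gamma=0.99,
              batch_size=32, buffer_size=10_000,
              eps_start=1.0, eps_end=0.1, eps_decay_steps=50_000):
    """Minimal DQN loop. Assumes a hypothetical env with reset() -> state
    and step(action) -> (next_state, reward, done), where a state is a
    float tensor of shape (4, 84, 84)."""
    replay = deque(maxlen=buffer_size)  # experience replay memory
    state = env.reset()
    for step in range(num_steps):
        # Linearly annealed epsilon-greedy exploration.
        eps = max(eps_end, eps_start - (eps_start - eps_end) * step / eps_decay_steps)
        if random.random() < eps:
            action = random.randrange(q_net.num_actions)  # num_actions: assumed attribute
        else:
            with torch.no_grad():
                action = q_net(state.unsqueeze(0)).argmax(dim=1).item()

        next_state, reward, done = env.step(action)
        replay.append((state, action, reward, next_state, done))
        state = env.reset() if done else next_state

        if len(replay) < batch_size:
            continue

        # Sample a random minibatch of stored transitions.
        states, actions, rewards, next_states, dones = zip(*random.sample(replay, batch_size))
        states, next_states = torch.stack(states), torch.stack(next_states)
        actions = torch.tensor(actions)
        rewards = torch.tensor(rewards, dtype=torch.float32)
        dones = torch.tensor(dones, dtype=torch.float32)

        # TD target: r + gamma * max_a' Q(s', a'); no future reward at episode end.
        with torch.no_grad():
            targets = rewards + gamma * q_net(next_states).max(dim=1).values * (1 - dones)
        predictions = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

        loss = F.mse_loss(predictions, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```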

Future Reward Function Q

Q(s, a) = r + γ · max_a′ Q(s′, a′)

Loss Function

L = ½ · [r + γ · max_a′ Q(s′, a′) − Q(s, a)]²
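
As a toy illustration of the two formulas above, here is how the TD target and the squared loss could be computed for a single transition in PyTorch. All values are made up for the example:

```python
import torch
import torch.nn.functional as F

gamma = 0.99
reward = torch.tensor(1.0)
q_values = torch.tensor([0.2, 0.5])       # Q(s, a) for actions {0: right, 1: left}
next_q_values = torch.tensor([0.1, 0.7])  # Q(s', a') for the next state
action = 1                                # the action actually taken in s

target = reward + gamma * next_q_values.max()  # r + γ · max_a' Q(s', a')
loss = F.mse_loss(q_values[action], target)    # squared TD error (target − Q(s, a))²
```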

Network Architecture

"Working directly with raw Atari frames, which are 210 × 160 (in our case it depends on pygame screen) pixel images with a 128 color palette, can be computationally demanding, so we apply a basic preprocessing step aimed at reducing the input dimensionality. The raw frames are preprocessed by first converting their RGB representation to gray-scale and down-sampling it to a 84×84 image.As input Q-Network is preprocessing to the last 4 frames of a history and stacks them to produce the input to the Q-function.This process can be visualized as the following figure:

[Figures: four consecutive raw game frames.]

Then we convert these images to gray-scale:

[Figures: the same four frames after gray-scale conversion and resizing.]

And finally we feed these into the Q-network.

In summary, what we have done:

  • Take the last 4 frames
  • Resize the images to 84 × 84
  • Convert the frames to gray-scale
  • Stack them into an 84 × 84 × 4 input array and feed it into the Q-network (a preprocessing sketch follows below)
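
A minimal sketch of this preprocessing pipeline using OpenCV and NumPy. The function names are illustrative and may not match the repository's code:

```python
import cv2
import numpy as np


def preprocess(frame):
    """Convert one RGB game frame to an 84x84 gray-scale image."""
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    return cv2.resize(gray, (84, 84), interpolation=cv2.INTER_AREA)


def stack_frames(last_four_frames):
    """Stack the last 4 preprocessed frames into an 84x84x4 input array.
    (For PyTorch, transpose to channel-first (4, 84, 84) before feeding
    the network.)"""
    return np.stack([preprocess(f) for f in last_four_frames], axis=-1)
```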

The input to the neural network is an 84 × 84 × 4 image produced by φ. The first hidden layer convolves 32 8 × 8 filters with stride 4 over the input image and applies a rectifier nonlinearity. The second hidden layer convolves 64 4 × 4 filters with stride 2, again followed by a rectifier nonlinearity. The next hidden layer is fully-connected: the tensor is flattened to a 7 × 7 × 64 input and mapped to 512 rectifier units. The output layer is a fully-connected linear layer with a single output for each valid action. There are two valid actions: 1 for moving left and 0 for moving right. The architecture of the network is shown in the figure below: (Coming...)
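
A possible PyTorch version of this network is sketched below. One assumption to flag: reaching the 7 × 7 × 64 flattened tensor mentioned above requires a third convolutional layer (64 3 × 3 filters with stride 1, as in the original DQN paper), which the description omits; the class name is also made up.

```python
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Q-network sketch following the description above."""

    def __init__(self, num_actions=2):
        super().__init__()
        self.num_actions = num_actions
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),   # 84x84x4 -> 20x20x32
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),  # -> 9x9x64
            # Assumed third conv layer (DQN paper); yields the 7x7x64 tensor.
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),  # -> 7x7x64
        )
        self.head = nn.Sequential(
            nn.Flatten(),                            # 7 * 7 * 64 = 3136
            nn.Linear(7 * 7 * 64, 512), nn.ReLU(),   # 512 rectifier units
            nn.Linear(512, num_actions),             # one linear output per action
        )

    def forward(self, x):
        # x: (batch, 4, 84, 84) stack of preprocessed frames
        return self.head(self.features(x))
```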

Any contribution is welcome.
