Deep Atari

1-step Q-learning from the paper "Asynchronous Methods for Deep Reinforcement Learning". Atari games are among the coolest games out there and have gained widespread mainstream popularity. Breakout is one of my personal favorites, and Pong, the first game developed by Atari Inc., remains one of the most influential video games ever created. In 2013, DeepMind released its paper "Playing Atari with Deep Reinforcement Learning", which has become a very popular paper in the deep RL literature. My project implements 1-step Q-learning based on these papers.

Environment:

Here, I've used the Atari environments from OpenAI gym, a toolkit for developing and comparing RL algorithms. Switching environments is as simple as changing the value of a string variable.
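
A minimal sketch of the environment setup, assuming the classic gym API (pre-0.26) with the Atari extras installed; the environment id string is the single variable you change to switch games:

```python
# Minimal sketch, assuming the classic gym API (gym < 0.26) and
# `pip install "gym[atari]"`. Switching games only means changing ENV_NAME.
import gym

ENV_NAME = "Breakout-v0"  # e.g. "Pong-v0" to train on Pong instead

env = gym.make(ENV_NAME)
state = env.reset()
done = False
while not done:
    action = env.action_space.sample()                 # random policy, just to exercise the loop
    state, reward, done, info = env.step(action)
env.close()
```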

Q learning:

In Q-learning we define a function Q(s, a) representing the maximum discounted future reward when we perform action a in state s and continue optimally from that point on. We can think of Q(s, a) as the best possible score at the end of the game after performing action a in state s. It is called the Q-function because it represents the "quality" of a certain action in a given state.
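
The Q-function satisfies the Bellman equation, which is what 1-step Q-learning bootstraps from: the value of a state-action pair can be expressed through the value of the next state,

    Q(s, a) = r + γ · max_a' Q(s', a')

where r is the immediate reward for taking action a in state s, s' is the resulting state, and γ is the discount factor.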

Deep Q network:

We use a CNN that takes in the state s and predicts the Q-values for all the possible actions from that state. The network architecture that DeepMind used is a classical convolutional neural network with three convolutional layers followed by two fully connected layers. There are no pooling layers, since pooling buys us translation invariance, which is not something we want when training bots for games: where the ball and paddles sit on the screen matters. The input to the network is four 84×84 grayscale game screens; we use the 4 most recent screens as the environment state. The outputs of the network are the Q-values for each possible action. This is a regression task, since Q-values can be any real values, and the loss function of this network is a simple squared error loss.
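
A minimal sketch of this kind of architecture in tf.keras; the exact filter counts and strides follow DeepMind's Nature DQN paper and are an assumption here, since the original figure isn't reproduced:

```python
# Sketch of a 3-conv + 2-dense Q-network; layer sizes follow the Nature DQN
# paper and are assumed, not taken from this repository.
from tensorflow.keras import layers, models

def build_q_network(n_actions, input_shape=(84, 84, 4)):
    model = models.Sequential([
        layers.Input(shape=input_shape),                      # 4 stacked 84x84 grayscale frames
        layers.Conv2D(32, 8, strides=4, activation="relu"),
        layers.Conv2D(64, 4, strides=2, activation="relu"),
        layers.Conv2D(64, 3, strides=1, activation="relu"),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dense(n_actions),                              # one Q-value per action, no activation (regression)
    ])
    return model
```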

Training the network:

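The network is trained with the 1-step Q-learning target: for a stored transition < s, a, r, s' >, the target for Q(s, a) is r if the episode ended at s', and r + γ · max_a' Q(s', a') otherwise, and the squared error between this target and the network's current prediction is minimized by gradient descent. A minimal sketch of one update step, assuming the hypothetical build_q_network model above, a tf.keras optimizer, and NumPy minibatches drawn from the replay memory; GAMMA is an illustrative value, not taken from the repository:

```python
import numpy as np
import tensorflow as tf

GAMMA = 0.99  # discount factor (illustrative value)

def train_step(model, optimizer, states, actions, rewards, next_states, dones):
    """One 1-step Q-learning update on a minibatch of stored transitions."""
    # Bootstrapped target: r if the episode ended, else r + gamma * max_a' Q(s', a')
    next_q = model.predict(next_states, verbose=0)
    targets = (rewards + GAMMA * next_q.max(axis=1) * (1.0 - dones)).astype(np.float32)

    with tf.GradientTape() as tape:
        q_values = model(states)                              # Q(s, .) for the whole batch
        mask = tf.one_hot(actions, q_values.shape[-1])
        q_taken = tf.reduce_sum(q_values * mask, axis=1)      # Q(s, a) for the actions actually taken
        loss = tf.reduce_mean(tf.square(targets - q_taken))   # simple squared error loss
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return float(loss)
```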

Experience Replay:

During gameplay all the experiences < s, a, r, s' > are stored in a replay memory. When training the network, random minibatches drawn from the replay memory are used instead of the most recent transitions, which breaks the correlation between consecutive samples and makes the training task similar to usual supervised learning.

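A minimal sketch of such a replay memory, using a bounded deque; the capacity and batch size are illustrative values, not taken from the repository:

```python
import random
from collections import deque

class ReplayMemory:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)   # old experiences are evicted automatically

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # A random minibatch breaks the correlation between consecutive frames
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones
```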

Exploitation vs Exploration:

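To balance the two, the DQN approach uses an ε-greedy policy: with probability ε a random action is taken (exploration), otherwise the action with the highest predicted Q-value is chosen (exploitation), and ε is annealed from 1.0 down to a small value over the course of training. A minimal sketch; `q_network` below stands for a forward pass of the Q-network and is an assumed helper, not code from this repository:

```python
import random
import numpy as np

EPS_START, EPS_END, ANNEAL_STEPS = 1.0, 0.1, 1_000_000  # illustrative schedule

def epsilon_at(step):
    """Linearly anneal epsilon from EPS_START to EPS_END over ANNEAL_STEPS."""
    frac = min(step / ANNEAL_STEPS, 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)

def choose_action(q_network, state, n_actions, step):
    if random.random() < epsilon_at(step):
        return random.randrange(n_actions)                # explore: pick a random action
    q_values = q_network(np.expand_dims(state, axis=0))   # exploit: forward pass for this state
    return int(np.argmax(q_values))
```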

Results:

https://www.youtube.com/watch?v=0KRVL-VkMGw

https://www.youtube.com/watch?v=0-ATaiFjzi8

https://www.youtube.com/watch?v=48HNdmfGEjE
