This repository is an implementation of the DQN algorithm for the Banana environment developed by Unity3D and accessed through the UnityEnvironment library. It is an extension of the code sample provided by the Udacity Deep RL teaching crew (for more information, visit their website). The environment presents its state as a vector rather than raw pixels; thus, we did not use Convolutional Neural Networks (CNNs) in the implementation.
This repository consists of the following files, all saved under the "src" directory:
- model.py: This module provides the underlying neural network for our agent. When we train the agent, this network is updated by backpropagation; a minimal sketch of such a network appears after this list.
- buffer.py: This module implements the "memory" of our agent, also known as experience replay; see the buffer sketch below.
- agent.py: This is the body of our agent. It implements how the agent acts (using an epsilon-greedy policy) and learns an optimal policy; an action-selection sketch follows below.
- train.py: This module provides the train function, which takes the agent, the environment, the number of training episodes, and the required hyperparameters, and trains the agent accordingly.
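Since the state is a plain vector, the network in model.py can be a small fully connected net. Here is a minimal sketch of such a network (the 37-dimensional state, 4 actions, and 64-unit hidden layers are assumptions for illustration; the actual sizes live in model.py):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per discrete action."""

    def __init__(self, state_size=37, action_size=4, hidden=64):
        super().__init__()
        self.fc1 = nn.Linear(state_size, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.fc3 = nn.Linear(hidden, action_size)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return self.fc3(x)  # raw Q-values; no activation on the output layer
```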
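The experience replay in buffer.py can be sketched as a fixed-size buffer of transitions sampled uniformly at random. The capacity and batch size below are placeholder values, not necessarily the ones used in this repository:

```python
import random
from collections import deque, namedtuple

Experience = namedtuple("Experience",
                        ["state", "action", "reward", "next_state", "done"])

class ReplayBuffer:
    """Fixed-size memory that stores transitions and samples them uniformly."""

    def __init__(self, capacity=100_000, batch_size=64):
        self.memory = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.batch_size = batch_size

    def add(self, state, action, reward, next_state, done):
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self):
        # Uniform random sampling breaks the correlation between
        # consecutive transitions, which stabilizes DQN training.
        return random.sample(self.memory, k=self.batch_size)

    def __len__(self):
        return len(self.memory)
```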
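The epsilon-greedy policy in agent.py boils down to a coin flip between exploring at random and acting greedily on the Q-values. A minimal sketch, assuming a PyTorch Q-network like the one above:

```python
import random
import torch

def epsilon_greedy_action(q_network, state, epsilon, action_size=4):
    """With probability epsilon pick a random action; otherwise act greedily."""
    if random.random() < epsilon:
        return random.randrange(action_size)
    state_t = torch.from_numpy(state).float().unsqueeze(0)  # add batch dim
    q_network.eval()
    with torch.no_grad():
        q_values = q_network(state_t)
    q_network.train()
    return int(q_values.argmax(dim=1).item())
```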
To test the code, after cloning the project, open the Navigation.ipynb notebook. It contains all the necessary steps to install and load the packages and to train and test the agent. It also automatically detects the operating system and loads the corresponding environment build. A pre-trained agent is stored in checkpoint.pth; it can be tested directly by running the last part of the notebook.
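For reference, loading the pre-trained weights outside the notebook might look like this (assuming checkpoint.pth stores the Q-network's state_dict, as in the Udacity sample code, and using the hypothetical QNetwork sketch from above):

```python
import torch

# Rebuild the network architecture, then load the saved weights.
q_network = QNetwork(state_size=37, action_size=4)
q_network.load_state_dict(torch.load("checkpoint.pth", map_location="cpu"))
q_network.eval()  # inference mode: no further training updates
```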
The Banana environment is a vectorized version of the Banana Collector built with the Unity game engine. The task is simple: the agent explores a world consisting of yellow (good) and purple (bad) bananas. It should collect the good ones (by walking over them) and avoid the bad ones. Collecting a good banana yields a reward of +1, and a bad one a reward of -1. In this activity, we consider any agent that achieves an average reward of more than +13 to be successful.
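A common way to track this success criterion is a rolling average over recent episode scores. The sketch below assumes a 100-episode window, which is the usual Udacity convention but is not stated above:

```python
from collections import deque
import numpy as np

scores_window = deque(maxlen=100)  # scores of the most recent episodes

# Inside the training loop, after each episode finishes:
#     scores_window.append(episode_score)

if len(scores_window) == scores_window.maxlen and np.mean(scores_window) > 13.0:
    print(f"Environment solved! Average score: {np.mean(scores_window):.2f}")
```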
The action space has four discrete actions:
- Move Forward (action 0)
- Move Backward (action 1)
- Turn Right (action 2)
- Turn Left (action 3)
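Interacting with the environment through the unityagents API then looks roughly like this (the file name is an assumption; the notebook selects the correct build for your OS):

```python
import random
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="Banana.app")  # assumed macOS build name
brain_name = env.brain_names[0]

env_info = env.reset(train_mode=False)[brain_name]
state = env_info.vector_observations[0]          # the state vector

action = random.randrange(4)                     # e.g. 0 = move forward
env_info = env.step(action)[brain_name]          # advance the simulation
reward = env_info.rewards[0]                     # +1 for yellow, -1 for purple
next_state = env_info.vector_observations[0]
done = env_info.local_done[0]                    # True when the episode ends

env.close()
```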
Here is a plot of the rewards acquired by the agent while learning. Its average score surpasses +16 after around 1,100 episodes.
Look at it go: