Project 1 - Navigation using Reinforcement Learning (DQN and Double DQN)

Introduction

The first project in Udacity Deep Reinforcement Learning nanodegree consists on solving a navigation problem - collecting yellow bananas and avoiding blue bananas in a large, square world - using reinforcement learning.

Our solution uses a standard Deep Q-Network (DQN) algorithm with experience replay and fixed Q-Targets, as described in the original research paper. We also implemented a variant known as Double DQN, which improves performance by avoiding overestimating action values, as described in the paper.

Both implementations were based on the samples provided as exercises in the course, modified to use the Unity ML-Agents environment, and use the PyTorch framework.

The Environment

In this project we'll train an agent to navigate a large, square two-dimensional world, collecting yellow bananas (each one providing a reward of +1) and avoiding blue bananas (which provides a negative reward of -1). This is an episodic task; the problem is considered solved if the agent get an average score of +13 over 100 consecutive episodes.

The state space perceived by the agent is a vector with 37 continuous dimensions, representing the agent's velocity and (ray-based) information about perceived objects around agent's forward direction. After each observation of the state space, the agent may choose between four discrete actions numbered from 0 to 3: move forward (0), move backward (1), turn left (2) and turn right (3).

This environment is a variant created for the nanodegree and provided as a compiled Unity binary. The animated image below was part of the problem description and illustrates the problem.

Getting Started

All the work was performed on a Windows 10 laptop, with a GeForce GTX 970M GPU. The training was performed using CUDA.

After cloning the project, download and extract the pre-built "Banana" environment using the link adequate to your operational system, in the same directory of the project:

Linux: click here
Mac OSX: click here
Windows (32-bit): click here
Windows (64-bit): click here

It is also necessary to install Unity ML-Agents, unityagents and NumPy. Our development used an Anaconda (Python 3.6) environment to install all packages.

Training the agent

Run python ./train.py to train the agent using Double DQN. The average rewards over 100 consecutive episodes will be printed to the standard output.

At the end, the plot showing the agent progress will be saved in the image training.png, and the model (weights learned by the agent) will be saved in file checkpoint.pth.

Running a trained agent

The repository already contains weights trained using DQN and Double DQN (checkpont_dqn.pth and checkpoint_ddqn.pth, respectively).

Run python ./test.py checkpoint_ddqn.pth to see the agent in action!

Report

Please refer to file Report.md for a detailed description of the solution, including neural network architecture and hyperparameters used.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
README.md		README.md
Report.md		Report.md
checkpoint_ddqn.pth		checkpoint_ddqn.pth
checkpoint_dqn.pth		checkpoint_dqn.pth
ddqn_results.png		ddqn_results.png
dqn_agent.py		dqn_agent.py
dqn_results.png		dqn_results.png
model.py		model.py
test.py		test.py
train.py		train.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Report.md

Report.md

checkpoint_ddqn.pth

checkpoint_ddqn.pth

checkpoint_dqn.pth

checkpoint_dqn.pth

ddqn_results.png

ddqn_results.png

dqn_agent.py

dqn_agent.py

dqn_results.png

dqn_results.png

model.py

model.py

test.py

test.py

train.py

train.py

Repository files navigation

Project 1 - Navigation using Reinforcement Learning (DQN and Double DQN)

Introduction

The Environment

Getting Started

Training the agent

Running a trained agent

Report

About

Releases

Packages

Contributors 2

Languages

muriloq/drlnd-p1-navigation

Folders and files

Latest commit

History

Repository files navigation

Project 1 - Navigation using Reinforcement Learning (DQN and Double DQN)

Introduction

The Environment

Getting Started

Training the agent

Running a trained agent

Report

About

Resources

Stars

Watchers

Forks

Languages