# Deep Q-Network (DQN)

**Description:** Implementing the DQN algorithm on the Robot Navigation problem.

## Introduction

**Deep Q-Network (DQN)** is a model-free, off-policy algorithm for learning a policy over a discrete set of actions.

It uses a neural network as a function approximator to estimate the Q-function, which makes learning tractable when the state space is continuous or too large for a Q-table. The action space itself must stay discrete.
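
As a point of reference, the function-approximation step is usually framed as regressing the network's output onto the one-step Q-learning target. The notation below is conventional and assumed rather than taken from this file: $\theta$ denotes the network weights, $(s, a, r, s')$ a stored transition, $d \in \{0, 1\}$ a terminal flag, and $\gamma$ the discount factor.

$$
y = r + \gamma \,(1 - d)\, \max_{a'} Q(s', a'; \theta),
\qquad
L(\theta) = \big(y - Q(s, a; \theta)\big)^2
$$

The published DQN algorithm evaluates the $\max$ term with a separate, periodically updated target network; the single-network form above is a simplification.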

It also uses a method called experience replay (replay memory): a memory buffer that stores the agent's past experiences so they can be sampled during training.
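
A minimal sketch of such a buffer, using only the standard library, might look like the following. The `ReplayBuffer` class and its method names are hypothetical and are not the ones used in this file. The capacity of 2000 is the default here only to mirror the `max_memory_buffer` value set in the agent.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size FIFO store of transitions (hypothetical helper)."""

    def __init__(self, capacity=2000):
        # A deque with maxlen drops the oldest item automatically once full
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement; never asks for more than is stored
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.store(state=t, action=0, reward=1.0, next_state=t + 1, done=False)

print(len(buffer))                      # 3
print([tr[0] for tr in buffer.buffer])  # [2, 3, 4]
```

Sampling from a copy leaves the stored order intact, so the oldest transition is still the one evicted when the buffer is full.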

## Problem

We are trying to solve the **Robot Navigation** problem.
In this setting, we assume the agent can take one of three actions: [Forward, LeftTurn, RightTurn].

The implementation of the algorithm is carried out in the rest of the file.

In [None]:
import numpy as np
import gym
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import mean_squared_error
from matplotlib import pyplot as plt

In [None]:

class DQNAgent:
    def __init__(self, state_size, action_size):
        self.n_actions = action_size
        # Hyperparameters:
        # "lr": learning rate
        # "gamma": discount factor
        # "epsilon": exploration probability
        # "decay_rate": exponential decay rate of the exploration probability
        # "batch_size": number of sampled experiences used to train the DNN
        self.lr = 0.001
        self.gamma = 0.99
        self.epsilon = 1.0
        self.decay_rate = 0.005
        self.batch_size = 32
        
        # Define the memory buffer that stores experiences
        self.memory_buffer = list()
        # Keep only the most recent 2000 time steps
        self.max_memory_buffer = 2000
        
        # Create a model with two hidden layers of 24 units each
        # The input has the size of the state space
        # The output layer has one unit per action
        self.model = Sequential([
            Dense(units=24, input_dim=state_size, activation='relu'),
            Dense(units=24, activation='relu'),
            Dense(units=action_size, activation='linear')
        ])
        # "learning_rate" replaces the deprecated "lr" argument
        self.model.compile(loss="mse", optimizer=Adam(learning_rate=self.lr))

    def get_action(self, current_state):
        # Explore with probability epsilon, otherwise act greedily
        if np.random.uniform(0, 1) < self.epsilon:
            return np.random.choice(range(self.n_actions))
        # current_state is expected to have shape (1, state_size)
        q_values = self.model.predict(current_state)[0]
        return np.argmax(q_values)

    def update_epsilon(self):
        self.epsilon = self.epsilon * np.exp(-self.decay_rate)
        print(self.epsilon)

    def store_episode(self, current_state, action, reward, next_state, done):
        self.memory_buffer.append({
            "current_state": current_state,
            "action": action,
            "reward": reward,
            "next_state": next_state,
            "done": done,
        })
        # If the memory buffer exceeds its max. size, remove the first (oldest) element
        if len(self.memory_buffer) > self.max_memory_buffer:
            self.memory_buffer.pop(0)

    def train(self):
        # Select a batch of experiences after shuffling the memory buffer
        np.random.shuffle(self.memory_buffer)
        batch_sample = self.memory_buffer[0:self.batch_size]
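
The `train` method stops after selecting the batch. One common way to continue, sketched here as an assumption and not as the author's implementation, is to turn the batch into regression targets for the network. The `compute_q_targets` helper below is hypothetical and uses plain NumPy arrays in place of model predictions so that it runs on its own.

```python
import numpy as np


def compute_q_targets(q_current, q_next, actions, rewards, dones, gamma=0.99):
    """Build regression targets for one minibatch (hypothetical helper).

    q_current, q_next: arrays of shape (batch, n_actions) holding the
    network's predictions for the current and next states.
    """
    # Copy, so actions that were not taken keep their predicted value
    targets = np.array(q_current, dtype=np.float64)
    max_next = np.max(q_next, axis=1)
    # Terminal transitions do not bootstrap from the next state
    updated = rewards + gamma * max_next * (1.0 - dones.astype(np.float64))
    targets[np.arange(len(actions)), actions] = updated
    return targets


q_current = np.array([[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]])
q_next = np.array([[0.0, 4.0, 1.0], [9.0, 9.0, 9.0]])
actions = np.array([0, 2])
rewards = np.array([1.0, -1.0])
dones = np.array([False, True])

targets = compute_q_targets(q_current, q_next, actions, rewards, dones, gamma=0.5)
print(targets)
```

In the agent, `q_current` and `q_next` would presumably come from `self.model.predict` on the stacked states, and the targets would be passed to `self.model.fit`. That wiring is left out of this sketch.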

In [None]:
env = gym.make('bot3RLNav/DiscreteWorld-v0', map_file="data/gray.jpg")
prev_state = env.reset()
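# Convert each entry of the observation dict to a plain list for printing
state_values = [value.tolist() for value in prev_state.values()]
print(state_values)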
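
Because the observation is a dict while `get_action` passes its input straight to `model.predict`, some conversion to a single array of shape `(1, state_size)` is likely needed. The sketch below shows one possible way to do that. `flatten_observation` is a hypothetical helper, and the keys and shapes in `example_obs` are made up for illustration; the real ones depend on the environment.

```python
import numpy as np


def flatten_observation(observation):
    """Concatenate every array in a dict observation into shape (1, state_size)."""
    parts = [np.asarray(value, dtype=np.float32).ravel() for value in observation.values()]
    return np.concatenate(parts).reshape(1, -1)


# Hypothetical observation; the real keys and shapes come from the environment
example_obs = {"position": np.array([1.0, 2.0]), "heading": np.array([0.5])}
state = flatten_observation(example_obs)
print(state.shape)  # (1, 3)
```

With this approach, `state.shape[1]` would be the value to pass as `state_size` when constructing `DQNAgent`.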