# Keras-RL DQN Exercise 


In this exercise you will implement your first keras-rl agent for the **Acrobot** environment (https://gym.openai.com/envs/Acrobot-v1/). <br />
The goal in this environment is to swing the robot arm up above the line in as few steps as possible.

**TASK: Import necessary libraries** <br />

In [1]:
import time  # to reduce the game speed when playing manually

import gym  # Contains the game we want to play

# import necessary blocks from keras to build the Deep Learning backbone of our agent
from tensorflow.keras.models import Sequential  # To compose multiple Layers
from tensorflow.keras.layers import Dense  # Fully-Connected layer
from tensorflow.keras.layers import Activation  # Activation functions
from tensorflow.keras.layers import Flatten  # Flatten function

from tensorflow.keras.optimizers import Adam  # Adam optimizer

# Now the keras-rl2 packages. Don't get confused: the module is imported as rl, not keras-rl

from rl.agents.dqn import DQNAgent  # Use the basic Deep-Q-Network agent

# LinearAnnealedPolicy linearly decays epsilon for the epsilon-greedy strategy
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy
from rl.memory import SequentialMemory  # Sequential memory for storing observations (optimized circular buffer)


**TASK: Create the environment** <br />
The name is: *Acrobot-v1*

In [2]:
env_name = 'Acrobot-v1'
env = gym.make(env_name)

In [3]:
num_actions = env.action_space.n
num_observations = env.observation_space.shape
print(f"Action Space: {num_actions}")
print(f"Observation Space: {num_observations}")

assert num_actions == 3 and num_observations == (6,), "Wrong environment!"

Action Space: 3
Observation Space: (6,)


**TASK: Create the Neural Network for your Deep-Q-Agent** <br />
Take a look at the size of the action space and the size of the observation space.
You are free to choose any architecture you want! <br />
Hint: Three hidden layers with 64 neurons each are already sufficient.

In [4]:
model = Sequential()
model.add(Flatten(input_shape=(1,) + num_observations))

model.add(Dense(64))
model.add(Activation('relu'))

model.add(Dense(64))
model.add(Activation('relu'))

model.add(Dense(64))
model.add(Activation('relu'))


model.add(Dense(num_actions))
model.add(Activation('linear'))

model.summary()  # summary() prints the table itself; wrapping it in print() would also print None

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten (Flatten)           (None, 6)                 0         
                                                                 
 dense (Dense)               (None, 64)                448       
                                                                 
 activation (Activation)     (None, 64)                0         
                                                                 
 dense_1 (Dense)             (None, 64)                4160      
                                                                 
 activation_1 (Activation)   (None, 64)                0         
                                                                 
 dense_2 (Dense)             (None, 64)                4160      
                                                                 
 activation_2 (Activation)   (None, 64)                0
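
The parameter counts in the summary above can be checked by hand: a fully connected layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. A minimal sketch in plain Python (the helper `dense_params` is a hypothetical name, not part of Keras):

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_in * n_out + n_out


# Layer widths of the network above: 6 observations,
# three hidden layers of 64 units, 3 actions.
layer_sizes = [6, 64, 64, 64, 3]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [448, 4160, 4160, 195]
print(sum(counts))  # 8963
```

The first three values match the `dense`, `dense_1` and `dense_2` rows of the summary; the last value belongs to the output layer with 3 units.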

**TASK: Initialize the circular buffer**<br />
Make sure you set the limit appropriately (50000 works well)

In [5]:
memory = SequentialMemory(limit=50000, window_length=1)
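
To illustrate what a circular buffer with a fixed limit does, here is a minimal sketch built on `collections.deque`. It is not how `SequentialMemory` is implemented; the class `TinyReplayBuffer` and its methods are hypothetical names used only for illustration.

```python
import random
from collections import deque


class TinyReplayBuffer:
    """Keeps at most `limit` transitions; the oldest ones are dropped first."""

    def __init__(self, limit: int):
        self.buffer = deque(maxlen=limit)

    def append(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size: int):
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


buffer = TinyReplayBuffer(limit=3)
for step in range(5):
    # Each transition is (observation, action, reward, done).
    buffer.append((f"obs{step}", 0, -1.0, False))

print(len(buffer))                    # 3
print([t[0] for t in buffer.buffer])  # ['obs2', 'obs3', 'obs4']
```

With `limit=50000`, the same idea means that only the 50,000 most recent transitions are available for sampling.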


**TASK: Use the epsilon greedy action selection strategy with *decaying* epsilon**

In [6]:
policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), 
                              attr='eps',
                              value_max=1.,
                              value_min=.1,
                              value_test=.05,
                              nb_steps=150000) 
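
With these settings epsilon falls linearly from 1.0 to 0.1 over 150,000 training steps and stays at 0.1 afterwards; `value_test` is the epsilon used during testing. The sketch below reproduces the training schedule in plain Python. `annealed_eps` is a hypothetical helper, and the formula is written from the parameters above as an assumption about the schedule, not copied from the library.

```python
def annealed_eps(step, value_max=1.0, value_min=0.1, nb_steps=150000):
    """Linearly interpolate from value_max to value_min, then hold value_min."""
    slope = -(value_max - value_min) / nb_steps
    return max(value_min, value_max + slope * step)


for step in (0, 75000, 150000, 200000):
    print(step, round(annealed_eps(step), 3))
```

Halfway through training (step 75,000) the agent still takes a random action in roughly half of its steps, which is worth keeping in mind when reading the early training rewards.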


**TASK: Create the DQNAgent** <br />
Feel free to play with the nb_steps_warmup, target_model_update, batch_size and gamma parameters. <br />
Hint:<br />
You can try *nb_steps_warmup*=1000, *target_model_update*=1000, *batch_size*=32 and *gamma*=0.99 as a first guess.

In [7]:
dqn = DQNAgent(model=model, nb_actions=num_actions, memory=memory, nb_steps_warmup=1000,
               target_model_update=1000, policy=policy, batch_size=32, gamma=0.99)

# The optimizer and metrics are set when the agent is compiled (see the next task)
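
To make the roles of *gamma* and the target network more concrete, the sketch below computes the standard one-step Q-learning target for a single transition. It is written in plain Python for illustration and does not use keras-rl; `td_target` and its arguments are hypothetical names.

```python
def td_target(reward, next_q_values, gamma=0.99, done=False):
    """One-step Q-learning target: r + gamma * max over a' of Q_target(s', a')."""
    if done:
        # No future reward is bootstrapped after a terminal state.
        return reward
    return reward + gamma * max(next_q_values)


# next_q_values would come from the target network, which with
# target_model_update=1000 is synchronized with the online network
# only every 1000 steps.
print(td_target(-1.0, [0.5, 2.0, 1.0]))             # non-terminal transition
print(td_target(-1.0, [0.5, 2.0, 1.0], done=True))  # terminal transition: -1.0
```

A *gamma* close to 1 weights future rewards almost as heavily as immediate ones, which suits a task where every step costs the same reward of -1.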


**TASK: Compile the model** <br />
Feel free to explore the effects of different optimizers and learning rates.
You can try Adam with a learning rate of 1e-3 as a first guess.

In [11]:
dqn.compile(Adam(learning_rate=1e-3), metrics=['mae']) 

**TASK: Fit the model** <br />
150,000 steps is a good starting point.

In [12]:
dqn.fit(env, nb_steps=150000, visualize=False, verbose=2)

Training for 150000 steps ...
    500/150000: episode: 1, duration: 0.470s, episode steps: 500, steps per second: 1064, episode reward: -500.000, mean reward: -1.000 [-1.000, -1.000], mean action: 0.976 [0.000, 2.000],  loss: --, mae: --, mean_q: --, mean_eps: --
   1000/150000: episode: 2, duration: 0.358s, episode steps: 500, steps per second: 1395, episode reward: -500.000, mean reward: -1.000 [-1.000, -1.000], mean action: 0.990 [0.000, 2.000],  loss: --, mae: --, mean_q: --, mean_eps: --


  updates=self.state_updates,


AttributeError: 'Adam' object has no attribute 'get_updates'
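
The `AttributeError` above is raised once the 1000 warm-up steps are over and the first training update is attempted, so the agent is never actually trained. This error is commonly seen when keras-rl2 runs against a TensorFlow release whose default `Adam` no longer provides `get_updates`. One possible workaround, assuming a TensorFlow version that still ships the legacy optimizers (roughly 2.11 to 2.15), is to import `Adam` from the `legacy` namespace. The helper `find_legacy_adam` below is a hypothetical name, and the sketch returns `None` when no suitable class can be imported.

```python
def find_legacy_adam():
    """Return an Adam class that exposes `get_updates`, or None if none is importable."""
    candidates = []
    try:
        # Assumption: this namespace exists in TensorFlow 2.11 to 2.15 (Keras 2).
        from tensorflow.keras.optimizers.legacy import Adam as LegacyAdam
        candidates.append(LegacyAdam)
    except ImportError:
        pass
    try:
        # Older TensorFlow releases, where the default Adam is the old implementation.
        from tensorflow.keras.optimizers import Adam as DefaultAdam
        candidates.append(DefaultAdam)
    except ImportError:
        pass
    # Keep only a class that still has the method keras-rl2 calls.
    for candidate in candidates:
        if hasattr(candidate, "get_updates"):
            return candidate
    return None


Adam = find_legacy_adam()
print("usable Adam found:", Adam is not None)
```

If a class is found, it could be passed to `dqn.compile` in place of the `Adam` imported at the top of the notebook, after which the fit step would need to be rerun.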

**TASK: Evaluate the model**

In [13]:
dqn.test(env, nb_episodes=5, visualize=True)
env.close()

Testing for 5 episodes ...
Episode 1: reward: -500.000, steps: 500
Episode 2: reward: -500.000, steps: 500
Episode 3: reward: -500.000, steps: 500
Episode 4: reward: -500.000, steps: 500
Episode 5: reward: -500.000, steps: 500
