# Keras-RL DQN Exercise


In this exercise you are going to implement your first keras-rl agent based on the **Acrobot** environment (https://gym.openai.com/envs/Acrobot-v1/). <br />
The goal of this environment is to swing the robot arm up above the line in as few steps as possible.

In [1]:
import time 
import gym

from tensorflow.keras.models import Sequential  
from tensorflow.keras.layers import Dense, Activation, InputLayer
from tensorflow.keras.layers import Flatten
from tensorflow.keras.optimizers import Adam  

from rl.agents.dqn import DQNAgent
from rl.memory import SequentialMemory  
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy


**TASK: Create the environment** <br />
The name is: *Acrobot-v1*

In [2]:
def recall():
    env = gym.make('Acrobot-v1')
    return env

env = recall()
env.reset()

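for _ in range(300):
    env.render(mode="human")
    random_action = env.action_space.sample()
    done = env.step(random_action)[2]  # third return value flags the end of the episode
    if done:
        env.reset()  # start a new episode instead of stepping past the end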

env.close()

In [3]:
num_actions = env.action_space.n
num_observations = env.observation_space.shape[0]
print(f"Action Space: {env.action_space.n}")
print(f"Observation Space: {num_observations}")

assert num_actions == 3 and num_observations == 6 , "Wrong environment!"

Action Space: 3
Observation Space: 6


**TASK: Create the Neural Network for your Deep-Q-Agent** <br />
Take a look at the size of the action space and the size of the observation space.
You are free to choose any architecture you want! <br />
Hint: Three hidden layers with 64 neurons each are already enough.

In [4]:
model = Sequential()
model.add(InputLayer(input_shape=(1, num_observations)))
model.add(Flatten())

model.add(Dense(16))
model.add(Activation('relu'))

model.add(Dense(32))
model.add(Activation('relu'))

model.add(Dense(64))
model.add(Activation('relu'))

model.add(Dense(64))
model.add(Activation('relu'))

model.add(Dense(64))
model.add(Activation('relu'))

model.add(Dense(32))
model.add(Activation('relu'))

model.add(Dense(16))
model.add(Activation('relu'))

model.add(Dense(num_actions))
model.add(Activation('linear'))

model.summary()  # prints the table itself; wrapping it in print() would also print "None"

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten (Flatten)           (None, 6)                 0         
                                                                 
 dense (Dense)               (None, 16)                112       
                                                                 
 activation (Activation)     (None, 16)                0         
                                                                 
 dense_1 (Dense)             (None, 32)                544       
                                                                 
 activation_1 (Activation)   (None, 32)                0         
                                                                 
 dense_2 (Dense)             (None, 64)                2112      
                                                                 
 activation_2 (Activation)   (None, 64)                0
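
The parameter counts in the summary rows shown above can be checked by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit. The following is a minimal sketch of that arithmetic; `dense_params` is a hypothetical helper written for this illustration and is not part of Keras.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units

# Layer widths taken from the rows above: 6 observations -> 16 -> 32 -> 64
widths = [6, 16, 32, 64]
counts = [dense_params(a, b) for a, b in zip(widths, widths[1:])]
print(counts)  # [112, 544, 2112]
```

The `Flatten` and `Activation` layers contribute no parameters, which is why they show 0 in the table.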

**TASK: Initialize the circular buffer**<br />
Make sure you set the limit appropriately (a limit of 50000 transitions works well).

In [5]:
memory = SequentialMemory(limit=50000, window_length=1)
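
To illustrate what "circular buffer" means here, the sketch below uses a bounded `collections.deque` as a stand-in. It is not how `SequentialMemory` is implemented; it only shows the behaviour that matters for the exercise: once the limit is reached, the oldest transitions are overwritten, and training batches are drawn at random from what remains. The transition tuples hold made-up dummy values.

```python
import random
from collections import deque

# Hypothetical stand-in for a replay memory; not keras-rl's SequentialMemory.
buffer = deque(maxlen=3)  # tiny limit so the wrap-around is visible

for step in range(5):
    # Each entry is a dummy transition: (observation id, action, reward, done)
    buffer.append((step, 0, -1.0, False))

print([t[0] for t in buffer])  # [2, 3, 4]: steps 0 and 1 were overwritten

# Uniform mini-batch drawn without replacement from the stored transitions
batch = random.sample(list(buffer), 2)
```

A larger limit keeps older experience around for longer, at the cost of more memory.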

**TASK: Use the epsilon greedy action selection strategy with *decaying* epsilon**

In [6]:
policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), # inner policy
                              attr='eps', # attribute 
                              value_max=1.0, # max value of the attribute
                              value_min=0.1, # min value of the attribute 
                              value_test=0.05, # small epsilon used during testing --> exploitation
                              nb_steps=50000) 
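
The schedule configured above can be sketched in a few lines of plain Python. This is an illustration of linear annealing under the parameters used in this cell, not a copy of the library code; `annealed_eps` is a hypothetical name chosen for this example.

```python
def annealed_eps(step, value_max=1.0, value_min=0.1, nb_steps=50000):
    """Linearly decay epsilon from value_max to value_min over nb_steps, then hold."""
    slope = -(value_max - value_min) / nb_steps
    return max(value_min, slope * step + value_max)

for step in (0, 25000, 50000, 150000):
    print(step, round(annealed_eps(step), 3))
# 0 1.0
# 25000 0.55
# 50000 0.1
# 150000 0.1
```

With these values, epsilon reaches its floor of 0.1 after 50,000 steps, so any training beyond that point runs with a constant 10% of random actions.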

**TASK: Create the DQNAgent** <br />
Feel free to play with the nb_steps_warmup, target_model_update, batch_size and gamma parameters. <br />
Hint:<br />
You can try *nb_steps_warmup*=1000, *target_model_update*=1000, *batch_size*=32 and *gamma*=0.99 as a first guess

In [7]:
dqn = DQNAgent(model=model, 
               nb_actions=num_actions, 
               memory=memory, 
               nb_steps_warmup=1000,
               target_model_update=1000, 
               policy=policy,
               gamma=0.99,
               batch_size=32)
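
To make the role of `gamma` more concrete, here is a simplified sketch of the standard one-step Q-learning target that a DQN regresses towards. It is not the keras-rl implementation; `td_target` is a hypothetical helper and the Q-values are made-up numbers.

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step target: r + gamma * max Q_target(s', a'), or just r at episode end."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)

# Made-up target-network outputs for the three Acrobot actions
next_q = [-20.0, -18.0, -25.0]

print(round(td_target(-1.0, next_q, done=False), 2))  # -18.82
print(td_target(0.0, next_q, done=True))               # 0.0
```

The `next_q_values` come from the target network. In keras-rl, a `target_model_update` value of 1 or more means the target network is replaced by a copy of the online network every that many steps, while a value below 1 is used as a soft-update rate.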

**TASK: Compile the model** <br />
Feel free to explore the effects of different optimizers and learning rates.
You can try Adam with a learning rate of 1e-3 as a first guess 

In [8]:
dqn.compile(Adam(learning_rate=0.001), metrics=['mae']) 

**TASK: Fit the model** <br />
150,000 steps should be a very good starting point

In [9]:
dqn.fit(env, nb_steps=150000, visualize=False, verbose=0)

<keras.callbacks.History at 0x1652ad499a0>

**TASK: Evaluate the model**

In [11]:
dqn.test(env, nb_episodes=5, visualize=True)
env.close()

Testing for 5 episodes ...
Episode 1: reward: -94.000, steps: 95
Episode 2: reward: -83.000, steps: 84
Episode 3: reward: -95.000, steps: 96
Episode 4: reward: -69.000, steps: 70
Episode 5: reward: -84.000, steps: 85
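
The five test episodes above can be summarised with a little arithmetic. The sketch below copies the rewards and step counts from the output and also checks that they are consistent with Acrobot's reward scheme of -1 per step and 0 on the terminating step.

```python
# Values copied from the test output above
rewards = [-94.0, -83.0, -95.0, -69.0, -84.0]
steps = [95, 84, 96, 70, 85]

mean_reward = sum(rewards) / len(rewards)
mean_steps = sum(steps) / len(steps)
print(mean_reward)  # -85.0
print(mean_steps)   # 86.0

# Each episode's reward equals -(steps - 1): every step costs 1 except the last
consistent = all(r == -(s - 1) for r, s in zip(rewards, steps))
print(consistent)   # True
```

Since an episode is cut off after 500 steps if the arm never reaches the line, an average of 86 steps means the agent solves the task well before the time limit.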
