## Apress - Industrialized Machine Learning Examples

Andreas Francois Vermeulen
2019

### This is an example add-on to a book and needs to be accepted as part of that copyright.

## Chapter-009-015-Q-Learn-01

### Install keras-rl library

In [None]:
#!pip install keras-rl

In [None]:
#!pip install pyglet

### Install h5py

In [None]:
#!pip install h5py

### Install dependencies for CartPole environment

In [None]:
#!pip install gym

## You are ready to perform the Q-Learning

In [None]:
%matplotlib inline
import numpy as np
import gym

from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.optimizers import Adam

from rl.agents.dqn import DQNAgent
from rl.policy import EpsGreedyQPolicy
from rl.memory import SequentialMemory

You need to set the name of the Gym environment to use

In [None]:
ENV_NAME = 'CartPole-v0'

Get the environment and extract the number of actions available in the CartPole problem

In [None]:
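env = gym.make(ENV_NAME)
# Fix the random seeds so that runs are repeatable
np.random.seed(20)
env.seed(20)
# Number of discrete actions the agent can choose from
nb_actions = env.action_space.n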

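Create a neural network model with a single hidden layer. The output layer has one unit per action, so the network returns one Q-value for each action.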

In [None]:
model = Sequential()
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(nb_actions))
model.add(Activation('linear'))

In [None]:
# summary() prints the table itself and returns None, so no print() is needed
model.summary()
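As a cross-check on the summary, you can count the trainable parameters by hand. This is a minimal sketch that assumes the standard CartPole dimensions (4 observation values, 2 actions); the helper `dense_params` is a hypothetical name used only for this illustration.

```python
# Assumed CartPole dimensions: 4 observation values and 2 discrete actions.
obs_size = 4
hidden_units = 16
n_actions = 2

def dense_params(n_inputs, n_units):
    """Count the weights plus one bias per unit of a fully connected layer."""
    return n_inputs * n_units + n_units

# Flatten and Activation layers have no trainable parameters.
hidden_params = dense_params(obs_size, hidden_units)
output_params = dense_params(hidden_units, n_actions)
total_params = hidden_params + output_params
print(hidden_params, output_params, total_params)
```

If the totals in the model summary differ, the environment probably has a different observation or action size than assumed here.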

Next you configure and compile the agent. Use Epsilon Greedy as the policy and Sequential Memory as the memory, because you must store the actions the cart performed and the reward it received for each action.
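The two ideas in this step can be sketched in plain Python. This is an illustration only, not the keras-rl implementation: `eps_greedy_action` and `replay_buffer` are hypothetical names, and the stored transitions are placeholder values.

```python
import random
from collections import deque

def eps_greedy_action(q_values, eps, rng):
    """With probability eps explore at random; otherwise pick the best Q-value."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Bounded replay buffer: once full, appending drops the oldest transition.
replay_buffer = deque(maxlen=3)
for step in range(5):
    # Each entry is (observation, action, reward, done); values are placeholders.
    replay_buffer.append((step, 0, 1.0, False))

rng = random.Random(20)
# With eps=0.0 the policy never explores, so it returns the index of the largest Q-value.
greedy_action = eps_greedy_action([0.2, 0.7], eps=0.0, rng=rng)
# Training draws random minibatches from the stored transitions.
sampled_batch = rng.sample(list(replay_buffer), 2)
print(greedy_action, len(replay_buffer), [t[0] for t in replay_buffer])
```

A small `eps` keeps the agent mostly greedy while still trying other actions now and then, and the bounded buffer keeps only the most recent experience.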

In [None]:
policy = EpsGreedyQPolicy()

memory = SequentialMemory(limit=50000, 
                          window_length=1
                         )

dqn = DQNAgent(model=model, 
               nb_actions=nb_actions, 
               memory=memory, 
               nb_steps_warmup=1000, 
               target_model_update=1e-2, 
               policy=policy,
               enable_dueling_network=False,
               dueling_type='avg'
              )

dqn.compile(Adam(lr=1e-3), 
            metrics=['mae']
           )
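A `target_model_update` value below 1 is commonly read as a soft-update rate: after each training step the target network moves a small fraction of the way toward the online network. The sketch below shows that blend on plain lists of numbers. It assumes this interpretation, and `soft_update` is a hypothetical helper, not a keras-rl function.

```python
def soft_update(target_weights, online_weights, tau):
    """Blend each target weight toward the matching online weight by a factor tau."""
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_weights, online_weights)]

target = [0.0, 1.0]
online = [1.0, 1.0]
# With tau = 1e-2 the target moves 1% of the way toward the online weights.
target = soft_update(target, online, tau=1e-2)
print(target)
```

A small rate keeps the learning targets stable while the online network changes quickly.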

Time to perform the training process.

In [None]:
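try:
    # Render the cart while training; this needs a display
    dqn.fit(env, nb_steps=5000, visualize=True, verbose=2)
except Exception:
    # Fall back to training without rendering, for example on a headless machine
    dqn.fit(env, nb_steps=5000, visualize=False, verbose=2)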

In [None]:
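try:
    # Render the test episodes; this needs a display
    dqn.test(env, nb_episodes=5, visualize=True, verbose=2)
except Exception:
    # Fall back to testing without rendering
    dqn.test(env, nb_episodes=5, visualize=False, verbose=2)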

## Done

In [None]:
import datetime
now = datetime.datetime.now()
print('Done!',str(now))

You can now test the reinforcement learning model