# Pipple Lecture #12 - Reinforcement Learning
By now you have seen quite a bit of material on Reinforcement Learning. In this notebook you get the chance to program your own Deep Reinforcement Learning (DRL) model, or at least to tune its parameters. The game environment, state transitions, reward calculations and model training have already been prepared for you. Your job is to focus on one task and one task only: keep your pole straight up!

During the lecture, we have not been able to discuss all elements of a DRL-model, as there are many aspects which can be tuned to perfection (or far from it). Some additional explanation will be given in the notebook where deemed necessary, but don't be shy to ask more!

## 1. Importing relevant modules
Let's get started. First, import the necessary modules (and suppress some unwanted warnings). The 'gym' package is imported to create a Cart Pole environment for you to play with. In addition, 'keras' provides the neural network, while 'keras-rl' (imported as `rl`) contains a whole bunch of interesting Reinforcement Learning functions.

In [1]:
import numpy as np
import gym
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.simplefilter(action='ignore', category=UserWarning)

from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.optimizers import Adam

from rl.agents.dqn import DQNAgent
from rl.policy import EpsGreedyQPolicy
from rl.memory import SequentialMemory

Using TensorFlow backend.


## 2. Setting variables
Then, set the relevant variables. Get the environment and extract the number of actions available in the Cart Pole problem. The seed settings can be useful to compare your results over different runs. However, both the neural network and the RL framework itself still contain a high level of randomization, which may make it difficult to compare distinct runs. Keep this in mind when trying different parameter settings.

In [2]:
env = gym.make('CartPole-v0')
np.random.seed(123)
env.seed(123)
nb_actions = env.action_space.n

Next, build a neural network model. Initially, it is set to a simple feed-forward neural net with a single hidden layer of 4 nodes. Try different settings yourself to find your optimal set-up! Unfortunately, to this day there are no clear rules for choosing how many layers or nodes to use. Google may give you some ideas, but most decisions still follow the famous method of trial and error.

Try tuning the number of hidden layers, the number of nodes per hidden layer, and the type of activation functions in the hidden and output layers. Use `print(model.summary())` to get an overview of the complexity of your model.

In [3]:
model = Sequential()
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(4))
model.add(Activation('relu'))
#model.add(Dense(4))
#model.add(Activation('relu'))
model.add(Dense(nb_actions))
model.add(Activation('linear'))
print(model.summary())

Instructions for updating:
Colocations handled automatically by placer.
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten_1 (Flatten)          (None, 4)                 0         
_________________________________________________________________
dense_1 (Dense)              (None, 4)                 20        
_________________________________________________________________
activation_1 (Activation)    (None, 4)                 0         
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 10        
_________________________________________________________________
activation_2 (Activation)    (None, 2)                 0         
Total params: 30
Trainable params: 30
Non-trainable params: 0
_________________________________________________________________
None
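
The parameter counts in the summary above can be checked by hand: a fully connected layer has one weight per input-output pair plus one bias per node. The sketch below reproduces the numbers for the default network; the helper name `dense_params` is made up for this illustration and is not part of keras.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

# Layer sizes of the network above: 4 observations -> 4 hidden nodes -> 2 actions.
layer_sizes = [4, 4, 2]

per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer)     # [20, 10]
print(total_params)  # 30
```

Changing `layer_sizes` (for example to `[4, 16, 16, 2]`) gives a quick feel for how fast the model grows when you add layers or nodes.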


Now, configure and compile your agent. The memory is set to Sequential Memory, which stores the results of performed actions and the rewards obtained. Try using different types of action-selection policies, memory sizes, learning rates, training steps, or whatever else you can think of. Settings you can tune:

* **policy**: the way in which actions are selected over time, following some balancing method. This RL-concept is very important, incorporating a trade-off between exploring unknown parts of the environment, and exploiting known information. (possible policies: EpsGreedyQPolicy, LinearAnnealedPolicy, SoftmaxPolicy, GreedyQPolicy, BoltzmannQPolicy, MaxBoltzmannQPolicy, BoltzmannGumbelQPolicy)
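* **memory limit**: the maximum number of past steps (observation, action, reward) that are kept in memory. The training batches are sampled from these stored steps, so older experience is forgotten once the limit is reached.
* **window_length**: the number of consecutive observations that are stacked to form one state. It has to match the first dimension of the model's `input_shape` (here `(1,) + env.observation_space.shape`), so keep it at 1 unless you change the model as well.
* **target_model_update**: controls how the target network follows the network that is being trained. A value of 1 or larger means a hard update: the weights are copied once every that many steps. A value below 1 means a soft update: at every step, the target weights move by that fraction towards the trained weights. Note that this is not the network's learning rate $\alpha$; the learning rate is set in the optimizer, here `Adam(lr=1e-3)`.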
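
To make the exploration/exploitation trade-off of the **policy** setting concrete, here is a minimal sketch of epsilon-greedy action selection in plain Python. It is not the keras-rl implementation; the function name `eps_greedy_action`, the Q-values and the `eps` values are assumptions chosen for illustration.

```python
import random

def eps_greedy_action(q_values, eps, rng=random):
    """Pick a random action with probability eps, else the highest-valued one."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

rng = random.Random(123)
q_values = [0.2, 1.5]  # hypothetical Q-value estimates for the two actions

# With eps=0 the policy is purely greedy and always picks action 1.
greedy = eps_greedy_action(q_values, eps=0.0, rng=rng)

# With eps=0.1 the best action is still chosen most of the time.
picks = [eps_greedy_action(q_values, eps=0.1, rng=rng) for _ in range(1000)]
share_best = picks.count(1) / len(picks)
print(greedy, share_best)
```

A larger `eps` means more exploration early on, at the cost of more random (and often worse) moves once the model has learned something useful.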

In [4]:
policy = EpsGreedyQPolicy()
memory = SequentialMemory(limit=10000, window_length=1)
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, target_model_update=1e-2, policy=policy)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])
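
The effect of `target_model_update` can be sketched without any neural network. To my understanding, keras-rl treats values below 1 as a soft update rate and values of 1 or larger as the number of steps between hard copies. The code below is a simplified illustration on plain lists of numbers; `update_target` is a hypothetical helper, not a keras-rl function.

```python
def update_target(target_weights, online_weights, target_model_update, step):
    """Return new target weights under a soft (<1) or hard (>=1) update rule."""
    if target_model_update >= 1:
        # Hard update: copy the online weights every `target_model_update` steps.
        if step % int(target_model_update) == 0:
            return list(online_weights)
        return list(target_weights)
    # Soft update: move a small fraction towards the online weights each step.
    tau = target_model_update
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_weights, online_weights)]

target = [0.0, 0.0]
online = [1.0, -2.0]

soft = update_target(target, online, target_model_update=1e-2, step=1)
hard_skip = update_target(target, online, target_model_update=100, step=50)
hard_copy = update_target(target, online, target_model_update=100, step=100)
print(soft, hard_skip, hard_copy)
```

With the setting `1e-2` used above, the target moves only 1% of the way towards the trained weights per step, which keeps the learning targets stable.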

Now it's time to learn something! There are four settings you can consider changing, but only one of them has an effect on your training performance:

* **nb_steps**: the larger, the more time your bot gets for trying to find a good strategy, but the longer you'll have to wait.
* **verbose**: printing running status. 0 for no logging, 1 for interval logging, 2 for episode logging
* **visualize**: you can visualize the training for show, but this mostly slows down training
* **log_interval**: if verbose=1, the number of steps that are considered to be an interval
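
As a rough guide to how much output to expect with `verbose=1`: one "Interval" header appears every `log_interval` steps, starting at step 0. The small calculation below assumes exactly that behaviour; `interval_count` is a hypothetical helper for illustration.

```python
def interval_count(nb_steps, log_interval):
    """Number of 'Interval N' headers printed, rounded up."""
    return -(-nb_steps // log_interval)

print(interval_count(50000, 10))    # 5000
print(interval_count(50000, 5000))  # 10
```

A small `log_interval` therefore produces a very long log; a larger value gives a more compact overview.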

In [None]:
dqn.fit(env, nb_steps=50000, verbose=1, visualize=False, log_interval=10)

Training for 50000 steps ...
Interval 1 (0 steps performed)
Interval 2 (10 steps performed)
Interval 3 (20 steps performed)
Interval 4 (30 steps performed)
Interval 5 (40 steps performed)
Interval 6 (50 steps performed)
Interval 7 (60 steps performed)
Interval 8 (70 steps performed)
Interval 9 (80 steps performed)
Interval 10 (90 steps performed)
Interval 11 (100 steps performed)
Interval 12 (110 steps performed)
Interval 13 (120 steps performed)
Interval 14 (130 steps performed)
Interval 15 (140 steps performed)
Interval 16 (150 steps performed)
Interval 17 (160 steps performed)
Interval 18 (170 steps performed)
Interval 19 (180 steps performed)
1 episodes - episode_reward: 184.000 [184.000, 184.000]

Interval 20 (190 steps performed)
Interval 21 (200 steps performed)
Interval 22 (210 steps performed)
Interval 23 (220 steps performed)
Interval 24 (230 steps performed)
Interval 25 (240 steps performed)
Interval 26 (250 steps performed)
Interval 27 (260 steps performed)
...

Interval 78 (770 steps performed)
1 episodes - episode_reward: 200.000 [200.000, 200.000]

Interval 79 (780 steps performed)
Interval 80 (790 steps performed)
Interval 81 (800 steps performed)
Interval 82 (810 steps performed)
Interval 83 (820 steps performed)
Interval 84 (830 steps performed)
Interval 85 (840 steps performed)
Interval 86 (850 steps performed)
Interval 87 (860 steps performed)
Interval 88 (870 steps performed)
Interval 89 (880 steps performed)
Interval 90 (890 steps performed)
Interval 91 (900 steps performed)
Interval 92 (910 steps performed)
Interval 93 (920 steps performed)
Interval 94 (930 steps performed)
Interval 95 (940 steps performed)
Interval 96 (950 steps performed)
Interval 97 (960 steps performed)
1 episodes - episode_reward: 185.000 [185.000, 185.000]

Interval 98 (970 steps performed)
Interval 99 (980 steps performed)
Interval 100 (990 steps performed)
Interval 101 (1000 steps performed)
Interval 102 (1010 steps performed)
...

Interval 147 (1460 steps performed)
Interval 148 (1470 steps performed)
Interval 149 (1480 steps performed)
Interval 150 (1490 steps performed)
1 episodes - episode_reward: 48.000 [48.000, 48.000] - loss: 0.483 - mean_absolute_error: 11.404 - mean_q: 22.781

Interval 151 (1500 steps performed)
Interval 152 (1510 steps performed)
Interval 153 (1520 steps performed)
Interval 154 (1530 steps performed)
Interval 155 (1540 steps performed)
Interval 156 (1550 steps performed)
Interval 157 (1560 steps performed)
1 episodes - episode_reward: 69.000 [69.000, 69.000] - loss: 1.960 - mean_absolute_error: 11.267 - mean_q: 22.466

Interval 158 (1570 steps performed)
Interval 159 (1580 steps performed)
Interval 160 (1590 steps performed)
Interval 161 (1600 steps performed)
Interval 162 (1610 steps performed)
Interval 163 (1620 steps performed)
Interval 164 (1630 steps performed)
Interval 165 (1640 steps performed)
Interval 166 (1650 steps performed)
Interval 167 (1660 steps performed)
...

Interval 215 (2140 steps performed)
Interval 216 (2150 steps performed)
1 episodes - episode_reward: 63.000 [63.000, 63.000] - loss: 0.352 - mean_absolute_error: 12.123 - mean_q: 24.339

Interval 217 (2160 steps performed)
Interval 218 (2170 steps performed)
Interval 219 (2180 steps performed)
Interval 220 (2190 steps performed)
Interval 221 (2200 steps performed)
Interval 222 (2210 steps performed)
Interval 223 (2220 steps performed)
Interval 224 (2230 steps performed)
Interval 225 (2240 steps performed)
Interval 226 (2250 steps performed)
1 episodes - episode_reward: 100.000 [100.000, 100.000] - loss: 1.553 - mean_absolute_error: 12.138 - mean_q: 24.387

Interval 227 (2260 steps performed)
Interval 228 (2270 steps performed)
Interval 229 (2280 steps performed)
Interval 230 (2290 steps performed)
Interval 231 (2300 steps performed)
Interval 232 (2310 steps performed)
Interval 233 (2320 steps performed)
Interval 234 (2330 steps performed)
Interval 235 (2340 steps performed)
...

Interval 282 (2810 steps performed)
Interval 283 (2820 steps performed)
Interval 284 (2830 steps performed)
Interval 285 (2840 steps performed)
Interval 286 (2850 steps performed)
Interval 287 (2860 steps performed)
1 episodes - episode_reward: 75.000 [75.000, 75.000] - loss: 0.395 - mean_absolute_error: 13.288 - mean_q: 26.713

Interval 288 (2870 steps performed)
Interval 289 (2880 steps performed)
Interval 290 (2890 steps performed)
Interval 291 (2900 steps performed)
Interval 292 (2910 steps performed)
Interval 293 (2920 steps performed)
Interval 294 (2930 steps performed)
Interval 295 (2940 steps performed)
Interval 296 (2950 steps performed)
Interval 297 (2960 steps performed)
Interval 298 (2970 steps performed)
Interval 299 (2980 steps performed)
1 episodes - episode_reward: 116.000 [116.000, 116.000] - loss: 1.671 - mean_absolute_error: 13.548 - mean_q: 27.159

Interval 300 (2990 steps performed)
Interval 301 (3000 steps performed)
Interval 302 (3010 steps performed)
...

Interval 351 (3500 steps performed)
Interval 352 (3510 steps performed)
Interval 353 (3520 steps performed)
Interval 354 (3530 steps performed)
1 episodes - episode_reward: 87.000 [87.000, 87.000] - loss: 1.720 - mean_absolute_error: 14.417 - mean_q: 28.944

Interval 355 (3540 steps performed)
Interval 356 (3550 steps performed)
Interval 357 (3560 steps performed)
Interval 358 (3570 steps performed)
Interval 359 (3580 steps performed)
Interval 360 (3590 steps performed)
Interval 361 (3600 steps performed)
Interval 362 (3610 steps performed)
Interval 363 (3620 steps performed)
Interval 364 (3630 steps performed)
1 episodes - episode_reward: 99.000 [99.000, 99.000] - loss: 4.915 - mean_absolute_error: 15.120 - mean_q: 30.194

Interval 365 (3640 steps performed)
Interval 366 (3650 steps performed)
Interval 367 (3660 steps performed)
Interval 368 (3670 steps performed)
Interval 369 (3680 steps performed)
Interval 370 (3690 steps performed)
Interval 371 (3700 steps performed)
...

Interval 422 (4210 steps performed)
Interval 423 (4220 steps performed)
Interval 424 (4230 steps performed)
Interval 425 (4240 steps performed)
Interval 426 (4250 steps performed)
Interval 427 (4260 steps performed)
Interval 428 (4270 steps performed)
Interval 429 (4280 steps performed)
Interval 430 (4290 steps performed)
Interval 431 (4300 steps performed)
Interval 432 (4310 steps performed)
1 episodes - episode_reward: 124.000 [124.000, 124.000] - loss: 1.779 - mean_absolute_error: 15.738 - mean_q: 31.706

Interval 433 (4320 steps performed)
Interval 434 (4330 steps performed)
Interval 435 (4340 steps performed)
Interval 436 (4350 steps performed)
Interval 437 (4360 steps performed)
Interval 438 (4370 steps performed)
Interval 439 (4380 steps performed)
Interval 440 (4390 steps performed)
Interval 441 (4400 steps performed)
Interval 442 (4410 steps performed)
Interval 443 (4420 steps performed)
Interval 444 (4430 steps performed)
1 episodes - episode_reward: 116.000 [116.000, 116.000 ...

Interval 492 (4910 steps performed)
Interval 493 (4920 steps performed)
Interval 494 (4930 steps performed)
Interval 495 (4940 steps performed)
Interval 496 (4950 steps performed)
Interval 497 (4960 steps performed)
Interval 498 (4970 steps performed)
Interval 499 (4980 steps performed)
Interval 500 (4990 steps performed)
Interval 501 (5000 steps performed)
Interval 502 (5010 steps performed)
1 episodes - episode_reward: 122.000 [122.000, 122.000] - loss: 0.681 - mean_absolute_error: 17.163 - mean_q: 34.735

Interval 503 (5020 steps performed)
Interval 504 (5030 steps performed)
Interval 505 (5040 steps performed)
Interval 506 (5050 steps performed)
Interval 507 (5060 steps performed)
Interval 508 (5070 steps performed)
Interval 509 (5080 steps performed)
Interval 510 (5090 steps performed)
Interval 511 (5100 steps performed)
Interval 512 (5110 steps performed)
Interval 513 (5120 steps performed)
1 episodes - episode_reward: 107.000 [107.000, 107.000] - loss: 3.743 - ...

Interval 561 (5600 steps performed)
1 episodes - episode_reward: 111.000 [111.000, 111.000] - loss: 1.991 - mean_absolute_error: 18.202 - mean_q: 36.534

Interval 562 (5610 steps performed)
Interval 563 (5620 steps performed)
Interval 564 (5630 steps performed)
Interval 565 (5640 steps performed)
Interval 566 (5650 steps performed)
Interval 567 (5660 steps performed)
Interval 568 (5670 steps performed)
Interval 569 (5680 steps performed)
Interval 570 (5690 steps performed)
Interval 571 (5700 steps performed)
1 episodes - episode_reward: 104.000 [104.000, 104.000] - loss: 0.640 - mean_absolute_error: 19.070 - mean_q: 38.258

Interval 572 (5710 steps performed)
Interval 573 (5720 steps performed)
Interval 574 (5730 steps performed)
1 episodes - episode_reward: 25.000 [25.000, 25.000] - loss: 1.519 - mean_absolute_error: 18.050 - mean_q: 36.309

Interval 575 (5740 steps performed)
Interval 576 (5750 steps performed)
Interval 577 (5760 steps performed)
Interval 578 (5770 steps performed)
...

Interval 629 (6280 steps performed)
1 episodes - episode_reward: 104.000 [104.000, 104.000] - loss: 1.048 - mean_absolute_error: 19.215 - mean_q: 38.762

Interval 630 (6290 steps performed)
Interval 631 (6300 steps performed)
1 episodes - episode_reward: 18.000 [18.000, 18.000] - loss: 2.036 - mean_absolute_error: 19.517 - mean_q: 39.140

Interval 632 (6310 steps performed)
Interval 633 (6320 steps performed)
Interval 634 (6330 steps performed)
Interval 635 (6340 steps performed)
Interval 636 (6350 steps performed)
Interval 637 (6360 steps performed)
Interval 638 (6370 steps performed)
Interval 639 (6380 steps performed)
Interval 640 (6390 steps performed)
Interval 641 (6400 steps performed)
1 episodes - episode_reward: 97.000 [97.000, 97.000] - loss: 7.107 - mean_absolute_error: 20.269 - mean_q: 40.477

Interval 642 (6410 steps performed)
1 episodes - episode_reward: 15.000 [15.000, 15.000] - loss: 0.984 - mean_absolute_error: 20.053 - mean_q: 40.333

...

Interval 688 (6870 steps performed)
Interval 689 (6880 steps performed)
Interval 690 (6890 steps performed)
Interval 691 (6900 steps performed)
Interval 692 (6910 steps performed)
1 episodes - episode_reward: 102.000 [102.000, 102.000] - loss: 7.623 - mean_absolute_error: 20.294 - mean_q: 40.424

Interval 693 (6920 steps performed)
Interval 694 (6930 steps performed)
1 episodes - episode_reward: 20.000 [20.000, 20.000] - loss: 26.839 - mean_absolute_error: 21.008 - mean_q: 41.010

Interval 695 (6940 steps performed)
Interval 696 (6950 steps performed)
Interval 697 (6960 steps performed)
1 episodes - episode_reward: 24.000 [24.000, 24.000] - loss: 3.063 - mean_absolute_error: 21.386 - mean_q: 42.948

Interval 698 (6970 steps performed)
1 episodes - episode_reward: 16.000 [16.000, 16.000] - loss: 2.824 - mean_absolute_error: 20.188 - mean_q: 40.406

Interval 699 (6980 steps performed)
Interval 700 (6990 steps performed)
1 episodes - episode_reward: 16.000 [16.000, 16.000] - loss: 4.815 - ...

Interval 751 (7500 steps performed)
1 episodes - episode_reward: 108.000 [108.000, 108.000] - loss: 10.385 - mean_absolute_error: 21.041 - mean_q: 41.793

Interval 752 (7510 steps performed)
Interval 753 (7520 steps performed)
Interval 754 (7530 steps performed)
Interval 755 (7540 steps performed)
Interval 756 (7550 steps performed)
Interval 757 (7560 steps performed)
Interval 758 (7570 steps performed)
Interval 759 (7580 steps performed)
Interval 760 (7590 steps performed)
Interval 761 (7600 steps performed)
Interval 762 (7610 steps performed)
1 episodes - episode_reward: 111.000 [111.000, 111.000] - loss: 7.263 - mean_absolute_error: 21.537 - mean_q: 42.824

Interval 763 (7620 steps performed)
Interval 764 (7630 steps performed)
1 episodes - episode_reward: 23.000 [23.000, 23.000] - loss: 10.456 - mean_absolute_error: 21.426 - mean_q: 42.480

Interval 765 (7640 steps performed)
Interval 766 (7650 steps performed)
Interval 767 (7660 steps performed)
Interval 768 (7670 steps performed)

Interval 818 (8170 steps performed)
Interval 819 (8180 steps performed)
Interval 820 (8190 steps performed)
Interval 821 (8200 steps performed)
Interval 822 (8210 steps performed)
1 episodes - episode_reward: 114.000 [114.000, 114.000] - loss: 9.700 - mean_absolute_error: 22.085 - mean_q: 43.736

Interval 823 (8220 steps performed)
Interval 824 (8230 steps performed)
Interval 825 (8240 steps performed)
Interval 826 (8250 steps performed)
Interval 827 (8260 steps performed)
Interval 828 (8270 steps performed)
Interval 829 (8280 steps performed)
Interval 830 (8290 steps performed)
Interval 831 (8300 steps performed)
Interval 832 (8310 steps performed)
Interval 833 (8320 steps performed)
Interval 834 (8330 steps performed)
1 episodes - episode_reward: 114.000 [114.000, 114.000] - loss: 0.524 - mean_absolute_error: 21.940 - mean_q: 44.005

Interval 835 (8340 steps performed)
Interval 836 (8350 steps performed)
Interval 837 (8360 steps performed)
Interval 838 (8370 steps performed)
...

Interval 888 (8870 steps performed)
Interval 889 (8880 steps performed)
Interval 890 (8890 steps performed)
Interval 891 (8900 steps performed)
1 episodes - episode_reward: 122.000 [122.000, 122.000] - loss: 6.019 - mean_absolute_error: 21.665 - mean_q: 43.219

Interval 892 (8910 steps performed)
Interval 893 (8920 steps performed)
Interval 894 (8930 steps performed)
Interval 895 (8940 steps performed)
Interval 896 (8950 steps performed)
Interval 897 (8960 steps performed)
Interval 898 (8970 steps performed)
Interval 899 (8980 steps performed)
Interval 900 (8990 steps performed)
Interval 901 (9000 steps performed)
Interval 902 (9010 steps performed)
Interval 903 (9020 steps performed)
1 episodes - episode_reward: 122.000 [122.000, 122.000] - loss: 10.963 - mean_absolute_error: 22.241 - mean_q: 44.030

Interval 904 (9030 steps performed)
Interval 905 (9040 steps performed)
Interval 906 (9050 steps performed)
Interval 907 (9060 steps performed)
Interval 908 (9070 steps performed)
...

Interval 958 (9570 steps performed)
Interval 959 (9580 steps performed)
Interval 960 (9590 steps performed)
Interval 961 (9600 steps performed)
Interval 962 (9610 steps performed)
Interval 963 (9620 steps performed)
Interval 964 (9630 steps performed)
Interval 965 (9640 steps performed)
Interval 966 (9650 steps performed)
Interval 967 (9660 steps performed)
1 episodes - episode_reward: 133.000 [133.000, 133.000] - loss: 19.191 - mean_absolute_error: 21.940 - mean_q: 43.130

Interval 968 (9670 steps performed)
Interval 969 (9680 steps performed)
Interval 970 (9690 steps performed)
Interval 971 (9700 steps performed)
Interval 972 (9710 steps performed)
Interval 973 (9720 steps performed)
Interval 974 (9730 steps performed)
Interval 975 (9740 steps performed)
Interval 976 (9750 steps performed)
Interval 977 (9760 steps performed)
Interval 978 (9770 steps performed)
Interval 979 (9780 steps performed)
Interval 980 (9790 steps performed)
1 episodes - episode_reward: 129.000 [129.000, ...

Interval 1030 (10290 steps performed)
1 episodes - episode_reward: 200.000 [200.000, 200.000] - loss: 1.205 - mean_absolute_error: 20.492 - mean_q: 41.006

Interval 1031 (10300 steps performed)
Interval 1032 (10310 steps performed)
Interval 1033 (10320 steps performed)
Interval 1034 (10330 steps performed)
Interval 1035 (10340 steps performed)
Interval 1036 (10350 steps performed)
Interval 1037 (10360 steps performed)
Interval 1038 (10370 steps performed)
Interval 1039 (10380 steps performed)
Interval 1040 (10390 steps performed)
Interval 1041 (10400 steps performed)
Interval 1042 (10410 steps performed)
Interval 1043 (10420 steps performed)
Interval 1044 (10430 steps performed)
Interval 1045 (10440 steps performed)
Interval 1046 (10450 steps performed)
Interval 1047 (10460 steps performed)
Interval 1048 (10470 steps performed)
Interval 1049 (10480 steps performed)
Interval 1050 (10490 steps performed)
1 episodes - episode_reward: 200.000 [200.000, 200.000] - loss: 1.722 - ...

Interval 1101 (11000 steps performed)
Interval 1102 (11010 steps performed)
Interval 1103 (11020 steps performed)
Interval 1104 (11030 steps performed)
Interval 1105 (11040 steps performed)
Interval 1106 (11050 steps performed)
Interval 1107 (11060 steps performed)
Interval 1108 (11070 steps performed)
Interval 1109 (11080 steps performed)
Interval 1110 (11090 steps performed)
1 episodes - episode_reward: 195.000 [195.000, 195.000] - loss: 1.180 - mean_absolute_error: 22.900 - mean_q: 46.056

Interval 1111 (11100 steps performed)
Interval 1112 (11110 steps performed)
Interval 1113 (11120 steps performed)
Interval 1114 (11130 steps performed)
Interval 1115 (11140 steps performed)
Interval 1116 (11150 steps performed)
Interval 1117 (11160 steps performed)
Interval 1118 (11170 steps performed)
Interval 1119 (11180 steps performed)
Interval 1120 (11190 steps performed)
Interval 1121 (11200 steps performed)
Interval 1122 (11210 steps performed)
Interval 1123 (11220 steps performed)
...

Interval 1172 (11710 steps performed)
Interval 1173 (11720 steps performed)
Interval 1174 (11730 steps performed)
Interval 1175 (11740 steps performed)
Interval 1176 (11750 steps performed)
Interval 1177 (11760 steps performed)
Interval 1178 (11770 steps performed)
Interval 1179 (11780 steps performed)
Interval 1180 (11790 steps performed)
Interval 1181 (11800 steps performed)
1 episodes - episode_reward: 163.000 [163.000, 163.000] - loss: 0.232 - mean_absolute_error: 25.176 - mean_q: 50.711

Interval 1182 (11810 steps performed)
Interval 1183 (11820 steps performed)
Interval 1184 (11830 steps performed)
Interval 1185 (11840 steps performed)
Interval 1186 (11850 steps performed)
Interval 1187 (11860 steps performed)
Interval 1188 (11870 steps performed)
Interval 1189 (11880 steps performed)
Interval 1190 (11890 steps performed)
Interval 1191 (11900 steps performed)
Interval 1192 (11910 steps performed)
Interval 1193 (11920 steps performed)
Interval 1194 (11930 steps performed)
...

Interval 1243 (12420 steps performed)
Interval 1244 (12430 steps performed)
Interval 1245 (12440 steps performed)
Interval 1246 (12450 steps performed)
Interval 1247 (12460 steps performed)
Interval 1248 (12470 steps performed)
Interval 1249 (12480 steps performed)
Interval 1250 (12490 steps performed)
Interval 1251 (12500 steps performed)
Interval 1252 (12510 steps performed)
Interval 1253 (12520 steps performed)
Interval 1254 (12530 steps performed)
Interval 1255 (12540 steps performed)
Interval 1256 (12550 steps performed)
Interval 1257 (12560 steps performed)
Interval 1258 (12570 steps performed)
Interval 1259 (12580 steps performed)
1 episodes - episode_reward: 183.000 [183.000, 183.000] - loss: 1.161 - mean_absolute_error: 25.504 - mean_q: 51.176

Interval 1260 (12590 steps performed)
Interval 1261 (12600 steps performed)
Interval 1262 (12610 steps performed)
Interval 1263 (12620 steps performed)
Interval 1264 (12630 steps performed)
Interval 1265 (12640 steps performed)
...

Interval 1315 (13140 steps performed)
Interval 1316 (13150 steps performed)
Interval 1317 (13160 steps performed)
Interval 1318 (13170 steps performed)
Interval 1319 (13180 steps performed)
1 episodes - episode_reward: 200.000 [200.000, 200.000] - loss: 12.193 - mean_absolute_error: 26.612 - mean_q: 52.825

Interval 1320 (13190 steps performed)
Interval 1321 (13200 steps performed)
Interval 1322 (13210 steps performed)
Interval 1323 (13220 steps performed)
Interval 1324 (13230 steps performed)
Interval 1325 (13240 steps performed)
Interval 1326 (13250 steps performed)
Interval 1327 (13260 steps performed)
Interval 1328 (13270 steps performed)
Interval 1329 (13280 steps performed)
Interval 1330 (13290 steps performed)
Interval 1331 (13300 steps performed)
Interval 1332 (13310 steps performed)
Interval 1333 (13320 steps performed)
Interval 1334 (13330 steps performed)
Interval 1335 (13340 steps performed)
Interval 1336 (13350 steps performed)
Interval 1337 (13360 steps performed)
...

Interval 1386 (13850 steps performed)
Interval 1387 (13860 steps performed)
Interval 1388 (13870 steps performed)
Interval 1389 (13880 steps performed)
Interval 1390 (13890 steps performed)
Interval 1391 (13900 steps performed)
Interval 1392 (13910 steps performed)
Interval 1393 (13920 steps performed)
Interval 1394 (13930 steps performed)
Interval 1395 (13940 steps performed)
Interval 1396 (13950 steps performed)
Interval 1397 (13960 steps performed)
Interval 1398 (13970 steps performed)
Interval 1399 (13980 steps performed)
1 episodes - episode_reward: 200.000 [200.000, 200.000] - loss: 6.144 - mean_absolute_error: 28.245 - mean_q: 56.082

Interval 1400 (13990 steps performed)
Interval 1401 (14000 steps performed)
Interval 1402 (14010 steps performed)
Interval 1403 (14020 steps performed)
Interval 1404 (14030 steps performed)
Interval 1405 (14040 steps performed)
Interval 1406 (14050 steps performed)
Interval 1407 (14060 steps performed)
Interval 1408 (14070 steps performed)
...

Interval 1457 (14560 steps performed)
Interval 1458 (14570 steps performed)
Interval 1459 (14580 steps performed)
Interval 1460 (14590 steps performed)
Interval 1461 (14600 steps performed)
Interval 1462 (14610 steps performed)
Interval 1463 (14620 steps performed)
Interval 1464 (14630 steps performed)
Interval 1465 (14640 steps performed)
Interval 1466 (14650 steps performed)
Interval 1467 (14660 steps performed)
Interval 1468 (14670 steps performed)
1 episodes - episode_reward: 128.000 [128.000, 128.000] - loss: 16.616 - mean_absolute_error: 29.620 - mean_q: 58.517

Interval 1469 (14680 steps performed)
Interval 1470 (14690 steps performed)
Interval 1471 (14700 steps performed)
Interval 1472 (14710 steps performed)
Interval 1473 (14720 steps performed)
Interval 1474 (14730 steps performed)
Interval 1475 (14740 steps performed)
Interval 1476 (14750 steps performed)
Interval 1477 (14760 steps performed)
Interval 1478 (14770 steps performed)
Interval 1479 (14780 steps performed)
...

1 episodes - episode_reward: 157.000 [157.000, 157.000] - loss: 14.857 - mean_absolute_error: 29.412 - mean_q: 58.041

Interval 1528 (15270 steps performed)
Interval 1529 (15280 steps performed)
Interval 1530 (15290 steps performed)
Interval 1531 (15300 steps performed)
Interval 1532 (15310 steps performed)
Interval 1533 (15320 steps performed)
Interval 1534 (15330 steps performed)
Interval 1535 (15340 steps performed)
Interval 1536 (15350 steps performed)
Interval 1537 (15360 steps performed)
Interval 1538 (15370 steps performed)
Interval 1539 (15380 steps performed)
1 episodes - episode_reward: 123.000 [123.000, 123.000] - loss: 11.315 - mean_absolute_error: 29.289 - mean_q: 57.825

Interval 1540 (15390 steps performed)
Interval 1541 (15400 steps performed)
Interval 1542 (15410 steps performed)
Interval 1543 (15420 steps performed)
Interval 1544 (15430 steps performed)
Interval 1545 (15440 steps performed)
Interval 1546 (15450 steps performed)
Interval 1547 (15460 steps performed)
...

Interval 1598 (15970 steps performed)
Interval 1599 (15980 steps performed)
1 episodes - episode_reward: 156.000 [156.000, 156.000] - loss: 1.163 - mean_absolute_error: 29.163 - mean_q: 57.798

Interval 1600 (15990 steps performed)
Interval 1601 (16000 steps performed)
Interval 1602 (16010 steps performed)
Interval 1603 (16020 steps performed)
Interval 1604 (16030 steps performed)
Interval 1605 (16040 steps performed)
Interval 1606 (16050 steps performed)
Interval 1607 (16060 steps performed)
Interval 1608 (16070 steps performed)
Interval 1609 (16080 steps performed)
Interval 1610 (16090 steps performed)
Interval 1611 (16100 steps performed)
Interval 1612 (16110 steps performed)
Interval 1613 (16120 steps performed)
Interval 1614 (16130 steps performed)
Interval 1615 (16140 steps performed)
Interval 1616 (16150 steps performed)
1 episodes - episode_reward: 170.000 [170.000, 170.000] - loss: 9.818 - mean_absolute_error: 29.913 - mean_q: 59.022

Interval 1617 (16160 steps performed)
...

Run the code below to test your DRL model. The larger the reward and the number of steps per episode, the better your model performs. Running about 10 episodes will give you a good overall picture.

**NOTE:** Don't close the graph window during or after a run. Doing so resets the kernel, which means you have to re-run everything. You can simply re-run the code below instead, each time.

In [8]:
dqn.test(env, nb_episodes=10, visualize=True)

Testing for 10 episodes ...
Episode 1: reward: 115.000, steps: 115
Episode 2: reward: 72.000, steps: 72
Episode 3: reward: 200.000, steps: 200
Episode 4: reward: 200.000, steps: 200
Episode 5: reward: 73.000, steps: 73
Episode 6: reward: 200.000, steps: 200
Episode 7: reward: 108.000, steps: 108
Episode 8: reward: 200.000, steps: 200
Episode 9: reward: 94.000, steps: 94
Episode 10: reward: 200.000, steps: 200
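
To compare different parameter settings, it helps to reduce a test run to a few numbers. The sketch below does this for the ten episode rewards printed above, which are copied in by hand. The variable names are illustrative, and the maximum of 200 reflects the step limit per episode in CartPole-v0.

```python
from statistics import mean

# Episode rewards copied from the test output above.
rewards = [115, 72, 200, 200, 73, 200, 108, 200, 94, 200]
max_reward = 200  # CartPole-v0 stops an episode after 200 steps

mean_reward = mean(rewards)
worst_reward = min(rewards)
share_max = sum(r == max_reward for r in rewards) / len(rewards)

print(mean_reward)   # 146.2
print(worst_reward)  # 72
print(share_max)     # 0.5
```

Because of the randomness in training, compare such summaries over several runs before concluding that one setting is better than another.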


<keras.callbacks.History at 0x25e330286d8>