# Tabular Q-learning 

This tutorial show how to solve the smart Taxi problem in [OPENAI Gym](https://gym.openai.com/) using tabular Q-learning algorithm.

The smart Taxi job is to pick up the passenger at one location and drop them off in another. Here are a few things that we'd love our Smartcab to take care of:

- Drop off the passenger to the right location.
- Save passenger's time by taking minimum time possible to drop off
- Take care of passenger's safety and traffic rules

![image](./resources/taxiV3.png)

In [1]:
#!pip install cmake 'gym[atari]' scipy

from collections import defaultdict
import pickle
import random
import click
import gym

In [2]:
def select_optimal_action(q_table, state, action_space):
    max_q_value_action = None
    max_q_value = 0

    if q_table[state]:
        for action, action_q_value in q_table[state].items():
            if action_q_value >= max_q_value:
                max_q_value = action_q_value
                max_q_value_action = action

    return max_q_value_action if max_q_value_action else action_space.sample()

## Train

In [6]:
# Create the game environment

env = gym.make("Taxi-v3")
env.render()

+---------+
|[35mR[0m: | : :[34;1mG[0m|
| : | : : |
| :[43m [0m: : : |
| | : | : |
|Y| : |B: |
+---------+



In [3]:
# The hyperparameters
alpha = 0.1
gamma = 0.6
epsilon = 0.1

NUM_EPISODES = 100000

def update(q_table, env, state):
    if random.uniform(0, 1) < epsilon:
        action = env.action_space.sample()
    else:
        action = select_optimal_action(q_table, state, env.action_space)

    next_state, reward, _, _ = env.step(action)
    old_q_value = q_table[state][action]

    # Check if next_state has q values already
    if not q_table[next_state]:
        q_table[next_state] = {action: 0 for action in range(env.action_space.n)}

    # Maximum q_value for the actions in next state
    next_max = max(q_table[next_state].values())

    # Calculate the new q_value
    new_q_value = (1 - alpha) * old_q_value + alpha * (reward + gamma * next_max)

    # Finally, update the q_value
    q_table[state][action] = new_q_value

    return next_state, reward


def train_agent(q_table, env, num_episodes):
    for i in range(num_episodes):
        state = env.reset()
        if not q_table[state]:
            q_table[state] = {
                action: 0 for action in range(env.action_space.n)}

        epochs = 0
        num_penalties, reward, total_reward = 0, 0, 0
        while reward != 20:
            state, reward = update(q_table, env, state)
            total_reward += reward

            if reward == -10:
                num_penalties += 1

            epochs += 1
        print("\nTraining episode {}".format(i + 1))
        print("Time steps: {}, Penalties: {}, Reward: {}".format(epochs,
                                                                 num_penalties,
                                                                 total_reward))

    print("Training finished.\n")
    return q_table

In [4]:
q_table = defaultdict(int, {})
q_table = train_agent(q_table, env, NUM_EPISODES)

# save the table for future use
with open("q_table.pickle", "wb") as f:
    pickle.dump(dict(q_table), f)


Training episode 1
Time steps: 652, Penalties: 221, Reward: -2620

Training episode 2
Time steps: 1496, Penalties: 496, Reward: -5939

Training episode 3
Time steps: 862, Penalties: 297, Reward: -3514

Training episode 4
Time steps: 384, Penalties: 138, Reward: -1605

Training episode 5
Time steps: 737, Penalties: 250, Reward: -2966

Training episode 6
Time steps: 344, Penalties: 124, Reward: -1439

Training episode 7
Time steps: 821, Penalties: 269, Reward: -3221

Training episode 8
Time steps: 1152, Penalties: 394, Reward: -4677

Training episode 9
Time steps: 1076, Penalties: 318, Reward: -3917

Training episode 10
Time steps: 2578, Penalties: 862, Reward: -10315

Training episode 11
Time steps: 234, Penalties: 60, Reward: -753

Training episode 12
Time steps: 7080, Penalties: 2308, Reward: -27831

Training episode 13
Time steps: 766, Penalties: 238, Reward: -2887

Training episode 14
Time steps: 3688, Penalties: 1190, Reward: -14377

Training episode 15
Time steps: 3283, Penalties

Training episode 139
Time steps: 930, Penalties: 279, Reward: -3420

Training episode 140
Time steps: 55, Penalties: 10, Reward: -124

Training episode 141
Time steps: 574, Penalties: 177, Reward: -2146

Training episode 142
Time steps: 1082, Penalties: 342, Reward: -4139

Training episode 143
Time steps: 1812, Penalties: 578, Reward: -6993

Training episode 144
Time steps: 593, Penalties: 178, Reward: -2174

Training episode 145
Time steps: 658, Penalties: 207, Reward: -2500

Training episode 146
Time steps: 1110, Penalties: 368, Reward: -4401

Training episode 147
Time steps: 226, Penalties: 68, Reward: -817

Training episode 148
Time steps: 149, Penalties: 47, Reward: -551

Training episode 149
Time steps: 3559, Penalties: 1207, Reward: -14401

Training episode 150
Time steps: 146, Penalties: 49, Reward: -566

Training episode 151
Time steps: 1290, Penalties: 423, Reward: -5076

Training episode 152
Time steps: 1251, Penalties: 407, Reward: -4893

Training episode 153
Time steps: 46


Training episode 264
Time steps: 2864, Penalties: 981, Reward: -11672

Training episode 265
Time steps: 924, Penalties: 303, Reward: -3630

Training episode 266
Time steps: 94, Penalties: 23, Reward: -280

Training episode 267
Time steps: 1302, Penalties: 445, Reward: -5286

Training episode 268
Time steps: 1672, Penalties: 533, Reward: -6448

Training episode 269
Time steps: 940, Penalties: 315, Reward: -3754

Training episode 270
Time steps: 1015, Penalties: 341, Reward: -4063

Training episode 271
Time steps: 1083, Penalties: 362, Reward: -4320

Training episode 272
Time steps: 268, Penalties: 93, Reward: -1084

Training episode 273
Time steps: 2002, Penalties: 637, Reward: -7714

Training episode 274
Time steps: 1756, Penalties: 610, Reward: -7225

Training episode 275
Time steps: 206, Penalties: 55, Reward: -680

Training episode 276
Time steps: 693, Penalties: 210, Reward: -2562

Training episode 277
Time steps: 439, Penalties: 147, Reward: -1741

Training episode 278
Time steps

Training episode 385
Time steps: 1903, Penalties: 644, Reward: -7678

Training episode 386
Time steps: 94, Penalties: 33, Reward: -370

Training episode 387
Time steps: 244, Penalties: 84, Reward: -979

Training episode 388
Time steps: 1606, Penalties: 536, Reward: -6409

Training episode 389
Time steps: 42, Penalties: 14, Reward: -147

Training episode 390
Time steps: 471, Penalties: 150, Reward: -1800

Training episode 391
Time steps: 546, Penalties: 192, Reward: -2253

Training episode 392
Time steps: 868, Penalties: 270, Reward: -3277

Training episode 393
Time steps: 1778, Penalties: 593, Reward: -7094

Training episode 394
Time steps: 3052, Penalties: 1005, Reward: -12076

Training episode 395
Time steps: 215, Penalties: 71, Reward: -833

Training episode 396
Time steps: 101, Penalties: 35, Reward: -395

Training episode 397
Time steps: 1608, Penalties: 517, Reward: -6240

Training episode 398
Time steps: 174, Penalties: 56, Reward: -657

Training episode 399
Time steps: 1545, Pe


Training episode 522
Time steps: 2249, Penalties: 750, Reward: -8978

Training episode 523
Time steps: 926, Penalties: 296, Reward: -3569

Training episode 524
Time steps: 278, Penalties: 83, Reward: -1004

Training episode 525
Time steps: 476, Penalties: 149, Reward: -1796

Training episode 526
Time steps: 110, Penalties: 33, Reward: -386

Training episode 527
Time steps: 1912, Penalties: 623, Reward: -7498

Training episode 528
Time steps: 175, Penalties: 44, Reward: -550

Training episode 529
Time steps: 431, Penalties: 150, Reward: -1760

Training episode 530
Time steps: 1394, Penalties: 454, Reward: -5459

Training episode 531
Time steps: 1430, Penalties: 460, Reward: -5549

Training episode 532
Time steps: 1092, Penalties: 353, Reward: -4248

Training episode 533
Time steps: 549, Penalties: 162, Reward: -1986

Training episode 534
Time steps: 283, Penalties: 74, Reward: -928

Training episode 535
Time steps: 771, Penalties: 255, Reward: -3045

Training episode 536
Time steps: 11

Training episode 643
Time steps: 943, Penalties: 309, Reward: -3703

Training episode 644
Time steps: 39, Penalties: 11, Reward: -117

Training episode 645
Time steps: 168, Penalties: 61, Reward: -696

Training episode 646
Time steps: 738, Penalties: 246, Reward: -2931

Training episode 647
Time steps: 137, Penalties: 48, Reward: -548

Training episode 648
Time steps: 265, Penalties: 75, Reward: -919

Training episode 649
Time steps: 2066, Penalties: 656, Reward: -7949

Training episode 650
Time steps: 1529, Penalties: 493, Reward: -5945

Training episode 651
Time steps: 4482, Penalties: 1464, Reward: -17637

Training episode 652
Time steps: 722, Penalties: 256, Reward: -3005

Training episode 653
Time steps: 590, Penalties: 186, Reward: -2243

Training episode 654
Time steps: 1126, Penalties: 350, Reward: -4255

Training episode 655
Time steps: 492, Penalties: 159, Reward: -1902

Training episode 656
Time steps: 1053, Penalties: 334, Reward: -4038

Training episode 657
Time steps: 335

Training episode 768
Time steps: 669, Penalties: 221, Reward: -2637

Training episode 769
Time steps: 2618, Penalties: 814, Reward: -9923

Training episode 770
Time steps: 1232, Penalties: 385, Reward: -4676

Training episode 771
Time steps: 677, Penalties: 196, Reward: -2420

Training episode 772
Time steps: 97, Penalties: 28, Reward: -328

Training episode 773
Time steps: 1056, Penalties: 316, Reward: -3879

Training episode 774
Time steps: 347, Penalties: 118, Reward: -1388

Training episode 775
Time steps: 2352, Penalties: 780, Reward: -9351

Training episode 776
Time steps: 2565, Penalties: 822, Reward: -9942

Training episode 777
Time steps: 522, Penalties: 151, Reward: -1860

Training episode 778
Time steps: 353, Penalties: 115, Reward: -1367

Training episode 779
Time steps: 397, Penalties: 139, Reward: -1627

Training episode 780
Time steps: 4296, Penalties: 1359, Reward: -16506

Training episode 781
Time steps: 441, Penalties: 149, Reward: -1761

Training episode 782
Time ste

Training episode 900
Time steps: 705, Penalties: 222, Reward: -2682

Training episode 901
Time steps: 2217, Penalties: 748, Reward: -8928

Training episode 902
Time steps: 541, Penalties: 190, Reward: -2230

Training episode 903
Time steps: 1263, Penalties: 410, Reward: -4932

Training episode 904
Time steps: 248, Penalties: 76, Reward: -911

Training episode 905
Time steps: 404, Penalties: 128, Reward: -1535

Training episode 906
Time steps: 1882, Penalties: 588, Reward: -7153

Training episode 907
Time steps: 1104, Penalties: 332, Reward: -4071

Training episode 908
Time steps: 210, Penalties: 66, Reward: -783

Training episode 909
Time steps: 500, Penalties: 176, Reward: -2063

Training episode 910
Time steps: 499, Penalties: 147, Reward: -1801

Training episode 911
Time steps: 934, Penalties: 302, Reward: -3631

Training episode 912
Time steps: 523, Penalties: 177, Reward: -2095

Training episode 913
Time steps: 46, Penalties: 18, Reward: -187

Training episode 914
Time steps: 2730

Training episode 1028
Time steps: 1884, Penalties: 589, Reward: -7164

Training episode 1029
Time steps: 885, Penalties: 280, Reward: -3384

Training episode 1030
Time steps: 467, Penalties: 148, Reward: -1778

Training episode 1031
Time steps: 161, Penalties: 54, Reward: -626

Training episode 1032
Time steps: 113, Penalties: 39, Reward: -443

Training episode 1033
Time steps: 3335, Penalties: 1103, Reward: -13241

Training episode 1034
Time steps: 256, Penalties: 83, Reward: -982

Training episode 1035
Time steps: 1844, Penalties: 570, Reward: -6953

Training episode 1036
Time steps: 2293, Penalties: 732, Reward: -8860

Training episode 1037
Time steps: 254, Penalties: 78, Reward: -935

Training episode 1038
Time steps: 1443, Penalties: 501, Reward: -5931

Training episode 1039
Time steps: 367, Penalties: 114, Reward: -1372

Training episode 1040
Time steps: 344, Penalties: 110, Reward: -1313

Training episode 1041
Time steps: 58, Penalties: 12, Reward: -145

Training episode 1042
Ti

Training episode 1155
Time steps: 2610, Penalties: 841, Reward: -10158

Training episode 1156
Time steps: 2115, Penalties: 703, Reward: -8421

Training episode 1157
Time steps: 1172, Penalties: 385, Reward: -4616

Training episode 1158
Time steps: 1384, Penalties: 473, Reward: -5620

Training episode 1159
Time steps: 568, Penalties: 179, Reward: -2158

Training episode 1160
Time steps: 265, Penalties: 84, Reward: -1000

Training episode 1161
Time steps: 62, Penalties: 14, Reward: -167

Training episode 1162
Time steps: 482, Penalties: 151, Reward: -1820

Training episode 1163
Time steps: 2537, Penalties: 859, Reward: -10247

Training episode 1164
Time steps: 60, Penalties: 15, Reward: -174

Training episode 1165
Time steps: 178, Penalties: 42, Reward: -535

Training episode 1166
Time steps: 122, Penalties: 22, Reward: -299

Training episode 1167
Time steps: 735, Penalties: 244, Reward: -2910

Training episode 1168
Time steps: 1602, Penalties: 502, Reward: -6099

Training episode 1169
T

Training episode 1278
Time steps: 321, Penalties: 116, Reward: -1344

Training episode 1279
Time steps: 239, Penalties: 77, Reward: -911

Training episode 1280
Time steps: 134, Penalties: 46, Reward: -527

Training episode 1281
Time steps: 849, Penalties: 260, Reward: -3168

Training episode 1282
Time steps: 961, Penalties: 302, Reward: -3658

Training episode 1283
Time steps: 2292, Penalties: 727, Reward: -8814

Training episode 1284
Time steps: 260, Penalties: 95, Reward: -1094

Training episode 1285
Time steps: 502, Penalties: 150, Reward: -1831

Training episode 1286
Time steps: 982, Penalties: 293, Reward: -3598

Training episode 1287
Time steps: 1073, Penalties: 377, Reward: -4445

Training episode 1288
Time steps: 1262, Penalties: 404, Reward: -4877

Training episode 1289
Time steps: 468, Penalties: 141, Reward: -1716

Training episode 1290
Time steps: 47, Penalties: 13, Reward: -143

Training episode 1291
Time steps: 739, Penalties: 235, Reward: -2833

Training episode 1292
Tim

Training episode 1410
Time steps: 1826, Penalties: 580, Reward: -7025

Training episode 1411
Time steps: 3411, Penalties: 1097, Reward: -13263

Training episode 1412
Time steps: 680, Penalties: 209, Reward: -2540

Training episode 1413
Time steps: 596, Penalties: 204, Reward: -2411

Training episode 1414
Time steps: 597, Penalties: 184, Reward: -2232

Training episode 1415
Time steps: 366, Penalties: 127, Reward: -1488

Training episode 1416
Time steps: 340, Penalties: 117, Reward: -1372

Training episode 1417
Time steps: 54, Penalties: 16, Reward: -177

Training episode 1418
Time steps: 430, Penalties: 132, Reward: -1597

Training episode 1419
Time steps: 191, Penalties: 65, Reward: -755

Training episode 1420
Time steps: 1220, Penalties: 430, Reward: -5069

Training episode 1421
Time steps: 334, Penalties: 111, Reward: -1312

Training episode 1422
Time steps: 459, Penalties: 144, Reward: -1734

Training episode 1423
Time steps: 854, Penalties: 278, Reward: -3335

Training episode 142

Training episode 1532
Time steps: 295, Penalties: 90, Reward: -1084

Training episode 1533
Time steps: 1298, Penalties: 425, Reward: -5102

Training episode 1534
Time steps: 1657, Penalties: 520, Reward: -6316

Training episode 1535
Time steps: 2043, Penalties: 683, Reward: -8169

Training episode 1536
Time steps: 1437, Penalties: 477, Reward: -5709

Training episode 1537
Time steps: 88, Penalties: 26, Reward: -301

Training episode 1538
Time steps: 1215, Penalties: 384, Reward: -4650

Training episode 1539
Time steps: 102, Penalties: 33, Reward: -378

Training episode 1540
Time steps: 266, Penalties: 90, Reward: -1055

Training episode 1541
Time steps: 1165, Penalties: 381, Reward: -4573

Training episode 1542
Time steps: 730, Penalties: 235, Reward: -2824

Training episode 1543
Time steps: 154, Penalties: 45, Reward: -538

Training episode 1544
Time steps: 1495, Penalties: 480, Reward: -5794

Training episode 1545
Time steps: 2340, Penalties: 738, Reward: -8961

Training episode 1546

Training episode 1661
Time steps: 1775, Penalties: 586, Reward: -7028

Training episode 1662
Time steps: 90, Penalties: 29, Reward: -330

Training episode 1663
Time steps: 3246, Penalties: 1057, Reward: -12738

Training episode 1664
Time steps: 776, Penalties: 264, Reward: -3131

Training episode 1665
Time steps: 2118, Penalties: 699, Reward: -8388

Training episode 1666
Time steps: 179, Penalties: 56, Reward: -662

Training episode 1667
Time steps: 1773, Penalties: 641, Reward: -7521

Training episode 1668
Time steps: 471, Penalties: 156, Reward: -1854

Training episode 1669
Time steps: 26, Penalties: 4, Reward: -41

Training episode 1670
Time steps: 831, Penalties: 270, Reward: -3240

Training episode 1671
Time steps: 433, Penalties: 140, Reward: -1672

Training episode 1672
Time steps: 161, Penalties: 55, Reward: -635

Training episode 1673
Time steps: 26, Penalties: 6, Reward: -59

Training episode 1674
Time steps: 558, Penalties: 170, Reward: -2067

Training episode 1675
Time step

Training episode 1790
Time steps: 2975, Penalties: 935, Reward: -11369

Training episode 1791
Time steps: 255, Penalties: 76, Reward: -918

Training episode 1792
Time steps: 395, Penalties: 127, Reward: -1517

Training episode 1793
Time steps: 678, Penalties: 207, Reward: -2520

Training episode 1794
Time steps: 1817, Penalties: 621, Reward: -7385

Training episode 1795
Time steps: 206, Penalties: 67, Reward: -788

Training episode 1796
Time steps: 86, Penalties: 26, Reward: -299

Training episode 1797
Time steps: 208, Penalties: 65, Reward: -772

Training episode 1798
Time steps: 315, Penalties: 92, Reward: -1122

Training episode 1799
Time steps: 1140, Penalties: 362, Reward: -4377

Training episode 1800
Time steps: 2109, Penalties: 692, Reward: -8316

Training episode 1801
Time steps: 74, Penalties: 21, Reward: -242

Training episode 1802
Time steps: 737, Penalties: 227, Reward: -2759

Training episode 1803
Time steps: 1287, Penalties: 421, Reward: -5055

Training episode 1804
Time 

Training episode 1917
Time steps: 1747, Penalties: 562, Reward: -6784

Training episode 1918
Time steps: 1033, Penalties: 343, Reward: -4099

Training episode 1919
Time steps: 268, Penalties: 90, Reward: -1057

Training episode 1920
Time steps: 69, Penalties: 25, Reward: -273

Training episode 1921
Time steps: 185, Penalties: 59, Reward: -695

Training episode 1922
Time steps: 616, Penalties: 208, Reward: -2467

Training episode 1923
Time steps: 1795, Penalties: 627, Reward: -7417

Training episode 1924
Time steps: 1890, Penalties: 602, Reward: -7287

Training episode 1925
Time steps: 107, Penalties: 36, Reward: -410

Training episode 1926
Time steps: 322, Penalties: 99, Reward: -1192

Training episode 1927
Time steps: 681, Penalties: 214, Reward: -2586

Training episode 1928
Time steps: 242, Penalties: 78, Reward: -923

Training episode 1929
Time steps: 618, Penalties: 202, Reward: -2415

Training episode 1930
Time steps: 728, Penalties: 230, Reward: -2777

Training episode 1931
Time 

Training episode 2054
Time steps: 2790, Penalties: 903, Reward: -10896

Training episode 2055
Time steps: 300, Penalties: 101, Reward: -1188

Training episode 2056
Time steps: 278, Penalties: 75, Reward: -932

Training episode 2057
Time steps: 637, Penalties: 178, Reward: -2218

Training episode 2058
Time steps: 2561, Penalties: 806, Reward: -9794

Training episode 2059
Time steps: 637, Penalties: 211, Reward: -2515

Training episode 2060
Time steps: 496, Penalties: 175, Reward: -2050

Training episode 2061
Time steps: 364, Penalties: 100, Reward: -1243

Training episode 2062
Time steps: 842, Penalties: 246, Reward: -3035

Training episode 2063
Time steps: 1898, Penalties: 643, Reward: -7664

Training episode 2064
Time steps: 1151, Penalties: 340, Reward: -4190

Training episode 2065
Time steps: 64, Penalties: 23, Reward: -250

Training episode 2066
Time steps: 1067, Penalties: 325, Reward: -3971

Training episode 2067
Time steps: 1769, Penalties: 573, Reward: -6905

Training episode 2

Training episode 2175
Time steps: 1493, Penalties: 482, Reward: -5810

Training episode 2176
Time steps: 182, Penalties: 52, Reward: -629

Training episode 2177
Time steps: 836, Penalties: 263, Reward: -3182

Training episode 2178
Time steps: 202, Penalties: 60, Reward: -721

Training episode 2179
Time steps: 1902, Penalties: 621, Reward: -7470

Training episode 2180
Time steps: 1622, Penalties: 543, Reward: -6488

Training episode 2181
Time steps: 10, Penalties: 2, Reward: -7

Training episode 2182
Time steps: 348, Penalties: 94, Reward: -1173

Training episode 2183
Time steps: 781, Penalties: 254, Reward: -3046

Training episode 2184
Time steps: 565, Penalties: 177, Reward: -2137

Training episode 2185
Time steps: 556, Penalties: 160, Reward: -1975

Training episode 2186
Time steps: 458, Penalties: 146, Reward: -1751

Training episode 2187
Time steps: 132, Penalties: 37, Reward: -444

Training episode 2188
Time steps: 1426, Penalties: 461, Reward: -5554

Training episode 2189
Time st

Training episode 2298
Time steps: 407, Penalties: 107, Reward: -1349

Training episode 2299
Time steps: 1399, Penalties: 451, Reward: -5437

Training episode 2300
Time steps: 1071, Penalties: 349, Reward: -4191

Training episode 2301
Time steps: 1623, Penalties: 553, Reward: -6579

Training episode 2302
Time steps: 2712, Penalties: 834, Reward: -10197

Training episode 2303
Time steps: 519, Penalties: 189, Reward: -2199

Training episode 2304
Time steps: 3742, Penalties: 1265, Reward: -15106

Training episode 2305
Time steps: 66, Penalties: 20, Reward: -225

Training episode 2306
Time steps: 598, Penalties: 169, Reward: -2098

Training episode 2307
Time steps: 543, Penalties: 190, Reward: -2232

Training episode 2308
Time steps: 30, Penalties: 4, Reward: -45

Training episode 2309
Time steps: 899, Penalties: 292, Reward: -3506

Training episode 2310
Time steps: 73, Penalties: 23, Reward: -259

Training episode 2311
Time steps: 1235, Penalties: 392, Reward: -4742

Training episode 2312


Training episode 2425
Time steps: 1224, Penalties: 390, Reward: -4713

Training episode 2426
Time steps: 1810, Penalties: 615, Reward: -7324

Training episode 2427
Time steps: 2697, Penalties: 901, Reward: -10785

Training episode 2428
Time steps: 2165, Penalties: 681, Reward: -8273

Training episode 2429
Time steps: 961, Penalties: 309, Reward: -3721

Training episode 2430
Time steps: 988, Penalties: 327, Reward: -3910

Training episode 2431
Time steps: 1104, Penalties: 355, Reward: -4278

Training episode 2432
Time steps: 1245, Penalties: 393, Reward: -4761

Training episode 2433
Time steps: 1383, Penalties: 442, Reward: -5340

Training episode 2434
Time steps: 870, Penalties: 273, Reward: -3306

Training episode 2435
Time steps: 523, Penalties: 167, Reward: -2005

Training episode 2436
Time steps: 750, Penalties: 248, Reward: -2961

Training episode 2437
Time steps: 1343, Penalties: 423, Reward: -5129

Training episode 2438
Time steps: 2400, Penalties: 711, Reward: -8778

Training e

Training episode 2548
Time steps: 747, Penalties: 221, Reward: -2715

Training episode 2549
Time steps: 1461, Penalties: 477, Reward: -5733

Training episode 2550
Time steps: 1787, Penalties: 544, Reward: -6662

Training episode 2551
Time steps: 1055, Penalties: 359, Reward: -4265

Training episode 2552
Time steps: 321, Penalties: 94, Reward: -1146

Training episode 2553
Time steps: 2936, Penalties: 968, Reward: -11627

Training episode 2554
Time steps: 260, Penalties: 89, Reward: -1040

Training episode 2555
Time steps: 926, Penalties: 275, Reward: -3380

Training episode 2556
Time steps: 1277, Penalties: 440, Reward: -5216

Training episode 2557
Time steps: 99, Penalties: 24, Reward: -294

Training episode 2558
Time steps: 440, Penalties: 134, Reward: -1625

Training episode 2559
Time steps: 483, Penalties: 150, Reward: -1812

Training episode 2560
Time steps: 1809, Penalties: 563, Reward: -6855

Training episode 2561
Time steps: 1610, Penalties: 557, Reward: -6602

Training episode 


Training episode 2684
Time steps: 1437, Penalties: 487, Reward: -5799

Training episode 2685
Time steps: 2085, Penalties: 682, Reward: -8202

Training episode 2686
Time steps: 918, Penalties: 257, Reward: -3210

Training episode 2687
Time steps: 443, Penalties: 133, Reward: -1619

Training episode 2688
Time steps: 887, Penalties: 276, Reward: -3350

Training episode 2689
Time steps: 1300, Penalties: 433, Reward: -5176

Training episode 2690
Time steps: 750, Penalties: 261, Reward: -3078

Training episode 2691
Time steps: 253, Penalties: 78, Reward: -934

Training episode 2692
Time steps: 1208, Penalties: 405, Reward: -4832

Training episode 2693
Time steps: 719, Penalties: 210, Reward: -2588

Training episode 2694
Time steps: 386, Penalties: 106, Reward: -1319

Training episode 2695
Time steps: 820, Penalties: 269, Reward: -3220

Training episode 2696
Time steps: 35, Penalties: 10, Reward: -104

Training episode 2697
Time steps: 1300, Penalties: 429, Reward: -5140

Training episode 26

Training episode 2811
Time steps: 2868, Penalties: 929, Reward: -11208

Training episode 2812
Time steps: 2442, Penalties: 772, Reward: -9369

Training episode 2813
Time steps: 1566, Penalties: 547, Reward: -6468

Training episode 2814
Time steps: 1709, Penalties: 547, Reward: -6611

Training episode 2815
Time steps: 586, Penalties: 178, Reward: -2167

Training episode 2816
Time steps: 2353, Penalties: 782, Reward: -9370

Training episode 2817
Time steps: 905, Penalties: 300, Reward: -3584

Training episode 2818
Time steps: 11, Penalties: 2, Reward: -8

Training episode 2819
Time steps: 3217, Penalties: 1018, Reward: -12358

Training episode 2820
Time steps: 1382, Penalties: 457, Reward: -5474

Training episode 2821
Time steps: 511, Penalties: 148, Reward: -1822

Training episode 2822
Time steps: 410, Penalties: 131, Reward: -1568

Training episode 2823
Time steps: 308, Penalties: 101, Reward: -1196

Training episode 2824
Time steps: 1473, Penalties: 475, Reward: -5727

Training episod

Training episode 2936
Time steps: 5544, Penalties: 1749, Reward: -21264

Training episode 2937
Time steps: 704, Penalties: 212, Reward: -2591

Training episode 2938
Time steps: 450, Penalties: 131, Reward: -1608

Training episode 2939
Time steps: 828, Penalties: 269, Reward: -3228

Training episode 2940
Time steps: 100, Penalties: 26, Reward: -313

Training episode 2941
Time steps: 694, Penalties: 207, Reward: -2536

Training episode 2942
Time steps: 180, Penalties: 51, Reward: -618

Training episode 2943
Time steps: 533, Penalties: 167, Reward: -2015

Training episode 2944
Time steps: 597, Penalties: 161, Reward: -2025

Training episode 2945
Time steps: 1417, Penalties: 469, Reward: -5617

Training episode 2946
Time steps: 201, Penalties: 56, Reward: -684

Training episode 2947
Time steps: 195, Penalties: 48, Reward: -606

Training episode 2948
Time steps: 313, Penalties: 123, Reward: -1399

Training episode 2949
Time steps: 693, Penalties: 212, Reward: -2580

Training episode 2950
Ti

Training episode 3077
Time steps: 302, Penalties: 104, Reward: -1217

Training episode 3078
Time steps: 747, Penalties: 227, Reward: -2769

Training episode 3079
Time steps: 71, Penalties: 9, Reward: -131

Training episode 3080
Time steps: 2554, Penalties: 834, Reward: -10039

Training episode 3081
Time steps: 823, Penalties: 275, Reward: -3277

Training episode 3082
Time steps: 8, Penalties: 1, Reward: 4

Training episode 3083
Time steps: 12, Penalties: 1, Reward: 0

Training episode 3084
Time steps: 903, Penalties: 306, Reward: -3636

Training episode 3085
Time steps: 312, Penalties: 94, Reward: -1137

Training episode 3086
Time steps: 66, Penalties: 15, Reward: -180

Training episode 3087
Time steps: 1188, Penalties: 387, Reward: -4650

Training episode 3088
Time steps: 749, Penalties: 237, Reward: -2861

Training episode 3089
Time steps: 177, Penalties: 31, Reward: -435

Training episode 3090
Time steps: 1283, Penalties: 390, Reward: -4772

Training episode 3091
Time steps: 416, Pe


Training episode 3208
Time steps: 3476, Penalties: 1177, Reward: -14048

Training episode 3209
Time steps: 752, Penalties: 224, Reward: -2747

Training episode 3210
Time steps: 3133, Penalties: 1035, Reward: -12427

Training episode 3211
Time steps: 244, Penalties: 71, Reward: -862

Training episode 3212
Time steps: 661, Penalties: 205, Reward: -2485

Training episode 3213
Time steps: 146, Penalties: 52, Reward: -593

Training episode 3214
Time steps: 265, Penalties: 78, Reward: -946

Training episode 3215
Time steps: 1753, Penalties: 551, Reward: -6691

Training episode 3216
Time steps: 689, Penalties: 261, Reward: -3017

Training episode 3217
Time steps: 306, Penalties: 95, Reward: -1140

Training episode 3218
Time steps: 1086, Penalties: 316, Reward: -3909

Training episode 3219
Time steps: 275, Penalties: 76, Reward: -938

Training episode 3220
Time steps: 3128, Penalties: 1066, Reward: -12701

Training episode 3221
Time steps: 617, Penalties: 201, Reward: -2405

Training episode 

Training episode 3329
Time steps: 1234, Penalties: 415, Reward: -4948

Training episode 3330
Time steps: 971, Penalties: 305, Reward: -3695

Training episode 3331
Time steps: 213, Penalties: 71, Reward: -831

Training episode 3332
Time steps: 566, Penalties: 186, Reward: -2219

Training episode 3333
Time steps: 2101, Penalties: 638, Reward: -7822

Training episode 3334
Time steps: 1132, Penalties: 403, Reward: -4738

Training episode 3335
Time steps: 1055, Penalties: 358, Reward: -4256

Training episode 3336
Time steps: 249, Penalties: 88, Reward: -1020

Training episode 3337
Time steps: 1722, Penalties: 573, Reward: -6858

Training episode 3338
Time steps: 100, Penalties: 32, Reward: -367

Training episode 3339
Time steps: 226, Penalties: 75, Reward: -880

Training episode 3340
Time steps: 1689, Penalties: 564, Reward: -6744

Training episode 3341
Time steps: 533, Penalties: 163, Reward: -1979

Training episode 3342
Time steps: 671, Penalties: 226, Reward: -2684

Training episode 3343

Training episode 3448
Time steps: 508, Penalties: 155, Reward: -1882

Training episode 3449
Time steps: 2061, Penalties: 676, Reward: -8124

Training episode 3450
Time steps: 361, Penalties: 114, Reward: -1366

Training episode 3451
Time steps: 165, Penalties: 43, Reward: -531

Training episode 3452
Time steps: 835, Penalties: 269, Reward: -3235

Training episode 3453
Time steps: 46, Penalties: 13, Reward: -142

Training episode 3454
Time steps: 1920, Penalties: 636, Reward: -7623

Training episode 3455
Time steps: 3051, Penalties: 1033, Reward: -12327

Training episode 3456
Time steps: 1140, Penalties: 386, Reward: -4593

Training episode 3457
Time steps: 1699, Penalties: 567, Reward: -6781

Training episode 3458
Time steps: 236, Penalties: 71, Reward: -854

Training episode 3459
Time steps: 852, Penalties: 256, Reward: -3135

Training episode 3460
Time steps: 1999, Penalties: 649, Reward: -7819

Training episode 3461
Time steps: 167, Penalties: 52, Reward: -614

Training episode 3462

Training episode 3568
Time steps: 2514, Penalties: 836, Reward: -10017

Training episode 3569
Time steps: 101, Penalties: 29, Reward: -341

Training episode 3570
Time steps: 1712, Penalties: 568, Reward: -6803

Training episode 3571
Time steps: 204, Penalties: 68, Reward: -795

Training episode 3572
Time steps: 231, Penalties: 60, Reward: -750

Training episode 3573
Time steps: 634, Penalties: 195, Reward: -2368

Training episode 3574
Time steps: 3414, Penalties: 1125, Reward: -13518

Training episode 3575
Time steps: 128, Penalties: 43, Reward: -494

Training episode 3576
Time steps: 655, Penalties: 210, Reward: -2524

Training episode 3577
Time steps: 695, Penalties: 221, Reward: -2663

Training episode 3578
Time steps: 384, Penalties: 118, Reward: -1425

Training episode 3579
Time steps: 55, Penalties: 15, Reward: -169

Training episode 3580
Time steps: 1476, Penalties: 490, Reward: -5865

Training episode 3581
Time steps: 1677, Penalties: 546, Reward: -6570

Training episode 3582
T

Training episode 3689
Time steps: 1009, Penalties: 327, Reward: -3931

Training episode 3690
Time steps: 277, Penalties: 101, Reward: -1165

Training episode 3691
Time steps: 123, Penalties: 37, Reward: -435

Training episode 3692
Time steps: 51, Penalties: 14, Reward: -156

Training episode 3693
Time steps: 265, Penalties: 88, Reward: -1036

Training episode 3694
Time steps: 143, Penalties: 47, Reward: -545

Training episode 3695
Time steps: 325, Penalties: 103, Reward: -1231

Training episode 3696
Time steps: 609, Penalties: 185, Reward: -2253

Training episode 3697
Time steps: 271, Penalties: 90, Reward: -1060

Training episode 3698
Time steps: 4532, Penalties: 1491, Reward: -17930

Training episode 3699
Time steps: 2591, Penalties: 842, Reward: -10148

Training episode 3700
Time steps: 63, Penalties: 20, Reward: -222

Training episode 3701
Time steps: 297, Penalties: 111, Reward: -1275

Training episode 3702
Time steps: 587, Penalties: 198, Reward: -2348

Training episode 3703
Time

Training episode 3812
Time steps: 1861, Penalties: 626, Reward: -7474

Training episode 3813
Time steps: 4507, Penalties: 1527, Reward: -18229

Training episode 3814
Time steps: 1464, Penalties: 474, Reward: -5709

Training episode 3815
Time steps: 452, Penalties: 152, Reward: -1799

Training episode 3816
Time steps: 71, Penalties: 21, Reward: -239

Training episode 3817
Time steps: 104, Penalties: 34, Reward: -389

Training episode 3818
Time steps: 332, Penalties: 100, Reward: -1211

Training episode 3819
Time steps: 493, Penalties: 148, Reward: -1804

Training episode 3820
Time steps: 83, Penalties: 20, Reward: -242

Training episode 3821
Time steps: 1433, Penalties: 449, Reward: -5453

Training episode 3822
Time steps: 1133, Penalties: 337, Reward: -4145

Training episode 3823
Time steps: 918, Penalties: 266, Reward: -3291

Training episode 3824
Time steps: 1163, Penalties: 399, Reward: -4733

Training episode 3825
Time steps: 469, Penalties: 158, Reward: -1870

Training episode 382

Training episode 3937
Time steps: 392, Penalties: 114, Reward: -1397

Training episode 3938
Time steps: 39, Penalties: 7, Reward: -81

Training episode 3939
Time steps: 1117, Penalties: 385, Reward: -4561

Training episode 3940
Time steps: 432, Penalties: 130, Reward: -1581

Training episode 3941
Time steps: 75, Penalties: 21, Reward: -243

Training episode 3942
Time steps: 173, Penalties: 49, Reward: -593

Training episode 3943
Time steps: 1548, Penalties: 506, Reward: -6081

Training episode 3944
Time steps: 540, Penalties: 170, Reward: -2049

Training episode 3945
Time steps: 674, Penalties: 223, Reward: -2660

Training episode 3946
Time steps: 1028, Penalties: 322, Reward: -3905

Training episode 3947
Time steps: 994, Penalties: 300, Reward: -3673

Training episode 3948
Time steps: 1602, Penalties: 509, Reward: -6162

Training episode 3949
Time steps: 425, Penalties: 138, Reward: -1646

Training episode 3950
Time steps: 1321, Penalties: 435, Reward: -5215

Training episode 3951
Tim

Training episode 4071
Time steps: 128, Penalties: 38, Reward: -449

Training episode 4072
Time steps: 718, Penalties: 226, Reward: -2731

Training episode 4073
Time steps: 609, Penalties: 202, Reward: -2406

Training episode 4074
Time steps: 529, Penalties: 160, Reward: -1948

Training episode 4075
Time steps: 727, Penalties: 231, Reward: -2785

Training episode 4076
Time steps: 2144, Penalties: 684, Reward: -8279

Training episode 4077
Time steps: 59, Penalties: 18, Reward: -200

Training episode 4078
Time steps: 597, Penalties: 204, Reward: -2412

Training episode 4079
Time steps: 1889, Penalties: 588, Reward: -7160

Training episode 4080
Time steps: 298, Penalties: 101, Reward: -1186

Training episode 4081
Time steps: 422, Penalties: 151, Reward: -1760

Training episode 4082
Time steps: 481, Penalties: 157, Reward: -1873

Training episode 4083
Time steps: 1983, Penalties: 594, Reward: -7308

Training episode 4084
Time steps: 1158, Penalties: 361, Reward: -4386

Training episode 4085

Training episode 4194
Time steps: 1636, Penalties: 534, Reward: -6421

Training episode 4195
Time steps: 984, Penalties: 304, Reward: -3699

Training episode 4196
Time steps: 1362, Penalties: 450, Reward: -5391

Training episode 4197
Time steps: 227, Penalties: 67, Reward: -809

Training episode 4198
Time steps: 1706, Penalties: 578, Reward: -6887

Training episode 4199
Time steps: 1736, Penalties: 576, Reward: -6899

Training episode 4200
Time steps: 1252, Penalties: 426, Reward: -5065

Training episode 4201
Time steps: 2831, Penalties: 909, Reward: -10991

Training episode 4202
Time steps: 334, Penalties: 101, Reward: -1222

Training episode 4203
Time steps: 906, Penalties: 296, Reward: -3549

Training episode 4204
Time steps: 1050, Penalties: 332, Reward: -4017

Training episode 4205
Time steps: 286, Penalties: 91, Reward: -1084

Training episode 4206
Time steps: 696, Penalties: 226, Reward: -2709

Training episode 4207
Time steps: 188, Penalties: 65, Reward: -752

Training episode 

Training episode 4323
Time steps: 3257, Penalties: 1003, Reward: -12263

Training episode 4324
Time steps: 632, Penalties: 178, Reward: -2213

Training episode 4325
Time steps: 2373, Penalties: 782, Reward: -9390

Training episode 4326
Time steps: 751, Penalties: 245, Reward: -2935

Training episode 4327
Time steps: 777, Penalties: 252, Reward: -3024

Training episode 4328
Time steps: 50, Penalties: 16, Reward: -173

Training episode 4329
Time steps: 795, Penalties: 273, Reward: -3231

Training episode 4330
Time steps: 389, Penalties: 110, Reward: -1358

Training episode 4331
Time steps: 1000, Penalties: 340, Reward: -4039

Training episode 4332
Time steps: 648, Penalties: 195, Reward: -2382

Training episode 4333
Time steps: 122, Penalties: 35, Reward: -416

Training episode 4334
Time steps: 157, Penalties: 50, Reward: -586

Training episode 4335
Time steps: 1749, Penalties: 610, Reward: -7218

Training episode 4336
Time steps: 3321, Penalties: 1039, Reward: -12651

Training episode 4

Training episode 4454
Time steps: 338, Penalties: 111, Reward: -1316

Training episode 4455
Time steps: 584, Penalties: 179, Reward: -2174

Training episode 4456
Time steps: 1374, Penalties: 416, Reward: -5097

Training episode 4457
Time steps: 196, Penalties: 52, Reward: -643

Training episode 4458
Time steps: 871, Penalties: 262, Reward: -3208

Training episode 4459
Time steps: 1683, Penalties: 524, Reward: -6378

Training episode 4460
Time steps: 4187, Penalties: 1334, Reward: -16172

Training episode 4461
Time steps: 1379, Penalties: 446, Reward: -5372

Training episode 4462
Time steps: 175, Penalties: 55, Reward: -649

Training episode 4463
Time steps: 622, Penalties: 198, Reward: -2383

Training episode 4464
Time steps: 689, Penalties: 250, Reward: -2918

Training episode 4465
Time steps: 874, Penalties: 296, Reward: -3517

Training episode 4466
Time steps: 996, Penalties: 323, Reward: -3882

Training episode 4467
Time steps: 593, Penalties: 191, Reward: -2291

Training episode 4

Training episode 4582
Time steps: 1519, Penalties: 492, Reward: -5926

Training episode 4583
Time steps: 1087, Penalties: 350, Reward: -4216

Training episode 4584
Time steps: 1048, Penalties: 307, Reward: -3790

Training episode 4585
Time steps: 3204, Penalties: 1035, Reward: -12498

Training episode 4586
Time steps: 863, Penalties: 270, Reward: -3272

Training episode 4587
Time steps: 210, Penalties: 70, Reward: -819

Training episode 4588
Time steps: 1608, Penalties: 514, Reward: -6213

Training episode 4589
Time steps: 2258, Penalties: 723, Reward: -8744

Training episode 4590
Time steps: 2612, Penalties: 846, Reward: -10205

Training episode 4591
Time steps: 1894, Penalties: 613, Reward: -7390

Training episode 4592
Time steps: 253, Penalties: 78, Reward: -934

Training episode 4593
Time steps: 1646, Penalties: 543, Reward: -6512

Training episode 4594
Time steps: 339, Penalties: 108, Reward: -1290

Training episode 4595
Time steps: 403, Penalties: 141, Reward: -1651

Training epi

Training episode 4713
Time steps: 1482, Penalties: 468, Reward: -5673

Training episode 4714
Time steps: 3073, Penalties: 974, Reward: -11818

Training episode 4715
Time steps: 1253, Penalties: 405, Reward: -4877

Training episode 4716
Time steps: 185, Penalties: 67, Reward: -767

Training episode 4717
Time steps: 1510, Penalties: 485, Reward: -5854

Training episode 4718
Time steps: 4691, Penalties: 1566, Reward: -18764

Training episode 4719
Time steps: 1427, Penalties: 458, Reward: -5528

Training episode 4720
Time steps: 38, Penalties: 12, Reward: -125

Training episode 4721
Time steps: 195, Penalties: 71, Reward: -813

Training episode 4722
Time steps: 722, Penalties: 239, Reward: -2852

Training episode 4723
Time steps: 1266, Penalties: 433, Reward: -5142

Training episode 4724
Time steps: 454, Penalties: 129, Reward: -1594

Training episode 4725
Time steps: 1581, Penalties: 516, Reward: -6204

Training episode 4726
Time steps: 238, Penalties: 81, Reward: -946

Training episode 4

Training episode 4832
Time steps: 1157, Penalties: 364, Reward: -4412

Training episode 4833
Time steps: 485, Penalties: 156, Reward: -1868

Training episode 4834
Time steps: 1094, Penalties: 353, Reward: -4250

Training episode 4835
Time steps: 718, Penalties: 250, Reward: -2947

Training episode 4836
Time steps: 1043, Penalties: 321, Reward: -3911

Training episode 4837
Time steps: 1480, Penalties: 507, Reward: -6022

Training episode 4838
Time steps: 1403, Penalties: 440, Reward: -5342

Training episode 4839
Time steps: 847, Penalties: 279, Reward: -3337

Training episode 4840
Time steps: 242, Penalties: 77, Reward: -914

Training episode 4841
Time steps: 1843, Penalties: 601, Reward: -7231

Training episode 4842
Time steps: 417, Penalties: 141, Reward: -1665

Training episode 4843
Time steps: 132, Penalties: 46, Reward: -525

Training episode 4844
Time steps: 4439, Penalties: 1445, Reward: -17423

Training episode 4845
Time steps: 754, Penalties: 251, Reward: -2992

Training episod

Training episode 4955
Time steps: 463, Penalties: 151, Reward: -1801

Training episode 4956
Time steps: 1543, Penalties: 491, Reward: -5941

Training episode 4957
Time steps: 2858, Penalties: 910, Reward: -11027

Training episode 4958
Time steps: 281, Penalties: 85, Reward: -1025

Training episode 4959
Time steps: 2465, Penalties: 825, Reward: -9869

Training episode 4960
Time steps: 369, Penalties: 118, Reward: -1410

Training episode 4961
Time steps: 2500, Penalties: 822, Reward: -9877

Training episode 4962
Time steps: 1414, Penalties: 466, Reward: -5587

Training episode 4963
Time steps: 361, Penalties: 119, Reward: -1411

Training episode 4964
Time steps: 16, Penalties: 6, Reward: -49

Training episode 4965
Time steps: 865, Penalties: 290, Reward: -3454

Training episode 4966
Time steps: 2825, Penalties: 940, Reward: -11264

Training episode 4967
Time steps: 120, Penalties: 47, Reward: -522

Training episode 4968
Time steps: 496, Penalties: 175, Reward: -2050

Training episode 496

Training episode 5082
Time steps: 1903, Penalties: 600, Reward: -7282

Training episode 5083
Time steps: 1207, Penalties: 378, Reward: -4588

Training episode 5084
Time steps: 307, Penalties: 83, Reward: -1033

Training episode 5085
Time steps: 3373, Penalties: 1145, Reward: -13657

Training episode 5086
Time steps: 76, Penalties: 20, Reward: -235

Training episode 5087
Time steps: 374, Penalties: 110, Reward: -1343

Training episode 5088
Time steps: 574, Penalties: 194, Reward: -2299

Training episode 5089
Time steps: 747, Penalties: 235, Reward: -2841

Training episode 5090
Time steps: 689, Penalties: 225, Reward: -2693

Training episode 5091
Time steps: 647, Penalties: 200, Reward: -2426

Training episode 5092
Time steps: 755, Penalties: 257, Reward: -3047

Training episode 5093
Time steps: 310, Penalties: 101, Reward: -1198

Training episode 5094
Time steps: 445, Penalties: 154, Reward: -1810

Training episode 5095
Time steps: 907, Penalties: 289, Reward: -3487

Training episode 50

Training episode 5208
Time steps: 1002, Penalties: 320, Reward: -3861

Training episode 5209
Time steps: 324, Penalties: 108, Reward: -1275

Training episode 5210
Time steps: 1352, Penalties: 423, Reward: -5138

Training episode 5211
Time steps: 757, Penalties: 245, Reward: -2941

Training episode 5212
Time steps: 1102, Penalties: 367, Reward: -4384

Training episode 5213
Time steps: 1476, Penalties: 480, Reward: -5775

Training episode 5214
Time steps: 511, Penalties: 145, Reward: -1795

Training episode 5215
Time steps: 93, Penalties: 33, Reward: -369

Training episode 5216
Time steps: 407, Penalties: 126, Reward: -1520

Training episode 5217
Time steps: 298, Penalties: 102, Reward: -1195

Training episode 5218
Time steps: 1399, Penalties: 455, Reward: -5473

Training episode 5219
Time steps: 2996, Penalties: 973, Reward: -11732

Training episode 5220
Time steps: 635, Penalties: 216, Reward: -2558

Training episode 5221
Time steps: 2013, Penalties: 657, Reward: -7905

Training episod

Training episode 5340
Time steps: 1748, Penalties: 561, Reward: -6776

Training episode 5341
Time steps: 80, Penalties: 17, Reward: -212

Training episode 5342
Time steps: 295, Penalties: 92, Reward: -1102

Training episode 5343
Time steps: 470, Penalties: 126, Reward: -1583

Training episode 5344
Time steps: 722, Penalties: 225, Reward: -2726

Training episode 5345
Time steps: 118, Penalties: 34, Reward: -403

Training episode 5346
Time steps: 991, Penalties: 312, Reward: -3778

Training episode 5347
Time steps: 2397, Penalties: 780, Reward: -9396

Training episode 5348
Time steps: 10, Penalties: 1, Reward: 2

Training episode 5349
Time steps: 118, Penalties: 36, Reward: -421

Training episode 5350
Time steps: 74, Penalties: 19, Reward: -224

Training episode 5351
Time steps: 397, Penalties: 130, Reward: -1546

Training episode 5352
Time steps: 70, Penalties: 25, Reward: -274

Training episode 5353
Time steps: 72, Penalties: 11, Reward: -150

Training episode 5354
Time steps: 918, Pen

Training episode 5468
Time steps: 1197, Penalties: 355, Reward: -4371

Training episode 5469
Time steps: 232, Penalties: 80, Reward: -931

Training episode 5470
Time steps: 1151, Penalties: 363, Reward: -4397

Training episode 5471
Time steps: 764, Penalties: 242, Reward: -2921

Training episode 5472
Time steps: 644, Penalties: 212, Reward: -2531

Training episode 5473
Time steps: 7390, Penalties: 2419, Reward: -29140

Training episode 5474
Time steps: 586, Penalties: 154, Reward: -1951

Training episode 5475
Time steps: 1153, Penalties: 394, Reward: -4678

Training episode 5476
Time steps: 123, Penalties: 33, Reward: -399

Training episode 5477
Time steps: 5875, Penalties: 1914, Reward: -23080

Training episode 5478
Time steps: 521, Penalties: 157, Reward: -1913

Training episode 5479
Time steps: 209, Penalties: 66, Reward: -782

Training episode 5480
Time steps: 2157, Penalties: 661, Reward: -8085

Training episode 5481
Time steps: 1223, Penalties: 394, Reward: -4748

Training episod

Training episode 5592
Time steps: 1584, Penalties: 485, Reward: -5928

Training episode 5593
Time steps: 479, Penalties: 147, Reward: -1781

Training episode 5594
Time steps: 460, Penalties: 142, Reward: -1717

Training episode 5595
Time steps: 2037, Penalties: 664, Reward: -7992

Training episode 5596
Time steps: 804, Penalties: 269, Reward: -3204

Training episode 5597
Time steps: 1836, Penalties: 563, Reward: -6882

Training episode 5598
Time steps: 1801, Penalties: 569, Reward: -6901

Training episode 5599
Time steps: 588, Penalties: 183, Reward: -2214

Training episode 5600
Time steps: 347, Penalties: 111, Reward: -1325

Training episode 5601
Time steps: 95, Penalties: 29, Reward: -335

Training episode 5602
Time steps: 122, Penalties: 33, Reward: -398

Training episode 5603
Time steps: 266, Penalties: 71, Reward: -884

Training episode 5604
Time steps: 774, Penalties: 247, Reward: -2976

Training episode 5605
Time steps: 29, Penalties: 14, Reward: -134

Training episode 5606
Time

Training episode 5720
Time steps: 1240, Penalties: 428, Reward: -5071

Training episode 5721
Time steps: 120, Penalties: 40, Reward: -459

Training episode 5722
Time steps: 97, Penalties: 30, Reward: -346

Training episode 5723
Time steps: 78, Penalties: 19, Reward: -228

Training episode 5724
Time steps: 984, Penalties: 329, Reward: -3924

Training episode 5725
Time steps: 1211, Penalties: 417, Reward: -4943

Training episode 5726
Time steps: 170, Penalties: 46, Reward: -563

Training episode 5727
Time steps: 282, Penalties: 93, Reward: -1098

Training episode 5728
Time steps: 134, Penalties: 33, Reward: -410

Training episode 5729
Time steps: 2153, Penalties: 679, Reward: -8243

Training episode 5730
Time steps: 151, Penalties: 49, Reward: -571

Training episode 5731
Time steps: 1228, Penalties: 391, Reward: -4726

Training episode 5732
Time steps: 623, Penalties: 191, Reward: -2321

Training episode 5733
Time steps: 141, Penalties: 40, Reward: -480

Training episode 5734
Time steps:

Training episode 5840
Time steps: 1094, Penalties: 366, Reward: -4367

Training episode 5841
Time steps: 1143, Penalties: 325, Reward: -4047

Training episode 5842
Time steps: 712, Penalties: 237, Reward: -2824

Training episode 5843
Time steps: 1404, Penalties: 446, Reward: -5397

Training episode 5844
Time steps: 356, Penalties: 112, Reward: -1343

Training episode 5845
Time steps: 1022, Penalties: 338, Reward: -4043

Training episode 5846
Time steps: 1157, Penalties: 371, Reward: -4475

Training episode 5847
Time steps: 416, Penalties: 144, Reward: -1691

Training episode 5848
Time steps: 1812, Penalties: 592, Reward: -7119

Training episode 5849
Time steps: 822, Penalties: 256, Reward: -3105

Training episode 5850
Time steps: 515, Penalties: 173, Reward: -2051

Training episode 5851
Time steps: 654, Penalties: 211, Reward: -2532

Training episode 5852
Time steps: 907, Penalties: 264, Reward: -3262

Training episode 5853
Time steps: 2283, Penalties: 773, Reward: -9219

Training epis

Training episode 5959
Time steps: 234, Penalties: 76, Reward: -897

Training episode 5960
Time steps: 167, Penalties: 41, Reward: -515

Training episode 5961
Time steps: 270, Penalties: 77, Reward: -942

Training episode 5962
Time steps: 107, Penalties: 33, Reward: -383

Training episode 5963
Time steps: 606, Penalties: 196, Reward: -2349

Training episode 5964
Time steps: 642, Penalties: 200, Reward: -2421

Training episode 5965
Time steps: 1372, Penalties: 477, Reward: -5644

Training episode 5966
Time steps: 289, Penalties: 87, Reward: -1051

Training episode 5967
Time steps: 677, Penalties: 221, Reward: -2645

Training episode 5968
Time steps: 2549, Penalties: 841, Reward: -10097

Training episode 5969
Time steps: 738, Penalties: 231, Reward: -2796

Training episode 5970
Time steps: 211, Penalties: 85, Reward: -955

Training episode 5971
Time steps: 896, Penalties: 300, Reward: -3575

Training episode 5972
Time steps: 642, Penalties: 182, Reward: -2259

Training episode 5973
Time s

Training episode 6080
Time steps: 3126, Penalties: 990, Reward: -12015

Training episode 6081
Time steps: 519, Penalties: 170, Reward: -2028

Training episode 6082
Time steps: 790, Penalties: 265, Reward: -3154

Training episode 6083
Time steps: 386, Penalties: 106, Reward: -1319

Training episode 6084
Time steps: 210, Penalties: 66, Reward: -783

Training episode 6085
Time steps: 60, Penalties: 24, Reward: -255

Training episode 6086
Time steps: 1878, Penalties: 607, Reward: -7320

Training episode 6087
Time steps: 1015, Penalties: 304, Reward: -3730

Training episode 6088
Time steps: 2454, Penalties: 817, Reward: -9786

Training episode 6089
Time steps: 68, Penalties: 21, Reward: -236

Training episode 6090
Time steps: 146, Penalties: 49, Reward: -566

Training episode 6091
Time steps: 116, Penalties: 36, Reward: -419

Training episode 6092
Time steps: 339, Penalties: 108, Reward: -1290

Training episode 6093
Time steps: 678, Penalties: 214, Reward: -2583

Training episode 6094
Time 

Training episode 6212
Time steps: 1290, Penalties: 380, Reward: -4689

Training episode 6213
Time steps: 548, Penalties: 182, Reward: -2165

Training episode 6214
Time steps: 442, Penalties: 124, Reward: -1537

Training episode 6215
Time steps: 3830, Penalties: 1236, Reward: -14933

Training episode 6216
Time steps: 1382, Penalties: 439, Reward: -5312

Training episode 6217
Time steps: 613, Penalties: 198, Reward: -2374

Training episode 6218
Time steps: 1139, Penalties: 345, Reward: -4223

Training episode 6219
Time steps: 738, Penalties: 239, Reward: -2868

Training episode 6220
Time steps: 208, Penalties: 63, Reward: -754

Training episode 6221
Time steps: 218, Penalties: 68, Reward: -809

Training episode 6222
Time steps: 1392, Penalties: 486, Reward: -5745

Training episode 6223
Time steps: 329, Penalties: 98, Reward: -1190

Training episode 6224
Time steps: 151, Penalties: 42, Reward: -508

Training episode 6225
Time steps: 317, Penalties: 110, Reward: -1286

Training episode 622


Training episode 6338
Time steps: 6062, Penalties: 1973, Reward: -23798

Training episode 6339
Time steps: 2228, Penalties: 699, Reward: -8498

Training episode 6340
Time steps: 722, Penalties: 254, Reward: -2987

Training episode 6341
Time steps: 696, Penalties: 216, Reward: -2619

Training episode 6342
Time steps: 246, Penalties: 76, Reward: -909

Training episode 6343
Time steps: 398, Penalties: 143, Reward: -1664

Training episode 6344
Time steps: 1378, Penalties: 437, Reward: -5290

Training episode 6345
Time steps: 1258, Penalties: 377, Reward: -4630

Training episode 6346
Time steps: 133, Penalties: 43, Reward: -499

Training episode 6347
Time steps: 365, Penalties: 114, Reward: -1370

Training episode 6348
Time steps: 1288, Penalties: 451, Reward: -5326

Training episode 6349
Time steps: 3473, Penalties: 1100, Reward: -13352

Training episode 6350
Time steps: 712, Penalties: 219, Reward: -2662

Training episode 6351
Time steps: 175, Penalties: 57, Reward: -667

Training episod

Training episode 6466
Time steps: 6108, Penalties: 2021, Reward: -24276

Training episode 6467
Time steps: 1283, Penalties: 410, Reward: -4952

Training episode 6468
Time steps: 754, Penalties: 249, Reward: -2974

Training episode 6469
Time steps: 372, Penalties: 103, Reward: -1278

Training episode 6470
Time steps: 1977, Penalties: 661, Reward: -7905

Training episode 6471
Time steps: 269, Penalties: 72, Reward: -896

Training episode 6472
Time steps: 492, Penalties: 145, Reward: -1776

Training episode 6473
Time steps: 563, Penalties: 180, Reward: -2162

Training episode 6474
Time steps: 1395, Penalties: 499, Reward: -5865

Training episode 6475
Time steps: 281, Penalties: 92, Reward: -1088

Training episode 6476
Time steps: 112, Penalties: 34, Reward: -397

Training episode 6477
Time steps: 662, Penalties: 198, Reward: -2423

Training episode 6478
Time steps: 1745, Penalties: 582, Reward: -6962

Training episode 6479
Time steps: 277, Penalties: 100, Reward: -1156

Training episode 6

Training episode 6598
Time steps: 3471, Penalties: 1138, Reward: -13692

Training episode 6599
Time steps: 2305, Penalties: 777, Reward: -9277

Training episode 6600
Time steps: 135, Penalties: 34, Reward: -420

Training episode 6601
Time steps: 74, Penalties: 25, Reward: -278

Training episode 6602
Time steps: 198, Penalties: 65, Reward: -762

Training episode 6603
Time steps: 728, Penalties: 224, Reward: -2723

Training episode 6604
Time steps: 435, Penalties: 137, Reward: -1647

Training episode 6605
Time steps: 577, Penalties: 217, Reward: -2509

Training episode 6606
Time steps: 322, Penalties: 95, Reward: -1156

Training episode 6607
Time steps: 286, Penalties: 89, Reward: -1066

Training episode 6608
Time steps: 126, Penalties: 34, Reward: -411

Training episode 6609
Time steps: 65, Penalties: 15, Reward: -179

Training episode 6610
Time steps: 675, Penalties: 220, Reward: -2634

Training episode 6611
Time steps: 3980, Penalties: 1360, Reward: -16199

Training episode 6612
Time 

Training episode 6727
Time steps: 1070, Penalties: 372, Reward: -4397

Training episode 6728
Time steps: 409, Penalties: 128, Reward: -1540

Training episode 6729
Time steps: 2600, Penalties: 852, Reward: -10247

Training episode 6730
Time steps: 563, Penalties: 192, Reward: -2270

Training episode 6731
Time steps: 1350, Penalties: 427, Reward: -5172

Training episode 6732
Time steps: 155, Penalties: 52, Reward: -602

Training episode 6733
Time steps: 1418, Penalties: 463, Reward: -5564

Training episode 6734
Time steps: 651, Penalties: 208, Reward: -2502

Training episode 6735
Time steps: 586, Penalties: 184, Reward: -2221

Training episode 6736
Time steps: 67, Penalties: 18, Reward: -208

Training episode 6737
Time steps: 78, Penalties: 11, Reward: -156

Training episode 6738
Time steps: 809, Penalties: 256, Reward: -3092

Training episode 6739
Time steps: 60, Penalties: 20, Reward: -219

Training episode 6740
Time steps: 424, Penalties: 126, Reward: -1537

Training episode 6741
Time

Training episode 6851
Time steps: 2242, Penalties: 710, Reward: -8611

Training episode 6852
Time steps: 69, Penalties: 15, Reward: -183

Training episode 6853
Time steps: 946, Penalties: 301, Reward: -3634

Training episode 6854
Time steps: 533, Penalties: 167, Reward: -2015

Training episode 6855
Time steps: 438, Penalties: 137, Reward: -1650

Training episode 6856
Time steps: 328, Penalties: 89, Reward: -1108

Training episode 6857
Time steps: 502, Penalties: 158, Reward: -1903

Training episode 6858
Time steps: 439, Penalties: 149, Reward: -1759

Training episode 6859
Time steps: 533, Penalties: 172, Reward: -2060

Training episode 6860
Time steps: 604, Penalties: 199, Reward: -2374

Training episode 6861
Time steps: 85, Penalties: 31, Reward: -343

Training episode 6862
Time steps: 1593, Penalties: 543, Reward: -6459

Training episode 6863
Time steps: 507, Penalties: 166, Reward: -1980

Training episode 6864
Time steps: 156, Penalties: 46, Reward: -549

Training episode 6865
Time 

Training episode 6969
Time steps: 2188, Penalties: 700, Reward: -8467

Training episode 6970
Time steps: 280, Penalties: 88, Reward: -1051

Training episode 6971
Time steps: 92, Penalties: 20, Reward: -251

Training episode 6972
Time steps: 412, Penalties: 139, Reward: -1642

Training episode 6973
Time steps: 872, Penalties: 249, Reward: -3092

Training episode 6974
Time steps: 388, Penalties: 112, Reward: -1375

Training episode 6975
Time steps: 297, Penalties: 107, Reward: -1239

Training episode 6976
Time steps: 439, Penalties: 158, Reward: -1840

Training episode 6977
Time steps: 700, Penalties: 212, Reward: -2587

Training episode 6978
Time steps: 376, Penalties: 110, Reward: -1345

Training episode 6979
Time steps: 2986, Penalties: 945, Reward: -11470

Training episode 6980
Time steps: 1559, Penalties: 511, Reward: -6137

Training episode 6981
Time steps: 1191, Penalties: 399, Reward: -4761

Training episode 6982
Time steps: 910, Penalties: 308, Reward: -3661

Training episode 69

Training episode 7100
Time steps: 158, Penalties: 55, Reward: -632

Training episode 7101
Time steps: 1464, Penalties: 476, Reward: -5727

Training episode 7102
Time steps: 247, Penalties: 73, Reward: -883

Training episode 7103
Time steps: 591, Penalties: 174, Reward: -2136

Training episode 7104
Time steps: 649, Penalties: 203, Reward: -2455

Training episode 7105
Time steps: 82, Penalties: 26, Reward: -295

Training episode 7106
Time steps: 86, Penalties: 15, Reward: -200

Training episode 7107
Time steps: 2973, Penalties: 948, Reward: -11484

Training episode 7108
Time steps: 31, Penalties: 6, Reward: -64

Training episode 7109
Time steps: 2786, Penalties: 925, Reward: -11090

Training episode 7110
Time steps: 564, Penalties: 173, Reward: -2100

Training episode 7111
Time steps: 177, Penalties: 58, Reward: -678

Training episode 7112
Time steps: 697, Penalties: 232, Reward: -2764

Training episode 7113
Time steps: 1162, Penalties: 338, Reward: -4183

Training episode 7114
Time step

Training episode 7224
Time steps: 1484, Penalties: 459, Reward: -5594

Training episode 7225
Time steps: 1341, Penalties: 452, Reward: -5388

Training episode 7226
Time steps: 1521, Penalties: 477, Reward: -5793

Training episode 7227
Time steps: 3359, Penalties: 1106, Reward: -13292

Training episode 7228
Time steps: 100, Penalties: 34, Reward: -385

Training episode 7229
Time steps: 922, Penalties: 277, Reward: -3394

Training episode 7230
Time steps: 9401, Penalties: 3050, Reward: -36830

Training episode 7231
Time steps: 138, Penalties: 42, Reward: -495

Training episode 7232
Time steps: 595, Penalties: 166, Reward: -2068

Training episode 7233
Time steps: 990, Penalties: 327, Reward: -3912

Training episode 7234
Time steps: 961, Penalties: 325, Reward: -3865

Training episode 7235
Time steps: 1877, Penalties: 625, Reward: -7481

Training episode 7236
Time steps: 219, Penalties: 60, Reward: -738

Training episode 7237
Time steps: 796, Penalties: 231, Reward: -2854

Training episode

Training episode 7343
Time steps: 752, Penalties: 229, Reward: -2792

Training episode 7344
Time steps: 349, Penalties: 105, Reward: -1273

Training episode 7345
Time steps: 923, Penalties: 289, Reward: -3503

Training episode 7346
Time steps: 1058, Penalties: 325, Reward: -3962

Training episode 7347
Time steps: 485, Penalties: 147, Reward: -1787

Training episode 7348
Time steps: 274, Penalties: 89, Reward: -1054

Training episode 7349
Time steps: 137, Penalties: 36, Reward: -440

Training episode 7350
Time steps: 721, Penalties: 234, Reward: -2806

Training episode 7351
Time steps: 2245, Penalties: 764, Reward: -9100

Training episode 7352
Time steps: 342, Penalties: 118, Reward: -1383

Training episode 7353
Time steps: 766, Penalties: 261, Reward: -3094

Training episode 7354
Time steps: 47, Penalties: 11, Reward: -125

Training episode 7355
Time steps: 1373, Penalties: 453, Reward: -5429

Training episode 7356
Time steps: 343, Penalties: 101, Reward: -1231

Training episode 7357
T

Training episode 7474
Time steps: 1257, Penalties: 413, Reward: -4953

Training episode 7475
Time steps: 135, Penalties: 45, Reward: -519

Training episode 7476
Time steps: 197, Penalties: 76, Reward: -860

Training episode 7477
Time steps: 1113, Penalties: 387, Reward: -4575

Training episode 7478
Time steps: 257, Penalties: 90, Reward: -1046

Training episode 7479
Time steps: 520, Penalties: 187, Reward: -2182

Training episode 7480
Time steps: 1138, Penalties: 342, Reward: -4195

Training episode 7481
Time steps: 172, Penalties: 49, Reward: -592

Training episode 7482
Time steps: 72, Penalties: 23, Reward: -258

Training episode 7483
Time steps: 2653, Penalties: 860, Reward: -10372

Training episode 7484
Time steps: 107, Penalties: 29, Reward: -347

Training episode 7485
Time steps: 1368, Penalties: 464, Reward: -5523

Training episode 7486
Time steps: 146, Penalties: 38, Reward: -467

Training episode 7487
Time steps: 37, Penalties: 11, Reward: -115

Training episode 7488
Time step

Training episode 7612
Time steps: 1517, Penalties: 496, Reward: -5960

Training episode 7613
Time steps: 35, Penalties: 10, Reward: -104

Training episode 7614
Time steps: 145, Penalties: 43, Reward: -511

Training episode 7615
Time steps: 130, Penalties: 43, Reward: -496

Training episode 7616
Time steps: 172, Penalties: 47, Reward: -574

Training episode 7617
Time steps: 1941, Penalties: 627, Reward: -7563

Training episode 7618
Time steps: 640, Penalties: 215, Reward: -2554

Training episode 7619
Time steps: 514, Penalties: 184, Reward: -2149

Training episode 7620
Time steps: 752, Penalties: 255, Reward: -3026

Training episode 7621
Time steps: 1954, Penalties: 643, Reward: -7720

Training episode 7622
Time steps: 863, Penalties: 259, Reward: -3173

Training episode 7623
Time steps: 507, Penalties: 158, Reward: -1908

Training episode 7624
Time steps: 388, Penalties: 122, Reward: -1465

Training episode 7625
Time steps: 783, Penalties: 249, Reward: -3003

Training episode 7626
Time

Training episode 7730
Time steps: 545, Penalties: 157, Reward: -1937

Training episode 7731
Time steps: 237, Penalties: 78, Reward: -918

Training episode 7732
Time steps: 552, Penalties: 167, Reward: -2034

Training episode 7733
Time steps: 979, Penalties: 314, Reward: -3784

Training episode 7734
Time steps: 476, Penalties: 165, Reward: -1940

Training episode 7735
Time steps: 109, Penalties: 29, Reward: -349

Training episode 7736
Time steps: 1329, Penalties: 433, Reward: -5205

Training episode 7737
Time steps: 546, Penalties: 160, Reward: -1965

Training episode 7738
Time steps: 393, Penalties: 109, Reward: -1353

Training episode 7739
Time steps: 1609, Penalties: 539, Reward: -6439

Training episode 7740
Time steps: 138, Penalties: 34, Reward: -423

Training episode 7741
Time steps: 144, Penalties: 48, Reward: -555

Training episode 7742
Time steps: 816, Penalties: 280, Reward: -3315

Training episode 7743
Time steps: 850, Penalties: 265, Reward: -3214

Training episode 7744
Time


Training episode 7857
Time steps: 918, Penalties: 310, Reward: -3687

Training episode 7858
Time steps: 79, Penalties: 24, Reward: -274

Training episode 7859
Time steps: 614, Penalties: 213, Reward: -2510

Training episode 7860
Time steps: 1644, Penalties: 539, Reward: -6474

Training episode 7861
Time steps: 447, Penalties: 147, Reward: -1749

Training episode 7862
Time steps: 227, Penalties: 73, Reward: -863

Training episode 7863
Time steps: 157, Penalties: 53, Reward: -613

Training episode 7864
Time steps: 2517, Penalties: 789, Reward: -9597

Training episode 7865
Time steps: 483, Penalties: 144, Reward: -1758

Training episode 7866
Time steps: 508, Penalties: 174, Reward: -2053

Training episode 7867
Time steps: 538, Penalties: 181, Reward: -2146

Training episode 7868
Time steps: 207, Penalties: 72, Reward: -834

Training episode 7869
Time steps: 1349, Penalties: 424, Reward: -5144

Training episode 7870
Time steps: 1564, Penalties: 512, Reward: -6151

Training episode 7871
Ti

Training episode 7982
Time steps: 108, Penalties: 33, Reward: -384

Training episode 7983
Time steps: 475, Penalties: 136, Reward: -1678

Training episode 7984
Time steps: 343, Penalties: 113, Reward: -1339

Training episode 7985
Time steps: 470, Penalties: 142, Reward: -1727

Training episode 7986
Time steps: 539, Penalties: 169, Reward: -2039

Training episode 7987
Time steps: 1136, Penalties: 359, Reward: -4346

Training episode 7988
Time steps: 10, Penalties: 1, Reward: 2

Training episode 7989
Time steps: 428, Penalties: 130, Reward: -1577

Training episode 7990
Time steps: 146, Penalties: 46, Reward: -539

Training episode 7991
Time steps: 783, Penalties: 257, Reward: -3075

Training episode 7992
Time steps: 1099, Penalties: 374, Reward: -4444

Training episode 7993
Time steps: 4063, Penalties: 1390, Reward: -16552

Training episode 7994
Time steps: 2143, Penalties: 701, Reward: -8431

Training episode 7995
Time steps: 465, Penalties: 154, Reward: -1830

Training episode 7996
Tim


Training episode 8100
Time steps: 1295, Penalties: 419, Reward: -5045

Training episode 8101
Time steps: 381, Penalties: 127, Reward: -1503

Training episode 8102
Time steps: 7, Penalties: 0, Reward: 14

Training episode 8103
Time steps: 125, Penalties: 37, Reward: -437

Training episode 8104
Time steps: 203, Penalties: 66, Reward: -776

Training episode 8105
Time steps: 58, Penalties: 16, Reward: -181

Training episode 8106
Time steps: 964, Penalties: 312, Reward: -3751

Training episode 8107
Time steps: 2014, Penalties: 680, Reward: -8113

Training episode 8108
Time steps: 476, Penalties: 155, Reward: -1850

Training episode 8109
Time steps: 1304, Penalties: 415, Reward: -5018

Training episode 8110
Time steps: 487, Penalties: 172, Reward: -2014

Training episode 8111
Time steps: 979, Penalties: 324, Reward: -3874

Training episode 8112
Time steps: 100, Penalties: 34, Reward: -385

Training episode 8113
Time steps: 239, Penalties: 77, Reward: -911

Training episode 8114
Time steps: 

Training episode 8218
Time steps: 4297, Penalties: 1364, Reward: -16552

Training episode 8219
Time steps: 986, Penalties: 310, Reward: -3755

Training episode 8220
Time steps: 731, Penalties: 239, Reward: -2861

Training episode 8221
Time steps: 606, Penalties: 196, Reward: -2349

Training episode 8222
Time steps: 1360, Penalties: 444, Reward: -5335

Training episode 8223
Time steps: 640, Penalties: 225, Reward: -2644

Training episode 8224
Time steps: 284, Penalties: 93, Reward: -1100

Training episode 8225
Time steps: 108, Penalties: 26, Reward: -321

Training episode 8226
Time steps: 1503, Penalties: 511, Reward: -6081

Training episode 8227
Time steps: 94, Penalties: 32, Reward: -361

Training episode 8228
Time steps: 359, Penalties: 108, Reward: -1310

Training episode 8229
Time steps: 187, Penalties: 43, Reward: -553

Training episode 8230
Time steps: 397, Penalties: 104, Reward: -1312

Training episode 8231
Time steps: 398, Penalties: 102, Reward: -1295

Training episode 8232
T

Training episode 8341
Time steps: 2480, Penalties: 803, Reward: -9686

Training episode 8342
Time steps: 858, Penalties: 246, Reward: -3051

Training episode 8343
Time steps: 1253, Penalties: 435, Reward: -5147

Training episode 8344
Time steps: 1092, Penalties: 347, Reward: -4194

Training episode 8345
Time steps: 814, Penalties: 279, Reward: -3304

Training episode 8346
Time steps: 676, Penalties: 214, Reward: -2581

Training episode 8347
Time steps: 274, Penalties: 90, Reward: -1063

Training episode 8348
Time steps: 1134, Penalties: 361, Reward: -4362

Training episode 8349
Time steps: 288, Penalties: 97, Reward: -1140

Training episode 8350
Time steps: 723, Penalties: 235, Reward: -2817

Training episode 8351
Time steps: 1146, Penalties: 348, Reward: -4257

Training episode 8352
Time steps: 2275, Penalties: 741, Reward: -8923

Training episode 8353
Time steps: 2185, Penalties: 735, Reward: -8779

Training episode 8354
Time steps: 979, Penalties: 301, Reward: -3667

Training episod

Training episode 8460
Time steps: 986, Penalties: 311, Reward: -3764

Training episode 8461
Time steps: 528, Penalties: 167, Reward: -2010

Training episode 8462
Time steps: 392, Penalties: 139, Reward: -1622

Training episode 8463
Time steps: 527, Penalties: 167, Reward: -2009

Training episode 8464
Time steps: 1227, Penalties: 414, Reward: -4932

Training episode 8465
Time steps: 819, Penalties: 255, Reward: -3093

Training episode 8466
Time steps: 224, Penalties: 71, Reward: -842

Training episode 8467
Time steps: 514, Penalties: 162, Reward: -1951

Training episode 8468
Time steps: 1626, Penalties: 518, Reward: -6267

Training episode 8469
Time steps: 139, Penalties: 40, Reward: -478

Training episode 8470
Time steps: 377, Penalties: 131, Reward: -1535

Training episode 8471
Time steps: 290, Penalties: 86, Reward: -1043

Training episode 8472
Time steps: 191, Penalties: 62, Reward: -728

Training episode 8473
Time steps: 2118, Penalties: 692, Reward: -8325

Training episode 8474
Ti

Training episode 8579
Time steps: 1044, Penalties: 352, Reward: -4191

Training episode 8580
Time steps: 452, Penalties: 147, Reward: -1754

Training episode 8581
Time steps: 299, Penalties: 90, Reward: -1088

Training episode 8582
Time steps: 617, Penalties: 203, Reward: -2423

Training episode 8583
Time steps: 836, Penalties: 276, Reward: -3299

Training episode 8584
Time steps: 143, Penalties: 33, Reward: -419

Training episode 8585
Time steps: 1272, Penalties: 398, Reward: -4833

Training episode 8586
Time steps: 1297, Penalties: 447, Reward: -5299

Training episode 8587
Time steps: 245, Penalties: 85, Reward: -989

Training episode 8588
Time steps: 472, Penalties: 155, Reward: -1846

Training episode 8589
Time steps: 1401, Penalties: 441, Reward: -5349

Training episode 8590
Time steps: 116, Penalties: 32, Reward: -383

Training episode 8591
Time steps: 811, Penalties: 257, Reward: -3103

Training episode 8592
Time steps: 254, Penalties: 72, Reward: -881

Training episode 8593
Tim

Training episode 8699
Time steps: 2072, Penalties: 698, Reward: -8333

Training episode 8700
Time steps: 438, Penalties: 132, Reward: -1605

Training episode 8701
Time steps: 954, Penalties: 307, Reward: -3696

Training episode 8702
Time steps: 2156, Penalties: 678, Reward: -8237

Training episode 8703
Time steps: 1190, Penalties: 360, Reward: -4409

Training episode 8704
Time steps: 1371, Penalties: 420, Reward: -5130

Training episode 8705
Time steps: 849, Penalties: 293, Reward: -3465

Training episode 8706
Time steps: 136, Penalties: 42, Reward: -493

Training episode 8707
Time steps: 839, Penalties: 262, Reward: -3176

Training episode 8708
Time steps: 323, Penalties: 102, Reward: -1220

Training episode 8709
Time steps: 1406, Penalties: 451, Reward: -5444

Training episode 8710
Time steps: 509, Penalties: 160, Reward: -1928

Training episode 8711
Time steps: 137, Penalties: 40, Reward: -476

Training episode 8712
Time steps: 14, Penalties: 1, Reward: -2

Training episode 8713
Tim

Training episode 8832
Time steps: 766, Penalties: 238, Reward: -2887

Training episode 8833
Time steps: 1869, Penalties: 616, Reward: -7392

Training episode 8834
Time steps: 464, Penalties: 155, Reward: -1838

Training episode 8835
Time steps: 1008, Penalties: 343, Reward: -4074

Training episode 8836
Time steps: 836, Penalties: 249, Reward: -3056

Training episode 8837
Time steps: 693, Penalties: 206, Reward: -2526

Training episode 8838
Time steps: 1576, Penalties: 533, Reward: -6352

Training episode 8839
Time steps: 1128, Penalties: 372, Reward: -4455

Training episode 8840
Time steps: 577, Penalties: 180, Reward: -2176

Training episode 8841
Time steps: 806, Penalties: 282, Reward: -3323

Training episode 8842
Time steps: 66, Penalties: 19, Reward: -216

Training episode 8843
Time steps: 309, Penalties: 116, Reward: -1332

Training episode 8844
Time steps: 1122, Penalties: 378, Reward: -4503

Training episode 8845
Time steps: 594, Penalties: 195, Reward: -2328

Training episode 8

Training episode 8964
Time steps: 1291, Penalties: 408, Reward: -4942

Training episode 8965
Time steps: 1380, Penalties: 437, Reward: -5292

Training episode 8966
Time steps: 1333, Penalties: 455, Reward: -5407

Training episode 8967
Time steps: 291, Penalties: 80, Reward: -990

Training episode 8968
Time steps: 772, Penalties: 230, Reward: -2821

Training episode 8969
Time steps: 601, Penalties: 198, Reward: -2362

Training episode 8970
Time steps: 827, Penalties: 251, Reward: -3065

Training episode 8971
Time steps: 1014, Penalties: 348, Reward: -4125

Training episode 8972
Time steps: 1101, Penalties: 388, Reward: -4572

Training episode 8973
Time steps: 1643, Penalties: 510, Reward: -6212

Training episode 8974
Time steps: 119, Penalties: 45, Reward: -503

Training episode 8975
Time steps: 554, Penalties: 186, Reward: -2207

Training episode 8976
Time steps: 216, Penalties: 61, Reward: -744

Training episode 8977
Time steps: 618, Penalties: 220, Reward: -2577

Training episode 897

Training episode 9087
Time steps: 1668, Penalties: 537, Reward: -6480

Training episode 9088
Time steps: 1382, Penalties: 467, Reward: -5564

Training episode 9089
Time steps: 410, Penalties: 128, Reward: -1541

Training episode 9090
Time steps: 2400, Penalties: 744, Reward: -9075

Training episode 9091
Time steps: 214, Penalties: 66, Reward: -787

Training episode 9092
Time steps: 2331, Penalties: 730, Reward: -8880

Training episode 9093
Time steps: 1900, Penalties: 609, Reward: -7360

Training episode 9094
Time steps: 2553, Penalties: 817, Reward: -9885

Training episode 9095
Time steps: 583, Penalties: 186, Reward: -2236

Training episode 9096
Time steps: 20, Penalties: 2, Reward: -17

Training episode 9097
Time steps: 1986, Penalties: 682, Reward: -8103

Training episode 9098
Time steps: 497, Penalties: 150, Reward: -1826

Training episode 9099
Time steps: 2180, Penalties: 723, Reward: -8666

Training episode 9100
Time steps: 190, Penalties: 65, Reward: -754

Training episode 9101

Training episode 9208
Time steps: 1416, Penalties: 448, Reward: -5427

Training episode 9209
Time steps: 999, Penalties: 323, Reward: -3885

Training episode 9210
Time steps: 477, Penalties: 148, Reward: -1788

Training episode 9211
Time steps: 162, Penalties: 47, Reward: -564

Training episode 9212
Time steps: 341, Penalties: 114, Reward: -1346

Training episode 9213
Time steps: 1828, Penalties: 584, Reward: -7063

Training episode 9214
Time steps: 630, Penalties: 203, Reward: -2436

Training episode 9215
Time steps: 777, Penalties: 253, Reward: -3033

Training episode 9216
Time steps: 1963, Penalties: 634, Reward: -7648

Training episode 9217
Time steps: 2200, Penalties: 725, Reward: -8704

Training episode 9218
Time steps: 848, Penalties: 285, Reward: -3392

Training episode 9219
Time steps: 152, Penalties: 35, Reward: -446

Training episode 9220
Time steps: 1115, Penalties: 376, Reward: -4478

Training episode 9221
Time steps: 1184, Penalties: 384, Reward: -4619

Training episode 9

Training episode 9329
Time steps: 2110, Penalties: 672, Reward: -8137

Training episode 9330
Time steps: 760, Penalties: 261, Reward: -3088

Training episode 9331
Time steps: 645, Penalties: 211, Reward: -2523

Training episode 9332
Time steps: 519, Penalties: 168, Reward: -2010

Training episode 9333
Time steps: 182, Penalties: 46, Reward: -575

Training episode 9334
Time steps: 410, Penalties: 136, Reward: -1613

Training episode 9335
Time steps: 1090, Penalties: 334, Reward: -4075

Training episode 9336
Time steps: 286, Penalties: 92, Reward: -1093

Training episode 9337
Time steps: 139, Penalties: 50, Reward: -568

Training episode 9338
Time steps: 511, Penalties: 168, Reward: -2002

Training episode 9339
Time steps: 2608, Penalties: 826, Reward: -10021

Training episode 9340
Time steps: 1101, Penalties: 368, Reward: -4392

Training episode 9341
Time steps: 414, Penalties: 144, Reward: -1689

Training episode 9342
Time steps: 437, Penalties: 150, Reward: -1766

Training episode 934

Training episode 9458
Time steps: 1275, Penalties: 427, Reward: -5097

Training episode 9459
Time steps: 727, Penalties: 246, Reward: -2920

Training episode 9460
Time steps: 841, Penalties: 269, Reward: -3241

Training episode 9461
Time steps: 538, Penalties: 159, Reward: -1948

Training episode 9462
Time steps: 790, Penalties: 259, Reward: -3100

Training episode 9463
Time steps: 504, Penalties: 138, Reward: -1725

Training episode 9464
Time steps: 124, Penalties: 45, Reward: -508

Training episode 9465
Time steps: 1367, Penalties: 448, Reward: -5378

Training episode 9466
Time steps: 118, Penalties: 35, Reward: -412

Training episode 9467
Time steps: 3762, Penalties: 1238, Reward: -14883

Training episode 9468
Time steps: 1853, Penalties: 573, Reward: -6989

Training episode 9469
Time steps: 483, Penalties: 162, Reward: -1920

Training episode 9470
Time steps: 429, Penalties: 137, Reward: -1641

Training episode 9471
Time steps: 3224, Penalties: 1070, Reward: -12833

Training episod


Training episode 9582
Time steps: 878, Penalties: 274, Reward: -3323

Training episode 9583
Time steps: 191, Penalties: 67, Reward: -773

Training episode 9584
Time steps: 119, Penalties: 29, Reward: -359

Training episode 9585
Time steps: 62, Penalties: 13, Reward: -158

Training episode 9586
Time steps: 1310, Penalties: 441, Reward: -5258

Training episode 9587
Time steps: 740, Penalties: 221, Reward: -2708

Training episode 9588
Time steps: 137, Penalties: 31, Reward: -395

Training episode 9589
Time steps: 435, Penalties: 115, Reward: -1449

Training episode 9590
Time steps: 998, Penalties: 305, Reward: -3722

Training episode 9591
Time steps: 1067, Penalties: 350, Reward: -4196

Training episode 9592
Time steps: 1404, Penalties: 477, Reward: -5676

Training episode 9593
Time steps: 59, Penalties: 18, Reward: -200

Training episode 9594
Time steps: 1334, Penalties: 422, Reward: -5111

Training episode 9595
Time steps: 686, Penalties: 205, Reward: -2510

Training episode 9596
Time 


Training episode 9712
Time steps: 330, Penalties: 101, Reward: -1218

Training episode 9713
Time steps: 1556, Penalties: 475, Reward: -5810

Training episode 9714
Time steps: 262, Penalties: 67, Reward: -844

Training episode 9715
Time steps: 628, Penalties: 196, Reward: -2371

Training episode 9716
Time steps: 896, Penalties: 281, Reward: -3404

Training episode 9717
Time steps: 757, Penalties: 267, Reward: -3139

Training episode 9718
Time steps: 556, Penalties: 186, Reward: -2209

Training episode 9719
Time steps: 698, Penalties: 203, Reward: -2504

Training episode 9720
Time steps: 312, Penalties: 95, Reward: -1146

Training episode 9721
Time steps: 326, Penalties: 101, Reward: -1214

Training episode 9722
Time steps: 189, Penalties: 68, Reward: -780

Training episode 9723
Time steps: 631, Penalties: 193, Reward: -2347

Training episode 9724
Time steps: 1875, Penalties: 605, Reward: -7299

Training episode 9725
Time steps: 315, Penalties: 99, Reward: -1185

Training episode 9726
T


Training episode 9836
Time steps: 2547, Penalties: 885, Reward: -10491

Training episode 9837
Time steps: 1025, Penalties: 334, Reward: -4010

Training episode 9838
Time steps: 1748, Penalties: 557, Reward: -6740

Training episode 9839
Time steps: 454, Penalties: 145, Reward: -1738

Training episode 9840
Time steps: 1848, Penalties: 562, Reward: -6885

Training episode 9841
Time steps: 366, Penalties: 113, Reward: -1362

Training episode 9842
Time steps: 481, Penalties: 153, Reward: -1837

Training episode 9843
Time steps: 272, Penalties: 88, Reward: -1043

Training episode 9844
Time steps: 2063, Penalties: 645, Reward: -7847

Training episode 9845
Time steps: 997, Penalties: 310, Reward: -3766

Training episode 9846
Time steps: 1625, Penalties: 515, Reward: -6239

Training episode 9847
Time steps: 763, Penalties: 242, Reward: -2920

Training episode 9848
Time steps: 170, Penalties: 47, Reward: -572

Training episode 9849
Time steps: 1229, Penalties: 417, Reward: -4961

Training episo


Training episode 9956
Time steps: 3145, Penalties: 1080, Reward: -12844

Training episode 9957
Time steps: 560, Penalties: 189, Reward: -2240

Training episode 9958
Time steps: 732, Penalties: 246, Reward: -2925

Training episode 9959
Time steps: 2193, Penalties: 710, Reward: -8562

Training episode 9960
Time steps: 419, Penalties: 138, Reward: -1640

Training episode 9961
Time steps: 1292, Penalties: 375, Reward: -4646

Training episode 9962
Time steps: 642, Penalties: 200, Reward: -2421

Training episode 9963
Time steps: 1227, Penalties: 418, Reward: -4968

Training episode 9964
Time steps: 421, Penalties: 129, Reward: -1561

Training episode 9965
Time steps: 1785, Penalties: 592, Reward: -7092

Training episode 9966
Time steps: 1610, Penalties: 515, Reward: -6224

Training episode 9967
Time steps: 4080, Penalties: 1341, Reward: -16128

Training episode 9968
Time steps: 658, Penalties: 224, Reward: -2653

Training episode 9969
Time steps: 447, Penalties: 149, Reward: -1767

Training

## Evaluation

In [5]:
NUM_EPISODES = 100


def evaluate_agent(q_table, env, num_trials):
    total_epochs, total_penalties = 0, 0

    print("Running episodes...")
    for _ in range(num_trials):
        state = env.reset()
        epochs, num_penalties, reward = 0, 0, 0

        while reward != 20:
            next_action = select_optimal_action(q_table,
                                                state,
                                                env.action_space)
            state, reward, _, _ = env.step(next_action)

            if reward == -10:
                num_penalties += 1

            epochs += 1

        total_penalties += num_penalties
        total_epochs += epochs

    average_time = total_epochs / float(num_trials)
    average_penalties = total_penalties / float(num_trials)
    print("Evaluation results after {} trials".format(num_trials))
    print("Average time steps taken: {}".format(average_time))
    print("Average number of penalties incurred: {}".format(average_penalties))


env = gym.make("Taxi-v3")
with open("q_table.pickle", 'rb') as f:
    q_table = pickle.load(f)
evaluate_agent(q_table, env, NUM_EPISODES)

Running episodes...
Evaluation results after 100 trials
Average time steps taken: 807.75
Average number of penalties incurred: 262.44
