## Taxi-v3 using SARSA, by Andrea Bolla - 4482930

For a learning agent in any Reinforcement Learning algorithm, its policy can be of two types:

    On Policy: the learning agent learns the value function according to the current action derived from the policy currently being used 
    
    Off Policy: the learning agent learns the value function according to the action derived from another policy 

SARSA is an on-policy technique: it uses the action actually performed by the current policy to learn the Q-value.

The SARSA update depends on the current state, the current action, the reward obtained, the next state and the next action. SARSA stands for State-Action-Reward-State-Action, which refers to the tuple (s, a, r, s’, a’).
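To make the on-policy/off-policy difference concrete, here is a minimal sketch comparing the SARSA target with the target of Q-learning, a well-known off-policy method that this notebook does not use. The Q-table values, the discount of 0.5 and the function names are made up for the illustration.

```python
import numpy as np


def sarsa_target(Q, reward, next_state, next_action, gamma):
    # On-policy: bootstrap from the action the current policy actually chose
    return reward + gamma * Q[next_state, next_action]


def q_learning_target(Q, reward, next_state, gamma):
    # Off-policy: bootstrap from the greedy action, whatever the policy does next
    return reward + gamma * np.max(Q[next_state])


# Made-up Q-table: in state 1, action 0 is good and action 1 is bad
Q = np.array([[0.0, 0.0],
              [2.0, -4.0]])

# Suppose the exploring policy picked the bad action 1 in the next state
print(sarsa_target(Q, -1.0, 1, 1, 0.5))    # -3.0
print(q_learning_target(Q, -1.0, 1, 0.5))  # 0.0
```

SARSA's target reflects the exploratory action that was really taken, so the cost of exploring is learned into the Q-values; Q-learning ignores it and always assumes the best next action.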

# Step 1: Importing the required libraries

In [1]:
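import sys
# NB: path of my own site-packages folder, change it to yours (or remove it if gym is already importable)
sys.path.append('/Users/boez/opt/anaconda3/lib/python3.9/site-packages')

import gym            # the code below uses the gym >= 0.26 API: env.reset() returns (obs, info), env.step() returns 5 values
import numpy as np
import random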

sys: used to add the folder containing my libraries to the module search path (NB: change it to yours)

gym: a standard API for reinforcement learning, with a diverse collection of reference environments

numpy: a Python library for working with arrays

random: a module that implements pseudo-random number generators

# Step 2: Building the environment

In [2]:
env = gym.make("Taxi-v3", render_mode="human").env

env.reset()
env.render()

print("Action Space {}".format(env.action_space))

print("State Space {}".format(env.observation_space))

Action Space Discrete(6)
State Space Discrete(500)


env.reset(): Resets the environment and returns a tuple (initial state, info), where the initial state is chosen at random.

env.step(action): Step the environment by one timestep. Returns:
    
    observation: the new state of the environment (an integer between 0 and 499 in Taxi-v3)

    reward: the reward obtained for the action, i.e. how beneficial it was

    done (terminated): True when the passenger has been successfully picked up and dropped off, which ends the episode

    truncated: True if the episode is cut short by a time limit or another reason that is not part of the task MDP

    info: additional diagnostic information, useful for debugging

env.render(): Renders one frame of the environment.

The 6 actions of the action space are numbered 0-5, where:

    0 = south
    1 = north
    2 = east
    3 = west
    4 = pickup
    5 = dropoff

The 500 states correspond to an encoding of the taxi's location (25 cells), the passenger's location (4 coloured squares or inside the taxi, so 5 options) and the destination (4 options): 25 × 5 × 4 = 500.
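A minimal sketch of this encoding, assuming the ordering used by gym's Taxi-v3 implementation (taxi row, taxi column, passenger location, destination) and the location order R, G, Y, B. `encode`, `decode` and `LOCATIONS` are hypothetical helpers, not part of the notebook.

```python
# Hypothetical helpers: the layout ((row * 5 + col) * 5 + passenger) * 4 + destination
# is assumed to match gym's Taxi-v3 encoding.
LOCATIONS = ["R", "G", "Y", "B"]          # passenger index 4 means "in the taxi"


def encode(taxi_row, taxi_col, passenger, destination):
    """Pack the four components into a single state number in [0, 500)."""
    return ((taxi_row * 5 + taxi_col) * 5 + passenger) * 4 + destination


def decode(state):
    """Unpack a state number into (taxi_row, taxi_col, passenger, destination)."""
    state, destination = divmod(state, 4)
    state, passenger = divmod(state, 5)
    taxi_row, taxi_col = divmod(state, 5)
    return taxi_row, taxi_col, passenger, destination


row, col, passenger, destination = decode(123)
print(row, col, LOCATIONS[passenger], LOCATIONS[destination])   # 1 1 R B
```

Under this assumption, state 123 (inspected with env.P[123] below) is the taxi at row 1, column 1, with the passenger waiting at R and destination B; the next states 223, 23 and 103 listed there decode to the same passenger and destination with the taxi one cell south, north and west.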

In [3]:
env.P[123]

{0: [(1.0, 223, -1, False)],
 1: [(1.0, 23, -1, False)],
 2: [(1.0, 123, -1, False)],
 3: [(1.0, 103, -1, False)],
 4: [(1.0, 123, -10, False)],
 5: [(1.0, 123, -10, False)]}

The keys 0-5 correspond to the actions (south, north, east, west, pickup, dropoff) that the taxi can perform in the current state (state 123). Each value is a list of (probability, nextstate, reward, done) tuples.

In this environment the probability is always 1.0: transitions are deterministic.

The nextstate is the state we would be in if we took the action at this index of the dict.

All movement actions have a reward of -1, while the pickup/dropoff actions have a reward of -10 in this particular state, because picking up or dropping off is illegal here.

If the taxi were carrying the passenger and standing on the right destination, the dropoff action would give a reward of 20.

done tells us when we have successfully dropped off a passenger at the right location.

Each successful dropoff ends an episode.
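The tuples can be read programmatically. Below is a small sketch that works on a copy of the env.P[123] output above, so it runs without gym; the tuple layout (probability, next_state, reward, done) is taken from that output, and `P_123`, `ACTIONS` and `describe` are hypothetical names.

```python
# Transition table of state 123, copied from the env.P[123] output above.
# Each entry is action -> [(probability, next_state, reward, done)].
P_123 = {
    0: [(1.0, 223, -1, False)],
    1: [(1.0, 23, -1, False)],
    2: [(1.0, 123, -1, False)],
    3: [(1.0, 103, -1, False)],
    4: [(1.0, 123, -10, False)],
    5: [(1.0, 123, -10, False)],
}
ACTIONS = ["south", "north", "east", "west", "pickup", "dropoff"]


def describe(state, transitions):
    """Return one readable line per (action, outcome) pair."""
    lines = []
    for action, outcomes in sorted(transitions.items()):
        for prob, next_state, reward, done in outcomes:
            if action < 4 and next_state == state:
                effect = "did not move (wall or edge of the grid)"
            elif action >= 4 and reward == -10:
                effect = "illegal pickup/dropoff"
            elif done:
                effect = "passenger delivered, episode ends"
            else:
                effect = f"goes to state {next_state}"
            lines.append(f"{ACTIONS[action]:>7}: p={prob}, reward={reward}, {effect}")
    return lines


for line in describe(123, P_123):
    print(line)
```

A movement action whose next state equals the current state (east, here) shows that the taxi is blocked.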

# Step 3: Initializing different parameters

In [4]:
# Defining parameters
epsilon = 1     # exploration rate: 1 = total exploration and no exploitation
alpha = 0.3     # learning rate
gamma = 0.95    # discount factor

# Training parameters
max_episodes = 100000  # number of episodes to use for training
max_steps = 100        # maximum number of steps per episode

# Initializing the Q-table 500x6 (one row per state, one column per action)
Q = np.zeros((env.observation_space.n, env.action_space.n))

The SARSA update rule uses the current state, the current action, the reward obtained, the next state and the next action:

$$Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha \left[ r_{t+1} + \gamma \, Q(s_{t+1},a_{t+1}) - Q(s_t,a_t) \right]$$

    
Where: 
     
    ϵ (epsilon) is the parameter that chooses between exploration (choosing a random action) and exploitation (choosing the action with the highest learned Q-value).
    
    α (alpha) is the learning rate: the extent to which our Q-values are updated at every iteration.
    
    γ (gamma) is the discount factor: it determines how much importance we give to future rewards. A discount factor close to 1 makes the agent value long-term reward, while a discount factor of 0 makes it consider only the immediate reward (myopic behaviour).
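As a worked illustration of one update with the values chosen above (α = 0.3, γ = 0.95), assume hypothetical current estimates Q(s_t,a_t) = -0.3 and Q(s_{t+1},a_{t+1}) = -0.5, and a movement reward r_{t+1} = -1:

$$
\begin{aligned}
\text{target} &= r_{t+1} + \gamma \, Q(s_{t+1},a_{t+1}) = -1 + 0.95 \cdot (-0.5) = -1.475 \\
\text{TD error} &= \text{target} - Q(s_t,a_t) = -1.475 - (-0.3) = -1.175 \\
Q(s_t,a_t) &\leftarrow -0.3 + 0.3 \cdot (-1.175) = -0.6525
\end{aligned}
$$

The estimate moves 30% (α) of the way from its old value towards the target.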

# Step 4: Defining functions for the learning process

In [5]:
# Function to choose the next action (epsilon-greedy policy)
def choose_action(state):

    if np.random.uniform(0, 1) < epsilon:

        action = env.action_space.sample()   # explore: random action

    else:

        action = np.argmax(Q[state, :])      # exploit: best known action (ties go to the lowest index)

    return action

# Function to update the Q-value with the SARSA rule
def update(state, state2, reward, action, action2):

    predict = Q[state, action]                     # current estimate Q(s, a)
    target = reward + gamma * Q[state2, action2]   # r + gamma * Q(s', a')

    Q[state, action] = Q[state, action] + alpha * (target - predict)

choose_action() lets the agent choose the next action. It depends on a random number and on the value of epsilon: if the random number is smaller than epsilon, the agent explores (random action); otherwise it exploits, choosing the action with the highest value in the Q-table.

update() updates the Q-table following the SARSA equation. It is called after every step, whether the action was chosen by exploration or by exploitation.
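The two functions above read the globals `Q`, `epsilon`, `alpha`, `gamma` and `env`. A minimal sketch of the same logic with everything passed explicitly is shown below, which makes it testable without gym. The names, the NumPy `default_rng` generator and the `terminal` flag (which drops the bootstrap term at the end of an episode, as standard SARSA does) are assumptions of this sketch, not part of the notebook.

```python
import numpy as np


def choose_action(Q, state, epsilon, rng):
    """Epsilon-greedy action selection on the row Q[state]."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))   # explore: uniform random action
    return int(np.argmax(Q[state]))            # exploit: first action with the highest Q-value


def sarsa_update(Q, state, action, reward, next_state, next_action,
                 alpha, gamma, terminal=False):
    """In-place SARSA update of Q[state, action]; returns the new value."""
    # At the end of an episode there is no next action to bootstrap from
    bootstrap = 0.0 if terminal else gamma * Q[next_state, next_action]
    Q[state, action] += alpha * (reward + bootstrap - Q[state, action])
    return Q[state, action]


rng = np.random.default_rng(0)
Q = np.zeros((500, 6))
a = choose_action(Q, 0, epsilon=1.0, rng=rng)   # epsilon = 1: always explores
print(sarsa_update(Q, 0, a, 20, 1, 0, alpha=0.3, gamma=0.95, terminal=True))  # 6.0
```

Passing the generator explicitly also makes runs reproducible when it is seeded.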

# Step 5: Training the learning agent

In [6]:
# Initializing parameters
score = 0   # total reward of the last episode
goal = 0    # number of successful drop-offs so far


# Starting the SARSA learning
for episode in range(max_episodes):

    # Print the progress (NB: printed before this episode runs,
    # so "Score" is the score of the previous episode)
    print("Episode: ", episode)
    print("Goal reached: ", goal)
    print("Score: ", score)
    print(" ")

    # env.reset() returns (observation, info): keep only the observation
    state1, info = env.reset()
    action1 = choose_action(state1)

    # Reset parameters
    score = 0
    done = False

    # Decreasing epsilon, down to a floor of 0.05
    epsilon -= 0.005

    if epsilon < 0.05:
        epsilon = 0.05    # epsilon min: mostly exploitation (95%)


    for s in range(max_steps):

        # Rendering (drawing every step in a window slows training down considerably)
        env.render()

        # Get the next state
        state2, reward, done, truncated, info = env.step(action1)

        # Choose the next action with the same epsilon-greedy policy (on-policy)
        action2 = choose_action(state2)

        # Update the Q-value with the tuple (s, a, r, s', a')
        update(state1, state2, reward, action1, action2)

        state1 = state2
        action1 = action2

        # Updating parameters
        score += reward

        if reward == 20:      # successful drop-off
            goal += 1

        # Stop at the end of the episode
        if done or truncated:
            break

Episode:  0
Goal reached:  0
Score:  0
 
Episode:  1
Goal reached:  0
Score:  -415
 
Episode:  2
Goal reached:  0
Score:  -433
 
Episode:  3
Goal reached:  0
Score:  -433
 
Episode:  4
Goal reached:  0
Score:  -397
 
Episode:  5
Goal reached:  0
Score:  -379
 
Episode:  6
Goal reached:  0
Score:  -442
 
Episode:  7
Goal reached:  0
Score:  -397
 
Episode:  8
Goal reached:  0
Score:  -361
 
Episode:  9
Goal reached:  0
Score:  -415
 
Episode:  10
Goal reached:  0
Score:  -424
 
Episode:  11
Goal reached:  0
Score:  -298
 
Episode:  12
Goal reached:  0
Score:  -352
 
Episode:  13
Goal reached:  0
Score:  -406
 
Episode:  14
Goal reached:  0
Score:  -361
 
Episode:  15
Goal reached:  0
Score:  -334
 
Episode:  16
Goal reached:  0
Score:  -298
 
Episode:  17
Goal reached:  0
Score:  -388
 
Episode:  18
Goal reached:  0
Score:  -388
 
Episode:  19
Goal reached:  0
Score:  -361
 
Episode:  20
Goal reached:  0
Score:  -370
 
Episode:  21
Goal reached:  0
Score:  -316
 
...

Episode:  181
Goal reached:  11
Score:  -56
 
Episode:  182
Goal reached:  11
Score:  -109
 
Episode:  183
Goal reached:  12
Score:  -95
 
Episode:  184
Goal reached:  12
Score:  -136
 
Episode:  185
Goal reached:  13
Score:  -56
 
Episode:  186
Goal reached:  13
Score:  -136
 
Episode:  187
Goal reached:  14
Score:  -91
 
Episode:  188
Goal reached:  15
Score:  -93
 
Episode:  189
Goal reached:  15
Score:  -136
 
Episode:  190
Goal reached:  16
Score:  -91
 
Episode:  191
Goal reached:  17
Score:  3
 
Episode:  192
Goal reached:  18
Score:  -63
 
Episode:  193
Goal reached:  18
Score:  -118
 
Episode:  194
Goal reached:  18
Score:  -118
 
Episode:  195
Goal reached:  18
Score:  -100
 
Episode:  196
Goal reached:  19
Score:  -23
 
Episode:  197
Goal reached:  19
Score:  -109
 
Episode:  198
Goal reached:  20
Score:  -36
 
Episode:  199
Goal reached:  21
Score:  -11
 
Episode:  200
Goal reached:  21
Score:  -118
 
Episode:  201
Goal reached:  22
Score:  -50
 
...

Episode:  358
Goal reached:  141
Score:  2
 
Episode:  359
Goal reached:  142
Score:  -27
 
Episode:  360
Goal reached:  143
Score:  -25
 
Episode:  361
Goal reached:  144
Score:  -29
 
Episode:  362
Goal reached:  145
Score:  7
 
Episode:  363
Goal reached:  146
Score:  9
 
Episode:  364
Goal reached:  147
Score:  -7
 
Episode:  365
Goal reached:  148
Score:  -24
 
Episode:  366
Goal reached:  149
Score:  -10
 
Episode:  367
Goal reached:  150
Score:  12
 
Episode:  368
Goal reached:  151
Score:  -21
 
Episode:  369
Goal reached:  152
Score:  -14
 
Episode:  370
Goal reached:  153
Score:  -89
 
Episode:  371
Goal reached:  154
Score:  7
 
Episode:  372
Goal reached:  154
Score:  -109
 
Episode:  373
Goal reached:  155
Score:  8
 
Episode:  374
Goal reached:  156
Score:  -85
 
Episode:  375
Goal reached:  157
Score:  -2
 
Episode:  376
Goal reached:  158
Score:  12
 
Episode:  377
Goal reached:  159
Score:  -49
 
Episode:  378
Goal reached:  160
Score:  -35
 
...

Episode:  535
Goal reached:  314
Score:  6
 
Episode:  536
Goal reached:  315
Score:  -13
 
Episode:  537
Goal reached:  316
Score:  5
 
Episode:  538
Goal reached:  317
Score:  4
 
Episode:  539
Goal reached:  318
Score:  -1
 
Episode:  540
Goal reached:  319
Score:  10
 
Episode:  541
Goal reached:  320
Score:  -20
 
Episode:  542
Goal reached:  321
Score:  6
 
Episode:  543
Goal reached:  322
Score:  -3
 
Episode:  544
Goal reached:  323
Score:  10
 
Episode:  545
Goal reached:  324
Score:  -27
 
Episode:  546
Goal reached:  325
Score:  -19
 
Episode:  547
Goal reached:  326
Score:  7
 
Episode:  548
Goal reached:  327
Score:  6
 
Episode:  549
Goal reached:  328
Score:  9
 
Episode:  550
Goal reached:  329
Score:  5
 
Episode:  551
Goal reached:  330
Score:  -13
 
Episode:  552
Goal reached:  331
Score:  5
 
Episode:  553
Goal reached:  332
Score:  -4
 
Episode:  554
Goal reached:  333
Score:  -7
 
Episode:  555
Goal reached:  334
Score:  13
 
...

Episode:  714
Goal reached:  493
Score:  4
 
Episode:  715
Goal reached:  494
Score:  7
 
Episode:  716
Goal reached:  495
Score:  12
 
Episode:  717
Goal reached:  496
Score:  7
 
Episode:  718
Goal reached:  497
Score:  9
 
Episode:  719
Goal reached:  498
Score:  -23
 
Episode:  720
Goal reached:  499
Score:  -9
 
Episode:  721
Goal reached:  500
Score:  4
 
Episode:  722
Goal reached:  501
Score:  -9
 
Episode:  723
Goal reached:  502
Score:  1
 
Episode:  724
Goal reached:  503
Score:  8
 
Episode:  725
Goal reached:  504
Score:  -27
 
Episode:  726
Goal reached:  505
Score:  1
 
Episode:  727
Goal reached:  506
Score:  6
 
Episode:  728
Goal reached:  507
Score:  8
 
Episode:  729
Goal reached:  508
Score:  -3
 
Episode:  730
Goal reached:  509
Score:  6
 
Episode:  731
Goal reached:  510
Score:  -7
 
Episode:  732
Goal reached:  511
Score:  -6
 
Episode:  733
Goal reached:  512
Score:  4
 
Episode:  734
Goal reached:  513
Score:  13
 
Episode:  735
Goal reached:  514
Score:  10


Episode:  894
Goal reached:  673
Score:  14
 
Episode:  895
Goal reached:  674
Score:  -14
 
Episode:  896
Goal reached:  675
Score:  7
 
Episode:  897
Goal reached:  676
Score:  12
 
Episode:  898
Goal reached:  677
Score:  1
 
Episode:  899
Goal reached:  678
Score:  -9
 
Episode:  900
Goal reached:  679
Score:  -11
 
Episode:  901
Goal reached:  680
Score:  6
 
Episode:  902
Goal reached:  681
Score:  -37
 
Episode:  903
Goal reached:  682
Score:  7
 
Episode:  904
Goal reached:  683
Score:  6
 
Episode:  905
Goal reached:  684
Score:  -50
 
Episode:  906
Goal reached:  685
Score:  -82
 
Episode:  907
Goal reached:  686
Score:  11
 
Episode:  908
Goal reached:  687
Score:  8
 
Episode:  909
Goal reached:  688
Score:  1
 
Episode:  910
Goal reached:  689
Score:  8
 
Episode:  911
Goal reached:  690
Score:  6
 
Episode:  912
Goal reached:  691
Score:  -14
 
Episode:  913
Goal reached:  692
Score:  8
 
Episode:  914
Goal reached:  693
Score:  4
 
...

Episode:  1073
Goal reached:  852
Score:  6
 
Episode:  1074
Goal reached:  853
Score:  0
 
Episode:  1075
Goal reached:  854
Score:  11
 
Episode:  1076
Goal reached:  855
Score:  10
 
Episode:  1077
Goal reached:  856
Score:  1
 
Episode:  1078
Goal reached:  857
Score:  -8
 
Episode:  1079
Goal reached:  858
Score:  8
 
Episode:  1080
Goal reached:  859
Score:  2
 
Episode:  1081
Goal reached:  860
Score:  11
 
Episode:  1082
Goal reached:  861
Score:  13
 
Episode:  1083
Goal reached:  862
Score:  -4
 
Episode:  1084
Goal reached:  863
Score:  0
 
Episode:  1085
Goal reached:  864
Score:  7
 
Episode:  1086
Goal reached:  865
Score:  -3
 
Episode:  1087
Goal reached:  866
Score:  6
 
Episode:  1088
Goal reached:  867
Score:  9
 
Episode:  1089
Goal reached:  868
Score:  6
 
Episode:  1090
Goal reached:  869
Score:  6
 
Episode:  1091
Goal reached:  870
Score:  -7
 
Episode:  1092
Goal reached:  871
Score:  11
 
Episode:  1093
Goal reached:  872
Score:  -2
 
...

Episode:  1249
Goal reached:  1028
Score:  -1
 
Episode:  1250
Goal reached:  1029
Score:  11
 
Episode:  1251
Goal reached:  1030
Score:  1
 
Episode:  1252
Goal reached:  1031
Score:  7
 
Episode:  1253
Goal reached:  1032
Score:  1
 
Episode:  1254
Goal reached:  1033
Score:  10
 
Episode:  1255
Goal reached:  1034
Score:  11
 
Episode:  1256
Goal reached:  1035
Score:  7
 
Episode:  1257
Goal reached:  1036
Score:  3
 
Episode:  1258
Goal reached:  1037
Score:  6
 
Episode:  1259
Goal reached:  1038
Score:  9
 
Episode:  1260
Goal reached:  1039
Score:  6
 
Episode:  1261
Goal reached:  1040
Score:  7
 
Episode:  1262
Goal reached:  1041
Score:  8
 
Episode:  1263
Goal reached:  1042
Score:  -4
 
Episode:  1264
Goal reached:  1043
Score:  11
 
Episode:  1265
Goal reached:  1044
Score:  11
 
Episode:  1266
Goal reached:  1045
Score:  -15
 
Episode:  1267
Goal reached:  1046
Score:  13
 
Episode:  1268
Goal reached:  1047
Score:  4
 
Episode:  1269
Goal reached:  1048
Score:  8
 
...

Episode:  1423
Goal reached:  1202
Score:  8
 
Episode:  1424
Goal reached:  1203
Score:  12
 
Episode:  1425
Goal reached:  1204
Score:  10
 
Episode:  1426
Goal reached:  1205
Score:  5
 
Episode:  1427
Goal reached:  1206
Score:  -7
 
Episode:  1428
Goal reached:  1207
Score:  9
 
Episode:  1429
Goal reached:  1208
Score:  7
 
Episode:  1430
Goal reached:  1209
Score:  -5
 
Episode:  1431
Goal reached:  1210
Score:  10
 
Episode:  1432
Goal reached:  1211
Score:  9
 
Episode:  1433
Goal reached:  1212
Score:  8
 
Episode:  1434
Goal reached:  1213
Score:  -23
 
Episode:  1435
Goal reached:  1214
Score:  10
 
Episode:  1436
Goal reached:  1215
Score:  4
 
Episode:  1437
Goal reached:  1216
Score:  -5
 
Episode:  1438
Goal reached:  1217
Score:  9
 
Episode:  1439
Goal reached:  1218
Score:  6
 
Episode:  1440
Goal reached:  1219
Score:  10
 
Episode:  1441
Goal reached:  1220
Score:  7
 
Episode:  1442
Goal reached:  1221
Score:  4
 
Episode:  1443
Goal reached:  1222
Score:  10
 
...

Episode:  1597
Goal reached:  1376
Score:  8
 
Episode:  1598
Goal reached:  1377
Score:  6
 
Episode:  1599
Goal reached:  1378
Score:  2
 
Episode:  1600
Goal reached:  1379
Score:  7
 
Episode:  1601
Goal reached:  1380
Score:  13
 
Episode:  1602
Goal reached:  1381
Score:  9
 
Episode:  1603
Goal reached:  1382
Score:  8
 
Episode:  1604
Goal reached:  1383
Score:  8
 
Episode:  1605
Goal reached:  1384
Score:  10
 
Episode:  1606
Goal reached:  1385
Score:  10
 
Episode:  1607
Goal reached:  1386
Score:  9
 
Episode:  1608
Goal reached:  1387
Score:  7
 
Episode:  1609
Goal reached:  1388
Score:  -3
 
Episode:  1610
Goal reached:  1389
Score:  -3
 
Episode:  1611
Goal reached:  1390
Score:  7
 
Episode:  1612
Goal reached:  1391
Score:  8
 
Episode:  1613
Goal reached:  1392
Score:  3
 
Episode:  1614
Goal reached:  1393
Score:  -3
 
Episode:  1615
Goal reached:  1394
Score:  6
 
Episode:  1616
Goal reached:  1395
Score:  7
 
Episode:  1617
Goal reached:  1396
Score:  14
 
...

Episode:  1771
Goal reached:  1550
Score:  12
 
Episode:  1772
Goal reached:  1551
Score:  6
 
Episode:  1773
Goal reached:  1552
Score:  -4
 
Episode:  1774
Goal reached:  1553
Score:  -5
 
Episode:  1775
Goal reached:  1554
Score:  6
 
Episode:  1776
Goal reached:  1555
Score:  10
 
Episode:  1777
Goal reached:  1556
Score:  -1
 
Episode:  1778
Goal reached:  1557
Score:  5
 
Episode:  1779
Goal reached:  1558
Score:  6
 
Episode:  1780
Goal reached:  1559
Score:  10
 
Episode:  1781
Goal reached:  1560
Score:  9
 
Episode:  1782
Goal reached:  1561
Score:  0
 
Episode:  1783
Goal reached:  1562
Score:  10
 
Episode:  1784
Goal reached:  1563
Score:  7
 
Episode:  1785
Goal reached:  1564
Score:  -7
 
Episode:  1786
Goal reached:  1565
Score:  -10
 
Episode:  1787
Goal reached:  1566
Score:  7
 
Episode:  1788
Goal reached:  1567
Score:  6
 
Episode:  1789
Goal reached:  1568
Score:  10
 
Episode:  1790
Goal reached:  1569
Score:  4
 
Episode:  1791
Goal reached:  1570
Score:  11
 
...

Episode:  1944
Goal reached:  1723
Score:  11
 
Episode:  1945
Goal reached:  1724
Score:  0
 
Episode:  1946
Goal reached:  1725
Score:  6
 
Episode:  1947
Goal reached:  1726
Score:  4
 
Episode:  1948
Goal reached:  1727
Score:  -4
 
Episode:  1949
Goal reached:  1728
Score:  7
 
Episode:  1950
Goal reached:  1729
Score:  6
 
Episode:  1951
Goal reached:  1730
Score:  1
 
Episode:  1952
Goal reached:  1731
Score:  8
 
Episode:  1953
Goal reached:  1732
Score:  8
 
Episode:  1954
Goal reached:  1733
Score:  0
 
Episode:  1955
Goal reached:  1734
Score:  3
 
Episode:  1956
Goal reached:  1735
Score:  7
 
Episode:  1957
Goal reached:  1736
Score:  7
 
Episode:  1958
Goal reached:  1737
Score:  4
 
Episode:  1959
Goal reached:  1738
Score:  10
 
Episode:  1960
Goal reached:  1739
Score:  1
 
Episode:  1961
Goal reached:  1740
Score:  8
 
Episode:  1962
Goal reached:  1741
Score:  10
 
Episode:  1963
Goal reached:  1742
Score:  9
 
Episode:  1964
Goal reached:  1743
Score:  7
 
...

Episode:  2117
Goal reached:  1896
Score:  -6
 
Episode:  2118
Goal reached:  1897
Score:  -13
 
Episode:  2119
Goal reached:  1898
Score:  7
 
Episode:  2120
Goal reached:  1899
Score:  7
 
Episode:  2121
Goal reached:  1900
Score:  1
 
Episode:  2122
Goal reached:  1901
Score:  10
 
Episode:  2123
Goal reached:  1902
Score:  0
 
Episode:  2124
Goal reached:  1903
Score:  10
 
Episode:  2125
Goal reached:  1904
Score:  10
 
Episode:  2126
Goal reached:  1905
Score:  3
 
Episode:  2127
Goal reached:  1906
Score:  11
 
Episode:  2128
Goal reached:  1907
Score:  -1
 
Episode:  2129
Goal reached:  1908
Score:  -14
 
Episode:  2130
Goal reached:  1909
Score:  6
 
Episode:  2131
Goal reached:  1910
Score:  7
 
Episode:  2132
Goal reached:  1911
Score:  -1
 
Episode:  2133
Goal reached:  1912
Score:  10
 
Episode:  2134
Goal reached:  1913
Score:  6
 
Episode:  2135
Goal reached:  1914
Score:  6
 
Episode:  2136
Goal reached:  1915
Score:  9
 
Episode:  2137
Goal reached:  1916
Score:  5
 
...

Episode:  2290
Goal reached:  2069
Score:  9
 
Episode:  2291
Goal reached:  2070
Score:  8
 
Episode:  2292
Goal reached:  2071
Score:  10
 
Episode:  2293
Goal reached:  2072
Score:  4
 
Episode:  2294
Goal reached:  2073
Score:  7
 
Episode:  2295
Goal reached:  2074
Score:  6
 
Episode:  2296
Goal reached:  2075
Score:  8
 
Episode:  2297
Goal reached:  2076
Score:  2
 
Episode:  2298
Goal reached:  2077
Score:  8
 
Episode:  2299
Goal reached:  2078
Score:  7
 
Episode:  2300
Goal reached:  2079
Score:  7
 
Episode:  2301
Goal reached:  2080
Score:  5
 
Episode:  2302
Goal reached:  2081
Score:  -1
 
Episode:  2303
Goal reached:  2082
Score:  0
 
Episode:  2304
Goal reached:  2083
Score:  -12
 
Episode:  2305
Goal reached:  2084
Score:  8
 
Episode:  2306
Goal reached:  2085
Score:  9
 
Episode:  2307
Goal reached:  2086
Score:  6
 
Episode:  2308
Goal reached:  2087
Score:  6
 
Episode:  2309
Goal reached:  2088
Score:  7
 
Episode:  2310
Goal reached:  2089
Score:  14
 
...

Episode:  2463
Goal reached:  2242
Score:  7
 
Episode:  2464
Goal reached:  2243
Score:  -3
 
Episode:  2465
Goal reached:  2244
Score:  6
 
Episode:  2466
Goal reached:  2245
Score:  8
 
Episode:  2467
Goal reached:  2246
Score:  8
 
Episode:  2468
Goal reached:  2247
Score:  -7
 
Episode:  2469
Goal reached:  2248
Score:  8
 
Episode:  2470
Goal reached:  2249
Score:  7
 
Episode:  2471
Goal reached:  2250
Score:  2
 
Episode:  2472
Goal reached:  2251
Score:  9
 
Episode:  2473
Goal reached:  2252
Score:  5
 
Episode:  2474
Goal reached:  2253
Score:  -4
 
Episode:  2475
Goal reached:  2254
Score:  -6
 
Episode:  2476
Goal reached:  2255
Score:  12
 
Episode:  2477
Goal reached:  2256
Score:  10
 
Episode:  2478
Goal reached:  2257
Score:  4
 
Episode:  2479
Goal reached:  2258
Score:  5
 
Episode:  2480
Goal reached:  2259
Score:  5
 
Episode:  2481
Goal reached:  2260
Score:  11
 
Episode:  2482
Goal reached:  2261
Score:  -6
 
Episode:  2483
Goal reached:  2262
Score:  7
 
...

Episode:  2637
Goal reached:  2416
Score:  11
 
Episode:  2638
Goal reached:  2417
Score:  4
 
Episode:  2639
Goal reached:  2418
Score:  4
 
Episode:  2640
Goal reached:  2419
Score:  13
 
Episode:  2641
Goal reached:  2420
Score:  6
 
Episode:  2642
Goal reached:  2421
Score:  4
 
Episode:  2643
Goal reached:  2422
Score:  8
 
Episode:  2644
Goal reached:  2423
Score:  5
 
Episode:  2645
Goal reached:  2424
Score:  3
 
Episode:  2646
Goal reached:  2425
Score:  6
 
Episode:  2647
Goal reached:  2426
Score:  8
 
Episode:  2648
Goal reached:  2427
Score:  7
 
Episode:  2649
Goal reached:  2428
Score:  3
 
Episode:  2650
Goal reached:  2429
Score:  7
 
Episode:  2651
Goal reached:  2430
Score:  9
 
Episode:  2652
Goal reached:  2431
Score:  9
 
Episode:  2653
Goal reached:  2432
Score:  1
 
Episode:  2654
Goal reached:  2433
Score:  -3
 
Episode:  2655
Goal reached:  2434
Score:  9
 
Episode:  2656
Goal reached:  2435
Score:  5
 
Episode:  2657
Goal reached:  2436
Score:  12
 
...

Episode:  2810
Goal reached:  2589
Score:  9
 
Episode:  2811
Goal reached:  2590
Score:  9
 
Episode:  2812
Goal reached:  2591
Score:  9
 
Episode:  2813
Goal reached:  2592
Score:  9
 
Episode:  2814
Goal reached:  2593
Score:  5
 
Episode:  2815
Goal reached:  2594
Score:  -7
 
Episode:  2816
Goal reached:  2595
Score:  3
 
Episode:  2817
Goal reached:  2596
Score:  14
 
Episode:  2818
Goal reached:  2597
Score:  9
 
Episode:  2819
Goal reached:  2598
Score:  1
 
Episode:  2820
Goal reached:  2599
Score:  12
 
Episode:  2821
Goal reached:  2600
Score:  5
 
Episode:  2822
Goal reached:  2601
Score:  11
 
Episode:  2823
Goal reached:  2602
Score:  5
 
Episode:  2824
Goal reached:  2603
Score:  8
 
Episode:  2825
Goal reached:  2604
Score:  8
 
Episode:  2826
Goal reached:  2605
Score:  6
 
Episode:  2827
Goal reached:  2606
Score:  9
 
Episode:  2828
Goal reached:  2607
Score:  1
 
Episode:  2829
Goal reached:  2608
Score:  0
 
Episode:  2830
Goal reached:  2609
Score:  -5
 
...

Episode:  2983
Goal reached:  2762
Score:  8
 
Episode:  2984
Goal reached:  2763
Score:  4
 
Episode:  2985
Goal reached:  2764
Score:  9
 
Episode:  2986
Goal reached:  2765
Score:  -7
 
Episode:  2987
Goal reached:  2766
Score:  5
 
Episode:  2988
Goal reached:  2767
Score:  8
 
Episode:  2989
Goal reached:  2768
Score:  7
 
Episode:  2990
Goal reached:  2769
Score:  8
 
Episode:  2991
Goal reached:  2770
Score:  7
 
Episode:  2992
Goal reached:  2771
Score:  3
 
Episode:  2993
Goal reached:  2772
Score:  10
 
Episode:  2994
Goal reached:  2773
Score:  15
 
Episode:  2995
Goal reached:  2774
Score:  10
 
Episode:  2996
Goal reached:  2775
Score:  -7
 
Episode:  2997
Goal reached:  2776
Score:  -3
 
Episode:  2998
Goal reached:  2777
Score:  4
 
Episode:  2999
Goal reached:  2778
Score:  -3
 
Episode:  3000
Goal reached:  2779
Score:  -8
 
Episode:  3001
Goal reached:  2780
Score:  8
 
Episode:  3002
Goal reached:  2781
Score:  -3
 
Episode:  3003
Goal reached:  2782
Score:  -2
 
...

Episode:  3156
Goal reached:  2935
Score:  -3
 
Episode:  3157
Goal reached:  2936
Score:  4
 
Episode:  3158
Goal reached:  2937
Score:  10
 
Episode:  3159
Goal reached:  2938
Score:  6
 
Episode:  3160
Goal reached:  2939
Score:  3
 
Episode:  3161
Goal reached:  2940
Score:  -2
 
Episode:  3162
Goal reached:  2941
Score:  4
 
Episode:  3163
Goal reached:  2942
Score:  8
 
Episode:  3164
Goal reached:  2943
Score:  13
 
Episode:  3165
Goal reached:  2944
Score:  6
 
Episode:  3166
Goal reached:  2945
Score:  -17
 
Episode:  3167
Goal reached:  2946
Score:  5
 
Episode:  3168
Goal reached:  2947
Score:  7
 
Episode:  3169
Goal reached:  2948
Score:  3
 
Episode:  3170
Goal reached:  2949
Score:  4
 
Episode:  3171
Goal reached:  2950
Score:  3
 
Episode:  3172
Goal reached:  2951
Score:  4
 
Episode:  3173
Goal reached:  2952
Score:  7
 
Episode:  3174
Goal reached:  2953
Score:  8
 
Episode:  3175
Goal reached:  2954
Score:  5
 
Episode:  3176
Goal reached:  2955
Score:  -1
 
...

Episode:  3330
Goal reached:  3109
Score:  6
 
Episode:  3331
Goal reached:  3110
Score:  5
 
Episode:  3332
Goal reached:  3111
Score:  10
 
Episode:  3333
Goal reached:  3112
Score:  9
 
Episode:  3334
Goal reached:  3113
Score:  12
 
Episode:  3335
Goal reached:  3114
Score:  10
 
Episode:  3336
Goal reached:  3115
Score:  13
 
Episode:  3337
Goal reached:  3116
Score:  -3
 
Episode:  3338
Goal reached:  3117
Score:  5
 
Episode:  3339
Goal reached:  3118
Score:  4
 
Episode:  3340
Goal reached:  3119
Score:  -2
 
Episode:  3341
Goal reached:  3120
Score:  -2
 
Episode:  3342
Goal reached:  3121
Score:  -4
 
Episode:  3343
Goal reached:  3122
Score:  7
 
Episode:  3344
Goal reached:  3123
Score:  9
 
Episode:  3345
Goal reached:  3124
Score:  15
 
Episode:  3346
Goal reached:  3125
Score:  -2
 
Episode:  3347
Goal reached:  3126
Score:  8
 
Episode:  3348
Goal reached:  3127
Score:  10
 
Episode:  3349
Goal reached:  3128
Score:  10
 
Episode:  3350
Goal reached:  3129
Score:  6
 
...

Episode:  3503
Goal reached:  3282
Score:  5
 
Episode:  3504
Goal reached:  3283
Score:  4
 
Episode:  3505
Goal reached:  3284
Score:  6
 
Episode:  3506
Goal reached:  3285
Score:  4
 
Episode:  3507
Goal reached:  3286
Score:  4
 
Episode:  3508
Goal reached:  3287
Score:  7
 
Episode:  3509
Goal reached:  3288
Score:  -12
 
Episode:  3510
Goal reached:  3289
Score:  9
 
Episode:  3511
Goal reached:  3290
Score:  -14
 
Episode:  3512
Goal reached:  3291
Score:  6
 
Episode:  3513
Goal reached:  3292
Score:  10
 
Episode:  3514
Goal reached:  3293
Score:  -5
 
Episode:  3515
Goal reached:  3294
Score:  4
 
Episode:  3516
Goal reached:  3295
Score:  7
 
Episode:  3517
Goal reached:  3296
Score:  9
 
Episode:  3518
Goal reached:  3297
Score:  8
 
Episode:  3519
Goal reached:  3298
Score:  -4
 
Episode:  3520
Goal reached:  3299
Score:  10
 
Episode:  3521
Goal reached:  3300
Score:  -1
 
Episode:  3522
Goal reached:  3301
Score:  13
 
Episode:  3523
Goal reached:  3302
Score:  5
 
...

Episode:  3677
Goal reached:  3456
Score:  5
 
Episode:  3678
Goal reached:  3457
Score:  4
 
Episode:  3679
Goal reached:  3458
Score:  11
 
Episode:  3680
Goal reached:  3459
Score:  11
 
Episode:  3681
Goal reached:  3460
Score:  2
 
Episode:  3682
Goal reached:  3461
Score:  4
 
Episode:  3683
Goal reached:  3462
Score:  4
 
Episode:  3684
Goal reached:  3463
Score:  -4
 
Episode:  3685
Goal reached:  3464
Score:  7
 
Episode:  3686
Goal reached:  3465
Score:  4
 
Episode:  3687
Goal reached:  3466
Score:  10
 
Episode:  3688
Goal reached:  3467
Score:  9
 
Episode:  3689
Goal reached:  3468
Score:  6
 
Episode:  3690
Goal reached:  3469
Score:  3
 
Episode:  3691
Goal reached:  3470
Score:  6
 
Episode:  3692
Goal reached:  3471
Score:  3
 
Episode:  3693
Goal reached:  3472
Score:  8
 
Episode:  3694
Goal reached:  3473
Score:  9
 
Episode:  3695
Goal reached:  3474
Score:  4
 
Episode:  3696
Goal reached:  3475
Score:  12
 
Episode:  3697
Goal reached:  3476
Score:  11
 
...

Episode:  3850
Goal reached:  3629
Score:  5
 
Episode:  3851
Goal reached:  3630
Score:  9
 
Episode:  3852
Goal reached:  3631
Score:  10
 
Episode:  3853
Goal reached:  3632
Score:  8
 
Episode:  3854
Goal reached:  3633
Score:  -5
 
Episode:  3855
Goal reached:  3634
Score:  3
 
Episode:  3856
Goal reached:  3635
Score:  6
 
Episode:  3857
Goal reached:  3636
Score:  6
 
Episode:  3858
Goal reached:  3637
Score:  6
 
Episode:  3859
Goal reached:  3638
Score:  4
 
Episode:  3860
Goal reached:  3639
Score:  8
 
Episode:  3861
Goal reached:  3640
Score:  -6
 
Episode:  3862
Goal reached:  3641
Score:  -3
 
Episode:  3863
Goal reached:  3642
Score:  4
 
Episode:  3864
Goal reached:  3643
Score:  5
 
Episode:  3865
Goal reached:  3644
Score:  9
 
Episode:  3866
Goal reached:  3645
Score:  -5
 
Episode:  3867
Goal reached:  3646
Score:  5
 
Episode:  3868
Goal reached:  3647
Score:  9
 
Episode:  3869
Goal reached:  3648
Score:  12
 
Episode:  3870
Goal reached:  3649
Score:  -2
 
...

Episode:  4023
Goal reached:  3802
Score:  6
 
Episode:  4024
Goal reached:  3803
Score:  6
 
Episode:  4025
Goal reached:  3804
Score:  -6
 
Episode:  4026
Goal reached:  3805
Score:  8
 
Episode:  4027
Goal reached:  3806
Score:  12
 
Episode:  4028
Goal reached:  3807
Score:  -4
 
Episode:  4029
Goal reached:  3808
Score:  13
 
Episode:  4030
Goal reached:  3809
Score:  7
 
Episode:  4031
Goal reached:  3810
Score:  8
 
Episode:  4032
Goal reached:  3811
Score:  12
 
Episode:  4033
Goal reached:  3812
Score:  10
 
Episode:  4034
Goal reached:  3813
Score:  7
 
Episode:  4035
Goal reached:  3814
Score:  9
 
Episode:  4036
Goal reached:  3815
Score:  2
 


KeyboardInterrupt: 

To train the agent we use the functions defined in Step 4.

We start with pure exploration to fill the Q-table, but as the episodes go on we want exploitation to increase. To obtain this, epsilon is decreased by 0.005 every episode until it reaches its minimum of 0.05 after 190 episodes (at most 19,000 steps, since each episode is capped at max_steps = 100). From then on the agent exploits 95% of the time and still explores the remaining 5%.
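The schedule can be checked with a short sketch that repeats the decay used in the training loop (start 1.0, step 0.005, floor 0.05). `epsilon_schedule` is a hypothetical helper, and it ignores the detail that the first action of each episode is chosen just before the decrement.

```python
def epsilon_schedule(n_episodes, start=1.0, decay=0.005, eps_min=0.05):
    """Epsilon used during each episode, mirroring the decay in the training loop."""
    epsilon = start
    values = []
    for _ in range(n_episodes):
        epsilon -= decay                 # decreased at the start of every episode
        if epsilon < eps_min:
            epsilon = eps_min            # floor: 95% exploitation, 5% exploration
        values.append(epsilon)
    return values


eps = epsilon_schedule(300)
print(round(eps[99], 3))    # 0.5  (episode 99: half exploration, half exploitation)
first_floor = next(i for i, e in enumerate(eps) if abs(e - 0.05) < 1e-9)
print(first_floor)          # 189  (from episode 189 onward epsilon stays at 0.05)
```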


Each episode requires a different number of steps to complete the task, so we don’t always receive the same score (each step costs -1 point, and a successful drop-off gives +20).

For an episode played optimally, the lowest possible score is 3 (20 - 17): it takes 16 moves + 1 pickup action when the taxi and the passenger start at opposite corners and the drop-off location is the corner where the taxi started (for example taxi at R, passenger at G, destination R).

The maximum score is 15: when the taxi starts on the passenger's square and the trip is between the two closest coloured squares (Red and Yellow), the agent needs only 1 pickup + 4 moves (20 - 5).
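Both bounds can be verified with a breadth-first search over the grid. This is a sketch under stated assumptions: the map below is assumed to be a copy of the Taxi-v3 layout, where a '|' between two cells blocks east/west moves and north/south moves are never blocked, and the helper names are hypothetical.

```python
from collections import deque
from itertools import product

# Assumed copy of the Taxi-v3 map: '|' is a wall, ':' an open border between cells.
MAP = [
    "+---------+",
    "|R: | : :G|",
    "| : | : : |",
    "| : : : : |",
    "| | : | : |",
    "|Y| : |B: |",
    "+---------+",
]
LOCS = {"R": (0, 0), "G": (0, 4), "Y": (4, 0), "B": (4, 3)}


def neighbours(row, col):
    """Cells reachable from (row, col) with one movement action."""
    if row < 4:
        yield row + 1, col                                  # south
    if row > 0:
        yield row - 1, col                                  # north
    if col < 4 and MAP[1 + row][2 * col + 2] == ":":
        yield row, col + 1                                  # east, unless blocked by '|'
    if col > 0 and MAP[1 + row][2 * col] == ":":
        yield row, col - 1                                  # west, unless blocked by '|'


def moves_from(start):
    """Breadth-first search: fewest moves from start to every cell of the grid."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        for nxt in neighbours(*cell):
            if nxt not in dist:
                dist[nxt] = dist[cell] + 1
                queue.append(nxt)
    return dist


def optimal_scores():
    """Best achievable score for every start (taxi cell, passenger, destination)."""
    dist = {name: moves_from(cell) for name, cell in LOCS.items()}
    scores = []
    for taxi, passenger, destination in product(product(range(5), repeat=2), LOCS, LOCS):
        if passenger == destination:
            continue                                        # not a valid start state
        # Moves are reversible, so passenger->taxi distance equals taxi->passenger
        moves = dist[passenger][taxi] + dist[passenger][LOCS[destination]]
        scores.append(20 - moves - 1)                       # -1 per move, -1 pickup, +20 dropoff
    return scores


scores = optimal_scores()
print(min(scores), max(scores))   # 3 15
```

Scores below 3 in the logs therefore come from non-optimal episodes (extra moves or penalised actions), not from unlucky start positions.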

# Result

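The program needs to run for hundreds or thousands of episodes (many hours, also because every step is rendered in a window) to learn and give good results, so I attach some selected results. The training above was stopped manually (KeyboardInterrupt) after about 4,000 episodes.

At the beginning there is pure exploration: the agent explores every situation and makes a lot of mistakes.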

    Episode:  1
    Goal reached:  0
    Score:  -415
    
After 100 episodes epsilon is 0.5, so exploration and exploitation are split 50/50. The agent is still learning and has not reached any goal yet; the first goals come later, still with large negative scores.

    Episode:  100
    Goal reached:  0
    Score:  -325

After 200 episodes the agent still makes mistakes, but it starts to reach the goal more frequently. By now epsilon has reached its minimum of 0.05, so the agent mostly exploits (95% of the time) what it has learned, which should be enough to take good decisions.

    Episode:  200
    Goal reached:  21
    Score:  -118

In the following 100 episodes it reaches the goal much more often: the counter grows from 21 to 92.

    Episode:  300
    Goal reached:  92
    Score:  -31

After about 400 episodes the agent handles almost every situation well, taking the best action and usually the shortest route to the goal. It rarely fails to reach the goal.

    Episode:  401
    Goal reached:  182
    Score:  -2
    
    Episode:  4000
    Goal reached:  3779
    Score:  10
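The "Goal reached" counters printed in the logs can be turned into a success rate per range of episodes. A small sketch follows; the checkpoint values are copied from the output above, and `GOALS` and `success_rate` are hypothetical names. Since each counter is printed before its episode runs, the counter at episode N covers episodes 0 to N-1.

```python
# "Goal reached" counters from the logs above (episode -> goals so far).
GOALS = {100: 0, 200: 21, 300: 92, 401: 182, 735: 514, 4000: 3779, 4036: 3815}


def success_rate(start, end):
    """Fraction of episodes start..end-1 that ended with a successful drop-off."""
    return (GOALS[end] - GOALS[start]) / (end - start)


for start, end in [(100, 200), (200, 300), (300, 401), (401, 4000)]:
    print(f"episodes {start}-{end - 1}: {success_rate(start, end):.1%}")
```

This gives 21.0%, 71.0%, 89.1% and 99.9% respectively: from episode 401 to 3999 only 2 of 3599 episodes miss the goal.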