# Model Free Monte Carlo : 


Monte Carlo Methods can learn directly from episodes. It is a model free approach, so it is usefull when we do not have knowledge about the environment (MDP).

It needs to learn from an entire episode. To evaluate the value of each state, you have to compute the mean return (rewards). The Monte Carlo methods will be used to evaluate a policy (Policy Evaluation), then the policy can be improved with Policy Improvement (greedy approach). In pratice you will save every state, rewards of each episode and then compute the value of the state iteratively based on the formula : 
$V^\pi\left(s_{t}\right) \leftarrow V^\pi\left(s_{t}\right)+\alpha\left(r_{k}+r_{k+1}+\cdots+r_{N}-V^\pi\left(s_{t}\right)\right) = V^\pi\left(s_{t}\right)+\alpha\left(G_k - V^\pi\left(s_{t}\right)\right) $

$G_k = \gamma G_{k+1} + r_{k+1}$ with $G_k$ the cumulative rewards.


<img src="img/MonteCarloControl.png" >   

# Monte Carlo Control

In [1]:
import gym
import numpy as np

In [2]:
env = gym.make("MountainCar-v0")
env._max_episode_steps = 5000
"""
Actions:
        Type: Discrete(3)
        Num    Action
        0      Accelerate to the Left
        1      Don't accelerate
        2      Accelerate to the Right
"""

def convert2discrete(observations, N=30):
    """
    Observation:
        Type: Box(2)
        Num    Observation               Min            Max
        0      Car Position              -1.2           0.6
        1      Car Velocity              -0.07          0.07

    """
    position = observations[0]
    velocity = observations[1]
    
    min_pos = -1.2
    max_pos = 0.6
    step_pos = (max_pos - min_pos)/(N-1)
    
    min_vel = -0.07
    max_vel = 0.07
    step_vel = (max_vel - min_vel)/(N-1)
    
    new_position = (position - min_pos)//step_pos
    
    new_velocity = (velocity - min_vel)//step_vel
    
    return int(new_position), int(new_velocity)
    
    
    


epsilon=1.0
counts_steps = []
success = 0.
numberExp = 10000
discount = 0.95
discr = 20
q = np.zeros((discr,discr,3))
N = np.zeros((discr, discr,3))
R = np.zeros((discr, discr,3))

In [3]:
for experiment in range(0, numberExp):
    Rewards = []
    S = []
    observation = env.reset()
    
    new_p, new_v = convert2discrete(observation, N=discr)
    # Generate episode and save state, action
    for curr_step in range(0, env._max_episode_steps):
        
        ## Select an action 
        if np.random.uniform()<=epsilon:
            action = env.action_space.sample() # your agent here (this takes random actions)
        else:
            action = np.argmax(q[new_p, new_v,:])
            
        # get new observation
        new_observation, reward, done, info = env.step(action)
        
        # save state
        S.append( (new_p, new_v, action) )
        
        
        Rewards.append(reward)
        # discretise states
        new_p, new_v = convert2discrete(new_observation, N=discr)
        
        if done:
            #print(info)
            if 'TimeLimit.truncated' not in info:
                Rewards[-1] = env._max_episode_steps
                success += 1
                if success > 10:
                    epsilon = max(epsilon * 0.99,0.05)
            break
    # update value
    i = len(S) - 1
    G = 0
    while i >= 0:
        G = discount * G + Rewards[i]
        if S[i] not in set(S[:i]):
            N[S[i][0], S[i][1], S[i][2]] += 1 # counter needed for computing the average later
            R[S[i][0], S[i][1], S[i][2]] += G # Cumulative reward with discount save depending of the state
            q[S[i][0], S[i][1], S[i][2]] = R[S[i][0], S[i][1], S[i][2]] / N[S[i][0], S[i][1], S[i][2]] # update the q value
        i -=1

    counts_steps.append(curr_step)
    print(f"rate of success : {success}/{experiment+1}, epsilon={epsilon}, num_steps:{curr_step}")
    if np.mean(counts_steps[-100:])<2500:
        env._max_episode_steps = 1500
    elif np.mean(counts_steps[-100:])<1000:
        env._max_episode_steps = 800
    elif np.mean(counts_steps[-100:])<500:
        env._max_episode_steps = 400
    elif np.mean(counts_steps[-100:])<300:
        env._max_episode_steps = 200
env.close()
np.save("monte_carlo_moutaincar", q)

rate of success : 0.0/1, epsilon=1.0, num_steps:4999
rate of success : 0.0/2, epsilon=1.0, num_steps:4999
rate of success : 0.0/3, epsilon=1.0, num_steps:4999
rate of success : 0.0/4, epsilon=1.0, num_steps:4999
rate of success : 0.0/5, epsilon=1.0, num_steps:4999
rate of success : 0.0/6, epsilon=1.0, num_steps:4999
rate of success : 0.0/7, epsilon=1.0, num_steps:4999
rate of success : 0.0/8, epsilon=1.0, num_steps:4999
rate of success : 0.0/9, epsilon=1.0, num_steps:4999
rate of success : 0.0/10, epsilon=1.0, num_steps:4999
rate of success : 0.0/11, epsilon=1.0, num_steps:4999
rate of success : 0.0/12, epsilon=1.0, num_steps:4999
rate of success : 0.0/13, epsilon=1.0, num_steps:4999
rate of success : 0.0/14, epsilon=1.0, num_steps:4999
rate of success : 0.0/15, epsilon=1.0, num_steps:4999
rate of success : 1.0/16, epsilon=1.0, num_steps:2928
rate of success : 1.0/17, epsilon=1.0, num_steps:4999
rate of success : 1.0/18, epsilon=1.0, num_steps:4999
rate of success : 1.0/19, epsilon=1.0

rate of success : 10.0/152, epsilon=1.0, num_steps:4999
rate of success : 10.0/153, epsilon=1.0, num_steps:4999
rate of success : 11.0/154, epsilon=0.99, num_steps:3760
rate of success : 11.0/155, epsilon=0.99, num_steps:4999
rate of success : 11.0/156, epsilon=0.99, num_steps:4999
rate of success : 11.0/157, epsilon=0.99, num_steps:4999
rate of success : 12.0/158, epsilon=0.9801, num_steps:3508
rate of success : 12.0/159, epsilon=0.9801, num_steps:4999
rate of success : 12.0/160, epsilon=0.9801, num_steps:4999
rate of success : 12.0/161, epsilon=0.9801, num_steps:4999
rate of success : 12.0/162, epsilon=0.9801, num_steps:4999
rate of success : 12.0/163, epsilon=0.9801, num_steps:4999
rate of success : 12.0/164, epsilon=0.9801, num_steps:4999
rate of success : 12.0/165, epsilon=0.9801, num_steps:4999
rate of success : 12.0/166, epsilon=0.9801, num_steps:4999
rate of success : 12.0/167, epsilon=0.9801, num_steps:4999
rate of success : 12.0/168, epsilon=0.9801, num_steps:4999
rate of suc

rate of success : 33.0/272, epsilon=0.7936142836436553, num_steps:4777
rate of success : 33.0/273, epsilon=0.7936142836436553, num_steps:4999
rate of success : 34.0/274, epsilon=0.7856781408072188, num_steps:4191
rate of success : 34.0/275, epsilon=0.7856781408072188, num_steps:4999
rate of success : 34.0/276, epsilon=0.7856781408072188, num_steps:4999
rate of success : 35.0/277, epsilon=0.7778213593991465, num_steps:1122
rate of success : 35.0/278, epsilon=0.7778213593991465, num_steps:4999
rate of success : 35.0/279, epsilon=0.7778213593991465, num_steps:4999
rate of success : 36.0/280, epsilon=0.7700431458051551, num_steps:3613
rate of success : 37.0/281, epsilon=0.7623427143471035, num_steps:997
rate of success : 37.0/282, epsilon=0.7623427143471035, num_steps:4999
rate of success : 38.0/283, epsilon=0.7547192872036325, num_steps:1835
rate of success : 39.0/284, epsilon=0.7471720943315961, num_steps:4897
rate of success : 40.0/285, epsilon=0.7397003733882802, num_steps:3252
rate of

rate of success : 102.0/387, epsilon=0.39667780642202527, num_steps:1499
rate of success : 102.0/388, epsilon=0.39667780642202527, num_steps:1499
rate of success : 102.0/389, epsilon=0.39667780642202527, num_steps:1499
rate of success : 102.0/390, epsilon=0.39667780642202527, num_steps:1499
rate of success : 102.0/391, epsilon=0.39667780642202527, num_steps:1499
rate of success : 102.0/392, epsilon=0.39667780642202527, num_steps:1499
rate of success : 102.0/393, epsilon=0.39667780642202527, num_steps:1499
rate of success : 102.0/394, epsilon=0.39667780642202527, num_steps:1499
rate of success : 102.0/395, epsilon=0.39667780642202527, num_steps:1499
rate of success : 102.0/396, epsilon=0.39667780642202527, num_steps:1499
rate of success : 102.0/397, epsilon=0.39667780642202527, num_steps:1499
rate of success : 102.0/398, epsilon=0.39667780642202527, num_steps:1499
rate of success : 102.0/399, epsilon=0.39667780642202527, num_steps:1499
rate of success : 102.0/400, epsilon=0.396677806422

rate of success : 103.0/502, epsilon=0.392711028357805, num_steps:1499
rate of success : 103.0/503, epsilon=0.392711028357805, num_steps:1499
rate of success : 103.0/504, epsilon=0.392711028357805, num_steps:1499
rate of success : 103.0/505, epsilon=0.392711028357805, num_steps:1499
rate of success : 103.0/506, epsilon=0.392711028357805, num_steps:1499
rate of success : 103.0/507, epsilon=0.392711028357805, num_steps:1499
rate of success : 103.0/508, epsilon=0.392711028357805, num_steps:1499
rate of success : 103.0/509, epsilon=0.392711028357805, num_steps:1499
rate of success : 103.0/510, epsilon=0.392711028357805, num_steps:1499
rate of success : 103.0/511, epsilon=0.392711028357805, num_steps:1499
rate of success : 103.0/512, epsilon=0.392711028357805, num_steps:1499
rate of success : 103.0/513, epsilon=0.392711028357805, num_steps:1499
rate of success : 103.0/514, epsilon=0.392711028357805, num_steps:1499
rate of success : 103.0/515, epsilon=0.392711028357805, num_steps:1499
rate o

rate of success : 103.0/618, epsilon=0.392711028357805, num_steps:1499
rate of success : 103.0/619, epsilon=0.392711028357805, num_steps:1499
rate of success : 103.0/620, epsilon=0.392711028357805, num_steps:1499
rate of success : 103.0/621, epsilon=0.392711028357805, num_steps:1499
rate of success : 103.0/622, epsilon=0.392711028357805, num_steps:1499
rate of success : 103.0/623, epsilon=0.392711028357805, num_steps:1499
rate of success : 103.0/624, epsilon=0.392711028357805, num_steps:1499
rate of success : 103.0/625, epsilon=0.392711028357805, num_steps:1499
rate of success : 103.0/626, epsilon=0.392711028357805, num_steps:1499
rate of success : 103.0/627, epsilon=0.392711028357805, num_steps:1499
rate of success : 103.0/628, epsilon=0.392711028357805, num_steps:1499
rate of success : 103.0/629, epsilon=0.392711028357805, num_steps:1499
rate of success : 103.0/630, epsilon=0.392711028357805, num_steps:1499
rate of success : 103.0/631, epsilon=0.392711028357805, num_steps:1499
rate o

rate of success : 158.0/745, epsilon=0.22594815553398728, num_steps:253
rate of success : 159.0/746, epsilon=0.22368867397864742, num_steps:338
rate of success : 160.0/747, epsilon=0.22145178723886094, num_steps:320
rate of success : 161.0/748, epsilon=0.21923726936647234, num_steps:406
rate of success : 162.0/749, epsilon=0.2170448966728076, num_steps:274
rate of success : 163.0/750, epsilon=0.21487444770607952, num_steps:256
rate of success : 164.0/751, epsilon=0.21272570322901874, num_steps:412
rate of success : 165.0/752, epsilon=0.21059844619672854, num_steps:328
rate of success : 166.0/753, epsilon=0.20849246173476127, num_steps:251
rate of success : 167.0/754, epsilon=0.20640753711741366, num_steps:400
rate of success : 168.0/755, epsilon=0.20434346174623952, num_steps:235
rate of success : 169.0/756, epsilon=0.20230002712877712, num_steps:364
rate of success : 170.0/757, epsilon=0.20027702685748935, num_steps:241
rate of success : 171.0/758, epsilon=0.19827425658891445, num_ste

rate of success : 277.0/864, epsilon=0.06832772446471178, num_steps:264
rate of success : 278.0/865, epsilon=0.06764444722006466, num_steps:174
rate of success : 279.0/866, epsilon=0.066968002747864, num_steps:173
rate of success : 280.0/867, epsilon=0.06629832272038537, num_steps:174
rate of success : 281.0/868, epsilon=0.06563533949318151, num_steps:154
rate of success : 282.0/869, epsilon=0.06497898609824969, num_steps:154
rate of success : 283.0/870, epsilon=0.0643291962372672, num_steps:248
rate of success : 284.0/871, epsilon=0.06368590427489453, num_steps:353
rate of success : 285.0/872, epsilon=0.06304904523214558, num_steps:168
rate of success : 286.0/873, epsilon=0.06241855477982412, num_steps:156
rate of success : 287.0/874, epsilon=0.06179436923202588, num_steps:343
rate of success : 288.0/875, epsilon=0.06117642553970562, num_steps:183
rate of success : 289.0/876, epsilon=0.06056466128430856, num_steps:234
rate of success : 290.0/877, epsilon=0.05995901467146548, num_steps

rate of success : 415.0/1002, epsilon=0.05, num_steps:324
rate of success : 416.0/1003, epsilon=0.05, num_steps:152
rate of success : 417.0/1004, epsilon=0.05, num_steps:172
rate of success : 418.0/1005, epsilon=0.05, num_steps:152
rate of success : 419.0/1006, epsilon=0.05, num_steps:173
rate of success : 420.0/1007, epsilon=0.05, num_steps:177
rate of success : 421.0/1008, epsilon=0.05, num_steps:155
rate of success : 422.0/1009, epsilon=0.05, num_steps:151
rate of success : 423.0/1010, epsilon=0.05, num_steps:227
rate of success : 424.0/1011, epsilon=0.05, num_steps:248
rate of success : 425.0/1012, epsilon=0.05, num_steps:177
rate of success : 426.0/1013, epsilon=0.05, num_steps:161
rate of success : 427.0/1014, epsilon=0.05, num_steps:167
rate of success : 428.0/1015, epsilon=0.05, num_steps:176
rate of success : 429.0/1016, epsilon=0.05, num_steps:230
rate of success : 430.0/1017, epsilon=0.05, num_steps:252
rate of success : 431.0/1018, epsilon=0.05, num_steps:172
rate of succes

rate of success : 579.0/1166, epsilon=0.05, num_steps:226
rate of success : 580.0/1167, epsilon=0.05, num_steps:166
rate of success : 581.0/1168, epsilon=0.05, num_steps:177
rate of success : 582.0/1169, epsilon=0.05, num_steps:239
rate of success : 583.0/1170, epsilon=0.05, num_steps:233
rate of success : 584.0/1171, epsilon=0.05, num_steps:319
rate of success : 585.0/1172, epsilon=0.05, num_steps:169
rate of success : 586.0/1173, epsilon=0.05, num_steps:151
rate of success : 587.0/1174, epsilon=0.05, num_steps:153
rate of success : 588.0/1175, epsilon=0.05, num_steps:242
rate of success : 589.0/1176, epsilon=0.05, num_steps:184
rate of success : 590.0/1177, epsilon=0.05, num_steps:165
rate of success : 591.0/1178, epsilon=0.05, num_steps:254
rate of success : 592.0/1179, epsilon=0.05, num_steps:266
rate of success : 593.0/1180, epsilon=0.05, num_steps:230
rate of success : 594.0/1181, epsilon=0.05, num_steps:258
rate of success : 595.0/1182, epsilon=0.05, num_steps:181
rate of succes

rate of success : 721.0/1308, epsilon=0.05, num_steps:190
rate of success : 722.0/1309, epsilon=0.05, num_steps:156
rate of success : 723.0/1310, epsilon=0.05, num_steps:174
rate of success : 724.0/1311, epsilon=0.05, num_steps:154
rate of success : 725.0/1312, epsilon=0.05, num_steps:151
rate of success : 726.0/1313, epsilon=0.05, num_steps:333
rate of success : 727.0/1314, epsilon=0.05, num_steps:171
rate of success : 728.0/1315, epsilon=0.05, num_steps:308
rate of success : 729.0/1316, epsilon=0.05, num_steps:167
rate of success : 730.0/1317, epsilon=0.05, num_steps:156
rate of success : 731.0/1318, epsilon=0.05, num_steps:171
rate of success : 732.0/1319, epsilon=0.05, num_steps:232
rate of success : 733.0/1320, epsilon=0.05, num_steps:179
rate of success : 734.0/1321, epsilon=0.05, num_steps:152
rate of success : 735.0/1322, epsilon=0.05, num_steps:263
rate of success : 736.0/1323, epsilon=0.05, num_steps:162
rate of success : 737.0/1324, epsilon=0.05, num_steps:189
rate of succes

rate of success : 882.0/1469, epsilon=0.05, num_steps:171
rate of success : 883.0/1470, epsilon=0.05, num_steps:320
rate of success : 884.0/1471, epsilon=0.05, num_steps:161
rate of success : 885.0/1472, epsilon=0.05, num_steps:249
rate of success : 886.0/1473, epsilon=0.05, num_steps:232
rate of success : 887.0/1474, epsilon=0.05, num_steps:159
rate of success : 888.0/1475, epsilon=0.05, num_steps:305
rate of success : 889.0/1476, epsilon=0.05, num_steps:156
rate of success : 890.0/1477, epsilon=0.05, num_steps:174
rate of success : 891.0/1478, epsilon=0.05, num_steps:177
rate of success : 892.0/1479, epsilon=0.05, num_steps:177
rate of success : 893.0/1480, epsilon=0.05, num_steps:145
rate of success : 894.0/1481, epsilon=0.05, num_steps:233
rate of success : 895.0/1482, epsilon=0.05, num_steps:170
rate of success : 896.0/1483, epsilon=0.05, num_steps:179
rate of success : 897.0/1484, epsilon=0.05, num_steps:187
rate of success : 898.0/1485, epsilon=0.05, num_steps:169
rate of succes

rate of success : 1031.0/1618, epsilon=0.05, num_steps:152
rate of success : 1032.0/1619, epsilon=0.05, num_steps:183
rate of success : 1033.0/1620, epsilon=0.05, num_steps:235
rate of success : 1034.0/1621, epsilon=0.05, num_steps:258
rate of success : 1035.0/1622, epsilon=0.05, num_steps:172
rate of success : 1036.0/1623, epsilon=0.05, num_steps:170
rate of success : 1037.0/1624, epsilon=0.05, num_steps:180
rate of success : 1038.0/1625, epsilon=0.05, num_steps:174
rate of success : 1039.0/1626, epsilon=0.05, num_steps:148
rate of success : 1040.0/1627, epsilon=0.05, num_steps:178
rate of success : 1041.0/1628, epsilon=0.05, num_steps:234
rate of success : 1042.0/1629, epsilon=0.05, num_steps:259
rate of success : 1043.0/1630, epsilon=0.05, num_steps:159
rate of success : 1044.0/1631, epsilon=0.05, num_steps:174
rate of success : 1045.0/1632, epsilon=0.05, num_steps:176
rate of success : 1046.0/1633, epsilon=0.05, num_steps:154
rate of success : 1047.0/1634, epsilon=0.05, num_steps:1

rate of success : 1172.0/1759, epsilon=0.05, num_steps:172
rate of success : 1173.0/1760, epsilon=0.05, num_steps:252
rate of success : 1174.0/1761, epsilon=0.05, num_steps:156
rate of success : 1175.0/1762, epsilon=0.05, num_steps:173
rate of success : 1176.0/1763, epsilon=0.05, num_steps:171
rate of success : 1177.0/1764, epsilon=0.05, num_steps:233
rate of success : 1178.0/1765, epsilon=0.05, num_steps:176
rate of success : 1179.0/1766, epsilon=0.05, num_steps:155
rate of success : 1180.0/1767, epsilon=0.05, num_steps:170
rate of success : 1181.0/1768, epsilon=0.05, num_steps:172
rate of success : 1182.0/1769, epsilon=0.05, num_steps:174
rate of success : 1183.0/1770, epsilon=0.05, num_steps:171
rate of success : 1184.0/1771, epsilon=0.05, num_steps:183
rate of success : 1185.0/1772, epsilon=0.05, num_steps:257
rate of success : 1186.0/1773, epsilon=0.05, num_steps:166
rate of success : 1187.0/1774, epsilon=0.05, num_steps:354
rate of success : 1188.0/1775, epsilon=0.05, num_steps:3

rate of success : 1324.0/1911, epsilon=0.05, num_steps:166
rate of success : 1325.0/1912, epsilon=0.05, num_steps:258
rate of success : 1326.0/1913, epsilon=0.05, num_steps:157
rate of success : 1327.0/1914, epsilon=0.05, num_steps:254
rate of success : 1328.0/1915, epsilon=0.05, num_steps:254
rate of success : 1329.0/1916, epsilon=0.05, num_steps:240
rate of success : 1330.0/1917, epsilon=0.05, num_steps:179
rate of success : 1331.0/1918, epsilon=0.05, num_steps:172
rate of success : 1332.0/1919, epsilon=0.05, num_steps:233
rate of success : 1333.0/1920, epsilon=0.05, num_steps:156
rate of success : 1334.0/1921, epsilon=0.05, num_steps:210
rate of success : 1335.0/1922, epsilon=0.05, num_steps:155
rate of success : 1336.0/1923, epsilon=0.05, num_steps:442
rate of success : 1337.0/1924, epsilon=0.05, num_steps:351
rate of success : 1338.0/1925, epsilon=0.05, num_steps:175
rate of success : 1339.0/1926, epsilon=0.05, num_steps:174
rate of success : 1340.0/1927, epsilon=0.05, num_steps:1

rate of success : 1467.0/2054, epsilon=0.05, num_steps:162
rate of success : 1468.0/2055, epsilon=0.05, num_steps:161
rate of success : 1469.0/2056, epsilon=0.05, num_steps:365
rate of success : 1470.0/2057, epsilon=0.05, num_steps:257
rate of success : 1471.0/2058, epsilon=0.05, num_steps:185
rate of success : 1472.0/2059, epsilon=0.05, num_steps:173
rate of success : 1473.0/2060, epsilon=0.05, num_steps:181
rate of success : 1474.0/2061, epsilon=0.05, num_steps:175
rate of success : 1475.0/2062, epsilon=0.05, num_steps:318
rate of success : 1476.0/2063, epsilon=0.05, num_steps:151
rate of success : 1477.0/2064, epsilon=0.05, num_steps:155
rate of success : 1478.0/2065, epsilon=0.05, num_steps:228
rate of success : 1479.0/2066, epsilon=0.05, num_steps:252
rate of success : 1480.0/2067, epsilon=0.05, num_steps:175
rate of success : 1481.0/2068, epsilon=0.05, num_steps:237
rate of success : 1482.0/2069, epsilon=0.05, num_steps:227
rate of success : 1483.0/2070, epsilon=0.05, num_steps:2

rate of success : 1617.0/2204, epsilon=0.05, num_steps:186
rate of success : 1618.0/2205, epsilon=0.05, num_steps:173
rate of success : 1619.0/2206, epsilon=0.05, num_steps:236
rate of success : 1620.0/2207, epsilon=0.05, num_steps:171
rate of success : 1621.0/2208, epsilon=0.05, num_steps:184
rate of success : 1622.0/2209, epsilon=0.05, num_steps:171
rate of success : 1623.0/2210, epsilon=0.05, num_steps:150
rate of success : 1624.0/2211, epsilon=0.05, num_steps:603
rate of success : 1625.0/2212, epsilon=0.05, num_steps:163
rate of success : 1626.0/2213, epsilon=0.05, num_steps:232
rate of success : 1627.0/2214, epsilon=0.05, num_steps:163
rate of success : 1628.0/2215, epsilon=0.05, num_steps:170
rate of success : 1629.0/2216, epsilon=0.05, num_steps:177
rate of success : 1630.0/2217, epsilon=0.05, num_steps:179
rate of success : 1631.0/2218, epsilon=0.05, num_steps:177
rate of success : 1632.0/2219, epsilon=0.05, num_steps:170
rate of success : 1633.0/2220, epsilon=0.05, num_steps:3

rate of success : 1763.0/2350, epsilon=0.05, num_steps:158
rate of success : 1764.0/2351, epsilon=0.05, num_steps:187
rate of success : 1765.0/2352, epsilon=0.05, num_steps:264
rate of success : 1766.0/2353, epsilon=0.05, num_steps:270
rate of success : 1767.0/2354, epsilon=0.05, num_steps:254
rate of success : 1768.0/2355, epsilon=0.05, num_steps:150
rate of success : 1769.0/2356, epsilon=0.05, num_steps:274
rate of success : 1770.0/2357, epsilon=0.05, num_steps:155
rate of success : 1771.0/2358, epsilon=0.05, num_steps:173
rate of success : 1772.0/2359, epsilon=0.05, num_steps:173
rate of success : 1773.0/2360, epsilon=0.05, num_steps:170
rate of success : 1774.0/2361, epsilon=0.05, num_steps:152
rate of success : 1775.0/2362, epsilon=0.05, num_steps:153
rate of success : 1776.0/2363, epsilon=0.05, num_steps:253
rate of success : 1777.0/2364, epsilon=0.05, num_steps:179
rate of success : 1778.0/2365, epsilon=0.05, num_steps:252
rate of success : 1779.0/2366, epsilon=0.05, num_steps:2

rate of success : 1922.0/2509, epsilon=0.05, num_steps:174
rate of success : 1923.0/2510, epsilon=0.05, num_steps:262
rate of success : 1924.0/2511, epsilon=0.05, num_steps:249
rate of success : 1925.0/2512, epsilon=0.05, num_steps:173
rate of success : 1926.0/2513, epsilon=0.05, num_steps:247
rate of success : 1927.0/2514, epsilon=0.05, num_steps:250
rate of success : 1928.0/2515, epsilon=0.05, num_steps:242
rate of success : 1929.0/2516, epsilon=0.05, num_steps:174
rate of success : 1930.0/2517, epsilon=0.05, num_steps:174
rate of success : 1931.0/2518, epsilon=0.05, num_steps:233
rate of success : 1932.0/2519, epsilon=0.05, num_steps:153
rate of success : 1933.0/2520, epsilon=0.05, num_steps:257
rate of success : 1934.0/2521, epsilon=0.05, num_steps:179
rate of success : 1935.0/2522, epsilon=0.05, num_steps:170
rate of success : 1936.0/2523, epsilon=0.05, num_steps:171
rate of success : 1937.0/2524, epsilon=0.05, num_steps:239
rate of success : 1938.0/2525, epsilon=0.05, num_steps:1

rate of success : 2069.0/2656, epsilon=0.05, num_steps:234
rate of success : 2070.0/2657, epsilon=0.05, num_steps:156
rate of success : 2071.0/2658, epsilon=0.05, num_steps:230
rate of success : 2072.0/2659, epsilon=0.05, num_steps:175
rate of success : 2073.0/2660, epsilon=0.05, num_steps:173
rate of success : 2074.0/2661, epsilon=0.05, num_steps:183
rate of success : 2075.0/2662, epsilon=0.05, num_steps:154
rate of success : 2076.0/2663, epsilon=0.05, num_steps:253
rate of success : 2077.0/2664, epsilon=0.05, num_steps:316
rate of success : 2078.0/2665, epsilon=0.05, num_steps:171
rate of success : 2079.0/2666, epsilon=0.05, num_steps:173
rate of success : 2080.0/2667, epsilon=0.05, num_steps:237
rate of success : 2081.0/2668, epsilon=0.05, num_steps:261
rate of success : 2082.0/2669, epsilon=0.05, num_steps:234
rate of success : 2083.0/2670, epsilon=0.05, num_steps:234
rate of success : 2084.0/2671, epsilon=0.05, num_steps:173
rate of success : 2085.0/2672, epsilon=0.05, num_steps:2

rate of success : 2218.0/2805, epsilon=0.05, num_steps:150
rate of success : 2219.0/2806, epsilon=0.05, num_steps:244
rate of success : 2220.0/2807, epsilon=0.05, num_steps:177
rate of success : 2221.0/2808, epsilon=0.05, num_steps:151
rate of success : 2222.0/2809, epsilon=0.05, num_steps:255
rate of success : 2223.0/2810, epsilon=0.05, num_steps:246
rate of success : 2224.0/2811, epsilon=0.05, num_steps:170
rate of success : 2225.0/2812, epsilon=0.05, num_steps:243
rate of success : 2226.0/2813, epsilon=0.05, num_steps:183
rate of success : 2227.0/2814, epsilon=0.05, num_steps:149
rate of success : 2228.0/2815, epsilon=0.05, num_steps:177
rate of success : 2229.0/2816, epsilon=0.05, num_steps:517
rate of success : 2230.0/2817, epsilon=0.05, num_steps:226
rate of success : 2231.0/2818, epsilon=0.05, num_steps:251
rate of success : 2232.0/2819, epsilon=0.05, num_steps:258
rate of success : 2233.0/2820, epsilon=0.05, num_steps:158
rate of success : 2234.0/2821, epsilon=0.05, num_steps:2

rate of success : 2378.0/2965, epsilon=0.05, num_steps:253
rate of success : 2379.0/2966, epsilon=0.05, num_steps:156
rate of success : 2380.0/2967, epsilon=0.05, num_steps:173
rate of success : 2381.0/2968, epsilon=0.05, num_steps:170
rate of success : 2382.0/2969, epsilon=0.05, num_steps:178
rate of success : 2383.0/2970, epsilon=0.05, num_steps:254
rate of success : 2384.0/2971, epsilon=0.05, num_steps:251
rate of success : 2385.0/2972, epsilon=0.05, num_steps:237
rate of success : 2386.0/2973, epsilon=0.05, num_steps:156
rate of success : 2387.0/2974, epsilon=0.05, num_steps:430
rate of success : 2388.0/2975, epsilon=0.05, num_steps:221
rate of success : 2389.0/2976, epsilon=0.05, num_steps:178
rate of success : 2390.0/2977, epsilon=0.05, num_steps:199
rate of success : 2391.0/2978, epsilon=0.05, num_steps:236
rate of success : 2392.0/2979, epsilon=0.05, num_steps:259
rate of success : 2393.0/2980, epsilon=0.05, num_steps:138
rate of success : 2394.0/2981, epsilon=0.05, num_steps:2

rate of success : 2523.0/3110, epsilon=0.05, num_steps:147
rate of success : 2524.0/3111, epsilon=0.05, num_steps:161
rate of success : 2525.0/3112, epsilon=0.05, num_steps:237
rate of success : 2526.0/3113, epsilon=0.05, num_steps:167
rate of success : 2527.0/3114, epsilon=0.05, num_steps:307
rate of success : 2528.0/3115, epsilon=0.05, num_steps:188
rate of success : 2529.0/3116, epsilon=0.05, num_steps:177
rate of success : 2530.0/3117, epsilon=0.05, num_steps:174
rate of success : 2531.0/3118, epsilon=0.05, num_steps:171
rate of success : 2532.0/3119, epsilon=0.05, num_steps:252
rate of success : 2533.0/3120, epsilon=0.05, num_steps:158
rate of success : 2534.0/3121, epsilon=0.05, num_steps:351
rate of success : 2535.0/3122, epsilon=0.05, num_steps:172
rate of success : 2536.0/3123, epsilon=0.05, num_steps:172
rate of success : 2537.0/3124, epsilon=0.05, num_steps:149
rate of success : 2538.0/3125, epsilon=0.05, num_steps:184
rate of success : 2539.0/3126, epsilon=0.05, num_steps:1

rate of success : 2669.0/3256, epsilon=0.05, num_steps:183
rate of success : 2670.0/3257, epsilon=0.05, num_steps:165
rate of success : 2671.0/3258, epsilon=0.05, num_steps:173
rate of success : 2672.0/3259, epsilon=0.05, num_steps:165
rate of success : 2673.0/3260, epsilon=0.05, num_steps:299
rate of success : 2674.0/3261, epsilon=0.05, num_steps:170
rate of success : 2675.0/3262, epsilon=0.05, num_steps:167
rate of success : 2676.0/3263, epsilon=0.05, num_steps:174
rate of success : 2677.0/3264, epsilon=0.05, num_steps:153
rate of success : 2678.0/3265, epsilon=0.05, num_steps:166
rate of success : 2679.0/3266, epsilon=0.05, num_steps:150
rate of success : 2680.0/3267, epsilon=0.05, num_steps:644
rate of success : 2681.0/3268, epsilon=0.05, num_steps:170
rate of success : 2682.0/3269, epsilon=0.05, num_steps:187
rate of success : 2683.0/3270, epsilon=0.05, num_steps:255
rate of success : 2684.0/3271, epsilon=0.05, num_steps:150
rate of success : 2685.0/3272, epsilon=0.05, num_steps:2

rate of success : 2815.0/3402, epsilon=0.05, num_steps:167
rate of success : 2816.0/3403, epsilon=0.05, num_steps:167
rate of success : 2817.0/3404, epsilon=0.05, num_steps:148
rate of success : 2818.0/3405, epsilon=0.05, num_steps:176
rate of success : 2819.0/3406, epsilon=0.05, num_steps:173
rate of success : 2820.0/3407, epsilon=0.05, num_steps:168
rate of success : 2821.0/3408, epsilon=0.05, num_steps:182
rate of success : 2822.0/3409, epsilon=0.05, num_steps:259
rate of success : 2823.0/3410, epsilon=0.05, num_steps:246
rate of success : 2824.0/3411, epsilon=0.05, num_steps:175
rate of success : 2825.0/3412, epsilon=0.05, num_steps:333
rate of success : 2826.0/3413, epsilon=0.05, num_steps:157
rate of success : 2827.0/3414, epsilon=0.05, num_steps:172
rate of success : 2828.0/3415, epsilon=0.05, num_steps:151
rate of success : 2829.0/3416, epsilon=0.05, num_steps:264
rate of success : 2830.0/3417, epsilon=0.05, num_steps:180
rate of success : 2831.0/3418, epsilon=0.05, num_steps:1

rate of success : 3111.0/3698, epsilon=0.05, num_steps:171
rate of success : 3112.0/3699, epsilon=0.05, num_steps:223
rate of success : 3113.0/3700, epsilon=0.05, num_steps:234
rate of success : 3114.0/3701, epsilon=0.05, num_steps:222
rate of success : 3115.0/3702, epsilon=0.05, num_steps:235
rate of success : 3116.0/3703, epsilon=0.05, num_steps:256
rate of success : 3117.0/3704, epsilon=0.05, num_steps:295
rate of success : 3118.0/3705, epsilon=0.05, num_steps:171
rate of success : 3119.0/3706, epsilon=0.05, num_steps:218
rate of success : 3120.0/3707, epsilon=0.05, num_steps:270
rate of success : 3121.0/3708, epsilon=0.05, num_steps:172
rate of success : 3122.0/3709, epsilon=0.05, num_steps:185
rate of success : 3123.0/3710, epsilon=0.05, num_steps:183
rate of success : 3124.0/3711, epsilon=0.05, num_steps:289
rate of success : 3125.0/3712, epsilon=0.05, num_steps:172
rate of success : 3126.0/3713, epsilon=0.05, num_steps:152
rate of success : 3127.0/3714, epsilon=0.05, num_steps:1

rate of success : 3251.0/3838, epsilon=0.05, num_steps:166
rate of success : 3252.0/3839, epsilon=0.05, num_steps:171
rate of success : 3253.0/3840, epsilon=0.05, num_steps:182
rate of success : 3254.0/3841, epsilon=0.05, num_steps:170
rate of success : 3255.0/3842, epsilon=0.05, num_steps:170
rate of success : 3256.0/3843, epsilon=0.05, num_steps:495
rate of success : 3257.0/3844, epsilon=0.05, num_steps:233
rate of success : 3258.0/3845, epsilon=0.05, num_steps:188
rate of success : 3259.0/3846, epsilon=0.05, num_steps:155
rate of success : 3260.0/3847, epsilon=0.05, num_steps:158
rate of success : 3261.0/3848, epsilon=0.05, num_steps:252
rate of success : 3262.0/3849, epsilon=0.05, num_steps:239
rate of success : 3263.0/3850, epsilon=0.05, num_steps:159
rate of success : 3264.0/3851, epsilon=0.05, num_steps:173
rate of success : 3265.0/3852, epsilon=0.05, num_steps:187
rate of success : 3266.0/3853, epsilon=0.05, num_steps:170
rate of success : 3267.0/3854, epsilon=0.05, num_steps:1

rate of success : 3531.0/4118, epsilon=0.05, num_steps:254
rate of success : 3532.0/4119, epsilon=0.05, num_steps:249
rate of success : 3533.0/4120, epsilon=0.05, num_steps:152
rate of success : 3534.0/4121, epsilon=0.05, num_steps:264
rate of success : 3535.0/4122, epsilon=0.05, num_steps:174
rate of success : 3536.0/4123, epsilon=0.05, num_steps:260
rate of success : 3537.0/4124, epsilon=0.05, num_steps:270
rate of success : 3538.0/4125, epsilon=0.05, num_steps:177
rate of success : 3539.0/4126, epsilon=0.05, num_steps:164
rate of success : 3540.0/4127, epsilon=0.05, num_steps:254
rate of success : 3541.0/4128, epsilon=0.05, num_steps:172
rate of success : 3542.0/4129, epsilon=0.05, num_steps:154
rate of success : 3543.0/4130, epsilon=0.05, num_steps:378
rate of success : 3544.0/4131, epsilon=0.05, num_steps:178
rate of success : 3545.0/4132, epsilon=0.05, num_steps:173
rate of success : 3546.0/4133, epsilon=0.05, num_steps:191
rate of success : 3547.0/4134, epsilon=0.05, num_steps:1

rate of success : 3689.0/4276, epsilon=0.05, num_steps:250
rate of success : 3690.0/4277, epsilon=0.05, num_steps:179
rate of success : 3691.0/4278, epsilon=0.05, num_steps:151
rate of success : 3692.0/4279, epsilon=0.05, num_steps:167
rate of success : 3693.0/4280, epsilon=0.05, num_steps:153
rate of success : 3694.0/4281, epsilon=0.05, num_steps:328
rate of success : 3695.0/4282, epsilon=0.05, num_steps:262
rate of success : 3696.0/4283, epsilon=0.05, num_steps:177
rate of success : 3697.0/4284, epsilon=0.05, num_steps:232
rate of success : 3698.0/4285, epsilon=0.05, num_steps:176
rate of success : 3699.0/4286, epsilon=0.05, num_steps:251
rate of success : 3700.0/4287, epsilon=0.05, num_steps:178
rate of success : 3701.0/4288, epsilon=0.05, num_steps:161
rate of success : 3702.0/4289, epsilon=0.05, num_steps:185
rate of success : 3703.0/4290, epsilon=0.05, num_steps:192
rate of success : 3704.0/4291, epsilon=0.05, num_steps:165
rate of success : 3705.0/4292, epsilon=0.05, num_steps:2

rate of success : 3829.0/4416, epsilon=0.05, num_steps:284
rate of success : 3830.0/4417, epsilon=0.05, num_steps:200
rate of success : 3831.0/4418, epsilon=0.05, num_steps:174
rate of success : 3832.0/4419, epsilon=0.05, num_steps:155
rate of success : 3833.0/4420, epsilon=0.05, num_steps:176
rate of success : 3834.0/4421, epsilon=0.05, num_steps:154
rate of success : 3835.0/4422, epsilon=0.05, num_steps:157
rate of success : 3836.0/4423, epsilon=0.05, num_steps:191
rate of success : 3837.0/4424, epsilon=0.05, num_steps:263
rate of success : 3838.0/4425, epsilon=0.05, num_steps:150
rate of success : 3839.0/4426, epsilon=0.05, num_steps:154
rate of success : 3840.0/4427, epsilon=0.05, num_steps:157
rate of success : 3841.0/4428, epsilon=0.05, num_steps:176
rate of success : 3842.0/4429, epsilon=0.05, num_steps:168
rate of success : 3843.0/4430, epsilon=0.05, num_steps:182
rate of success : 3844.0/4431, epsilon=0.05, num_steps:153
rate of success : 3845.0/4432, epsilon=0.05, num_steps:1

rate of success : 3978.0/4565, epsilon=0.05, num_steps:254
rate of success : 3979.0/4566, epsilon=0.05, num_steps:201
rate of success : 3980.0/4567, epsilon=0.05, num_steps:176
rate of success : 3981.0/4568, epsilon=0.05, num_steps:251
rate of success : 3982.0/4569, epsilon=0.05, num_steps:269
rate of success : 3983.0/4570, epsilon=0.05, num_steps:178
rate of success : 3984.0/4571, epsilon=0.05, num_steps:173
rate of success : 3985.0/4572, epsilon=0.05, num_steps:159
rate of success : 3986.0/4573, epsilon=0.05, num_steps:170
rate of success : 3987.0/4574, epsilon=0.05, num_steps:155
rate of success : 3988.0/4575, epsilon=0.05, num_steps:154
rate of success : 3989.0/4576, epsilon=0.05, num_steps:152
rate of success : 3990.0/4577, epsilon=0.05, num_steps:264
rate of success : 3991.0/4578, epsilon=0.05, num_steps:158
rate of success : 3992.0/4579, epsilon=0.05, num_steps:171
rate of success : 3993.0/4580, epsilon=0.05, num_steps:230
rate of success : 3994.0/4581, epsilon=0.05, num_steps:1

rate of success : 4117.0/4704, epsilon=0.05, num_steps:158
rate of success : 4118.0/4705, epsilon=0.05, num_steps:189
rate of success : 4119.0/4706, epsilon=0.05, num_steps:160
rate of success : 4120.0/4707, epsilon=0.05, num_steps:171
rate of success : 4121.0/4708, epsilon=0.05, num_steps:219
rate of success : 4122.0/4709, epsilon=0.05, num_steps:178
rate of success : 4123.0/4710, epsilon=0.05, num_steps:239
rate of success : 4124.0/4711, epsilon=0.05, num_steps:164
rate of success : 4125.0/4712, epsilon=0.05, num_steps:233
rate of success : 4126.0/4713, epsilon=0.05, num_steps:197
rate of success : 4127.0/4714, epsilon=0.05, num_steps:299
rate of success : 4128.0/4715, epsilon=0.05, num_steps:173
rate of success : 4129.0/4716, epsilon=0.05, num_steps:270
rate of success : 4130.0/4717, epsilon=0.05, num_steps:233
rate of success : 4131.0/4718, epsilon=0.05, num_steps:185
rate of success : 4132.0/4719, epsilon=0.05, num_steps:171
rate of success : 4133.0/4720, epsilon=0.05, num_steps:1

rate of success : 4267.0/4854, epsilon=0.05, num_steps:250
rate of success : 4268.0/4855, epsilon=0.05, num_steps:189
rate of success : 4269.0/4856, epsilon=0.05, num_steps:175
rate of success : 4270.0/4857, epsilon=0.05, num_steps:175
rate of success : 4271.0/4858, epsilon=0.05, num_steps:175
rate of success : 4272.0/4859, epsilon=0.05, num_steps:252
rate of success : 4273.0/4860, epsilon=0.05, num_steps:232
rate of success : 4274.0/4861, epsilon=0.05, num_steps:159
rate of success : 4275.0/4862, epsilon=0.05, num_steps:247
rate of success : 4276.0/4863, epsilon=0.05, num_steps:148
rate of success : 4277.0/4864, epsilon=0.05, num_steps:149
rate of success : 4278.0/4865, epsilon=0.05, num_steps:258
rate of success : 4279.0/4866, epsilon=0.05, num_steps:179
rate of success : 4280.0/4867, epsilon=0.05, num_steps:170
rate of success : 4281.0/4868, epsilon=0.05, num_steps:250
rate of success : 4282.0/4869, epsilon=0.05, num_steps:170
rate of success : 4283.0/4870, epsilon=0.05, num_steps:1

rate of success : 4413.0/5000, epsilon=0.05, num_steps:593
rate of success : 4414.0/5001, epsilon=0.05, num_steps:244
rate of success : 4415.0/5002, epsilon=0.05, num_steps:176
rate of success : 4416.0/5003, epsilon=0.05, num_steps:151
rate of success : 4417.0/5004, epsilon=0.05, num_steps:150
rate of success : 4418.0/5005, epsilon=0.05, num_steps:259
rate of success : 4419.0/5006, epsilon=0.05, num_steps:225
rate of success : 4420.0/5007, epsilon=0.05, num_steps:151
rate of success : 4421.0/5008, epsilon=0.05, num_steps:172
rate of success : 4422.0/5009, epsilon=0.05, num_steps:170
rate of success : 4423.0/5010, epsilon=0.05, num_steps:154
rate of success : 4424.0/5011, epsilon=0.05, num_steps:318
rate of success : 4425.0/5012, epsilon=0.05, num_steps:163
rate of success : 4426.0/5013, epsilon=0.05, num_steps:174
rate of success : 4427.0/5014, epsilon=0.05, num_steps:637
rate of success : 4428.0/5015, epsilon=0.05, num_steps:233
rate of success : 4429.0/5016, epsilon=0.05, num_steps:1

rate of success : 4553.0/5140, epsilon=0.05, num_steps:270
rate of success : 4554.0/5141, epsilon=0.05, num_steps:146
rate of success : 4555.0/5142, epsilon=0.05, num_steps:166
rate of success : 4556.0/5143, epsilon=0.05, num_steps:163
rate of success : 4557.0/5144, epsilon=0.05, num_steps:231
rate of success : 4558.0/5145, epsilon=0.05, num_steps:239
rate of success : 4559.0/5146, epsilon=0.05, num_steps:255
rate of success : 4560.0/5147, epsilon=0.05, num_steps:152
rate of success : 4561.0/5148, epsilon=0.05, num_steps:176
rate of success : 4562.0/5149, epsilon=0.05, num_steps:162
rate of success : 4563.0/5150, epsilon=0.05, num_steps:159
rate of success : 4564.0/5151, epsilon=0.05, num_steps:173
rate of success : 4565.0/5152, epsilon=0.05, num_steps:253
rate of success : 4566.0/5153, epsilon=0.05, num_steps:170
rate of success : 4567.0/5154, epsilon=0.05, num_steps:156
rate of success : 4568.0/5155, epsilon=0.05, num_steps:174
rate of success : 4569.0/5156, epsilon=0.05, num_steps:1

rate of success : 4698.0/5285, epsilon=0.05, num_steps:178
rate of success : 4699.0/5286, epsilon=0.05, num_steps:149
rate of success : 4700.0/5287, epsilon=0.05, num_steps:181
rate of success : 4701.0/5288, epsilon=0.05, num_steps:180
rate of success : 4702.0/5289, epsilon=0.05, num_steps:233
rate of success : 4703.0/5290, epsilon=0.05, num_steps:157
rate of success : 4704.0/5291, epsilon=0.05, num_steps:152
rate of success : 4705.0/5292, epsilon=0.05, num_steps:263
rate of success : 4706.0/5293, epsilon=0.05, num_steps:254
rate of success : 4707.0/5294, epsilon=0.05, num_steps:147
rate of success : 4708.0/5295, epsilon=0.05, num_steps:224
rate of success : 4709.0/5296, epsilon=0.05, num_steps:172
rate of success : 4710.0/5297, epsilon=0.05, num_steps:256
rate of success : 4711.0/5298, epsilon=0.05, num_steps:175
rate of success : 4712.0/5299, epsilon=0.05, num_steps:312
rate of success : 4713.0/5300, epsilon=0.05, num_steps:153
rate of success : 4714.0/5301, epsilon=0.05, num_steps:1

rate of success : 4847.0/5434, epsilon=0.05, num_steps:178
rate of success : 4848.0/5435, epsilon=0.05, num_steps:178
rate of success : 4849.0/5436, epsilon=0.05, num_steps:173
rate of success : 4850.0/5437, epsilon=0.05, num_steps:176
rate of success : 4851.0/5438, epsilon=0.05, num_steps:198
rate of success : 4852.0/5439, epsilon=0.05, num_steps:194
rate of success : 4853.0/5440, epsilon=0.05, num_steps:186
rate of success : 4854.0/5441, epsilon=0.05, num_steps:159
rate of success : 4855.0/5442, epsilon=0.05, num_steps:153
rate of success : 4856.0/5443, epsilon=0.05, num_steps:277
rate of success : 4857.0/5444, epsilon=0.05, num_steps:167
rate of success : 4858.0/5445, epsilon=0.05, num_steps:254
rate of success : 4859.0/5446, epsilon=0.05, num_steps:166
rate of success : 4860.0/5447, epsilon=0.05, num_steps:170
rate of success : 4861.0/5448, epsilon=0.05, num_steps:155
rate of success : 4862.0/5449, epsilon=0.05, num_steps:171
rate of success : 4863.0/5450, epsilon=0.05, num_steps:1

rate of success : 4996.0/5583, epsilon=0.05, num_steps:256
rate of success : 4997.0/5584, epsilon=0.05, num_steps:249
rate of success : 4998.0/5585, epsilon=0.05, num_steps:177
rate of success : 4999.0/5586, epsilon=0.05, num_steps:179
rate of success : 5000.0/5587, epsilon=0.05, num_steps:173
rate of success : 5001.0/5588, epsilon=0.05, num_steps:176
rate of success : 5002.0/5589, epsilon=0.05, num_steps:164
rate of success : 5003.0/5590, epsilon=0.05, num_steps:256
rate of success : 5004.0/5591, epsilon=0.05, num_steps:170
rate of success : 5005.0/5592, epsilon=0.05, num_steps:258
rate of success : 5006.0/5593, epsilon=0.05, num_steps:173
rate of success : 5007.0/5594, epsilon=0.05, num_steps:261
rate of success : 5008.0/5595, epsilon=0.05, num_steps:171
rate of success : 5009.0/5596, epsilon=0.05, num_steps:173
rate of success : 5010.0/5597, epsilon=0.05, num_steps:181
rate of success : 5011.0/5598, epsilon=0.05, num_steps:173
rate of success : 5012.0/5599, epsilon=0.05, num_steps:1

rate of success : 5142.0/5729, epsilon=0.05, num_steps:257
rate of success : 5143.0/5730, epsilon=0.05, num_steps:151
rate of success : 5144.0/5731, epsilon=0.05, num_steps:251
rate of success : 5145.0/5732, epsilon=0.05, num_steps:373
rate of success : 5146.0/5733, epsilon=0.05, num_steps:202
rate of success : 5147.0/5734, epsilon=0.05, num_steps:336
rate of success : 5148.0/5735, epsilon=0.05, num_steps:158
rate of success : 5149.0/5736, epsilon=0.05, num_steps:389
rate of success : 5150.0/5737, epsilon=0.05, num_steps:174
rate of success : 5151.0/5738, epsilon=0.05, num_steps:182
rate of success : 5152.0/5739, epsilon=0.05, num_steps:250
rate of success : 5153.0/5740, epsilon=0.05, num_steps:255
rate of success : 5154.0/5741, epsilon=0.05, num_steps:169
rate of success : 5155.0/5742, epsilon=0.05, num_steps:180
rate of success : 5156.0/5743, epsilon=0.05, num_steps:154
rate of success : 5157.0/5744, epsilon=0.05, num_steps:245
rate of success : 5158.0/5745, epsilon=0.05, num_steps:1

rate of success : 5287.0/5874, epsilon=0.05, num_steps:375
rate of success : 5288.0/5875, epsilon=0.05, num_steps:155
rate of success : 5289.0/5876, epsilon=0.05, num_steps:236
rate of success : 5290.0/5877, epsilon=0.05, num_steps:235
rate of success : 5291.0/5878, epsilon=0.05, num_steps:246
rate of success : 5292.0/5879, epsilon=0.05, num_steps:170
rate of success : 5293.0/5880, epsilon=0.05, num_steps:218
rate of success : 5294.0/5881, epsilon=0.05, num_steps:199
rate of success : 5295.0/5882, epsilon=0.05, num_steps:251
rate of success : 5296.0/5883, epsilon=0.05, num_steps:175
rate of success : 5297.0/5884, epsilon=0.05, num_steps:151
rate of success : 5298.0/5885, epsilon=0.05, num_steps:237
rate of success : 5299.0/5886, epsilon=0.05, num_steps:255
rate of success : 5300.0/5887, epsilon=0.05, num_steps:169
rate of success : 5301.0/5888, epsilon=0.05, num_steps:176
rate of success : 5302.0/5889, epsilon=0.05, num_steps:233
rate of success : 5303.0/5890, epsilon=0.05, num_steps:1

rate of success : 5427.0/6014, epsilon=0.05, num_steps:252
rate of success : 5428.0/6015, epsilon=0.05, num_steps:188
rate of success : 5429.0/6016, epsilon=0.05, num_steps:170
rate of success : 5430.0/6017, epsilon=0.05, num_steps:160
rate of success : 5431.0/6018, epsilon=0.05, num_steps:157
rate of success : 5432.0/6019, epsilon=0.05, num_steps:210
rate of success : 5433.0/6020, epsilon=0.05, num_steps:179
rate of success : 5434.0/6021, epsilon=0.05, num_steps:255
rate of success : 5435.0/6022, epsilon=0.05, num_steps:159
rate of success : 5436.0/6023, epsilon=0.05, num_steps:387
rate of success : 5437.0/6024, epsilon=0.05, num_steps:181
rate of success : 5438.0/6025, epsilon=0.05, num_steps:240
rate of success : 5439.0/6026, epsilon=0.05, num_steps:198
rate of success : 5440.0/6027, epsilon=0.05, num_steps:172
rate of success : 5441.0/6028, epsilon=0.05, num_steps:178
rate of success : 5442.0/6029, epsilon=0.05, num_steps:177
rate of success : 5443.0/6030, epsilon=0.05, num_steps:2

rate of success : 5570.0/6157, epsilon=0.05, num_steps:234
rate of success : 5571.0/6158, epsilon=0.05, num_steps:173
rate of success : 5572.0/6159, epsilon=0.05, num_steps:255
rate of success : 5573.0/6160, epsilon=0.05, num_steps:220
rate of success : 5574.0/6161, epsilon=0.05, num_steps:258
rate of success : 5575.0/6162, epsilon=0.05, num_steps:167
rate of success : 5576.0/6163, epsilon=0.05, num_steps:193
rate of success : 5577.0/6164, epsilon=0.05, num_steps:282
rate of success : 5578.0/6165, epsilon=0.05, num_steps:312
rate of success : 5579.0/6166, epsilon=0.05, num_steps:179
rate of success : 5580.0/6167, epsilon=0.05, num_steps:266
rate of success : 5581.0/6168, epsilon=0.05, num_steps:170
rate of success : 5582.0/6169, epsilon=0.05, num_steps:153
rate of success : 5583.0/6170, epsilon=0.05, num_steps:228
rate of success : 5584.0/6171, epsilon=0.05, num_steps:169
rate of success : 5585.0/6172, epsilon=0.05, num_steps:191
rate of success : 5586.0/6173, epsilon=0.05, num_steps:1

rate of success : 5731.0/6318, epsilon=0.05, num_steps:256
rate of success : 5732.0/6319, epsilon=0.05, num_steps:235
rate of success : 5733.0/6320, epsilon=0.05, num_steps:171
rate of success : 5734.0/6321, epsilon=0.05, num_steps:253
rate of success : 5735.0/6322, epsilon=0.05, num_steps:157
rate of success : 5736.0/6323, epsilon=0.05, num_steps:170
rate of success : 5737.0/6324, epsilon=0.05, num_steps:181
rate of success : 5738.0/6325, epsilon=0.05, num_steps:166
rate of success : 5739.0/6326, epsilon=0.05, num_steps:154
rate of success : 5740.0/6327, epsilon=0.05, num_steps:170
rate of success : 5741.0/6328, epsilon=0.05, num_steps:159
rate of success : 5742.0/6329, epsilon=0.05, num_steps:169
rate of success : 5743.0/6330, epsilon=0.05, num_steps:171
rate of success : 5744.0/6331, epsilon=0.05, num_steps:270
rate of success : 5745.0/6332, epsilon=0.05, num_steps:181
rate of success : 5746.0/6333, epsilon=0.05, num_steps:330
rate of success : 5747.0/6334, epsilon=0.05, num_steps:2

rate of success : 5873.0/6460, epsilon=0.05, num_steps:178
rate of success : 5874.0/6461, epsilon=0.05, num_steps:467
rate of success : 5875.0/6462, epsilon=0.05, num_steps:158
rate of success : 5876.0/6463, epsilon=0.05, num_steps:168
rate of success : 5877.0/6464, epsilon=0.05, num_steps:388
rate of success : 5878.0/6465, epsilon=0.05, num_steps:257
rate of success : 5879.0/6466, epsilon=0.05, num_steps:235
rate of success : 5880.0/6467, epsilon=0.05, num_steps:172
rate of success : 5881.0/6468, epsilon=0.05, num_steps:180
rate of success : 5882.0/6469, epsilon=0.05, num_steps:168
rate of success : 5883.0/6470, epsilon=0.05, num_steps:153
rate of success : 5884.0/6471, epsilon=0.05, num_steps:174
rate of success : 5885.0/6472, epsilon=0.05, num_steps:154
rate of success : 5886.0/6473, epsilon=0.05, num_steps:184
rate of success : 5887.0/6474, epsilon=0.05, num_steps:172
rate of success : 5888.0/6475, epsilon=0.05, num_steps:171
rate of success : 5889.0/6476, epsilon=0.05, num_steps:2

rate of success : 6019.0/6606, epsilon=0.05, num_steps:151
rate of success : 6020.0/6607, epsilon=0.05, num_steps:173
rate of success : 6021.0/6608, epsilon=0.05, num_steps:275
rate of success : 6022.0/6609, epsilon=0.05, num_steps:174
rate of success : 6023.0/6610, epsilon=0.05, num_steps:177
rate of success : 6024.0/6611, epsilon=0.05, num_steps:145
rate of success : 6025.0/6612, epsilon=0.05, num_steps:158
rate of success : 6026.0/6613, epsilon=0.05, num_steps:158
rate of success : 6027.0/6614, epsilon=0.05, num_steps:156
rate of success : 6028.0/6615, epsilon=0.05, num_steps:154
rate of success : 6029.0/6616, epsilon=0.05, num_steps:240
rate of success : 6030.0/6617, epsilon=0.05, num_steps:154
rate of success : 6031.0/6618, epsilon=0.05, num_steps:249
rate of success : 6032.0/6619, epsilon=0.05, num_steps:183
rate of success : 6033.0/6620, epsilon=0.05, num_steps:169
rate of success : 6034.0/6621, epsilon=0.05, num_steps:181
rate of success : 6035.0/6622, epsilon=0.05, num_steps:1

rate of success : 6169.0/6756, epsilon=0.05, num_steps:173
rate of success : 6170.0/6757, epsilon=0.05, num_steps:168
rate of success : 6171.0/6758, epsilon=0.05, num_steps:150
rate of success : 6172.0/6759, epsilon=0.05, num_steps:153
rate of success : 6173.0/6760, epsilon=0.05, num_steps:173
rate of success : 6174.0/6761, epsilon=0.05, num_steps:178
rate of success : 6175.0/6762, epsilon=0.05, num_steps:149
rate of success : 6176.0/6763, epsilon=0.05, num_steps:236
rate of success : 6177.0/6764, epsilon=0.05, num_steps:173
rate of success : 6178.0/6765, epsilon=0.05, num_steps:152
rate of success : 6179.0/6766, epsilon=0.05, num_steps:320
rate of success : 6180.0/6767, epsilon=0.05, num_steps:231
rate of success : 6181.0/6768, epsilon=0.05, num_steps:281
rate of success : 6182.0/6769, epsilon=0.05, num_steps:179
rate of success : 6183.0/6770, epsilon=0.05, num_steps:163
rate of success : 6184.0/6771, epsilon=0.05, num_steps:177
rate of success : 6185.0/6772, epsilon=0.05, num_steps:2

rate of success : 6318.0/6905, epsilon=0.05, num_steps:231
rate of success : 6319.0/6906, epsilon=0.05, num_steps:167
rate of success : 6320.0/6907, epsilon=0.05, num_steps:151
rate of success : 6321.0/6908, epsilon=0.05, num_steps:249
rate of success : 6322.0/6909, epsilon=0.05, num_steps:164
rate of success : 6323.0/6910, epsilon=0.05, num_steps:163
rate of success : 6324.0/6911, epsilon=0.05, num_steps:153
rate of success : 6325.0/6912, epsilon=0.05, num_steps:258
rate of success : 6326.0/6913, epsilon=0.05, num_steps:179
rate of success : 6327.0/6914, epsilon=0.05, num_steps:158
rate of success : 6328.0/6915, epsilon=0.05, num_steps:262
rate of success : 6329.0/6916, epsilon=0.05, num_steps:238
rate of success : 6330.0/6917, epsilon=0.05, num_steps:163
rate of success : 6331.0/6918, epsilon=0.05, num_steps:148
rate of success : 6332.0/6919, epsilon=0.05, num_steps:160
rate of success : 6333.0/6920, epsilon=0.05, num_steps:240
rate of success : 6334.0/6921, epsilon=0.05, num_steps:1

rate of success : 6467.0/7054, epsilon=0.05, num_steps:152
rate of success : 6468.0/7055, epsilon=0.05, num_steps:169
rate of success : 6469.0/7056, epsilon=0.05, num_steps:169
rate of success : 6470.0/7057, epsilon=0.05, num_steps:239
rate of success : 6471.0/7058, epsilon=0.05, num_steps:153
rate of success : 6472.0/7059, epsilon=0.05, num_steps:166
rate of success : 6473.0/7060, epsilon=0.05, num_steps:154
rate of success : 6474.0/7061, epsilon=0.05, num_steps:157
rate of success : 6475.0/7062, epsilon=0.05, num_steps:156
rate of success : 6476.0/7063, epsilon=0.05, num_steps:149
rate of success : 6477.0/7064, epsilon=0.05, num_steps:151
rate of success : 6478.0/7065, epsilon=0.05, num_steps:163
rate of success : 6479.0/7066, epsilon=0.05, num_steps:180
rate of success : 6480.0/7067, epsilon=0.05, num_steps:247
rate of success : 6481.0/7068, epsilon=0.05, num_steps:172
rate of success : 6482.0/7069, epsilon=0.05, num_steps:166
rate of success : 6483.0/7070, epsilon=0.05, num_steps:1

rate of success : 6623.0/7210, epsilon=0.05, num_steps:176
rate of success : 6624.0/7211, epsilon=0.05, num_steps:173
rate of success : 6625.0/7212, epsilon=0.05, num_steps:153
rate of success : 6626.0/7213, epsilon=0.05, num_steps:165
rate of success : 6627.0/7214, epsilon=0.05, num_steps:171
rate of success : 6628.0/7215, epsilon=0.05, num_steps:151
rate of success : 6629.0/7216, epsilon=0.05, num_steps:176
rate of success : 6630.0/7217, epsilon=0.05, num_steps:166
rate of success : 6631.0/7218, epsilon=0.05, num_steps:235
rate of success : 6632.0/7219, epsilon=0.05, num_steps:156
rate of success : 6633.0/7220, epsilon=0.05, num_steps:149
rate of success : 6634.0/7221, epsilon=0.05, num_steps:242
rate of success : 6635.0/7222, epsilon=0.05, num_steps:163
rate of success : 6636.0/7223, epsilon=0.05, num_steps:160
rate of success : 6637.0/7224, epsilon=0.05, num_steps:158
rate of success : 6638.0/7225, epsilon=0.05, num_steps:229
rate of success : 6639.0/7226, epsilon=0.05, num_steps:1

rate of success : 6788.0/7375, epsilon=0.05, num_steps:236
rate of success : 6789.0/7376, epsilon=0.05, num_steps:235
rate of success : 6790.0/7377, epsilon=0.05, num_steps:170
rate of success : 6791.0/7378, epsilon=0.05, num_steps:166
rate of success : 6792.0/7379, epsilon=0.05, num_steps:160
rate of success : 6793.0/7380, epsilon=0.05, num_steps:171
rate of success : 6794.0/7381, epsilon=0.05, num_steps:167
rate of success : 6795.0/7382, epsilon=0.05, num_steps:155
rate of success : 6796.0/7383, epsilon=0.05, num_steps:238
rate of success : 6797.0/7384, epsilon=0.05, num_steps:167
rate of success : 6798.0/7385, epsilon=0.05, num_steps:145
rate of success : 6799.0/7386, epsilon=0.05, num_steps:203
rate of success : 6800.0/7387, epsilon=0.05, num_steps:165
rate of success : 6801.0/7388, epsilon=0.05, num_steps:250
rate of success : 6802.0/7389, epsilon=0.05, num_steps:163
rate of success : 6803.0/7390, epsilon=0.05, num_steps:153
rate of success : 6804.0/7391, epsilon=0.05, num_steps:1

rate of success : 6947.0/7534, epsilon=0.05, num_steps:164
rate of success : 6948.0/7535, epsilon=0.05, num_steps:162
rate of success : 6949.0/7536, epsilon=0.05, num_steps:161
rate of success : 6950.0/7537, epsilon=0.05, num_steps:178
rate of success : 6951.0/7538, epsilon=0.05, num_steps:151
rate of success : 6952.0/7539, epsilon=0.05, num_steps:186
rate of success : 6953.0/7540, epsilon=0.05, num_steps:159
rate of success : 6954.0/7541, epsilon=0.05, num_steps:163
rate of success : 6955.0/7542, epsilon=0.05, num_steps:166
rate of success : 6956.0/7543, epsilon=0.05, num_steps:167
rate of success : 6957.0/7544, epsilon=0.05, num_steps:241
rate of success : 6958.0/7545, epsilon=0.05, num_steps:161
rate of success : 6959.0/7546, epsilon=0.05, num_steps:234
rate of success : 6960.0/7547, epsilon=0.05, num_steps:164
rate of success : 6961.0/7548, epsilon=0.05, num_steps:162
rate of success : 6962.0/7549, epsilon=0.05, num_steps:150
rate of success : 6963.0/7550, epsilon=0.05, num_steps:1

rate of success : 7087.0/7674, epsilon=0.05, num_steps:167
rate of success : 7088.0/7675, epsilon=0.05, num_steps:163
rate of success : 7089.0/7676, epsilon=0.05, num_steps:172
rate of success : 7090.0/7677, epsilon=0.05, num_steps:227
rate of success : 7091.0/7678, epsilon=0.05, num_steps:163
rate of success : 7092.0/7679, epsilon=0.05, num_steps:168
rate of success : 7093.0/7680, epsilon=0.05, num_steps:159
rate of success : 7094.0/7681, epsilon=0.05, num_steps:149
rate of success : 7095.0/7682, epsilon=0.05, num_steps:234
rate of success : 7096.0/7683, epsilon=0.05, num_steps:233
rate of success : 7097.0/7684, epsilon=0.05, num_steps:148
rate of success : 7098.0/7685, epsilon=0.05, num_steps:179
rate of success : 7099.0/7686, epsilon=0.05, num_steps:157
rate of success : 7100.0/7687, epsilon=0.05, num_steps:245
rate of success : 7101.0/7688, epsilon=0.05, num_steps:163
rate of success : 7102.0/7689, epsilon=0.05, num_steps:155
rate of success : 7103.0/7690, epsilon=0.05, num_steps:2

rate of success : 7228.0/7815, epsilon=0.05, num_steps:245
rate of success : 7229.0/7816, epsilon=0.05, num_steps:153
rate of success : 7230.0/7817, epsilon=0.05, num_steps:258
rate of success : 7231.0/7818, epsilon=0.05, num_steps:163
rate of success : 7232.0/7819, epsilon=0.05, num_steps:250
rate of success : 7233.0/7820, epsilon=0.05, num_steps:151
rate of success : 7234.0/7821, epsilon=0.05, num_steps:236
rate of success : 7235.0/7822, epsilon=0.05, num_steps:245
rate of success : 7236.0/7823, epsilon=0.05, num_steps:242
rate of success : 7237.0/7824, epsilon=0.05, num_steps:239
rate of success : 7238.0/7825, epsilon=0.05, num_steps:163
rate of success : 7239.0/7826, epsilon=0.05, num_steps:152
rate of success : 7240.0/7827, epsilon=0.05, num_steps:158
rate of success : 7241.0/7828, epsilon=0.05, num_steps:161
rate of success : 7242.0/7829, epsilon=0.05, num_steps:155
rate of success : 7243.0/7830, epsilon=0.05, num_steps:254
rate of success : 7244.0/7831, epsilon=0.05, num_steps:3

rate of success : 7395.0/7982, epsilon=0.05, num_steps:150
rate of success : 7396.0/7983, epsilon=0.05, num_steps:167
rate of success : 7397.0/7984, epsilon=0.05, num_steps:161
rate of success : 7398.0/7985, epsilon=0.05, num_steps:160
rate of success : 7399.0/7986, epsilon=0.05, num_steps:254
rate of success : 7400.0/7987, epsilon=0.05, num_steps:158
rate of success : 7401.0/7988, epsilon=0.05, num_steps:173
rate of success : 7402.0/7989, epsilon=0.05, num_steps:156
rate of success : 7403.0/7990, epsilon=0.05, num_steps:159
rate of success : 7404.0/7991, epsilon=0.05, num_steps:156
rate of success : 7405.0/7992, epsilon=0.05, num_steps:169
rate of success : 7406.0/7993, epsilon=0.05, num_steps:152
rate of success : 7407.0/7994, epsilon=0.05, num_steps:155
rate of success : 7408.0/7995, epsilon=0.05, num_steps:162
rate of success : 7409.0/7996, epsilon=0.05, num_steps:154
rate of success : 7410.0/7997, epsilon=0.05, num_steps:168
rate of success : 7411.0/7998, epsilon=0.05, num_steps:1

rate of success : 7557.0/8144, epsilon=0.05, num_steps:256
rate of success : 7558.0/8145, epsilon=0.05, num_steps:160
rate of success : 7559.0/8146, epsilon=0.05, num_steps:160
rate of success : 7560.0/8147, epsilon=0.05, num_steps:161
rate of success : 7561.0/8148, epsilon=0.05, num_steps:182
rate of success : 7562.0/8149, epsilon=0.05, num_steps:245
rate of success : 7563.0/8150, epsilon=0.05, num_steps:164
rate of success : 7564.0/8151, epsilon=0.05, num_steps:159
rate of success : 7565.0/8152, epsilon=0.05, num_steps:167
rate of success : 7566.0/8153, epsilon=0.05, num_steps:255
rate of success : 7567.0/8154, epsilon=0.05, num_steps:166
rate of success : 7568.0/8155, epsilon=0.05, num_steps:153
rate of success : 7569.0/8156, epsilon=0.05, num_steps:237
rate of success : 7570.0/8157, epsilon=0.05, num_steps:173
rate of success : 7571.0/8158, epsilon=0.05, num_steps:157
rate of success : 7572.0/8159, epsilon=0.05, num_steps:184
rate of success : 7573.0/8160, epsilon=0.05, num_steps:1

rate of success : 7710.0/8297, epsilon=0.05, num_steps:150
rate of success : 7711.0/8298, epsilon=0.05, num_steps:153
rate of success : 7712.0/8299, epsilon=0.05, num_steps:243
rate of success : 7713.0/8300, epsilon=0.05, num_steps:160
rate of success : 7714.0/8301, epsilon=0.05, num_steps:256
rate of success : 7715.0/8302, epsilon=0.05, num_steps:167
rate of success : 7716.0/8303, epsilon=0.05, num_steps:155
rate of success : 7717.0/8304, epsilon=0.05, num_steps:240
rate of success : 7718.0/8305, epsilon=0.05, num_steps:151
rate of success : 7719.0/8306, epsilon=0.05, num_steps:170
rate of success : 7720.0/8307, epsilon=0.05, num_steps:153
rate of success : 7721.0/8308, epsilon=0.05, num_steps:244
rate of success : 7722.0/8309, epsilon=0.05, num_steps:167
rate of success : 7723.0/8310, epsilon=0.05, num_steps:243
rate of success : 7724.0/8311, epsilon=0.05, num_steps:164
rate of success : 7725.0/8312, epsilon=0.05, num_steps:162
rate of success : 7726.0/8313, epsilon=0.05, num_steps:1

rate of success : 7872.0/8459, epsilon=0.05, num_steps:151
rate of success : 7873.0/8460, epsilon=0.05, num_steps:165
rate of success : 7874.0/8461, epsilon=0.05, num_steps:176
rate of success : 7875.0/8462, epsilon=0.05, num_steps:162
rate of success : 7876.0/8463, epsilon=0.05, num_steps:148
rate of success : 7877.0/8464, epsilon=0.05, num_steps:170
rate of success : 7878.0/8465, epsilon=0.05, num_steps:151
rate of success : 7879.0/8466, epsilon=0.05, num_steps:157
rate of success : 7880.0/8467, epsilon=0.05, num_steps:238
rate of success : 7881.0/8468, epsilon=0.05, num_steps:170
rate of success : 7882.0/8469, epsilon=0.05, num_steps:163
rate of success : 7883.0/8470, epsilon=0.05, num_steps:174
rate of success : 7884.0/8471, epsilon=0.05, num_steps:169
rate of success : 7885.0/8472, epsilon=0.05, num_steps:169
rate of success : 7886.0/8473, epsilon=0.05, num_steps:243
rate of success : 7887.0/8474, epsilon=0.05, num_steps:243
rate of success : 7888.0/8475, epsilon=0.05, num_steps:1

rate of success : 8024.0/8611, epsilon=0.05, num_steps:168
rate of success : 8025.0/8612, epsilon=0.05, num_steps:242
rate of success : 8026.0/8613, epsilon=0.05, num_steps:162
rate of success : 8027.0/8614, epsilon=0.05, num_steps:248
rate of success : 8028.0/8615, epsilon=0.05, num_steps:168
rate of success : 8029.0/8616, epsilon=0.05, num_steps:255
rate of success : 8030.0/8617, epsilon=0.05, num_steps:151
rate of success : 8031.0/8618, epsilon=0.05, num_steps:233
rate of success : 8032.0/8619, epsilon=0.05, num_steps:158
rate of success : 8033.0/8620, epsilon=0.05, num_steps:163
rate of success : 8034.0/8621, epsilon=0.05, num_steps:176
rate of success : 8035.0/8622, epsilon=0.05, num_steps:158
rate of success : 8036.0/8623, epsilon=0.05, num_steps:174
rate of success : 8037.0/8624, epsilon=0.05, num_steps:161
rate of success : 8038.0/8625, epsilon=0.05, num_steps:158
rate of success : 8039.0/8626, epsilon=0.05, num_steps:169
rate of success : 8040.0/8627, epsilon=0.05, num_steps:2

rate of success : 8185.0/8772, epsilon=0.05, num_steps:237
rate of success : 8186.0/8773, epsilon=0.05, num_steps:184
rate of success : 8187.0/8774, epsilon=0.05, num_steps:228
rate of success : 8188.0/8775, epsilon=0.05, num_steps:251
rate of success : 8189.0/8776, epsilon=0.05, num_steps:166
rate of success : 8190.0/8777, epsilon=0.05, num_steps:152
rate of success : 8191.0/8778, epsilon=0.05, num_steps:158
rate of success : 8192.0/8779, epsilon=0.05, num_steps:241
rate of success : 8193.0/8780, epsilon=0.05, num_steps:174
rate of success : 8194.0/8781, epsilon=0.05, num_steps:229
rate of success : 8195.0/8782, epsilon=0.05, num_steps:173
rate of success : 8196.0/8783, epsilon=0.05, num_steps:176
rate of success : 8197.0/8784, epsilon=0.05, num_steps:240
rate of success : 8198.0/8785, epsilon=0.05, num_steps:174
rate of success : 8199.0/8786, epsilon=0.05, num_steps:155
rate of success : 8200.0/8787, epsilon=0.05, num_steps:188
rate of success : 8201.0/8788, epsilon=0.05, num_steps:2

rate of success : 8346.0/8933, epsilon=0.05, num_steps:162
rate of success : 8347.0/8934, epsilon=0.05, num_steps:249
rate of success : 8348.0/8935, epsilon=0.05, num_steps:244
rate of success : 8349.0/8936, epsilon=0.05, num_steps:168
rate of success : 8350.0/8937, epsilon=0.05, num_steps:178
rate of success : 8351.0/8938, epsilon=0.05, num_steps:239
rate of success : 8352.0/8939, epsilon=0.05, num_steps:157
rate of success : 8353.0/8940, epsilon=0.05, num_steps:239
rate of success : 8354.0/8941, epsilon=0.05, num_steps:147
rate of success : 8355.0/8942, epsilon=0.05, num_steps:155
rate of success : 8356.0/8943, epsilon=0.05, num_steps:248
rate of success : 8357.0/8944, epsilon=0.05, num_steps:161
rate of success : 8358.0/8945, epsilon=0.05, num_steps:245
rate of success : 8359.0/8946, epsilon=0.05, num_steps:157
rate of success : 8360.0/8947, epsilon=0.05, num_steps:237
rate of success : 8361.0/8948, epsilon=0.05, num_steps:165
rate of success : 8362.0/8949, epsilon=0.05, num_steps:1

rate of success : 8503.0/9090, epsilon=0.05, num_steps:167
rate of success : 8504.0/9091, epsilon=0.05, num_steps:152
rate of success : 8505.0/9092, epsilon=0.05, num_steps:250
rate of success : 8506.0/9093, epsilon=0.05, num_steps:160
rate of success : 8507.0/9094, epsilon=0.05, num_steps:255
rate of success : 8508.0/9095, epsilon=0.05, num_steps:163
rate of success : 8509.0/9096, epsilon=0.05, num_steps:165
rate of success : 8510.0/9097, epsilon=0.05, num_steps:153
rate of success : 8511.0/9098, epsilon=0.05, num_steps:176
rate of success : 8512.0/9099, epsilon=0.05, num_steps:159
rate of success : 8513.0/9100, epsilon=0.05, num_steps:169
rate of success : 8514.0/9101, epsilon=0.05, num_steps:167
rate of success : 8515.0/9102, epsilon=0.05, num_steps:162
rate of success : 8516.0/9103, epsilon=0.05, num_steps:157
rate of success : 8517.0/9104, epsilon=0.05, num_steps:155
rate of success : 8518.0/9105, epsilon=0.05, num_steps:256
rate of success : 8519.0/9106, epsilon=0.05, num_steps:2

rate of success : 8662.0/9249, epsilon=0.05, num_steps:159
rate of success : 8663.0/9250, epsilon=0.05, num_steps:250
rate of success : 8664.0/9251, epsilon=0.05, num_steps:154
rate of success : 8665.0/9252, epsilon=0.05, num_steps:234
rate of success : 8666.0/9253, epsilon=0.05, num_steps:158
rate of success : 8667.0/9254, epsilon=0.05, num_steps:170
rate of success : 8668.0/9255, epsilon=0.05, num_steps:167
rate of success : 8669.0/9256, epsilon=0.05, num_steps:161
rate of success : 8670.0/9257, epsilon=0.05, num_steps:156
rate of success : 8671.0/9258, epsilon=0.05, num_steps:233
rate of success : 8672.0/9259, epsilon=0.05, num_steps:154
rate of success : 8673.0/9260, epsilon=0.05, num_steps:167
rate of success : 8674.0/9261, epsilon=0.05, num_steps:243
rate of success : 8675.0/9262, epsilon=0.05, num_steps:236
rate of success : 8676.0/9263, epsilon=0.05, num_steps:250
rate of success : 8677.0/9264, epsilon=0.05, num_steps:159
rate of success : 8678.0/9265, epsilon=0.05, num_steps:1

rate of success : 8821.0/9408, epsilon=0.05, num_steps:244
rate of success : 8822.0/9409, epsilon=0.05, num_steps:175
rate of success : 8823.0/9410, epsilon=0.05, num_steps:243
rate of success : 8824.0/9411, epsilon=0.05, num_steps:233
rate of success : 8825.0/9412, epsilon=0.05, num_steps:240
rate of success : 8826.0/9413, epsilon=0.05, num_steps:157
rate of success : 8827.0/9414, epsilon=0.05, num_steps:234
rate of success : 8828.0/9415, epsilon=0.05, num_steps:175
rate of success : 8829.0/9416, epsilon=0.05, num_steps:168
rate of success : 8830.0/9417, epsilon=0.05, num_steps:235
rate of success : 8831.0/9418, epsilon=0.05, num_steps:245
rate of success : 8832.0/9419, epsilon=0.05, num_steps:164
rate of success : 8833.0/9420, epsilon=0.05, num_steps:151
rate of success : 8834.0/9421, epsilon=0.05, num_steps:247
rate of success : 8835.0/9422, epsilon=0.05, num_steps:174
rate of success : 8836.0/9423, epsilon=0.05, num_steps:171
rate of success : 8837.0/9424, epsilon=0.05, num_steps:2

rate of success : 8982.0/9569, epsilon=0.05, num_steps:188
rate of success : 8983.0/9570, epsilon=0.05, num_steps:250
rate of success : 8984.0/9571, epsilon=0.05, num_steps:159
rate of success : 8985.0/9572, epsilon=0.05, num_steps:237
rate of success : 8986.0/9573, epsilon=0.05, num_steps:250
rate of success : 8987.0/9574, epsilon=0.05, num_steps:149
rate of success : 8988.0/9575, epsilon=0.05, num_steps:230
rate of success : 8989.0/9576, epsilon=0.05, num_steps:160
rate of success : 8990.0/9577, epsilon=0.05, num_steps:242
rate of success : 8991.0/9578, epsilon=0.05, num_steps:255
rate of success : 8992.0/9579, epsilon=0.05, num_steps:161
rate of success : 8993.0/9580, epsilon=0.05, num_steps:161
rate of success : 8994.0/9581, epsilon=0.05, num_steps:161
rate of success : 8995.0/9582, epsilon=0.05, num_steps:152
rate of success : 8996.0/9583, epsilon=0.05, num_steps:161
rate of success : 8997.0/9584, epsilon=0.05, num_steps:166
rate of success : 8998.0/9585, epsilon=0.05, num_steps:1

rate of success : 9144.0/9731, epsilon=0.05, num_steps:164
rate of success : 9145.0/9732, epsilon=0.05, num_steps:158
rate of success : 9146.0/9733, epsilon=0.05, num_steps:167
rate of success : 9147.0/9734, epsilon=0.05, num_steps:164
rate of success : 9148.0/9735, epsilon=0.05, num_steps:245
rate of success : 9149.0/9736, epsilon=0.05, num_steps:163
rate of success : 9150.0/9737, epsilon=0.05, num_steps:254
rate of success : 9151.0/9738, epsilon=0.05, num_steps:163
rate of success : 9152.0/9739, epsilon=0.05, num_steps:239
rate of success : 9153.0/9740, epsilon=0.05, num_steps:156
rate of success : 9154.0/9741, epsilon=0.05, num_steps:154
rate of success : 9155.0/9742, epsilon=0.05, num_steps:166
rate of success : 9156.0/9743, epsilon=0.05, num_steps:167
rate of success : 9157.0/9744, epsilon=0.05, num_steps:161
rate of success : 9158.0/9745, epsilon=0.05, num_steps:155
rate of success : 9159.0/9746, epsilon=0.05, num_steps:232
rate of success : 9160.0/9747, epsilon=0.05, num_steps:1

rate of success : 9304.0/9891, epsilon=0.05, num_steps:251
rate of success : 9305.0/9892, epsilon=0.05, num_steps:167
rate of success : 9306.0/9893, epsilon=0.05, num_steps:166
rate of success : 9307.0/9894, epsilon=0.05, num_steps:189
rate of success : 9308.0/9895, epsilon=0.05, num_steps:246
rate of success : 9309.0/9896, epsilon=0.05, num_steps:160
rate of success : 9310.0/9897, epsilon=0.05, num_steps:174
rate of success : 9311.0/9898, epsilon=0.05, num_steps:236
rate of success : 9312.0/9899, epsilon=0.05, num_steps:158
rate of success : 9313.0/9900, epsilon=0.05, num_steps:156
rate of success : 9314.0/9901, epsilon=0.05, num_steps:227
rate of success : 9315.0/9902, epsilon=0.05, num_steps:169
rate of success : 9316.0/9903, epsilon=0.05, num_steps:154
rate of success : 9317.0/9904, epsilon=0.05, num_steps:238
rate of success : 9318.0/9905, epsilon=0.05, num_steps:153
rate of success : 9319.0/9906, epsilon=0.05, num_steps:168
rate of success : 9320.0/9907, epsilon=0.05, num_steps:1

In [4]:
len(S), len(Rewards)

(161, 161)

In [5]:
counts_steps[-100:]

[227,
 169,
 154,
 238,
 153,
 168,
 174,
 160,
 159,
 238,
 156,
 240,
 168,
 162,
 173,
 169,
 247,
 162,
 164,
 169,
 249,
 167,
 149,
 253,
 163,
 232,
 182,
 170,
 240,
 153,
 159,
 237,
 172,
 160,
 170,
 181,
 236,
 164,
 152,
 249,
 168,
 159,
 174,
 163,
 239,
 164,
 151,
 244,
 157,
 166,
 149,
 166,
 171,
 174,
 241,
 241,
 156,
 161,
 247,
 251,
 246,
 168,
 236,
 235,
 256,
 247,
 155,
 170,
 158,
 161,
 166,
 156,
 251,
 173,
 168,
 169,
 251,
 161,
 234,
 242,
 170,
 172,
 165,
 154,
 235,
 170,
 152,
 152,
 172,
 173,
 165,
 249,
 256,
 168,
 160,
 158,
 165,
 163,
 148,
 160]

In [4]:
from moviepy.editor import ImageSequenceClip
def test_model(env, q, N, gif_name=None):
    frames = []
    observation = env.reset()
    new_p, new_v = convert2discrete(observation, N=discr)
    for curr_step in range(0, env._max_episode_steps):
        
        frame = env.render(mode='rgb_array')
        frames.append(frame) 
        action = np.argmax(q[new_p, new_v,:])
        new_observation, reward, done, info = env.step(action)
        new_p, new_v = convert2discrete(new_observation, N=N)
        if done:
            print(f"success : {curr_step}")
            break
    env.close()
    if gif_name is not None:
        clip = ImageSequenceClip(frames, fps=20)
        clip.write_gif(gif_name, fps=20)

q = np.load("monte_carlo_moutaincar.npy")
test_model(env,q,discr, "monte_carlo_moutaincar.gif")
        

t:   7%|████▍                                                              | 11/165 [00:00<00:01, 108.24it/s, now=None]

success : 164
MoviePy - Building file monte_carlo_moutaincar.gif with imageio.


                                                                                                                       