# Solution to OpenAI Gym's Cartpole Version 0

* The goal is to balance a pole on a moving cart.
* The game is considered solved when the average reward is at least 195 over 100 consecutive games.
* The solution below is a simple Q-table lookup using the incremental update described in Chapter 2.5 of Reinforcement Learning: An Introduction by Sutton and Barto.
* Exploration is epsilon-greedy, with epsilon starting at 0.99 and decaying linearly to a minimum value of 0.05 during training.
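
The incremental update and the linear epsilon schedule named in the bullets can be sketched as two small functions. This is a minimal illustration rather than the notebook's own code: the names `incremental_update` and `linear_epsilon` are hypothetical, and the defaults (start of 0.99, floor of 0.05) are taken from the description above.

```python
def incremental_update(old_estimate, target, step_size):
    """NewEstimate = OldEstimate + StepSize * (Target - OldEstimate)."""
    return old_estimate + step_size * (target - old_estimate)


def linear_epsilon(episode, total_episodes, start=0.99, floor=0.05):
    """Exploration rate that falls linearly with the episode index, never below `floor`.

    `start` and `floor` are assumptions copied from the bullet list above.
    """
    return max(floor, start - episode / total_episodes)


if __name__ == "__main__":
    # One update step moves the estimate a fraction `step_size` of the way to the target.
    print(incremental_update(0.0, 1.0, 0.1))
    # Epsilon at the start, midway through, and at the end of a 100000-episode run.
    for episode in (0, 50000, 100000):
        print(episode, linear_epsilon(episode, 100000))
```

With 100000 episodes, this schedule reaches the floor roughly 94,000 episodes in, so most of training is spent at exploration rates well above 0.05.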

In [16]:
import gym
import numpy as np
import random

env = gym.make('CartPole-v0')

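Q                 = {}     # the Q-table, keyed by (pole angle, pole angular velocity, action)
learningRate      = 0.1    # step size of the incremental update
randomActionRate  = 0.99   # initial exploration rate (epsilon)
totalReward       = 0      # sum of episode rewards over all of training
episodes          = 100000 # number of training episodes used to fill the Q-table
observation_round = 100    # observations are multiplied by this, then rounded, to discretize them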

for i_episode in range(episodes):
    observation = env.reset()
    episodeReward = 0

    for t in range(300):
        if t >= 199:
            print("200 successfull movements {}".format(i_episode))
        # Choose Action
        old_observation = observation #old_observation is used with reward from action to update Q table
        action = env.action_space.sample()
        
        random_threshold = randomActionRate - i_episode/episodes
        if random_threshold < 0.05:
            random_threshold = 0.05
            
        # Exploit: when the random draw exceeds the threshold, pick the action with the
        # highest Q value; otherwise keep the random action sampled above.
        if random.random() > random_threshold:
            # Baseline of 0: if neither action has a positive Q value (e.g. an unseen
            # state), the random action sampled above is kept.
            bestReward = 0
            for possibleAction in [0, 1]:
                temp_state = []
                for i in observation:
                    temp_state.append(round(i*observation_round,0))
                temp_state.append(possibleAction)

                possibleReward = Q.get(tuple(temp_state[2:]),0.)
                if possibleReward > bestReward:
                    action = possibleAction
                    bestReward = possibleReward

        # Update Q State
        observation, reward, done, info = env.step(action)
        
        temp_state = []
        for i in old_observation:
            temp_state.append(round(i*observation_round,0))
        temp_state.append(action)
        state = tuple(temp_state[2:]) # key uses only the last two of the four observations (pole angle and pole angular velocity) plus the action
        
        # look up whether going left or right from the new observation has the higher Q value
        temp_state2 = []
        temp_state3 = []
        for i in observation:
            temp_state2.append(round(i*observation_round,0))
            temp_state3.append(round(i*observation_round,0))
        temp_state2.append(0)
        temp_state3.append(1)
        
        reward_0 = Q.get(tuple(temp_state2[2:]),0.)
        reward_1 = Q.get(tuple(temp_state3[2:]),0.)
        next_best_reward = max(reward_0,reward_1)
            
        if done:
            #negative rewards for states that lead to failure
            if t < 199:
                Q[state] = Q.get(state, 0) + learningRate * -1.
            if i_episode % 10 == 0:
                print("Episode finished after {} timesteps with {} reward".format(t+1, episodeReward),random_threshold)
            break
        
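        # Incremental update toward the target reward + next_best_reward (no discount factor)
        Q[state] = Q.get(state, 0) + learningRate * ( reward + next_best_reward - Q.get(state, 0.) )

        # Not reached on the final step because of the break above, so a finished
        # episode reports one less reward than its number of timesteps.
        episodeReward += reward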
    
    totalReward += episodeReward

print("Total Reward: {}".format(totalReward), totalReward/episodes)
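
The loop above builds the same kind of Q-table key in three places (once per candidate action, once for the old observation, and twice for the new one). As a sketch of how that could be factored out, the helpers below rebuild the key and pick a greedy action. The names `make_key` and `greedy_action` are hypothetical, and `greedy_action` is a deliberate simplification: ties resolve to the first action instead of falling back to a random one as the loop above does.

```python
def make_key(observation, action, scale=100):
    """Discretize pole angle and pole angular velocity, then append the action.

    Mirrors tuple(temp_state[2:]) in the loop above: only observation[2] and
    observation[3] are used, each multiplied by `scale` and rounded.
    """
    angle, angular_velocity = observation[2], observation[3]
    return (round(angle * scale, 0), round(angular_velocity * scale, 0), action)


def greedy_action(q_table, observation, actions=(0, 1)):
    """Return the action with the highest stored value (unseen keys count as 0.0).

    Assumption: ties go to the first action, unlike the original loop.
    """
    return max(actions, key=lambda a: q_table.get(make_key(observation, a), 0.0))


if __name__ == "__main__":
    sample_observation = [0.01, 0.2, 0.034, -0.5]  # made-up values for illustration
    q_table = {make_key(sample_observation, 1): 2.0}
    print(make_key(sample_observation, 0))
    print(greedy_action(q_table, sample_observation))
```

Keeping the key construction in one function makes it harder for the action-selection and update steps to drift apart if the discretization is changed later.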

[2017-07-10 10:35:51,961] Making new env: CartPole-v0


Episode finished after 10 timesteps with 9.0 reward 0.99
Episode finished after 21 timesteps with 20.0 reward 0.9899
Episode finished after 18 timesteps with 17.0 reward 0.9898
Episode finished after 14 timesteps with 13.0 reward 0.9897
Episode finished after 34 timesteps with 33.0 reward 0.9896
Episode finished after 10 timesteps with 9.0 reward 0.9895
Episode finished after 13 timesteps with 12.0 reward 0.9894
Episode finished after 21 timesteps with 20.0 reward 0.9893
Episode finished after 20 timesteps with 19.0 reward 0.9892
Episode finished after 28 timesteps with 27.0 reward 0.9891
Episode finished after 24 timesteps with 23.0 reward 0.989
Episode finished after 31 timesteps with 30.0 reward 0.9889
Episode finished after 28 timesteps with 27.0 reward 0.9888
Episode finished after 33 timesteps with 32.0 reward 0.9887
Episode finished after 36 timesteps with 35.0 reward 0.9886
Episode finished after 44 timesteps with 43.0 reward 0.9885
Episode finished after 10 timesteps with 9.0 

Episode finished after 13 timesteps with 12.0 reward 0.9359999999999999
Episode finished after 42 timesteps with 41.0 reward 0.9359
Episode finished after 10 timesteps with 9.0 reward 0.9358
Episode finished after 12 timesteps with 11.0 reward 0.9357
Episode finished after 18 timesteps with 17.0 reward 0.9356
Episode finished after 23 timesteps with 22.0 reward 0.9355
Episode finished after 22 timesteps with 21.0 reward 0.9354
Episode finished after 11 timesteps with 10.0 reward 0.9353
Episode finished after 35 timesteps with 34.0 reward 0.9352
Episode finished after 14 timesteps with 13.0 reward 0.9351
Episode finished after 11 timesteps with 10.0 reward 0.9349999999999999
Episode finished after 22 timesteps with 21.0 reward 0.9349
Episode finished after 14 timesteps with 13.0 reward 0.9348
Episode finished after 42 timesteps with 41.0 reward 0.9347
Episode finished after 40 timesteps with 39.0 reward 0.9346
Episode finished after 18 timesteps with 17.0 reward 0.9345
...

Episode finished after 18 timesteps with 17.0 reward 0.9097999999999999
Episode finished after 26 timesteps with 25.0 reward 0.9097
Episode finished after 12 timesteps with 11.0 reward 0.9096
Episode finished after 32 timesteps with 31.0 reward 0.9095
Episode finished after 13 timesteps with 12.0 reward 0.9094
Episode finished after 13 timesteps with 12.0 reward 0.9093
Episode finished after 34 timesteps with 33.0 reward 0.9092
Episode finished after 19 timesteps with 18.0 reward 0.9091
Episode finished after 28 timesteps with 27.0 reward 0.909
Episode finished after 48 timesteps with 47.0 reward 0.9089
Episode finished after 19 timesteps with 18.0 reward 0.9088
Episode finished after 27 timesteps with 26.0 reward 0.9087
Episode finished after 11 timesteps with 10.0 reward 0.9086
Episode finished after 36 timesteps with 35.0 reward 0.9085
Episode finished after 48 timesteps with 47.0 reward 0.9084
Episode finished after 23 timesteps with 22.0 reward 0.9083
...

Episode finished after 10 timesteps with 9.0 reward 0.8747
Episode finished after 13 timesteps with 12.0 reward 0.8746
Episode finished after 31 timesteps with 30.0 reward 0.8744999999999999
Episode finished after 38 timesteps with 37.0 reward 0.8744
Episode finished after 28 timesteps with 27.0 reward 0.8743
Episode finished after 20 timesteps with 19.0 reward 0.8742
Episode finished after 11 timesteps with 10.0 reward 0.8741
Episode finished after 18 timesteps with 17.0 reward 0.874
Episode finished after 31 timesteps with 30.0 reward 0.8739
Episode finished after 15 timesteps with 14.0 reward 0.8738
Episode finished after 29 timesteps with 28.0 reward 0.8737
Episode finished after 29 timesteps with 28.0 reward 0.8735999999999999
Episode finished after 51 timesteps with 50.0 reward 0.8734999999999999
Episode finished after 21 timesteps with 20.0 reward 0.8734
Episode finished after 20 timesteps with 19.0 reward 0.8733
Episode finished after 37 timesteps with 36.0 reward 0.8732
...

Episode finished after 12 timesteps with 11.0 reward 0.8375
Episode finished after 12 timesteps with 11.0 reward 0.8373999999999999
Episode finished after 15 timesteps with 14.0 reward 0.8372999999999999
Episode finished after 20 timesteps with 19.0 reward 0.8371999999999999
Episode finished after 26 timesteps with 25.0 reward 0.8371
Episode finished after 25 timesteps with 24.0 reward 0.837
Episode finished after 40 timesteps with 39.0 reward 0.8369
Episode finished after 12 timesteps with 11.0 reward 0.8368
Episode finished after 38 timesteps with 37.0 reward 0.8367
Episode finished after 21 timesteps with 20.0 reward 0.8366
Episode finished after 17 timesteps with 16.0 reward 0.8365
Episode finished after 26 timesteps with 25.0 reward 0.8364
Episode finished after 16 timesteps with 15.0 reward 0.8363
Episode finished after 21 timesteps with 20.0 reward 0.8362
Episode finished after 25 timesteps with 24.0 reward 0.8361
Episode finished after 59 timesteps with 58.0 reward 0.836
...

Episode finished after 46 timesteps with 45.0 reward 0.8092
Episode finished after 14 timesteps with 13.0 reward 0.8090999999999999
Episode finished after 26 timesteps with 25.0 reward 0.8089999999999999
Episode finished after 14 timesteps with 13.0 reward 0.8089
Episode finished after 20 timesteps with 19.0 reward 0.8088
Episode finished after 48 timesteps with 47.0 reward 0.8087
Episode finished after 34 timesteps with 33.0 reward 0.8086
Episode finished after 12 timesteps with 11.0 reward 0.8085
Episode finished after 45 timesteps with 44.0 reward 0.8084
Episode finished after 42 timesteps with 41.0 reward 0.8083
Episode finished after 17 timesteps with 16.0 reward 0.8082
Episode finished after 30 timesteps with 29.0 reward 0.8081
Episode finished after 39 timesteps with 38.0 reward 0.808
Episode finished after 11 timesteps with 10.0 reward 0.8079
Episode finished after 27 timesteps with 26.0 reward 0.8078
Episode finished after 24 timesteps with 23.0 reward 0.8077
...

Episode finished after 20 timesteps with 19.0 reward 0.7794
Episode finished after 20 timesteps with 19.0 reward 0.7793
Episode finished after 17 timesteps with 16.0 reward 0.7792
Episode finished after 84 timesteps with 83.0 reward 0.7791
Episode finished after 37 timesteps with 36.0 reward 0.779
Episode finished after 51 timesteps with 50.0 reward 0.7788999999999999
Episode finished after 31 timesteps with 30.0 reward 0.7787999999999999
Episode finished after 38 timesteps with 37.0 reward 0.7787
Episode finished after 28 timesteps with 27.0 reward 0.7786
Episode finished after 18 timesteps with 17.0 reward 0.7785
Episode finished after 28 timesteps with 27.0 reward 0.7784
Episode finished after 18 timesteps with 17.0 reward 0.7783
Episode finished after 17 timesteps with 16.0 reward 0.7782
Episode finished after 39 timesteps with 38.0 reward 0.7781
Episode finished after 13 timesteps with 12.0 reward 0.778
Episode finished after 13 timesteps with 12.0 reward 0.7779
...

Episode finished after 22 timesteps with 21.0 reward 0.7365999999999999
Episode finished after 13 timesteps with 12.0 reward 0.7364999999999999
Episode finished after 27 timesteps with 26.0 reward 0.7363999999999999
Episode finished after 27 timesteps with 26.0 reward 0.7363
Episode finished after 47 timesteps with 46.0 reward 0.7362
Episode finished after 45 timesteps with 44.0 reward 0.7361
Episode finished after 26 timesteps with 25.0 reward 0.736
Episode finished after 27 timesteps with 26.0 reward 0.7359
Episode finished after 19 timesteps with 18.0 reward 0.7358
Episode finished after 54 timesteps with 53.0 reward 0.7357
Episode finished after 15 timesteps with 14.0 reward 0.7356
Episode finished after 16 timesteps with 15.0 reward 0.7355
Episode finished after 38 timesteps with 37.0 reward 0.7354
Episode finished after 19 timesteps with 18.0 reward 0.7353000000000001
Episode finished after 13 timesteps with 12.0 reward 0.7352
Episode finished after 16 timesteps with 15.0 reward 

Episode finished after 35 timesteps with 34.0 reward 0.6875
Episode finished after 32 timesteps with 31.0 reward 0.6874
Episode finished after 27 timesteps with 26.0 reward 0.6873
Episode finished after 18 timesteps with 17.0 reward 0.6872
Episode finished after 32 timesteps with 31.0 reward 0.6871
Episode finished after 51 timesteps with 50.0 reward 0.687
Episode finished after 63 timesteps with 62.0 reward 0.6869000000000001
Episode finished after 35 timesteps with 34.0 reward 0.6868
Episode finished after 17 timesteps with 16.0 reward 0.6867
Episode finished after 25 timesteps with 24.0 reward 0.6866
Episode finished after 40 timesteps with 39.0 reward 0.6865
Episode finished after 50 timesteps with 49.0 reward 0.6864
Episode finished after 32 timesteps with 31.0 reward 0.6862999999999999
Episode finished after 20 timesteps with 19.0 reward 0.6861999999999999
Episode finished after 36 timesteps with 35.0 reward 0.6860999999999999
Episode finished after 12 timesteps with 11.0 reward 

Episode finished after 24 timesteps with 23.0 reward 0.6600999999999999
Episode finished after 13 timesteps with 12.0 reward 0.6599999999999999
Episode finished after 21 timesteps with 20.0 reward 0.6598999999999999
Episode finished after 19 timesteps with 18.0 reward 0.6597999999999999
Episode finished after 26 timesteps with 25.0 reward 0.6597
Episode finished after 27 timesteps with 26.0 reward 0.6596
Episode finished after 32 timesteps with 31.0 reward 0.6595
Episode finished after 22 timesteps with 21.0 reward 0.6594
Episode finished after 97 timesteps with 96.0 reward 0.6593
Episode finished after 22 timesteps with 21.0 reward 0.6592
Episode finished after 51 timesteps with 50.0 reward 0.6591
Episode finished after 53 timesteps with 52.0 reward 0.659
Episode finished after 56 timesteps with 55.0 reward 0.6589
Episode finished after 64 timesteps with 63.0 reward 0.6588
Episode finished after 40 timesteps with 39.0 reward 0.6587000000000001
...

Episode finished after 58 timesteps with 57.0 reward 0.6123000000000001
Episode finished after 26 timesteps with 25.0 reward 0.6122
Episode finished after 48 timesteps with 47.0 reward 0.6121
Episode finished after 42 timesteps with 41.0 reward 0.612
Episode finished after 32 timesteps with 31.0 reward 0.6119
Episode finished after 33 timesteps with 32.0 reward 0.6118
Episode finished after 9 timesteps with 8.0 reward 0.6116999999999999
Episode finished after 84 timesteps with 83.0 reward 0.6115999999999999
Episode finished after 35 timesteps with 34.0 reward 0.6114999999999999
Episode finished after 45 timesteps with 44.0 reward 0.6113999999999999
Episode finished after 18 timesteps with 17.0 reward 0.6113
Episode finished after 57 timesteps with 56.0 reward 0.6112
Episode finished after 26 timesteps with 25.0 reward 0.6111
Episode finished after 17 timesteps with 16.0 reward 0.611
Episode finished after 41 timesteps with 40.0 reward 0.6109
...

Episode finished after 35 timesteps with 34.0 reward 0.5859
Episode finished after 42 timesteps with 41.0 reward 0.5858
Episode finished after 17 timesteps with 16.0 reward 0.5857
Episode finished after 44 timesteps with 43.0 reward 0.5856
Episode finished after 33 timesteps with 32.0 reward 0.5854999999999999
Episode finished after 30 timesteps with 29.0 reward 0.5853999999999999
Episode finished after 43 timesteps with 42.0 reward 0.5852999999999999
Episode finished after 66 timesteps with 65.0 reward 0.5851999999999999
Episode finished after 53 timesteps with 52.0 reward 0.5851
Episode finished after 18 timesteps with 17.0 reward 0.585
Episode finished after 25 timesteps with 24.0 reward 0.5849
Episode finished after 38 timesteps with 37.0 reward 0.5848
Episode finished after 79 timesteps with 78.0 reward 0.5847
Episode finished after 15 timesteps with 14.0 reward 0.5846
Episode finished after 24 timesteps with 23.0 reward 0.5845
Episode finished after 32 timesteps with 31.0 reward 

Episode finished after 98 timesteps with 97.0 reward 0.5426
Episode finished after 33 timesteps with 32.0 reward 0.5425
Episode finished after 28 timesteps with 27.0 reward 0.5424
Episode finished after 36 timesteps with 35.0 reward 0.5423
Episode finished after 24 timesteps with 23.0 reward 0.5422
Episode finished after 107 timesteps with 106.0 reward 0.5421
Episode finished after 58 timesteps with 57.0 reward 0.542
Episode finished after 60 timesteps with 59.0 reward 0.5419
Episode finished after 64 timesteps with 63.0 reward 0.5418000000000001
Episode finished after 107 timesteps with 106.0 reward 0.5417000000000001
Episode finished after 12 timesteps with 11.0 reward 0.5416
Episode finished after 30 timesteps with 29.0 reward 0.5415
Episode finished after 122 timesteps with 121.0 reward 0.5414
Episode finished after 63 timesteps with 62.0 reward 0.5413
200 successfull movements 44875
Episode finished after 86 timesteps with 85.0 reward 0.5412
...

Episode finished after 53 timesteps with 52.0 reward 0.514
Episode finished after 76 timesteps with 75.0 reward 0.5139
Episode finished after 50 timesteps with 49.0 reward 0.5138
Episode finished after 69 timesteps with 68.0 reward 0.5137
Episode finished after 59 timesteps with 58.0 reward 0.5136000000000001
Episode finished after 42 timesteps with 41.0 reward 0.5135000000000001
Episode finished after 42 timesteps with 41.0 reward 0.5134
Episode finished after 21 timesteps with 20.0 reward 0.5133
Episode finished after 40 timesteps with 39.0 reward 0.5132
200 successfull movements 47686
Episode finished after 21 timesteps with 20.0 reward 0.5131
Episode finished after 44 timesteps with 43.0 reward 0.513
Episode finished after 43 timesteps with 42.0 reward 0.5128999999999999
Episode finished after 35 timesteps with 34.0 reward 0.5127999999999999
Episode finished after 49 timesteps with 48.0 reward 0.5126999999999999
Episode finished after 43 timesteps with 42.0 reward 0.5126
...

Episode finished after 44 timesteps with 43.0 reward 0.48950000000000005
Episode finished after 31 timesteps with 30.0 reward 0.48939999999999995
200 successfull movements 50064
Episode finished after 56 timesteps with 55.0 reward 0.48929999999999996
Episode finished after 51 timesteps with 50.0 reward 0.48919999999999997
Episode finished after 34 timesteps with 33.0 reward 0.4891
Episode finished after 30 timesteps with 29.0 reward 0.489
200 successfull movements 50102
Episode finished after 15 timesteps with 14.0 reward 0.4889
Episode finished after 18 timesteps with 17.0 reward 0.4888
Episode finished after 15 timesteps with 14.0 reward 0.4887
Episode finished after 37 timesteps with 36.0 reward 0.48860000000000003
Episode finished after 74 timesteps with 73.0 reward 0.48850000000000005
Episode finished after 74 timesteps with 73.0 reward 0.48839999999999995
Episode finished after 67 timesteps with 66.0 reward 0.48829999999999996
Episode finished after 16 timesteps with 15.0 reward 

Episode finished after 49 timesteps with 48.0 reward 0.4647
Episode finished after 40 timesteps with 39.0 reward 0.4646
200 successfull movements 52547
Episode finished after 46 timesteps with 45.0 reward 0.4645
200 successfull movements 52555
200 successfull movements 52559
Episode finished after 78 timesteps with 77.0 reward 0.46440000000000003
200 successfull movements 52566
Episode finished after 25 timesteps with 24.0 reward 0.46430000000000005
Episode finished after 24 timesteps with 23.0 reward 0.46419999999999995
Episode finished after 92 timesteps with 91.0 reward 0.46409999999999996
Episode finished after 24 timesteps with 23.0 reward 0.46399999999999997
200 successfull movements 52606
Episode finished after 112 timesteps with 111.0 reward 0.4639
Episode finished after 39 timesteps with 38.0 reward 0.4638
Episode finished after 27 timesteps with 26.0 reward 0.4637
200 successfull movements 52639
Episode finished after 40 timesteps with 39.0 reward 0.4636
...

Episode finished after 42 timesteps with 41.0 reward 0.4394
200 successfull movements 55070
Episode finished after 200 timesteps with 199.0 reward 0.4393
200 successfull movements 55072
Episode finished after 33 timesteps with 32.0 reward 0.43920000000000003
Episode finished after 26 timesteps with 25.0 reward 0.43910000000000005
Episode finished after 48 timesteps with 47.0 reward 0.43899999999999995
Episode finished after 55 timesteps with 54.0 reward 0.43889999999999996
Episode finished after 93 timesteps with 92.0 reward 0.43879999999999997
200 successfull movements 55128
Episode finished after 61 timesteps with 60.0 reward 0.4387
Episode finished after 94 timesteps with 93.0 reward 0.4386
200 successfull movements 55150
Episode finished after 200 timesteps with 199.0 reward 0.4385
200 successfull movements 55155
200 successfull movements 55158
200 successfull movements 55160
Episode finished after 200 timesteps with 199.0 reward 0.4384
Episode finished after 43 timesteps with 42.0

Episode finished after 131 timesteps with 130.0 reward 0.4123
200 successfull movements 57773
200 successfull movements 57780
Episode finished after 200 timesteps with 199.0 reward 0.4122
200 successfull movements 57782
200 successfull movements 57786
Episode finished after 115 timesteps with 114.0 reward 0.4121
200 successfull movements 57791
200 successfull movements 57794
Episode finished after 57 timesteps with 56.0 reward 0.41200000000000003
200 successfull movements 57802
200 successfull movements 57803
200 successfull movements 57808
Episode finished after 64 timesteps with 63.0 reward 0.41190000000000004
Episode finished after 48 timesteps with 47.0 reward 0.41179999999999994
Episode finished after 29 timesteps with 28.0 reward 0.41169999999999995
200 successfull movements 57831
Episode finished after 193 timesteps with 192.0 reward 0.41159999999999997
200 successfull movements 57843
200 successfull movements 57845
Episode finished after 23 timesteps with 22.0 reward 0.4115
...

Episode finished after 98 timesteps with 97.0 reward 0.39690000000000003
200 successfull movements 59315
200 successfull movements 59319
Episode finished after 15 timesteps with 14.0 reward 0.39680000000000004
200 successfull movements 59323
200 successfull movements 59324
Episode finished after 140 timesteps with 139.0 reward 0.39669999999999994
Episode finished after 16 timesteps with 15.0 reward 0.39659999999999995
200 successfull movements 59341
200 successfull movements 59350
Episode finished after 200 timesteps with 199.0 reward 0.39649999999999996
200 successfull movements 59355
200 successfull movements 59360
Episode finished after 200 timesteps with 199.0 reward 0.3964
Episode finished after 62 timesteps with 61.0 reward 0.3963
200 successfull movements 59373
200 successfull movements 59377
200 successfull movements 59378
Episode finished after 36 timesteps with 35.0 reward 0.3962
Episode finished after 54 timesteps with 53.0 reward 0.3961
200 successfull movements 59391
...

Episode finished after 63 timesteps with 62.0 reward 0.38349999999999995
200 successfull movements 60652
200 successfull movements 60658
Episode finished after 178 timesteps with 177.0 reward 0.38339999999999996
200 successfull movements 60666
Episode finished after 23 timesteps with 22.0 reward 0.3833
200 successfull movements 60672
200 successfull movements 60675
Episode finished after 21 timesteps with 20.0 reward 0.3832
200 successfull movements 60682
200 successfull movements 60684
Episode finished after 39 timesteps with 38.0 reward 0.3831
200 successfull movements 60694
200 successfull movements 60698
Episode finished after 97 timesteps with 96.0 reward 0.383
200 successfull movements 60705
200 successfull movements 60710
Episode finished after 200 timesteps with 199.0 reward 0.3829
Episode finished after 40 timesteps with 39.0 reward 0.38280000000000003
200 successfull movements 60726
200 successfull movements 60730
Episode finished after 200 timesteps with 199.0 reward 0.38270

200 successfull movements 61940
Episode finished after 200 timesteps with 199.0 reward 0.37060000000000004
200 successfull movements 61944
200 successfull movements 61946
Episode finished after 48 timesteps with 47.0 reward 0.37049999999999994
200 successfull movements 61953
200 successfull movements 61954
200 successfull movements 61955
200 successfull movements 61957
200 successfull movements 61960
Episode finished after 200 timesteps with 199.0 reward 0.37039999999999995
200 successfull movements 61961
200 successfull movements 61965
200 successfull movements 61968
Episode finished after 72 timesteps with 71.0 reward 0.37029999999999996
200 successfull movements 61980
Episode finished after 200 timesteps with 199.0 reward 0.3702
200 successfull movements 61981
200 successfull movements 61988
200 successfull movements 61989
Episode finished after 105 timesteps with 104.0 reward 0.3701
200 successfull movements 61996
200 successfull movements 61998
...

Episode finished after 200 timesteps with 199.0 reward 0.358
200 successfull movements 63203
200 successfull movements 63206
Episode finished after 157 timesteps with 156.0 reward 0.3579
200 successfull movements 63215
200 successfull movements 63217
Episode finished after 90 timesteps with 89.0 reward 0.3578
200 successfull movements 63223
200 successfull movements 63224
200 successfull movements 63225
200 successfull movements 63229
Episode finished after 92 timesteps with 91.0 reward 0.3577
200 successfull movements 63233
200 successfull movements 63235
200 successfull movements 63237
Episode finished after 170 timesteps with 169.0 reward 0.35760000000000003
200 successfull movements 63247
200 successfull movements 63248
Episode finished after 45 timesteps with 44.0 reward 0.35750000000000004
200 successfull movements 63253
200 successfull movements 63254
200 successfull movements 63257
200 successfull movements 63260
Episode finished after 200 timesteps with 199.0 reward 0.35739999

Episode finished after 200 timesteps with 199.0 reward 0.3415
200 successfull movements 64852
200 successfull movements 64859
Episode finished after 100 timesteps with 99.0 reward 0.34140000000000004
200 successfull movements 64861
200 successfull movements 64864
200 successfull movements 64866
200 successfull movements 64867
200 successfull movements 64870
Episode finished after 200 timesteps with 199.0 reward 0.34129999999999994
200 successfull movements 64873
200 successfull movements 64874
200 successfull movements 64875
Episode finished after 147 timesteps with 146.0 reward 0.34119999999999995
200 successfull movements 64884
200 successfull movements 64887
200 successfull movements 64888
200 successfull movements 64890
Episode finished after 200 timesteps with 199.0 reward 0.34109999999999996
200 successfull movements 64891
200 successfull movements 64893
200 successfull movements 64897
200 successfull movements 64899
200 successfull movements 64900
...

200 successfull movements 65756
200 successfull movements 65758
200 successfull movements 65760
Episode finished after 200 timesteps with 199.0 reward 0.33240000000000003
200 successfull movements 65762
200 successfull movements 65764
200 successfull movements 65766
200 successfull movements 65767
200 successfull movements 65768
200 successfull movements 65770
Episode finished after 200 timesteps with 199.0 reward 0.33230000000000004
200 successfull movements 65772
200 successfull movements 65773
200 successfull movements 65775
200 successfull movements 65777
Episode finished after 37 timesteps with 36.0 reward 0.33219999999999994
200 successfull movements 65781
200 successfull movements 65784
200 successfull movements 65785
200 successfull movements 65789
Episode finished after 56 timesteps with 55.0 reward 0.33209999999999995
200 successfull movements 65796
200 successfull movements 65797
Episode finished after 89 timesteps with 88.0 reward 0.33199999999999996
...

Episode finished after 33 timesteps with 32.0 reward 0.32809999999999995
200 successfull movements 66194
200 successfull movements 66195
200 successfull movements 66197
Episode finished after 53 timesteps with 52.0 reward 0.32799999999999996
200 successfull movements 66205
200 successfull movements 66207
Episode finished after 133 timesteps with 132.0 reward 0.32789999999999997
200 successfull movements 66211
200 successfull movements 66214
200 successfull movements 66215
200 successfull movements 66217
200 successfull movements 66218
200 successfull movements 66219
Episode finished after 106 timesteps with 105.0 reward 0.3278
200 successfull movements 66221
200 successfull movements 66223
200 successfull movements 66224
200 successfull movements 66226
200 successfull movements 66227
200 successfull movements 66228
200 successfull movements 66230
Episode finished after 200 timesteps with 199.0 reward 0.3277
200 successfull movements 66232
200 successfull movements 66233
...

200 successfull movements 66845
200 successfull movements 66848
Episode finished after 25 timesteps with 24.0 reward 0.3215
200 successfull movements 66851
200 successfull movements 66853
200 successfull movements 66854
200 successfull movements 66855
200 successfull movements 66860
Episode finished after 200 timesteps with 199.0 reward 0.3214
200 successfull movements 66861
200 successfull movements 66863
200 successfull movements 66866
Episode finished after 161 timesteps with 160.0 reward 0.32130000000000003
200 successfull movements 66871
200 successfull movements 66876
200 successfull movements 66878
200 successfull movements 66879
Episode finished after 73 timesteps with 72.0 reward 0.32120000000000004
200 successfull movements 66882
200 successfull movements 66884
200 successfull movements 66887
200 successfull movements 66888
Episode finished after 73 timesteps with 72.0 reward 0.32109999999999994
200 successfull movements 66892
200 successfull movements 66897
...

200 successfull movements 68064
200 successfull movements 68065
200 successfull movements 68067
Episode finished after 147 timesteps with 146.0 reward 0.3093
200 successfull movements 68073
200 successfull movements 68075
200 successfull movements 68078
200 successfull movements 68080
Episode finished after 200 timesteps with 199.0 reward 0.30920000000000003
200 successfull movements 68082
200 successfull movements 68083
200 successfull movements 68084
200 successfull movements 68085
200 successfull movements 68086
200 successfull movements 68088
200 successfull movements 68090
Episode finished after 200 timesteps with 199.0 reward 0.30910000000000004
200 successfull movements 68091
200 successfull movements 68092
200 successfull movements 68094
200 successfull movements 68097
Episode finished after 106 timesteps with 105.0 reward 0.30899999999999994
200 successfull movements 68104
200 successfull movements 68105
Episode finished after 155 timesteps with 154.0 reward 0.3088999999999999

200 successfull movements 68795
200 successfull movements 68796
200 successfull movements 68800
Episode finished after 200 timesteps with 199.0 reward 0.30200000000000005
200 successfull movements 68801
200 successfull movements 68802
200 successfull movements 68803
200 successfull movements 68804
200 successfull movements 68806
200 successfull movements 68807
200 successfull movements 68809
Episode finished after 22 timesteps with 21.0 reward 0.30189999999999995
200 successfull movements 68812
200 successfull movements 68813
200 successfull movements 68814
200 successfull movements 68819
Episode finished after 99 timesteps with 98.0 reward 0.30179999999999996
200 successfull movements 68826
200 successfull movements 68827
200 successfull movements 68828
200 successfull movements 68829
Episode finished after 86 timesteps with 85.0 reward 0.30169999999999997
200 successfull movements 68835
200 successfull movements 68837
200 successfull movements 68838
200 successfull movements 68839
...

200 successfull movements 69550
Episode finished after 200 timesteps with 199.0 reward 0.2945
200 successfull movements 69551
200 successfull movements 69552
200 successfull movements 69553
200 successfull movements 69555
200 successfull movements 69556
200 successfull movements 69557
200 successfull movements 69559
Episode finished after 107 timesteps with 106.0 reward 0.2944
200 successfull movements 69562
200 successfull movements 69566
200 successfull movements 69567
200 successfull movements 69569
200 successfull movements 69570
Episode finished after 200 timesteps with 199.0 reward 0.2943
200 successfull movements 69575
200 successfull movements 69578
200 successfull movements 69580
Episode finished after 200 timesteps with 199.0 reward 0.2942
200 successfull movements 69583
200 successfull movements 69585
200 successfull movements 69589
200 successfull movements 69590
Episode finished after 200 timesteps with 199.0 reward 0.29410000000000003
200 successfull movements 69591

Episode finished after 200 timesteps with 199.0 reward 0.2875
200 successfull movements 70252
200 successfull movements 70253
200 successfull movements 70257
200 successfull movements 70258
200 successfull movements 70260
Episode finished after 200 timesteps with 199.0 reward 0.2874
200 successfull movements 70262
200 successfull movements 70263
200 successfull movements 70264
200 successfull movements 70268
200 successfull movements 70269
200 successfull movements 70270
Episode finished after 200 timesteps with 199.0 reward 0.2873
200 successfull movements 70271
200 successfull movements 70274
200 successfull movements 70275
200 successfull movements 70276
200 successfull movements 70277
200 successfull movements 70278
200 successfull movements 70279
200 successfull movements 70280
Episode finished after 200 timesteps with 199.0 reward 0.2872
200 successfull movements 70284
200 successfull movements 70286
200 successfull movements 70287
200 successfull movements 70288

200 successfull movements 71033
200 successfull movements 71038
200 successfull movements 71039
200 successfull movements 71040
Episode finished after 200 timesteps with 199.0 reward 0.27959999999999996
200 successfull movements 71041
200 successfull movements 71043
200 successfull movements 71044
200 successfull movements 71045
200 successfull movements 71046
200 successfull movements 71047
200 successfull movements 71048
Episode finished after 49 timesteps with 48.0 reward 0.27949999999999997
200 successfull movements 71054
200 successfull movements 71055
200 successfull movements 71056
200 successfull movements 71057
200 successfull movements 71060
Episode finished after 200 timesteps with 199.0 reward 0.2794
200 successfull movements 71062
200 successfull movements 71063
200 successfull movements 71064
200 successfull movements 71065
200 successfull movements 71069
Episode finished after 153 timesteps with 152.0 reward 0.2793
200 successfull movements 71071

Episode finished after 200 timesteps with 199.0 reward 0.2702
200 successfull movements 71982
200 successfull movements 71984
200 successfull movements 71985
200 successfull movements 71986
200 successfull movements 71987
200 successfull movements 71988
200 successfull movements 71989
Episode finished after 140 timesteps with 139.0 reward 0.2701
200 successfull movements 71992
200 successfull movements 71993
200 successfull movements 71994
200 successfull movements 71996
200 successfull movements 71998
Episode finished after 40 timesteps with 39.0 reward 0.27
200 successfull movements 72001
200 successfull movements 72002
200 successfull movements 72006
200 successfull movements 72007
200 successfull movements 72009
Episode finished after 157 timesteps with 156.0 reward 0.26990000000000003
200 successfull movements 72011
200 successfull movements 72012
200 successfull movements 72013
200 successfull movements 72014
200 successfull movements 72015
200 successfull movements 72018

200 successfull movements 72556
200 successfull movements 72557
Episode finished after 113 timesteps with 112.0 reward 0.26439999999999997
200 successfull movements 72561
200 successfull movements 72562
200 successfull movements 72564
200 successfull movements 72566
200 successfull movements 72568
200 successfull movements 72569
Episode finished after 86 timesteps with 85.0 reward 0.2643
200 successfull movements 72571
200 successfull movements 72574
200 successfull movements 72575
200 successfull movements 72576
200 successfull movements 72577
200 successfull movements 72579
200 successfull movements 72580
Episode finished after 200 timesteps with 199.0 reward 0.2642
200 successfull movements 72581
200 successfull movements 72582
200 successfull movements 72584
200 successfull movements 72585
200 successfull movements 72587
200 successfull movements 72588
200 successfull movements 72589
200 successfull movements 72590
Episode finished after 200 timesteps with 199.0 reward 0.2641

200 successfull movements 73096
200 successfull movements 73097
200 successfull movements 73098
200 successfull movements 73099
200 successfull movements 73100
Episode finished after 200 timesteps with 199.0 reward 0.259
200 successfull movements 73101
200 successfull movements 73103
200 successfull movements 73105
200 successfull movements 73107
200 successfull movements 73108
200 successfull movements 73109
Episode finished after 196 timesteps with 195.0 reward 0.2589
200 successfull movements 73111
200 successfull movements 73112
200 successfull movements 73113
200 successfull movements 73114
200 successfull movements 73115
200 successfull movements 73118
200 successfull movements 73119
Episode finished after 182 timesteps with 181.0 reward 0.25880000000000003
200 successfull movements 73123
200 successfull movements 73124
200 successfull movements 73125
200 successfull movements 73126
200 successfull movements 73127
200 successfull movements 73128
Episode finished after 27 timestep

200 successfull movements 73674
200 successfull movements 73676
200 successfull movements 73677
200 successfull movements 73679
Episode finished after 83 timesteps with 82.0 reward 0.2532
200 successfull movements 73681
200 successfull movements 73682
200 successfull movements 73683
200 successfull movements 73686
200 successfull movements 73688
Episode finished after 96 timesteps with 95.0 reward 0.2531
200 successfull movements 73691
200 successfull movements 73692
200 successfull movements 73693
200 successfull movements 73694
200 successfull movements 73695
200 successfull movements 73696
200 successfull movements 73697
200 successfull movements 73698
200 successfull movements 73699
200 successfull movements 73700
Episode finished after 200 timesteps with 199.0 reward 0.253
200 successfull movements 73701
200 successfull movements 73702
200 successfull movements 73703
200 successfull movements 73705
200 successfull movements 73707
Episode finished after 89 timesteps with 88.0 rewar

200 successfull movements 74245
200 successfull movements 74246
200 successfull movements 74248
200 successfull movements 74250
Episode finished after 200 timesteps with 199.0 reward 0.24749999999999994
200 successfull movements 74251
200 successfull movements 74252
200 successfull movements 74253
200 successfull movements 74254
200 successfull movements 74255
200 successfull movements 74257
200 successfull movements 74258
200 successfull movements 74260
Episode finished after 200 timesteps with 199.0 reward 0.24739999999999995
200 successfull movements 74261
200 successfull movements 74262
200 successfull movements 74264
200 successfull movements 74265
200 successfull movements 74266
200 successfull movements 74267
200 successfull movements 74268
200 successfull movements 74269
200 successfull movements 74270
Episode finished after 200 timesteps with 199.0 reward 0.24729999999999996
200 successfull movements 74271
200 successfull movements 74272
200 successfull movements 74273

200 successfull movements 74883
200 successfull movements 74884
200 successfull movements 74885
200 successfull movements 74886
200 successfull movements 74889
200 successfull movements 74890
Episode finished after 200 timesteps with 199.0 reward 0.24109999999999998
200 successfull movements 74891
200 successfull movements 74893
200 successfull movements 74894
200 successfull movements 74895
200 successfull movements 74898
200 successfull movements 74899
Episode finished after 185 timesteps with 184.0 reward 0.241
200 successfull movements 74901
200 successfull movements 74903
200 successfull movements 74904
200 successfull movements 74905
200 successfull movements 74907
200 successfull movements 74908
200 successfull movements 74909
200 successfull movements 74910
Episode finished after 200 timesteps with 199.0 reward 0.2409
200 successfull movements 74911
200 successfull movements 74913
200 successfull movements 74915
200 successfull movements 74916
200 successfull movements 74917

200 successfull movements 75390
Episode finished after 200 timesteps with 199.0 reward 0.23609999999999998
200 successfull movements 75391
200 successfull movements 75392
200 successfull movements 75394
200 successfull movements 75395
200 successfull movements 75396
200 successfull movements 75398
200 successfull movements 75399
Episode finished after 39 timesteps with 38.0 reward 0.236
200 successfull movements 75401
200 successfull movements 75402
200 successfull movements 75403
200 successfull movements 75404
200 successfull movements 75405
200 successfull movements 75407
200 successfull movements 75408
200 successfull movements 75410
Episode finished after 200 timesteps with 199.0 reward 0.2359
200 successfull movements 75412
200 successfull movements 75413
200 successfull movements 75414
200 successfull movements 75416
200 successfull movements 75417
200 successfull movements 75418
200 successfull movements 75419
200 successfull movements 75420
Episode finished after 200 timesteps

200 successfull movements 76248
200 successfull movements 76250
Episode finished after 200 timesteps with 199.0 reward 0.22750000000000004
200 successfull movements 76251
200 successfull movements 76252
200 successfull movements 76253
200 successfull movements 76254
200 successfull movements 76256
200 successfull movements 76257
200 successfull movements 76258
200 successfull movements 76259
200 successfull movements 76260
Episode finished after 200 timesteps with 199.0 reward 0.22740000000000005
200 successfull movements 76261
200 successfull movements 76262
200 successfull movements 76264
200 successfull movements 76265
200 successfull movements 76266
200 successfull movements 76267
200 successfull movements 76268
200 successfull movements 76270
Episode finished after 200 timesteps with 199.0 reward 0.22729999999999995
200 successfull movements 76271
200 successfull movements 76272
200 successfull movements 76274
200 successfull movements 76275
200 successfull movements 76276

Episode finished after 63 timesteps with 62.0 reward 0.22129999999999994
200 successfull movements 76871
200 successfull movements 76872
200 successfull movements 76874
200 successfull movements 76875
200 successfull movements 76877
200 successfull movements 76878
200 successfull movements 76880
Episode finished after 200 timesteps with 199.0 reward 0.22119999999999995
200 successfull movements 76881
200 successfull movements 76883
200 successfull movements 76884
200 successfull movements 76885
200 successfull movements 76886
200 successfull movements 76887
200 successfull movements 76888
200 successfull movements 76890
Episode finished after 200 timesteps with 199.0 reward 0.22109999999999996
200 successfull movements 76891
200 successfull movements 76893
200 successfull movements 76894
200 successfull movements 76895
200 successfull movements 76896
200 successfull movements 76897
200 successfull movements 76899
200 successfull movements 76900
Episode finished after 200 timesteps with

Episode finished after 200 timesteps with 199.0 reward 0.21699999999999997
200 successfull movements 77301
200 successfull movements 77303
200 successfull movements 77304
200 successfull movements 77306
200 successfull movements 77307
200 successfull movements 77308
200 successfull movements 77309
Episode finished after 65 timesteps with 64.0 reward 0.21689999999999998
200 successfull movements 77311
200 successfull movements 77313
200 successfull movements 77314
200 successfull movements 77315
200 successfull movements 77317
200 successfull movements 77318
200 successfull movements 77320
Episode finished after 200 timesteps with 199.0 reward 0.2168
200 successfull movements 77323
200 successfull movements 77324
200 successfull movements 77326
200 successfull movements 77327
200 successfull movements 77329
200 successfull movements 77330
Episode finished after 200 timesteps with 199.0 reward 0.2167
200 successfull movements 77331
200 successfull movements 77332

200 successfull movements 77895
200 successfull movements 77896
200 successfull movements 77897
200 successfull movements 77898
200 successfull movements 77899
200 successfull movements 77900
Episode finished after 200 timesteps with 199.0 reward 0.21099999999999997
200 successfull movements 77901
200 successfull movements 77902
200 successfull movements 77903
200 successfull movements 77904
200 successfull movements 77905
200 successfull movements 77906
200 successfull movements 77907
200 successfull movements 77910
Episode finished after 200 timesteps with 199.0 reward 0.21089999999999998
200 successfull movements 77911
200 successfull movements 77912
200 successfull movements 77913
200 successfull movements 77914
200 successfull movements 77916
200 successfull movements 77917
200 successfull movements 77918
200 successfull movements 77919
200 successfull movements 77920
Episode finished after 200 timesteps with 199.0 reward 0.2108
200 successfull movements 77921

200 successfull movements 78557
200 successfull movements 78558
200 successfull movements 78559
200 successfull movements 78560
Episode finished after 200 timesteps with 199.0 reward 0.20440000000000003
200 successfull movements 78561
200 successfull movements 78562
200 successfull movements 78563
200 successfull movements 78564
200 successfull movements 78565
200 successfull movements 78566
200 successfull movements 78567
200 successfull movements 78568
Episode finished after 42 timesteps with 41.0 reward 0.20430000000000004
200 successfull movements 78571
200 successfull movements 78572
200 successfull movements 78573
200 successfull movements 78574
200 successfull movements 78576
200 successfull movements 78577
200 successfull movements 78578
200 successfull movements 78579
200 successfull movements 78580
Episode finished after 200 timesteps with 199.0 reward 0.20419999999999994
200 successfull movements 78581
200 successfull movements 78582
200 successfull movements 78584

200 successfull movements 78879
200 successfull movements 78880
Episode finished after 200 timesteps with 199.0 reward 0.20120000000000005
200 successfull movements 78881
200 successfull movements 78883
200 successfull movements 78884
200 successfull movements 78885
200 successfull movements 78886
200 successfull movements 78887
200 successfull movements 78888
200 successfull movements 78889
200 successfull movements 78890
Episode finished after 200 timesteps with 199.0 reward 0.20109999999999995
200 successfull movements 78891
200 successfull movements 78892
200 successfull movements 78893
200 successfull movements 78894
200 successfull movements 78895
200 successfull movements 78896
200 successfull movements 78897
200 successfull movements 78898
200 successfull movements 78899
200 successfull movements 78900
Episode finished after 200 timesteps with 199.0 reward 0.20099999999999996
200 successfull movements 78901
200 successfull movements 78902
200 successfull movements 78903

200 successfull movements 79352
200 successfull movements 79353
200 successfull movements 79354
200 successfull movements 79355
200 successfull movements 79356
200 successfull movements 79357
200 successfull movements 79358
200 successfull movements 79359
200 successfull movements 79360
Episode finished after 200 timesteps with 199.0 reward 0.19640000000000002
200 successfull movements 79361
200 successfull movements 79362
200 successfull movements 79363
200 successfull movements 79364
200 successfull movements 79365
200 successfull movements 79366
200 successfull movements 79368
200 successfull movements 79369
200 successfull movements 79370
Episode finished after 200 timesteps with 199.0 reward 0.19630000000000003
200 successfull movements 79371
200 successfull movements 79372
200 successfull movements 79373
200 successfull movements 79374
200 successfull movements 79375
200 successfull movements 79376
200 successfull movements 79377
200 successfull movements 79378

200 successfull movements 79819
200 successfull movements 79820
Episode finished after 200 timesteps with 199.0 reward 0.19179999999999997
200 successfull movements 79821
200 successfull movements 79822
200 successfull movements 79823
200 successfull movements 79824
200 successfull movements 79825
200 successfull movements 79826
200 successfull movements 79827
200 successfull movements 79828
200 successfull movements 79829
200 successfull movements 79830
Episode finished after 200 timesteps with 199.0 reward 0.19169999999999998
200 successfull movements 79831
200 successfull movements 79832
200 successfull movements 79833
200 successfull movements 79834
200 successfull movements 79835
200 successfull movements 79837
200 successfull movements 79838
200 successfull movements 79839
Episode finished after 182 timesteps with 181.0 reward 0.1916
200 successfull movements 79841
200 successfull movements 79842
200 successfull movements 79843
200 successfull movements 79844

200 successfull movements 80298
200 successfull movements 80299
200 successfull movements 80300
Episode finished after 200 timesteps with 199.0 reward 0.18699999999999994
200 successfull movements 80301
200 successfull movements 80302
200 successfull movements 80304
200 successfull movements 80305
200 successfull movements 80306
200 successfull movements 80307
200 successfull movements 80308
200 successfull movements 80309
200 successfull movements 80310
Episode finished after 200 timesteps with 199.0 reward 0.18689999999999996
200 successfull movements 80311
200 successfull movements 80312
200 successfull movements 80313
200 successfull movements 80314
200 successfull movements 80315
200 successfull movements 80317
200 successfull movements 80319
200 successfull movements 80320
Episode finished after 200 timesteps with 199.0 reward 0.18679999999999997
200 successfull movements 80321
200 successfull movements 80324
200 successfull movements 80325
200 successfull movements 80326

200 successfull movements 80747
200 successfull movements 80748
200 successfull movements 80749
200 successfull movements 80750
Episode finished after 200 timesteps with 199.0 reward 0.1825
200 successfull movements 80751
200 successfull movements 80752
200 successfull movements 80753
200 successfull movements 80754
200 successfull movements 80755
200 successfull movements 80756
200 successfull movements 80757
200 successfull movements 80758
200 successfull movements 80759
200 successfull movements 80760
Episode finished after 200 timesteps with 199.0 reward 0.1824
200 successfull movements 80761
200 successfull movements 80762
200 successfull movements 80763
200 successfull movements 80764
200 successfull movements 80765
200 successfull movements 80766
200 successfull movements 80767
200 successfull movements 80768
200 successfull movements 80769
200 successfull movements 80770
Episode finished after 200 timesteps with 199.0 reward 0.18230000000000002
200 successfull movements 80771

200 successfull movements 81374
200 successfull movements 81375
200 successfull movements 81376
200 successfull movements 81377
200 successfull movements 81378
200 successfull movements 81379
200 successfull movements 81380
Episode finished after 200 timesteps with 199.0 reward 0.17620000000000002
200 successfull movements 81381
200 successfull movements 81382
200 successfull movements 81383
200 successfull movements 81384
200 successfull movements 81385
200 successfull movements 81386
200 successfull movements 81387
200 successfull movements 81388
200 successfull movements 81389
200 successfull movements 81390
Episode finished after 200 timesteps with 199.0 reward 0.17610000000000003
200 successfull movements 81391
200 successfull movements 81392
200 successfull movements 81393
200 successfull movements 81394
200 successfull movements 81395
200 successfull movements 81396
200 successfull movements 81397
200 successfull movements 81398
200 successfull movements 81399

200 successfull movements 81843
200 successfull movements 81844
200 successfull movements 81845
200 successfull movements 81846
200 successfull movements 81847
200 successfull movements 81848
200 successfull movements 81849
200 successfull movements 81850
Episode finished after 200 timesteps with 199.0 reward 0.17149999999999999
200 successfull movements 81851
200 successfull movements 81852
200 successfull movements 81853
200 successfull movements 81854
200 successfull movements 81855
200 successfull movements 81856
200 successfull movements 81857
200 successfull movements 81858
200 successfull movements 81859
200 successfull movements 81860
Episode finished after 200 timesteps with 199.0 reward 0.1714
200 successfull movements 81861
200 successfull movements 81862
200 successfull movements 81863
200 successfull movements 81864
200 successfull movements 81865
200 successfull movements 81866
200 successfull movements 81867
200 successfull movements 81868
200 successfull movements 81869

200 successfull movements 82337
200 successfull movements 82338
200 successfull movements 82339
200 successfull movements 82340
Episode finished after 200 timesteps with 199.0 reward 0.16659999999999997
200 successfull movements 82341
200 successfull movements 82342
200 successfull movements 82343
200 successfull movements 82344
200 successfull movements 82345
200 successfull movements 82346
200 successfull movements 82347
200 successfull movements 82348
200 successfull movements 82349
200 successfull movements 82350
Episode finished after 200 timesteps with 199.0 reward 0.16649999999999998
200 successfull movements 82351
200 successfull movements 82352
200 successfull movements 82353
200 successfull movements 82354
200 successfull movements 82355
200 successfull movements 82356
200 successfull movements 82359
200 successfull movements 82360
Episode finished after 200 timesteps with 199.0 reward 0.1664
200 successfull movements 82361
200 successfull movements 82362

200 successfull movements 82899
200 successfull movements 82900
Episode finished after 200 timesteps with 199.0 reward 0.16100000000000003
200 successfull movements 82901
200 successfull movements 82902
200 successfull movements 82903
200 successfull movements 82904
200 successfull movements 82905
200 successfull movements 82907
200 successfull movements 82908
200 successfull movements 82909
200 successfull movements 82910
Episode finished after 200 timesteps with 199.0 reward 0.16090000000000004
200 successfull movements 82911
200 successfull movements 82912
200 successfull movements 82913
200 successfull movements 82914
200 successfull movements 82915
200 successfull movements 82916
200 successfull movements 82917
200 successfull movements 82918
200 successfull movements 82919
200 successfull movements 82920
Episode finished after 200 timesteps with 199.0 reward 0.16079999999999994
200 successfull movements 82921
200 successfull movements 82922
200 successfull movements 82923

200 successfull movements 83575
200 successfull movements 83576
200 successfull movements 83577
200 successfull movements 83578
200 successfull movements 83579
200 successfull movements 83580
Episode finished after 200 timesteps with 199.0 reward 0.1542
200 successfull movements 83581
200 successfull movements 83582
200 successfull movements 83583
200 successfull movements 83584
200 successfull movements 83585
200 successfull movements 83586
200 successfull movements 83587
200 successfull movements 83588
200 successfull movements 83589
200 successfull movements 83590
Episode finished after 200 timesteps with 199.0 reward 0.15410000000000001
200 successfull movements 83591
200 successfull movements 83592
200 successfull movements 83593
200 successfull movements 83594
200 successfull movements 83595
200 successfull movements 83596
200 successfull movements 83597
200 successfull movements 83598
200 successfull movements 83599
200 successfull movements 83600
Episode finished after 200 time

200 successfull movements 84014
200 successfull movements 84016
200 successfull movements 84017
200 successfull movements 84018
200 successfull movements 84019
200 successfull movements 84020
Episode finished after 200 timesteps with 199.0 reward 0.14980000000000004
200 successfull movements 84021
200 successfull movements 84022
200 successfull movements 84023
200 successfull movements 84024
200 successfull movements 84025
200 successfull movements 84026
200 successfull movements 84027
200 successfull movements 84028
200 successfull movements 84029
200 successfull movements 84030
Episode finished after 200 timesteps with 199.0 reward 0.14969999999999994
200 successfull movements 84031
200 successfull movements 84032
200 successfull movements 84033
200 successfull movements 84034
200 successfull movements 84035
200 successfull movements 84036
200 successfull movements 84037
200 successfull movements 84038
200 successfull movements 84039
200 successfull movements 84040

Episode finished after 200 timesteps with 199.0 reward 0.14500000000000002
200 successfull movements 84501
200 successfull movements 84502
200 successfull movements 84503
200 successfull movements 84504
200 successfull movements 84505
200 successfull movements 84506
200 successfull movements 84507
200 successfull movements 84508
200 successfull movements 84509
200 successfull movements 84510
Episode finished after 200 timesteps with 199.0 reward 0.14490000000000003
200 successfull movements 84511
200 successfull movements 84512
200 successfull movements 84513
200 successfull movements 84514
200 successfull movements 84515
200 successfull movements 84516
200 successfull movements 84518
200 successfull movements 84519
200 successfull movements 84520
Episode finished after 200 timesteps with 199.0 reward 0.14480000000000004
200 successfull movements 84521
200 successfull movements 84522
200 successfull movements 84523
200 successfull movements 84524
200 successfull movements 84525

Episode finished after 200 timesteps with 199.0 reward 0.1401
200 successfull movements 84991
200 successfull movements 84992
200 successfull movements 84993
200 successfull movements 84994
200 successfull movements 84995
200 successfull movements 84996
200 successfull movements 84997
200 successfull movements 84998
200 successfull movements 84999
200 successfull movements 85000
Episode finished after 200 timesteps with 199.0 reward 0.14
200 successfull movements 85001
200 successfull movements 85002
200 successfull movements 85003
200 successfull movements 85004
200 successfull movements 85005
200 successfull movements 85006
200 successfull movements 85007
200 successfull movements 85008
200 successfull movements 85009
200 successfull movements 85010
Episode finished after 200 timesteps with 199.0 reward 0.13990000000000002
200 successfull movements 85011
200 successfull movements 85012
200 successfull movements 85013
200 successfull movements 85014
200 successfull movements 85015

200 successfull movements 85463
200 successfull movements 85464
200 successfull movements 85465
200 successfull movements 85466
200 successfull movements 85467
200 successfull movements 85468
200 successfull movements 85469
200 successfull movements 85470
Episode finished after 200 timesteps with 199.0 reward 0.13529999999999998
200 successfull movements 85471
200 successfull movements 85472
200 successfull movements 85473
200 successfull movements 85474
200 successfull movements 85475
200 successfull movements 85476
200 successfull movements 85477
200 successfull movements 85478
200 successfull movements 85479
200 successfull movements 85480
Episode finished after 200 timesteps with 199.0 reward 0.1352
200 successfull movements 85481
200 successfull movements 85482
200 successfull movements 85483
200 successfull movements 85484
200 successfull movements 85485
200 successfull movements 85486
200 successfull movements 85487
200 successfull movements 85488
200 successfull movements 85489

200 successfull movements 85904
200 successfull movements 85905
200 successfull movements 85906
200 successfull movements 85907
200 successfull movements 85908
200 successfull movements 85909
200 successfull movements 85910
Episode finished after 200 timesteps with 199.0 reward 0.13090000000000002
200 successfull movements 85911
200 successfull movements 85912
200 successfull movements 85913
200 successfull movements 85914
200 successfull movements 85915
200 successfull movements 85916
200 successfull movements 85917
200 successfull movements 85918
200 successfull movements 85919
200 successfull movements 85920
Episode finished after 200 timesteps with 199.0 reward 0.13080000000000003
200 successfull movements 85921
200 successfull movements 85922
200 successfull movements 85923
200 successfull movements 85924
200 successfull movements 85925
200 successfull movements 85926
200 successfull movements 85927
200 successfull movements 85928
200 successfull movements 85929

200 successfull movements 86538
200 successfull movements 86539
200 successfull movements 86540
Episode finished after 200 timesteps with 199.0 reward 0.12460000000000004
200 successfull movements 86541
200 successfull movements 86542
200 successfull movements 86543
200 successfull movements 86544
200 successfull movements 86545
200 successfull movements 86546
200 successfull movements 86547
200 successfull movements 86548
200 successfull movements 86549
200 successfull movements 86550
Episode finished after 200 timesteps with 199.0 reward 0.12449999999999994
200 successfull movements 86551
200 successfull movements 86552
200 successfull movements 86553
200 successfull movements 86554
200 successfull movements 86555
200 successfull movements 86556
200 successfull movements 86557
200 successfull movements 86558
200 successfull movements 86559
200 successfull movements 86560
Episode finished after 200 timesteps with 199.0 reward 0.12439999999999996
200 successfull movements 86561

200 successfull movements 86985
200 successfull movements 86986
200 successfull movements 86987
200 successfull movements 86988
200 successfull movements 86989
200 successfull movements 86990
Episode finished after 200 timesteps with 199.0 reward 0.12009999999999998
200 successfull movements 86991
200 successfull movements 86992
200 successfull movements 86993
200 successfull movements 86994
200 successfull movements 86995
200 successfull movements 86996
200 successfull movements 86997
200 successfull movements 86998
200 successfull movements 86999
200 successfull movements 87000
Episode finished after 200 timesteps with 199.0 reward 0.12
200 successfull movements 87001
200 successfull movements 87002
200 successfull movements 87003
200 successfull movements 87004
200 successfull movements 87005
200 successfull movements 87006
200 successfull movements 87007
200 successfull movements 87008
200 successfull movements 87009
200 successfull movements 87010
Episode finished after 200 timest

200 successfull movements 87553
200 successfull movements 87554
200 successfull movements 87555
200 successfull movements 87556
200 successfull movements 87557
200 successfull movements 87558
200 successfull movements 87559
200 successfull movements 87560
Episode finished after 200 timesteps with 199.0 reward 0.11439999999999995
200 successfull movements 87561
200 successfull movements 87562
200 successfull movements 87563
200 successfull movements 87564
200 successfull movements 87565
200 successfull movements 87566
200 successfull movements 87567
200 successfull movements 87568
200 successfull movements 87569
200 successfull movements 87570
Episode finished after 200 timesteps with 199.0 reward 0.11429999999999996
200 successfull movements 87571
200 successfull movements 87572
200 successfull movements 87573
200 successfull movements 87574
200 successfull movements 87575
200 successfull movements 87576
200 successfull movements 87577
200 successfull movements 87578

200 successfull movements 87843
200 successfull movements 87844
200 successfull movements 87845
200 successfull movements 87846
200 successfull movements 87847
200 successfull movements 87848
200 successfull movements 87849
200 successfull movements 87850
Episode finished after 200 timesteps with 199.0 reward 0.11150000000000004
200 successfull movements 87851
200 successfull movements 87852
200 successfull movements 87853
200 successfull movements 87854
200 successfull movements 87855
200 successfull movements 87856
200 successfull movements 87857
200 successfull movements 87858
200 successfull movements 87859
200 successfull movements 87860
Episode finished after 200 timesteps with 199.0 reward 0.11139999999999994
200 successfull movements 87861
200 successfull movements 87862
200 successfull movements 87863
200 successfull movements 87864
200 successfull movements 87865
200 successfull movements 87866
200 successfull movements 87867
200 successfull movements 87868

200 successfull movements 88309
200 successfull movements 88310
Episode finished after 200 timesteps with 199.0 reward 0.1069
200 successfull movements 88311
200 successfull movements 88312
200 successfull movements 88313
200 successfull movements 88314
200 successfull movements 88315
200 successfull movements 88316
200 successfull movements 88317
200 successfull movements 88318
200 successfull movements 88319
200 successfull movements 88320
Episode finished after 200 timesteps with 199.0 reward 0.1068
200 successfull movements 88321
200 successfull movements 88322
200 successfull movements 88323
200 successfull movements 88324
200 successfull movements 88325
200 successfull movements 88326
200 successfull movements 88327
200 successfull movements 88328
200 successfull movements 88329
200 successfull movements 88330
Episode finished after 200 timesteps with 199.0 reward 0.10670000000000002
200 successfull movements 88331
200 successfull movements 88332
200 successfull movements 88333

200 successfull movements 88859
200 successfull movements 88860
Episode finished after 200 timesteps with 199.0 reward 0.10140000000000005
200 successfull movements 88861
200 successfull movements 88862
200 successfull movements 88863
200 successfull movements 88864
200 successfull movements 88865
200 successfull movements 88866
200 successfull movements 88867
200 successfull movements 88868
200 successfull movements 88869
200 successfull movements 88870
Episode finished after 200 timesteps with 199.0 reward 0.10129999999999995
200 successfull movements 88871
200 successfull movements 88872
200 successfull movements 88873
200 successfull movements 88874
200 successfull movements 88875
200 successfull movements 88876
200 successfull movements 88877
200 successfull movements 88878
200 successfull movements 88879
200 successfull movements 88880
Episode finished after 200 timesteps with 199.0 reward 0.10119999999999996
200 successfull movements 88881
200 successfull movements 88882

200 successfull movements 89480
Episode finished after 200 timesteps with 199.0 reward 0.09519999999999995
200 successfull movements 89481
200 successfull movements 89482
200 successfull movements 89483
200 successfull movements 89484
200 successfull movements 89485
200 successfull movements 89486
200 successfull movements 89487
200 successfull movements 89488
200 successfull movements 89489
200 successfull movements 89490
Episode finished after 200 timesteps with 199.0 reward 0.09509999999999996
200 successfull movements 89491
200 successfull movements 89492
200 successfull movements 89493
200 successfull movements 89494
200 successfull movements 89495
200 successfull movements 89496
200 successfull movements 89497
200 successfull movements 89498
200 successfull movements 89499
200 successfull movements 89500
Episode finished after 200 timesteps with 199.0 reward 0.09499999999999997
200 successfull movements 89501
200 successfull movements 89502
200 successfull movements 89503

200 successfull movements 89929
200 successfull movements 89930
Episode finished after 200 timesteps with 199.0 reward 0.0907
200 successfull movements 89931
200 successfull movements 89932
200 successfull movements 89933
200 successfull movements 89934
200 successfull movements 89935
200 successfull movements 89936
200 successfull movements 89937
200 successfull movements 89938
200 successfull movements 89939
200 successfull movements 89940
Episode finished after 200 timesteps with 199.0 reward 0.09060000000000001
200 successfull movements 89941
200 successfull movements 89942
200 successfull movements 89943
200 successfull movements 89944
200 successfull movements 89945
200 successfull movements 89946
200 successfull movements 89947
200 successfull movements 89948
200 successfull movements 89949
200 successfull movements 89950
Episode finished after 200 timesteps with 199.0 reward 0.09050000000000002
200 successfull movements 89951
200 successfull movements 89952

200 successfull movements 90359
200 successfull movements 90360
Episode finished after 200 timesteps with 199.0 reward 0.08640000000000003
200 successfull movements 90361
200 successfull movements 90362
200 successfull movements 90363
200 successfull movements 90364
200 successfull movements 90365
200 successfull movements 90366
200 successfull movements 90367
200 successfull movements 90368
200 successfull movements 90369
200 successfull movements 90370
Episode finished after 200 timesteps with 199.0 reward 0.08630000000000004
200 successfull movements 90371
200 successfull movements 90372
200 successfull movements 90373
200 successfull movements 90374
200 successfull movements 90375
200 successfull movements 90376
200 successfull movements 90377
200 successfull movements 90378
200 successfull movements 90379
200 successfull movements 90380
Episode finished after 200 timesteps with 199.0 reward 0.08619999999999994
200 successfull movements 90381
200 successfull movements 90382

200 successfull movements 90881
200 successfull movements 90882
200 successfull movements 90883
200 successfull movements 90884
200 successfull movements 90885
200 successfull movements 90886
200 successfull movements 90887
200 successfull movements 90888
200 successfull movements 90889
200 successfull movements 90890
Episode finished after 200 timesteps with 199.0 reward 0.08109999999999995
200 successfull movements 90891
200 successfull movements 90892
200 successfull movements 90893
200 successfull movements 90894
200 successfull movements 90895
200 successfull movements 90896
200 successfull movements 90897
200 successfull movements 90898
200 successfull movements 90899
200 successfull movements 90900
Episode finished after 200 timesteps with 199.0 reward 0.08099999999999996
200 successfull movements 90901
200 successfull movements 90902
200 successfull movements 90903
200 successfull movements 90904
200 successfull movements 90905
200 successfull movements 90906

200 successfull movements 91120
Episode finished after 200 timesteps with 199.0 reward 0.07879999999999998
200 successfull movements 91121
200 successfull movements 91122
200 successfull movements 91123
200 successfull movements 91124
200 successfull movements 91125
200 successfull movements 91126
200 successfull movements 91127
200 successfull movements 91128
200 successfull movements 91129
200 successfull movements 91130
Episode finished after 200 timesteps with 199.0 reward 0.07869999999999999
200 successfull movements 91131
200 successfull movements 91132
200 successfull movements 91133
200 successfull movements 91134
200 successfull movements 91135
200 successfull movements 91136
200 successfull movements 91137
200 successfull movements 91138
200 successfull movements 91139
200 successfull movements 91140
Episode finished after 200 timesteps with 199.0 reward 0.0786
200 successfull movements 91141
200 successfull movements 91142
200 successfull movements 91143

Episode finished after 200 timesteps with 199.0 reward 0.07530000000000003
200 successfull movements 91471
200 successfull movements 91472
200 successfull movements 91473
200 successfull movements 91474
200 successfull movements 91475
200 successfull movements 91476
200 successfull movements 91477
200 successfull movements 91478
200 successfull movements 91479
200 successfull movements 91480
Episode finished after 200 timesteps with 199.0 reward 0.07520000000000004
200 successfull movements 91481
200 successfull movements 91482
200 successfull movements 91483
200 successfull movements 91484
200 successfull movements 91485
200 successfull movements 91486
200 successfull movements 91487
200 successfull movements 91488
200 successfull movements 91489
200 successfull movements 91490
Episode finished after 200 timesteps with 199.0 reward 0.07509999999999994
200 successfull movements 91491
200 successfull movements 91492
200 successfull movements 91493
200 successfull movements 91494

Episode finished after 200 timesteps with 199.0 reward 0.07109999999999994
200 successfull movements 91891
200 successfull movements 91892
200 successfull movements 91893
200 successfull movements 91894
200 successfull movements 91895
200 successfull movements 91896
200 successfull movements 91897
200 successfull movements 91898
200 successfull movements 91899
200 successfull movements 91900
Episode finished after 200 timesteps with 199.0 reward 0.07099999999999995
200 successfull movements 91901
200 successfull movements 91902
200 successfull movements 91903
200 successfull movements 91904
200 successfull movements 91905
200 successfull movements 91906
200 successfull movements 91907
200 successfull movements 91908
200 successfull movements 91909
200 successfull movements 91910
Episode finished after 200 timesteps with 199.0 reward 0.07089999999999996
200 successfull movements 91911
200 successfull movements 91912
200 successfull movements 91913
200 successfull movements 91914

Episode finished after 200 timesteps with 199.0 reward 0.06779999999999997
200 successfull movements 92221
200 successfull movements 92222
200 successfull movements 92223
200 successfull movements 92224
200 successfull movements 92225
200 successfull movements 92226
200 successfull movements 92227
200 successfull movements 92228
200 successfull movements 92229
200 successfull movements 92230
Episode finished after 200 timesteps with 199.0 reward 0.06769999999999998
200 successfull movements 92231
200 successfull movements 92232
200 successfull movements 92233
200 successfull movements 92234
200 successfull movements 92235
200 successfull movements 92236
200 successfull movements 92237
200 successfull movements 92238
200 successfull movements 92239
200 successfull movements 92240
Episode finished after 200 timesteps with 199.0 reward 0.0676
200 successfull movements 92241
200 successfull movements 92242
200 successfull movements 92243
200 successfull movements 92244

200 successfull movements 92529
200 successfull movements 92530
Episode finished after 200 timesteps with 199.0 reward 0.06469999999999998
200 successfull movements 92531
200 successfull movements 92532
200 successfull movements 92533
200 successfull movements 92534
200 successfull movements 92535
200 successfull movements 92536
200 successfull movements 92537
200 successfull movements 92538
200 successfull movements 92539
200 successfull movements 92540
Episode finished after 200 timesteps with 199.0 reward 0.06459999999999999
200 successfull movements 92541
200 successfull movements 92543
200 successfull movements 92544
200 successfull movements 92545
200 successfull movements 92546
200 successfull movements 92547
200 successfull movements 92548
200 successfull movements 92549
200 successfull movements 92550
Episode finished after 200 timesteps with 199.0 reward 0.0645
200 successfull movements 92551
200 successfull movements 92552
200 successfull movements 92553

200 successfull movements 92886
200 successfull movements 92887
200 successfull movements 92888
200 successfull movements 92889
200 successfull movements 92890
Episode finished after 200 timesteps with 199.0 reward 0.06110000000000004
200 successfull movements 92891
200 successfull movements 92892
200 successfull movements 92893
200 successfull movements 92894
200 successfull movements 92895
200 successfull movements 92896
200 successfull movements 92897
200 successfull movements 92898
200 successfull movements 92899
200 successfull movements 92900
Episode finished after 200 timesteps with 199.0 reward 0.06099999999999994
200 successfull movements 92901
200 successfull movements 92902
200 successfull movements 92903
200 successfull movements 92904
200 successfull movements 92905
200 successfull movements 92906
200 successfull movements 92907
200 successfull movements 92908
200 successfull movements 92909
200 successfull movements 92910

Episode finished after 200 timesteps with 199.0 reward 0.05589999999999995
200 successfull movements 93411
200 successfull movements 93412
200 successfull movements 93413
200 successfull movements 93414
200 successfull movements 93415
200 successfull movements 93416
200 successfull movements 93417
200 successfull movements 93418
200 successfull movements 93419
200 successfull movements 93420
Episode finished after 200 timesteps with 199.0 reward 0.05579999999999996
200 successfull movements 93421
200 successfull movements 93422
200 successfull movements 93423
200 successfull movements 93424
200 successfull movements 93425
200 successfull movements 93426
200 successfull movements 93427
200 successfull movements 93428
200 successfull movements 93429
200 successfull movements 93430
Episode finished after 200 timesteps with 199.0 reward 0.05569999999999997
200 successfull movements 93431
200 successfull movements 93432
200 successfull movements 93433
200 successfull movements 93434

200 successfull movements 94066
200 successfull movements 94067
200 successfull movements 94068
200 successfull movements 94069
200 successfull movements 94070
Episode finished after 200 timesteps with 199.0 reward 0.05
200 successfull movements 94071
200 successfull movements 94072
200 successfull movements 94073
200 successfull movements 94074
200 successfull movements 94075
200 successfull movements 94076
200 successfull movements 94077
200 successfull movements 94078
200 successfull movements 94079
200 successfull movements 94080
Episode finished after 200 timesteps with 199.0 reward 0.05
200 successfull movements 94081
200 successfull movements 94082
200 successfull movements 94083
200 successfull movements 94084
200 successfull movements 94085
200 successfull movements 94086
200 successfull movements 94087
200 successfull movements 94088
200 successfull movements 94089
200 successfull movements 94090
Episode finished after 200 timesteps with 199.0 reward 0.05

200 successfull movements 94513
200 successfull movements 94514
200 successfull movements 94515
200 successfull movements 94516
200 successfull movements 94517
200 successfull movements 94518
200 successfull movements 94519
200 successfull movements 94520
Episode finished after 200 timesteps with 199.0 reward 0.05
200 successfull movements 94521
200 successfull movements 94522
200 successfull movements 94523
200 successfull movements 94524
200 successfull movements 94525
200 successfull movements 94526
200 successfull movements 94527
200 successfull movements 94528
200 successfull movements 94529
200 successfull movements 94530
Episode finished after 200 timesteps with 199.0 reward 0.05
200 successfull movements 94531
200 successfull movements 94532
200 successfull movements 94533
200 successfull movements 94534
200 successfull movements 94535
200 successfull movements 94536
200 successfull movements 94537
200 successfull movements 94538
200 successfull movements 94539

200 successfull movements 94942
200 successfull movements 94943
200 successfull movements 94944
200 successfull movements 94945
200 successfull movements 94946
200 successfull movements 94947
200 successfull movements 94948
200 successfull movements 94949
200 successfull movements 94950
Episode finished after 200 timesteps with 199.0 reward 0.05
200 successfull movements 94951
200 successfull movements 94952
200 successfull movements 94953
200 successfull movements 94954
200 successfull movements 94955
200 successfull movements 94956
200 successfull movements 94957
200 successfull movements 94958
200 successfull movements 94959
200 successfull movements 94960
Episode finished after 200 timesteps with 199.0 reward 0.05
200 successfull movements 94961
200 successfull movements 94962
200 successfull movements 94963
200 successfull movements 94964
200 successfull movements 94965
200 successfull movements 94966
200 successfull movements 94967
200 successfull movements 94968

200 successfull movements 95452
200 successfull movements 95453
200 successfull movements 95454
200 successfull movements 95455
200 successfull movements 95456
200 successfull movements 95457
200 successfull movements 95458
200 successfull movements 95459
200 successfull movements 95460
Episode finished after 200 timesteps with 199.0 reward 0.05
200 successfull movements 95461
200 successfull movements 95462
200 successfull movements 95463
200 successfull movements 95464
200 successfull movements 95465
200 successfull movements 95466
200 successfull movements 95467
200 successfull movements 95468
200 successfull movements 95469
200 successfull movements 95470
Episode finished after 200 timesteps with 199.0 reward 0.05
200 successfull movements 95471
200 successfull movements 95472
200 successfull movements 95473
200 successfull movements 95474
200 successfull movements 95475
200 successfull movements 95476
200 successfull movements 95477
200 successfull movements 95478

200 successfull movements 96153
200 successfull movements 96154
200 successfull movements 96155
200 successfull movements 96156
200 successfull movements 96157
200 successfull movements 96158
200 successfull movements 96159
200 successfull movements 96160
Episode finished after 200 timesteps with 199.0 reward 0.05
200 successfull movements 96161
200 successfull movements 96162
200 successfull movements 96163
200 successfull movements 96164
200 successfull movements 96165
200 successfull movements 96166
200 successfull movements 96167
200 successfull movements 96168
200 successfull movements 96169
200 successfull movements 96170
Episode finished after 200 timesteps with 199.0 reward 0.05
200 successfull movements 96171
200 successfull movements 96172
200 successfull movements 96173
200 successfull movements 96174
200 successfull movements 96175
200 successfull movements 96176
200 successfull movements 96177
200 successfull movements 96178
200 successfull movements 96179

200 successfull movements 96585
200 successfull movements 96586
200 successfull movements 96587
200 successfull movements 96588
200 successfull movements 96589
200 successfull movements 96590
Episode finished after 200 timesteps with 199.0 reward 0.05
200 successfull movements 96591
200 successfull movements 96592
200 successfull movements 96593
200 successfull movements 96594
200 successfull movements 96595
200 successfull movements 96596
200 successfull movements 96597
200 successfull movements 96598
200 successfull movements 96599
200 successfull movements 96600
Episode finished after 200 timesteps with 199.0 reward 0.05
200 successfull movements 96601
200 successfull movements 96602
200 successfull movements 96603
200 successfull movements 96604
200 successfull movements 96605
200 successfull movements 96606
200 successfull movements 96607
200 successfull movements 96608
200 successfull movements 96609
200 successfull movements 96610

Episode finished after 200 timesteps with 199.0 reward 0.05
200 successfull movements 97051
200 successfull movements 97052
200 successfull movements 97053
200 successfull movements 97054
200 successfull movements 97055
200 successfull movements 97056
200 successfull movements 97057
200 successfull movements 97058
200 successfull movements 97059
200 successfull movements 97060
Episode finished after 200 timesteps with 199.0 reward 0.05
200 successfull movements 97061
200 successfull movements 97062
200 successfull movements 97063
200 successfull movements 97064
200 successfull movements 97065
200 successfull movements 97066
200 successfull movements 97067
200 successfull movements 97068
200 successfull movements 97069
200 successfull movements 97070
Episode finished after 200 timesteps with 199.0 reward 0.05
200 successfull movements 97071
200 successfull movements 97072
200 successfull movements 97073
200 successfull movements 97074
200 successfull movements 97075

200 successfull movements 97571
200 successfull movements 97572
200 successfull movements 97573
200 successfull movements 97574
200 successfull movements 97575
200 successfull movements 97576
200 successfull movements 97577
200 successfull movements 97578
200 successfull movements 97579
200 successfull movements 97580
Episode finished after 200 timesteps with 199.0 reward 0.05
200 successfull movements 97581
200 successfull movements 97582
200 successfull movements 97583
200 successfull movements 97584
200 successfull movements 97585
200 successfull movements 97586
200 successfull movements 97587
200 successfull movements 97588
200 successfull movements 97589
200 successfull movements 97590
Episode finished after 200 timesteps with 199.0 reward 0.05
200 successfull movements 97591
200 successfull movements 97592
200 successfull movements 97593
200 successfull movements 97594
200 successfull movements 97595
200 successfull movements 97596
200 successfull movements 97597

200 successfull movements 98232
200 successfull movements 98233
200 successfull movements 98234
200 successfull movements 98235
200 successfull movements 98236
200 successfull movements 98237
200 successfull movements 98238
200 successfull movements 98239
200 successfull movements 98240
Episode finished after 200 timesteps with 199.0 reward 0.05
200 successfull movements 98241
200 successfull movements 98242
200 successfull movements 98243
200 successfull movements 98244
200 successfull movements 98245
200 successfull movements 98246
200 successfull movements 98247
200 successfull movements 98248
200 successfull movements 98249
200 successfull movements 98250
Episode finished after 200 timesteps with 199.0 reward 0.05
200 successfull movements 98251
200 successfull movements 98252
200 successfull movements 98253
200 successfull movements 98254
200 successfull movements 98255
200 successfull movements 98256
200 successfull movements 98257
200 successfull movements 98258

200 successfull movements 98671
200 successfull movements 98672
200 successfull movements 98673
200 successfull movements 98674
200 successfull movements 98675
200 successfull movements 98676
200 successfull movements 98677
200 successfull movements 98678
200 successfull movements 98679
200 successfull movements 98680
Episode finished after 200 timesteps with 199.0 reward 0.05
200 successfull movements 98681
200 successfull movements 98682
200 successfull movements 98683
200 successfull movements 98684
200 successfull movements 98685
200 successfull movements 98686
200 successfull movements 98687
200 successfull movements 98688
200 successfull movements 98689
200 successfull movements 98690
Episode finished after 200 timesteps with 199.0 reward 0.05
200 successfull movements 98691
200 successfull movements 98692
200 successfull movements 98693
200 successfull movements 98694
200 successfull movements 98695
200 successfull movements 98696
200 successfull movements 98697

200 successfull movements 99132
200 successfull movements 99133
200 successfull movements 99134
200 successfull movements 99135
200 successfull movements 99136
200 successfull movements 99137
200 successfull movements 99138
200 successfull movements 99139
200 successfull movements 99140
Episode finished after 200 timesteps with 199.0 reward 0.05
200 successfull movements 99141
200 successfull movements 99142
200 successfull movements 99143
200 successfull movements 99144
200 successfull movements 99145
200 successfull movements 99146
200 successfull movements 99147
200 successfull movements 99148
200 successfull movements 99149
200 successfull movements 99150
Episode finished after 200 timesteps with 199.0 reward 0.05
200 successfull movements 99151
200 successfull movements 99152
200 successfull movements 99153
200 successfull movements 99154
200 successfull movements 99155
200 successfull movements 99156
200 successfull movements 99157
200 successfull movements 99158

In [17]:
print(len(Q))
#print(Q)

29308
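
The table size reflects how states are keyed: each observation value is scaled by `observation_round` and rounded, the first two components (cart position and cart velocity) are dropped, and the action is appended. A minimal sketch of that keying, pulled out into helpers, is shown here. The names `q_key` and `greedy_action` and the `fallback` parameter are hypothetical and do not appear in the notebook; the logic is intended to mirror the lookup loop used in the training and test cells.

```python
def q_key(observation, action, scale=100):
    """Build a Q-table key the way the notebook's loops do (hypothetical helper).

    Every observation value is scaled and rounded, the action is appended,
    and the first two entries are dropped, so the key only contains the
    pole angle, the pole angular velocity, and the action.
    """
    discretized = [round(value * scale, 0) for value in observation]
    discretized.append(action)
    return tuple(discretized[2:])


def greedy_action(Q, observation, fallback, actions=(0, 1)):
    """Return the action with the largest Q value above 0, else `fallback`.

    In the notebook the fallback is a random action sampled from the
    environment's action space; here it is passed in explicitly (assumption).
    """
    best_value = 0.0
    action = fallback
    for candidate in actions:
        value = Q.get(q_key(observation, candidate), 0.0)
        if value > best_value:
            action, best_value = candidate, value
    return action
```

Because unseen keys default to 0 and only strictly larger values replace the fallback, a state with no learned entries falls back to whatever action was supplied.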


In [18]:
# Test the learned Q policy greedily. The game is solved if the average reward over 100 consecutive episodes is at least 195.
episodes = 100
totalReward = 0
test_list = []
for i_episode in range(episodes):
    observation = env.reset()
    episodeReward = 0

    test_list.append(observation)

    for t in range(300):

        # Start from a random action; it is kept only if no Q value for this state exceeds 0.
        bestReward = 0
        action = env.action_space.sample()
        for possibleAction in [0, 1]:
            temp_state = []
            for i in observation:
                temp_state.append(round(i*observation_round,0))
            temp_state.append(possibleAction)

            possibleReward = Q.get(tuple(temp_state[2:]),0.)
            if(possibleReward > bestReward):
                action = possibleAction
                bestReward = possibleReward
        
        observation, reward, done, info = env.step(action)
        episodeReward += reward

        if done:
            print("Episode finished after {} timesteps with {} reward".format(t+1, episodeReward))
            break
    totalReward += episodeReward

print("Total reward: {}, average reward: {}".format(totalReward, totalReward/episodes))


Episode finished after 200 timesteps with 200.0 reward
Episode finished after 200 timesteps with 200.0 reward
Episode finished after 200 timesteps with 200.0 reward
Episode finished after 200 timesteps with 200.0 reward
Episode finished after 200 timesteps with 200.0 reward
Episode finished after 200 timesteps with 200.0 reward
Episode finished after 200 timesteps with 200.0 reward
Episode finished after 200 timesteps with 200.0 reward
Episode finished after 200 timesteps with 200.0 reward
Episode finished after 200 timesteps with 200.0 reward
Episode finished after 200 timesteps with 200.0 reward
Episode finished after 200 timesteps with 200.0 reward
Episode finished after 200 timesteps with 200.0 reward
Episode finished after 200 timesteps with 200.0 reward
Episode finished after 200 timesteps with 200.0 reward
Episode finished after 200 timesteps with 200.0 reward
Episode finished after 200 timesteps with 200.0 reward
Episode finished after 200 timesteps with 200.0 reward


# Conclusion
* Thus the game is solved after 100,000 episodes of training, though the last 50,000 episodes of training were likely superfluous.
* To improve on this, it would be better to periodically test the greedy policy during training to see exactly when the solution was found.
* Hyperparameters to tune: learning rate, epsilon start value, and epsilon decay rate.
* There are other methods that can be used to attempt to solve the game.
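
The first two improvement points can be sketched without touching the environment. The block below is a minimal illustration, not the notebook's code: `linear_epsilon` restates the linear decay used in the training loop with the start value and floor exposed as parameters, and `SolvedTracker` keeps a rolling window of test rewards so the training loop could stop once the 195-over-100 criterion is met. Both names, and the choice of `>=` for the threshold comparison, are assumptions.

```python
from collections import deque


def linear_epsilon(i_episode, episodes, start=0.99, floor=0.05):
    """Exploration rate for a given episode: linear decay with a lower bound."""
    return max(start - i_episode / episodes, floor)


class SolvedTracker:
    """Rolling average of episode rewards (hypothetical helper)."""

    def __init__(self, window=100, threshold=195.0):
        self.rewards = deque(maxlen=window)  # keeps only the latest `window` rewards
        self.threshold = threshold

    def add(self, episode_reward):
        """Record one test-episode reward and report whether the game is solved."""
        self.rewards.append(episode_reward)
        return self.is_solved()

    def is_solved(self):
        """True once the window is full and its mean reaches the threshold."""
        if len(self.rewards) < self.rewards.maxlen:
            return False
        return sum(self.rewards) / len(self.rewards) >= self.threshold
```

One way to use this would be to run a greedy test episode every few hundred training episodes, pass its reward to `tracker.add(...)`, and record the training episode index the first time it returns `True`.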