# Module Five Assignment: Cartpole Problem
Review the code in this notebook and in the score_logger.py file in the *scores* folder (directory). Once you have reviewed the code, return to this notebook and select **Cell** and then **Run All** from the menu bar to run this code. The code takes several minutes to run.

In [2]:
import random  
import gym  
import numpy as np  
from collections import deque  
from keras.models import Sequential  
from keras.layers import Dense  
from keras.optimizers import Adam  
  
  
from scores.score_logger import ScoreLogger  
  
ENV_NAME = "CartPole-v1"  
  
GAMMA = 0.95  
LEARNING_RATE = 0.001  
  
MEMORY_SIZE = 1000000  
BATCH_SIZE = 20  
  
EXPLORATION_MAX = 1.0  
EXPLORATION_MIN = 0.01  
EXPLORATION_DECAY = 0.995  
  
  
class DQNSolver:  
  
    def __init__(self, observation_space, action_space):  
        self.exploration_rate = EXPLORATION_MAX  
  
        self.action_space = action_space  
        self.memory = deque(maxlen=MEMORY_SIZE)  
  
        self.model = Sequential()  
        self.model.add(Dense(24, input_shape=(observation_space,), activation="relu"))  
        self.model.add(Dense(24, activation="relu"))  
        self.model.add(Dense(self.action_space, activation="linear"))  
        self.model.compile(loss="mse", optimizer=Adam(lr=LEARNING_RATE))  
  
    def remember(self, state, action, reward, next_state, done):  
        self.memory.append((state, action, reward, next_state, done))  
  
    def act(self, state):  
        if np.random.rand() < self.exploration_rate:  
            return random.randrange(self.action_space)  
        q_values = self.model.predict(state)  
        return np.argmax(q_values[0])  
  
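    def experience_replay(self):
        # Wait until the buffer holds at least one full batch of transitions.
        if len(self.memory) < BATCH_SIZE:
            return
        batch = random.sample(self.memory, BATCH_SIZE)
        for state, action, reward, state_next, terminal in batch:
            # Terminal transitions have no future value, so the target is the reward alone.
            q_update = reward
            if not terminal:
                # Bellman target: reward plus the discounted best Q-value of the next state.
                q_update = (reward + GAMMA * np.amax(self.model.predict(state_next)[0]))
            # Overwrite only the taken action's Q-value, then fit on that single sample.
            q_values = self.model.predict(state)
            q_values[0][action] = q_update
            self.model.fit(state, q_values, verbose=0)
        # Decay the exploration rate once per replay call, never below EXPLORATION_MIN.
        self.exploration_rate *= EXPLORATION_DECAY
        self.exploration_rate = max(EXPLORATION_MIN, self.exploration_rate)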
  
  
def cartpole():  
    env = gym.make(ENV_NAME)  
    score_logger = ScoreLogger(ENV_NAME)  
    observation_space = env.observation_space.shape[0]  
    action_space = env.action_space.n  
    dqn_solver = DQNSolver(observation_space, action_space)  
    run = 0  
    scores = []  # To keep track of the last 100 scores
    while True:  
        run += 1  
        state = env.reset()  
        state = np.reshape(state, [1, observation_space])  
        step = 0  
        while True:  
            step += 1  
            #env.render()  
            action = dqn_solver.act(state)  
            state_next, reward, terminal, info = env.step(action)  
            # Flip the sign of the reward on the step that ends the episode.
            reward = reward if not terminal else -reward
            state_next = np.reshape(state_next, [1, observation_space])  
            dqn_solver.remember(state, action, reward, state_next, terminal)  
            state = state_next  
            if terminal:  
                print ("Run: " + str(run) + ", exploration: " + str(dqn_solver.exploration_rate) + ", score: " + str(step))  
                score_logger.add_score(step, run)
                
                break  
            dqn_solver.experience_replay()  



Using TensorFlow backend.
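
The target that `experience_replay` fits toward is the one-step Q-learning (Bellman) target. The following is a minimal NumPy sketch of that arithmetic only, with made-up Q-values standing in for `self.model.predict`. The function name `q_target` and the numbers are illustrative assumptions, not part of the notebook.

```python
import numpy as np

GAMMA = 0.95  # same discount factor as the notebook's default


def q_target(reward, next_q_values, terminal, gamma=GAMMA):
    """Return the one-step Q-learning target for a single transition."""
    if terminal:
        # No future value after the episode ends.
        return reward
    return reward + gamma * np.amax(next_q_values)


# Hypothetical Q-values for the next state (two actions: left, right).
next_q = np.array([1.0, 2.0])

non_terminal_target = q_target(1.0, next_q, terminal=False)
terminal_target = q_target(-1.0, next_q, terminal=True)
print(non_terminal_target, terminal_target)
```

For the non-terminal case the target is the reward plus `GAMMA` times the larger of the two next-state Q-values. For the terminal case it is just the (negated) reward.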


In [2]:
cartpole()

Run: 1, exploration: 1.0, score: 16
Scores: (min: 16, avg: 16, max: 16)

Run: 2, exploration: 0.9137248860125932, score: 22
Scores: (min: 16, avg: 19, max: 22)

Run: 3, exploration: 0.736559652908221, score: 44
Scores: (min: 16, avg: 27.333333333333332, max: 44)

Run: 4, exploration: 0.6900935609921609, score: 14
Scores: (min: 14, avg: 24, max: 44)

Run: 5, exploration: 0.5819594443402982, score: 35
Scores: (min: 14, avg: 26.2, max: 44)

Run: 6, exploration: 0.547986285490042, score: 13
Scores: (min: 13, avg: 24, max: 44)

Run: 7, exploration: 0.5185893309484582, score: 12
Scores: (min: 12, avg: 22.285714285714285, max: 44)

Run: 8, exploration: 0.4932355662165453, score: 11
Scores: (min: 11, avg: 20.875, max: 44)

Run: 9, exploration: 0.45522245551230495, score: 17
Scores: (min: 11, avg: 20.444444444444443, max: 44)

Run: 10, exploration: 0.4180382776616619, score: 18
Scores: (min: 11, avg: 20.2, max: 44)

Run: 11, exploration: 0.3936343764094253, score: 13
Scores: (min: 11, avg: 19.5

Run: 86, exploration: 0.01, score: 15
Scores: (min: 8, avg: 17.24418604651163, max: 54)

Run: 87, exploration: 0.01, score: 41
Scores: (min: 8, avg: 17.517241379310345, max: 54)

Run: 88, exploration: 0.01, score: 17
Scores: (min: 8, avg: 17.511363636363637, max: 54)

Run: 89, exploration: 0.01, score: 36
Scores: (min: 8, avg: 17.719101123595507, max: 54)

Run: 90, exploration: 0.01, score: 36
Scores: (min: 8, avg: 17.92222222222222, max: 54)

Run: 91, exploration: 0.01, score: 42
Scores: (min: 8, avg: 18.186813186813186, max: 54)

Run: 92, exploration: 0.01, score: 49
Scores: (min: 8, avg: 18.52173913043478, max: 54)

Run: 93, exploration: 0.01, score: 23
Scores: (min: 8, avg: 18.56989247311828, max: 54)

Run: 94, exploration: 0.01, score: 63
Scores: (min: 8, avg: 19.04255319148936, max: 63)

Run: 95, exploration: 0.01, score: 35
Scores: (min: 8, avg: 19.210526315789473, max: 63)

Run: 96, exploration: 0.01, score: 71
Scores: (min: 8, avg: 19.75, max: 71)

Run: 97, exploration: 0.01, 

Run: 187, exploration: 0.01, score: 162
Scores: (min: 17, avg: 159.31, max: 462)

Run: 188, exploration: 0.01, score: 141
Scores: (min: 23, avg: 160.55, max: 462)

Run: 189, exploration: 0.01, score: 193
Scores: (min: 23, avg: 162.12, max: 462)

Run: 190, exploration: 0.01, score: 127
Scores: (min: 23, avg: 163.03, max: 462)

Run: 191, exploration: 0.01, score: 178
Scores: (min: 23, avg: 164.39, max: 462)

Run: 192, exploration: 0.01, score: 288
Scores: (min: 23, avg: 166.78, max: 462)

Run: 193, exploration: 0.01, score: 157
Scores: (min: 35, avg: 168.12, max: 462)

Run: 194, exploration: 0.01, score: 172
Scores: (min: 35, avg: 169.21, max: 462)

Run: 195, exploration: 0.01, score: 149
Scores: (min: 64, avg: 170.35, max: 462)

Run: 196, exploration: 0.01, score: 131
Scores: (min: 64, avg: 170.95, max: 462)

Run: 197, exploration: 0.01, score: 359
Scores: (min: 64, avg: 173.64, max: 462)

Run: 198, exploration: 0.01, score: 170
Scores: (min: 64, avg: 174.66, max: 462)

Run: 199, explor

NameError: name 'exit' is not defined

Note: If the code is running properly, you should begin to see output appearing above this code block. It will take several minutes, so it is recommended that you let this code run in the background while completing other work. When the code has finished, it will print output saying, "Solved in _ runs, _ total runs."

You may see an error about not having an exit command. This error does not affect the program's functionality and results from the steps taken to convert the code from Python 2.x to Python 3. Please disregard this error.
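
The "Scores: (min, avg, max)" lines are printed by `ScoreLogger`, whose source is not shown in this notebook. The following is a rough sketch of how such a rolling summary and a "solved" check could work. It assumes a 100-run window and a hypothetical threshold of 195. `RollingScores`, `WINDOW`, and `SOLVE_THRESHOLD` are illustrative names and values, not confirmed by the notebook.

```python
from collections import deque
from statistics import mean

WINDOW = 100            # assumed number of recent runs to average over
SOLVE_THRESHOLD = 195   # hypothetical target average; not taken from the notebook


class RollingScores:
    """Track recent episode scores and report min/avg/max over the window."""

    def __init__(self, window=WINDOW):
        self.scores = deque(maxlen=window)

    def add_score(self, score):
        self.scores.append(score)
        return min(self.scores), mean(self.scores), max(self.scores)

    def solved(self, threshold=SOLVE_THRESHOLD):
        # Only declare success once the window is full.
        full = len(self.scores) == self.scores.maxlen
        return full and mean(self.scores) >= threshold


tracker = RollingScores()
for score in (16, 22, 44):  # first three scores from the run above
    summary = tracker.add_score(score)
print(summary, tracker.solved())
```

Because the deque has a fixed `maxlen`, old scores drop out automatically once more than `WINDOW` runs have been recorded.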

In [3]:
# Experiment 1: Modify Exploration Factor
EXPLORATION_DECAY = 0.9  # Faster decay: exploration reaches EXPLORATION_MIN sooner, so the agent explores less

# Re-run the cartpole function with updated parameters
cartpole()


Run: 1, exploration: 1.0, score: 13
Scores: (min: 13, avg: 13, max: 13)

Run: 2, exploration: 0.0984770902183612, score: 29
Scores: (min: 13, avg: 21, max: 29)

Run: 3, exploration: 0.042391158275216244, score: 9
Scores: (min: 9, avg: 17, max: 29)

Run: 4, exploration: 0.014780882941434608, score: 11
Scores: (min: 9, avg: 15.5, max: 29)

Run: 5, exploration: 0.01, score: 8
Scores: (min: 8, avg: 14, max: 29)

Run: 6, exploration: 0.01, score: 10
Scores: (min: 8, avg: 13.333333333333334, max: 29)

Run: 7, exploration: 0.01, score: 13
Scores: (min: 8, avg: 13.285714285714286, max: 29)

Run: 8, exploration: 0.01, score: 22
Scores: (min: 8, avg: 14.375, max: 29)

Run: 9, exploration: 0.01, score: 31
Scores: (min: 8, avg: 16.22222222222222, max: 31)

Run: 10, exploration: 0.01, score: 13
Scores: (min: 8, avg: 15.9, max: 31)

Run: 11, exploration: 0.01, score: 10
Scores: (min: 8, avg: 15.363636363636363, max: 31)

Run: 12, exploration: 0.01, score: 9
Scores: (min: 8, avg: 14.833333333333334, 

Run: 94, exploration: 0.01, score: 10
Scores: (min: 8, avg: 123.8936170212766, max: 240)

Run: 95, exploration: 0.01, score: 155
Scores: (min: 8, avg: 124.22105263157894, max: 240)

Run: 96, exploration: 0.01, score: 202
Scores: (min: 8, avg: 125.03125, max: 240)

Run: 97, exploration: 0.01, score: 175
Scores: (min: 8, avg: 125.54639175257732, max: 240)

Run: 98, exploration: 0.01, score: 207
Scores: (min: 8, avg: 126.37755102040816, max: 240)

Run: 99, exploration: 0.01, score: 342
Scores: (min: 8, avg: 128.55555555555554, max: 342)

Run: 100, exploration: 0.01, score: 94
Scores: (min: 8, avg: 128.21, max: 342)

Run: 101, exploration: 0.01, score: 37
Scores: (min: 8, avg: 128.45, max: 342)

Run: 102, exploration: 0.01, score: 132
Scores: (min: 8, avg: 129.48, max: 342)

Run: 103, exploration: 0.01, score: 20
Scores: (min: 8, avg: 129.59, max: 342)

Run: 104, exploration: 0.01, score: 196
Scores: (min: 8, avg: 131.44, max: 342)

Run: 105, exploration: 0.01, score: 180
Scores: (min: 9, 

Run: 195, exploration: 0.01, score: 162
Scores: (min: 8, avg: 163.13, max: 500)

Run: 196, exploration: 0.01, score: 52
Scores: (min: 8, avg: 161.63, max: 500)

Run: 197, exploration: 0.01, score: 216
Scores: (min: 8, avg: 162.04, max: 500)

Run: 198, exploration: 0.01, score: 294
Scores: (min: 8, avg: 162.91, max: 500)

Run: 199, exploration: 0.01, score: 113
Scores: (min: 8, avg: 160.62, max: 500)

Run: 200, exploration: 0.01, score: 102
Scores: (min: 8, avg: 160.7, max: 500)

Run: 201, exploration: 0.01, score: 136
Scores: (min: 8, avg: 161.69, max: 500)

Run: 202, exploration: 0.01, score: 253
Scores: (min: 8, avg: 162.9, max: 500)

Run: 203, exploration: 0.01, score: 433
Scores: (min: 8, avg: 167.03, max: 500)

Run: 204, exploration: 0.01, score: 121
Scores: (min: 8, avg: 166.28, max: 500)

Run: 205, exploration: 0.01, score: 165
Scores: (min: 8, avg: 166.13, max: 500)

Run: 206, exploration: 0.01, score: 76
Scores: (min: 8, avg: 165.35, max: 500)

Run: 207, exploration: 0.01, sco

NameError: name 'exit' is not defined
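
The exploration rate is multiplied by `EXPLORATION_DECAY` on every call to `experience_replay`, so the number of replay steps needed to reach `EXPLORATION_MIN` follows from a logarithm. The following is a small sketch of that arithmetic. The helper name `steps_to_min` is illustrative and not part of the notebook.

```python
import math

EXPLORATION_MAX = 1.0
EXPLORATION_MIN = 0.01


def steps_to_min(decay, start=EXPLORATION_MAX, floor=EXPLORATION_MIN):
    """Smallest n with start * decay**n <= floor."""
    return math.ceil(math.log(floor / start) / math.log(decay))


for decay in (0.995, 0.9):
    print(decay, steps_to_min(decay))
```

With these constants the sketch gives 919 replay steps for a decay of 0.995 and 44 for a decay of 0.9, which is consistent with the log above, where the rate has already hit 0.01 by run 5.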

In [None]:
EXPLORATION_DECAY = 0.995  # Reset Param
# Experiment 2: Modify Discount Factor
GAMMA = 0.9  # Lower discount factor: future rewards count less, so short-term rewards are favored

# Re-run the cartpole function with updated parameters
cartpole()


Run: 1, exploration: 1.0, score: 19
Scores: (min: 19, avg: 19, max: 19)

Run: 2, exploration: 0.9416228069143757, score: 13
Scores: (min: 13, avg: 16, max: 19)

Run: 3, exploration: 0.7111635524897149, score: 57
Scores: (min: 13, avg: 29.666666666666668, max: 57)

Run: 4, exploration: 0.5590843898207511, score: 49
Scores: (min: 13, avg: 34.5, max: 57)

Run: 5, exploration: 0.5290920728090721, score: 12
Scores: (min: 12, avg: 30, max: 57)

Run: 6, exploration: 0.47862223409330756, score: 21
Scores: (min: 12, avg: 28.5, max: 57)

Run: 7, exploration: 0.45522245551230495, score: 11
Scores: (min: 11, avg: 26, max: 57)

Run: 8, exploration: 0.43732904629000013, score: 9
Scores: (min: 9, avg: 23.875, max: 57)

Run: 9, exploration: 0.4159480862733536, score: 11
Scores: (min: 9, avg: 22.444444444444443, max: 57)

Run: 10, exploration: 0.3976004408064698, score: 10
Scores: (min: 9, avg: 21.2, max: 57)

Run: 11, exploration: 0.37627099809304654, score: 12
Scores: (min: 9, avg: 20.363636363636363

Run: 90, exploration: 0.01, score: 173
Scores: (min: 8, avg: 125.11111111111111, max: 348)

Run: 91, exploration: 0.01, score: 128
Scores: (min: 8, avg: 125.14285714285714, max: 348)

Run: 92, exploration: 0.01, score: 145
Scores: (min: 8, avg: 125.3586956521739, max: 348)

Run: 93, exploration: 0.01, score: 159
Scores: (min: 8, avg: 125.72043010752688, max: 348)

Run: 94, exploration: 0.01, score: 191
Scores: (min: 8, avg: 126.41489361702128, max: 348)

Run: 95, exploration: 0.01, score: 162
Scores: (min: 8, avg: 126.78947368421052, max: 348)

Run: 96, exploration: 0.01, score: 161
Scores: (min: 8, avg: 127.14583333333333, max: 348)

Run: 97, exploration: 0.01, score: 207
Scores: (min: 8, avg: 127.96907216494846, max: 348)

Run: 98, exploration: 0.01, score: 148
Scores: (min: 8, avg: 128.1734693877551, max: 348)

Run: 99, exploration: 0.01, score: 184
Scores: (min: 8, avg: 128.73737373737373, max: 348)

Run: 100, exploration: 0.01, score: 166
Scores: (min: 8, avg: 129.11, max: 348)

R

Run: 190, exploration: 0.01, score: 114
Scores: (min: 14, avg: 156.1, max: 339)

Run: 191, exploration: 0.01, score: 225
Scores: (min: 14, avg: 157.07, max: 339)

Run: 192, exploration: 0.01, score: 106
Scores: (min: 14, avg: 156.68, max: 339)

Run: 193, exploration: 0.01, score: 168
Scores: (min: 14, avg: 156.77, max: 339)

Run: 194, exploration: 0.01, score: 102
Scores: (min: 14, avg: 155.88, max: 339)

Run: 195, exploration: 0.01, score: 145
Scores: (min: 14, avg: 155.71, max: 339)

Run: 196, exploration: 0.01, score: 111
Scores: (min: 14, avg: 155.21, max: 339)

Run: 197, exploration: 0.01, score: 199
Scores: (min: 14, avg: 155.13, max: 339)

Run: 198, exploration: 0.01, score: 201
Scores: (min: 14, avg: 155.66, max: 339)

Run: 199, exploration: 0.01, score: 172
Scores: (min: 14, avg: 155.54, max: 339)

Run: 200, exploration: 0.01, score: 165
Scores: (min: 14, avg: 155.53, max: 339)

Run: 201, exploration: 0.01, score: 41
Scores: (min: 14, avg: 154.26, max: 339)

Run: 202, explorat

Run: 291, exploration: 0.01, score: 179
Scores: (min: 26, avg: 143.89, max: 291)

Run: 292, exploration: 0.01, score: 136
Scores: (min: 26, avg: 144.19, max: 291)

Run: 293, exploration: 0.01, score: 167
Scores: (min: 26, avg: 144.18, max: 291)

Run: 294, exploration: 0.01, score: 57
Scores: (min: 26, avg: 143.73, max: 291)

Run: 295, exploration: 0.01, score: 82
Scores: (min: 26, avg: 143.1, max: 291)

Run: 296, exploration: 0.01, score: 134
Scores: (min: 26, avg: 143.33, max: 291)

Run: 297, exploration: 0.01, score: 122
Scores: (min: 26, avg: 142.56, max: 291)

Run: 298, exploration: 0.01, score: 153
Scores: (min: 26, avg: 142.08, max: 291)

Run: 299, exploration: 0.01, score: 20
Scores: (min: 20, avg: 140.56, max: 291)

Run: 300, exploration: 0.01, score: 241
Scores: (min: 20, avg: 141.32, max: 291)

Run: 301, exploration: 0.01, score: 265
Scores: (min: 20, avg: 143.56, max: 291)

Run: 302, exploration: 0.01, score: 105
Scores: (min: 20, avg: 144.32, max: 291)

Run: 303, exploratio

Run: 392, exploration: 0.01, score: 104
Scores: (min: 11, avg: 188.4, max: 500)

Run: 393, exploration: 0.01, score: 124
Scores: (min: 11, avg: 187.97, max: 500)

Run: 394, exploration: 0.01, score: 148
Scores: (min: 11, avg: 188.88, max: 500)

Run: 395, exploration: 0.01, score: 73
Scores: (min: 11, avg: 188.79, max: 500)

Run: 396, exploration: 0.01, score: 102
Scores: (min: 11, avg: 188.47, max: 500)

Run: 397, exploration: 0.01, score: 165
Scores: (min: 11, avg: 188.9, max: 500)

Run: 398, exploration: 0.01, score: 94
Scores: (min: 11, avg: 188.31, max: 500)

Run: 399, exploration: 0.01, score: 141
Scores: (min: 11, avg: 189.52, max: 500)

Run: 400, exploration: 0.01, score: 348
Scores: (min: 11, avg: 190.59, max: 500)

Run: 401, exploration: 0.01, score: 95
Scores: (min: 11, avg: 188.89, max: 500)

Run: 402, exploration: 0.01, score: 100
Scores: (min: 11, avg: 188.84, max: 500)

Run: 403, exploration: 0.01, score: 16
Scores: (min: 11, avg: 187.95, max: 500)

Run: 404, exploration:

Run: 493, exploration: 0.01, score: 159
Scores: (min: 10, avg: 163.14, max: 500)

Run: 494, exploration: 0.01, score: 120
Scores: (min: 10, avg: 162.86, max: 500)

Run: 495, exploration: 0.01, score: 116
Scores: (min: 10, avg: 163.29, max: 500)

Run: 496, exploration: 0.01, score: 122
Scores: (min: 10, avg: 163.49, max: 500)

Run: 497, exploration: 0.01, score: 151
Scores: (min: 10, avg: 163.35, max: 500)

Run: 498, exploration: 0.01, score: 103
Scores: (min: 10, avg: 163.44, max: 500)

Run: 499, exploration: 0.01, score: 133
Scores: (min: 10, avg: 163.36, max: 500)

Run: 500, exploration: 0.01, score: 180
Scores: (min: 10, avg: 161.68, max: 500)

Run: 501, exploration: 0.01, score: 143
Scores: (min: 10, avg: 162.16, max: 500)

Run: 502, exploration: 0.01, score: 200
Scores: (min: 10, avg: 163.16, max: 500)

Run: 503, exploration: 0.01, score: 129
Scores: (min: 10, avg: 164.29, max: 500)

Run: 504, exploration: 0.01, score: 59
Scores: (min: 10, avg: 162.09, max: 500)

Run: 505, explora

Run: 594, exploration: 0.01, score: 97
Scores: (min: 10, avg: 120.7, max: 500)

Run: 595, exploration: 0.01, score: 236
Scores: (min: 10, avg: 121.9, max: 500)

Run: 596, exploration: 0.01, score: 132
Scores: (min: 10, avg: 122, max: 500)

Run: 597, exploration: 0.01, score: 102
Scores: (min: 10, avg: 121.51, max: 500)

Run: 598, exploration: 0.01, score: 107
Scores: (min: 10, avg: 121.55, max: 500)

Run: 599, exploration: 0.01, score: 97
Scores: (min: 10, avg: 121.19, max: 500)

Run: 600, exploration: 0.01, score: 152
Scores: (min: 10, avg: 120.91, max: 500)

Run: 601, exploration: 0.01, score: 269
Scores: (min: 10, avg: 122.17, max: 500)

Run: 602, exploration: 0.01, score: 11
Scores: (min: 10, avg: 120.28, max: 500)

Run: 603, exploration: 0.01, score: 14
Scores: (min: 10, avg: 119.13, max: 500)

Run: 604, exploration: 0.01, score: 13
Scores: (min: 10, avg: 118.67, max: 500)

Run: 605, exploration: 0.01, score: 162
Scores: (min: 10, avg: 117.49, max: 500)

Run: 606, exploration: 0.0
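
One way to read the discount factor is through the weight it gives a reward k steps ahead, `GAMMA ** k`, and the sum of those weights, `1 / (1 - GAMMA)`, often called the effective horizon. The following is a brief sketch comparing the two values used in this notebook. The helper names `effective_horizon` and `discounted_return` are illustrative.

```python
def effective_horizon(gamma):
    """Sum of the geometric series 1 + gamma + gamma**2 + ..."""
    return 1.0 / (1.0 - gamma)


def discounted_return(rewards, gamma):
    """Discounted sum of a reward sequence, starting at the current step."""
    return sum(r * gamma ** k for k, r in enumerate(rewards))


for gamma in (0.95, 0.9):
    # 50 steps of +1 reward, as CartPole gives while the pole stays up.
    print(gamma, effective_horizon(gamma), discounted_return([1.0] * 50, gamma))
```

A lower `GAMMA` shortens the horizon (roughly 10 steps at 0.9 versus 20 at 0.95), so near-term rewards dominate the Q-value targets.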

In [None]:
GAMMA = 0.95  # Reset Param
# Experiment 3: Modify Learning Rate
LEARNING_RATE = 0.01  # Increase learning rate for faster updates

# Re-run the cartpole function with updated parameters
cartpole()


Run: 1, exploration: 1.0, score: 11
Scores: (min: 11, avg: 11, max: 11)

Run: 2, exploration: 0.946354579813443, score: 20
Scores: (min: 11, avg: 15.5, max: 20)

Run: 3, exploration: 0.7705488893118823, score: 42
Scores: (min: 11, avg: 24.333333333333332, max: 42)

Run: 4, exploration: 0.7292124703704616, score: 12
Scores: (min: 11, avg: 21.25, max: 42)

Run: 5, exploration: 0.6935613678313175, score: 11
Scores: (min: 11, avg: 19.2, max: 42)

Run: 6, exploration: 0.653073201944699, score: 13
Scores: (min: 11, avg: 18.166666666666668, max: 42)

Run: 7, exploration: 0.6118738784280476, score: 14
Scores: (min: 11, avg: 17.571428571428573, max: 42)

Run: 8, exploration: 0.567555222460375, score: 16
Scores: (min: 11, avg: 17.375, max: 42)

Run: 9, exploration: 0.5344229416520513, score: 13
Scores: (min: 11, avg: 16.88888888888889, max: 42)

Run: 10, exploration: 0.4982051627146237, score: 15
Scores: (min: 11, avg: 16.7, max: 42)

Run: 11, exploration: 0.47622912292284103, score: 10
Scores: 

Run: 89, exploration: 0.01, score: 71
Scores: (min: 8, avg: 29.50561797752809, max: 111)

Run: 90, exploration: 0.01, score: 78
Scores: (min: 8, avg: 30.044444444444444, max: 111)

Run: 91, exploration: 0.01, score: 17
Scores: (min: 8, avg: 29.9010989010989, max: 111)

Run: 92, exploration: 0.01, score: 17
Scores: (min: 8, avg: 29.76086956521739, max: 111)

Run: 93, exploration: 0.01, score: 8
Scores: (min: 8, avg: 29.526881720430108, max: 111)

Run: 94, exploration: 0.01, score: 10
Scores: (min: 8, avg: 29.319148936170212, max: 111)

Run: 95, exploration: 0.01, score: 10
Scores: (min: 8, avg: 29.11578947368421, max: 111)

Run: 96, exploration: 0.01, score: 9
Scores: (min: 8, avg: 28.90625, max: 111)

Run: 97, exploration: 0.01, score: 8
Scores: (min: 8, avg: 28.690721649484537, max: 111)

Run: 98, exploration: 0.01, score: 9
Scores: (min: 8, avg: 28.489795918367346, max: 111)

Run: 99, exploration: 0.01, score: 10
Scores: (min: 8, avg: 28.303030303030305, max: 111)

Run: 100, explorat

Run: 192, exploration: 0.01, score: 39
Scores: (min: 8, avg: 21.81, max: 93)

Run: 193, exploration: 0.01, score: 85
Scores: (min: 8, avg: 22.58, max: 93)

Run: 194, exploration: 0.01, score: 21
Scores: (min: 8, avg: 22.69, max: 93)

Run: 195, exploration: 0.01, score: 114
Scores: (min: 8, avg: 23.73, max: 114)

Run: 196, exploration: 0.01, score: 111
Scores: (min: 8, avg: 24.75, max: 114)

Run: 197, exploration: 0.01, score: 39
Scores: (min: 8, avg: 25.06, max: 114)

Run: 198, exploration: 0.01, score: 33
Scores: (min: 8, avg: 25.3, max: 114)

Run: 199, exploration: 0.01, score: 14
Scores: (min: 8, avg: 25.34, max: 114)

Run: 200, exploration: 0.01, score: 61
Scores: (min: 8, avg: 25.85, max: 114)

Run: 201, exploration: 0.01, score: 120
Scores: (min: 8, avg: 26.96, max: 120)

Run: 202, exploration: 0.01, score: 19
Scores: (min: 8, avg: 27.05, max: 120)

Run: 203, exploration: 0.01, score: 60
Scores: (min: 8, avg: 27.54, max: 120)

Run: 204, exploration: 0.01, score: 132
Scores: (min:

Run: 296, exploration: 0.01, score: 37
Scores: (min: 8, avg: 39.76, max: 223)

Run: 297, exploration: 0.01, score: 13
Scores: (min: 8, avg: 39.5, max: 223)

Run: 298, exploration: 0.01, score: 21
Scores: (min: 8, avg: 39.38, max: 223)

Run: 299, exploration: 0.01, score: 19
Scores: (min: 8, avg: 39.43, max: 223)

Run: 300, exploration: 0.01, score: 145
Scores: (min: 8, avg: 40.27, max: 223)

Run: 301, exploration: 0.01, score: 18
Scores: (min: 8, avg: 39.25, max: 223)

Run: 302, exploration: 0.01, score: 10
Scores: (min: 8, avg: 39.16, max: 223)

Run: 303, exploration: 0.01, score: 12
Scores: (min: 8, avg: 38.68, max: 223)

Run: 304, exploration: 0.01, score: 24
Scores: (min: 8, avg: 37.6, max: 223)

Run: 305, exploration: 0.01, score: 91
Scores: (min: 8, avg: 37.68, max: 223)

Run: 306, exploration: 0.01, score: 94
Scores: (min: 8, avg: 37.7, max: 223)

Run: 307, exploration: 0.01, score: 37
Scores: (min: 8, avg: 37.6, max: 223)

Run: 308, exploration: 0.01, score: 113
Scores: (min: 8

Run: 401, exploration: 0.01, score: 9
Scores: (min: 8, avg: 29.39, max: 264)

Run: 402, exploration: 0.01, score: 10
Scores: (min: 8, avg: 29.39, max: 264)

Run: 403, exploration: 0.01, score: 10
Scores: (min: 8, avg: 29.37, max: 264)

Run: 404, exploration: 0.01, score: 11
Scores: (min: 8, avg: 29.24, max: 264)

Run: 405, exploration: 0.01, score: 8
Scores: (min: 8, avg: 28.41, max: 264)

Run: 406, exploration: 0.01, score: 10
Scores: (min: 8, avg: 27.57, max: 264)

Run: 407, exploration: 0.01, score: 11
Scores: (min: 8, avg: 27.31, max: 264)

Run: 408, exploration: 0.01, score: 9
Scores: (min: 8, avg: 26.27, max: 264)

Run: 409, exploration: 0.01, score: 10
Scores: (min: 8, avg: 25.43, max: 264)

Run: 410, exploration: 0.01, score: 9
Scores: (min: 8, avg: 23.65, max: 264)

Run: 411, exploration: 0.01, score: 9
Scores: (min: 8, avg: 21.96, max: 264)

Run: 412, exploration: 0.01, score: 10
Scores: (min: 8, avg: 21.96, max: 264)

Run: 413, exploration: 0.01, score: 9
Scores: (min: 8, av

Run: 507, exploration: 0.01, score: 8
Scores: (min: 8, avg: 10.33, max: 39)

Run: 508, exploration: 0.01, score: 10
Scores: (min: 8, avg: 10.34, max: 39)

Run: 509, exploration: 0.01, score: 9
Scores: (min: 8, avg: 10.33, max: 39)

Run: 510, exploration: 0.01, score: 11
Scores: (min: 8, avg: 10.35, max: 39)

Run: 511, exploration: 0.01, score: 10
Scores: (min: 8, avg: 10.36, max: 39)

Run: 512, exploration: 0.01, score: 8
Scores: (min: 8, avg: 10.34, max: 39)

Run: 513, exploration: 0.01, score: 8
Scores: (min: 8, avg: 10.33, max: 39)

Run: 514, exploration: 0.01, score: 13
Scores: (min: 8, avg: 10.37, max: 39)

Run: 515, exploration: 0.01, score: 9
Scores: (min: 8, avg: 10.37, max: 39)

Run: 516, exploration: 0.01, score: 9
Scores: (min: 8, avg: 10.37, max: 39)

Run: 517, exploration: 0.01, score: 10
Scores: (min: 8, avg: 10.37, max: 39)

Run: 518, exploration: 0.01, score: 9
Scores: (min: 8, avg: 10.37, max: 39)

Run: 519, exploration: 0.01, score: 9
Scores: (min: 8, avg: 10.36, max:

Run: 613, exploration: 0.01, score: 22
Scores: (min: 8, avg: 15.55, max: 49)

Run: 614, exploration: 0.01, score: 12
Scores: (min: 8, avg: 15.54, max: 49)

Run: 615, exploration: 0.01, score: 9
Scores: (min: 8, avg: 15.54, max: 49)

Run: 616, exploration: 0.01, score: 13
Scores: (min: 8, avg: 15.58, max: 49)

Run: 617, exploration: 0.01, score: 9
Scores: (min: 8, avg: 15.57, max: 49)

Run: 618, exploration: 0.01, score: 12
Scores: (min: 8, avg: 15.6, max: 49)

Run: 619, exploration: 0.01, score: 19
Scores: (min: 8, avg: 15.7, max: 49)

Run: 620, exploration: 0.01, score: 16
Scores: (min: 8, avg: 15.77, max: 49)

Run: 621, exploration: 0.01, score: 12
Scores: (min: 8, avg: 15.8, max: 49)

Run: 622, exploration: 0.01, score: 11
Scores: (min: 8, avg: 15.82, max: 49)

Run: 623, exploration: 0.01, score: 17
Scores: (min: 8, avg: 15.9, max: 49)

Run: 624, exploration: 0.01, score: 12
Scores: (min: 8, avg: 15.93, max: 49)

Run: 625, exploration: 0.01, score: 10
Scores: (min: 8, avg: 15.93, ma

Run: 719, exploration: 0.01, score: 19
Scores: (min: 8, avg: 19.14, max: 93)

Run: 720, exploration: 0.01, score: 15
Scores: (min: 8, avg: 19.13, max: 93)

Run: 721, exploration: 0.01, score: 21
Scores: (min: 8, avg: 19.22, max: 93)

Run: 722, exploration: 0.01, score: 12
Scores: (min: 8, avg: 19.23, max: 93)

Run: 723, exploration: 0.01, score: 14
Scores: (min: 8, avg: 19.2, max: 93)

Run: 724, exploration: 0.01, score: 9
Scores: (min: 8, avg: 19.17, max: 93)

Run: 725, exploration: 0.01, score: 41
Scores: (min: 8, avg: 19.48, max: 93)

Run: 726, exploration: 0.01, score: 17
Scores: (min: 8, avg: 19.48, max: 93)

Run: 727, exploration: 0.01, score: 16
Scores: (min: 8, avg: 19.46, max: 93)

Run: 728, exploration: 0.01, score: 33
Scores: (min: 8, avg: 19.67, max: 93)

Run: 729, exploration: 0.01, score: 9
Scores: (min: 8, avg: 19.31, max: 93)

Run: 730, exploration: 0.01, score: 36
Scores: (min: 8, avg: 19.57, max: 93)

Run: 731, exploration: 0.01, score: 20
Scores: (min: 8, avg: 19.68,

Run: 825, exploration: 0.01, score: 13
Scores: (min: 8, avg: 15.4, max: 76)

Run: 826, exploration: 0.01, score: 12
Scores: (min: 8, avg: 15.35, max: 76)

Run: 827, exploration: 0.01, score: 11
Scores: (min: 8, avg: 15.3, max: 76)

Run: 828, exploration: 0.01, score: 12
Scores: (min: 8, avg: 15.09, max: 76)

Run: 829, exploration: 0.01, score: 11
Scores: (min: 8, avg: 15.11, max: 76)

Run: 830, exploration: 0.01, score: 10
Scores: (min: 8, avg: 14.85, max: 76)

Run: 831, exploration: 0.01, score: 10
Scores: (min: 8, avg: 14.75, max: 76)

Run: 832, exploration: 0.01, score: 11
Scores: (min: 8, avg: 14.65, max: 76)

Run: 833, exploration: 0.01, score: 11
Scores: (min: 8, avg: 14.66, max: 76)

Run: 834, exploration: 0.01, score: 11
Scores: (min: 8, avg: 14.62, max: 76)

Run: 835, exploration: 0.01, score: 10
Scores: (min: 8, avg: 14.64, max: 76)

Run: 836, exploration: 0.01, score: 9
Scores: (min: 8, avg: 14.62, max: 76)

Run: 837, exploration: 0.01, score: 12
Scores: (min: 8, avg: 14.53,

Run: 931, exploration: 0.01, score: 10
Scores: (min: 8, avg: 11.78, max: 69)

Run: 932, exploration: 0.01, score: 10
Scores: (min: 8, avg: 11.77, max: 69)

Run: 933, exploration: 0.01, score: 10
Scores: (min: 8, avg: 11.76, max: 69)

Run: 934, exploration: 0.01, score: 9
Scores: (min: 8, avg: 11.74, max: 69)

Run: 935, exploration: 0.01, score: 10
Scores: (min: 8, avg: 11.74, max: 69)

Run: 936, exploration: 0.01, score: 9
Scores: (min: 8, avg: 11.74, max: 69)

Run: 937, exploration: 0.01, score: 11
Scores: (min: 8, avg: 11.73, max: 69)

Run: 938, exploration: 0.01, score: 9
Scores: (min: 8, avg: 11.73, max: 69)

Run: 939, exploration: 0.01, score: 9
Scores: (min: 8, avg: 11.71, max: 69)

Run: 940, exploration: 0.01, score: 12
Scores: (min: 8, avg: 11.73, max: 69)

Run: 941, exploration: 0.01, score: 11
Scores: (min: 8, avg: 11.66, max: 69)

Run: 942, exploration: 0.01, score: 9
Scores: (min: 8, avg: 11.64, max: 69)

Run: 943, exploration: 0.01, score: 9
Scores: (min: 8, avg: 11.64, ma

Run: 1037, exploration: 0.01, score: 15
Scores: (min: 8, avg: 14.44, max: 90)

Run: 1038, exploration: 0.01, score: 10
Scores: (min: 8, avg: 14.45, max: 90)

Run: 1039, exploration: 0.01, score: 11
Scores: (min: 8, avg: 14.47, max: 90)

Run: 1040, exploration: 0.01, score: 11
Scores: (min: 8, avg: 14.46, max: 90)

Run: 1041, exploration: 0.01, score: 9
Scores: (min: 8, avg: 14.44, max: 90)

Run: 1042, exploration: 0.01, score: 11
Scores: (min: 8, avg: 14.46, max: 90)

Run: 1043, exploration: 0.01, score: 10
Scores: (min: 8, avg: 14.47, max: 90)

Run: 1044, exploration: 0.01, score: 11
Scores: (min: 8, avg: 14.47, max: 90)

Run: 1045, exploration: 0.01, score: 10
Scores: (min: 8, avg: 14.46, max: 90)

Run: 1046, exploration: 0.01, score: 10
Scores: (min: 8, avg: 14.46, max: 90)

Run: 1047, exploration: 0.01, score: 10
Scores: (min: 8, avg: 14.48, max: 90)

Run: 1048, exploration: 0.01, score: 9
Scores: (min: 8, avg: 14.45, max: 90)

Run: 1049, exploration: 0.01, score: 10
Scores: (min: 

Run: 1142, exploration: 0.01, score: 9
Scores: (min: 8, avg: 12.44, max: 47)

Run: 1143, exploration: 0.01, score: 15
Scores: (min: 8, avg: 12.49, max: 47)

Run: 1144, exploration: 0.01, score: 9
Scores: (min: 8, avg: 12.47, max: 47)

Run: 1145, exploration: 0.01, score: 9
Scores: (min: 8, avg: 12.46, max: 47)

Run: 1146, exploration: 0.01, score: 10
Scores: (min: 8, avg: 12.46, max: 47)

Run: 1147, exploration: 0.01, score: 20
Scores: (min: 8, avg: 12.56, max: 47)

Run: 1148, exploration: 0.01, score: 15
Scores: (min: 8, avg: 12.62, max: 47)

Run: 1149, exploration: 0.01, score: 21
Scores: (min: 8, avg: 12.73, max: 47)

Run: 1150, exploration: 0.01, score: 15
Scores: (min: 8, avg: 12.79, max: 47)

Run: 1151, exploration: 0.01, score: 26
Scores: (min: 8, avg: 12.95, max: 47)

Run: 1152, exploration: 0.01, score: 10
Scores: (min: 8, avg: 12.95, max: 47)

Run: 1153, exploration: 0.01, score: 30
Scores: (min: 8, avg: 13.16, max: 47)

Run: 1154, exploration: 0.01, score: 46
Scores: (min: 8

Run: 1247, exploration: 0.01, score: 9
Scores: (min: 8, avg: 15.77, max: 56)

Run: 1248, exploration: 0.01, score: 11
Scores: (min: 8, avg: 15.73, max: 56)

Run: 1249, exploration: 0.01, score: 11
Scores: (min: 8, avg: 15.63, max: 56)

Run: 1250, exploration: 0.01, score: 17
Scores: (min: 8, avg: 15.65, max: 56)

Run: 1251, exploration: 0.01, score: 10
Scores: (min: 8, avg: 15.49, max: 56)

Run: 1252, exploration: 0.01, score: 13
Scores: (min: 8, avg: 15.52, max: 56)

Run: 1253, exploration: 0.01, score: 13
Scores: (min: 8, avg: 15.35, max: 56)

Run: 1254, exploration: 0.01, score: 20
Scores: (min: 8, avg: 15.09, max: 56)

Run: 1255, exploration: 0.01, score: 13
Scores: (min: 8, avg: 14.87, max: 56)

Run: 1256, exploration: 0.01, score: 10
Scores: (min: 8, avg: 14.81, max: 56)

Run: 1257, exploration: 0.01, score: 26
Scores: (min: 8, avg: 14.94, max: 56)

Run: 1258, exploration: 0.01, score: 10
Scores: (min: 8, avg: 14.9, max: 56)

Run: 1259, exploration: 0.01, score: 29
Scores: (min: 

Run: 1351, exploration: 0.01, score: 19
Scores: (min: 9, avg: 16.42, max: 44)

Run: 1352, exploration: 0.01, score: 11
Scores: (min: 9, avg: 16.4, max: 44)

Run: 1353, exploration: 0.01, score: 16
Scores: (min: 9, avg: 16.43, max: 44)

Run: 1354, exploration: 0.01, score: 9
Scores: (min: 9, avg: 16.32, max: 44)

Run: 1355, exploration: 0.01, score: 10
Scores: (min: 9, avg: 16.29, max: 44)

Run: 1356, exploration: 0.01, score: 11
Scores: (min: 9, avg: 16.3, max: 44)

Run: 1357, exploration: 0.01, score: 23
Scores: (min: 9, avg: 16.27, max: 44)

Run: 1358, exploration: 0.01, score: 16
Scores: (min: 9, avg: 16.33, max: 44)

Run: 1359, exploration: 0.01, score: 15
Scores: (min: 9, avg: 16.19, max: 44)

Run: 1360, exploration: 0.01, score: 13
Scores: (min: 9, avg: 16.23, max: 44)

Run: 1361, exploration: 0.01, score: 11
Scores: (min: 9, avg: 16.04, max: 44)

Run: 1362, exploration: 0.01, score: 23
Scores: (min: 9, avg: 16.17, max: 44)

Run: 1363, exploration: 0.01, score: 20
Scores: (min: 9

Run: 1455, exploration: 0.01, score: 9
Scores: (min: 8, avg: 18.18, max: 70)

Run: 1456, exploration: 0.01, score: 9
Scores: (min: 8, avg: 18.16, max: 70)

Run: 1457, exploration: 0.01, score: 9
Scores: (min: 8, avg: 18.02, max: 70)

Run: 1458, exploration: 0.01, score: 15
Scores: (min: 8, avg: 18.01, max: 70)

Run: 1459, exploration: 0.01, score: 18
Scores: (min: 8, avg: 18.04, max: 70)

Run: 1460, exploration: 0.01, score: 23
Scores: (min: 8, avg: 18.14, max: 70)

Run: 1461, exploration: 0.01, score: 34
Scores: (min: 8, avg: 18.37, max: 70)

Run: 1462, exploration: 0.01, score: 20
Scores: (min: 8, avg: 18.34, max: 70)

Run: 1463, exploration: 0.01, score: 15
Scores: (min: 8, avg: 18.29, max: 70)

Run: 1464, exploration: 0.01, score: 12
Scores: (min: 8, avg: 18.27, max: 70)

Run: 1465, exploration: 0.01, score: 14
Scores: (min: 8, avg: 18.32, max: 70)

Run: 1466, exploration: 0.01, score: 10
Scores: (min: 8, avg: 18.33, max: 70)

Run: 1467, exploration: 0.01, score: 39
Scores: (min: 8

Run: 1559, exploration: 0.01, score: 13
Scores: (min: 8, avg: 17.49, max: 64)

Run: 1560, exploration: 0.01, score: 17
Scores: (min: 8, avg: 17.43, max: 64)

Run: 1561, exploration: 0.01, score: 12
Scores: (min: 8, avg: 17.21, max: 64)

Run: 1562, exploration: 0.01, score: 9
Scores: (min: 8, avg: 17.1, max: 64)

Run: 1563, exploration: 0.01, score: 11
Scores: (min: 8, avg: 17.06, max: 64)

Run: 1564, exploration: 0.01, score: 10
Scores: (min: 8, avg: 17.04, max: 64)

Run: 1565, exploration: 0.01, score: 13
Scores: (min: 8, avg: 17.03, max: 64)

Run: 1566, exploration: 0.01, score: 20
Scores: (min: 8, avg: 17.13, max: 64)

Run: 1567, exploration: 0.01, score: 28
Scores: (min: 8, avg: 17.02, max: 64)

Run: 1568, exploration: 0.01, score: 30
Scores: (min: 8, avg: 17.05, max: 64)

Run: 1569, exploration: 0.01, score: 15
Scores: (min: 8, avg: 17.11, max: 64)

Run: 1570, exploration: 0.01, score: 12
Scores: (min: 8, avg: 17.01, max: 64)

Run: 1571, exploration: 0.01, score: 10
Scores: (min: 

Run: 1664, exploration: 0.01, score: 10
Scores: (min: 8, avg: 15.81, max: 58)

Run: 1665, exploration: 0.01, score: 15
Scores: (min: 8, avg: 15.83, max: 58)

Run: 1666, exploration: 0.01, score: 26
Scores: (min: 8, avg: 15.89, max: 58)

Run: 1667, exploration: 0.01, score: 26
Scores: (min: 8, avg: 15.87, max: 58)

Run: 1668, exploration: 0.01, score: 25
Scores: (min: 8, avg: 15.82, max: 58)

Run: 1669, exploration: 0.01, score: 32
Scores: (min: 8, avg: 15.99, max: 58)

Run: 1670, exploration: 0.01, score: 53
Scores: (min: 8, avg: 16.4, max: 58)

Run: 1671, exploration: 0.01, score: 9
Scores: (min: 8, avg: 16.39, max: 58)

Run: 1672, exploration: 0.01, score: 10
Scores: (min: 8, avg: 16.37, max: 58)

Run: 1673, exploration: 0.01, score: 9
Scores: (min: 8, avg: 16.34, max: 58)

Run: 1674, exploration: 0.01, score: 22
Scores: (min: 8, avg: 16.32, max: 58)

Run: 1675, exploration: 0.01, score: 16
Scores: (min: 8, avg: 16.36, max: 58)

Run: 1676, exploration: 0.01, score: 18
Scores: (min: 8

Run: 1769, exploration: 0.01, score: 14
Scores: (min: 8, avg: 16.54, max: 54)

Run: 1770, exploration: 0.01, score: 10
Scores: (min: 8, avg: 16.11, max: 54)

Run: 1771, exploration: 0.01, score: 11
Scores: (min: 8, avg: 16.13, max: 54)

Run: 1772, exploration: 0.01, score: 25
Scores: (min: 8, avg: 16.28, max: 54)

Run: 1773, exploration: 0.01, score: 27
Scores: (min: 8, avg: 16.46, max: 54)

Run: 1774, exploration: 0.01, score: 11
Scores: (min: 8, avg: 16.35, max: 54)

Run: 1775, exploration: 0.01, score: 39
Scores: (min: 8, avg: 16.58, max: 54)

Run: 1776, exploration: 0.01, score: 10
Scores: (min: 8, avg: 16.5, max: 54)

Run: 1777, exploration: 0.01, score: 13
Scores: (min: 8, avg: 16.54, max: 54)

Run: 1778, exploration: 0.01, score: 15
Scores: (min: 8, avg: 16.59, max: 54)

Run: 1779, exploration: 0.01, score: 13
Scores: (min: 8, avg: 16.53, max: 54)

Run: 1780, exploration: 0.01, score: 15
Scores: (min: 8, avg: 16.38, max: 54)

Run: 1781, exploration: 0.01, score: 13
Scores: (min:

Run: 1874, exploration: 0.01, score: 9
Scores: (min: 8, avg: 14.99, max: 43)

Run: 1875, exploration: 0.01, score: 11
Scores: (min: 8, avg: 14.71, max: 43)

Run: 1876, exploration: 0.01, score: 29
Scores: (min: 8, avg: 14.9, max: 43)

Run: 1877, exploration: 0.01, score: 10
Scores: (min: 8, avg: 14.87, max: 43)

Run: 1878, exploration: 0.01, score: 10
Scores: (min: 8, avg: 14.82, max: 43)

Run: 1879, exploration: 0.01, score: 9
Scores: (min: 8, avg: 14.78, max: 43)

Run: 1880, exploration: 0.01, score: 25
Scores: (min: 8, avg: 14.88, max: 43)

Run: 1881, exploration: 0.01, score: 10
Scores: (min: 8, avg: 14.85, max: 43)

Run: 1882, exploration: 0.01, score: 10
Scores: (min: 8, avg: 14.77, max: 43)

Run: 1883, exploration: 0.01, score: 11
Scores: (min: 8, avg: 14.77, max: 43)

Run: 1884, exploration: 0.01, score: 13
Scores: (min: 8, avg: 14.62, max: 43)

Run: 1885, exploration: 0.01, score: 17
Scores: (min: 8, avg: 14.6, max: 43)

Run: 1886, exploration: 0.01, score: 28
Scores: (min: 8,

Run: 1979, exploration: 0.01, score: 32
Scores: (min: 9, avg: 20.27, max: 55)

Run: 1980, exploration: 0.01, score: 11
Scores: (min: 9, avg: 20.13, max: 55)

Run: 1981, exploration: 0.01, score: 9
Scores: (min: 9, avg: 20.12, max: 55)

Run: 1982, exploration: 0.01, score: 9
Scores: (min: 9, avg: 20.11, max: 55)

Run: 1983, exploration: 0.01, score: 32
Scores: (min: 9, avg: 20.32, max: 55)

Run: 1984, exploration: 0.01, score: 40
Scores: (min: 9, avg: 20.59, max: 55)

Run: 1985, exploration: 0.01, score: 20
Scores: (min: 9, avg: 20.62, max: 55)

Run: 1986, exploration: 0.01, score: 9
Scores: (min: 9, avg: 20.43, max: 55)

Run: 1987, exploration: 0.01, score: 26
Scores: (min: 9, avg: 20.42, max: 55)

Run: 1988, exploration: 0.01, score: 13
Scores: (min: 9, avg: 20.28, max: 55)

Run: 1989, exploration: 0.01, score: 37
Scores: (min: 9, avg: 20.54, max: 55)

Run: 1990, exploration: 0.01, score: 10
Scores: (min: 9, avg: 20.55, max: 55)

Run: 1991, exploration: 0.01, score: 39
Scores: (min: 9

Run: 2084, exploration: 0.01, score: 54
Scores: (min: 8, avg: 19.83, max: 83)

Run: 2085, exploration: 0.01, score: 40
Scores: (min: 8, avg: 20.03, max: 83)

Run: 2086, exploration: 0.01, score: 38
Scores: (min: 8, avg: 20.32, max: 83)

Run: 2087, exploration: 0.01, score: 162
Scores: (min: 8, avg: 21.68, max: 162)

Run: 2088, exploration: 0.01, score: 121
Scores: (min: 8, avg: 22.76, max: 162)

Run: 2089, exploration: 0.01, score: 145
Scores: (min: 8, avg: 23.84, max: 162)

Run: 2090, exploration: 0.01, score: 182
Scores: (min: 8, avg: 25.56, max: 182)

Run: 2091, exploration: 0.01, score: 184
Scores: (min: 8, avg: 27.01, max: 184)

Run: 2092, exploration: 0.01, score: 167
Scores: (min: 8, avg: 28.39, max: 184)

Run: 2093, exploration: 0.01, score: 148
Scores: (min: 8, avg: 29.5, max: 184)

Run: 2094, exploration: 0.01, score: 245
Scores: (min: 8, avg: 31.77, max: 245)

Run: 2095, exploration: 0.01, score: 139
Scores: (min: 8, avg: 33.07, max: 245)

Run: 2096, exploration: 0.01, score

Run: 2185, exploration: 0.01, score: 243
Scores: (min: 10, avg: 159.07, max: 427)

Run: 2186, exploration: 0.01, score: 256
Scores: (min: 10, avg: 161.25, max: 427)

Run: 2187, exploration: 0.01, score: 199
Scores: (min: 10, avg: 161.62, max: 427)

Run: 2188, exploration: 0.01, score: 170
Scores: (min: 10, avg: 162.11, max: 427)

Run: 2189, exploration: 0.01, score: 151
Scores: (min: 10, avg: 162.17, max: 427)

Run: 2190, exploration: 0.01, score: 287
Scores: (min: 10, avg: 163.22, max: 427)

Run: 2191, exploration: 0.01, score: 173
Scores: (min: 10, avg: 163.11, max: 427)

Run: 2192, exploration: 0.01, score: 277
Scores: (min: 10, avg: 164.21, max: 427)

Run: 2193, exploration: 0.01, score: 147
Scores: (min: 10, avg: 164.2, max: 427)

Run: 2194, exploration: 0.01, score: 207
Scores: (min: 10, avg: 163.82, max: 427)

Run: 2195, exploration: 0.01, score: 211
Scores: (min: 10, avg: 164.54, max: 427)

Run: 2196, exploration: 0.01, score: 216
Scores: (min: 10, avg: 164.95, max: 427)

Run: 

In [None]:
LEARNING_RATE = 0.001  # Reset Param

# Modify multiple parameters
GAMMA = 0.98  # Slightly higher discount factor
LEARNING_RATE = 0.005  # Moderately increased learning rate
EXPLORATION_DECAY = 0.9  # Faster decay than the 0.995 baseline, so a shorter exploration period

# Run the experiment with the combined modifications
cartpole()


Run: 1, exploration: 0.81, score: 22
Scores: (min: 22, avg: 22, max: 22)

Run: 2, exploration: 0.13508517176729928, score: 18
Scores: (min: 18, avg: 20, max: 22)

Run: 3, exploration: 0.0523347633027361, score: 10
Scores: (min: 10, avg: 16.666666666666668, max: 22)

Run: 4, exploration: 0.020275559590445278, score: 10
Scores: (min: 10, avg: 15, max: 22)

Run: 5, exploration: 0.01, score: 9
Scores: (min: 9, avg: 13.8, max: 22)

Run: 6, exploration: 0.01, score: 10
Scores: (min: 9, avg: 13.166666666666666, max: 22)

Run: 7, exploration: 0.01, score: 9
Scores: (min: 9, avg: 12.571428571428571, max: 22)

Run: 8, exploration: 0.01, score: 10
Scores: (min: 9, avg: 12.25, max: 22)

Run: 9, exploration: 0.01, score: 10
Scores: (min: 9, avg: 12, max: 22)

Run: 10, exploration: 0.01, score: 10
Scores: (min: 9, avg: 11.8, max: 22)

Run: 11, exploration: 0.01, score: 10
Scores: (min: 9, avg: 11.636363636363637, max: 22)

Run: 12, exploration: 0.01, score: 9
Scores: (min: 9, avg: 11.416666666666666

Run: 96, exploration: 0.01, score: 16
Scores: (min: 8, avg: 15.71875, max: 53)

Run: 97, exploration: 0.01, score: 110
Scores: (min: 8, avg: 16.690721649484537, max: 110)

Run: 98, exploration: 0.01, score: 46
Scores: (min: 8, avg: 16.989795918367346, max: 110)

Run: 99, exploration: 0.01, score: 30
Scores: (min: 8, avg: 17.12121212121212, max: 110)

Run: 100, exploration: 0.01, score: 22
Scores: (min: 8, avg: 17.17, max: 110)

Run: 101, exploration: 0.01, score: 20
Scores: (min: 8, avg: 17.15, max: 110)

Run: 102, exploration: 0.01, score: 10
Scores: (min: 8, avg: 17.07, max: 110)

Run: 103, exploration: 0.01, score: 25
Scores: (min: 8, avg: 17.22, max: 110)

Run: 104, exploration: 0.01, score: 8
Scores: (min: 8, avg: 17.2, max: 110)

Run: 105, exploration: 0.01, score: 10
Scores: (min: 8, avg: 17.21, max: 110)

Run: 106, exploration: 0.01, score: 19
Scores: (min: 8, avg: 17.3, max: 110)

Run: 107, exploration: 0.01, score: 23
Scores: (min: 8, avg: 17.44, max: 110)

Run: 108, explorat

Run: 199, exploration: 0.01, score: 10
Scores: (min: 8, avg: 115.03, max: 500)

Run: 200, exploration: 0.01, score: 10
Scores: (min: 8, avg: 114.91, max: 500)

Run: 201, exploration: 0.01, score: 9
Scores: (min: 8, avg: 114.8, max: 500)

Run: 202, exploration: 0.01, score: 9
Scores: (min: 8, avg: 114.79, max: 500)

Run: 203, exploration: 0.01, score: 10
Scores: (min: 8, avg: 114.64, max: 500)

Run: 204, exploration: 0.01, score: 8
Scores: (min: 8, avg: 114.64, max: 500)

Run: 205, exploration: 0.01, score: 9
Scores: (min: 8, avg: 114.63, max: 500)

Run: 206, exploration: 0.01, score: 9
Scores: (min: 8, avg: 114.53, max: 500)

Run: 207, exploration: 0.01, score: 10
Scores: (min: 8, avg: 114.4, max: 500)

Run: 208, exploration: 0.01, score: 11
Scores: (min: 8, avg: 114.36, max: 500)

Run: 209, exploration: 0.01, score: 10
Scores: (min: 8, avg: 114.21, max: 500)

Run: 210, exploration: 0.01, score: 10
Scores: (min: 8, avg: 113.99, max: 500)

Run: 211, exploration: 0.01, score: 9
Scores: (

Run: 303, exploration: 0.01, score: 99
Scores: (min: 8, avg: 18.54, max: 133)

Run: 304, exploration: 0.01, score: 50
Scores: (min: 8, avg: 18.96, max: 133)

Run: 305, exploration: 0.01, score: 41
Scores: (min: 8, avg: 19.28, max: 133)

Run: 306, exploration: 0.01, score: 58
Scores: (min: 8, avg: 19.77, max: 133)

Run: 307, exploration: 0.01, score: 25
Scores: (min: 8, avg: 19.92, max: 133)

Run: 308, exploration: 0.01, score: 22
Scores: (min: 8, avg: 20.03, max: 133)

Run: 309, exploration: 0.01, score: 18
Scores: (min: 8, avg: 20.11, max: 133)

Run: 310, exploration: 0.01, score: 18
Scores: (min: 8, avg: 20.19, max: 133)

Run: 311, exploration: 0.01, score: 114
Scores: (min: 8, avg: 21.24, max: 133)

Run: 312, exploration: 0.01, score: 64
Scores: (min: 8, avg: 21.8, max: 133)

Run: 313, exploration: 0.01, score: 106
Scores: (min: 8, avg: 22.76, max: 133)

Run: 314, exploration: 0.01, score: 18
Scores: (min: 8, avg: 22.85, max: 133)

Run: 315, exploration: 0.01, score: 21
Scores: (min

Run: 407, exploration: 0.01, score: 137
Scores: (min: 8, avg: 62, max: 500)

Run: 408, exploration: 0.01, score: 174
Scores: (min: 8, avg: 63.52, max: 500)

Run: 409, exploration: 0.01, score: 77
Scores: (min: 8, avg: 64.11, max: 500)

Run: 410, exploration: 0.01, score: 19
Scores: (min: 8, avg: 64.12, max: 500)

Run: 411, exploration: 0.01, score: 97
Scores: (min: 8, avg: 63.95, max: 500)

Run: 412, exploration: 0.01, score: 9
Scores: (min: 8, avg: 63.4, max: 500)

Run: 413, exploration: 0.01, score: 23
Scores: (min: 8, avg: 62.57, max: 500)

Run: 414, exploration: 0.01, score: 13
Scores: (min: 8, avg: 62.52, max: 500)

Run: 415, exploration: 0.01, score: 12
Scores: (min: 8, avg: 62.43, max: 500)

Run: 416, exploration: 0.01, score: 19
Scores: (min: 8, avg: 62.44, max: 500)

Run: 417, exploration: 0.01, score: 11
Scores: (min: 8, avg: 62.35, max: 500)

Run: 418, exploration: 0.01, score: 11
Scores: (min: 8, avg: 62.31, max: 500)

Run: 419, exploration: 0.01, score: 13
Scores: (min: 8,

Run: 510, exploration: 0.01, score: 12
Scores: (min: 8, avg: 121.17, max: 291)

Run: 511, exploration: 0.01, score: 18
Scores: (min: 8, avg: 120.38, max: 291)

Run: 512, exploration: 0.01, score: 26
Scores: (min: 8, avg: 120.55, max: 291)

Run: 513, exploration: 0.01, score: 9
Scores: (min: 8, avg: 120.41, max: 291)

Run: 514, exploration: 0.01, score: 9
Scores: (min: 8, avg: 120.37, max: 291)

Run: 515, exploration: 0.01, score: 10
Scores: (min: 8, avg: 120.35, max: 291)

Run: 516, exploration: 0.01, score: 18
Scores: (min: 8, avg: 120.34, max: 291)

Run: 517, exploration: 0.01, score: 11
Scores: (min: 8, avg: 120.34, max: 291)

Run: 518, exploration: 0.01, score: 11
Scores: (min: 8, avg: 120.34, max: 291)

Run: 519, exploration: 0.01, score: 13
Scores: (min: 8, avg: 120.34, max: 291)

Run: 520, exploration: 0.01, score: 19
Scores: (min: 8, avg: 120.44, max: 291)

Run: 521, exploration: 0.01, score: 14
Scores: (min: 8, avg: 120.48, max: 291)

Run: 522, exploration: 0.01, score: 15
Sco

Run: 612, exploration: 0.01, score: 46
Scores: (min: 8, avg: 167.16, max: 500)

Run: 613, exploration: 0.01, score: 16
Scores: (min: 8, avg: 167.23, max: 500)

Run: 614, exploration: 0.01, score: 12
Scores: (min: 8, avg: 167.26, max: 500)

Run: 615, exploration: 0.01, score: 15
Scores: (min: 8, avg: 167.31, max: 500)

Run: 616, exploration: 0.01, score: 16
Scores: (min: 8, avg: 167.29, max: 500)

Run: 617, exploration: 0.01, score: 20
Scores: (min: 8, avg: 167.38, max: 500)

Run: 618, exploration: 0.01, score: 38
Scores: (min: 8, avg: 167.65, max: 500)

Run: 619, exploration: 0.01, score: 20
Scores: (min: 8, avg: 167.72, max: 500)

Run: 620, exploration: 0.01, score: 40
Scores: (min: 8, avg: 167.93, max: 500)

Run: 621, exploration: 0.01, score: 32
Scores: (min: 8, avg: 168.11, max: 500)

Run: 622, exploration: 0.01, score: 18
Scores: (min: 8, avg: 168.14, max: 500)

Run: 623, exploration: 0.01, score: 26
Scores: (min: 8, avg: 168.23, max: 500)

Run: 624, exploration: 0.01, score: 67
S

Run: 715, exploration: 0.01, score: 179
Scores: (min: 8, avg: 66.36, max: 459)

Run: 716, exploration: 0.01, score: 231
Scores: (min: 8, avg: 68.51, max: 459)

Run: 717, exploration: 0.01, score: 45
Scores: (min: 8, avg: 68.76, max: 459)

Run: 718, exploration: 0.01, score: 180
Scores: (min: 8, avg: 70.18, max: 459)

Run: 719, exploration: 0.01, score: 92
Scores: (min: 8, avg: 70.9, max: 459)

Run: 720, exploration: 0.01, score: 108
Scores: (min: 8, avg: 71.58, max: 459)

Run: 721, exploration: 0.01, score: 248
Scores: (min: 8, avg: 73.74, max: 459)

Run: 722, exploration: 0.01, score: 221
Scores: (min: 8, avg: 75.77, max: 459)

Run: 723, exploration: 0.01, score: 10
Scores: (min: 8, avg: 75.61, max: 459)

Run: 724, exploration: 0.01, score: 10
Scores: (min: 8, avg: 75.04, max: 459)

Run: 725, exploration: 0.01, score: 500
Scores: (min: 8, avg: 78.83, max: 500)

Run: 726, exploration: 0.01, score: 476
Scores: (min: 8, avg: 82.12, max: 500)

Run: 727, exploration: 0.01, score: 8
Scores:

Run: 820, exploration: 0.01, score: 10
Scores: (min: 8, avg: 49.86, max: 500)

Run: 821, exploration: 0.01, score: 9
Scores: (min: 8, avg: 47.47, max: 500)

Run: 822, exploration: 0.01, score: 9
Scores: (min: 8, avg: 45.35, max: 500)

Run: 823, exploration: 0.01, score: 10
Scores: (min: 8, avg: 45.35, max: 500)

Run: 824, exploration: 0.01, score: 10
Scores: (min: 8, avg: 45.35, max: 500)

Run: 825, exploration: 0.01, score: 41
Scores: (min: 8, avg: 40.76, max: 476)

Run: 826, exploration: 0.01, score: 13
Scores: (min: 8, avg: 36.13, max: 316)

Run: 827, exploration: 0.01, score: 46
Scores: (min: 8, avg: 36.51, max: 316)

Run: 828, exploration: 0.01, score: 32
Scores: (min: 8, avg: 36.74, max: 316)

Run: 829, exploration: 0.01, score: 68
Scores: (min: 8, avg: 36.14, max: 316)

Run: 830, exploration: 0.01, score: 21
Scores: (min: 8, avg: 35.5, max: 316)

Run: 831, exploration: 0.01, score: 22
Scores: (min: 8, avg: 33.31, max: 316)

Run: 832, exploration: 0.01, score: 30
Scores: (min: 8,

Run: 924, exploration: 0.01, score: 103
Scores: (min: 8, avg: 52.24, max: 198)

Run: 925, exploration: 0.01, score: 45
Scores: (min: 8, avg: 52.28, max: 198)

Run: 926, exploration: 0.01, score: 63
Scores: (min: 8, avg: 52.78, max: 198)

Run: 927, exploration: 0.01, score: 45
Scores: (min: 8, avg: 52.77, max: 198)

Run: 928, exploration: 0.01, score: 31
Scores: (min: 8, avg: 52.76, max: 198)

Run: 929, exploration: 0.01, score: 38
Scores: (min: 8, avg: 52.46, max: 198)

Run: 930, exploration: 0.01, score: 35
Scores: (min: 8, avg: 52.6, max: 198)

Run: 931, exploration: 0.01, score: 24
Scores: (min: 8, avg: 52.62, max: 198)

Run: 932, exploration: 0.01, score: 23
Scores: (min: 8, avg: 52.55, max: 198)

Run: 933, exploration: 0.01, score: 24
Scores: (min: 8, avg: 52.55, max: 198)

Run: 934, exploration: 0.01, score: 28
Scores: (min: 8, avg: 52.67, max: 198)

Run: 935, exploration: 0.01, score: 24
Scores: (min: 8, avg: 52.81, max: 198)

Run: 936, exploration: 0.01, score: 29
Scores: (min:

Run: 1028, exploration: 0.01, score: 29
Scores: (min: 8, avg: 69.66, max: 256)

Run: 1029, exploration: 0.01, score: 34
Scores: (min: 8, avg: 69.62, max: 256)

Run: 1030, exploration: 0.01, score: 10
Scores: (min: 8, avg: 69.37, max: 256)

Run: 1031, exploration: 0.01, score: 18
Scores: (min: 8, avg: 69.31, max: 256)

Run: 1032, exploration: 0.01, score: 9
Scores: (min: 8, avg: 69.17, max: 256)

Run: 1033, exploration: 0.01, score: 11
Scores: (min: 8, avg: 69.04, max: 256)

Run: 1034, exploration: 0.01, score: 9
Scores: (min: 8, avg: 68.85, max: 256)

Run: 1035, exploration: 0.01, score: 28
Scores: (min: 8, avg: 68.89, max: 256)

Run: 1036, exploration: 0.01, score: 19
Scores: (min: 8, avg: 68.79, max: 256)

Run: 1037, exploration: 0.01, score: 10
Scores: (min: 8, avg: 68.41, max: 256)

Run: 1038, exploration: 0.01, score: 12
Scores: (min: 8, avg: 68.03, max: 256)

Run: 1039, exploration: 0.01, score: 9
Scores: (min: 8, avg: 67.65, max: 256)

Run: 1040, exploration: 0.01, score: 10
Sco

Run: 1131, exploration: 0.01, score: 31
Scores: (min: 8, avg: 89.25, max: 485)

Run: 1132, exploration: 0.01, score: 9
Scores: (min: 8, avg: 89.25, max: 485)

Run: 1133, exploration: 0.01, score: 68
Scores: (min: 8, avg: 89.82, max: 485)

Run: 1134, exploration: 0.01, score: 47
Scores: (min: 8, avg: 90.2, max: 485)

Run: 1135, exploration: 0.01, score: 157
Scores: (min: 8, avg: 91.49, max: 485)

Run: 1136, exploration: 0.01, score: 157
Scores: (min: 8, avg: 92.87, max: 485)

Run: 1137, exploration: 0.01, score: 300
Scores: (min: 8, avg: 95.77, max: 485)

Run: 1138, exploration: 0.01, score: 94
Scores: (min: 8, avg: 96.59, max: 485)

Run: 1139, exploration: 0.01, score: 10
Scores: (min: 8, avg: 96.6, max: 485)

Run: 1140, exploration: 0.01, score: 16
Scores: (min: 8, avg: 96.66, max: 485)

Run: 1141, exploration: 0.01, score: 172
Scores: (min: 8, avg: 98.18, max: 485)

Run: 1142, exploration: 0.01, score: 127
Scores: (min: 8, avg: 99.29, max: 485)

Run: 1143, exploration: 0.01, score: 1

Understanding the Cartpole Problem with Reinforcement Learning

How Does Reinforcement Learning Apply to Cartpole?
In the Cartpole problem, we use reinforcement learning to teach an agent how to balance a pole on a moving cart. The agent makes decisions based on the current state of the system, and over time, it learns how to maximize rewards by keeping the pole upright for as long as possible.
Goal of the Agent: The agent's main goal is simple: keep the pole balanced. For every time step the pole stays upright, the agent earns a reward, so the longer it balances the pole, the higher its score for that run (Surma, 2018).
State Values: The agent observes four values that describe the environment: the cart's position, the cart's velocity, the angle of the pole, and the pole's angular velocity, that is, how fast it is tipping (Lamba, 2018).
Actions: The agent has two possible moves: push the cart left or push it right (Düğmeci, 2018).
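Algorithm Used: For this task, we use Deep Q-Learning (DQN). The DQN algorithm uses a neural network to estimate a Q-value for each possible action in the current state, and the agent normally takes the action with the highest estimate. With probability equal to the exploration rate, it takes a random action instead (Surma, 2018).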
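The action choice described above can be sketched as an epsilon-greedy rule. This is a minimal illustration rather than the notebook's exact code: the function name choose_action and the sample Q-values are hypothetical, and the Q-values are supplied directly instead of coming from the trained network.

```python
import random

import numpy as np


def choose_action(q_values, exploration_rate, rng=random):
    """Explore with probability exploration_rate, otherwise pick the best-rated action."""
    if rng.random() < exploration_rate:
        # Exploration: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploitation: pick the action with the highest estimated Q-value.
    return int(np.argmax(q_values))


# Hypothetical Q-values for one state: index 0 = push left, index 1 = push right.
q_values = np.array([0.2, 0.7])
greedy_action = choose_action(q_values, exploration_rate=0.0)
print("Greedy action:", greedy_action)
```

With an exploration rate of 0.0 the rule always exploits, so it returns the index of the larger Q-value. With a rate of 1.0 it always picks at random.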

How Does Experience Replay Work in Cartpole?
Experience Replay: As the agent interacts with the environment, it stores each interaction (state, action, reward, next state, and whether the episode ended) in memory. During training, it samples a random batch of BATCH_SIZE stored interactions to learn from, which keeps the agent from focusing too much on what happened most recently and makes the learning process more stable (Düğmeci, 2018).
The Role of the Discount Factor: The discount factor (GAMMA) helps the agent balance immediate and future rewards. For a non-terminal step, the learning target is the reward plus GAMMA times the highest Q-value predicted for the next state. A higher discount factor encourages the agent to look further into the future when deciding its actions, while a lower value makes it focus more on immediate rewards (Lamba, 2018).
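The two ideas above, sampling from memory and discounting future rewards, can be sketched together as follows. This is a simplified stand-in for the notebook's experience_replay method: the helper q_target, the small BATCH_SIZE of 4, and the dummy transitions are assumptions made for illustration, and no neural network is involved.

```python
import random
from collections import deque

import numpy as np

GAMMA = 0.95    # discount factor, same as the notebook's baseline
BATCH_SIZE = 4  # smaller than the notebook's 20, purely for illustration


def q_target(reward, next_q_values, terminal, gamma=GAMMA):
    """Learning target for one stored transition."""
    if terminal:
        # No future reward is possible once the episode has ended.
        return reward
    # Immediate reward plus the discounted best Q-value of the next state.
    return reward + gamma * float(np.max(next_q_values))


# Fill a small replay memory with dummy (state, action, reward, next_state, done) tuples.
memory = deque(maxlen=1000)
for step in range(10):
    state = np.zeros(4)
    next_state = np.zeros(4)
    memory.append((state, step % 2, 1.0, next_state, step == 9))

# Sample a random batch so that consecutive, correlated steps are mixed up.
batch = random.sample(list(memory), BATCH_SIZE)
print("Batch size:", len(batch))
print("Target for a non-terminal step:", q_target(1.0, np.array([2.0, 3.0]), terminal=False))
```

With GAMMA = 0.95, a reward of 1.0 and a best next-state Q-value of 3.0, the non-terminal target is 1.0 + 0.95 * 3.0. Raising GAMMA increases the weight of the future term.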

How Neural Networks Improve Deep Q-Learning
The Network: In this case, the agent’s brain is a neural network with two hidden layers, each having 24 neurons and a ReLU activation. The input layer takes in the four state values (cart position, cart velocity, pole angle, and pole angular velocity), and the linear output layer provides one Q-value per action, which estimates the expected cumulative discounted reward for taking that action (Surma, 2018).
Why Use a Neural Network?: The Cartpole state space is continuous, so a traditional Q-table with one entry per state-action pair is impractical without first discretizing the state. A neural network instead approximates the Q-function directly, which lets the agent generalize from the states it has seen to similar states it has not visited yet (Lamba, 2018).
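To make the shape of this network concrete, here is a minimal NumPy sketch of a forward pass through a 4-24-24-2 architecture. It is not the Keras model from the notebook: the weights are random and untrained, and the names layer_sizes and forward are hypothetical, so the Q-values it prints carry no meaning beyond their shape.

```python
import numpy as np

rng = np.random.default_rng(0)

# 4 state inputs -> 24 hidden -> 24 hidden -> 2 Q-values (one per action).
layer_sizes = [4, 24, 24, 2]
weights = [
    rng.normal(scale=0.1, size=(n_in, n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]


def forward(state):
    """Map a state vector to one Q-value per action."""
    x = state
    for i, (w, b) in enumerate(zip(weights, biases)):
        x = x @ w + b
        if i < len(weights) - 1:
            # ReLU on the hidden layers; the output layer stays linear.
            x = np.maximum(x, 0.0)
    return x


# Hypothetical state: cart position, cart velocity, pole angle, pole angular velocity.
state = np.array([0.0, 0.1, 0.02, -0.1])
q_values = forward(state)
param_count = sum(w.size + b.size for w, b in zip(weights, biases))
print("Q-values shape:", q_values.shape)
print("Trainable parameters:", param_count)
```

The parameter count works out to (4*24 + 24) + (24*24 + 24) + (24*2 + 2) = 770. That is far smaller than any table fine enough to cover a continuous four-dimensional state space.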

How Does the Learning Rate Affect Performance?
Learning Rate Impact: The learning rate (LEARNING_RATE) controls how large a step the optimizer takes each time it updates the network's weights. If the learning rate is too low, the agent takes longer to learn. A moderately higher learning rate can speed up learning, but if it is too high, the updates overshoot, training becomes unstable, and the agent can struggle to settle on a stable solution. Finding the right balance is key to ensuring the agent learns efficiently (Surma, 2018).
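The trade-off described above can be seen in a toy problem. The sketch below runs plain gradient descent on f(w) = w^2, which is an assumption chosen for simplicity: the notebook trains a neural network with the Adam optimizer, so the specific learning rates here do not transfer to the Cartpole experiments. The function name minimize_quadratic is hypothetical.

```python
def minimize_quadratic(learning_rate, steps=20, start=1.0):
    """Run gradient descent on f(w) = w**2 and return the final w (the optimum is w = 0)."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * w            # derivative of w**2
        w -= learning_rate * gradient  # step against the gradient
    return w


small = minimize_quadratic(0.001)    # steps are tiny, so progress is slow
moderate = minimize_quadratic(0.1)   # steps are large enough to approach 0 quickly
too_high = minimize_quadratic(1.1)   # each step overshoots and the error grows

print("lr=0.001 ->", small)
print("lr=0.1   ->", moderate)
print("lr=1.1   ->", too_high)
```

After 20 steps the smallest rate has barely moved from the starting point, the moderate rate is close to the optimum, and the largest rate has moved further away than where it started.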

Summary of experiments:
In Experiment 1, I changed the exploration decay to EXPLORATION_DECAY = 0.9. Because the exploration rate is repeatedly multiplied by this factor, a value of 0.9 shrinks it faster than the default 0.995, so the agent reaches EXPLORATION_MIN sooner and switches from exploring to exploiting its learned Q-values earlier. This change led to faster convergence compared to Run 1. In Experiment 3, by contrast, I increased the learning rate to 0.01 to speed up the learning process. Despite 2,000 runs, the agent didn’t solve the problem. The higher learning rate likely made the updates too aggressive, leading to instability in the learning process and preventing the agent from converging to an optimal solution. In short, the exploration change in Experiment 1 helped the agent converge sooner, while the overly high learning rate in Experiment 3 disrupted its ability to learn effectively.
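One way to check how EXPLORATION_DECAY affects the length of the exploration phase is to count how many decay steps it takes to fall from EXPLORATION_MAX to EXPLORATION_MIN. The sketch below assumes the rate is multiplied by the decay factor once per training step and stops at the minimum, a detail this write-up does not show. The helper steps_until_min is hypothetical.

```python
EXPLORATION_MAX = 1.0
EXPLORATION_MIN = 0.01


def steps_until_min(decay, start=EXPLORATION_MAX, floor=EXPLORATION_MIN):
    """Count the multiplicative decay steps needed to reach the exploration floor."""
    rate = start
    steps = 0
    while rate > floor:
        rate *= decay
        steps += 1
    return steps


baseline_steps = steps_until_min(0.995)  # notebook default
fast_steps = steps_until_min(0.9)        # value used in the experiment

print("Decay 0.995 reaches the minimum after", baseline_steps, "steps")
print("Decay 0.9 reaches the minimum after", fast_steps, "steps")
```

Under this assumption, a decay of 0.9 reaches the minimum in far fewer steps than 0.995. This is consistent with the logged exploration rate for the 0.9 setting reaching 0.01 within the first few runs.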

References:
Düğmeci, D. (2018). Finding shortest path using Q-learning algorithm. Towards Data Science. Retrieved from https://towardsdatascience.com
Lamba, A. (2018). An introduction to Q-learning: Reinforcement learning. freeCodeCamp News. Retrieved from https://freecodecamp.org
Surma, G. (2018). Cartpole: Introduction to reinforcement learning (DQN - Deep Q-learning). Medium. Retrieved from https://github.com/gsurma/cartpole

