# Module Five Assignment: Cartpole Problem
Review the code in this notebook and in the score_logger.py file in the *scores* folder (directory). Once you have reviewed the code, return to this notebook and select **Cell** and then **Run All** from the menu bar to run this code. The code takes several minutes to run.

In [1]:
import random  
import gym  
import numpy as np  
from collections import deque  
from keras.models import Sequential  
from keras.layers import Dense  
from keras.optimizers import Adam  
  
  
from scores.score_logger import ScoreLogger  
  
ENV_NAME = "CartPole-v1"  
  
GAMMA = 0.95  
LEARNING_RATE = 0.001  
  
MEMORY_SIZE = 1000000  
BATCH_SIZE = 20  
  
EXPLORATION_MAX = 1.0  
EXPLORATION_MIN = 0.01  
EXPLORATION_DECAY = 0.995  
  
  
class DQNSolver:  
  
    def __init__(self, observation_space, action_space):  
        self.exploration_rate = EXPLORATION_MAX  
  
        self.action_space = action_space  
        self.memory = deque(maxlen=MEMORY_SIZE)  
  
        self.model = Sequential()  
        self.model.add(Dense(24, input_shape=(observation_space,), activation="relu"))  
        self.model.add(Dense(24, activation="relu"))  
        self.model.add(Dense(self.action_space, activation="linear"))  
        self.model.compile(loss="mse", optimizer=Adam(lr=LEARNING_RATE))  
  
    def remember(self, state, action, reward, next_state, done):  
        self.memory.append((state, action, reward, next_state, done))  
  
    def act(self, state):  
        if np.random.rand() < self.exploration_rate:  
            return random.randrange(self.action_space)  
        q_values = self.model.predict(state)  
        return np.argmax(q_values[0])  
  
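    def experience_replay(self):  
        # Wait until the memory holds at least one full batch.  
        if len(self.memory) < BATCH_SIZE:  
            return  
        batch = random.sample(self.memory, BATCH_SIZE)  
        for state, action, reward, state_next, terminal in batch:  
            # Target is the reward alone for terminal steps, otherwise the  
            # reward plus the discounted best Q-value of the next state.  
            q_update = reward  
            if not terminal:  
                q_update = (reward + GAMMA * np.amax(self.model.predict(state_next)[0]))  
            # Overwrite only the taken action's Q-value, then fit on that target.  
            q_values = self.model.predict(state)  
            q_values[0][action] = q_update  
            self.model.fit(state, q_values, verbose=0)  
        # Decay exploration once per replay call, down to the minimum.  
        self.exploration_rate *= EXPLORATION_DECAY  
        self.exploration_rate = max(EXPLORATION_MIN, self.exploration_rate)  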
  
  
def cartpole():  
    env = gym.make(ENV_NAME)  
    score_logger = ScoreLogger(ENV_NAME)  
    observation_space = env.observation_space.shape[0]  
    action_space = env.action_space.n  
    dqn_solver = DQNSolver(observation_space, action_space)  
    run = 0  
    while True:  
        run += 1  
        state = env.reset()  
        state = np.reshape(state, [1, observation_space])  
        step = 0  
        while True:  
            step += 1  
            #env.render()  
            action = dqn_solver.act(state)  
            state_next, reward, terminal, info = env.step(action)  
            reward = reward if not terminal else -reward  
            state_next = np.reshape(state_next, [1, observation_space])  
            dqn_solver.remember(state, action, reward, state_next, terminal)  
            state = state_next  
            if terminal:  
                print("Run: " + str(run) + ", exploration: " + str(dqn_solver.exploration_rate) + ", score: " + str(step))  
                score_logger.add_score(step, run)  
                break  
            dqn_solver.experience_replay()  



Using TensorFlow backend.


In [2]:
cartpole()  # Stock parameters were quick but inconsistent

Run: 1, exploration: 0.8911090557802088, score: 43
Scores: (min: 43, avg: 43, max: 43)

Run: 2, exploration: 0.8348931673187264, score: 14
Scores: (min: 14, avg: 28.5, max: 43)

Run: 3, exploration: 0.736559652908221, score: 26
Scores: (min: 14, avg: 27.666666666666668, max: 43)

Run: 4, exploration: 0.653073201944699, score: 25
Scores: (min: 14, avg: 27, max: 43)

Run: 5, exploration: 0.6088145090359074, score: 15
Scores: (min: 14, avg: 24.6, max: 43)

Run: 6, exploration: 0.5535075230322891, score: 20
Scores: (min: 14, avg: 23.833333333333332, max: 43)

Run: 7, exploration: 0.5264466124450268, score: 11
Scores: (min: 11, avg: 22, max: 43)

Run: 8, exploration: 0.500708706245853, score: 11
Scores: (min: 11, avg: 20.625, max: 43)

Run: 9, exploration: 0.46211964903917074, score: 17
Scores: (min: 11, avg: 20.22222222222222, max: 43)

Run: 10, exploration: 0.4180382776616619, score: 21
Scores: (min: 11, avg: 20.3, max: 43)

Run: 11, exploration: 0.3995984329713264, score: 10
Scores: (min

Run: 91, exploration: 0.01, score: 268
Scores: (min: 9, avg: 146.93406593406593, max: 500)

Run: 92, exploration: 0.01, score: 433
Scores: (min: 9, avg: 150.04347826086956, max: 500)





Run: 93, exploration: 0.01, score: 325
Scores: (min: 9, avg: 151.9247311827957, max: 500)

Run: 94, exploration: 0.01, score: 159
Scores: (min: 9, avg: 152, max: 500)

Run: 95, exploration: 0.01, score: 174
Scores: (min: 9, avg: 152.23157894736843, max: 500)

Run: 96, exploration: 0.01, score: 148
Scores: (min: 9, avg: 152.1875, max: 500)

Run: 97, exploration: 0.01, score: 45
Scores: (min: 9, avg: 151.08247422680412, max: 500)

Run: 98, exploration: 0.01, score: 12
Scores: (min: 9, avg: 149.66326530612244, max: 500)

Run: 99, exploration: 0.01, score: 14
Scores: (min: 9, avg: 148.2929292929293, max: 500)

Run: 100, exploration: 0.01, score: 13
Scores: (min: 9, avg: 146.94, max: 500)

Run: 101, exploration: 0.01, score: 270
Scores: (min: 9, avg: 149.21, max: 500)

Run: 102, exploration: 0.01, score: 220
Scores: (min: 9, avg: 151.27, max: 500)

Run: 103, exploration: 0.01, score: 236
Scores: (min: 9, avg: 153.37, max: 500)

Run: 104, exploration: 0.01, score: 348
Scores: (min: 9, avg: 1

NameError: name 'exit' is not defined

In [3]:
GAMMA = 0.99  # Higher discount factor allowed for more consistency, but the program took longer
cartpole()

Run: 1, exploration: 1.0, score: 19
Scores: (min: 19, avg: 19, max: 19)

Run: 2, exploration: 0.8955869907338783, score: 23
Scores: (min: 19, avg: 21, max: 23)

Run: 3, exploration: 0.8603841919146962, score: 9
Scores: (min: 9, avg: 17, max: 23)

Run: 4, exploration: 0.8183201210226743, score: 11
Scores: (min: 9, avg: 15.5, max: 23)

Run: 5, exploration: 0.7219385759785162, score: 26
Scores: (min: 9, avg: 17.6, max: 26)

Run: 6, exploration: 0.6662995813682115, score: 17
Scores: (min: 9, avg: 17.5, max: 26)

Run: 7, exploration: 0.6305556603555866, score: 12
Scores: (min: 9, avg: 16.714285714285715, max: 26)

Run: 8, exploration: 0.5647174463480732, score: 23
Scores: (min: 9, avg: 17.5, max: 26)

Run: 9, exploration: 0.4529463432347434, score: 45
Scores: (min: 9, avg: 20.555555555555557, max: 45)

Run: 10, exploration: 0.4286478463299511, score: 12
Scores: (min: 9, avg: 19.7, max: 45)

Run: 11, exploration: 0.40769130904675194, score: 11
Scores: (min: 9, avg: 18.90909090909091, max: 45

Run: 91, exploration: 0.01, score: 500
Scores: (min: 9, avg: 142.1098901098901, max: 500)

Run: 92, exploration: 0.01, score: 355
Scores: (min: 9, avg: 144.42391304347825, max: 500)

Run: 93, exploration: 0.01, score: 305
Scores: (min: 9, avg: 146.1505376344086, max: 500)

Run: 94, exploration: 0.01, score: 218
Scores: (min: 9, avg: 146.91489361702128, max: 500)

Run: 95, exploration: 0.01, score: 274
Scores: (min: 9, avg: 148.25263157894736, max: 500)

Run: 96, exploration: 0.01, score: 299
Scores: (min: 9, avg: 149.82291666666666, max: 500)

Run: 97, exploration: 0.01, score: 389
Scores: (min: 9, avg: 152.28865979381445, max: 500)

Run: 98, exploration: 0.01, score: 401
Scores: (min: 9, avg: 154.8265306122449, max: 500)

Run: 99, exploration: 0.01, score: 260
Scores: (min: 9, avg: 155.88888888888889, max: 500)

Run: 100, exploration: 0.01, score: 481
Scores: (min: 9, avg: 159.14, max: 500)

Run: 101, exploration: 0.01, score: 230
Scores: (min: 9, avg: 161.25, max: 500)

Run: 102, exp

Run: 194, exploration: 0.01, score: 11
Scores: (min: 8, avg: 50.89, max: 500)

Run: 195, exploration: 0.01, score: 11
Scores: (min: 8, avg: 48.26, max: 500)

Run: 196, exploration: 0.01, score: 9
Scores: (min: 8, avg: 45.36, max: 500)

Run: 197, exploration: 0.01, score: 15
Scores: (min: 8, avg: 41.62, max: 500)

Run: 198, exploration: 0.01, score: 10
Scores: (min: 8, avg: 37.71, max: 500)

Run: 199, exploration: 0.01, score: 9
Scores: (min: 8, avg: 35.2, max: 500)

Run: 200, exploration: 0.01, score: 8
Scores: (min: 8, avg: 30.47, max: 500)

Run: 201, exploration: 0.01, score: 8
Scores: (min: 8, avg: 28.25, max: 500)

Run: 202, exploration: 0.01, score: 9
Scores: (min: 8, avg: 23.34, max: 413)

Run: 203, exploration: 0.01, score: 11
Scores: (min: 8, avg: 20.02, max: 413)

Run: 204, exploration: 0.01, score: 9
Scores: (min: 8, avg: 16.03, max: 413)

Run: 205, exploration: 0.01, score: 10
Scores: (min: 8, avg: 12, max: 265)

Run: 206, exploration: 0.01, score: 10
Scores: (min: 8, avg: 9

Run: 302, exploration: 0.01, score: 9
Scores: (min: 8, avg: 9.41, max: 13)

Run: 303, exploration: 0.01, score: 11
Scores: (min: 8, avg: 9.41, max: 13)

Run: 304, exploration: 0.01, score: 11
Scores: (min: 8, avg: 9.43, max: 13)

Run: 305, exploration: 0.01, score: 10
Scores: (min: 8, avg: 9.43, max: 13)

Run: 306, exploration: 0.01, score: 10
Scores: (min: 8, avg: 9.43, max: 13)

Run: 307, exploration: 0.01, score: 10
Scores: (min: 8, avg: 9.44, max: 13)

Run: 308, exploration: 0.01, score: 11
Scores: (min: 8, avg: 9.46, max: 13)

Run: 309, exploration: 0.01, score: 19
Scores: (min: 8, avg: 9.57, max: 19)

Run: 310, exploration: 0.01, score: 18
Scores: (min: 8, avg: 9.66, max: 19)

Run: 311, exploration: 0.01, score: 10
Scores: (min: 8, avg: 9.67, max: 19)

Run: 312, exploration: 0.01, score: 10
Scores: (min: 8, avg: 9.69, max: 19)

Run: 313, exploration: 0.01, score: 10
Scores: (min: 8, avg: 9.7, max: 19)

Run: 314, exploration: 0.01, score: 9
Scores: (min: 8, avg: 9.7, max: 19)

Run

Run: 406, exploration: 0.01, score: 476
Scores: (min: 9, avg: 175.52, max: 500)

Run: 407, exploration: 0.01, score: 448
Scores: (min: 9, avg: 179.9, max: 500)

Run: 408, exploration: 0.01, score: 406
Scores: (min: 9, avg: 183.85, max: 500)

Run: 409, exploration: 0.01, score: 339
Scores: (min: 9, avg: 187.05, max: 500)

Run: 410, exploration: 0.01, score: 500
Scores: (min: 9, avg: 191.87, max: 500)

Run: 411, exploration: 0.01, score: 500
Scores: (min: 9, avg: 196.77, max: 500)

Solved in 311 runs, 411 total runs.


NameError: name 'exit' is not defined

In [4]:
GAMMA = 0.50  # Stopped; I don't think it will ever reach an average of 195 to end the program. Low discount factor = short-sighted agent
cartpole()

Run: 1, exploration: 1.0, score: 19
Scores: (min: 19, avg: 19, max: 19)

Run: 2, exploration: 0.9511101304657719, score: 11
Scores: (min: 11, avg: 15, max: 19)

Run: 3, exploration: 0.8647077305675338, score: 20
Scores: (min: 11, avg: 16.666666666666668, max: 20)

Run: 4, exploration: 0.8265651079747222, score: 10
Scores: (min: 10, avg: 15, max: 20)

Run: 5, exploration: 0.7628626641409962, score: 17
Scores: (min: 10, avg: 15.4, max: 20)

Run: 6, exploration: 0.4858739637363176, score: 91
Scores: (min: 10, avg: 28, max: 91)

Run: 7, exploration: 0.42437208406280985, score: 28
Scores: (min: 10, avg: 28, max: 91)

Run: 8, exploration: 0.3877593341372176, score: 19
Scores: (min: 10, avg: 26.875, max: 91)

Run: 9, exploration: 0.3578751580867638, score: 17
Scores: (min: 10, avg: 25.77777777777778, max: 91)

Run: 10, exploration: 0.27164454854530906, score: 56
Scores: (min: 10, avg: 28.8, max: 91)

Run: 11, exploration: 0.2494556624678441, score: 18
Scores: (min: 10, avg: 27.818181818181817

Run: 90, exploration: 0.01, score: 35
Scores: (min: 9, avg: 102.75555555555556, max: 351)

Run: 91, exploration: 0.01, score: 253
Scores: (min: 9, avg: 104.4065934065934, max: 351)

Run: 92, exploration: 0.01, score: 222
Scores: (min: 9, avg: 105.68478260869566, max: 351)

Run: 93, exploration: 0.01, score: 176
Scores: (min: 9, avg: 106.44086021505376, max: 351)

Run: 94, exploration: 0.01, score: 246
Scores: (min: 9, avg: 107.92553191489361, max: 351)

Run: 95, exploration: 0.01, score: 200
Scores: (min: 9, avg: 108.89473684210526, max: 351)

Run: 96, exploration: 0.01, score: 143
Scores: (min: 9, avg: 109.25, max: 351)

Run: 97, exploration: 0.01, score: 42
Scores: (min: 9, avg: 108.55670103092784, max: 351)

Run: 98, exploration: 0.01, score: 53
Scores: (min: 9, avg: 107.98979591836735, max: 351)

Run: 99, exploration: 0.01, score: 144
Scores: (min: 9, avg: 108.35353535353535, max: 351)

Run: 100, exploration: 0.01, score: 67
Scores: (min: 9, avg: 107.94, max: 351)

Run: 101, explor

Run: 190, exploration: 0.01, score: 162
Scores: (min: 10, avg: 108.89, max: 420)

Run: 191, exploration: 0.01, score: 125
Scores: (min: 10, avg: 107.61, max: 420)

Run: 192, exploration: 0.01, score: 135
Scores: (min: 10, avg: 106.74, max: 420)

Run: 193, exploration: 0.01, score: 146
Scores: (min: 10, avg: 106.44, max: 420)

Run: 194, exploration: 0.01, score: 29
Scores: (min: 10, avg: 104.27, max: 420)

Run: 195, exploration: 0.01, score: 79
Scores: (min: 10, avg: 103.06, max: 420)

Run: 196, exploration: 0.01, score: 14
Scores: (min: 10, avg: 101.77, max: 420)

Run: 197, exploration: 0.01, score: 10
Scores: (min: 10, avg: 101.45, max: 420)

Run: 198, exploration: 0.01, score: 11
Scores: (min: 10, avg: 101.03, max: 420)

Run: 199, exploration: 0.01, score: 11
Scores: (min: 10, avg: 99.7, max: 420)

Run: 200, exploration: 0.01, score: 57
Scores: (min: 10, avg: 99.6, max: 420)

Run: 201, exploration: 0.01, score: 10
Scores: (min: 10, avg: 98.11, max: 420)

Run: 202, exploration: 0.01, 

Run: 293, exploration: 0.01, score: 12
Scores: (min: 10, avg: 68.85, max: 226)

Run: 294, exploration: 0.01, score: 98
Scores: (min: 10, avg: 69.54, max: 226)

Run: 295, exploration: 0.01, score: 51
Scores: (min: 10, avg: 69.26, max: 226)

Run: 296, exploration: 0.01, score: 37
Scores: (min: 10, avg: 69.49, max: 226)

Run: 297, exploration: 0.01, score: 221
Scores: (min: 10, avg: 71.6, max: 226)

Run: 298, exploration: 0.01, score: 12
Scores: (min: 10, avg: 71.61, max: 226)

Run: 299, exploration: 0.01, score: 16
Scores: (min: 10, avg: 71.66, max: 226)

Run: 300, exploration: 0.01, score: 10
Scores: (min: 10, avg: 71.19, max: 226)

Run: 301, exploration: 0.01, score: 57
Scores: (min: 10, avg: 71.66, max: 226)

Run: 302, exploration: 0.01, score: 26
Scores: (min: 10, avg: 70.68, max: 226)

Run: 303, exploration: 0.01, score: 27
Scores: (min: 10, avg: 70.3, max: 226)

Run: 304, exploration: 0.01, score: 32
Scores: (min: 10, avg: 69.98, max: 226)

Run: 305, exploration: 0.01, score: 150
S

Run: 396, exploration: 0.01, score: 97
Scores: (min: 10, avg: 72.92, max: 221)

Run: 397, exploration: 0.01, score: 95
Scores: (min: 10, avg: 71.66, max: 199)

Run: 398, exploration: 0.01, score: 223
Scores: (min: 10, avg: 73.77, max: 223)

Run: 399, exploration: 0.01, score: 52
Scores: (min: 10, avg: 74.13, max: 223)

Run: 400, exploration: 0.01, score: 140
Scores: (min: 11, avg: 75.43, max: 223)

Run: 401, exploration: 0.01, score: 118
Scores: (min: 11, avg: 76.04, max: 223)

Run: 402, exploration: 0.01, score: 185
Scores: (min: 11, avg: 77.63, max: 223)

Run: 403, exploration: 0.01, score: 165
Scores: (min: 11, avg: 79.01, max: 223)

Run: 404, exploration: 0.01, score: 142
Scores: (min: 11, avg: 80.11, max: 223)

Run: 405, exploration: 0.01, score: 78
Scores: (min: 11, avg: 79.39, max: 223)

Run: 406, exploration: 0.01, score: 28
Scores: (min: 11, avg: 78.66, max: 223)

Run: 407, exploration: 0.01, score: 23
Scores: (min: 11, avg: 78.32, max: 223)

Run: 408, exploration: 0.01, score

Run: 499, exploration: 0.01, score: 37
Scores: (min: 10, avg: 67.06, max: 245)

Run: 500, exploration: 0.01, score: 41
Scores: (min: 10, avg: 66.07, max: 245)

Run: 501, exploration: 0.01, score: 58
Scores: (min: 10, avg: 65.47, max: 245)

Run: 502, exploration: 0.01, score: 127
Scores: (min: 10, avg: 64.89, max: 245)

Run: 503, exploration: 0.01, score: 21
Scores: (min: 10, avg: 63.45, max: 245)

Run: 504, exploration: 0.01, score: 10
Scores: (min: 10, avg: 62.13, max: 245)

Run: 505, exploration: 0.01, score: 37
Scores: (min: 10, avg: 61.72, max: 245)

Run: 506, exploration: 0.01, score: 25
Scores: (min: 10, avg: 61.69, max: 245)

Run: 507, exploration: 0.01, score: 17
Scores: (min: 10, avg: 61.63, max: 245)

Run: 508, exploration: 0.01, score: 10
Scores: (min: 10, avg: 61.31, max: 245)

Run: 509, exploration: 0.01, score: 81
Scores: (min: 10, avg: 61.9, max: 245)

Run: 510, exploration: 0.01, score: 211
Scores: (min: 10, avg: 63.9, max: 245)

Run: 511, exploration: 0.01, score: 20
S

KeyboardInterrupt: 

In [5]:
LEARNING_RATE = 0.0001
cartpole()

Run: 1, exploration: 1.0, score: 16
Scores: (min: 16, avg: 16, max: 16)

Run: 2, exploration: 0.810157377815473, score: 46
Scores: (min: 16, avg: 31, max: 46)

Run: 3, exploration: 0.7590483508202912, score: 14
Scores: (min: 14, avg: 25.333333333333332, max: 46)

Run: 4, exploration: 0.7111635524897149, score: 14
Scores: (min: 14, avg: 22.5, max: 46)

Run: 5, exploration: 0.6596532430440636, score: 16
Scores: (min: 14, avg: 21.2, max: 46)

Run: 6, exploration: 0.6369088258938781, score: 8
Scores: (min: 8, avg: 19, max: 46)

Run: 7, exploration: 0.6057704364907278, score: 11
Scores: (min: 8, avg: 17.857142857142858, max: 46)

Run: 8, exploration: 0.5398075216808175, score: 24
Scores: (min: 8, avg: 18.625, max: 46)

Run: 9, exploration: 0.483444593917636, score: 23
Scores: (min: 8, avg: 19.11111111111111, max: 46)

Run: 10, exploration: 0.4598090507939749, score: 11
Scores: (min: 8, avg: 18.3, max: 46)

Run: 11, exploration: 0.4417353564707963, score: 9
Scores: (min: 8, avg: 17.454545454

Run: 83, exploration: 0.01, score: 10
Scores: (min: 8, avg: 12.578313253012048, max: 57)

Run: 84, exploration: 0.01, score: 21
Scores: (min: 8, avg: 12.678571428571429, max: 57)

Run: 85, exploration: 0.01, score: 22
Scores: (min: 8, avg: 12.788235294117648, max: 57)

Run: 86, exploration: 0.01, score: 12
Scores: (min: 8, avg: 12.779069767441861, max: 57)

Run: 87, exploration: 0.01, score: 10
Scores: (min: 8, avg: 12.74712643678161, max: 57)

Run: 88, exploration: 0.01, score: 10
Scores: (min: 8, avg: 12.715909090909092, max: 57)

Run: 89, exploration: 0.01, score: 23
Scores: (min: 8, avg: 12.831460674157304, max: 57)

Run: 90, exploration: 0.01, score: 9
Scores: (min: 8, avg: 12.78888888888889, max: 57)

Run: 91, exploration: 0.01, score: 10
Scores: (min: 8, avg: 12.758241758241759, max: 57)

Run: 92, exploration: 0.01, score: 10
Scores: (min: 8, avg: 12.728260869565217, max: 57)

Run: 93, exploration: 0.01, score: 38
Scores: (min: 8, avg: 13, max: 57)

Run: 94, exploration: 0.01, s

KeyboardInterrupt: 

In [6]:
LEARNING_RATE = 0.01  # Stopped; a learning rate this high probably will not meet the accuracy threshold
cartpole()

Run: 1, exploration: 0.9275689688183278, score: 35
Scores: (min: 35, avg: 35, max: 35)

Run: 2, exploration: 0.8433051360508336, score: 20
Scores: (min: 20, avg: 27.5, max: 35)

Run: 3, exploration: 0.6935613678313175, score: 40
Scores: (min: 20, avg: 31.666666666666668, max: 40)

Run: 4, exploration: 0.6149486215357263, score: 25
Scores: (min: 20, avg: 30, max: 40)

Run: 5, exploration: 0.5704072587541458, score: 16
Scores: (min: 16, avg: 27.2, max: 40)

Run: 6, exploration: 0.4932355662165453, score: 30
Scores: (min: 16, avg: 27.666666666666668, max: 40)

Run: 7, exploration: 0.46912134373457726, score: 11
Scores: (min: 11, avg: 25.285714285714285, max: 40)

Run: 8, exploration: 0.446186062443672, score: 11
Scores: (min: 11, avg: 23.5, max: 40)

Run: 9, exploration: 0.3632974174544486, score: 42
Scores: (min: 11, avg: 25.555555555555557, max: 42)

Run: 10, exploration: 0.3017979588795719, score: 38
Scores: (min: 11, avg: 26.8, max: 42)

Run: 11, exploration: 0.2689348941735696, score

Run: 89, exploration: 0.01, score: 58
Scores: (min: 10, avg: 26.674157303370787, max: 122)

Run: 90, exploration: 0.01, score: 18
Scores: (min: 10, avg: 26.57777777777778, max: 122)

Run: 91, exploration: 0.01, score: 69
Scores: (min: 10, avg: 27.043956043956044, max: 122)

Run: 92, exploration: 0.01, score: 29
Scores: (min: 10, avg: 27.065217391304348, max: 122)

Run: 93, exploration: 0.01, score: 22
Scores: (min: 10, avg: 27.010752688172044, max: 122)

Run: 94, exploration: 0.01, score: 27
Scores: (min: 10, avg: 27.01063829787234, max: 122)

Run: 95, exploration: 0.01, score: 84
Scores: (min: 10, avg: 27.610526315789475, max: 122)

Run: 96, exploration: 0.01, score: 31
Scores: (min: 10, avg: 27.645833333333332, max: 122)

Run: 97, exploration: 0.01, score: 55
Scores: (min: 10, avg: 27.927835051546392, max: 122)

Run: 98, exploration: 0.01, score: 43
Scores: (min: 10, avg: 28.081632653061224, max: 122)

Run: 99, exploration: 0.01, score: 20
Scores: (min: 10, avg: 28, max: 122)

Run: 1

Run: 191, exploration: 0.01, score: 36
Scores: (min: 9, avg: 37.15, max: 148)

Run: 192, exploration: 0.01, score: 16
Scores: (min: 9, avg: 37.02, max: 148)

Run: 193, exploration: 0.01, score: 14
Scores: (min: 9, avg: 36.94, max: 148)

Run: 194, exploration: 0.01, score: 22
Scores: (min: 9, avg: 36.89, max: 148)

Run: 195, exploration: 0.01, score: 18
Scores: (min: 9, avg: 36.23, max: 148)

Run: 196, exploration: 0.01, score: 26
Scores: (min: 9, avg: 36.18, max: 148)

Run: 197, exploration: 0.01, score: 17
Scores: (min: 9, avg: 35.8, max: 148)

Run: 198, exploration: 0.01, score: 17
Scores: (min: 9, avg: 35.54, max: 148)

Run: 199, exploration: 0.01, score: 28
Scores: (min: 9, avg: 35.62, max: 148)

Run: 200, exploration: 0.01, score: 48
Scores: (min: 9, avg: 35.9, max: 148)

Run: 201, exploration: 0.01, score: 10
Scores: (min: 9, avg: 35.83, max: 148)

Run: 202, exploration: 0.01, score: 53
Scores: (min: 9, avg: 36.06, max: 148)

Run: 203, exploration: 0.01, score: 19
Scores: (min: 9

KeyboardInterrupt: 

In [None]:
EXPLORATION_MAX = 1.5
EXPLORATION_MIN = 0.1
EXPLORATION_DECAY = 0.999
cartpole()

Run: 1, exploration: 1.5, score: 18
Scores: (min: 18, avg: 18, max: 18)

Run: 2, exploration: 1.482098670741313, score: 14
Scores: (min: 14, avg: 16, max: 18)

Run: 3, exploration: 1.4600221387043475, score: 16
Scores: (min: 14, avg: 16, max: 18)

Run: 4, exploration: 1.4111917889414172, score: 35
Scores: (min: 14, avg: 20.75, max: 35)

Run: 5, exploration: 1.3585467674380012, score: 39
Scores: (min: 14, avg: 24.4, max: 39)

Run: 6, exploration: 1.3210165657118556, score: 29
Scores: (min: 14, avg: 25.166666666666668, max: 39)

Run: 7, exploration: 1.3065578218677316, score: 12
Scores: (min: 12, avg: 23.285714285714285, max: 39)

Run: 8, exploration: 1.290965074024554, score: 13
Scores: (min: 12, avg: 22, max: 39)

Run: 9, exploration: 1.262860076651409, score: 23
Scores: (min: 12, avg: 22.11111111111111, max: 39)

Run: 10, exploration: 1.2415623485717155, score: 18
Scores: (min: 12, avg: 21.7, max: 39)

Run: 11, exploration: 1.2024420545907453, score: 33
Scores: (min: 12, avg: 22.72727

Run: 83, exploration: 0.18275887941697883, score: 97
Scores: (min: 9, avg: 26.566265060240966, max: 97)

Run: 84, exploration: 0.18075855338677094, score: 12
Scores: (min: 9, avg: 26.392857142857142, max: 97)

Run: 85, exploration: 0.1782443170521789, score: 15
Scores: (min: 9, avg: 26.258823529411764, max: 97)

Run: 86, exploration: 0.167021518132341, score: 66
Scores: (min: 9, avg: 26.72093023255814, max: 97)

Run: 87, exploration: 0.1650282466727592, score: 13
Scores: (min: 9, avg: 26.563218390804597, max: 97)

Run: 88, exploration: 0.16240750599940948, score: 17
Scores: (min: 9, avg: 26.454545454545453, max: 97)

Run: 89, exploration: 0.15919002894535694, score: 21
Scores: (min: 9, avg: 26.39325842696629, max: 97)

Run: 90, exploration: 0.1520296585498867, score: 47
Scores: (min: 9, avg: 26.622222222222224, max: 97)

Run: 91, exploration: 0.14665129293350307, score: 37
Scores: (min: 9, avg: 26.736263736263737, max: 97)

Run: 92, exploration: 0.14360221838192763, score: 22
Scores: (

In [None]:
EXPLORATION_MAX = 0.5
EXPLORATION_MIN = 0.001
EXPLORATION_DECAY = 0.99
cartpole()

The goal of the agent in the Cartpole Problem is to keep an inverted pendulum, a pole attached to a cart by a pivot, balanced by applying appropriate forces to the cart (Surma, 2019). The state of the cartpole is described by four values: cart position, cart velocity, pole angle, and pole angular velocity. Two actions are possible: pushing the cart to the left (a force of -1) or pushing it to the right (a force of +1). The reinforcement learning algorithm used in this instance is Deep Q-Learning, implemented as a deep Q-network (DQN).
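
As a minimal sketch of how the agent picks between the two actions, the following mirrors the epsilon-greedy rule in `DQNSolver.act` using plain Python. The Q-values passed in are made-up numbers standing in for the network's output, and `choose_action` and `steps_to_min_exploration` are hypothetical helper names that do not appear in the notebook.

```python
import math
import random

# Stock exploration settings from the first code cell.
EXPLORATION_MAX = 1.0
EXPLORATION_MIN = 0.01
EXPLORATION_DECAY = 0.995


def choose_action(q_values, exploration_rate, rng=random):
    """Epsilon-greedy choice: random action with probability exploration_rate."""
    if rng.random() < exploration_rate:
        return rng.randrange(len(q_values))
    # Otherwise exploit: index of the largest Q-value (0 = push left, 1 = push right).
    return max(range(len(q_values)), key=lambda a: q_values[a])


def steps_to_min_exploration(start, minimum, decay):
    """Number of decay steps until start * decay**n first reaches minimum or below."""
    return math.ceil(math.log(minimum / start) / math.log(decay))


# With exploration switched off, the agent always takes the higher-valued action.
greedy_action = choose_action([0.2, 0.7], exploration_rate=0.0)

# Replay steps needed for the stock schedule to reach its floor.
decay_steps = steps_to_min_exploration(EXPLORATION_MAX, EXPLORATION_MIN, EXPLORATION_DECAY)
print(greedy_action, decay_steps)
```

Because the rate decays once per replay call rather than once per run, the number of runs needed to reach the floor depends on how long each run lasts.
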
In the Cartpole Problem, the agent receives a reward of +1 for each timestep the pole remains upright. The run ends when the pole is more than 15 degrees from vertical or the cart moves more than 2.4 units from the center (Surma, 2019). In this notebook, the reward on the terminal step is negated so that the move that ended the run is penalized. Each transition (state, action, reward, next state, and terminal flag) is stored in a replay memory, and the agent trains on random batches drawn from that memory. Essentially, the reward structure incentivizes moves that keep the pole upright, and the replay memory lets the agent learn from the past moves and sequences of moves that kept the pole upright the longest. The discount factor sets how much weight future rewards carry. A low discount factor makes the agent short-sighted because it weighs immediate rewards far more heavily than later ones. A higher discount factor causes the agent to pursue long-term rewards (Shaw et al., 2021).
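
To make the role of the discount factor concrete, here is a small sketch of the target value computed in `experience_replay` and of how quickly future rewards fade for the three `GAMMA` values tried above. `q_target` and `discounted_return` are illustrative helper names, and the next-state Q-values are invented for the example.

```python
def q_target(reward, next_q_values, gamma, terminal):
    """Target used in experience replay: r + gamma * max_a Q(s', a)."""
    if terminal:
        # Terminal steps have no future, so only the reward counts.
        return reward
    return reward + gamma * max(next_q_values)


def discounted_return(gamma, steps, reward=1.0):
    """Sum of reward * gamma**t for t = 0 .. steps - 1."""
    return sum(reward * gamma ** t for t in range(steps))


# One non-terminal step with hypothetical next-state Q-values.
example_target = q_target(1.0, [2.0, 3.0], gamma=0.5, terminal=False)

for gamma in (0.50, 0.95, 0.99):
    # Value the agent assigns to surviving 200 more steps at +1 reward each.
    print(gamma, round(discounted_return(gamma, 200), 2))
```

For a constant reward of +1 the discounted sum can never exceed 1 / (1 - gamma), which is 2 for `GAMMA = 0.50`, so rewards more than a few steps ahead contribute almost nothing to the agent's estimate.
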
The DQN model for the Cartpole Problem uses three dense (fully connected) layers. The first layer takes the 4 state values as input and has 24 output units with the ReLU activation function. The second layer takes the 24 outputs of the first layer as input and also has 24 ReLU units. The final layer takes those 24 values and has 2 output units with a linear activation, one for each action. The replay memory is not part of the network; it supplies the training examples used to fit the weights. This structure takes the state as input and generates a Q-value for every possible action as output, so a single forward pass is enough to compare the actions. The agent then takes the action with the highest Q-value, pushing the cart left or right, unless it is exploring at random. Lower learning rates make training take longer but tend to produce more stable updates. Higher learning rates can cause the updates to overshoot, which can result in an agent that is very inaccurate.
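
The layer sizes described above can be traced with a short plain-Python sketch. It is not the Keras model from the notebook: `dense_param_count` and `dense` are hypothetical helpers, and the weights are placeholder constants chosen only to show how a 4-value state becomes 2 Q-values.

```python
def dense_param_count(inputs, units):
    """Weights plus biases for one fully connected layer."""
    return inputs * units + units


def dense(inputs, weights, biases, activation=None):
    """Apply one dense layer; weights[j] holds the input weights of unit j."""
    outputs = [sum(w * x for w, x in zip(row, inputs)) + b
               for row, b in zip(weights, biases)]
    if activation == "relu":
        outputs = [max(0.0, v) for v in outputs]
    return outputs


# (inputs, units, activation) for each Dense layer in the notebook's model.
LAYERS = [(4, 24, "relu"), (24, 24, "relu"), (24, 2, None)]

per_layer = [dense_param_count(i, u) for i, u, _ in LAYERS]
total_params = sum(per_layer)

# Placeholder weights (all 0.1, zero biases) just to trace the shapes;
# a trained network would have learned values here.
state = [0.0, 0.1, 0.02, -0.1]
values = state
for n_in, n_units, activation in LAYERS:
    weights = [[0.1] * n_in for _ in range(n_units)]
    biases = [0.0] * n_units
    values = dense(values, weights, biases, activation)

q_values = values  # one Q-value per action: [push left, push right]
print(per_layer, total_params, len(q_values))
```

The three layers hold 120, 600, and 50 trainable parameters, 770 in total, which is small enough that most of the run time goes to the repeated `predict` and `fit` calls rather than to the size of the network.
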

Works Cited:
Surma, G. (2019, November 10). Cartpole - introduction to reinforcement learning (DQN - deep Q-learning). Medium. 
    https://gsurma.medium.com/cartpole-introduction-to-reinforcement-learning-ed0eb5b58288 
Shaw, R. N., Ghosh, A., Balas, V. E., & Bianchini, M. (Eds.). (2021). Artificial Intelligence for Future Generation Robotics. Elsevier.

Note: If the code is running properly, you should begin to see output appearing above this code block. It will take several minutes, so it is recommended that you let this code run in the background while completing other work. When the code has finished, it will print output saying, "Solved in _ runs, _ total runs."

You may see an error about not having an exit command. This error does not affect the program's functionality and results from the steps taken to convert the code from Python 2.x to Python 3. Please disregard this error.