# Module Five Assignment: Cartpole Problem
Review the code in this notebook and in the score_logger.py file in the *scores* folder (directory). Once you have reviewed the code, return to this notebook and select **Cell** and then **Run All** from the menu bar to run this code. The code takes several minutes to run.

In [3]:
import random  
import gym  
import numpy as np  
from collections import deque  
from keras.models import Sequential  
from keras.layers import Dense  
from keras.optimizers import Adam  
  
  
from scores.score_logger import ScoreLogger  
  
ENV_NAME = "CartPole-v1"  
  
GAMMA = 0.95  
LEARNING_RATE = 0.001  
  
MEMORY_SIZE = 1000000  
BATCH_SIZE = 20  
  
EXPLORATION_MAX = 1.0  
EXPLORATION_MIN = 0.01  
EXPLORATION_DECAY = 0.995  
  
  
class DQNSolver:  
  
    def __init__(self, observation_space, action_space):  
        self.exploration_rate = EXPLORATION_MAX  
  
        self.action_space = action_space  
        self.memory = deque(maxlen=MEMORY_SIZE)  
  
        self.model = Sequential()  
        self.model.add(Dense(24, input_shape=(observation_space,), activation="relu"))  
        self.model.add(Dense(24, activation="relu"))  
        self.model.add(Dense(self.action_space, activation="linear"))  
        self.model.compile(loss="mse", optimizer=Adam(lr=LEARNING_RATE))  
  
    def remember(self, state, action, reward, next_state, done):  
        self.memory.append((state, action, reward, next_state, done))  
  
    def act(self, state):  
        if np.random.rand() < self.exploration_rate:  
            return random.randrange(self.action_space)  
        q_values = self.model.predict(state)  
        return np.argmax(q_values[0])  
  
    def experience_replay(self):  
        if len(self.memory) < BATCH_SIZE:  
            return  
        batch = random.sample(self.memory, BATCH_SIZE)  
        for state, action, reward, state_next, terminal in batch:  
            q_update = reward  
            if not terminal:  
                q_update = (reward + GAMMA * np.amax(self.model.predict(state_next)[0]))  
            q_values = self.model.predict(state)  
            q_values[0][action] = q_update  
            self.model.fit(state, q_values, verbose=0)  
        self.exploration_rate *= EXPLORATION_DECAY  
        self.exploration_rate = max(EXPLORATION_MIN, self.exploration_rate)  
  
  
def cartpole():  
    env = gym.make(ENV_NAME)  
    score_logger = ScoreLogger(ENV_NAME)  
    observation_space = env.observation_space.shape[0]  
    action_space = env.action_space.n  
    dqn_solver = DQNSolver(observation_space, action_space)  
    run = 0  
    while True:  
        run += 1  
        state = env.reset()  
        state = np.reshape(state, [1, observation_space])  
        step = 0  
        while True:  
            step += 1  
            #env.render()  
            action = dqn_solver.act(state)  
            state_next, reward, terminal, info = env.step(action)  
            reward = reward if not terminal else -reward  
            state_next = np.reshape(state_next, [1, observation_space])  
            dqn_solver.remember(state, action, reward, state_next, terminal)  
            state = state_next  
            if terminal:  
                print ("Run: " + str(run) + ", exploration: " + str(dqn_solver.exploration_rate) + ", score: " + str(step))  
                score_logger.add_score(step, run)  
                break  
            dqn_solver.experience_replay()  



In [4]:
cartpole()

Run: 1, exploration: 1.0, score: 17
Scores: (min: 17, avg: 17, max: 17)

Run: 2, exploration: 0.8734200960253871, score: 30
Scores: (min: 17, avg: 23.5, max: 30)

Run: 3, exploration: 0.7901049725470279, score: 21
Scores: (min: 17, avg: 22.666666666666668, max: 30)

Run: 4, exploration: 0.736559652908221, score: 15
Scores: (min: 15, avg: 20.75, max: 30)

Run: 5, exploration: 0.6935613678313175, score: 13
Scores: (min: 13, avg: 19.2, max: 30)

Run: 6, exploration: 0.6401093727576664, score: 17
Scores: (min: 13, avg: 18.833333333333332, max: 30)

Run: 7, exploration: 0.6057704364907278, score: 12
Scores: (min: 12, avg: 17.857142857142858, max: 30)

Run: 8, exploration: 0.5819594443402982, score: 9
Scores: (min: 9, avg: 16.75, max: 30)

Run: 9, exploration: 0.5452463540625918, score: 14
Scores: (min: 9, avg: 16.444444444444443, max: 30)

Run: 10, exploration: 0.5159963842937159, score: 12
Scores: (min: 9, avg: 16, max: 30)

Run: 11, exploration: 0.4907693883854626, score: 11
Scores: (min:

Run: 89, exploration: 0.01, score: 246
Scores: (min: 8, avg: 119.37078651685393, max: 380)

Run: 90, exploration: 0.01, score: 230
Scores: (min: 8, avg: 120.6, max: 380)

Run: 91, exploration: 0.01, score: 193
Scores: (min: 8, avg: 121.3956043956044, max: 380)

Run: 92, exploration: 0.01, score: 254
Scores: (min: 8, avg: 122.83695652173913, max: 380)

Run: 93, exploration: 0.01, score: 117
Scores: (min: 8, avg: 122.7741935483871, max: 380)

Run: 94, exploration: 0.01, score: 266
Scores: (min: 8, avg: 124.29787234042553, max: 380)

Run: 95, exploration: 0.01, score: 217
Scores: (min: 8, avg: 125.27368421052631, max: 380)

Run: 96, exploration: 0.01, score: 134
Scores: (min: 8, avg: 125.36458333333333, max: 380)

Run: 97, exploration: 0.01, score: 139
Scores: (min: 8, avg: 125.50515463917526, max: 380)

Run: 98, exploration: 0.01, score: 162
Scores: (min: 8, avg: 125.87755102040816, max: 380)

Run: 99, exploration: 0.01, score: 500
Scores: (min: 8, avg: 129.65656565656565, max: 500)

Run

NameError: name 'exit' is not defined

Note: If the code is running properly, you should begin to see output appearing above this code block. It will take several minutes, so it is recommended that you let this code run in the background while completing other work. When the code has finished, it will print output saying, "Solved in _ runs, _ total runs."

You may see an error about not having an exit command. This error does not affect the program's functionality and results from the steps taken to convert the code from Python 2.x to Python 3. Please disregard this error.
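
The stopping rule described in this note can also be written without `exit()`. The snippet below is a minimal, hypothetical stand-in for the check in score_logger.py, which is not shown in this notebook. The class name `RollingSolveCheck` and the default 195 average-score target are assumptions for illustration. The 100-run window is also an assumption, although the logs below are consistent with an average taken over the most recent 100 runs. The method returns a flag instead of calling `exit()`, so the caller can leave the training loop and no `NameError` is raised.

```python
from collections import deque
from statistics import mean


class RollingSolveCheck:
    """Hypothetical stand-in for the solve check; not the course's ScoreLogger."""

    def __init__(self, window=100, target=195.0):
        # window and target are assumed values, chosen only for illustration.
        self.window = window
        self.target = target
        self.scores = deque(maxlen=window)

    def add_score(self, score, run):
        """Record a score; return True once the rolling average reaches target."""
        self.scores.append(score)
        avg = mean(self.scores)
        print("Scores: (min: {}, avg: {}, max: {})".format(
            min(self.scores), avg, max(self.scores)))
        # Only declare success once the window is full.
        solved = len(self.scores) == self.window and avg >= self.target
        if solved:
            print("Solved in {} runs, {} total runs.".format(run - self.window, run))
        return solved


# Tiny demonstration with a 3-run window and a target of 10.
checker = RollingSolveCheck(window=3, target=10.0)
results = [checker.add_score(s, r) for r, s in enumerate([4, 12, 14, 20], start=1)]
```

In `cartpole()`, the inner loop could then use something like `if checker.add_score(step, run): return` rather than relying on `exit()`.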

In [1]:
# This run modifies the discount factor (GAMMA) to see how the results change
import random  
import gym  
import numpy as np  
from collections import deque  
from keras.models import Sequential  
from keras.layers import Dense  
from keras.optimizers import Adam  
  
  
from scores.score_logger import ScoreLogger  
  
ENV_NAME = "CartPole-v1"  
# Changing GAMMA from 0.95 to 0.70 resulted in 900+ runs with no solution.
# This run tries 0.99.
GAMMA = 0.99 
LEARNING_RATE = 0.001  
  
MEMORY_SIZE = 1000000  
BATCH_SIZE = 20  
  
EXPLORATION_MAX = 1.0  
EXPLORATION_MIN = 0.01  
EXPLORATION_DECAY = 0.995  
  
  
class DQNSolver:  
  
    def __init__(self, observation_space, action_space):  
        self.exploration_rate = EXPLORATION_MAX  
  
        self.action_space = action_space  
        self.memory = deque(maxlen=MEMORY_SIZE)  
  
        self.model = Sequential()  
        self.model.add(Dense(24, input_shape=(observation_space,), activation="relu"))  
        self.model.add(Dense(24, activation="relu"))  
        self.model.add(Dense(self.action_space, activation="linear"))  
        self.model.compile(loss="mse", optimizer=Adam(lr=LEARNING_RATE))  
  
    def remember(self, state, action, reward, next_state, done):  
        self.memory.append((state, action, reward, next_state, done))  
  
    def act(self, state):  
        if np.random.rand() < self.exploration_rate:  
            return random.randrange(self.action_space)  
        q_values = self.model.predict(state)  
        return np.argmax(q_values[0])  
  
    def experience_replay(self):  
        if len(self.memory) < BATCH_SIZE:  
            return  
        batch = random.sample(self.memory, BATCH_SIZE)  
        for state, action, reward, state_next, terminal in batch:  
            q_update = reward  
            if not terminal:  
                q_update = (reward + GAMMA * np.amax(self.model.predict(state_next)[0]))  
            q_values = self.model.predict(state)  
            q_values[0][action] = q_update  
            self.model.fit(state, q_values, verbose=0)  
        self.exploration_rate *= EXPLORATION_DECAY  
        self.exploration_rate = max(EXPLORATION_MIN, self.exploration_rate)  
  
  
def cartpole():  
    env = gym.make(ENV_NAME)  
    score_logger = ScoreLogger(ENV_NAME)  
    observation_space = env.observation_space.shape[0]  
    action_space = env.action_space.n  
    dqn_solver = DQNSolver(observation_space, action_space)  
    run = 0  
    while True:  
        run += 1  
        state = env.reset()  
        state = np.reshape(state, [1, observation_space])  
        step = 0  
        while True:  
            step += 1  
            #env.render()  
            action = dqn_solver.act(state)  
            state_next, reward, terminal, info = env.step(action)  
            reward = reward if not terminal else -reward  
            state_next = np.reshape(state_next, [1, observation_space])  
            dqn_solver.remember(state, action, reward, state_next, terminal)  
            state = state_next  
            if terminal:  
                print ("Run: " + str(run) + ", exploration: " + str(dqn_solver.exploration_rate) + ", score: " + str(step))  
                score_logger.add_score(step, run)  
                break  
            dqn_solver.experience_replay()  



Using TensorFlow backend.


In [2]:
cartpole()

Run: 1, exploration: 1.0, score: 16
Scores: (min: 16, avg: 16, max: 16)

Run: 2, exploration: 0.946354579813443, score: 15
Scores: (min: 15, avg: 15.5, max: 16)

Run: 3, exploration: 0.8647077305675338, score: 19
Scores: (min: 15, avg: 16.666666666666668, max: 19)

Run: 4, exploration: 0.8265651079747222, score: 10
Scores: (min: 10, avg: 15, max: 19)

Run: 5, exploration: 0.7705488893118823, score: 15
Scores: (min: 10, avg: 15, max: 19)

Run: 6, exploration: 0.7402609576967045, score: 9
Scores: (min: 9, avg: 14, max: 19)

Run: 7, exploration: 0.7005493475733617, score: 12
Scores: (min: 9, avg: 13.714285714285714, max: 19)

Run: 8, exploration: 0.6662995813682115, score: 11
Scores: (min: 9, avg: 13.375, max: 19)

Run: 9, exploration: 0.6305556603555866, score: 12
Scores: (min: 9, avg: 13.222222222222221, max: 19)

Run: 10, exploration: 0.5937455908197752, score: 13
Scores: (min: 9, avg: 13.2, max: 19)

Run: 11, exploration: 0.5647174463480732, score: 11
Scores: (min: 9, avg: 13, max: 19

Run: 93, exploration: 0.01, score: 10
Scores: (min: 8, avg: 74.34408602150538, max: 500)

Run: 94, exploration: 0.01, score: 8
Scores: (min: 8, avg: 73.63829787234043, max: 500)

Run: 95, exploration: 0.01, score: 9
Scores: (min: 8, avg: 72.9578947368421, max: 500)

Run: 96, exploration: 0.01, score: 9
Scores: (min: 8, avg: 72.29166666666667, max: 500)

Run: 97, exploration: 0.01, score: 10
Scores: (min: 8, avg: 71.64948453608247, max: 500)

Run: 98, exploration: 0.01, score: 10
Scores: (min: 8, avg: 71.0204081632653, max: 500)

Run: 99, exploration: 0.01, score: 12
Scores: (min: 8, avg: 70.42424242424242, max: 500)

Run: 100, exploration: 0.01, score: 9
Scores: (min: 8, avg: 69.81, max: 500)

Run: 101, exploration: 0.01, score: 10
Scores: (min: 8, avg: 69.75, max: 500)

Run: 102, exploration: 0.01, score: 9
Scores: (min: 8, avg: 69.69, max: 500)

Run: 103, exploration: 0.01, score: 8
Scores: (min: 8, avg: 69.58, max: 500)

Run: 104, exploration: 0.01, score: 9
Scores: (min: 8, avg: 69

Run: 197, exploration: 0.01, score: 281
Scores: (min: 8, avg: 44.46, max: 500)

Run: 198, exploration: 0.01, score: 500
Scores: (min: 8, avg: 49.36, max: 500)

Run: 199, exploration: 0.01, score: 500
Scores: (min: 8, avg: 54.24, max: 500)

Run: 200, exploration: 0.01, score: 500
Scores: (min: 8, avg: 59.15, max: 500)

Run: 201, exploration: 0.01, score: 500
Scores: (min: 8, avg: 64.05, max: 500)

Run: 202, exploration: 0.01, score: 500
Scores: (min: 8, avg: 68.96, max: 500)

Run: 203, exploration: 0.01, score: 500
Scores: (min: 8, avg: 73.88, max: 500)

Run: 204, exploration: 0.01, score: 500
Scores: (min: 8, avg: 78.79, max: 500)

Run: 205, exploration: 0.01, score: 500
Scores: (min: 8, avg: 83.71, max: 500)

Run: 206, exploration: 0.01, score: 500
Scores: (min: 8, avg: 88.62, max: 500)

Run: 207, exploration: 0.01, score: 500
Scores: (min: 8, avg: 93.52, max: 500)

Run: 208, exploration: 0.01, score: 500
Scores: (min: 8, avg: 98.43, max: 500)

Run: 209, exploration: 0.01, score: 500


NameError: name 'exit' is not defined
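
One way to read the GAMMA comments in the cell above is that the discount factor sets how far ahead the Q-value target effectively looks. The sketch below is not part of the assignment code, and `effective_horizon` and `discounted_return` are hypothetical helper names. It relies on the standard geometric-series result that a constant reward of 1 per step, discounted by gamma, sums to at most 1 / (1 - gamma).

```python
def effective_horizon(gamma):
    """Limit of 1 + gamma + gamma**2 + ... for 0 <= gamma < 1."""
    return 1.0 / (1.0 - gamma)


def discounted_return(gamma, steps, reward=1.0):
    """Discounted sum of a constant per-step reward over a finite number of steps."""
    return sum(reward * gamma ** t for t in range(steps))


# Compare the three discount factors mentioned in this notebook.
for gamma in (0.70, 0.95, 0.99):
    print("gamma={:.2f}  horizon~{:.1f} steps  return over 500 steps={:.2f}".format(
        gamma, effective_horizon(gamma), discounted_return(gamma, 500)))
```

With gamma = 0.70 the target weighs roughly the next 3 steps, while 0.99 weighs roughly the next 100. That may be part of why the 0.70 run noted in the comment did not reach a solution, though a single run per setting is not enough to be sure.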

In [5]:
# This run will vary the learning rate
#
import random  
import gym  
import numpy as np  
from collections import deque  
from keras.models import Sequential  
from keras.layers import Dense  
from keras.optimizers import Adam  
  
  
from scores.score_logger import ScoreLogger  
  
ENV_NAME = "CartPole-v1"  
  
GAMMA = 0.95  
# Change the learning rate from 0.001 to 0.002.
# Use a small increment because the previous, larger change took too long to run.
LEARNING_RATE = 0.002  
  
MEMORY_SIZE = 1000000  
BATCH_SIZE = 20  
  
EXPLORATION_MAX = 1.0  
EXPLORATION_MIN = 0.01  
EXPLORATION_DECAY = 0.995  
  
  
class DQNSolver:  
  
    def __init__(self, observation_space, action_space):  
        self.exploration_rate = EXPLORATION_MAX  
  
        self.action_space = action_space  
        self.memory = deque(maxlen=MEMORY_SIZE)  
  
        self.model = Sequential()  
        self.model.add(Dense(24, input_shape=(observation_space,), activation="relu"))  
        self.model.add(Dense(24, activation="relu"))  
        self.model.add(Dense(self.action_space, activation="linear"))  
        self.model.compile(loss="mse", optimizer=Adam(lr=LEARNING_RATE))  
  
    def remember(self, state, action, reward, next_state, done):  
        self.memory.append((state, action, reward, next_state, done))  
  
    def act(self, state):  
        if np.random.rand() < self.exploration_rate:  
            return random.randrange(self.action_space)  
        q_values = self.model.predict(state)  
        return np.argmax(q_values[0])  
  
    def experience_replay(self):  
        if len(self.memory) < BATCH_SIZE:  
            return  
        batch = random.sample(self.memory, BATCH_SIZE)  
        for state, action, reward, state_next, terminal in batch:  
            q_update = reward  
            if not terminal:  
                q_update = (reward + GAMMA * np.amax(self.model.predict(state_next)[0]))  
            q_values = self.model.predict(state)  
            q_values[0][action] = q_update  
            self.model.fit(state, q_values, verbose=0)  
        self.exploration_rate *= EXPLORATION_DECAY  
        self.exploration_rate = max(EXPLORATION_MIN, self.exploration_rate)  
  
  
def cartpole():  
    env = gym.make(ENV_NAME)  
    score_logger = ScoreLogger(ENV_NAME)  
    observation_space = env.observation_space.shape[0]  
    action_space = env.action_space.n  
    dqn_solver = DQNSolver(observation_space, action_space)  
    run = 0  
    while True:  
        run += 1  
        state = env.reset()  
        state = np.reshape(state, [1, observation_space])  
        step = 0  
        while True:  
            step += 1  
            #env.render()  
            action = dqn_solver.act(state)  
            state_next, reward, terminal, info = env.step(action)  
            reward = reward if not terminal else -reward  
            state_next = np.reshape(state_next, [1, observation_space])  
            dqn_solver.remember(state, action, reward, state_next, terminal)  
            state = state_next  
            if terminal:  
                print ("Run: " + str(run) + ", exploration: " + str(dqn_solver.exploration_rate) + ", score: " + str(step))  
                score_logger.add_score(step, run)  
                break  
            dqn_solver.experience_replay()  



In [6]:
cartpole()

Run: 1, exploration: 1.0, score: 16
Scores: (min: 16, avg: 16, max: 16)

Run: 2, exploration: 0.9416228069143757, score: 16
Scores: (min: 16, avg: 16, max: 16)

Run: 3, exploration: 0.9000874278732445, score: 10
Scores: (min: 10, avg: 14, max: 16)

Run: 4, exploration: 0.8603841919146962, score: 10
Scores: (min: 10, avg: 13, max: 16)

Run: 5, exploration: 0.8061065909263957, score: 14
Scores: (min: 10, avg: 13.2, max: 16)

Run: 6, exploration: 0.736559652908221, score: 19
Scores: (min: 10, avg: 14.166666666666666, max: 19)

Run: 7, exploration: 0.6149486215357263, score: 37
Scores: (min: 10, avg: 17.428571428571427, max: 37)

Run: 8, exploration: 0.567555222460375, score: 17
Scores: (min: 10, avg: 17.375, max: 37)

Run: 9, exploration: 0.5290920728090721, score: 15
Scores: (min: 10, avg: 17.11111111111111, max: 37)

Run: 10, exploration: 0.4529463432347434, score: 32
Scores: (min: 10, avg: 18.6, max: 37)

Run: 11, exploration: 0.31890579420988907, score: 71
Scores: (min: 10, avg: 23.36

Run: 91, exploration: 0.01, score: 108
Scores: (min: 10, avg: 103.71428571428571, max: 293)

Run: 92, exploration: 0.01, score: 166
Scores: (min: 10, avg: 104.3913043478261, max: 293)

Run: 93, exploration: 0.01, score: 139
Scores: (min: 10, avg: 104.76344086021506, max: 293)

Run: 94, exploration: 0.01, score: 164
Scores: (min: 10, avg: 105.3936170212766, max: 293)

Run: 95, exploration: 0.01, score: 176
Scores: (min: 10, avg: 106.13684210526316, max: 293)

Run: 96, exploration: 0.01, score: 114
Scores: (min: 10, avg: 106.21875, max: 293)

Run: 97, exploration: 0.01, score: 127
Scores: (min: 10, avg: 106.43298969072166, max: 293)

Run: 98, exploration: 0.01, score: 260
Scores: (min: 10, avg: 108, max: 293)

Run: 99, exploration: 0.01, score: 127
Scores: (min: 10, avg: 108.1919191919192, max: 293)

Run: 100, exploration: 0.01, score: 118
Scores: (min: 10, avg: 108.29, max: 293)

Run: 101, exploration: 0.01, score: 45
Scores: (min: 10, avg: 108.58, max: 293)

Run: 102, exploration: 0.01

Run: 191, exploration: 0.01, score: 131
Scores: (min: 11, avg: 136.83, max: 347)

Run: 192, exploration: 0.01, score: 143
Scores: (min: 11, avg: 136.6, max: 347)

Run: 193, exploration: 0.01, score: 202
Scores: (min: 11, avg: 137.23, max: 347)

Run: 194, exploration: 0.01, score: 99
Scores: (min: 11, avg: 136.58, max: 347)

Run: 195, exploration: 0.01, score: 89
Scores: (min: 11, avg: 135.71, max: 347)

Run: 196, exploration: 0.01, score: 147
Scores: (min: 11, avg: 136.04, max: 347)

Run: 197, exploration: 0.01, score: 217
Scores: (min: 11, avg: 136.94, max: 347)

Run: 198, exploration: 0.01, score: 162
Scores: (min: 11, avg: 135.96, max: 347)

Run: 199, exploration: 0.01, score: 217
Scores: (min: 11, avg: 136.86, max: 347)

Run: 200, exploration: 0.01, score: 109
Scores: (min: 11, avg: 136.77, max: 347)

Run: 201, exploration: 0.01, score: 127
Scores: (min: 11, avg: 137.59, max: 347)

Run: 202, exploration: 0.01, score: 103
Scores: (min: 11, avg: 136.08, max: 347)

Run: 203, explorati

Run: 294, exploration: 0.01, score: 10
Scores: (min: 8, avg: 60.09, max: 354)

Run: 295, exploration: 0.01, score: 10
Scores: (min: 8, avg: 59.3, max: 354)

Run: 296, exploration: 0.01, score: 10
Scores: (min: 8, avg: 57.93, max: 354)

Run: 297, exploration: 0.01, score: 10
Scores: (min: 8, avg: 55.86, max: 354)

Run: 298, exploration: 0.01, score: 9
Scores: (min: 8, avg: 54.33, max: 354)

Run: 299, exploration: 0.01, score: 10
Scores: (min: 8, avg: 52.26, max: 354)

Run: 300, exploration: 0.01, score: 9
Scores: (min: 8, avg: 51.26, max: 354)

Run: 301, exploration: 0.01, score: 10
Scores: (min: 8, avg: 50.09, max: 354)

Run: 302, exploration: 0.01, score: 10
Scores: (min: 8, avg: 49.16, max: 354)

Run: 303, exploration: 0.01, score: 9
Scores: (min: 8, avg: 48.68, max: 354)

Run: 304, exploration: 0.01, score: 10
Scores: (min: 8, avg: 48.62, max: 354)

Run: 305, exploration: 0.01, score: 10
Scores: (min: 8, avg: 48.03, max: 354)

Run: 306, exploration: 0.01, score: 10
Scores: (min: 8, 

Run: 399, exploration: 0.01, score: 9
Scores: (min: 8, avg: 22.14, max: 261)

Run: 400, exploration: 0.01, score: 9
Scores: (min: 8, avg: 22.14, max: 261)

Run: 401, exploration: 0.01, score: 11
Scores: (min: 8, avg: 22.15, max: 261)

Run: 402, exploration: 0.01, score: 10
Scores: (min: 8, avg: 22.15, max: 261)

Run: 403, exploration: 0.01, score: 10
Scores: (min: 8, avg: 22.16, max: 261)

Run: 404, exploration: 0.01, score: 8
Scores: (min: 8, avg: 22.14, max: 261)

Run: 405, exploration: 0.01, score: 10
Scores: (min: 8, avg: 22.14, max: 261)

Run: 406, exploration: 0.01, score: 9
Scores: (min: 8, avg: 22.13, max: 261)

Run: 407, exploration: 0.01, score: 9
Scores: (min: 8, avg: 22.12, max: 261)

Run: 408, exploration: 0.01, score: 10
Scores: (min: 8, avg: 22.12, max: 261)

Run: 409, exploration: 0.01, score: 9
Scores: (min: 8, avg: 22.12, max: 261)

Run: 410, exploration: 0.01, score: 9
Scores: (min: 8, avg: 22.1, max: 261)

Run: 411, exploration: 0.01, score: 9
Scores: (min: 8, avg: 

Run: 504, exploration: 0.01, score: 22
Scores: (min: 8, avg: 14.31, max: 142)

Run: 505, exploration: 0.01, score: 8
Scores: (min: 8, avg: 14.29, max: 142)

Run: 506, exploration: 0.01, score: 10
Scores: (min: 8, avg: 14.3, max: 142)

Run: 507, exploration: 0.01, score: 9
Scores: (min: 8, avg: 14.3, max: 142)

Run: 508, exploration: 0.01, score: 14
Scores: (min: 8, avg: 14.34, max: 142)

Run: 509, exploration: 0.01, score: 15
Scores: (min: 8, avg: 14.4, max: 142)

Run: 510, exploration: 0.01, score: 10
Scores: (min: 8, avg: 14.41, max: 142)

Run: 511, exploration: 0.01, score: 9
Scores: (min: 8, avg: 14.41, max: 142)

Run: 512, exploration: 0.01, score: 26
Scores: (min: 8, avg: 14.56, max: 142)

Run: 513, exploration: 0.01, score: 15
Scores: (min: 8, avg: 14.61, max: 142)

Run: 514, exploration: 0.01, score: 21
Scores: (min: 8, avg: 14.72, max: 142)

Run: 515, exploration: 0.01, score: 15
Scores: (min: 8, avg: 14.78, max: 142)

Run: 516, exploration: 0.01, score: 11
Scores: (min: 8, av

Run: 610, exploration: 0.01, score: 63
Scores: (min: 8, avg: 28.05, max: 117)

Run: 611, exploration: 0.01, score: 66
Scores: (min: 8, avg: 28.62, max: 117)

Run: 612, exploration: 0.01, score: 79
Scores: (min: 8, avg: 29.15, max: 117)

Run: 613, exploration: 0.01, score: 20
Scores: (min: 8, avg: 29.2, max: 117)

Run: 614, exploration: 0.01, score: 12
Scores: (min: 8, avg: 29.11, max: 117)

Run: 615, exploration: 0.01, score: 74
Scores: (min: 8, avg: 29.7, max: 117)

Run: 616, exploration: 0.01, score: 117
Scores: (min: 8, avg: 30.76, max: 117)

Run: 617, exploration: 0.01, score: 82
Scores: (min: 8, avg: 31.47, max: 117)

Run: 618, exploration: 0.01, score: 36
Scores: (min: 8, avg: 31.73, max: 117)

Run: 619, exploration: 0.01, score: 11
Scores: (min: 8, avg: 31.73, max: 117)

Run: 620, exploration: 0.01, score: 51
Scores: (min: 8, avg: 31.98, max: 117)

Run: 621, exploration: 0.01, score: 20
Scores: (min: 8, avg: 32.09, max: 117)

Run: 622, exploration: 0.01, score: 10
Scores: (min: 

Run: 714, exploration: 0.01, score: 262
Scores: (min: 9, avg: 91.65, max: 500)

Run: 715, exploration: 0.01, score: 500
Scores: (min: 9, avg: 95.91, max: 500)

Run: 716, exploration: 0.01, score: 229
Scores: (min: 9, avg: 97.03, max: 500)

Run: 717, exploration: 0.01, score: 302
Scores: (min: 9, avg: 99.23, max: 500)

Run: 718, exploration: 0.01, score: 276
Scores: (min: 9, avg: 101.63, max: 500)

Run: 719, exploration: 0.01, score: 315
Scores: (min: 9, avg: 104.67, max: 500)

Run: 720, exploration: 0.01, score: 344
Scores: (min: 9, avg: 107.6, max: 500)

Run: 721, exploration: 0.01, score: 317
Scores: (min: 9, avg: 110.57, max: 500)

Run: 722, exploration: 0.01, score: 354
Scores: (min: 9, avg: 114.01, max: 500)

Run: 723, exploration: 0.01, score: 500
Scores: (min: 9, avg: 118.34, max: 500)

Run: 724, exploration: 0.01, score: 328
Scores: (min: 9, avg: 120.87, max: 500)

Run: 725, exploration: 0.01, score: 471
Scores: (min: 9, avg: 124.35, max: 500)

Run: 726, exploration: 0.01, scor

NameError: name 'exit' is not defined

In [7]:
# This run changes the exploration settings
#
import random  
import gym  
import numpy as np  
from collections import deque  
from keras.models import Sequential  
from keras.layers import Dense  
from keras.optimizers import Adam  
  
  
from scores.score_logger import ScoreLogger  
  
ENV_NAME = "CartPole-v1"  
  
GAMMA = 0.95  
LEARNING_RATE = 0.001  
  
MEMORY_SIZE = 1000000  
BATCH_SIZE = 20  
# Change the exploration floor: EXPLORATION_MIN raised from 0.01 to 0.05
EXPLORATION_MAX = 1.0  
EXPLORATION_MIN = 0.05  
EXPLORATION_DECAY = 0.995  
  
  
class DQNSolver:  
  
    def __init__(self, observation_space, action_space):  
        self.exploration_rate = EXPLORATION_MAX  
  
        self.action_space = action_space  
        self.memory = deque(maxlen=MEMORY_SIZE)  
  
        self.model = Sequential()  
        self.model.add(Dense(24, input_shape=(observation_space,), activation="relu"))  
        self.model.add(Dense(24, activation="relu"))  
        self.model.add(Dense(self.action_space, activation="linear"))  
        self.model.compile(loss="mse", optimizer=Adam(lr=LEARNING_RATE))  
  
    def remember(self, state, action, reward, next_state, done):  
        self.memory.append((state, action, reward, next_state, done))  
  
    def act(self, state):  
        if np.random.rand() < self.exploration_rate:  
            return random.randrange(self.action_space)  
        q_values = self.model.predict(state)  
        return np.argmax(q_values[0])  
  
    def experience_replay(self):  
        if len(self.memory) < BATCH_SIZE:  
            return  
        batch = random.sample(self.memory, BATCH_SIZE)  
        for state, action, reward, state_next, terminal in batch:  
            q_update = reward  
            if not terminal:  
                q_update = (reward + GAMMA * np.amax(self.model.predict(state_next)[0]))  
            q_values = self.model.predict(state)  
            q_values[0][action] = q_update  
            self.model.fit(state, q_values, verbose=0)  
        self.exploration_rate *= EXPLORATION_DECAY  
        self.exploration_rate = max(EXPLORATION_MIN, self.exploration_rate)  
  
  
def cartpole():  
    env = gym.make(ENV_NAME)  
    score_logger = ScoreLogger(ENV_NAME)  
    observation_space = env.observation_space.shape[0]  
    action_space = env.action_space.n  
    dqn_solver = DQNSolver(observation_space, action_space)  
    run = 0  
    while True:  
        run += 1  
        state = env.reset()  
        state = np.reshape(state, [1, observation_space])  
        step = 0  
        while True:  
            step += 1  
            #env.render()  
            action = dqn_solver.act(state)  
            state_next, reward, terminal, info = env.step(action)  
            reward = reward if not terminal else -reward  
            state_next = np.reshape(state_next, [1, observation_space])  
            dqn_solver.remember(state, action, reward, state_next, terminal)  
            state = state_next  
            if terminal:  
                print ("Run: " + str(run) + ", exploration: " + str(dqn_solver.exploration_rate) + ", score: " + str(step))  
                score_logger.add_score(step, run)  
                break  
            dqn_solver.experience_replay()  
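
The exploration settings in this cell follow a multiplicative decay with a floor, applied once per `experience_replay` call that actually trains. The sketch below is a standalone illustration, not part of the assignment code, and `steps_to_floor` and `decayed_rate` are hypothetical helper names. It counts how many decays it takes for the rate to reach the floor under the constants used in this notebook.

```python
def steps_to_floor(start, floor, decay):
    """Count multiplicative decays until the rate is at or below the floor."""
    rate, steps = start, 0
    while rate > floor:
        rate *= decay
        steps += 1
    return steps


def decayed_rate(start, floor, decay, steps):
    """Exploration rate after a given number of decays, clamped at the floor."""
    return max(floor, start * decay ** steps)


# Compare the original floor (0.01) with the one used in this run (0.05).
for floor in (0.01, 0.05):
    print("floor={}: reached after {} decays".format(
        floor, steps_to_floor(1.0, floor, 0.995)))
```

Because the replay step returns early until the memory holds BATCH_SIZE transitions, the first 19 steps of training do not decay the rate. This is consistent with the log for this run: a 41-step first episode gives 21 decays and an exploration rate of about 0.9001.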



In [8]:
cartpole()

Run: 1, exploration: 0.9000874278732445, score: 41
Scores: (min: 41, avg: 41, max: 41)

Run: 2, exploration: 0.8020760579717637, score: 24
Scores: (min: 24, avg: 32.5, max: 41)

Run: 3, exploration: 0.7439808620067382, score: 16
Scores: (min: 16, avg: 27, max: 41)

Run: 4, exploration: 0.6832098777212641, score: 18
Scores: (min: 16, avg: 24.75, max: 41)

Run: 5, exploration: 0.6465587967553006, score: 12
Scores: (min: 12, avg: 22.2, max: 41)

Run: 6, exploration: 0.6118738784280476, score: 12
Scores: (min: 12, avg: 20.5, max: 41)

Run: 7, exploration: 0.547986285490042, score: 23
Scores: (min: 12, avg: 20.857142857142858, max: 41)

Run: 8, exploration: 0.5185893309484582, score: 12
Scores: (min: 12, avg: 19.75, max: 41)

Run: 9, exploration: 0.4858739637363176, score: 14
Scores: (min: 12, avg: 19.11111111111111, max: 41)

Run: 10, exploration: 0.4222502236424958, score: 29
Scores: (min: 12, avg: 20.1, max: 41)

Run: 11, exploration: 0.3995984329713264, score: 12
Scores: (min: 12, avg: 

Run: 88, exploration: 0.05, score: 25
Scores: (min: 8, avg: 18.988636363636363, max: 150)

Run: 89, exploration: 0.05, score: 29
Scores: (min: 8, avg: 19.10112359550562, max: 150)

Run: 90, exploration: 0.05, score: 50
Scores: (min: 8, avg: 19.444444444444443, max: 150)

Run: 91, exploration: 0.05, score: 47
Scores: (min: 8, avg: 19.747252747252748, max: 150)

Run: 92, exploration: 0.05, score: 41
Scores: (min: 8, avg: 19.97826086956522, max: 150)

Run: 93, exploration: 0.05, score: 125
Scores: (min: 8, avg: 21.107526881720432, max: 150)

Run: 94, exploration: 0.05, score: 176
Scores: (min: 8, avg: 22.75531914893617, max: 176)

Run: 95, exploration: 0.05, score: 95
Scores: (min: 8, avg: 23.51578947368421, max: 176)

Run: 96, exploration: 0.05, score: 188
Scores: (min: 8, avg: 25.229166666666668, max: 188)

Run: 97, exploration: 0.05, score: 138
Scores: (min: 8, avg: 26.391752577319586, max: 188)

Run: 98, exploration: 0.05, score: 117
Scores: (min: 8, avg: 27.316326530612244, max: 188)

Run: 189, exploration: 0.05, score: 267
Scores: (min: 35, avg: 160.41, max: 349)

Run: 190, exploration: 0.05, score: 75
Scores: (min: 35, avg: 160.66, max: 349)

Run: 191, exploration: 0.05, score: 146
Scores: (min: 35, avg: 161.65, max: 349)

Run: 192, exploration: 0.05, score: 293
Scores: (min: 35, avg: 164.17, max: 349)

Run: 193, exploration: 0.05, score: 228
Scores: (min: 35, avg: 165.2, max: 349)

Run: 194, exploration: 0.05, score: 141
Scores: (min: 35, avg: 164.85, max: 349)

Run: 195, exploration: 0.05, score: 108
Scores: (min: 35, avg: 164.98, max: 349)

Run: 196, exploration: 0.05, score: 119
Scores: (min: 35, avg: 164.29, max: 349)

Run: 197, exploration: 0.05, score: 193
Scores: (min: 35, avg: 164.84, max: 349)

Run: 198, exploration: 0.05, score: 194
Scores: (min: 35, avg: 165.61, max: 349)

Run: 199, exploration: 0.05, score: 144
Scores: (min: 35, avg: 164.93, max: 349)

Run: 200, exploration: 0.05, score: 101
Scores: (min: 35, avg: 164.36, max: 349)

Run: 201, explorat

Run: 290, exploration: 0.05, score: 120
Scores: (min: 10, avg: 146.35, max: 299)

Run: 291, exploration: 0.05, score: 500
Scores: (min: 10, avg: 149.89, max: 500)

Run: 292, exploration: 0.05, score: 102
Scores: (min: 10, avg: 147.98, max: 500)

Run: 293, exploration: 0.05, score: 142
Scores: (min: 10, avg: 147.12, max: 500)

Run: 294, exploration: 0.05, score: 300
Scores: (min: 10, avg: 148.71, max: 500)

Run: 295, exploration: 0.05, score: 90
Scores: (min: 10, avg: 148.53, max: 500)

Run: 296, exploration: 0.05, score: 109
Scores: (min: 10, avg: 148.43, max: 500)

Run: 297, exploration: 0.05, score: 78
Scores: (min: 10, avg: 147.28, max: 500)

Run: 298, exploration: 0.05, score: 103
Scores: (min: 10, avg: 146.37, max: 500)

Run: 299, exploration: 0.05, score: 156
Scores: (min: 10, avg: 146.49, max: 500)

Run: 300, exploration: 0.05, score: 164
Scores: (min: 10, avg: 147.12, max: 500)

Run: 301, exploration: 0.05, score: 168
Scores: (min: 10, avg: 147.78, max: 500)

Run: 302, explorat

Run: 391, exploration: 0.05, score: 132
Scores: (min: 11, avg: 145.79, max: 468)

Run: 392, exploration: 0.05, score: 89
Scores: (min: 11, avg: 145.66, max: 468)

Run: 393, exploration: 0.05, score: 185
Scores: (min: 11, avg: 146.09, max: 468)

Run: 394, exploration: 0.05, score: 29
Scores: (min: 11, avg: 143.38, max: 468)

Run: 395, exploration: 0.05, score: 148
Scores: (min: 11, avg: 143.96, max: 468)

Run: 396, exploration: 0.05, score: 253
Scores: (min: 11, avg: 145.4, max: 468)

Run: 397, exploration: 0.05, score: 170
Scores: (min: 11, avg: 146.32, max: 468)

Run: 398, exploration: 0.05, score: 143
Scores: (min: 11, avg: 146.72, max: 468)

Run: 399, exploration: 0.05, score: 85
Scores: (min: 11, avg: 146.01, max: 468)

Run: 400, exploration: 0.05, score: 180
Scores: (min: 11, avg: 146.17, max: 468)

Run: 401, exploration: 0.05, score: 115
Scores: (min: 11, avg: 145.64, max: 468)

Run: 402, exploration: 0.05, score: 78
Scores: (min: 11, avg: 143.28, max: 468)

Run: 403, exploration

Run: 492, exploration: 0.05, score: 287
Scores: (min: 11, avg: 164.7, max: 500)

Run: 493, exploration: 0.05, score: 500
Scores: (min: 11, avg: 167.85, max: 500)

Run: 494, exploration: 0.05, score: 95
Scores: (min: 11, avg: 168.51, max: 500)

Run: 495, exploration: 0.05, score: 107
Scores: (min: 11, avg: 168.1, max: 500)

Run: 496, exploration: 0.05, score: 113
Scores: (min: 11, avg: 166.7, max: 500)

Run: 497, exploration: 0.05, score: 112
Scores: (min: 11, avg: 166.12, max: 500)

Run: 498, exploration: 0.05, score: 239
Scores: (min: 11, avg: 167.08, max: 500)

Run: 499, exploration: 0.05, score: 40
Scores: (min: 11, avg: 166.63, max: 500)

Run: 500, exploration: 0.05, score: 161
Scores: (min: 11, avg: 166.44, max: 500)

Run: 501, exploration: 0.05, score: 108
Scores: (min: 11, avg: 166.37, max: 500)

Run: 502, exploration: 0.05, score: 277
Scores: (min: 11, avg: 168.36, max: 500)

Run: 503, exploration: 0.05, score: 77
Scores: (min: 11, avg: 167.04, max: 500)

Run: 504, exploration:

Run: 593, exploration: 0.05, score: 68
Scores: (min: 9, avg: 111.63, max: 500)

Run: 594, exploration: 0.05, score: 171
Scores: (min: 9, avg: 112.39, max: 500)

Run: 595, exploration: 0.05, score: 209
Scores: (min: 9, avg: 113.41, max: 500)

Run: 596, exploration: 0.05, score: 153
Scores: (min: 9, avg: 113.81, max: 500)

Run: 597, exploration: 0.05, score: 155
Scores: (min: 9, avg: 114.24, max: 500)

Run: 598, exploration: 0.05, score: 330
Scores: (min: 9, avg: 115.15, max: 500)

Run: 599, exploration: 0.05, score: 189
Scores: (min: 9, avg: 116.64, max: 500)

Run: 600, exploration: 0.05, score: 189
Scores: (min: 9, avg: 116.92, max: 500)

Run: 601, exploration: 0.05, score: 117
Scores: (min: 9, avg: 117.01, max: 500)

Run: 602, exploration: 0.05, score: 258
Scores: (min: 9, avg: 116.82, max: 500)

Run: 603, exploration: 0.05, score: 95
Scores: (min: 9, avg: 117, max: 500)

Run: 604, exploration: 0.05, score: 21
Scores: (min: 9, avg: 116.03, max: 500)

Run: 605, exploration: 0.05, score

Run: 696, exploration: 0.05, score: 228
Scores: (min: 9, avg: 135.59, max: 500)

Run: 697, exploration: 0.05, score: 138
Scores: (min: 9, avg: 135.42, max: 500)

Run: 698, exploration: 0.05, score: 136
Scores: (min: 9, avg: 133.48, max: 500)

Run: 699, exploration: 0.05, score: 134
Scores: (min: 9, avg: 132.93, max: 500)

Run: 700, exploration: 0.05, score: 486
Scores: (min: 9, avg: 135.9, max: 500)

Run: 701, exploration: 0.05, score: 151
Scores: (min: 9, avg: 136.24, max: 500)

Run: 702, exploration: 0.05, score: 149
Scores: (min: 9, avg: 135.15, max: 500)

Run: 703, exploration: 0.05, score: 121
Scores: (min: 9, avg: 135.41, max: 500)

Run: 704, exploration: 0.05, score: 185
Scores: (min: 9, avg: 137.05, max: 500)

Run: 705, exploration: 0.05, score: 139
Scores: (min: 9, avg: 138.21, max: 500)

Run: 706, exploration: 0.05, score: 156
Scores: (min: 9, avg: 139.63, max: 500)

Run: 707, exploration: 0.05, score: 133
Scores: (min: 9, avg: 140.85, max: 500)

Run: 708, exploration: 0.05, 

Run: 798, exploration: 0.05, score: 143
Scores: (min: 12, avg: 103.45, max: 486)

Run: 799, exploration: 0.05, score: 120
Scores: (min: 12, avg: 103.31, max: 486)

Run: 800, exploration: 0.05, score: 77
Scores: (min: 12, avg: 99.22, max: 420)

Run: 801, exploration: 0.05, score: 194
Scores: (min: 12, avg: 99.65, max: 420)

Run: 802, exploration: 0.05, score: 118
Scores: (min: 12, avg: 99.34, max: 420)

Run: 803, exploration: 0.05, score: 180
Scores: (min: 12, avg: 99.93, max: 420)

Run: 804, exploration: 0.05, score: 92
Scores: (min: 12, avg: 99, max: 420)

Run: 805, exploration: 0.05, score: 110
Scores: (min: 12, avg: 98.71, max: 420)

Run: 806, exploration: 0.05, score: 75
Scores: (min: 12, avg: 97.9, max: 420)

Run: 807, exploration: 0.05, score: 35
Scores: (min: 12, avg: 96.92, max: 420)

Run: 808, exploration: 0.05, score: 132
Scores: (min: 12, avg: 97.76, max: 420)

Run: 809, exploration: 0.05, score: 124
Scores: (min: 12, avg: 98.5, max: 420)

Run: 810, exploration: 0.05, score:

Run: 899, exploration: 0.05, score: 106
Scores: (min: 12, avg: 118.94, max: 452)

Run: 900, exploration: 0.05, score: 246
Scores: (min: 12, avg: 120.63, max: 452)

Run: 901, exploration: 0.05, score: 120
Scores: (min: 12, avg: 119.89, max: 452)

Run: 902, exploration: 0.05, score: 136
Scores: (min: 12, avg: 120.07, max: 452)

Run: 903, exploration: 0.05, score: 72
Scores: (min: 12, avg: 118.99, max: 452)

Run: 904, exploration: 0.05, score: 146
Scores: (min: 12, avg: 119.53, max: 452)

Run: 905, exploration: 0.05, score: 13
Scores: (min: 12, avg: 118.56, max: 452)

Run: 906, exploration: 0.05, score: 148
Scores: (min: 12, avg: 119.29, max: 452)

Run: 907, exploration: 0.05, score: 34
Scores: (min: 12, avg: 119.28, max: 452)

Run: 908, exploration: 0.05, score: 191
Scores: (min: 12, avg: 119.87, max: 452)

Run: 909, exploration: 0.05, score: 88
Scores: (min: 12, avg: 119.51, max: 452)

Run: 910, exploration: 0.05, score: 163
Scores: (min: 12, avg: 120.19, max: 452)

Run: 911, exploratio

Run: 1000, exploration: 0.05, score: 35
Scores: (min: 9, avg: 145.28, max: 500)

Run: 1001, exploration: 0.05, score: 9
Scores: (min: 9, avg: 144.17, max: 500)

Run: 1002, exploration: 0.05, score: 240
Scores: (min: 9, avg: 145.21, max: 500)

Run: 1003, exploration: 0.05, score: 94
Scores: (min: 9, avg: 145.43, max: 500)

Run: 1004, exploration: 0.05, score: 47
Scores: (min: 9, avg: 144.44, max: 500)

Run: 1005, exploration: 0.05, score: 112
Scores: (min: 9, avg: 145.43, max: 500)

Run: 1006, exploration: 0.05, score: 90
Scores: (min: 9, avg: 144.85, max: 500)

Run: 1007, exploration: 0.05, score: 56
Scores: (min: 9, avg: 145.07, max: 500)

Run: 1008, exploration: 0.05, score: 67
Scores: (min: 9, avg: 143.83, max: 500)

Run: 1009, exploration: 0.05, score: 182
Scores: (min: 9, avg: 144.77, max: 500)

Run: 1010, exploration: 0.05, score: 101
Scores: (min: 9, avg: 144.15, max: 500)

Run: 1011, exploration: 0.05, score: 10
Scores: (min: 9, avg: 143.18, max: 500)

Run: 1012, exploration: 0

Run: 1103, exploration: 0.05, score: 9
Scores: (min: 8, avg: 19.98, max: 182)

Run: 1104, exploration: 0.05, score: 10
Scores: (min: 8, avg: 19.61, max: 182)

Run: 1105, exploration: 0.05, score: 10
Scores: (min: 8, avg: 18.59, max: 182)

Run: 1106, exploration: 0.05, score: 9
Scores: (min: 8, avg: 17.78, max: 182)

Run: 1107, exploration: 0.05, score: 11
Scores: (min: 8, avg: 17.33, max: 182)

Run: 1108, exploration: 0.05, score: 9
Scores: (min: 8, avg: 16.75, max: 182)

Run: 1109, exploration: 0.05, score: 12
Scores: (min: 8, avg: 15.05, max: 101)

Run: 1110, exploration: 0.05, score: 8
Scores: (min: 8, avg: 14.12, max: 97)

Run: 1111, exploration: 0.05, score: 12
Scores: (min: 8, avg: 14.14, max: 97)

Run: 1112, exploration: 0.05, score: 9
Scores: (min: 8, avg: 14.13, max: 97)

Run: 1113, exploration: 0.05, score: 11
Scores: (min: 8, avg: 14.14, max: 97)

Run: 1114, exploration: 0.05, score: 9
Scores: (min: 8, avg: 14.12, max: 97)

Run: 1115, exploration: 0.05, score: 10
Scores: (mi

Run: 1208, exploration: 0.05, score: 36
Scores: (min: 8, avg: 15.16, max: 179)

Run: 1209, exploration: 0.05, score: 24
Scores: (min: 8, avg: 15.28, max: 179)

Run: 1210, exploration: 0.05, score: 33
Scores: (min: 8, avg: 15.53, max: 179)

Run: 1211, exploration: 0.05, score: 22
Scores: (min: 8, avg: 15.63, max: 179)

Run: 1212, exploration: 0.05, score: 16
Scores: (min: 8, avg: 15.7, max: 179)

Run: 1213, exploration: 0.05, score: 15
Scores: (min: 8, avg: 15.74, max: 179)

Run: 1214, exploration: 0.05, score: 16
Scores: (min: 8, avg: 15.81, max: 179)

Run: 1215, exploration: 0.05, score: 25
Scores: (min: 8, avg: 15.96, max: 179)

Run: 1216, exploration: 0.05, score: 20
Scores: (min: 8, avg: 16.07, max: 179)

Run: 1217, exploration: 0.05, score: 28
Scores: (min: 8, avg: 16.25, max: 179)

Run: 1218, exploration: 0.05, score: 25
Scores: (min: 8, avg: 16.42, max: 179)

Run: 1219, exploration: 0.05, score: 30
Scores: (min: 8, avg: 16.63, max: 179)

Run: 1220, exploration: 0.05, score: 25
S

Run: 1311, exploration: 0.05, score: 18
Scores: (min: 8, avg: 24.99, max: 151)

Run: 1312, exploration: 0.05, score: 18
Scores: (min: 8, avg: 25.01, max: 151)

Run: 1313, exploration: 0.05, score: 18
Scores: (min: 8, avg: 25.04, max: 151)

Run: 1314, exploration: 0.05, score: 19
Scores: (min: 8, avg: 25.07, max: 151)

Run: 1315, exploration: 0.05, score: 13
Scores: (min: 8, avg: 24.95, max: 151)

Run: 1316, exploration: 0.05, score: 31
Scores: (min: 8, avg: 25.06, max: 151)

Run: 1317, exploration: 0.05, score: 206
Scores: (min: 8, avg: 26.84, max: 206)

Run: 1318, exploration: 0.05, score: 27
Scores: (min: 8, avg: 26.86, max: 206)

Run: 1319, exploration: 0.05, score: 21
Scores: (min: 8, avg: 26.77, max: 206)

Run: 1320, exploration: 0.05, score: 17
Scores: (min: 8, avg: 26.69, max: 206)

Run: 1321, exploration: 0.05, score: 16
Scores: (min: 8, avg: 26.64, max: 206)

Run: 1322, exploration: 0.05, score: 10
Scores: (min: 8, avg: 26.55, max: 206)

Run: 1323, exploration: 0.05, score: 25

Run: 1413, exploration: 0.05, score: 173
Scores: (min: 10, avg: 122.14, max: 500)

Run: 1414, exploration: 0.05, score: 131
Scores: (min: 10, avg: 123.26, max: 500)

Run: 1415, exploration: 0.05, score: 209
Scores: (min: 10, avg: 125.22, max: 500)

Run: 1416, exploration: 0.05, score: 217
Scores: (min: 10, avg: 127.08, max: 500)

Run: 1417, exploration: 0.05, score: 309
Scores: (min: 10, avg: 128.11, max: 500)

Run: 1418, exploration: 0.05, score: 151
Scores: (min: 10, avg: 129.35, max: 500)

Run: 1419, exploration: 0.05, score: 208
Scores: (min: 10, avg: 131.22, max: 500)

Run: 1420, exploration: 0.05, score: 134
Scores: (min: 10, avg: 132.39, max: 500)

Run: 1421, exploration: 0.05, score: 113
Scores: (min: 10, avg: 133.36, max: 500)

Run: 1422, exploration: 0.05, score: 415
Scores: (min: 11, avg: 137.41, max: 500)

Run: 1423, exploration: 0.05, score: 499
Scores: (min: 11, avg: 142.15, max: 500)

Run: 1424, exploration: 0.05, score: 153
Scores: (min: 11, avg: 143.53, max: 500)

Run:

NameError: name 'exit' is not defined

The cartpole problem is a pole attached to a cart by a hinge so that it can pivot freely. The agent moves the cart left or right to keep the pole upright; in some robotic versions it must first swing the pole up, but in CartPole-v1 the pole starts nearly upright and only has to be balanced. The state consists of four values: the position of the cart, the velocity of the cart, the angle of the pole, and the angular velocity of the pole.
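As a small sketch of the four-value state described above, the snippet below names each component and checks the two failure conditions. The thresholds (cart position beyond 2.4, pole angle beyond 12 degrees) are the ones documented for gym's CartPole environments as I understand them; `CartPoleState` and `has_failed` are hypothetical helpers, not part of the notebook.

```python
import math
from typing import NamedTuple

# Assumed limits, taken from gym's CartPole documentation rather than this notebook.
X_LIMIT = 2.4                      # cart may not move further than this from the center
THETA_LIMIT = 12 * math.pi / 180   # 12 degrees, expressed in radians


class CartPoleState(NamedTuple):
    """Hypothetical wrapper naming the four observation values."""
    cart_position: float
    cart_velocity: float
    pole_angle: float
    pole_angular_velocity: float


def has_failed(state: CartPoleState) -> bool:
    """Return True when the cart has left the track or the pole has tipped too far."""
    return abs(state.cart_position) > X_LIMIT or abs(state.pole_angle) > THETA_LIMIT


# An observation arrives as a flat array; unpacking it gives readable field names.
observation = [0.01, -0.02, 0.03, 0.04]
state = CartPoleState(*observation)
print(state.pole_angle, has_failed(state))
```

Naming the fields this way is only for readability; the notebook itself passes the raw array straight to the network.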

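This notebook uses a Deep Q-Network (DQN) algorithm. The experience replay function is important because the agent needs to remember past tries and not get stuck in a rut, so to speak: it stores each step in memory and trains on a random sample of those steps, so the network does not learn only from the most recent, highly similar ones.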
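The replay idea can be sketched without Keras. The version below mirrors the notebook's `remember` and `experience_replay` logic, but it is an illustration only: the memory is much smaller than the notebook's, the stored states are placeholder integers, and `q_target` is a hypothetical helper that takes the next state's Q-values as a plain list instead of calling `model.predict`.

```python
import random
from collections import deque

GAMMA = 0.95      # discount factor, same value as in the notebook
BATCH_SIZE = 20   # minibatch size, same value as in the notebook

# Bounded memory: once full, the oldest transitions are dropped automatically.
memory = deque(maxlen=1000)


def remember(state, action, reward, next_state, done):
    """Store one transition."""
    memory.append((state, action, reward, next_state, done))


def sample_batch(batch_size=BATCH_SIZE):
    """Return a random minibatch, or an empty list if too few transitions are stored."""
    if len(memory) < batch_size:
        return []
    return random.sample(list(memory), batch_size)


def q_target(reward, next_q_values, terminal, gamma=GAMMA):
    """Target for the chosen action: reward alone at episode end, else reward plus discounted best next value."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)


# Fill the memory with placeholder transitions and draw one minibatch.
for step in range(50):
    remember(step, step % 2, 1.0, step + 1, step == 49)

batch = sample_batch()
print(len(batch))
print(q_target(1.0, [0.5, 2.0], terminal=False))
```

Because the minibatch is drawn at random, consecutive steps from the same episode rarely end up in the same update, which is the point of replay.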

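This implementation uses a fairly simple neural network: the four state values feed two fully connected layers of 24 ReLU units each, followed by a linear output layer with one Q-value per action (push left or push right). The network estimates how good each action is in a given state, while the replay memory is what allows past tries to be remembered.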
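To give a sense of how small this network is, the sketch below counts its trainable parameters. The layer sizes are read from the `Dense` layers in the notebook's code (4 inputs, then 24, 24, and 2 units); `dense_params` is a hypothetical helper, and the count assumes each dense layer has one weight per input-unit pair plus one bias per unit.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# 4 observation values -> 24 units -> 24 units -> 2 Q-values
layer_sizes = [4, 24, 24, 2]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)       # [120, 600, 50]
print(sum(per_layer))  # 770
```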

Increasing the learning rate (doubling it from 0.001 to 0.002 in my experiment) drastically increased the number of runs needed to solve the problem. I believe this is because each update then changes the network weights more, so what is learned from newly sampled steps overwrites too much of what was learned from earlier trials.
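A toy example can show why a larger step size may slow learning down rather than speed it up. This is not the DQN: it is plain gradient descent on a one-dimensional quadratic, the curvature value is chosen purely for illustration, and the notebook's Adam optimizer adapts its steps in ways this sketch ignores.

```python
def descend(learning_rate, curvature=900.0, start=1.0, steps=10):
    """Minimize 0.5 * curvature * w**2 by gradient descent and return every value of w."""
    w = start
    history = [w]
    for _ in range(steps):
        gradient = curvature * w       # derivative of 0.5 * curvature * w**2
        w -= learning_rate * gradient  # standard gradient descent update
        history.append(w)
    return history


small_step = descend(0.001)  # each update shrinks w toward 0 without changing its sign
large_step = descend(0.002)  # each update overshoots 0, so w flips sign on every step

print(abs(small_step[-1]))
print(abs(large_step[-1]))
```

With the smaller rate, `w` moves steadily toward the minimum; with the doubled rate, it jumps past the minimum on every step and ends up much further away after the same number of updates.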

PilcoLearner. (2011, May 26). Cart-Pole swing-up [Video]. YouTube. https://www.youtube.com/watch?v=XiigTGKZfks

PyLessons. (n.d.). Introduction to Reinforcement Learning. https://pylessons.com/CartPole-reinforcement-learning