# Module Five Assignment: Cartpole Problem
Review the code in this notebook and in the score_logger.py file in the *scores* folder (directory). Once you have reviewed the code, return to this notebook and select **Cell** and then **Run All** from the menu bar to run this code. The code takes several minutes to run.

In [3]:
import random  
import gym  
import numpy as np  
from collections import deque  
from keras.models import Sequential  
from keras.layers import Dense  
from keras.optimizers import Adam  
  
  
from scores.score_logger import ScoreLogger  
  
ENV_NAME = "CartPole-v1"  
  
#GAMMA = 0.95
GAMMA = 0.96
LEARNING_RATE = 0.001  
  
#MEMORY_SIZE = 1000000
MEMORY_SIZE = 1000000
BATCH_SIZE = 20  
  
#EXPLORATION_MAX = 1.0  
#EXPLORATION_MIN = 0.01  
#EXPLORATION_DECAY = 0.995 

EXPLORATION_MAX = 1.0  
EXPLORATION_MIN = 0.02  
EXPLORATION_DECAY = 0.995 
  
  
class DQNSolver:  
  
    def __init__(self, observation_space, action_space):  
        self.exploration_rate = EXPLORATION_MAX  
  
        self.action_space = action_space  
        self.memory = deque(maxlen=MEMORY_SIZE)  
  
        self.model = Sequential()  
        self.model.add(Dense(24, input_shape=(observation_space,), activation="relu"))  
        self.model.add(Dense(24, activation="relu"))  
        self.model.add(Dense(self.action_space, activation="linear"))  
        # Note: newer Keras releases name this argument learning_rate instead of lr.
        self.model.compile(loss="mse", optimizer=Adam(lr=LEARNING_RATE))  
  
    def remember(self, state, action, reward, next_state, done):  
        self.memory.append((state, action, reward, next_state, done))  
  
    def act(self, state):  
        if np.random.rand() < self.exploration_rate:  
            return random.randrange(self.action_space)  
        q_values = self.model.predict(state)  
        return np.argmax(q_values[0])  
  
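    def experience_replay(self):  
        # Wait until the memory holds at least one full batch.
        if len(self.memory) < BATCH_SIZE:  
            return  
        batch = random.sample(self.memory, BATCH_SIZE)  
        for state, action, reward, state_next, terminal in batch:  
            # Target is the reward alone on terminal steps, otherwise the
            # reward plus the discounted best Q-value of the next state.
            q_update = reward  
            if not terminal:  
                q_update = (reward + GAMMA * np.amax(self.model.predict(state_next)[0]))  
            # Overwrite only the taken action's Q-value, then fit on that target.
            q_values = self.model.predict(state)  
            q_values[0][action] = q_update  
            self.model.fit(state, q_values, verbose=0)  
        # Decay exploration once per replay call, never below the minimum.
        self.exploration_rate *= EXPLORATION_DECAY  
        self.exploration_rate = max(EXPLORATION_MIN, self.exploration_rate)  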
  
  
def cartpole():  
    env = gym.make(ENV_NAME)  
    score_logger = ScoreLogger(ENV_NAME)  
    observation_space = env.observation_space.shape[0]  
    action_space = env.action_space.n  
    dqn_solver = DQNSolver(observation_space, action_space)  
    run = 0  
    while True:  
        run += 1  
        state = env.reset()  
        state = np.reshape(state, [1, observation_space])  
        step = 0  
        while True:  
            step += 1  
            #env.render()  
            action = dqn_solver.act(state)  
            state_next, reward, terminal, info = env.step(action)  
            reward = reward if not terminal else -reward  
            state_next = np.reshape(state_next, [1, observation_space])  
            dqn_solver.remember(state, action, reward, state_next, terminal)  
            state = state_next  
            if terminal:  
                print("Run: " + str(run) + ", exploration: " + str(dqn_solver.exploration_rate) + ", score: " + str(step))  
                score_logger.add_score(step, run)  
                break  
            dqn_solver.experience_replay()  
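
The exploration schedule in the cell above can be reproduced on its own. The sketch below is a minimal illustration, not part of the assignment code: `exploration_after` and `calls_until_floor` are hypothetical helper names, and the floor is a parameter because the cell sets `EXPLORATION_MIN = 0.02` while the later log lines show `0.01`, which matches the commented-out value. The rate decays once per `experience_replay()` call, and that call is skipped on the terminal step and does nothing until the memory holds `BATCH_SIZE` (20) experiences. Run 1 (15 steps) therefore triggers no decay, and run 2 (41 steps) triggers 36, which is consistent with the logged rate of 0.8348931673187264.

```python
EXPLORATION_MAX = 1.0
EXPLORATION_DECAY = 0.995


def exploration_after(replay_calls, exploration_min=0.02):
    """Exploration rate after a given number of effective replay calls."""
    rate = EXPLORATION_MAX
    for _ in range(replay_calls):
        # Same two steps as the end of experience_replay(): decay, then clamp.
        rate *= EXPLORATION_DECAY
        rate = max(exploration_min, rate)
    return rate


def calls_until_floor(exploration_min=0.02):
    """Count decay steps until the rate first reaches the floor."""
    rate, calls = EXPLORATION_MAX, 0
    while rate > exploration_min:
        rate = max(exploration_min, rate * EXPLORATION_DECAY)
        calls += 1
    return calls


# 36 effective replay calls happen by the end of run 2 in the log.
print(exploration_after(36))
print(calls_until_floor(0.02), calls_until_floor(0.01))
```

With a decay factor of 0.995 the floor is reached after several hundred replay calls, which is why the log already shows the minimum rate well before run 88.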



In [6]:
According to the article "Cartpole: Introduction", the cartpole is like an inverted pendulum that is kept balanced by applying
forces to the cart. In this example, the agent observes the current state, picks the action with the highest predicted value (or a random
action while it is still exploring), and executes it. Each step is stored as an experience made up of state, action, reward, next_state, and done.
First the environment and the agent are created from the environment's observation and action spaces; the agent then steps through each
episode, storing every experience and learning from it.
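
As an illustration of how such experiences can be stored and sampled, here is a minimal sketch using only the standard library. `ReplayMemory` is a hypothetical helper written for this note, not a class from the notebook; it mirrors the notebook's `deque(maxlen=MEMORY_SIZE)` and its check against `BATCH_SIZE`.

```python
import random
from collections import deque


class ReplayMemory:
    """Hypothetical helper: a bounded buffer of experience tuples."""

    def __init__(self, max_size, batch_size):
        # A deque with maxlen silently drops the oldest entry when full.
        self.buffer = deque(maxlen=max_size)
        self.batch_size = batch_size

    def remember(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self):
        # Mirror the notebook's guard: no batch until enough experiences exist.
        if len(self.buffer) < self.batch_size:
            return []
        return random.sample(list(self.buffer), self.batch_size)


memory = ReplayMemory(max_size=3, batch_size=2)
for step in range(5):
    # Toy integers stand in for real observation arrays.
    memory.remember(step, 0, 1.0, step + 1, False)

print(len(memory.buffer))    # only the 3 most recent experiences remain
print(memory.sample())       # 2 experiences drawn at random
```

Sampling at random from the buffer, rather than training on consecutive steps, reduces the correlation between the examples the network sees in one update.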

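The line:
    reward = reward if not terminal else -reward
shapes the reward. On every step where the episode continues, the agent keeps the reward returned by the environment. On the terminal step
the reward is negated, so the agent is penalized for the action that ended the episode, and the next episode starts from a reset environment.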
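
The effect of that line can be checked in isolation. In this small sketch, `shape_reward` is a hypothetical name for the same conditional expression, and the reward of 1.0 per step is an assumed example value.

```python
def shape_reward(reward, terminal):
    """Return the reward that gets stored in memory for one step."""
    # Same logic as: reward = reward if not terminal else -reward
    return -reward if terminal else reward


# A three-step episode in which only the last step is terminal.
stored = [shape_reward(1.0, terminal) for terminal in (False, False, True)]
print(stored)  # [1.0, 1.0, -1.0]
```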

The methods 'remember' and 'experience_replay' implement experience replay, a technique used in deep Q-learning. The deep neural network architecture for this example is:
    self.model = Sequential()  
    self.model.add(Dense(24, input_shape=(observation_space,), activation="relu"))  
    self.model.add(Dense(24, activation="relu"))  
    self.model.add(Dense(self.action_space, activation="linear"))  
    self.model.compile(loss="mse", optimizer=Adam(lr=LEARNING_RATE))  
The network has two hidden layers of 24 units and one linear output per action, which serve as the Q-value estimates. Some DQN variants make
learning more stable by keeping a second target network with the same architecture but different weights, and periodically copying the
current weights into it. The code in this notebook uses a single network for both the predictions and the targets.
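
The target that `experience_replay` fits this network to can be written out without Keras. The sketch below is an illustration under assumptions: `q_target` and `updated_q_values` are hypothetical helper names, plain lists stand in for the arrays returned by `model.predict`, and the Q-values are made-up numbers.

```python
GAMMA = 0.96  # discount factor used in the notebook


def q_target(reward, next_q_values, terminal, gamma=GAMMA):
    """Q-learning target for the action taken in one stored experience."""
    if terminal:
        # No future reward after the episode ends.
        return reward
    # Reward now plus the discounted best predicted value of the next state.
    return reward + gamma * max(next_q_values)


def updated_q_values(q_values, action, target):
    """Copy the predicted Q-values, replacing only the taken action's entry."""
    new_values = list(q_values)
    new_values[action] = target
    return new_values


target = q_target(1.0, [0.5, 2.0], terminal=False)
print(target)                                    # approximately 2.92
print(updated_q_values([0.1, 0.2], 1, target))   # only index 1 changes
```

Because only the taken action's entry changes, the mean squared error for the other outputs is zero and the fit step adjusts the network toward the new estimate for that one action.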
  

SyntaxError: invalid syntax (<ipython-input-6-08388b1363e8>, line 1)

In [4]:
cartpole()

Run: 1, exploration: 1.0, score: 15
Scores: (min: 15, avg: 15, max: 15)

Run: 2, exploration: 0.8348931673187264, score: 41
Scores: (min: 15, avg: 28, max: 41)

Run: 3, exploration: 0.7822236754458713, score: 14
Scores: (min: 14, avg: 23.333333333333332, max: 41)

Run: 4, exploration: 0.6832098777212641, score: 28
Scores: (min: 14, avg: 24.5, max: 41)

Run: 5, exploration: 0.6118738784280476, score: 23
Scores: (min: 14, avg: 24.2, max: 41)

Run: 6, exploration: 0.5819594443402982, score: 11
Scores: (min: 11, avg: 22, max: 41)

Run: 7, exploration: 0.510849320360386, score: 27
Scores: (min: 11, avg: 22.714285714285715, max: 41)

Run: 8, exploration: 0.4932355662165453, score: 8
Scores: (min: 8, avg: 20.875, max: 41)

Run: 9, exploration: 0.4738479773082268, score: 9
Scores: (min: 8, avg: 19.555555555555557, max: 41)

Run: 10, exploration: 0.446186062443672, score: 13
Scores: (min: 8, avg: 18.9, max: 41)

Run: 11, exploration: 0.41386834584198684, score: 16
Scores: (min: 8, avg: 18.63636

Run: 88, exploration: 0.01, score: 175
Scores: (min: 8, avg: 93.8409090909091, max: 332)

Run: 89, exploration: 0.01, score: 151
Scores: (min: 8, avg: 94.48314606741573, max: 332)

Run: 90, exploration: 0.01, score: 152
Scores: (min: 8, avg: 95.12222222222222, max: 332)

Run: 91, exploration: 0.01, score: 191
Scores: (min: 8, avg: 96.17582417582418, max: 332)

Run: 92, exploration: 0.01, score: 152
Scores: (min: 8, avg: 96.78260869565217, max: 332)

Run: 93, exploration: 0.01, score: 195
Scores: (min: 8, avg: 97.83870967741936, max: 332)

Run: 94, exploration: 0.01, score: 197
Scores: (min: 8, avg: 98.8936170212766, max: 332)

Run: 95, exploration: 0.01, score: 194
Scores: (min: 8, avg: 99.89473684210526, max: 332)

Run: 96, exploration: 0.01, score: 154
Scores: (min: 8, avg: 100.45833333333333, max: 332)

Run: 97, exploration: 0.01, score: 139
Scores: (min: 8, avg: 100.85567010309278, max: 332)

Run: 98, exploration: 0.01, score: 208
Scores: (min: 8, avg: 101.94897959183673, max: 332)

Run: 189, exploration: 0.01, score: 132
Scores: (min: 8, avg: 122.62, max: 262)

Run: 190, exploration: 0.01, score: 139
Scores: (min: 8, avg: 122.49, max: 262)

Run: 191, exploration: 0.01, score: 192
Scores: (min: 8, avg: 122.5, max: 262)

Run: 192, exploration: 0.01, score: 166
Scores: (min: 8, avg: 122.64, max: 262)

Run: 193, exploration: 0.01, score: 164
Scores: (min: 8, avg: 122.33, max: 262)

Run: 194, exploration: 0.01, score: 147
Scores: (min: 8, avg: 121.83, max: 262)

Run: 195, exploration: 0.01, score: 168
Scores: (min: 8, avg: 121.57, max: 262)

Run: 196, exploration: 0.01, score: 142
Scores: (min: 8, avg: 121.45, max: 262)

Run: 197, exploration: 0.01, score: 141
Scores: (min: 8, avg: 121.47, max: 262)

Run: 198, exploration: 0.01, score: 245
Scores: (min: 8, avg: 121.84, max: 262)

Run: 199, exploration: 0.01, score: 148
Scores: (min: 8, avg: 121.33, max: 262)

Run: 200, exploration: 0.01, score: 177
Scores: (min: 8, avg: 121.19, max: 262)

Run: 201, exploration: 0.01, 

Run: 290, exploration: 0.01, score: 45
Scores: (min: 10, avg: 182.61, max: 442)

Run: 291, exploration: 0.01, score: 160
Scores: (min: 10, avg: 182.29, max: 442)

Run: 292, exploration: 0.01, score: 320
Scores: (min: 10, avg: 183.83, max: 442)

Run: 293, exploration: 0.01, score: 181
Scores: (min: 10, avg: 184, max: 442)

Run: 294, exploration: 0.01, score: 500
Scores: (min: 10, avg: 187.53, max: 500)

Run: 295, exploration: 0.01, score: 146
Scores: (min: 10, avg: 187.31, max: 500)

Run: 296, exploration: 0.01, score: 136
Scores: (min: 10, avg: 187.25, max: 500)

Run: 297, exploration: 0.01, score: 155
Scores: (min: 10, avg: 187.39, max: 500)

Run: 298, exploration: 0.01, score: 266
Scores: (min: 10, avg: 187.6, max: 500)

Run: 299, exploration: 0.01, score: 135
Scores: (min: 10, avg: 187.47, max: 500)

Run: 300, exploration: 0.01, score: 130
Scores: (min: 10, avg: 187, max: 500)

Run: 301, exploration: 0.01, score: 180
Scores: (min: 10, avg: 187.02, max: 500)

Run: 302, exploration: 0

Run: 392, exploration: 0.01, score: 133
Scores: (min: 9, avg: 162.14, max: 500)

Run: 393, exploration: 0.01, score: 177
Scores: (min: 9, avg: 162.1, max: 500)

Run: 394, exploration: 0.01, score: 191
Scores: (min: 9, avg: 159.01, max: 500)

Run: 395, exploration: 0.01, score: 99
Scores: (min: 9, avg: 158.54, max: 500)

Run: 396, exploration: 0.01, score: 414
Scores: (min: 9, avg: 161.32, max: 500)

Run: 397, exploration: 0.01, score: 301
Scores: (min: 9, avg: 162.78, max: 500)

Run: 398, exploration: 0.01, score: 121
Scores: (min: 9, avg: 161.33, max: 500)

Run: 399, exploration: 0.01, score: 146
Scores: (min: 9, avg: 161.44, max: 500)

Run: 400, exploration: 0.01, score: 394
Scores: (min: 9, avg: 164.08, max: 500)

Run: 401, exploration: 0.01, score: 206
Scores: (min: 9, avg: 164.34, max: 500)

Run: 402, exploration: 0.01, score: 158
Scores: (min: 9, avg: 163.79, max: 500)

Run: 403, exploration: 0.01, score: 256
Scores: (min: 9, avg: 164.12, max: 500)

Run: 404, exploration: 0.01, s

NameError: name 'exit' is not defined

Note: If the code is running properly, you should begin to see output appearing above this code block. It will take several minutes, so it is recommended that you let this code run in the background while completing other work. When the code has finished, it will print output saying, "Solved in _ runs, _ total runs."

You may see an error about not having an exit command. This error does not affect the program's functionality and results from the steps taken to convert the code from Python 2.x to Python 3. Please disregard this error.
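
The `Scores:` lines in the output above look like statistics over a sliding window of recent runs. score_logger.py is not reproduced in this notebook, so the sketch below rests on assumptions: `ScoreWindow` is a hypothetical stand-in rather than the course's `ScoreLogger`, and the window of 100 runs is inferred from the log, where the averages after run 100 have at most two decimal places and the reported max drops from 332 at run 98 to 262 at run 189.

```python
from collections import deque
from statistics import mean


class ScoreWindow:
    """Hypothetical stand-in: min, average, and max over the latest runs."""

    def __init__(self, window=100):
        # Scores older than the window are discarded automatically.
        self.scores = deque(maxlen=window)

    def add_score(self, score):
        self.scores.append(score)
        return min(self.scores), mean(self.scores), max(self.scores)


tracker = ScoreWindow(window=3)
for score in (10, 20, 30, 40):
    low, avg, high = tracker.add_score(score)
    print("Scores: (min: {}, avg: {}, max: {})".format(low, avg, high))
```

With a window of 3, the first score (10) has dropped out by the time the fourth is added, in the same way the early low and high scores leave the log's statistics after run 100.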