# 6-2 Assignment: Cartpole Revisited
---
<div class="alert alert-block alert-success" style="color:black;">
<b>To Begin:</b> Run all code blocks and observe the output. Once you have reviewed the sample output, use the <b>LastName_FirstName_Assignment2.ipynb</b> file to complete your assignment.
</div>

<div class="alert alert-block alert-warning" style="color:black;">
    <b>Note:</b> For compatibility purposes, libraries have been updated from those used in the required readings to match current versions; hence, some of the package invocations may differ slightly from the book. The affected lines of code have comments added to the right as applicable, with the old code commented out above for reference.
</div>

<div class="alert alert-block alert-danger" style="color:black;">
<b>GPU/CUDA/Memory Warnings/Errors:</b> You may receive some errors stating that GPUs will not be used, that CUDA could not be found, or that a memory allocation exceeds free system memory. These, and a few others, are standard messages that can be ignored here because they are environment-based.<br><br>
<b>Example messages:</b>
    <ul>
        <li>Could not find cuda drivers on your machine, GPU will not be used.</li>
        <li>Please check linkage and avoid linking the same target more than once.</li>
        <li>E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)</li>
        <li>Allocation of ######## exceeds 10% of free system memory</li>
    </ul>
</div>

---

### Installing Required Packages
This cell installs the components needed to run the assignment.

In [1]:
!pip install -r requirements.txt

Collecting tensorflow==2.19.0 (from -r requirements.txt (line 3))
  Using cached tensorflow-2.19.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.1 kB)
Using cached tensorflow-2.19.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (644.9 MB)
Installing collected packages: tensorflow
  Attempting uninstall: tensorflow
    Found existing installation: tensorflow 2.17.0
    Uninstalling tensorflow-2.17.0:
      Successfully uninstalled tensorflow-2.17.0
  Rolling back uninstall of tensorflow
  Moving to /home/codio/.pyenv/versions/3.11.9/bin/import_pb_to_tensorboard
   from /tmp/pip-uninstall-p35hzg8u/import_pb_to_tensorboard
  Moving to /home/codio/.pyenv/versions/3.11.9/bin/saved_model_cli
   from /tmp/pip-uninstall-p35hzg8u/saved_model_cli
  Moving to /home/codio/.pyenv/versions/3.11.9/bin/tensorboard
   from /tmp/pip-uninstall-p35hzg8u/tensorboard
  Moving to /home/codio/.pyenv/versions/3.11.9/bin/tf_upgrade_v2
   from /tmp/pip-uninstall-p35hzg8u/tf_

<div class="alert alert-block alert-warning" style="color:black;">
    <b>Note:</b> If the assignment takes too long, you may want to download the code and run it locally to take advantage of your GPU.
</div>


In [2]:
# Package Imports
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import numpy as np
import tensorflow as tf
import gymnasium as gym
from collections import deque
import random
import sys

# Environment setup and Variables
try:
    env = gym.make("CartPole-v1", render_mode=None)
except Exception:
    print('Failed to initialize environment! Make sure that gymnasium was installed correctly!')
    sys.exit(1)

state_shape = int(env.observation_space.shape[0])
action_size = int(env.action_space.n)

# Hyperparameters
GAMMA = 0.99
EXPLORATION_MAX = 1.0
EXPLORATION_MIN = 0.01
EXPLORATION_DECAY = 0.990
LEARNING_RATE = 0.001
BATCH_SIZE = 64
TRAIN_START = 1000
MEMORY_SIZE = 2000
EPISODES = 300

# Counters during training
train_freq = 4
step_count = 0
target_update_freq = 10
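
The exploration hyperparameters above define a multiplicative decay that the training loop applies once per episode. As a rough sketch (not part of the assignment code), the snippet below uses the same constants to estimate how far exploration has decayed by the final episode and how many episodes it would take to hit the floor. The helper `epsilon_after` is a hypothetical name introduced only for this illustration.

```python
import math

# Values copied from the hyperparameter cell above.
EXPLORATION_MAX = 1.0
EXPLORATION_MIN = 0.01
EXPLORATION_DECAY = 0.990
EPISODES = 300


def epsilon_after(episodes: int) -> float:
    """Exploration rate after `episodes` per-episode decays, floored at the minimum.

    Hypothetical helper for illustration; the notebook updates the value in place.
    """
    return max(EXPLORATION_MIN, EXPLORATION_MAX * EXPLORATION_DECAY ** episodes)


# Smallest number of episodes after which the floor is reached.
episodes_to_floor = math.ceil(
    math.log(EXPLORATION_MIN / EXPLORATION_MAX) / math.log(EXPLORATION_DECAY)
)

print(f"Exploration after {EPISODES} episodes: {epsilon_after(EPISODES):.3f}")
print(f"Episodes needed to reach the floor: {episodes_to_floor}")
```

With these settings the floor is not reached within `EPISODES`, so the agent is still taking some random actions when training ends. That is worth keeping in mind when reading the reward curve.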

<div class="alert alert-block alert-warning" style="color:black;">
    <b>Note:</b> If an error message appears below, it is because TensorFlow is looking for a GPU. It can be ignored.
</div>


In [3]:
memory = deque(maxlen=MEMORY_SIZE)

# DQN Builder
def build_dqn_model():
    try:
        model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(state_shape,)),
            tf.keras.layers.Dense(64, activation='relu'),
            tf.keras.layers.Dense(64, activation='relu'),
            tf.keras.layers.Dense(action_size, activation='linear')
        ])
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE), loss='mse')
        return model
    except Exception:
        print('Error occurred while building the DQN model!')
        sys.exit(1)
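
The `memory` object created at the top of this cell is a bounded replay buffer: once it holds `MEMORY_SIZE` transitions, each new append silently evicts the oldest one. The following minimal sketch shows that behavior with a deliberately tiny capacity and placeholder transitions. The name `buffer` and the transition values are assumptions made only for illustration.

```python
import random
from collections import deque

# Tiny capacity for illustration; the notebook uses MEMORY_SIZE = 2000.
buffer = deque(maxlen=3)

for step in range(5):
    # Placeholder transition: (state, action, reward, next_state, done)
    buffer.append((step, 0, 1.0, step + 1, False))

# Only the three most recent transitions remain; steps 0 and 1 were evicted.
oldest_state = buffer[0][0]

# Uniform sampling without replacement, as experience replay does per batch.
batch = random.sample(list(buffer), 2)

print(f"Buffer size: {len(buffer)}, oldest state kept: {oldest_state}")
```

Sampling uniformly from a buffer like this breaks the correlation between consecutive steps, which is the usual motivation for experience replay.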

In [4]:
# Calling the build_dqn_model method
model = build_dqn_model()
target_model = build_dqn_model()
target_model.set_weights(model.get_weights())

<div class="alert alert-block alert-warning" style="color:black;">
    <b>Note:</b> If an error displays, TensorFlow is simply trying to connect to the GPU.
</div>


In [5]:
# Epsilon-greedy action selection: explore with probability epsilon, otherwise exploit
def get_action(state, epsilon):
    try:
        if np.random.rand() <= epsilon:
            return random.randrange(action_size)
        q_values = model.predict(np.array([state]), verbose=0)  # Remove verbose=0 to see every prediction display
        return int(np.argmax(q_values[0]))
    except Exception:
        print("Error during get_action method!")
        raise
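
The action-selection rule above can be tried in isolation, without a trained network, by passing in Q-values directly. The sketch below is a NumPy-only illustration under that assumption. The function name `epsilon_greedy`, the seeded generator, and the example Q-values are all hypothetical and not part of the assignment code.

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the highest-valued action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


rng = np.random.default_rng(0)
q_values = np.array([0.2, 1.5])  # made-up Q-values for CartPole's two actions

# With epsilon = 0 the choice is always greedy (index of the largest Q-value).
greedy_action = epsilon_greedy(q_values, epsilon=0.0, rng=rng)

# With epsilon = 1 every choice is random, so both actions appear over many draws.
random_actions = [epsilon_greedy(q_values, epsilon=1.0, rng=rng) for _ in range(100)]

print(f"Greedy action: {greedy_action}")
print(f"Distinct random actions seen: {sorted(set(random_actions))}")
```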

In [6]:
# Experience Replay
def experience_replay():
    if len(memory) < TRAIN_START:
        return

    batch = random.sample(memory, BATCH_SIZE)
    states = np.zeros((BATCH_SIZE, state_shape))
    next_states = np.zeros((BATCH_SIZE, state_shape))
    actions, rewards, completions = [], [], []

    try:
        for i, (state, action, reward, state_next, terminal) in enumerate(batch):
            states[i] = state
            next_states[i] = state_next
            actions.append(action)
            rewards.append(reward)
            completions.append(terminal)

        target_q = model.predict(states, verbose=0) #Remove verbose=0 to see every prediction display
        next_q = target_model.predict(next_states, verbose=0) #Remove verbose=0 to see every prediction display

        for i in range(BATCH_SIZE):
            if completions[i]:
                target_q[i][actions[i]] = rewards[i]
            else:
                target_q[i][actions[i]] = rewards[i] + GAMMA * np.max(next_q[i])

        model.fit(states, target_q, batch_size=BATCH_SIZE, verbose=0)
    except Exception:
        print("Error during replay")
        sys.exit(1)
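
The per-sample loop in `experience_replay` computes the Q-learning target `reward + GAMMA * max(next_q)` for non-terminal transitions and just `reward` for terminal ones. The same computation can be written as one vectorized NumPy expression. The sketch below is an illustrative alternative, not the notebook's implementation. The function `td_targets` and the sample numbers are assumptions made for this example.

```python
import numpy as np

GAMMA = 0.99  # same discount factor as the hyperparameter cell


def td_targets(rewards, next_q, dones, gamma=GAMMA):
    """Vectorized targets: r for terminal steps, r + gamma * max_a Q(s', a) otherwise."""
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=bool)
    max_next = np.max(next_q, axis=1)
    # (~dones) zeroes out the bootstrap term for terminal transitions.
    return rewards + gamma * max_next * (~dones)


# Two made-up transitions: one ongoing, one terminal with the -10 penalty.
rewards = [1.0, -10.0]
next_q = np.array([[0.5, 2.0], [3.0, 1.0]])
dones = [False, True]
actions = np.array([1, 0])

targets = td_targets(rewards, next_q, dones)

# Only the Q-value of the action actually taken is overwritten in each row.
target_q = np.zeros((2, 2))
target_q[np.arange(len(actions)), actions] = targets

print(targets)
print(target_q)
```

Leaving the other entries of each row at the network's own predictions (zeros in this toy example) means they contribute no error to the MSE loss, so only the taken action is trained.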

In [7]:
# A list to keep track of each episode's total reward
episode_rewards = []

# Main loop
# Runs for up to the number of EPISODES set in the hyperparameters above
for e in range(EPISODES):
    state, _ = env.reset()
    done = False
    total_reward = 0

    while not done:
        action = get_action(state, EXPLORATION_MAX)
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        
        reward = reward if not done else -10

        memory.append((state, action, reward, next_state, done))
        state = next_state
        total_reward += reward
        
        step_count += 1
        if step_count % train_freq == 0:
            experience_replay()

    if EXPLORATION_MAX > EXPLORATION_MIN:
        EXPLORATION_MAX *= EXPLORATION_DECAY
        EXPLORATION_MAX = max(EXPLORATION_MIN, EXPLORATION_MAX)

    if e % target_update_freq == 0:
        target_model.set_weights(model.get_weights())

    episode_rewards.append(total_reward)
    # Every 10 episodes, display the average reward over the most recent 10 episodes.
    # Comment out this block and uncomment the print statement at the end of the loop to display every episode instead.
    if e % 10 == 0:
        avg_reward = np.mean(episode_rewards[-10:])
        print(f"Episode {e+1}: Average Reward = {avg_reward:.2f}, Exploration = {EXPLORATION_MAX:.3f}")
    
    # Stop training early once the average reward over the last 100 episodes reaches 195
    if len(episode_rewards) >= 100:
        avg_last_hun = np.mean(episode_rewards[-100:])
        if avg_last_hun >= 195:
            print(f"Solved at episode {e+1}: Average reward over last 100: {avg_last_hun:.2f}")
            break
        
    # Uncomment the line below to print the total reward of every episode
    # print(f"Episode {e+1}: Total Reward = {total_reward}, Exploration = {EXPLORATION_MAX:.3f}")

# Saving model as needed    
model.save("Cartpole_model.h5")
print("Training complete! Model saved as Cartpole_model")


Episode 1: Average Reward = 22.00, Exploration = 0.990
Episode 11: Average Reward = 15.70, Exploration = 0.895
Episode 21: Average Reward = 6.70, Exploration = 0.810
Episode 31: Average Reward = 6.60, Exploration = 0.732
Episode 41: Average Reward = 7.80, Exploration = 0.662
Episode 51: Average Reward = 3.10, Exploration = 0.599
Episode 61: Average Reward = 6.40, Exploration = 0.542
Episode 71: Average Reward = 27.80, Exploration = 0.490
Episode 81: Average Reward = 2.60, Exploration = 0.443
Episode 91: Average Reward = 62.80, Exploration = 0.401
Episode 101: Average Reward = 95.50, Exploration = 0.362
Episode 111: Average Reward = 151.50, Exploration = 0.328
Episode 121: Average Reward = 162.40, Exploration = 0.296
Episode 131: Average Reward = 170.00, Exploration = 0.268
Episode 141: Average Reward = 110.40, Exploration = 0.242
Episode 151: Average Reward = 161.30, Exploration = 0.219
Episode 161: Average Reward = 218.30, Exploration = 0.198
Episode 171: Average Reward = 287.40, Expl



Training complete! Model saved as Cartpole_model


<div class="alert alert-block alert-warning" style="color:black;">
    <b>Note:</b> <br>
    If an error displays, TensorFlow is attempting to connect to the GPU.<br>
    It will not connect, and the code will run on the CPU instead.<br>
    The code will take some time to run, which is commonplace for real-life models.<br>
    <b>-- Make sure your computer does not go to sleep, and take a well-deserved break! --</b>
</div>
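
The early-stopping rule in the training loop declares the task solved once the average total reward over the last 100 episodes reaches 195. As a small standalone sketch of that check, the hypothetical helper `is_solved` below applies the same rule to any list of episode rewards. The window and threshold defaults mirror the values used in the loop.

```python
def is_solved(rewards, window=100, threshold=195.0):
    """Return True once the mean of the last `window` rewards reaches `threshold`."""
    if len(rewards) < window:
        return False  # not enough episodes yet to form a full window
    recent = rewards[-window:]
    return sum(recent) / window >= threshold


# Made-up reward histories for illustration.
print(is_solved([200.0] * 99))                  # fewer than 100 episodes
print(is_solved([0.0] * 50 + [200.0] * 100))    # last 100 episodes average 200
print(is_solved([190.0] * 100))                 # average below the threshold
```

Because the notebook's recorded rewards include the -10 terminal penalty, the averages it compares against 195 sit slightly below the raw step counts.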


<div class="alert alert-block alert-success" style="color:black;">
<b>Make your observations here as they pertain to the assignment rubric.</b>
</div>