# Assessment 3: RL Gym
### Game Selection: Breakout for Atari
For this assignment I have chosen the Atari game Breakout. Its rules and action space are simple, so it should be possible to get a demonstrable model working without an excessive amount of training. Environment documentation: https://gymnasium.farama.org/environments/atari/breakout/
#### Scoring
The player scores points by hitting one of the wall's bricks. The number of points is determined by the brick's color:
- Red - 7 points
- Orange - 7 points
- Yellow - 4 points
- Green - 4 points
- Aqua - 1 point
- Blue - 1 point
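
The scoring table above can be captured as a small lookup, which may help when sanity-checking episode rewards. This is only an illustrative sketch: `BRICK_POINTS` and `score_for` are hypothetical names and are not part of Gymnasium.

```python
# Points per brick color, taken from the list above (names are illustrative)
BRICK_POINTS = {
    "red": 7, "orange": 7,
    "yellow": 4, "green": 4,
    "aqua": 1, "blue": 1,
}

def score_for(bricks):
    """Return the total score for a sequence of brick colors."""
    return sum(BRICK_POINTS[color.lower()] for color in bricks)

# One brick of each color
column_score = score_for(["Red", "Orange", "Yellow", "Green", "Aqua", "Blue"])
print(column_score)  # 24
```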

In [None]:
#Pre-setup installs
%pip install gymnasium[atari]
%pip install gymnasium[accept-rom-license]
%pip install tensorflow

In [None]:
# Setup/imports
from collections import defaultdict
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Patch
from tqdm import tqdm
import tensorflow as tf
import gymnasium

env = gymnasium.make("ALE/Breakout-v5", obs_type="rgb") # create the environment used for the game

### Model Implementation: 
Implement and train an RL model using an algorithm like Q-learning, Deep Q-Networks (DQN), or any other suitable method. Explain your choice of algorithm and any modifications you made. Comment on the hyperparameters and why you chose them.
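
As one possible starting point, tabular Q-learning with epsilon-greedy action selection can be sketched as below. It is a minimal illustration on toy string states, not the Breakout implementation itself. `choose_action` and `q_update` are hypothetical helper names, and the default hyperparameter values are assumptions chosen for the example.

```python
import random
from collections import defaultdict

def choose_action(q_table, state, n_actions, epsilon, rng=random):
    """Epsilon-greedy: a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = q_table[state]
    return max(range(n_actions), key=lambda a: values[a])

def q_update(q_table, state, action, reward, next_state, terminated,
             learning_rate=0.15, discount_factor=0.99):
    """Apply Q(s,a) += lr * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    # A terminal state has no future value to bootstrap from
    next_max = 0.0 if terminated else max(q_table[next_state])
    target = reward + discount_factor * next_max
    q_table[state][action] += learning_rate * (target - q_table[state][action])
    return q_table[state][action]

n_actions = 4  # Breakout's default action set: NOOP, FIRE, RIGHT, LEFT
q_table = defaultdict(lambda: [0.0] * n_actions)  # unseen states start at zero
new_value = q_update(q_table, "s0", 2, reward=1.0, next_state="s1", terminated=False)
print(new_value)  # 0.15
```

With `epsilon=0.0` the agent always picks the highest-valued action, so after the update above `choose_action(q_table, "s0", n_actions, 0.0)` selects action 2.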

In [None]:
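# Define the agent for playing Breakout
class BreakoutAgent:
    def __init__(self, learning_rate, discount_factor, exploration):
        self.learning_rate = learning_rate      # step size for value updates
        self.discount_factor = discount_factor  # weight given to future rewards
        self.exploration = exploration          # callable: run number -> exploration rate
        self.rewards = []                       # rewards recorded during training

    def take_action(self, observation):
        # Placeholder policy: ignores the observation and samples a random action
        return env.action_space.sample()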

In [None]:
# Create agent and define hyperparameters
number_of_runs = 1000
learning_rate = 0.15
discount_factor = 0.99
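# 50 / (run + 10) exceeds 1 until run 40, so it is capped at 1 (always explore) and decays afterwards
exploration = lambda run: min(1.0, 50. / (run + 10))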

agent = BreakoutAgent(learning_rate, discount_factor, exploration)

### Training Process: 
Describe the training process, including any pre-processing steps such as frame stacking or converting frames to grayscale. Take short (<10 sec) videos at suitable training steps to demonstrate the agent's progress. Provide commentary on the agent's performance and any notable observations.
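
A common pre-processing pipeline converts each RGB frame to grayscale, downsamples it, and stacks the most recent frames so the agent can infer the ball's direction of travel. The sketch below uses only NumPy on a synthetic frame. The helper names, the stride-2 downsampling, and the stack size of 4 are illustrative assumptions rather than requirements of the assignment.

```python
from collections import deque
import numpy as np

def to_grayscale(frame):
    """Convert an (H, W, 3) uint8 RGB frame to (H, W) uint8 using luminance weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return (frame @ weights).astype(np.uint8)

def downsample(frame, factor=2):
    """Keep every `factor`-th pixel along both axes."""
    return frame[::factor, ::factor]

class FrameStack:
    """Hold the last `size` processed frames as a single (size, H, W) array."""
    def __init__(self, size=4):
        self.frames = deque(maxlen=size)

    def reset(self, frame):
        # Fill the stack with copies of the first frame of an episode
        for _ in range(self.frames.maxlen):
            self.frames.append(frame)
        return np.stack(self.frames)

    def push(self, frame):
        # Appending to a full deque drops the oldest frame
        self.frames.append(frame)
        return np.stack(self.frames)

# Synthetic frame with Breakout's raw observation shape (210, 160, 3)
raw = np.zeros((210, 160, 3), dtype=np.uint8)
processed = downsample(to_grayscale(raw))
stack = FrameStack(size=4)
state = stack.reset(processed)
print(state.shape)  # (4, 105, 80)
```

Gymnasium also provides an `AtariPreprocessing` wrapper that covers grayscale conversion and resizing, which may be preferable to hand-written helpers.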

In [None]:
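# Start training: tabular Q-learning with epsilon-greedy exploration
n_actions = env.action_space.n
# Q-values keyed by a hash of the raw frame; unseen states start at zero.
# Raw frames rarely repeat, so this table is a simple baseline that generalizes poorly.
q_table = defaultdict(lambda: np.zeros(n_actions))

def state_key(observation):
    # RGB frames are arrays and cannot index a table directly
    return hash(observation.tobytes())

for run in tqdm(range(1, number_of_runs + 1)):
    observation, info = env.reset()
    state = state_key(observation)
    episode_reward = 0.0
    done = False
    while not done:
        epsilon = min(1.0, agent.exploration(run))  # exploration rate for this run
        if np.random.random() < epsilon:
            action = env.action_space.sample() # Explore action space
        else:
            action = int(np.argmax(q_table[state])) # Exploit learned values

        # Gymnasium's step() returns five values
        observation, reward, terminated, truncated, info = env.step(action)
        next_state = state_key(observation)
        done = terminated or truncated

        # Q-learning update; do not bootstrap from a terminal state
        old_value = q_table[state][action]
        next_max = 0.0 if terminated else np.max(q_table[next_state])
        target = reward + agent.discount_factor * next_max
        q_table[state][action] = (1 - agent.learning_rate) * old_value + agent.learning_rate * target

        state = next_state
        episode_reward += reward

    agent.rewards.append(episode_reward)  # total reward for this episode
    #if run % (number_of_runs / 10) == 0:
        #print(agent.rewards)

env.close()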

### Evaluation and Performance Metrics: 
Evaluate the performance of your trained model. Provide relevant metrics such as average reward, episodes needed to solve the game, and any additional visualizations or graphs. Comment on the strengths and limitations of your trained agent.
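
One way to summarize training progress is a moving average over per-episode rewards, alongside the overall mean. The sketch below assumes a list of episode totals such as `agent.rewards`. The sample values and the window size are made up for illustration and are not real training results.

```python
from statistics import mean

def moving_average(rewards, window):
    """Return the mean of each full sliding window over `rewards`."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [mean(rewards[i:i + window]) for i in range(len(rewards) - window + 1)]

# Hypothetical per-episode totals, not real training results
episode_rewards = [0.0, 1.0, 2.0, 1.0, 4.0, 3.0]
smoothed = moving_average(episode_rewards, window=3)
average_reward = mean(episode_rewards)

print("average reward:", average_reward)
print("moving average:", smoothed)
```

Plotting `smoothed` against the episode index with `matplotlib` gives a learning curve that is easier to read than the raw per-episode rewards.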

### Documentation and Report: 
Provide a clear and detailed report of your process, including decisions, challenges, and any improvements made during the training. Include commentary on the weights chosen and any pre-processing techniques applied.