# RL Final Project

Now it is finally time to put what we have learned in this course into practice!

The aim of this project is to assess your practical knowledge of Reinforcement Learning.

Your project consists of 2 parts, and you will get the chance to work with 2 different environments.


## 2. Atari Game Pong


<img src="zzzzzzzzzzzzzzzzzc"/>

**[Pong](https://www.gymlibrary.dev/environments/atari/pong/)** is a famous Atari game that almost all of us have played at least once!
The goal of this task is to get familiar with the **gym** library and use Deep Reinforcement Learning to train an agent that can actually play this game!

In [1]:
# !pip install ale-py
# !pip install gym
# !pip install opencv-python
#
# !pip install "tensorflow==2.10"
# !pip install "tensorflow-gpu==2.10"
#
# !pip install tqdm
# !pip install jdc
#
# !pip list

In [2]:
import gym
import cv2
import jdc
import random
import warnings

import numpy as np
import tensorflow as tf

from IPython.utils import io
from tqdm.notebook import tqdm

In [3]:
warnings.filterwarnings('ignore')

In [4]:
tf.config.list_physical_devices('GPU')

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

In [5]:
TRAIN = True
GAME_NAME = 'ALE/Pong-v5'
MODEL_PATH = './pong-dqn.h5'
MODEL_ACTIVATION = 'relu'
INPUT_SHAPE = (84, 84, 1)

In [6]:
BATCH_SIZE = 32
GAMMA = 0.95
EPSILON = 1.0
MIN_EPSILON = 0.1
EPSILON_DECAY = 0.995
LEARNING_RATE = 0.001
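
To get a feel for the exploration schedule implied by these values, here is a small sketch. It assumes epsilon is multiplied by `EPSILON_DECAY` once per episode, which matches how `run_episode` is called from `train` in this notebook, and it clips the value at `MIN_EPSILON`. The helper names are illustrative only.

```python
import math

EPSILON = 1.0
MIN_EPSILON = 0.1
EPSILON_DECAY = 0.995


def epsilon_after(decays, start=EPSILON, decay=EPSILON_DECAY, floor=MIN_EPSILON):
    """Epsilon after `decays` multiplicative decay steps, clipped at `floor`."""
    return max(floor, start * decay ** decays)


def decays_to_floor(start=EPSILON, decay=EPSILON_DECAY, floor=MIN_EPSILON):
    """Smallest number of decay steps after which epsilon is at or below `floor`."""
    return math.ceil(math.log(floor / start) / math.log(decay))


print(f"epsilon after 200 decays: {epsilon_after(200):.3f}")
print(f"decays needed to reach MIN_EPSILON: {decays_to_floor()}")
```

With these values, epsilon is still roughly 0.37 after 200 episodes, so a 200-episode run stays fairly exploratory. About 460 decay steps are needed to reach `MIN_EPSILON`.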

In [7]:
class DQNAgent:
    def __init__(self, state_size, action_size):
        self.state_size = state_size
        self.action_size = action_size
        self.memory = []
        self.epsilon = EPSILON
        self.model = self.build_model()

    def build_model(self):
        model = tf.keras.models.Sequential()
        model.add(tf.keras.layers.Conv2D(32, kernel_size=3, activation=MODEL_ACTIVATION, input_shape=self.state_size))
        model.add(tf.keras.layers.Conv2D(64, kernel_size=3, activation=MODEL_ACTIVATION))
        model.add(tf.keras.layers.Conv2D(128, kernel_size=3, activation=MODEL_ACTIVATION))
        model.add(tf.keras.layers.Flatten())
        model.add(tf.keras.layers.Dense(256, activation=MODEL_ACTIVATION))
        model.add(tf.keras.layers.Dense(self.action_size, activation='linear'))
        model.compile(loss='mse', optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE))
        return model
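
The size of the `Flatten` output drives most of this model's parameter count, so it is worth checking by hand. The sketch below applies the standard size formula for unpadded ("valid") convolutions, which is the Keras `Conv2D` default. The strided alternative (8/4, 4/2, 3/1 kernel/stride pairs, as in the commonly used DQN layout) is an assumption added for comparison and is not part of this notebook.

```python
def conv_out(size, kernel, stride=1):
    """Output length along one axis for a 'valid' (unpadded) convolution."""
    return (size - kernel) // stride + 1


def flatten_size(size, layers, channels):
    """Flattened feature count after a stack of (kernel, stride) conv layers."""
    for kernel, stride in layers:
        size = conv_out(size, kernel, stride)
    return size * size * channels


# Layout used in build_model: three 3x3 convolutions, stride 1, 128 final channels.
notebook_features = flatten_size(84, [(3, 1), (3, 1), (3, 1)], channels=128)
# Weights plus biases of the Dense(256) layer that follows Flatten.
notebook_dense_params = notebook_features * 256 + 256

# Hypothetical strided layout for comparison (not from this notebook).
strided_features = flatten_size(84, [(8, 4), (4, 2), (3, 1)], channels=64)

print(notebook_features, notebook_dense_params, strided_features)
```

Under these assumptions, the first dense layer alone holds about 199 million parameters, which is why adding strides or pooling is usually worthwhile for 84x84 inputs.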

In [8]:
%%add_to DQNAgent

def choose_action(self, state):
    if np.random.rand() <= self.epsilon:
        return random.randrange(self.action_size)
    else:
        return np.argmax(self.model.predict(state)[0])

In [9]:
%%add_to DQNAgent

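def run_episode(self, batch_size=32):
    # Despite its name, this performs one experience-replay update on a random minibatch.
    minibatch = random.sample(self.memory, batch_size)
    for state, action, reward, next_state, done in minibatch:
        # Bellman target: do not bootstrap from next_state once the episode has ended.
        target = reward + (0 if done else GAMMA * np.amax(self.model.predict(next_state)[0]))
        target_f = self.model.predict(state)
        target_f[0][action] = target
        self.model.fit(state, target_f, epochs=1, verbose=0)
    # Decay exploration without dropping below MIN_EPSILON.
    self.epsilon = max(MIN_EPSILON, self.epsilon * EPSILON_DECAY)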
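
The target computed inside the loop is the one-step Q-learning target: $y = r + \gamma \max_{a'} Q(s', a')$ for non-terminal transitions and $y = r$ otherwise. A minimal, framework-free sketch of that rule follows; `td_target` and the example Q-values are illustrative and not part of the agent.

```python
GAMMA = 0.95


def td_target(reward, next_q_values, done, gamma=GAMMA):
    """One-step Q-learning target for a single transition."""
    if done:
        # Terminal transition: there is nothing to bootstrap from.
        return reward
    return reward + gamma * max(next_q_values)


# Non-terminal step with no reward: target is the discounted best next Q-value.
print(td_target(0.0, [0.2, 0.5, -0.1], done=False))
# Terminal step: target is just the reward.
print(td_target(1.0, [0.2, 0.5, -0.1], done=True))
```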

In [10]:
%%add_to DQNAgent

def remember(self, state, action, reward, next_state, done):
    self.memory.append((state, action, reward, next_state, done))
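
Because every stored transition holds two 84x84 frames, an ever-growing list can consume a lot of memory over long runs. One common option is a fixed-capacity buffer. The sketch below uses `collections.deque`; the class name and the capacity are hypothetical choices rather than part of this notebook.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def remember(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement, as random.sample does on a list.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


# Tiny demonstration: with capacity 3, only the 3 newest transitions survive.
buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.remember(step, 0, 0.0, step + 1, False)
print(len(buffer), [t[0] for t in buffer.buffer])
```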

In [11]:
%%add_to DQNAgent

def load(self, name):
    self.model.load_weights(name)


def save(self, name):
    self.model.save_weights(name)

In [12]:
def preprocess_frame(frame):
    # env.reset() returns (observation, info); env.step() callers pass the observation directly.
    if isinstance(frame, tuple):
        frame = frame[0]
    if len(frame.shape) == 3 and frame.shape[2] == 3:
        frame = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    resized = cv2.resize(frame, (84, 84), interpolation=cv2.INTER_AREA)
    return np.reshape(resized, INPUT_SHAPE)

In [13]:
def train(episodes):
    env = gym.make(GAME_NAME)
    state_size = INPUT_SHAPE
    action_size = env.action_space.n
    agent = DQNAgent(state_size, action_size)

    bar_format = 'Training: {percentage:3.0f}% |{bar}| Elapsed: {elapsed} Remaining: {remaining}{postfix}'
    training_pbar = tqdm(total=episodes, bar_format=bar_format, unit='episode')

    for e in range(episodes):
        state = preprocess_frame(env.reset())
        state = np.expand_dims(state, axis=0)
        done = False
        total_reward = 0
        while not done:
            with io.capture_output() as captured:
                action = agent.choose_action(state)
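            # env.step() returns (obs, reward, terminated, truncated, info) in gym >= 0.26.
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated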
            total_reward += reward

            next_state = preprocess_frame(next_state)
            next_state = np.expand_dims(next_state, axis=0)
            agent.remember(state, action, reward, next_state, done)
            state = next_state
            if done:
                agent.save(MODEL_PATH)

        if len(agent.memory) > BATCH_SIZE:
            with io.capture_output() as captured:
                agent.run_episode(BATCH_SIZE)

        training_pbar.set_postfix_str(f'Reward: {total_reward}')
        training_pbar.update(1)

    training_pbar.close()
    env.close()

In [14]:
def play_with_model():
    env = gym.make(GAME_NAME, render_mode='human')
    state_size = INPUT_SHAPE
    action_size = env.action_space.n
    agent = DQNAgent(state_size, action_size)
    agent.load(MODEL_PATH)
    agent.epsilon = 0.0  # act greedily; a fresh agent starts at EPSILON = 1.0 (fully random)

    state = preprocess_frame(env.reset())
    state = np.expand_dims(state, axis=0)
    done = False
    while not done:
        env.render()
        with io.capture_output() as captured:
            action = agent.choose_action(state)
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        state = preprocess_frame(next_state)
        state = np.expand_dims(state, axis=0)
    env.close()

In [None]:
if TRAIN:
    train(episodes=200)
play_with_model()

Training:   0% |          | Elapsed: 00:00 Remaining: ?

**Note**: Keep in mind that the observations for this environment are frames from the game. Each observation is an RGB image of shape (210, 160, 3), so you will need to implement an agent that can process images (a CNN-based agent).

Make sure to preprocess the frames. For example, you can convert the RGB image to grayscale. You can use the [OpenCV](https://docs.opencv.org/4.x/d6/d00/tutorial_py_root.html) library to perform resizing, blurring, or any other applicable filtering on the frames.

## Grading criteria
Project: 35 points

* Final Viva: 10 points
* Implementation: 10 points
* Final Report: 15 points

For the viva, you will need to explicitly mention each team member's contribution.

You can write your report in this notebook. The report must include visualizations of your results. Train your model with at least 2 different sets of hyperparameters and compare their outputs in the visualization section.
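
Episode rewards in Pong are noisy, so raw curves from two hyperparameter sets can be hard to compare. Below is a small sketch of smoothing with a trailing moving average before plotting. The reward lists are made-up placeholders, not real results, and the labels and window size are arbitrary choices.

```python
def moving_average(values, window):
    """Trailing moving average; early points average over the values seen so far."""
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the value that just left the window.
            running_sum -= values[i - window]
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed


# Placeholder per-episode rewards for two runs (NOT real training results).
rewards_by_run = {
    "gamma=0.95, lr=1e-3": [-21, -21, -20, -19, -20, -18],
    "gamma=0.99, lr=1e-4": [-21, -20, -20, -18, -17, -17],
}

for label, rewards in rewards_by_run.items():
    print(label, [round(x, 2) for x in moving_average(rewards, window=3)])
```

The smoothed lists can then be passed to any plotting library, for example one `plt.plot` call per run in matplotlib.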


### Good Luck!