- Adam Napora
- 18197892

## Intro

This is an evaluation script for an implementation of the Deep Q-Learning algorithm, heavily inspired by this post: https://medium.com/analytics-vidhya/building-a-powerful-dqn-in-tensorflow-2-0-explanation-tutorial-d48ea8f3177a.

The most important parts of the evaluation script:
- epsilon is set to 0, so the Agent always exploits the actions it has learned instead of randomly exploring the environment
- the script points to the latest training snapshot (via the **RESTORE_PATH** variable)
- I did not commit the latest snapshot to the repository because the models are too big (13MB and 36MB) and the replay buffer is very large (7GB)
- my final model, after over 2 days of training, was able to consistently achieve a score of 50, which is around human-level performance (or better)
- I noticed a glitch where the UI gets stuck when the process is stopped: the window either needs to be killed from the Terminal, or Jupyter needs to be restarted altogether (I used Jupyter Lab on my local machine for this project)
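
As a rough illustration of the epsilon point above, here is a minimal sketch of epsilon-greedy action selection. It is not the code from `rl_imports`: the helper `epsilon_greedy_action` and the Q-values are made up for this example, and the real `Agent.get_action` may work differently.

```python
import numpy as np


def epsilon_greedy_action(q_values, epsilon, rng):
    """Hypothetical helper: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random
        return int(rng.integers(len(q_values)))
    # Exploit: pick the action with the highest estimated Q-value
    return int(np.argmax(q_values))


rng = np.random.default_rng(0)
q_values = np.array([0.1, 0.7, 0.3, 0.2])  # made-up Q-values for 4 actions

# With epsilon = 0 the random branch is never taken,
# so every call returns the greedy action (index 1 here).
greedy_actions = [epsilon_greedy_action(q_values, 0.0, rng) for _ in range(100)]

# With epsilon = 1 every action is sampled at random.
random_actions = [epsilon_greedy_action(q_values, 1.0, rng) for _ in range(100)]
```

With epsilon at 0, the evaluation reflects only what the network has learned, with no random moves mixed into the score.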

**Note:** Please see the main Training Notebook for the full project and training description.

## Model Evaluation

In [1]:
# Import Generic Libraries
import numpy as np
import time
import tensorflow as tf

In [2]:
# Import RL Application objects (described in detail in the separate Training Notebook)
from rl_imports import build_q_network
from rl_imports import GameWrapper
from rl_imports import ReplayBuffer
from rl_imports import Agent
from rl_imports import (BATCH_SIZE, CLIP_REWARD, DISCOUNT_FACTOR, ENV_NAME,
                    EVAL_LENGTH, FRAMES_BETWEEN_EVAL, INPUT_SHAPE,
                    LEARNING_RATE, LOAD_FROM, MAX_EPISODE_LENGTH,
                    MAX_NOOP_STEPS, MEM_SIZE, MIN_REPLAY_BUFFER_SIZE,
                    SAVE_PATH, TOTAL_FRAMES, UPDATE_FREQ, WRITE_TENSORBOARD)

In [None]:
# My installation requires this to avoid errors with cuDNN.
# You can remove it if your system doesn't need it
# (it shouldn't break anything if you keep it in).
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        print(e)

# Change this to the path of the model you would like to visualize
RESTORE_PATH = './breakout-saves/save-04319620'

# Create environment
game_wrapper = GameWrapper(ENV_NAME, MAX_NOOP_STEPS)
print("The environment has the following {} actions: {}".format(game_wrapper.env.action_space.n, game_wrapper.env.unwrapped.get_action_meanings()))

# Create agent
MAIN_DQN = build_q_network(game_wrapper.env.action_space.n, LEARNING_RATE, input_shape=INPUT_SHAPE)
TARGET_DQN = build_q_network(game_wrapper.env.action_space.n, input_shape=INPUT_SHAPE)

replay_buffer = ReplayBuffer(size=MEM_SIZE, input_shape=INPUT_SHAPE)
agent = Agent(MAIN_DQN, TARGET_DQN, replay_buffer, game_wrapper.env.action_space.n, input_shape=INPUT_SHAPE, 
              batch_size=BATCH_SIZE)

print('Loading model...')
agent.load(RESTORE_PATH)
print('Loaded')

terminal = True
eval_rewards = []

for frame in range(EVAL_LENGTH):
    if terminal:
        game_wrapper.reset(evaluation=True)
        life_lost = True
        episode_reward_sum = 0
        terminal = False

    # Breakout requires a "fire" action (action #1) to start the
    # game each time a life is lost.
    # Otherwise, the agent would sit around doing nothing.
    action = 1 if life_lost else agent.get_action(0, game_wrapper.state, evaluation=True)

    # Step action
    _, reward, terminal, life_lost = game_wrapper.step(action, render_mode='human')
    time.sleep(0.02)

    episode_reward_sum += reward

    # On game-over
    if terminal:
        print(f'Game over, reward: {episode_reward_sum}, frame: {frame}/{EVAL_LENGTH}')
        eval_rewards.append(episode_reward_sum)

print('Average reward:', np.mean(eval_rewards) if len(eval_rewards) > 0 else episode_reward_sum)

The environment has the following 4 actions: ['NOOP', 'FIRE', 'RIGHT', 'LEFT']
Loading model...
Loaded
Game over, reward: 64.0, frame: 1319/10000
