# PRA2 Supplementary Information

## Part 2: Playing Pong

Sample code to test the trained model in the `PongNoFrameskip-v4` environment, using the `AtariWrapper` (Stable-Baselines3) and `frame_stack_v1` (SuperSuit) wrappers.

In [1]:
```python
import warnings
warnings.filterwarnings('ignore')

num_episodes = 10
MODEL_NAME = "dqn-v1"
ENV_NAME = "PongNoFrameskip-v4"
```

In [2]:
```python
import gymnasium as gym
import supersuit as ss
import numpy as np

# render_mode="rgb_array" makes env.render() return RGB frames (used for the GIF export)
env = gym.make(ENV_NAME, render_mode="rgb_array")

print("Environment created!")
```

```
Environment created!

A.L.E: Arcade Learning Environment (version 0.8.1+53f58b7)
[Powered by Stella]
```
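The raw `PongNoFrameskip-v4` environment returns 210x160 RGB frames, and the `AtariWrapper` applied in the next cell shrinks them to 84x84 grayscale images (`screen_size=84`). As a rough illustration, the NumPy sketch below converts one frame with the BT.601 luma weights that OpenCV's RGB-to-gray conversion uses, but resizes with simple nearest-neighbour sampling. That resize is an assumption made to avoid an OpenCV dependency (Stable-Baselines3 itself resizes with `cv2.INTER_AREA`), and `warp_frame` is a hypothetical helper.

```python
import numpy as np


def warp_frame(frame: np.ndarray, size: int = 84) -> np.ndarray:
    """Hypothetical helper: RGB (H, W, 3) uint8 -> grayscale (size, size, 1) uint8."""
    # BT.601 luma weights, the same weights OpenCV uses for RGB -> gray
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = frame.astype(np.float32) @ weights          # shape (H, W)
    h, w = gray.shape
    # Simplification: nearest-neighbour resize by picking evenly spaced rows/columns
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = gray[rows[:, None], cols[None, :]]       # shape (size, size)
    # Round back to 8-bit pixels and keep a trailing channel axis, as in (84, 84, 1)
    return np.clip(np.rint(resized), 0, 255).astype(np.uint8)[:, :, None]


raw = np.zeros((210, 160, 3), dtype=np.uint8)  # same size as an ALE frame
raw[..., 0] = 200                              # only the red channel is lit
small = warp_frame(raw)
print(small.shape)       # (84, 84, 1)
print(small[0, 0, 0])    # 60  (0.299 * 200 = 59.8, rounded)
```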


In [3]:
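```python
from stable_baselines3.common.atari_wrappers import AtariWrapper

# Standard Atari preprocessing from Stable-Baselines3: random no-ops on reset,
# frame skipping with max-pooling, episode end on life loss, FIRE on reset,
# 84x84 grayscale frames and reward clipping to {-1, 0, +1}.
# Observations come out with shape (84, 84, 1).
env = AtariWrapper(env,
                   noop_max=30,
                   frame_skip=4,
                   screen_size=84,
                   terminal_on_life_loss=True,
                   clip_reward=True,
                   action_repeat_probability=0.0)  # no sticky actions

# SuperSuit: stack the 4 most recent frames along the channel axis -> (84, 84, 4)
env = ss.frame_stack_v1(env, 4)

print("Environment wrapped!")
```

```
Environment wrapped!
```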
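Two of the steps bundled in `AtariWrapper` change what a single `env.step()` means: with `frame_skip=4` the chosen action is repeated for four emulator frames and the last two frames are max-pooled (hiding sprites that flicker), and with `clip_reward=True` the accumulated reward is reduced to its sign. The sketch below illustrates both under simplifying assumptions; the `skip_and_max` function and the toy `step_fn` interface `(action) -> (frame, reward, done)` are hypothetical and are not Stable-Baselines3's code.

```python
from typing import Callable, List, Tuple

import numpy as np

# Hypothetical emulator interface: action -> (raw_frame, reward, done)
StepFn = Callable[[int], Tuple[np.ndarray, float, bool]]


def skip_and_max(step_fn: StepFn, action: int,
                 frame_skip: int = 4) -> Tuple[np.ndarray, float, bool]:
    """Repeat `action` for up to `frame_skip` frames, max-pool the last two, clip the reward."""
    last_two: List[np.ndarray] = []  # the two most recent raw frames
    total_reward = 0.0
    done = False
    for _ in range(frame_skip):
        frame, reward, done = step_fn(action)
        total_reward += reward
        last_two = (last_two + [frame])[-2:]
        if done:  # stop repeating if the episode ends mid-skip
            break
    # Pixel-wise max over the last two frames keeps sprites drawn on only one of them
    pooled = np.max(np.stack(last_two), axis=0)
    # Reward clipping: keep only the sign of the accumulated reward (-1, 0 or +1)
    return pooled, float(np.sign(total_reward)), done


def make_toy_step_fn() -> StepFn:
    """Toy emulator: frame t is filled with the value t and every frame pays reward 1."""
    t = 0

    def step_fn(action: int) -> Tuple[np.ndarray, float, bool]:
        nonlocal t
        t += 1
        return np.full((2, 2), t, dtype=np.uint8), 1.0, False

    return step_fn


pooled_obs, clipped_reward, done_flag = skip_and_max(make_toy_step_fn(), action=0)
print(pooled_obs[0, 0], clipped_reward, done_flag)  # 4 1.0 False
```

Four frames with reward 1 each sum to 4, which clipping turns into 1.0; this is why every recorded step reward in the evaluation loop is -1, 0 or +1.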


In [4]:
```python
obs, _ = env.reset()

print(obs.shape)
print(obs)
```

```
(84, 84, 4)
[[[  0   0   0  52]
  [  0   0   0  52]
  [  0   0   0  52]
  ...
  [  0   0   0  87]
  [  0   0   0  87]
  [  0   0   0  87]]

 [[  0   0   0  87]
  [  0   0   0  87]
  [  0   0   0  87]
  ...
  [  0   0   0  87]
  [  0   0   0  87]
  [  0   0   0  87]]

 [[  0   0   0  87]
  [  0   0   0  87]
  [  0   0   0  87]
  ...
  [  0   0   0  87]
  [  0   0   0  87]
  [  0   0   0  87]]

 ...

 [[  0   0   0 236]
  [  0   0   0 236]
  [  0   0   0 236]
  ...
  [  0   0   0 236]
  [  0   0   0 236]
  [  0   0   0 236]]

 [[  0   0   0 236]
  [  0   0   0 236]
  [  0   0   0 236]
  ...
  [  0   0   0 236]
  [  0   0   0 236]
  [  0   0   0 236]]

 [[  0   0   0 236]
  [  0   0   0 236]
  [  0   0   0 236]
  ...
  [  0   0   0 236]
  [  0   0   0 236]
  [  0   0   0 236]]]
```
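In the printed observation only the last channel holds pixel values, while the three older channels are all zeros: right after `reset()` the stack has not yet seen four frames, so the missing history is zero-filled. The hypothetical `FrameStack` class below is a minimal NumPy sketch of that behaviour (oldest frame in channel 0, newest in the last channel); it only mimics what the output shows and is not SuperSuit's implementation.

```python
import numpy as np


class FrameStack:
    """Hypothetical sketch: keep the last `k` (H, W, C) frames concatenated on the channel axis."""

    def __init__(self, k: int, frame_shape=(84, 84, 1), dtype=np.uint8):
        self.k = k
        self.frame_shape = frame_shape
        self.dtype = dtype
        self.stack = None

    def reset(self, frame: np.ndarray) -> np.ndarray:
        # Start from an all-zero history, then push the first real frame
        h, w, c = self.frame_shape
        self.stack = np.zeros((h, w, c * self.k), dtype=self.dtype)
        return self.push(frame)

    def push(self, frame: np.ndarray) -> np.ndarray:
        c = self.frame_shape[2]
        # Shift every frame one slot towards channel 0; the oldest wraps to the end ...
        self.stack = np.roll(self.stack, shift=-c, axis=-1)
        # ... where it is overwritten by the newest frame
        self.stack[..., -c:] = frame
        return self.stack.copy()


fs = FrameStack(k=4)
stacked = fs.reset(np.full((84, 84, 1), 87, dtype=np.uint8))
print(stacked.shape)   # (84, 84, 4)
print(stacked[0, 0])   # [ 0  0  0 87]
stacked = fs.push(np.full((84, 84, 1), 52, dtype=np.uint8))
print(stacked[0, 0])   # [ 0  0 87 52]
```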


In [5]:
```python
from stable_baselines3 import DQN

# Load the trained agent saved as models/<MODEL_NAME>.zip
model = DQN.load("models/" + MODEL_NAME + ".zip")

print("Model '{}' loaded!".format(MODEL_NAME))
```

```
Model 'dqn-v1' loaded!
```
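The evaluation loop below calls `model.predict(obs)` with its default `deterministic=False`. For Stable-Baselines3's `DQN` this means a uniformly random action is taken with probability `model.exploration_rate` and the action with the highest Q-value otherwise, while `deterministic=True` always picks the greedy action. The sketch below reproduces that rule on a plain vector of Q-values; `select_action` and the example numbers are hypothetical.

```python
import numpy as np


def select_action(q_values: np.ndarray, epsilon: float, deterministic: bool,
                  rng: np.random.Generator) -> int:
    """Hypothetical helper: greedy if deterministic, epsilon-greedy otherwise."""
    if not deterministic and rng.random() < epsilon:
        # Exploration: any of the discrete actions, chosen uniformly
        return int(rng.integers(len(q_values)))
    # Exploitation: the action with the highest estimated Q-value
    return int(np.argmax(q_values))


rng = np.random.default_rng(0)
q = np.array([0.1, 0.7, -0.2, 0.4, 0.0, 0.3])  # Pong uses 6 discrete actions
print(select_action(q, epsilon=0.05, deterministic=True, rng=rng))  # 1

# With epsilon = 0.05, the greedy action 1 is chosen about 0.95 + 0.05 / 6 of the time
greedy_share = np.mean([select_action(q, 0.05, False, rng) == 1 for _ in range(10_000)])
print(greedy_share)
```

A small `exploration_rate` barely changes the episode rewards, but it does make repeated evaluations of the same model slightly non-reproducible unless `deterministic=True` is used.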


In [8]:
```python
import os
import time

import imageio

rewards_glb = []    # total (clipped) reward of each episode
export_gif = True   # save one GIF per episode in ./videos/

if export_gif:
    os.makedirs("./videos", exist_ok=True)

for i in range(num_episodes):
    frames = []
    rewards_episode = []
    done = False
    obs, _ = env.reset()

    while not done:
        # deterministic=False (the default) lets DQN take a random action with
        # probability model.exploration_rate; deterministic=True is purely greedy
        action, _ = model.predict(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated
        rewards_episode.append(reward)

        # render the current RGB frame for the GIF export
        if export_gif:
            frames.append(env.render())

    # episode summary (length = number of agent steps, each repeated for 4 frames)
    print("Episode {} finished with length = {} and reward = {}".format(
        i + 1, len(rewards_episode), sum(rewards_episode)))

    rewards_glb.append(sum(rewards_episode))

    if export_gif:
        # recent imageio versions take `duration` in ms per frame: fps=50 == duration=20 (1000 * 1/50)
        gif_path = "./videos/" + MODEL_NAME + "_" + time.strftime("%Y%m%d-%H%M%S") + ".gif"
        imageio.mimwrite(gif_path, frames, duration=20)

# mean +- population standard deviation over all episodes
print("\nTest reward: {} +- {:.4f} \n{}".format(np.mean(rewards_glb), np.std(rewards_glb), rewards_glb))
```

```
Episode 1 finished with length = 1652 and reward = 21.0
Episode 2 finished with length = 1661 and reward = 20.0
Episode 3 finished with length = 1781 and reward = 19.0
Episode 4 finished with length = 1645 and reward = 21.0
Episode 5 finished with length = 1693 and reward = 20.0
Episode 6 finished with length = 1645 and reward = 21.0
Episode 7 finished with length = 1633 and reward = 21.0
Episode 8 finished with length = 1632 and reward = 21.0
Episode 9 finished with length = 1835 and reward = 19.0
Episode 10 finished with length = 1668 and reward = 20.0

Test reward: 20.3 +- 0.7810 
[21.0, 20.0, 19.0, 21.0, 20.0, 21.0, 21.0, 21.0, 19.0, 20.0]
```
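The summary line reports the mean and the population standard deviation of the ten episode rewards, because `np.std` defaults to `ddof=0`. The check below recomputes both from the printed list and, for comparison, the sample standard deviation (`ddof=1`), which is slightly larger with only ten episodes.

```python
import numpy as np

rewards = [21.0, 20.0, 19.0, 21.0, 20.0, 21.0, 21.0, 21.0, 19.0, 20.0]

mean = np.mean(rewards)               # 203 / 10
pop_std = np.std(rewards)             # sqrt(6.1 / 10), ddof=0 as in the notebook
sample_std = np.std(rewards, ddof=1)  # sqrt(6.1 / 9)

print("{} +- {:.4f}".format(mean, pop_std))  # 20.3 +- 0.7810
print("{:.4f}".format(sample_std))           # 0.8233
```

The squared deviations from 20.3 are 0.49 for each of the five 21s, 0.09 for each of the three 20s and 1.69 for each of the two 19s, which add up to 6.1.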
