The Pacman environment
----------------------
We use the OpenAI Gym toolkit to test our reinforcement learning algorithm on the Atari Ms. Pac-Man game.
The environment we use is MsPacmanNoFrameskip-v4.
Please refer to config.py for the hyperparameter settings.
You need to install the gym library, which you can do with the command:
pip install gym
Depending on the gym version, the Atari environments may also need the Atari extra (pip install gym[atari]).
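
The training loop in this notebook unpacks four values from env.step() and a single observation from env.reset(), which matches the Gym API before release 0.26. From 0.26 onwards, step() returns five values (with separate terminated and truncated flags) and reset() returns an (observation, info) pair. The helpers below are a minimal sketch of one way to accept both shapes; unpack_step and unpack_reset are hypothetical names and are not part of gym or of this project.

```python
def unpack_step(result):
    """Normalise an env.step() result to (obs, reward, done, info).

    Hypothetical helper: accepts either the classic 4-tuple or the
    newer 5-tuple that separates `terminated` from `truncated`.
    """
    if len(result) == 5:
        obs, reward, terminated, truncated, info = result
        # An episode is over if it terminated or was cut short.
        return obs, reward, bool(terminated or truncated), info
    obs, reward, done, info = result
    return obs, reward, bool(done), info


def unpack_reset(result):
    """Return only the observation from an env.reset() result.

    Hypothetical helper: the newer API returns (obs, info) with `info`
    a dict; the classic API returns the observation on its own.
    """
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result[0]
    return result
```

With these in place, the loop could call unpack_reset(env.reset()) and unpack_step(env.step(action)) without caring which API the installed version exposes.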

In [2]:
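import os

# Make CUDA kernel launches synchronous so that errors surface at the failing call.
# This has to be an environment variable; a plain Python assignment has no effect.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import gym
import numpy as np
from sac_torch import Agent
from utils import plot_learning_curve
from config import Hyper, Constants
from atari_image import make_env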


Hyper.init()
env = make_env(Constants.env_id)    # See wrapper code for environment in atari_image.py
Hyper.n_actions = env.action_space.n
shape = env.observation_space.shape
agent = Agent(input_dims=shape, env=env, n_actions=env.action_space.n)
filename = f"{Constants.env_id}_games{Hyper.n_games}_alpha{Hyper.alpha}.png"
os.makedirs('plots', exist_ok=True)    # Make sure the output folder for the plot exists
figure_file = f'plots/{filename}'

best_score = env.reward_range[0]
score_history = []
load_checkpoint = False
if load_checkpoint:
    agent.load_models()
    env.render(mode='human')
total_steps = 0
for i in range(Hyper.n_games):
    observation = env.reset()
    done = False
    steps = 0
    score = 0
    while not done:
        # Sample action from the policy
        action = agent.choose_action(observation) 

        # Sample transition from the environment  
        new_observation, reward, done, info = env.step(action)
        steps += 1
        total_steps += 1

        # Store transition in the replay buffer
        agent.remember(observation, action, reward, new_observation, done)
        if not load_checkpoint:
            agent.learn()
        score += reward
        observation = new_observation
    score_history.append(score)
    avg_score = np.mean(score_history[-100:])

    if avg_score > best_score:
        best_score = avg_score
        if not load_checkpoint:
            agent.save_models()

    episode = i + 1
    print(f"episode {episode}: score {score}, trailing 100 games avg {avg_score}, steps {steps}, total steps {total_steps}")

print(f"total number of steps taken: {total_steps}")
if not load_checkpoint:
    x = [i+1 for i in range(Hyper.n_games)]
    plot_learning_curve(x, score_history, figure_file)
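
The loop above stores every transition with agent.remember() before calling agent.learn(). The buffer behind that call lives in sac_torch.py and is not shown here, so the class below is only a minimal sketch of what such a replay buffer might look like, assuming a fixed-size ring buffer with uniform sampling. The class name, method names and array layout are all hypothetical.

```python
import numpy as np


class ReplayBuffer:
    """Hypothetical fixed-size ring buffer of (s, a, r, s', done) transitions."""

    def __init__(self, max_size, input_shape):
        self.max_size = max_size
        self.counter = 0  # total number of transitions ever stored
        self.states = np.zeros((max_size, *input_shape), dtype=np.float32)
        self.next_states = np.zeros((max_size, *input_shape), dtype=np.float32)
        self.actions = np.zeros(max_size, dtype=np.int64)
        self.rewards = np.zeros(max_size, dtype=np.float32)
        self.dones = np.zeros(max_size, dtype=bool)

    def __len__(self):
        # Number of valid entries currently held.
        return min(self.counter, self.max_size)

    def store(self, state, action, reward, next_state, done):
        # Overwrite the oldest slot once the buffer is full.
        idx = self.counter % self.max_size
        self.states[idx] = state
        self.actions[idx] = action
        self.rewards[idx] = reward
        self.next_states[idx] = next_state
        self.dones[idx] = done
        self.counter += 1

    def sample(self, batch_size, rng=None):
        # Draw distinct indices uniformly from the filled part of the buffer.
        if rng is None:
            rng = np.random.default_rng()
        idx = rng.choice(len(self), size=batch_size, replace=False)
        return (self.states[idx], self.actions[idx], self.rewards[idx],
                self.next_states[idx], self.dones[idx])
```

Sampling without replacement requires at least batch_size stored transitions, so a learner built on this sketch would skip its update until the buffer holds that many.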

****************************************************************************************************
Hyperparameters used:
---------------------
environment = MsPacmanNoFrameskip-v4
alpha = 0.0003
beta = 0.0003
gamma = 0.99
tau = 0.005
batch size = 100
number of games = 250
****************************************************************************************************
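
The value tau = 0.005 listed above is small, which is typical when tau is the coefficient of a soft (Polyak) target-network update. config.py and sac_torch.py are not shown here, so that reading is an assumption. Under it, each update moves the target parameters a fraction tau towards the online parameters. A minimal sketch with plain NumPy arrays standing in for network weights (soft_update and the dictionary layout are hypothetical):

```python
import numpy as np


def soft_update(target_params, online_params, tau=0.005):
    """Return new target parameters blended towards the online ones.

    Hypothetical helper: new_target = tau * online + (1 - tau) * target.
    """
    return {
        name: tau * online_params[name] + (1.0 - tau) * target_params[name]
        for name in target_params
    }


# Toy parameters: the target starts at 0, the online network sits at 1.
target = {"w": np.zeros(3)}
online = {"w": np.ones(3)}

target = soft_update(target, online)   # each entry becomes 0.005
target = soft_update(target, online)   # each entry becomes 0.009975
```

With tau this small the target network trails the online network slowly, which is the usual motivation for a soft update.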
.... saving models ....
episode 1: score 290.0, trailing 100 games avg 290.0, steps 651, total steps 651
.... saving models ....
episode 2: score 620.0, trailing 100 games avg 455.0, steps 549, total steps 1200
episode 3: score 140.0, trailing 100 games avg 350.0, steps 351, total steps 1551
episode 4: score 190.0, trailing 100 games avg 310.0, steps 393, total steps 1944
episode 5: score 220.0, trailing 100 games avg 292.0, steps 463, total steps 2407
episode 6: score 230.0, trailing 100 games avg 281.6666666666667, steps 513, total steps 2920


KeyboardInterrupt: 
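
The trailing average and the ".... saving models ...." lines in the log above can be checked by hand. The sketch below replays the six logged scores through the same np.mean(score_history[-100:]) rule used in the training loop and records the episodes at which the average beats the best value so far. It assumes best_score starts at negative infinity, which is Gym's default lower bound for reward_range; trailing_averages is a hypothetical helper name.

```python
import numpy as np

# Scores copied from the first six episodes of the log.
scores = [290.0, 620.0, 140.0, 190.0, 220.0, 230.0]


def trailing_averages(score_history, window=100):
    """Mean of the last `window` scores after each episode."""
    return [
        float(np.mean(score_history[max(0, i + 1 - window): i + 1]))
        for i in range(len(score_history))
    ]


averages = trailing_averages(scores)

# Episodes (1-based) at which the loop would save a checkpoint.
best = float("-inf")  # assumed starting value of best_score
saved = []
for episode, avg in enumerate(averages, start=1):
    if avg > best:
        best = avg
        saved.append(episode)

print(averages)
print(saved)
```

Under that assumption the checkpoints fall on episodes 1 and 2 only, matching the two save messages in the log: the average peaks at 455.0 after episode 2 and declines afterwards.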