# Task 3: Test of Proximal Policy Optimization (PPO)
### By Nattaphat Thanaussawanun and Prapas Rakchartkiattikul

## Contents

- <a href='#ev_1'>Test of experiment 1: the small environment (N = 20)</a>
- <a href='#ev_2'>Test of experiment 2: the medium environment (N = 40)</a>
- <a href='#ev_3'>Test of experiment 3: the large environment (N = 60)</a>

In [1]:
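import pickle

import numpy as np
import torch
from torch.distributions import Categorical

# Provides index_to_actions and the other project definitions used below
from ppo_algorithm import *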

In [2]:
# Helper function: run `num_exps` evaluation episodes and summarize the rewards
def run_experiments(env, actor_model, num_exps):
    with torch.no_grad():
        all_rewards = []
        all_action_counts = []

        for _ in range(num_exps):
            obs = env.reset()
            done = False

            total_reward = 0
            action_count = 0

            while not done:
                dist = Categorical(actor_model(obs))
                action_index = torch.squeeze(dist.sample()).item()
                action_name = index_to_actions[action_index].name
                obs, reward, done = env.step(action_name)
                total_reward += reward
                action_count += 1

            all_rewards.append(total_reward)
            all_action_counts.append(action_count)

        max_reward = max(all_rewards)
        avg_reward = np.mean(all_rewards)
        # Note: despite the name, this is the standard deviation (np.std), not the variance
        var_reward = np.std(all_rewards)
        avg_action_count = np.mean(all_action_counts)

    return max_reward, avg_reward, var_reward, avg_action_count
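
The third value returned by `run_experiments` is computed with `np.std`, so the figure reported below as "Variance Reward" is a standard deviation in reward units, not a variance. The sketch below illustrates the difference on made-up rewards (the numbers are hypothetical and are not taken from the experiments). It uses the standard-library `statistics` module, whose population functions follow the same convention as NumPy's default (`ddof=0`).

```python
import statistics

# Hypothetical per-episode rewards, made up for illustration only.
rewards = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Population standard deviation: same convention as np.std with ddof=0.
std = statistics.pstdev(rewards)

# Population variance: the square of the standard deviation.
var = statistics.pvariance(rewards)

print(f"std = {std:.2f}, variance = {var:.2f}")  # std = 2.00, variance = 4.00
```

Because the standard deviation is in the same units as the reward, it can be compared directly with the average reward, which a variance cannot.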

## Test of experiment 1: the small environment (N = 20) <a id='ev_1'></a>

In [3]:
# Load the trained actor model
PATH_MLP = 'Actor_Model_20.pth'
model = torch.load(PATH_MLP)
model.eval()

# Load the same environment that was used for training
with open('dungeon_20.p', 'rb') as f:
    dungeon = pickle.load(f)

dungeon.reset()
dungeon.display()

# Run 1000 evaluation episodes
max_r, avg_r, var_r, avg_ac = run_experiments(dungeon, model, 1000)
# Ratio of the two averages (not the average of per-episode ratios)
avg_r_per_ac = avg_r / avg_ac

X X X X X X X X X X X X X X X X X X X X 
X . . . . . L . . . . . . . . . . . . X 
X . . . . . . . L . . . . L . . . . . X 
X . . . . . . . . . . . . . X . . . . X 
X . . . . . . . . . . . . . . L . . . X 
X . . . . . . . . E . . . . X . X . . X 
X . X . . . . . . . . . . . . . . . . X 
X . . . . . . . . . . . . . . . . . . X 
X . . . . . . X . L . . . . . . . . . X 
X . . X . . . . . . . . . . . . X . . X 
X . . . . . . . X . . . . . . . . . . X 
X . . . . L . . . . . . . . . . . . . X 
X . . . . . . . . . . . . . . . . . . X 
X . . . . . . . . . L . . . . . . . . X 
X . . . . . . . . . . . L . . . . . . X 
X . . . . L . . . . . . . . . . . . . X 
X . . . . . . . . L . . . . . . . . . X 
X . A . . . . . . . . . . . . . . . . X 
X . . . . X . . . . . . . . . . . . . X 
X X X X X X X X X X X X X X X X X X X X 



In [4]:
print(f'The environment size ({dungeon.size} x {dungeon.size}) of the dungeon shows the following result:')
print(f'Maximum Reward: {max_r:0.2f} | Average Reward: {avg_r:0.2f} | Variance Reward: {var_r:0.2f} | '
          f'Average Action: {avg_ac:0.2f} | Average Reward Per Action: {avg_r_per_ac:0.2f}')

The environment size (20 x 20) of the dungeon shows the following result:
Maximum Reward: 399.00 | Average Reward: 299.82 | Variance Reward: 114.88 | Average Action: 62.51 | Average Reward Per Action: 4.80
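
The "Average Reward Per Action" figure is the average reward divided by the average action count (299.82 / 62.51 ≈ 4.80), that is, a ratio of means. This is generally not the same as averaging the reward-per-action of each episode. A minimal sketch with two made-up episodes (hypothetical values, not taken from the experiments) shows how far the two can diverge:

```python
# Hypothetical (total_reward, action_count) pairs, made up for illustration only.
episodes = [(100.0, 10), (300.0, 100)]

mean_reward = sum(r for r, _ in episodes) / len(episodes)   # 200.0
mean_actions = sum(n for _, n in episodes) / len(episodes)  # 55.0

# What the notebook reports: the ratio of the two averages.
ratio_of_means = mean_reward / mean_actions

# An alternative metric: the average of each episode's own ratio.
mean_of_ratios = sum(r / n for r, n in episodes) / len(episodes)

print(f"ratio of means = {ratio_of_means:.2f}")  # 3.64
print(f"mean of ratios = {mean_of_ratios:.2f}")  # 6.50
```

The ratio of means weights long episodes more heavily, so it reads as the reward earned per action over all episodes pooled together.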


## Test of experiment 2: the medium environment (N = 40) <a id='ev_2'></a>

In [5]:
# Load the trained actor model
PATH_MLP = 'Actor_Model_40.pth'
model = torch.load(PATH_MLP)
model.eval()

# Load the same environment that was used for training
with open('dungeon_40.p', 'rb') as f:
    dungeon = pickle.load(f)

dungeon.reset()
dungeon.display()

# Run 1000 evaluation episodes
max_r, avg_r, var_r, avg_ac = run_experiments(dungeon, model, 1000)
# Ratio of the two averages (not the average of per-episode ratios)
avg_r_per_ac = avg_r / avg_ac

X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X 
X . . . . . . . . . . . . . . . . . . . . . . X . . . . . . . . . . . . . . . X 
X X . . . . . L . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . X 
X . . . . . . . . L . . . . . . . . . . . . . . . . . . . . . . . . . . . . . X 
X . . . . . X . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . X 
X . . . . . . . . . . . L . X . . . . . . . . . . . . . . . . . . . . . . . . X 
X . . . . . . . . . . . . . . . . . . X . . . . . . . . . . . . . . . . . . . X 
X . . . . . L . . . . . . . . L . . . . . . . . . . . . . . . . . . . . . . . X 
X . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . X 
X . . . . . . . E . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . X 
X . . . . . . . . . L . . . . . . . . . . . . . . . . . . . . . . X . . . . . X 
X . . . . . . . . . . L . . . X . . . . . . . . . . . . . . . . . . . . . . . X 
X . . . . . . . L . . . . . 

In [6]:
print(f'The environment size ({dungeon.size} x {dungeon.size}) of the dungeon shows the following result:')
print(f'Maximum Reward: {max_r:0.2f} | Average Reward: {avg_r:0.2f} | Variance Reward: {var_r:0.2f} | '
          f'Average Action: {avg_ac:0.2f} | Average Reward Per Action: {avg_r_per_ac:0.2f}')

The environment size (40 x 40) of the dungeon shows the following result:
Maximum Reward: 1599.00 | Average Reward: 1405.11 | Variance Reward: 147.25 | Average Action: 147.35 | Average Reward Per Action: 9.54


## Test of experiment 3: the large environment (N = 60) <a id='ev_3'></a>

In [7]:
# Load the trained actor model
PATH_MLP = 'Actor_Model_60.pth'
model = torch.load(PATH_MLP)
model.eval()

# Load the same environment that was used for training
with open('dungeon_60.p', 'rb') as f:
    dungeon = pickle.load(f)

dungeon.reset()
dungeon.display()

# Run 1000 evaluation episodes
max_r, avg_r, var_r, avg_ac = run_experiments(dungeon, model, 1000)
# Ratio of the two averages (not the average of per-episode ratios)
avg_r_per_ac = avg_r / avg_ac

X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X 
X . . L . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . X 
X . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . X 
X . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . X 
X . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . X 
X . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . L . . . . . X 
X . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . X 
X . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . X 
X . . . . . . . . . . . . . . . 

In [8]:
print(f'The environment size ({dungeon.size} x {dungeon.size}) of the dungeon shows the following result:')
print(f'Maximum Reward: {max_r:0.2f} | Average Reward: {avg_r:0.2f} | Variance Reward: {var_r:0.2f} | '
          f'Average Action: {avg_ac:0.2f} | Average Reward Per Action: {avg_r_per_ac:0.2f}')

The environment size (60 x 60) of the dungeon shows the following result:
Maximum Reward: 3599.00 | Average Reward: 3516.16 | Variance Reward: 52.22 | Average Action: 51.91 | Average Reward Per Action: 67.74
