# Evaluating Trained Othello Agents

This notebook demonstrates how to evaluate and analyze trained Othello agents, including:
- Preparing an agent for evaluation (loaded from a checkpoint in practice, briefly trained here for the demo)
- Evaluating against different opponents
- Analyzing game statistics and patterns
- Visualizing agent behavior
- Creating game replays

## Setup

In [None]:
import ray
from ray.rllib.algorithms.ppo import PPO, PPOConfig
from ray.rllib.models import ModelCatalog
import gymnasium as gym
import numpy as np
import matplotlib.pyplot as plt
from collections import defaultdict
import aip_rl.othello

print("Libraries imported successfully!")

## 1. Loading a Trained Agent

In practice you would load a trained agent from a checkpoint. To keep this demo self-contained, the cell below trains a quick agent instead.

In [None]:
# Initialize Ray
ray.init(ignore_reinit_error=True)

# Path to your checkpoint (update this with your actual checkpoint path).
# It is not used below: with a real checkpoint you would restore the agent
# from this path instead of running the quick training loop.
checkpoint_path = "/path/to/your/checkpoint"

print("For this demo, we'll train a quick agent...")
print("In practice, you would load a pre-trained checkpoint.\n")

# Quick training for demo purposes
config = (
    PPOConfig()
    .environment(env="Othello-v0")
    .framework("torch")
    .env_runners(num_env_runners=2)
    .resources(num_gpus=0)
)

algo = config.build()

# Train for a few iterations
for i in range(5):
    result = algo.train()
    print(f"Completed training iteration {i+1}/5")

print("\nAgent ready for evaluation!")
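
The cell above trains a short-lived agent rather than loading one. A minimal sketch of what the checkpoint load might look like instead, assuming the checkpoint was written by RLlib (for example via `algo.save()`) with a compatible Ray version. `resolve_checkpoint` and `load_agent` are hypothetical helper names introduced here; `Algorithm.from_checkpoint` is the RLlib call that performs the restore. The environment must already be registered (here via `import aip_rl.othello`) before restoring.

```python
from pathlib import Path


def resolve_checkpoint(path):
    """Return the checkpoint directory as a Path, or None if it does not exist."""
    candidate = Path(path).expanduser()
    return candidate if candidate.is_dir() else None


def load_agent(path):
    """Restore an Algorithm from `path`, or return None when nothing is there."""
    checkpoint_dir = resolve_checkpoint(path)
    if checkpoint_dir is None:
        return None
    # Imported lazily so the path check also works without Ray installed.
    from ray.rllib.algorithms.algorithm import Algorithm
    return Algorithm.from_checkpoint(str(checkpoint_dir))


# Placeholder path from the cell above; replace it with a real checkpoint directory.
restored = load_agent("/path/to/your/checkpoint")
if restored is None:
    print("No checkpoint found; keep using the freshly trained agent.")
else:
    print("Agent restored from checkpoint.")
```

If `restored` is not `None`, it can be passed to the evaluation functions below in place of `algo`.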

## 2. Evaluation Against Different Opponents

Let's evaluate the agent against random and greedy opponents.

In [None]:
def evaluate_vs_opponent(algo, opponent_type, num_episodes=20, seed=42):
    """
    Evaluate agent against a specific opponent.
    
    Returns:
        Dictionary with wins, losses, draws, and statistics
    """
    env = gym.make("Othello-v0", opponent=opponent_type)
    
    wins = 0
    losses = 0
    draws = 0
    rewards = []
    lengths = []
    piece_diffs = []  # Final piece count difference
    
    for episode in range(num_episodes):
        observation, info = env.reset(seed=seed + episode)
        done = False
        episode_reward = 0
        steps = 0
        
        while not done:
            action = algo.compute_single_action(observation)
            observation, reward, terminated, truncated, info = env.step(action)
            episode_reward += reward
            steps += 1
            done = terminated or truncated
        
        rewards.append(episode_reward)
        lengths.append(steps)
        
        # Determine winner and piece difference
        black_count = info['black_count']
        white_count = info['white_count']
        piece_diff = black_count - white_count  # Agent is Black
        piece_diffs.append(piece_diff)
        
        if black_count > white_count:
            wins += 1
        elif white_count > black_count:
            losses += 1
        else:
            draws += 1
    
    env.close()  # Release environment resources before returning

    return {
        'wins': wins,
        'losses': losses,
        'draws': draws,
        'win_rate': wins / num_episodes,
        'mean_reward': np.mean(rewards),
        'std_reward': np.std(rewards),
        'mean_length': np.mean(lengths),
        'mean_piece_diff': np.mean(piece_diffs),
        'rewards': rewards,
        'lengths': lengths,
        'piece_diffs': piece_diffs,
    }

print("Evaluation function defined!")

In [None]:
# Evaluate against different opponents
opponents = ["random", "greedy"]
num_eval_episodes = 20  # Single source of truth for the episode count
results = {}

print("Evaluating agent against different opponents...\n")

for opponent in opponents:
    print(f"Evaluating vs {opponent}...")
    results[opponent] = evaluate_vs_opponent(algo, opponent, num_episodes=num_eval_episodes)
    
    r = results[opponent]
    print(f"  Wins: {r['wins']}/{num_eval_episodes} ({r['win_rate']*100:.1f}%)")
    print(f"  Losses: {r['losses']}/{num_eval_episodes}")
    print(f"  Draws: {r['draws']}/{num_eval_episodes}")
    print(f"  Mean Reward: {r['mean_reward']:.2f} ± {r['std_reward']:.2f}")
    print(f"  Mean Episode Length: {r['mean_length']:.1f}")
    print(f"  Mean Piece Difference: {r['mean_piece_diff']:.1f}\n")

print("Evaluation complete!")

### Visualizing Results

In [None]:
# Create comparison plots
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Win rates
ax = axes[0, 0]
opponent_names = list(results.keys())
win_rates = [results[opp]['win_rate'] * 100 for opp in opponent_names]
ax.bar(opponent_names, win_rates, color=['skyblue', 'lightcoral'])
ax.set_ylabel('Win Rate (%)')
ax.set_title('Win Rate vs Different Opponents')
ax.set_ylim([0, 100])
ax.grid(True, alpha=0.3, axis='y')

# Mean rewards
ax = axes[0, 1]
mean_rewards = [results[opp]['mean_reward'] for opp in opponent_names]
std_rewards = [results[opp]['std_reward'] for opp in opponent_names]
ax.bar(opponent_names, mean_rewards, yerr=std_rewards, 
       color=['skyblue', 'lightcoral'], capsize=5)
ax.set_ylabel('Mean Reward')
ax.set_title('Mean Reward vs Different Opponents')
ax.grid(True, alpha=0.3, axis='y')

# Episode lengths
ax = axes[1, 0]
for opponent in opponent_names:
    ax.hist(results[opponent]['lengths'], alpha=0.6, label=opponent, bins=10)
ax.set_xlabel('Episode Length')
ax.set_ylabel('Frequency')
ax.set_title('Episode Length Distribution')
ax.legend()
ax.grid(True, alpha=0.3)

# Piece differences
ax = axes[1, 1]
for opponent in opponent_names:
    ax.hist(results[opponent]['piece_diffs'], alpha=0.6, label=opponent, bins=15)
ax.set_xlabel('Piece Difference (Agent - Opponent)')
ax.set_ylabel('Frequency')
ax.set_title('Final Piece Difference Distribution')
ax.axvline(x=0, color='black', linestyle='--', alpha=0.5)
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 3. Game Replay and Visualization

Let's record and visualize a complete game.

In [None]:
def record_game(algo, opponent="random", seed=42):
    """
    Record a complete game with all states and actions.
    """
    env = gym.make("Othello-v0", opponent=opponent, render_mode="rgb_array")
    
    observation, info = env.reset(seed=seed)
    
    game_record = {
        'observations': [observation.copy()],
        'actions': [],
        'rewards': [],
        'infos': [info.copy()],
        'frames': [env.render()],
    }
    
    done = False
    while not done:
        action = algo.compute_single_action(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        
        game_record['actions'].append(action)
        game_record['rewards'].append(reward)
        game_record['observations'].append(observation.copy())
        game_record['infos'].append(info.copy())
        game_record['frames'].append(env.render())
        
        done = terminated or truncated
    
    env.close()  # Frames are already captured, so the environment can be released
    return game_record

# Record a game
print("Recording a game...")
game = record_game(algo, opponent="greedy", seed=42)
print(f"Game recorded! {len(game['actions'])} moves played.")
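
To replay a recorded game later without rerunning the agent, the record can be written to disk. A minimal sketch, assuming the record has the same keys as the output of `record_game` (`actions`, `rewards`, `infos` with `black_count` and `white_count`). `summarize_game`, `save_game`, and `load_game` are hypothetical helpers, and the record below is hand-made rather than real game data. Frames and observations are left out because arrays are not JSON-serializable as-is.

```python
import json
import tempfile
from pathlib import Path


def summarize_game(game_record):
    """Keep only JSON-friendly fields: moves, rewards, and per-step piece counts."""
    return {
        "actions": [int(a) for a in game_record["actions"]],
        "rewards": [float(r) for r in game_record["rewards"]],
        "scores": [
            [int(info["black_count"]), int(info["white_count"])]
            for info in game_record["infos"]
        ],
    }


def save_game(game_record, path):
    """Write the summarized record to `path` as JSON."""
    Path(path).write_text(json.dumps(summarize_game(game_record), indent=2))


def load_game(path):
    """Read a record previously written by save_game."""
    return json.loads(Path(path).read_text())


# Tiny hand-made record standing in for the output of record_game().
demo_record = {
    "actions": [19, 26],
    "rewards": [0.0, 0.0],
    "infos": [
        {"black_count": 2, "white_count": 2},
        {"black_count": 3, "white_count": 3},
        {"black_count": 4, "white_count": 4},
    ],
}
demo_path = Path(tempfile.gettempdir()) / "othello_game_demo.json"
save_game(demo_record, demo_path)
print(load_game(demo_path)["actions"])
```

In the notebook, `save_game(game, "my_game.json")` would store the game recorded above; the file name is only an example.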

In [None]:
# Visualize key moments from the game
key_moments = [0, len(game['frames'])//4, len(game['frames'])//2, 
               3*len(game['frames'])//4, -1]

fig, axes = plt.subplots(1, 5, figsize=(20, 4))

for idx, moment in enumerate(key_moments):
    axes[idx].imshow(game['frames'][moment])
    step = moment if moment >= 0 else len(game['frames']) + moment
    info = game['infos'][moment]
    axes[idx].set_title(f"Step {step}\nB:{info['black_count']} W:{info['white_count']}")
    axes[idx].axis('off')

plt.suptitle('Key Moments in the Game', fontsize=16)
plt.tight_layout()
plt.show()

## 4. Analyzing Agent Behavior

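Let's analyze the agent's decision-making patterns. The cell below counts the agent's actions in the single game recorded above. Because a square stays occupied once a disc is placed on it, each legal move can appear at most once per game, so aggregate actions over many recorded games to get a meaningful heatmap.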

In [None]:
# Analyze action distribution
action_counts = defaultdict(int)
for action in game['actions']:
    action_counts[action] += 1

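# Convert actions to board positions
position_heatmap = np.zeros((8, 8))
for action, count in action_counts.items():
    row, col = divmod(int(action), 8)
    # Skip indices beyond the 64 board squares (e.g. a pass action, if the env defines one)
    if 0 <= row < 8:
        position_heatmap[row, col] = count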

# Visualize action heatmap
plt.figure(figsize=(8, 8))
plt.imshow(position_heatmap, cmap='YlOrRd', interpolation='nearest')
plt.colorbar(label='Number of times played')
plt.title('Agent Action Heatmap\n(Darker = More Frequently Played)')
plt.xlabel('Column')
plt.ylabel('Row')

# Add grid
for i in range(9):
    plt.axhline(i-0.5, color='gray', linewidth=0.5)
    plt.axvline(i-0.5, color='gray', linewidth=0.5)

# Add text annotations
for i in range(8):
    for j in range(8):
        if position_heatmap[i, j] > 0:
            plt.text(j, i, int(position_heatmap[i, j]), 
                    ha='center', va='center', color='white', fontweight='bold')

plt.tight_layout()
plt.show()

print("\nMost played positions:")
sorted_actions = sorted(action_counts.items(), key=lambda x: x[1], reverse=True)[:5]
for action, count in sorted_actions:
    row, col = action // 8, action % 8
    print(f"  Position ({row}, {col}): {count} times")
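
The heatmap cell maps each flat action index to a board square with integer division and remainder. A small sketch of that mapping as a reusable helper, which also returns `None` for indices outside the 64 squares. Whether `Othello-v0` has such an extra action (for example a pass) is not confirmed here, and the letter-digit naming assumes row 0 corresponds to rank 1 and column 0 to file a; both are assumptions, as are the helper names.

```python
BOARD_SIZE = 8


def action_to_square(action, board_size=BOARD_SIZE):
    """Map a flat action index to (row, col), or None if it is off the board."""
    action = int(action)
    if not 0 <= action < board_size * board_size:
        return None  # e.g. a hypothetical pass action; not confirmed for Othello-v0
    return divmod(action, board_size)


def square_name(row, col):
    """Render (row, col) in letter-digit form, assuming column 0 is 'a' and row 0 is '1'."""
    return f"{'abcdefgh'[col]}{row + 1}"


for action in (19, 26, 63, 64):
    square = action_to_square(action)
    label = square_name(*square) if square is not None else "off-board"
    print(action, square, label)
```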

### Reward Progression

In [None]:
# Plot reward progression
plt.figure(figsize=(12, 4))
plt.plot(game['rewards'], marker='o', markersize=3)
plt.xlabel('Step')
plt.ylabel('Reward')
plt.title('Reward Progression Throughout Game')
plt.grid(True, alpha=0.3)
plt.axhline(y=0, color='black', linestyle='--', alpha=0.5)
plt.tight_layout()
plt.show()

print(f"Total reward: {sum(game['rewards']):.2f}")
print(f"Final result: Black {game['infos'][-1]['black_count']} - {game['infos'][-1]['white_count']} White")

## 5. Statistical Analysis

Next, we collect statistics over 50 games against the random opponent and summarize outcomes, game lengths, piece differences, and rewards.

In [None]:
# Collect statistics from multiple games
def collect_statistics(algo, opponent="random", num_games=50):
    """Collect detailed statistics from multiple games."""
    stats = {
        'game_lengths': [],
        'final_scores': [],
        'piece_diffs': [],
        'outcomes': [],  # 'win', 'loss', 'draw'
        'total_rewards': [],
    }
    
    env = gym.make("Othello-v0", opponent=opponent)
    
    for game_idx in range(num_games):
        observation, info = env.reset(seed=42 + game_idx)
        done = False
        steps = 0
        total_reward = 0
        
        while not done:
            action = algo.compute_single_action(observation)
            observation, reward, terminated, truncated, info = env.step(action)
            total_reward += reward
            steps += 1
            done = terminated or truncated
        
        # Record statistics
        black_count = info['black_count']
        white_count = info['white_count']
        
        stats['game_lengths'].append(steps)
        stats['final_scores'].append((black_count, white_count))
        stats['piece_diffs'].append(black_count - white_count)
        stats['total_rewards'].append(total_reward)
        
        if black_count > white_count:
            stats['outcomes'].append('win')
        elif white_count > black_count:
            stats['outcomes'].append('loss')
        else:
            stats['outcomes'].append('draw')
    
    env.close()  # Release environment resources before returning
    return stats

print("Collecting statistics from 50 games vs random opponent...")
stats = collect_statistics(algo, opponent="random", num_games=50)
print("Statistics collected!")

In [None]:
# Analyze statistics
wins = stats['outcomes'].count('win')
losses = stats['outcomes'].count('loss')
draws = stats['outcomes'].count('draw')

print("\n=== Performance Summary ===")
print(f"Games played: {len(stats['outcomes'])}")
print(f"Wins: {wins} ({wins/len(stats['outcomes'])*100:.1f}%)")
print(f"Losses: {losses} ({losses/len(stats['outcomes'])*100:.1f}%)")
print(f"Draws: {draws} ({draws/len(stats['outcomes'])*100:.1f}%)")
print(f"\nMean game length: {np.mean(stats['game_lengths']):.1f} ± {np.std(stats['game_lengths']):.1f}")
print(f"Mean piece difference: {np.mean(stats['piece_diffs']):.1f} ± {np.std(stats['piece_diffs']):.1f}")
print(f"Mean total reward: {np.mean(stats['total_rewards']):.2f} ± {np.std(stats['total_rewards']):.2f}")

# Winning margin analysis
win_margins = [diff for diff, outcome in zip(stats['piece_diffs'], stats['outcomes']) if outcome == 'win']
loss_margins = [abs(diff) for diff, outcome in zip(stats['piece_diffs'], stats['outcomes']) if outcome == 'loss']

if win_margins:
    print(f"\nAverage winning margin: {np.mean(win_margins):.1f} pieces")
if loss_margins:
    print(f"Average losing margin: {np.mean(loss_margins):.1f} pieces")
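
With only a few dozen games, the observed win rate carries noticeable sampling uncertainty. One common way to quantify it is a Wilson score interval for a proportion. The sketch below is one possible approach, not part of the original analysis; `wilson_interval` is a hypothetical helper and the counts in the example call are made up.

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a win proportion (z=1.96 is roughly 95%)."""
    if games <= 0:
        raise ValueError("games must be positive")
    p_hat = wins / games
    denom = 1 + z**2 / games
    center = (p_hat + z**2 / (2 * games)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / games + z**2 / (4 * games**2)) / denom
    )
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)


low, high = wilson_interval(wins=35, games=50)  # made-up counts for illustration
print(f"Approximate 95% interval for the win rate: {low*100:.1f}% to {high*100:.1f}%")
```

In the notebook, `wilson_interval(wins, len(stats['outcomes']))` would apply this to the counts computed above. Draws count as non-wins in this formulation.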

### Visualizing Statistics

In [None]:
# Create comprehensive statistics visualization
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Outcome distribution
ax = axes[0, 0]
outcome_counts = [wins, losses, draws]
colors = ['green', 'red', 'gray']
ax.pie(outcome_counts, labels=['Wins', 'Losses', 'Draws'], 
       autopct='%1.1f%%', colors=colors, startangle=90)
ax.set_title('Game Outcomes')

# Game length distribution
ax = axes[0, 1]
ax.hist(stats['game_lengths'], bins=15, color='skyblue', edgecolor='black')
ax.axvline(np.mean(stats['game_lengths']), color='red', 
          linestyle='--', label=f"Mean: {np.mean(stats['game_lengths']):.1f}")
ax.set_xlabel('Game Length')
ax.set_ylabel('Frequency')
ax.set_title('Game Length Distribution')
ax.legend()
ax.grid(True, alpha=0.3)

# Piece difference distribution
ax = axes[1, 0]
ax.hist(stats['piece_diffs'], bins=20, color='lightcoral', edgecolor='black')
ax.axvline(0, color='black', linestyle='-', linewidth=2, label='Draw line')
ax.axvline(np.mean(stats['piece_diffs']), color='red', 
          linestyle='--', label=f"Mean: {np.mean(stats['piece_diffs']):.1f}")
ax.set_xlabel('Piece Difference (Agent - Opponent)')
ax.set_ylabel('Frequency')
ax.set_title('Final Piece Difference Distribution')
ax.legend()
ax.grid(True, alpha=0.3)

# Reward distribution
ax = axes[1, 1]
ax.hist(stats['total_rewards'], bins=15, color='lightgreen', edgecolor='black')
ax.axvline(np.mean(stats['total_rewards']), color='red', 
          linestyle='--', label=f"Mean: {np.mean(stats['total_rewards']):.2f}")
ax.set_xlabel('Total Reward')
ax.set_ylabel('Frequency')
ax.set_title('Total Reward Distribution')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 6. Cleanup

In [None]:
# Stop algorithm and shutdown Ray
algo.stop()
ray.shutdown()
print("Cleanup complete!")

## Summary

In this notebook, we covered:
1. Preparing an agent for evaluation (a quick training run standing in for a checkpoint load)
2. Evaluating against different opponent types
3. Recording and visualizing complete games
4. Analyzing agent behavior and decision patterns
5. Statistical analysis of performance
6. Comprehensive visualization of results

Key insights:
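- Win rate together with winning and losing margins indicates how strong the agent is, not just how often it wins
- Action heatmaps show which squares the agent favors
- Game length and piece difference distributions show how consistent the results are across games
- Evaluating against more than one opponent type guards against conclusions that hold for only one matchup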

Next steps:
- Compare multiple trained agents
- Analyze learning progression over training
- Implement tournament-style evaluation
- Study specific game positions and tactics
- Create interactive visualization tools