# 🔬 Explore AI Agents

Educational deep dive - learn how different agents think!

---

## Setup

In [None]:
import micropip
import js

# Get current page URL and extract base (handles both localhost and GitHub Pages)
current_path = str(js.location.href)
# Remove everything after /lab/ to get base URL
if '/lab/' in current_path:
    base_url = current_path.split('/lab/')[0]
else:
    base_url = str(js.location.origin)

wheel_url = f'{base_url}/pyodide/utala_kaos_9-0.1.1-py3-none-any.whl'

print(f"Installing from: {wheel_url}")
await micropip.install(wheel_url)
print("✓ Game installed successfully!")

In [None]:
from utala.agents.random_agent import RandomAgent
from utala.agents.heuristic_agent import HeuristicAgent
from utala.agents.monte_carlo_agent import FastMonteCarloAgent, MonteCarloAgent
from utala.evaluation.harness import Harness
import inspect

print("✓ Imports complete!")

## Agent Architecture

All agents implement the same interface:

```python
class Agent(ABC):
    @abstractmethod
    def select_action(
        self,
        state: GameState,
        legal_actions: list[int],
        player: Player
    ) -> int:
        """Select an action from legal_actions."""
        pass
```

Agents **propose** actions; the engine **validates** and **applies** them.
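
As a minimal sketch of what implementing this interface could look like, the example below defines a stand-in `Agent` base class (with `Any` in place of the real `GameState` and `Player` types, which are not reproduced here) and a hypothetical `FirstLegalAgent` that always proposes the first legal action. Neither class is part of the `utala` package.

```python
from abc import ABC, abstractmethod
from typing import Any


class Agent(ABC):
    """Stand-in for the interface above; Any replaces GameState and Player."""

    @abstractmethod
    def select_action(self, state: Any, legal_actions: list[int], player: Any) -> int:
        """Select an action from legal_actions."""


class FirstLegalAgent(Agent):
    """Hypothetical agent: always proposes the first legal action."""

    def select_action(self, state: Any, legal_actions: list[int], player: Any) -> int:
        if not legal_actions:
            raise ValueError("no legal actions to choose from")
        return legal_actions[0]


agent = FirstLegalAgent()
print(agent.select_action(state=None, legal_actions=[12, 40, 85], player=0))  # prints 12
```

Because the agent only returns an index from `legal_actions`, the engine stays the single place where moves are validated and applied.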

---

## 1. Random Agent

**Strategy**: Uniform random selection from legal actions

**Strengths**: 
- Establishes baseline performance
- Deterministic with seed (reproducible)
- No computation overhead

**Weaknesses**:
- No strategic thinking
- Ignores board state
- Useful only as a baseline for comparison
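
The reproducibility point can be illustrated with a small sketch. `SeededRandomPicker` is a hypothetical stand-in, not the package's `RandomAgent`; it assumes the agent owns a private `random.Random` instance seeded at construction.

```python
import random


class SeededRandomPicker:
    """Hypothetical stand-in for a seeded uniform-random agent."""

    def __init__(self, seed: int) -> None:
        # A private generator keeps this agent independent of the global random state.
        self._rng = random.Random(seed)

    def select_action(self, legal_actions: list[int]) -> int:
        # Uniform choice over whatever is currently legal.
        return self._rng.choice(legal_actions)


legal = [3, 17, 42, 85]
first = SeededRandomPicker(seed=42)
second = SeededRandomPicker(seed=42)

# Same seed, same sequence of choices.
picks_a = [first.select_action(legal) for _ in range(5)]
picks_b = [second.select_action(legal) for _ in range(5)]
print(picks_a == picks_b)  # prints True
```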

In [None]:
# View Random Agent source
print("Random Agent select_action method:")
print("=" * 70)
print(inspect.getsource(RandomAgent.select_action))

## 2. Heuristic Agent

**Strategy**: Rule-based strategic decisions

### Placement Strategy
- **Center prioritization** (1,1) - forms 4 potential lines
- **Edge squares** - form 2 potential lines
- **Corner squares** - form 3 potential lines
- **3-in-a-row detection** - complete winning lines
- **Blocking** - prevent opponent victories
- **Face-down deception** - use cards 2,3,9,10 strategically
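
The number of potential lines through each square can be checked directly. This sketch assumes a 3x3 grid indexed from (0,0), where rows, columns, and both diagonals count as lines; it does not use the game engine.

```python
from collections import Counter

SIZE = 3

# Every line on a 3x3 grid: 3 rows, 3 columns, 2 diagonals.
rows = [[(r, c) for c in range(SIZE)] for r in range(SIZE)]
cols = [[(r, c) for r in range(SIZE)] for c in range(SIZE)]
diagonals = [
    [(i, i) for i in range(SIZE)],
    [(i, SIZE - 1 - i) for i in range(SIZE)],
]
LINES = rows + cols + diagonals

# Count how many lines pass through each square.
line_count = Counter(square for line in LINES for square in line)

print(line_count[(1, 1)])  # center: 4
print(line_count[(0, 0)])  # corner: 3
print(line_count[(0, 1)])  # edge: 2
```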

### Dogfight Strategy
- **As underdog**: Attack when power difference ≤ -2
- **As favorite**: Pass and win via Kaos resolution
- **Weapon conservation**: Save for critical fights
- **Joker awareness**: Consider tie-breaker advantage

**Strengths**:
- Beats Random consistently (~65% win rate)
- Interpretable decisions
- Fast execution

**Weaknesses**:
- Fixed rules, no adaptation
- No look-ahead
- Vulnerable to specific patterns

In [None]:
# View key Heuristic Agent methods
print("Heuristic Agent select_action method (first 30 lines):")
print("=" * 70)
source = inspect.getsource(HeuristicAgent.select_action)
lines = source.split('\n')[:30]
print('\n'.join(lines))
print("\n... (method continues)")

## 3. Monte Carlo Agent

**Strategy**: Simulate future game states to evaluate actions

### Algorithm
1. For each legal action:
   - Clone game state
   - Apply candidate action
   - Simulate N random rollouts to game end
   - Track win rate
2. Select action with highest win rate
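
The steps above can be sketched on a toy game. The example uses a tiny Nim variant (remove 1 to 3 stones, whoever takes the last stone wins) as a stand-in for the real game state; all function names here are hypothetical and the package's `MonteCarloAgent` may differ in detail.

```python
import random


def legal_actions(stones: int) -> list[int]:
    """Toy Nim: remove 1, 2, or 3 stones, never more than remain."""
    return [n for n in (1, 2, 3) if n <= stones]


def rollout(stones: int, to_move: int, rng: random.Random) -> int:
    """Play uniformly random moves to the end; return the winning player."""
    while True:
        stones -= rng.choice(legal_actions(stones))
        if stones == 0:
            return to_move  # this player took the last stone
        to_move = 1 - to_move


def evaluate_action(stones: int, action: int, player: int,
                    num_rollouts: int, rng: random.Random) -> float:
    """Estimate the win rate for `player` after taking `action`."""
    remaining = stones - action  # ints are immutable, so this plays the role of the clone
    if remaining == 0:
        return 1.0  # taking the last stone wins immediately
    wins = sum(
        rollout(remaining, 1 - player, rng) == player
        for _ in range(num_rollouts)
    )
    return wins / num_rollouts


def select_action(stones: int, player: int, num_rollouts: int = 50, seed: int = 0) -> int:
    """Pick the legal action with the highest estimated win rate."""
    rng = random.Random(seed)
    rates = {
        action: evaluate_action(stones, action, player, num_rollouts, rng)
        for action in legal_actions(stones)
    }
    return max(rates, key=rates.get)


print(select_action(stones=3, player=0))  # prints 3 (takes the last stone)
```

Raising `num_rollouts` reduces the noise in each estimate at a proportional cost in running time, which is the trade-off behind the variants listed below.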

### Variants
- **FastMonteCarloAgent**: 10 rollouts (browser-friendly)
- **MonteCarloAgent**: 50 rollouts (default)
- **UltraStrongMonteCarloAgent**: 100 rollouts (slow but strong)

**Strengths**:
- Look-ahead capability
- Adapts to game state
- Beats Random consistently (~79% win rate)
- Often beats Heuristic (~60-70% depending on rollouts)

**Weaknesses**:
- Computationally expensive
- Rollout quality depends on baseline policy
- No explicit strategic knowledge

In [None]:
# View Monte Carlo Agent evaluation method
print("Monte Carlo Agent _evaluate_action method (first 40 lines):")
print("=" * 70)
source = inspect.getsource(MonteCarloAgent._evaluate_action)
lines = source.split('\n')[:40]
print('\n'.join(lines))
print("\n... (method continues)")

## Comparative Performance

Let's run a quick comparison:

In [None]:
# Create agents
# Named random_agent to avoid shadowing the standard library's random module
random_agent = RandomAgent("Random", seed=42)
heuristic = HeuristicAgent("Heuristic", seed=123)
mc_fast = FastMonteCarloAgent("MC-Fast", seed=456)

# Run matches
harness = Harness(verbose=False)

print("Running comparative analysis (3 games each)...\n")

# Heuristic vs Random
result1 = harness.run_match(heuristic, random_agent, num_games=3, starting_seed=5000)
print(f"Heuristic vs Random: {result1.player_one_wins} - {result1.player_two_wins} (draws: {result1.draws})")

# Monte Carlo vs Random
result2 = harness.run_match(mc_fast, random_agent, num_games=3, starting_seed=5100)
print(f"MC-Fast vs Random: {result2.player_one_wins} - {result2.player_two_wins} (draws: {result2.draws})")

# Monte Carlo vs Heuristic
result3 = harness.run_match(mc_fast, heuristic, num_games=3, starting_seed=5200)
print(f"MC-Fast vs Heuristic: {result3.player_one_wins} - {result3.player_two_wins} (draws: {result3.draws})")

print("\n✓ Analysis complete!")
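
Three games per pairing give only a rough signal. As a hedged illustration of how much uncertainty a small sample carries, the sketch below computes a Wilson score interval for a win rate from plain win and game counts. `wilson_interval` is a hypothetical helper, not part of the harness, and how draws should be counted is left to the caller.

```python
import math


def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for a win rate (Wilson score interval)."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return max(0.0, center - half), min(1.0, center + half)


low, high = wilson_interval(wins=2, games=3)
print(f"2 wins in 3 games: {low:.2f} to {high:.2f}")

low, high = wilson_interval(wins=65, games=100)
print(f"65 wins in 100 games: {low:.2f} to {high:.2f}")
```

The interval narrows as the number of games grows, so raising `num_games` is the simplest way to get a more trustworthy comparison.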

## Key Insights

### Phase 1 Design Principles

1. **Determinism**: All randomness lives in the engine; agents are deterministic
2. **Fixed Action Space**: 86 actions total (81 placements + 4 weapons + 1 pass)
3. **Action Masking**: Illegal actions are masked, never removed
4. **State Immutability**: Agents receive read-only state copies
5. **No External Dependencies**: Pure Python, standard library only
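
The fixed action space and masking principles can be sketched as follows. The zero-based indexing (0 to 85) and the helper `build_action_mask` are assumptions for illustration; the engine's actual index layout is not shown here.

```python
NUM_PLACEMENTS = 81
NUM_WEAPONS = 4
NUM_PASS = 1
NUM_ACTIONS = NUM_PLACEMENTS + NUM_WEAPONS + NUM_PASS  # 86


def build_action_mask(legal_actions: list[int]) -> list[bool]:
    """Return a fixed-length mask: True where the action index is legal."""
    mask = [False] * NUM_ACTIONS
    for action in legal_actions:
        if not 0 <= action < NUM_ACTIONS:
            raise ValueError(f"action index out of range: {action}")
        mask[action] = True
    return mask


mask = build_action_mask([0, 40, 85])
print(len(mask))  # prints 86
print(sum(mask))  # prints 3
```

Keeping the mask at a constant length means every action keeps the same index throughout a game, whether or not it is currently legal.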

### Why No ML Frameworks?

Phase 1 deliberately avoids ML frameworks to:
- Establish **interpretable baselines**
- Verify game has **skill gradient** (not pure luck)
- Understand **what makes strategies strong**
- Create **measurement infrastructure**

Phase 2 will introduce learning methods only after baseline verification.

---

## Experiment Ideas

Want to experiment? Try:

1. **Modify rollout counts**: Create agents with different N values
2. **Hybrid strategies**: Combine heuristics with Monte Carlo
3. **Analyze specific games**: Set seeds and examine decision points
4. **Measure computation time**: Profile agent performance
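
For the timing idea, one possible starting point is a small helper built on `time.perf_counter`. `time_call` is a hypothetical name, and the workload below is a stand-in to be replaced with a real agent call.

```python
import time
from typing import Any, Callable


def time_call(fn: Callable[..., Any], *args: Any, repeats: int = 5, **kwargs: Any) -> float:
    """Return the best wall-clock time in seconds over `repeats` calls to fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best


# Stand-in workload; replace with a call such as agent.select_action(...).
elapsed = time_call(sum, range(100_000))
print(f"best of 5: {elapsed * 1000:.3f} ms")
```

Taking the best of several runs reduces the influence of one-off pauses, which can matter when running in a browser.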

All source code is accessible - feel free to inspect and modify!

---

[Return to home](index.ipynb) | [Play vs AI](play-vs-ai.ipynb) | [Watch Tournament](watch-tournament.ipynb)