# Multi-Agent Roles & NPCs Demo Notebook

This notebook demonstrates a minimal two-agent environment in which each agent carries a role tag (Ally, Competitive, Dialogue, etc.). Roles are paired with pluggable policy functions, so it is easy to experiment with agents that help or hinder each other.

**Features:**
- `NPCPolicy`: Agents/NPCs with roles, names, and parameterizable behavior
- `TwoAgentEnv`: A simple world for two agents with collision and goal logic
- Example policies for the Ally, Competitive, and Dialogue roles
- Easy extension for more complex experiments


In [None]:
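# Imports
import numpy as np
import pandas as pd
from IPython.display import display  # explicit import so the summary cell also runs outside a notebook kernel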

## Agent Policy Class with Role Tagging

In [None]:
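class NPCPolicy:
    """
    Policy for an agent/NPC with a role (Ally, Competitive, Dialogue, Main, etc.),
    a policy function, and an optional goal.

    `policy_func` takes (env, state) and returns an action (0=left, 1=right).
    `goal` is stored as metadata only; nothing in this notebook reads it.
    """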
    def __init__(self, role="Ally", policy_func=None, goal=None, name=None):
        self.role = role
        self.policy_func = policy_func or (lambda env, state: 0)  # default: always left
        self.goal = goal
        self.name = name or role

    def act(self, env, state):
        return self.policy_func(env, state)

    def __repr__(self):
        return f"NPCPolicy(role={self.role}, name={self.name})"

## Two-Agent Environment (with Role Metadata)

In [None]:
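class TwoAgentEnv:
    """
    Minimal 2-agent 1D world with role metadata.
    - State: [pos_agent0, pos_agent1], positions 0-4 on a line of 5 cells
    - Start: agent 0 at position 0, agent 1 at position 2
    - Actions: 0=left, any other value=right (the example policies return 1)
    - Goal at position 4; the episode ends when either agent reaches it (+1 reward)
    - Collision penalty (-1 each) if both agents land on the same non-goal position
    - Each policy is called with its own position only, not the full state
    """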
    def __init__(self, agent_policies):
        assert len(agent_policies) == 2, "This environment expects exactly 2 agents."
        self.num_positions = 5
        self.goal = 4
        self.agent_policies = agent_policies
        self.reset()

    def reset(self):
        self.positions = [0, 2]
        return self.positions.copy()

    def step(self):
        # Every policy picks its action from its own position before anyone moves,
        # so the two moves are simultaneous.
        actions = [
            agent.act(self, pos)
            for agent, pos in zip(self.agent_policies, self.positions)
        ]
        rewards = [0.0, 0.0]
        done = False
        for i, action in enumerate(actions):
            if action == 0:
                # Move left, clamped at the left edge.
                self.positions[i] = max(0, self.positions[i] - 1)
            else:
                # Any other action moves right, clamped at the right edge.
                self.positions[i] = min(self.num_positions - 1, self.positions[i] + 1)
        # Collision: both agents share a non-goal cell.
        if self.positions[0] == self.positions[1] and self.positions[0] != self.goal:
            rewards = [-1.0, -1.0]
        # Reaching the goal adds +1 for that agent and ends the episode.
        for i in range(2):
            if self.positions[i] == self.goal:
                rewards[i] += 1.0
                done = True
        info = [
            {"role": agent.role, "name": agent.name, "pos": pos}
            for agent, pos in zip(self.agent_policies, self.positions)
        ]
        return self.positions.copy(), rewards, done, info
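
The transition and reward rules in `step` can be checked in isolation. The following is a minimal sketch, not part of the notebook's classes: `apply_actions` is a hypothetical stand-alone helper that mirrors the movement, collision, and goal logic above so individual cases can be traced by hand.

```python
def apply_actions(positions, actions, num_positions=5, goal=4):
    """Hypothetical helper mirroring the transition rules of TwoAgentEnv.step."""
    new_positions = []
    for pos, action in zip(positions, actions):
        if action == 0:
            new_positions.append(max(0, pos - 1))  # left, clamped at 0
        else:
            new_positions.append(min(num_positions - 1, pos + 1))  # right, clamped

    rewards = [0.0, 0.0]
    # Both agents on the same non-goal cell: each is penalized.
    if new_positions[0] == new_positions[1] and new_positions[0] != goal:
        rewards = [-1.0, -1.0]

    done = False
    for i, pos in enumerate(new_positions):
        if pos == goal:
            rewards[i] += 1.0
            done = True
    return new_positions, rewards, done


# Agent 0 moves right from 0, agent 1 moves left from 2: they meet at cell 1.
collision = apply_actions([0, 2], [1, 0])
# Agent 1 steps onto the goal cell, which ends the episode.
goal_reached = apply_actions([2, 3], [1, 1])
```

The first call corresponds to the opening step of a right-moving agent against a left-moving one from the default start positions; the second shows that only the agent on the goal cell receives the +1.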

## Example Role Policies

In [None]:
def ally_policy(env, state):
    """Always move right toward the goal (helpful)."""
    return 1

def competitive_policy(env, state):
    """Always move left, away from the goal; this can collide with an agent moving right (hindering)."""
    return 0

def dialogue_policy(env, state):
    """Random action (placeholder for more complex communication)."""
    return np.random.randint(2)
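
The fixed policies above can be generalized into parameterized ones. The following is a sketch under stated assumptions: `make_biased_policy` and its `p_right` and `seed` parameters are hypothetical names introduced here, and the standard-library `random` module is used in place of NumPy so that each policy owns a reproducible generator.

```python
import random


def make_biased_policy(p_right, seed=None):
    """Hypothetical factory: build a policy that moves right with probability p_right."""
    if not 0.0 <= p_right <= 1.0:
        raise ValueError("p_right must be between 0 and 1")
    rng = random.Random(seed)  # private generator, so runs are reproducible per policy

    def policy(env, state):
        # Same (env, state) -> action signature as the example policies.
        return 1 if rng.random() < p_right else 0

    return policy


# p_right=1.0 behaves like ally_policy; p_right=0.0 behaves like competitive_policy.
always_right = make_biased_policy(1.0)
always_left = make_biased_policy(0.0)
mostly_right = make_biased_policy(0.8, seed=0)
```

A policy built this way can be passed as `policy_func` when constructing an `NPCPolicy`, which gives a continuum between the Ally and Competitive behaviors.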

## Instantiate Agents with Roles

In [None]:
agent_main = NPCPolicy(role="Main", policy_func=ally_policy, name="main_agent")
agent_ally = NPCPolicy(role="Ally", policy_func=ally_policy, name="ally_npc")
agent_comp = NPCPolicy(role="Competitive", policy_func=competitive_policy, name="competitor_npc")
agent_dialogue = NPCPolicy(role="Dialogue", policy_func=dialogue_policy, name="dialogue_npc")

## Run Experiments: Main Agent with Various NPC Roles

In [None]:
def run_experiment(env, max_steps=10):
    """Run one episode for up to max_steps steps, printing and logging each step."""
    state = env.reset()
    log = []
    for step in range(1, max_steps + 1):
        state, rewards, done, info = env.step()
        log.append({'step': step, 'state': state.copy(), 'rewards': rewards.copy(), 'info': info})
        print(f"Step {step}: State: {state}, Rewards: {rewards}, Info: {info}")
        if done:
            print("\n--- Episode finished ---\n")
            break
    return log

## NPC Experiment Settings (Setting-01, Setting-02, ..., Setting-N) & Results Tracking

Define your experiment settings below. Each setting is a `(description, [main agent, npc agent])` tuple. The list order matters: the first agent starts at position 0 and the second at position 2. Add more settings as needed.

In [None]:
experiment_settings = [
    ("Main vs Ally", [agent_main, agent_ally]),
    ("Main vs Competitive", [agent_main, agent_comp]),
    ("Main vs Dialogue", [agent_main, agent_dialogue]),
    # Add more settings here (e.g., with parameterized policies or different agent order)
]

experiment_results = []

for setting_name, agents in experiment_settings:
    print(f"\n=== Running setting: {setting_name} ===")
    env = TwoAgentEnv(agents)
    log = run_experiment(env, max_steps=20)
    final = log[-1] if log else {}
    experiment_results.append({
        'Setting': setting_name,
        'Final State': final.get('state', None),
        'Final Rewards': final.get('rewards', None),
        'Steps': final.get('step', None),
        'Full Log': log
    })

# Show summary table
summary = pd.DataFrame([
    {k: v for k, v in result.items() if k != 'Full Log'} for result in experiment_results
])
display(summary)

## Conclusion & Next Steps

This notebook provides a foundation for experiments with agent roles, interaction policies, and simple multi-agent environments. The classes can be extended to more agents, richer state/action spaces, or more sophisticated role behaviors.

- Add more rows to `experiment_settings` to automate more experiment variants.
- Use the `Full Log` field in `experiment_results` for deeper analysis (e.g., plotting, statistics).
- Extend policies for learning, communication, or other advanced behaviors.
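
As one way to analyze a `Full Log`, the sketch below aggregates per-agent returns and counts collisions. `summarize_log` is a hypothetical helper, and `sample_log` is a hand-written list in the same shape as the entries produced by `run_experiment` (the `info` field is omitted for brevity), not output copied from a run.

```python
def summarize_log(log):
    """Hypothetical helper: total reward per agent, collision count, and episode length."""
    returns = [0.0, 0.0]
    collisions = 0
    for entry in log:
        for i, reward in enumerate(entry["rewards"]):
            returns[i] += reward
        # In this environment both rewards are negative only on a collision step.
        if all(reward < 0 for reward in entry["rewards"]):
            collisions += 1
    return {"returns": returns, "collisions": collisions, "steps": len(log)}


# Hand-written sample: one collision on step 1, agent 0 reaches the goal on step 4.
sample_log = [
    {"step": 1, "state": [1, 1], "rewards": [-1.0, -1.0]},
    {"step": 2, "state": [2, 0], "rewards": [0.0, 0.0]},
    {"step": 3, "state": [3, 0], "rewards": [0.0, 0.0]},
    {"step": 4, "state": [4, 0], "rewards": [1.0, 0.0]},
]

stats = summarize_log(sample_log)
```

Applying the same function to each entry's `Full Log` in `experiment_results` would yield one summary row per setting, which can then be loaded into a DataFrame alongside the existing summary table.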