<a href="https://colab.research.google.com/github/Eran-BA/ART/blob/main/dcmrta_coalition_postrun.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# DCMRTA Coalition Formation with PoT (Pointer over Heads)

Supervised learning for **coalition formation** using the **official DCMRTA benchmark**.

## Key Innovation: Learnable Coalition Formation

Using the official DCMRTA benchmark (ICRA 2024):
- **20 robots, 50 tasks** per problem
- **Coalition size**: 1-5 robots per task
- **Greedy solution targets** for supervised learning

## Architecture: HybridHRM (identical to Scheduling)

Same PoT architecture with coalition-specific output:
- **Two-timescale reasoning**: H-level (slow/global) + L-level (fast/local)
- **Coalition output**: Sigmoid scores for (task, robot) pairs
- **Top-K selection**: Pick K robots based on coalition size requirement

| Component | Scheduling | Coalition |
|-----------|------------|----------|
| Output | Softmax logits | **Sigmoid scores** |
| Selection | argmax (1) | **Top-K** |
| Loss | Cross-entropy | **Binary CE** |
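
The Top-K row of the table can be illustrated with a small standalone sketch. It is not the `CoalitionHybridHRM` implementation: `select_coalition` is a hypothetical helper, and the logits are made-up numbers for one task and five robots. It only shows how sigmoid scores plus a required coalition size yield a set of robots.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def select_coalition(logits: list[float], k: int) -> list[int]:
    """Return the indices of the k robots with the highest sigmoid score.

    Hypothetical helper for illustration; ties keep the lower robot index.
    """
    scores = [sigmoid(x) for x in logits]
    ranked = sorted(range(len(scores)), key=lambda r: scores[r], reverse=True)
    return sorted(ranked[:k])


# One task, five robots, required coalition size 2 (illustrative values).
task_logits = [-1.2, 2.0, 0.3, -0.5, 1.1]
coalition = select_coalition(task_logits, k=2)
print(coalition)  # [1, 4]
```

Because the sigmoid is monotonic, ranking by score and ranking by raw logit select the same robots. The scores matter for the loss, not for the selection.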

In [None]:
# 🚀 COLAB SETUP - Run this cell first
#
# Add your GitHub token to Colab Secrets:
#   1. Click 🔑 icon in left sidebar
#   2. Add secret: GITHUB_TOKEN = your PAT
#   3. Enable 'Notebook access'

import sys
import os

IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    print("🔧 Setting up for Google Colab...")
    os.chdir('/content')

    from google.colab import userdata
    try:
        GITHUB_TOKEN = userdata.get('GITHUB_TOKEN')
        print("✓ GitHub token loaded")
    except Exception as exc:
        # Raised when the secret is missing or notebook access is disabled
        raise ValueError("❌ Add GITHUB_TOKEN to Colab Secrets") from exc

    GITHUB_USER = "Eran-BA"
    !git config --global url."https://{GITHUB_TOKEN}@github.com/".insteadOf "https://github.com/"

    print("📦 Installing dependencies...")
    !pip install torch tqdm matplotlib numpy wandb seaborn pyyaml ortools -q

    try:
        WANDB_KEY = userdata.get('WANDB_API_KEY')
        os.environ['WANDB_API_KEY'] = WANDB_KEY
        print("✓ W&B API key loaded")
    except Exception:
        # W&B logging is optional, so a missing key is only a warning
        print("⚠️ WANDB_API_KEY not found")

    # Clone SymbolicMultiRobotTaskAllocator
    if os.path.exists('/content/SymbolicMultiRobotTaskAllocator/.git'):
        print("📥 Updating SymbolicMultiRobotTaskAllocator...")
        !cd /content/SymbolicMultiRobotTaskAllocator && git pull
    else:
        print("📥 Cloning SymbolicMultiRobotTaskAllocator...")
        !git clone https://{GITHUB_TOKEN}@github.com/{GITHUB_USER}/SymbolicMultiRobotTaskAllocator.git /content/SymbolicMultiRobotTaskAllocator

    # Clone PoT
    if not os.path.exists('/content/PoT/.git'):
        print("📥 Cloning PoT...")
        !git clone --depth 1 https://{GITHUB_TOKEN}@github.com/{GITHUB_USER}/PoT.git /content/PoT

    # Clone DCMRTA (official benchmark)
    if not os.path.exists('/content/DCMRTA/.git'):
        print("📥 Cloning DCMRTA (official benchmark)...")
        !git clone --depth 1 https://github.com/marmotlab/DCMRTA.git /content/DCMRTA

    sys.path.insert(0, '/content/PoT')
    sys.path.insert(0, '/content/PoT/src')
    sys.path.insert(0, '/content/DCMRTA')
    sys.path.insert(0, '/content/SymbolicMultiRobotTaskAllocator')

    os.chdir('/content/SymbolicMultiRobotTaskAllocator')
    DCMRTA_PATH = '/content/DCMRTA'
    print("✓ Colab setup complete!")
else:
    print("Running locally")
    POT_PATH = '/Users/rnbnrzy/Desktop/PoT'
    MRTA_PATH = '/Users/rnbnrzy/Desktop/SymbolicMultiRobotTaskAllocator'
    DCMRTA_PATH = '/Users/rnbnrzy/Desktop/DCMRTA'

    sys.path.insert(0, POT_PATH)
    sys.path.insert(0, f'{POT_PATH}/src')
    sys.path.insert(0, DCMRTA_PATH)
    sys.path.insert(0, MRTA_PATH)

import torch
import torch.nn.functional as F
import numpy as np
import pickle
from pathlib import Path
from tqdm.auto import tqdm
from torch.utils.data import Dataset, DataLoader

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'Using device: {device}')

🔧 Setting up for Google Colab...
✓ GitHub token loaded
📦 Installing dependencies...
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
grpcio-status 1.71.2 requires protobuf<6.0dev,>=5.26.1, but you have protobuf 6.33.4 which is incompatible.
tensorflow 2.

## 1. Load DCMRTA Benchmark Data

Using the **official test set** from the DCMRTA paper (ICRA 2024):
- 50 environments with 20 robots, 50 tasks each
- Coalition sizes: 1-5 robots per task
- Pre-computed results for comparison

In [None]:
# Load DCMRTA environments
from env.task_env import TaskEnv

def load_dcmrta_envs(test_set="testSet_20A_50T_CONDET", n_envs=None):
    """Load DCMRTA environments from official repo."""
    test_dir = Path(DCMRTA_PATH) / test_set

    envs = []
    pkl_files = sorted(test_dir.glob("env_*.pkl"))
    if n_envs is not None:
        pkl_files = pkl_files[:n_envs]

    for f in tqdm(pkl_files, desc="Loading DCMRTA"):
        # Context manager closes each file handle after unpickling
        with open(f, 'rb') as fh:
            env = pickle.load(fh)
        envs.append(env)

    return envs

# Load all 50 test environments
print("📥 Loading DCMRTA benchmark...")
envs = load_dcmrta_envs()

# Analyze
print(f"\n📊 DCMRTA Benchmark:")
print(f"   Environments: {len(envs)}")
print(f"   Robots per env: {len(envs[0].agent_dic)}")
print(f"   Tasks per env: {len(envs[0].task_dic)}")

# Coalition size distribution
coalition_sizes = []
for env in envs:
    for task in env.task_dic.values():
        coalition_sizes.append(int(task['requirements'][0]))

from collections import Counter
dist = Counter(coalition_sizes)
print(f"\n📊 Coalition Size Distribution:")
for size in sorted(dist.keys()):
    print(f"   Size {size}: {dist[size]} tasks ({dist[size]/len(coalition_sizes)*100:.1f}%)")

📥 Loading DCMRTA benchmark...

📊 DCMRTA Benchmark:
   Environments: 50
   Robots per env: 20
   Tasks per env: 50

📊 Coalition Size Distribution:
   Size 1: 532 tasks (21.3%)
   Size 2: 467 tasks (18.7%)
   Size 3: 490 tasks (19.6%)
   Size 4: 490 tasks (19.6%)
   Size 5: 521 tasks (20.8%)


## 2. Create Dataset

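Convert DCMRTA environments to a PyTorch dataset:
- Task features: location (x, y) + coalition size (divided by 5, the maximum size)
- Robot features: location (x, y) + velocity
- Training targets: a greedy assignment that gives each task its k nearest robots by Euclidean distance, where k is the task's coalition size. Tasks are handled independently, so one robot can appear in several coalitions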
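
A toy version of the greedy target construction, as a rough sketch rather than the dataset code itself. The function name `greedy_targets` and all coordinates are invented for illustration. It follows the same rule as `_greedy_assignment` in the cell below, where each task independently takes its k nearest robots.

```python
import math


def greedy_targets(task_locs, task_sizes, robot_locs):
    """Build a 0/1 target matrix of shape [n_tasks][n_robots].

    Each task marks its k nearest robots (Euclidean distance), independently
    of every other task, so a robot may be marked for several tasks.
    """
    targets = [[0.0] * len(robot_locs) for _ in task_locs]
    for i, (loc, k) in enumerate(zip(task_locs, task_sizes)):
        nearest = sorted(
            range(len(robot_locs)),
            key=lambda j: math.dist(loc, robot_locs[j]),
        )
        for j in nearest[:k]:
            targets[i][j] = 1.0
    return targets


# Illustrative layout: four robots, two tasks needing 2 and 1 robots.
robots = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
tasks = [(0.2, 0.1), (0.9, 0.1)]
sizes = [2, 1]

targets = greedy_targets(tasks, sizes, robots)
for row in targets:
    print(row)
# [1.0, 1.0, 0.0, 0.0]
# [0.0, 1.0, 0.0, 0.0]
```

Robot 1 is selected for both tasks, which shows that these targets describe per-task membership and not a conflict-free schedule.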

In [None]:
class DCMRTADataset(Dataset):
    """DCMRTA Coalition Formation Dataset."""

    def __init__(self, envs, n_tasks=50, n_robots=20):
        self.envs = envs
        self.n_tasks = n_tasks
        self.n_robots = n_robots

    def __len__(self):
        return len(self.envs)

    def __getitem__(self, idx):
        env = self.envs[idx]

        # Task features: [n_tasks, 3] = (x, y, coalition_size/5)
        task_features = torch.zeros(self.n_tasks, 3)
        coalition_sizes = torch.zeros(self.n_tasks, dtype=torch.long)

        for i, task in env.task_dic.items():
            if i < self.n_tasks:
                task_features[i, 0] = task['location'][0]
                task_features[i, 1] = task['location'][1]
                task_features[i, 2] = task['requirements'][0] / 5.0
                coalition_sizes[i] = int(task['requirements'][0])

        # Robot features: [n_robots, 3] = (x, y, velocity)
        robot_features = torch.zeros(self.n_robots, 3)
        for i, agent in env.agent_dic.items():
            if i < self.n_robots:
                robot_features[i, 0] = agent['location'][0]
                robot_features[i, 1] = agent['location'][1]
                robot_features[i, 2] = agent['velocity']

        # Greedy targets
        targets = self._greedy_assignment(env)

        return {
            'task_features': task_features,
            'robot_features': robot_features,
            'coalition_sizes': coalition_sizes,
            'targets': targets,
        }

    def _greedy_assignment(self, env):
        """Generate greedy coalition assignments."""
        targets = torch.zeros(self.n_tasks, self.n_robots)

        for i, task in env.task_dic.items():
            if i >= self.n_tasks:
                continue
            k = int(task['requirements'][0])
            task_loc = np.array(task['location'])

            # Find k closest robots
            dists = []
            for j, agent in env.agent_dic.items():
                if j >= self.n_robots:
                    continue
                d = np.linalg.norm(task_loc - np.array(agent['location']))
                dists.append((j, d))
            dists.sort(key=lambda x: x[1])

            for j, _ in dists[:k]:
                targets[i, j] = 1.0

        return targets

# Split data: 40 train, 10 test
train_envs = envs[:40]
test_envs = envs[40:]

train_dataset = DCMRTADataset(train_envs)
test_dataset = DCMRTADataset(test_envs)

train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=8)

print(f"\n📊 Dataset:")
print(f"   Train: {len(train_dataset)} environments")
print(f"   Test: {len(test_dataset)} environments")


📊 Dataset:
   Train: 40 environments
   Test: 10 environments


## 3. Create Model

**CoalitionHybridHRM** - identical architecture to SchedulingHybridHRM with coalition output.

In [None]:
from src.model import CoalitionHybridHRM, create_coalition_model

# Model config (identical to Scheduling)
config = {
    'd_model': 256,
    'd_ctrl': 256,
    'd_ff': 512,
    'n_heads': 8,
    'H_layers': 2,
    'L_layers': 2,
    'H_cycles': 2,
    'L_cycles': 4,
    'halt_max_steps': 2,
    'dropout': 0.1,
    'max_tasks': 50,
    'max_robots': 20,
    'task_feature_dim': 3,
    'robot_feature_dim': 3,
}

model = create_coalition_model(config, device)

print(f"\n✅ Model: CoalitionHybridHRM")
print(f"   Architecture: HybridHRM (identical to Scheduling)")
print(f"   Output: Sigmoid scores for coalition membership")



Created CoalitionHybridHRM with 8,789,546 parameters

✅ Model: CoalitionHybridHRM
   Architecture: HybridHRM (identical to Scheduling)
   Output: Sigmoid scores for coalition membership


## 4. Training

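Supervised training with binary cross-entropy loss on the raw (task, robot) logits. Evaluation reports the intersection-over-union between each predicted coalition and its greedy target (printed as "Coalition Acc") and whether the predicted coalition has the same size as the target ("Size Match").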
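
For reference, the per-element quantity behind a binary cross-entropy on logits can be written out by hand. The sketch below uses plain Python and the numerically stable form `max(x, 0) - x*z + log(1 + exp(-|x|))`, averaged over all (task, robot) pairs. The helper names and the example numbers are illustrative assumptions, and the code is meant to clarify the formula, not to replace `F.binary_cross_entropy_with_logits`.

```python
import math


def bce_with_logits(logit: float, target: float) -> float:
    """Binary cross-entropy for one logit, in a numerically stable form.

    Equivalent to -[z*log(s) + (1-z)*log(1-s)] with s = sigmoid(logit).
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def mean_bce(logits, targets) -> float:
    """Average the per-element loss over a [n_tasks][n_robots] matrix."""
    losses = [
        bce_with_logits(x, z)
        for row_x, row_z in zip(logits, targets)
        for x, z in zip(row_x, row_z)
    ]
    return sum(losses) / len(losses)


# Two tasks, three robots (illustrative numbers).
example_logits = [[2.0, -1.0, 0.0], [-3.0, 4.0, 0.5]]
example_targets = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
loss = mean_bce(example_logits, example_targets)
print(f"{loss:.4f}")
```

An uninformative logit of 0 costs `log(2)` (about 0.693) for either target value, which gives a useful baseline when reading the training log.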

In [None]:
def train_epoch(model, loader, optimizer, device):
    model.train()
    total_loss = 0

    for batch in loader:
        task_features = batch['task_features'].to(device)
        robot_features = batch['robot_features'].to(device)
        targets = batch['targets'].to(device)

        optimizer.zero_grad()

        logits = model(task_features, robot_features)
        loss = F.binary_cross_entropy_with_logits(logits, targets)

        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()

        total_loss += loss.item()

    return total_loss / len(loader)


def evaluate(model, loader, device):
    model.eval()

    coalition_accs = []
    size_matches = []

    with torch.no_grad():
        for batch in loader:
            task_features = batch['task_features'].to(device)
            robot_features = batch['robot_features'].to(device)
            coalition_sizes = batch['coalition_sizes'].to(device)
            targets = batch['targets'].to(device)

            coalitions = model.predict_coalitions(
                task_features, robot_features, coalition_sizes
            )

            B, T, R = targets.shape
            for b in range(B):
                for t in range(T):
                    if coalition_sizes[b, t] > 0:
                        pred = set(coalitions[b][t])
                        true = set(r for r in range(R) if targets[b, t, r] > 0.5)

                        inter = len(pred & true)
                        union = len(pred | true)
                        coalition_accs.append(inter / union if union > 0 else 0)
                        size_matches.append(1.0 if len(pred) == len(true) else 0.0)

    return {
        'coalition_acc': np.mean(coalition_accs) if coalition_accs else 0,
        'size_match': np.mean(size_matches) if size_matches else 0,
    }


# Training
EPOCHS = 50
LR = 3e-4

optimizer = torch.optim.AdamW(model.parameters(), lr=LR, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, EPOCHS)

print(f"{'='*60}")
print(f"Training CoalitionHybridHRM for {EPOCHS} epochs on {device}")
print(f"{'='*60}")

best_acc = 0
for epoch in range(EPOCHS):
    train_loss = train_epoch(model, train_loader, optimizer, device)

    if (epoch + 1) % 10 == 0 or epoch == 0:
        metrics = evaluate(model, test_loader, device)

        status = ""
        if metrics['coalition_acc'] > best_acc:
            best_acc = metrics['coalition_acc']
            status = "✓ NEW BEST"

        print(f"Epoch {epoch+1:3d} | Loss: {train_loss:.4f} | "
              f"Coalition Acc: {metrics['coalition_acc']*100:.1f}% | "
              f"Size Match: {metrics['size_match']*100:.1f}% {status}")

    scheduler.step()

print(f"{'='*60}")
print(f"\n🏆 Best Coalition Accuracy: {best_acc*100:.1f}%")

Training CoalitionHybridHRM for 50 epochs on cuda
Epoch   1 | Loss: 0.5181 | Coalition Acc: 33.0% | Size Match: 100.0% ✓ NEW BEST
Epoch  10 | Loss: 0.3693 | Coalition Acc: 77.6% | Size Match: 100.0% ✓ NEW BEST
Epoch  20 | Loss: 0.0811 | Coalition Acc: 100.0% | Size Match: 100.0% ✓ NEW BEST
Epoch  30 | Loss: 0.0428 | Coalition Acc: 100.0% | Size Match: 100.0% 
Epoch  40 | Loss: 0.0367 | Coalition Acc: 100.0% | Size Match: 100.0% 
Epoch  50 | Loss: 0.0361 | Coalition Acc: 100.0% | Size Match: 100.0% 

🏆 Best Coalition Accuracy: 100.0%


## 5. Comparison with Paper Benchmarks

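Compare our results with the results shipped in the official DCMRTA repository. The two sets of numbers measure different things: the paper's methods are scored on success rate and makespan, while this notebook scores agreement with the greedy targets, so the table below is a side-by-side listing and not a ranking.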
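
To make the "coalition accuracy" figure concrete, here is a minimal sketch of the per-task score that `evaluate` averages, which is the intersection-over-union of predicted and target robot sets. `coalition_iou` is a hypothetical standalone helper and the robot indices are made up.

```python
def coalition_iou(pred, true) -> float:
    """Intersection-over-union of two robot index collections.

    Returns 0.0 when both are empty, matching the notebook's evaluate().
    """
    pred_set, true_set = set(pred), set(true)
    union = pred_set | true_set
    return len(pred_set & true_set) / len(union) if union else 0.0


# Target coalition of size 2; the prediction gets one robot right.
print(coalition_iou([1, 4], [1, 2]))  # 0.3333333333333333
# Exact match.
print(coalition_iou([1, 2], [2, 1]))  # 1.0
```

With Top-K selection the predicted and target sets always have the same size, so one wrong robot in a size-2 coalition scores 1/3 and not 1/2. The metric is stricter than per-robot accuracy, and it says nothing about makespan.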

In [None]:
# Load official paper results
import pandas as pd

print("\n" + "="*60)
print("📊 DCMRTA PAPER BENCHMARK RESULTS")
print("="*60)

results_dir = Path(DCMRTA_PATH) / "testSet_20A_50T_CONDET"

for csv_file in ['REINFORCE_LF.csv', 'REINFORCE_IA.csv', 'OR-Tools.csv', 'CTAS-D_300s.csv']:
    csv_path = results_dir / csv_file
    if csv_path.exists():
        df = pd.read_csv(csv_path)
        print(f"\n{csv_file}:")
        print(f"   Success Rate: {df['success_rate'].mean()*100:.1f}%")
        print(f"   Avg Makespan: {df['makespan'].mean():.2f}")

# Our results
final_metrics = evaluate(model, test_loader, device)

print("\n" + "="*60)
print("📊 OUR RESULTS (PoT CoalitionHybridHRM)")
print("="*60)
print(f"\n🎯 Coalition Accuracy: {final_metrics['coalition_acc']*100:.1f}%")
print(f"📏 Size Match: {final_metrics['size_match']*100:.1f}%")

print("\n" + "="*60)
print("📋 COMPARISON")
print("="*60)
print(f"\n{'Method':<25} {'Metric':>20}")
print("-"*50)
print(f"{'REINFORCE_LF (paper)':<25} {'100% success, 34.8 makespan':>20}")
print(f"{'OR-Tools (paper)':<25} {'100% success, 42.0 makespan':>20}")
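# Build the string first: reusing double quotes inside a nested f-string
# is a syntax error before Python 3.12
ours_metric = f"{final_metrics['coalition_acc']*100:.1f}% coalition acc"
print(f"{'Ours (PoT)':<25} {ours_metric:>20}")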

print("\n" + "="*60)
print("Note: Paper reports success_rate and makespan (routing metric).")
print("We report coalition accuracy (correct robot selection).")
print("="*60)


📊 DCMRTA PAPER BENCHMARK RESULTS

REINFORCE_LF.csv:
   Success Rate: 100.0%
   Avg Makespan: 34.83

REINFORCE_IA.csv:
   Success Rate: 100.0%
   Avg Makespan: 35.02

OR-Tools.csv:
   Success Rate: 100.0%
   Avg Makespan: 42.00

CTAS-D_300s.csv:
   Success Rate: 100.0%
   Avg Makespan: 36.91

📊 OUR RESULTS (PoT CoalitionHybridHRM)

🎯 Coalition Accuracy: 100.0%
📏 Size Match: 100.0%

📋 COMPARISON

Method                                  Metric
--------------------------------------------------
REINFORCE_LF (paper)      100% success, 34.8 makespan
OR-Tools (paper)          100% success, 42.0 makespan
Ours (PoT)                100.0% coalition acc

Note: Paper reports success_rate and makespan (routing metric).
We report coalition accuracy (correct robot selection).
