# Agentic Workflow with Reinforcement Learning Loop - Collaboration Boost Edition

This notebook demonstrates a multi-agent task-allocation environment in which a PPO policy learns how to assign tasks and when agents should collaborate. **This enhanced version** adds collaboration effects that scale with the values in the collaboration matrix.

## Key Enhancements in This Version

1. **Proportional Collaboration Boost**: Task completion probability scales with collaboration matrix values
2. **Quality Boost from Collaboration**: Task quality improves when strong collaborators work together
3. **Scaled Collaboration Rewards**: Rewards scale with collaboration strength, not flat bonuses
4. **Expertise Sharing**: Agents can "borrow" skills from collaborators proportional to their history
5. **Collaboration Decay**: Unused collaboration paths slowly decay back to neutral

The architecture combines:
- **Multi-Agent Coordination**: Agents with specialized roles
- **Reinforcement Learning**: PPO-based policy optimization
- **Tool Integration**: Agents can use external tools/APIs
- **Adaptive Workflow**: Learning-based task allocation
- **Production Patterns**: Monitoring, logging, and error handling


## 1. Environment Setup and Dependencies


In [None]:
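# Install required packages (tqdm and rich are needed for progress_bar=True in PPO.learn)
!pip install stable-baselines3 gymnasium numpy pandas matplotlib seaborn tqdm rich
!pip install langchain langchain-openai langgraph tensorboard
# Quote the extras spec so shells such as zsh do not treat [] as a glob
!pip install "ray[rllib]" wandb mlflow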


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from typing import Dict, List, Tuple, Optional, Any, Union
from dataclasses import dataclass, field
from enum import Enum
import asyncio
import json
import logging
from datetime import datetime
import gymnasium as gym
from gymnasium import spaces
import torch
import torch.nn as nn
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, SubprocVecEnv
from stable_baselines3.common.callbacks import EvalCallback, CheckpointCallback
from collections import deque
import warnings
warnings.filterwarnings('ignore')

# Setup logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


## 2. Multi-Agent Environment Definition with Collaboration Boost

Define a custom environment where multiple agents collaborate on tasks, learning optimal policies through RL. **This version includes proportional collaboration based on the collaboration matrix.**
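
To make the action encoding concrete: the environment below reshapes a flat action vector of length `n_agents * 3` into one row per agent, reads column 0 as the task-assignment signal (accepted above 0.5) and column 2 as the collaboration signal (accepted above 0.7). Column 1 (resource allocation) is parsed but not used by `step` in this version. The following is a minimal, dependency-free sketch of that decoding; `decode_action` and `AgentAction` are hypothetical names used only for illustration, and the thresholds are copied from `step`.

```python
from typing import List, NamedTuple, Sequence


class AgentAction(NamedTuple):
    """Per-agent view of the flat action vector (illustrative helper)."""
    wants_task: bool     # column 0 > 0.5 in the environment's step()
    resource: float      # column 1, passed through unchanged
    collaborates: bool   # column 2 > 0.7 in the environment's step()


def decode_action(flat: Sequence[float], n_agents: int,
                  assign_threshold: float = 0.5,
                  collab_threshold: float = 0.7) -> List[AgentAction]:
    """Split a flat action of length n_agents * 3 into per-agent signals."""
    if len(flat) != n_agents * 3:
        raise ValueError(f"expected {n_agents * 3} values, got {len(flat)}")
    decoded = []
    for i in range(n_agents):
        assign, resource, collab = flat[3 * i: 3 * i + 3]
        decoded.append(AgentAction(assign > assign_threshold,
                                   float(resource),
                                   collab > collab_threshold))
    return decoded


actions = decode_action([0.9, 0.2, 0.8,   # agent 0: takes a task, collaborates
                         0.1, 0.5, 0.9,   # agent 1: collaborates only
                         0.6, 0.0, 0.3],  # agent 2: takes a task only
                        n_agents=3)
collaborating = {i for i, a in enumerate(actions) if a.collaborates}
print(collaborating)
```

Only agents whose collaboration signal clears the threshold in the same step count as collaborating, which is why a pair's matrix entry grows only when both agents signal at once.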


In [None]:
class AgentRole(Enum):
    """Defines specialized agent roles in the system"""
    RESEARCHER = "researcher"
    ANALYZER = "analyzer"
    EXECUTOR = "executor"
    VALIDATOR = "validator"
    COORDINATOR = "coordinator"

@dataclass
class Task:
    """Represents a task in the workflow"""
    id: str
    type: str
    complexity: float
    requirements: List[str]
    deadline: float
    priority: float
    status: str = "pending"
    assigned_agent: Optional[str] = None
    completion_time: Optional[float] = None
    quality_score: Optional[float] = None

@dataclass
class AgentState:
    """Tracks individual agent state"""
    id: str
    role: AgentRole
    capacity: float
    expertise: Dict[str, float]
    current_load: float = 0.0
    completed_tasks: int = 0
    success_rate: float = 1.0
    collaboration_score: float = 1.0


In [None]:
class MultiAgentTaskEnvironmentCollabBoost(gym.Env):
    """
    Multi-agent environment for collaborative task execution with RL.
    
    ENHANCED VERSION with Proportional Collaboration:
    - Task completion probability scales with collaboration matrix values
    - Quality scores boosted by collaboration strength
    - Rewards scale with collaboration history
    - Expertise sharing between collaborating agents
    - Collaboration decay for unused partnerships
    """

    def __init__(self, n_agents: int = 4, max_tasks: int = 10):
        super().__init__()
        self.n_agents = n_agents
        self.max_tasks = max_tasks
        self.current_step = 0
        self.max_steps = 200
        
        # Collaboration parameters
        self.collab_growth_rate = 1.02      # How fast collaboration strengthens
        self.collab_decay_rate = 0.998      # How fast unused collaborations decay
        self.collab_reward_scale = 0.5      # Base collaboration reward multiplier
        self.expertise_share_factor = 0.3   # How much expertise can be shared

        # Initialize agents with different roles
        self.agents = self._initialize_agents()

        # Task queue and completed tasks
        self.task_queue = deque()
        self.active_tasks = {}
        self.completed_tasks = []

        # Define observation and action spaces
        # Observation: [agent_states, task_queue_state, collaboration_matrix]
        obs_dim = n_agents * 7 + max_tasks * 5 + n_agents * n_agents
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(obs_dim,), dtype=np.float32
        )

        # Action: [task_assignment, resource_allocation, collaboration_request]
        self.action_space = spaces.Box(
            low=0, high=1, shape=(n_agents * 3,), dtype=np.float32
        )

        # Metrics tracking
        self.episode_rewards = []
        self.task_completion_times = []
        self.collaboration_matrix = np.ones((n_agents, n_agents))
        
        # Track collaboration events for analysis
        self.collaboration_history = []

    def _initialize_agents(self) -> List[AgentState]:
        """Initialize agents with diverse roles and capabilities"""
        agents = []
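        # Cycle through roles so n_agents > len(AgentRole) still yields
        # n_agents agents and matches the declared observation size
        all_roles = list(AgentRole)
        roles = [all_roles[i % len(all_roles)] for i in range(self.n_agents)]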

        for i, role in enumerate(roles):
            expertise = {
                "research": np.random.uniform(0.5, 1.0),
                "analysis": np.random.uniform(0.5, 1.0),
                "execution": np.random.uniform(0.5, 1.0),
                "validation": np.random.uniform(0.5, 1.0)
            }

            # Boost expertise based on role
            if role == AgentRole.RESEARCHER:
                expertise["research"] = min(1.0, expertise["research"] + 0.3)
            elif role == AgentRole.ANALYZER:
                expertise["analysis"] = min(1.0, expertise["analysis"] + 0.3)

            agents.append(AgentState(
                id=f"agent_{i}",
                role=role,
                capacity=np.random.uniform(0.8, 1.0),
                expertise=expertise
            ))

        return agents

    def _generate_task(self) -> Task:
        """Generate a new task with random properties"""
        task_types = ["research", "analysis", "execution", "validation"]
        task_type = np.random.choice(task_types)

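        # Use a monotonic counter for ids: random ids can collide and
        # silently overwrite an entry in self.active_tasks
        self._task_counter = getattr(self, "_task_counter", 0) + 1

        return Task(
            id=f"task_{self._task_counter}",
            type=task_type,
            complexity=np.random.uniform(0.3, 1.0),
            requirements=[np.random.choice(task_types) for _ in range(np.random.randint(1, 3))],
            # Deadlines are compared against the absolute step count in step(),
            # so offset by the step at which the task is created
            deadline=self.current_step + np.random.uniform(10, 50),
            priority=np.random.uniform(0.1, 1.0)
        )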

    def _get_observation(self) -> np.ndarray:
        """Construct observation vector from current state"""
        obs = []

        # Agent states
        for agent in self.agents:
            obs.extend([
                agent.capacity,
                agent.current_load,
                agent.completed_tasks / max(1, self.current_step),
                agent.success_rate,
                agent.collaboration_score,
                agent.expertise.get("research", 0),
                agent.expertise.get("analysis", 0)
            ])

        # Task queue state
        for i in range(self.max_tasks):
            if i < len(self.task_queue):
                task = list(self.task_queue)[i]
                obs.extend([
                    task.complexity,
                    task.priority,
                    task.deadline,
                    1.0,  # task exists
                    0.0   # not yet assigned
                ])
            else:
                obs.extend([0, 0, 0, 0, 0])

        # Collaboration matrix (flattened)
        obs.extend(self.collaboration_matrix.flatten())

        return np.array(obs, dtype=np.float32)
    
    def _get_collaborating_agents(self, action: np.ndarray) -> set:
        """Identify which agents are signaling collaboration this step"""
        collaborating = set()
        for i in range(self.n_agents):
            if action[i, 2] > 0.7:  # Collaboration threshold
                collaborating.add(i)
        return collaborating
    
    def _calculate_collaboration_boost(self, agent_idx: int, task_type: str, 
                                        collaborating_agents: set) -> float:
        """
        Calculate the collaboration boost for an agent based on:
        - Which other agents are collaborating
        - The collaboration matrix values (history)
        - The expertise of collaborating agents
        """
        if agent_idx not in collaborating_agents:
            return 1.0  # No boost if not collaborating
        
        boost = 1.0
        for other_idx in collaborating_agents:
            if other_idx != agent_idx:
                # Get collaboration strength from matrix
                collab_strength = self.collaboration_matrix[agent_idx, other_idx]
                
                # Get other agent's expertise in the task type
                other_agent = self.agents[other_idx]
                other_expertise = other_agent.expertise.get(task_type, 0.5)
                
                # Calculate contribution: stronger history + better expertise = more boost
                # The boost is proportional to how much the matrix has grown above 1.0
                contribution = (collab_strength - 1.0) * other_expertise * self.expertise_share_factor
                boost += max(0, contribution)
        
        return boost
    
    def _get_best_collaborator(self, agent_idx: int, task_type: str) -> Optional[int]:
        """Find the best potential collaborator for an agent based on matrix and expertise"""
        best_score = 0
        best_idx = None
        
        for j in range(self.n_agents):
            if j != agent_idx:
                collab_strength = self.collaboration_matrix[agent_idx, j]
                expertise = self.agents[j].expertise.get(task_type, 0.5)
                availability = 1.0 - self.agents[j].current_load
                
                # Score combines collaboration history, expertise, and availability
                score = collab_strength * expertise * availability
                
                if score > best_score:
                    best_score = score
                    best_idx = j
        
        return best_idx if best_score > 1.0 else None

    def step(self, action: np.ndarray) -> Tuple[np.ndarray, float, bool, bool, Dict]:
        """Execute action and return new state with COLLABORATION BOOST"""
        self.current_step += 1

        # Parse actions
        action = action.reshape(self.n_agents, 3)
        
        # Identify collaborating agents
        collaborating_agents = self._get_collaborating_agents(action)

        # Process task assignments
        reward = 0
        for i, agent in enumerate(self.agents):
            if len(self.task_queue) > 0 and action[i, 0] > 0.5:
                task = self.task_queue.popleft()
                self._assign_task(agent, task)

                # Calculate immediate reward for assignment
                match_score = agent.expertise.get(task.type, 0.5)
                reward += match_score * task.priority

        # Process active tasks with COLLABORATION BOOST
        completed_this_step = []
        for task_id, (task, agent) in list(self.active_tasks.items()):
            agent_idx = int(agent.id.split("_")[1])
            
            # Base progress from agent's own capabilities
            base_progress = agent.expertise.get(task.type, 0.5) * agent.capacity
            
            # Calculate collaboration boost
            collab_boost = self._calculate_collaboration_boost(
                agent_idx, task.type, collaborating_agents
            )
            
            # Apply boost to progress probability
            effective_progress = min(0.95, base_progress * collab_boost)

            # Check if task completed (probability boosted by collaboration)
            if np.random.random() < effective_progress:
                self._complete_task(task, agent, collab_boost)
                completed_this_step.append(task)

                # Calculate completion reward (quality includes collab boost)
                time_bonus = max(0, 1 - (self.current_step / task.deadline))
                quality = task.quality_score
                reward += (quality * task.priority * (1 + time_bonus)) * 10

        # Update collaboration matrix with PROPORTIONAL rewards
        for i in range(self.n_agents):
            for j in range(self.n_agents):
                if i != j and action[i, 2] > 0.7 and action[j, 2] > 0.7:
                    # Grow the collaboration strength
                    self.collaboration_matrix[i, j] *= self.collab_growth_rate
                    
                    # Cap collaboration strength
                    self.collaboration_matrix[i, j] = min(3.0, self.collaboration_matrix[i, j])
                    
                    # Reward SCALES with collaboration strength (not flat!)
                    collab_reward = self.collab_reward_scale * self.collaboration_matrix[i, j]
                    reward += collab_reward
                    
                    # Record collaboration event
                    self.collaboration_history.append({
                        'step': self.current_step,
                        'agents': (i, j),
                        'strength': self.collaboration_matrix[i, j]
                    })
        
        # Apply collaboration DECAY for unused partnerships
        for i in range(self.n_agents):
            for j in range(self.n_agents):
                if i != j:
                    # Only decay if they didn't collaborate this step
                    if not (action[i, 2] > 0.7 and action[j, 2] > 0.7):
                        # Decay towards 1.0 (neutral)
                        current = self.collaboration_matrix[i, j]
                        self.collaboration_matrix[i, j] = (
                            1.0 + (current - 1.0) * self.collab_decay_rate
                        )

        # Generate new tasks
        if np.random.random() < 0.3:
            self.task_queue.append(self._generate_task())

        # Calculate penalties
        queue_penalty = len(self.task_queue) * 0.1
        overdue_penalty = sum(1 for t in self.active_tasks.values()
                            if self.current_step > t[0].deadline) * 0.5
        reward -= (queue_penalty + overdue_penalty)

        # Check termination
        done = self.current_step >= self.max_steps
        truncated = False

        # Prepare info with collaboration metrics
        info = {
            "completed_tasks": len(completed_this_step),
            "queue_length": len(self.task_queue),
            "active_tasks": len(self.active_tasks),
            "avg_success_rate": np.mean([a.success_rate for a in self.agents]),
            "avg_collab_strength": np.mean(self.collaboration_matrix),
            "max_collab_strength": np.max(self.collaboration_matrix),
            "n_collaborating": len(collaborating_agents)
        }

        return self._get_observation(), reward, done, truncated, info

    def _assign_task(self, agent: AgentState, task: Task):
        """Assign a task to an agent"""
        task.assigned_agent = agent.id
        task.status = "active"
        self.active_tasks[task.id] = (task, agent)
        agent.current_load += task.complexity

    def _complete_task(self, task: Task, agent: AgentState, collab_boost: float = 1.0):
        """Mark a task as completed with collaboration quality bonus"""
        task.status = "completed"
        task.completion_time = self.current_step
        
        # Quality score boosted by collaboration
        base_quality = agent.expertise.get(task.type, 0.5) * agent.success_rate
        task.quality_score = min(1.0, base_quality * collab_boost)  # Cap at 1.0

        self.completed_tasks.append(task)
        del self.active_tasks[task.id]

        agent.current_load = max(0, agent.current_load - task.complexity)
        agent.completed_tasks += 1
        agent.success_rate = 0.95 * agent.success_rate + 0.05 * task.quality_score
        
        # Boost collaboration score for agents who benefited from collaboration
        if collab_boost > 1.0:
            agent.collaboration_score = min(2.0, agent.collaboration_score * (1 + 0.02 * (collab_boost - 1)))

    def reset(self, seed=None, options=None) -> Tuple[np.ndarray, Dict]:
        """Reset environment to initial state"""
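        super().reset(seed=seed)
        # The environment samples from NumPy's global RNG, so seed it as well;
        # otherwise the `seed` argument has no effect on agents or tasks
        if seed is not None:
            np.random.seed(seed)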

        self.current_step = 0
        self.agents = self._initialize_agents()
        self.task_queue = deque([self._generate_task() for _ in range(3)])
        self.active_tasks = {}
        self.completed_tasks = []
        self.collaboration_matrix = np.ones((self.n_agents, self.n_agents))
        self.collaboration_history = []

        return self._get_observation(), {}

    def render(self):
        """Render current environment state with collaboration info"""
        print(f"\nStep: {self.current_step}")
        print(f"Tasks in queue: {len(self.task_queue)}")
        print(f"Active tasks: {len(self.active_tasks)}")
        print(f"Completed tasks: {len(self.completed_tasks)}")
        print(f"\nCollaboration Matrix (strength):")
        print(np.round(self.collaboration_matrix, 2))
        print(f"\nAgents:")
        for agent in self.agents:
            print(f"  {agent.id} ({agent.role.value}): Load={agent.current_load:.2f}, "
                  f"Completed={agent.completed_tasks}, Success={agent.success_rate:.2f}, "
                  f"CollabScore={agent.collaboration_score:.2f}")


## 3. Agentic Learning System with PPO


In [None]:
class AgenticRLSystemCollabBoost:
    """Orchestrates multi-agent reinforcement learning workflow with collaboration boost"""

    def __init__(self, env_class, n_envs: int = 4):
        self.env_class = env_class
        self.n_envs = n_envs

        # Create vectorized environment for parallel training
        self.env = DummyVecEnv([lambda: env_class() for _ in range(n_envs)])
        self.eval_env = DummyVecEnv([lambda: env_class()])

        # Initialize PPO model with custom network architecture
        self.model = self._create_model()

        # Metrics tracking
        self.training_history = {
            "episode_rewards": [],
            "episode_lengths": [],
            "success_rates": [],
            "collaboration_scores": [],
            "avg_collab_strength": [],
            "max_collab_strength": []
        }

        # Setup callbacks
        self.callbacks = self._setup_callbacks()

    def _create_model(self) -> PPO:
        """Create PPO model with custom architecture"""
        # Pass net_arch as a dict: the list-wrapped form
        # [dict(pi=..., vf=...)] was deprecated in SB3 1.8
        policy_kwargs = dict(
            net_arch=dict(pi=[256, 256, 128], vf=[256, 256, 128]),
            activation_fn=nn.ReLU
        )

        model = PPO(
            "MlpPolicy",
            self.env,
            learning_rate=3e-4,
            n_steps=2048,
            batch_size=64,
            n_epochs=10,
            gamma=0.99,
            gae_lambda=0.95,
            clip_range=0.2,
            clip_range_vf=None,
            ent_coef=0.01,
            vf_coef=0.5,
            max_grad_norm=0.5,
            policy_kwargs=policy_kwargs,
            verbose=1,
            tensorboard_log="./tensorboard_logs_collab_boost/"
        )

        return model

    def _setup_callbacks(self):
        """Setup training callbacks for monitoring and checkpointing"""
        eval_callback = EvalCallback(
            self.eval_env,
            best_model_save_path="./models/best_model_collab_boost/",
            log_path="./logs_collab_boost/",
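            # eval_freq counts calls to the vectorized env's step(), each of
            # which advances n_envs timesteps, so divide by n_envs
            eval_freq=max(5000 // self.n_envs, 1),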
            deterministic=True,
            render=False,
            n_eval_episodes=10
        )

        checkpoint_callback = CheckpointCallback(
            # save_freq is also counted per vectorized step; divide by n_envs
            save_freq=max(10000 // self.n_envs, 1),
            save_path="./models/checkpoints_collab_boost/",
            name_prefix="agentic_rl_collab_boost"
        )

        return [eval_callback, checkpoint_callback]

    def train(self, total_timesteps: int = 100000):
        """Train the multi-agent system"""
        logger.info(f"Starting training for {total_timesteps} timesteps")

        self.model.learn(
            total_timesteps=total_timesteps,
            callback=self.callbacks,
            log_interval=10,
            progress_bar=True
        )

        logger.info("Training completed")
        return self.model

    def evaluate(self, n_episodes: int = 10) -> Dict[str, float]:
        """Evaluate the trained model with collaboration metrics"""
        env = self.env_class()

        episode_rewards = []
        episode_lengths = []
        task_completion_rates = []
        avg_collab_strengths = []
        max_collab_strengths = []

        for episode in range(n_episodes):
            obs, _ = env.reset()
            done = False
            episode_reward = 0
            episode_length = 0

            while not done:
                action, _ = self.model.predict(obs, deterministic=True)
                obs, reward, terminated, truncated, info = env.step(action)
                # In the Gymnasium API an episode ends on either signal
                done = terminated or truncated
                episode_reward += reward
                episode_length += 1

                # Track collaboration metrics (average and maximum separately)
                avg_collab_strengths.append(info.get('avg_collab_strength', 1.0))
                max_collab_strengths.append(info.get('max_collab_strength', 1.0))

            episode_rewards.append(episode_reward)
            episode_lengths.append(episode_length)

            # Count active tasks too, and include episodes with zero
            # completions, so unfinished work lowers the reported rate
            total_tasks = (len(env.completed_tasks) + len(env.task_queue)
                           + len(env.active_tasks))
            if total_tasks > 0:
                task_completion_rates.append(len(env.completed_tasks) / total_tasks)

        metrics = {
            "mean_reward": np.mean(episode_rewards),
            "std_reward": np.std(episode_rewards),
            "mean_episode_length": np.mean(episode_lengths),
            "task_completion_rate": np.mean(task_completion_rates) if task_completion_rates else 0,
            "avg_collaboration_strength": np.mean(avg_collab_strengths) if avg_collab_strengths else 1.0,
            "max_collaboration_strength": np.max(max_collab_strengths) if max_collab_strengths else 1.0
        }

        return metrics

    def save_model(self, path: str):
        """Save the trained model"""
        self.model.save(path)
        logger.info(f"Model saved to {path}")

    def load_model(self, path: str):
        """Load a pre-trained model"""
        self.model = PPO.load(path, env=self.env)
        logger.info(f"Model loaded from {path}")


## 4. Training and Evaluation Pipeline


In [None]:
def train_agentic_rl_system_collab_boost():
    """Main training pipeline for the agentic RL system with collaboration boost"""

    # Initialize environment and system
    print("Initializing Agentic RL System with Collaboration Boost...")
    system = AgenticRLSystemCollabBoost(MultiAgentTaskEnvironmentCollabBoost, n_envs=4)

    # Training configuration
    training_config = {
        "total_timesteps": 50000,
        "eval_frequency": 5000,
        "n_eval_episodes": 10
    }

    print(f"\nStarting training with {training_config['total_timesteps']} timesteps...")

    # Train the system
    trained_model = system.train(total_timesteps=training_config['total_timesteps'])

    # Evaluate performance
    print("\nEvaluating trained model...")
    metrics = system.evaluate(n_episodes=training_config['n_eval_episodes'])

    print("\n" + "="*50)
    print("Training Complete - Performance Metrics:")
    print("="*50)
    for key, value in metrics.items():
        print(f"{key}: {value:.3f}")

    # Save the model
    system.save_model("./models/trained_agentic_rl_collab_boost")

    return system, metrics

# Train the system
# Uncomment to run training (will take some time)
# system, metrics = train_agentic_rl_system_collab_boost()


## 5. Collaboration Visualization


In [None]:
def visualize_collaboration_matrix(env):
    """Visualize the collaboration matrix as a heatmap"""
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Heatmap of collaboration matrix
    sns.heatmap(env.collaboration_matrix, annot=True, fmt='.2f', 
                cmap='YlOrRd', ax=axes[0],
                xticklabels=[f'Agent {i}' for i in range(env.n_agents)],
                yticklabels=[f'Agent {i}' for i in range(env.n_agents)])
    axes[0].set_title('Collaboration Matrix Strength')
    axes[0].set_xlabel('Agent')
    axes[0].set_ylabel('Agent')
    
    # Scatter plot of collaboration events over time
    if env.collaboration_history:
        steps = [h['step'] for h in env.collaboration_history]
        strengths = [h['strength'] for h in env.collaboration_history]
        axes[1].scatter(steps, strengths, alpha=0.5, c='blue')
        axes[1].set_xlabel('Step')
        axes[1].set_ylabel('Collaboration Strength')
        axes[1].set_title('Collaboration Events Over Time')
        axes[1].grid(True, alpha=0.3)
    else:
        axes[1].text(0.5, 0.5, 'No collaboration history yet', 
                     ha='center', va='center', transform=axes[1].transAxes)
    
    plt.tight_layout()
    plt.show()

def demo_collaboration_boost():
    """Demo the collaboration boost environment"""
    env = MultiAgentTaskEnvironmentCollabBoost(n_agents=4)
    obs, _ = env.reset()
    
    print("="*60)
    print("COLLABORATION BOOST DEMO")
    print("="*60)
    
    total_reward = 0
    for step in range(50):
        # Random action with high collaboration signals
        action = np.random.uniform(0, 1, size=(env.n_agents * 3,))
        # Force some agents to collaborate
        action[2] = 0.9  # Agent 0 collaborates
        action[5] = 0.9  # Agent 1 collaborates
        
        obs, reward, done, truncated, info = env.step(action)
        total_reward += reward
        
        if step % 10 == 0:
            print(f"\nStep {step}:")
            print(f"  Reward: {reward:.2f}, Total: {total_reward:.2f}")
            print(f"  Avg Collab Strength: {info['avg_collab_strength']:.3f}")
            print(f"  Max Collab Strength: {info['max_collab_strength']:.3f}")
            print(f"  Completed Tasks: {info['completed_tasks']}")
    
    print("\n" + "="*60)
    print("FINAL STATE")
    print("="*60)
    env.render()
    
    # Visualize collaboration
    visualize_collaboration_matrix(env)
    
    return env

# Run demo
demo_env = demo_collaboration_boost()


## Summary - Collaboration Boost Enhancements

This notebook extends the original agentic RL workflow with **proportional collaboration** based on the collaboration matrix.

### Key Enhancements

| Feature | Original | Collaboration Boost Version |
|---------|----------|----------------------------|
| **Task Progress** | Fixed probability based on expertise | Probability scales with collaboration matrix values |
| **Quality Score** | Based only on agent expertise | Boosted by collaboration strength |
| **Collaboration Reward** | Flat 0.5 bonus | Scales with matrix value: `0.5 * collab_strength` |
| **Expertise Sharing** | None | Agents share skills proportional to collaboration history |
| **Matrix Dynamics** | Only grows | Grows when used, **decays** when unused |
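
The "Collaboration Reward" and "Matrix Dynamics" rows can be illustrated with a single update step. The sketch below is a dependency-free restatement of the loops in `step`, assuming the defaults from `__init__` and the cap of 3.0; `update_collaboration` is a hypothetical helper, not a function defined in the notebook.

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]


def update_collaboration(matrix: Matrix, collaborating: Set[int],
                         growth: float = 1.02, decay: float = 0.998,
                         cap: float = 3.0,
                         reward_scale: float = 0.5) -> Tuple[Matrix, float]:
    """Apply one step of growth/decay and return (new_matrix, reward)."""
    n = len(matrix)
    updated = [row[:] for row in matrix]  # leave the input untouched
    reward = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # diagonal stays at its initial value
            if i in collaborating and j in collaborating:
                # Pair collaborated: grow (capped) and pay a scaled reward
                updated[i][j] = min(cap, matrix[i][j] * growth)
                reward += reward_scale * updated[i][j]
            else:
                # Pair idle: shrink the excess above the neutral value 1.0
                updated[i][j] = 1.0 + (matrix[i][j] - 1.0) * decay
    return updated, reward


matrix = [[1.0, 1.5, 2.0],
          [1.5, 1.0, 1.0],
          [2.0, 1.0, 1.0]]
updated, reward = update_collaboration(matrix, collaborating={0, 1})
print(round(updated[0][1], 4), round(updated[0][2], 4), round(reward, 4))
```

Because the loop visits both `(i, j)` and `(j, i)`, each collaborating pair contributes its scaled reward twice per step, which matches the behavior of the environment's `step`.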

### How Collaboration Boost Works

```
collaboration_boost = 1.0 + Σ max(0, (collab_matrix[i,j] - 1.0) * other_expertise[j] * share_factor)
                           for all collaborating agents j ≠ i

effective_progress = min(0.95, base_progress * collaboration_boost)
task_quality = min(1.0, base_quality * collaboration_boost)
```
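
As a worked example of the formula above, the following sketch evaluates the boost for one agent with three collaborating partners. The function mirrors `_calculate_collaboration_boost`, but it takes plain dictionaries instead of environment state; the name `collaboration_boost` and the sample numbers are illustrative assumptions.

```python
from typing import Dict, Iterable


def collaboration_boost(strengths: Dict[int, float],
                        expertise: Dict[int, float],
                        partners: Iterable[int],
                        share_factor: float = 0.3) -> float:
    """Boost for one agent, summed over its collaborating partners."""
    boost = 1.0
    for j in partners:
        # Only the history above the neutral value 1.0 contributes
        contribution = (strengths[j] - 1.0) * expertise[j] * share_factor
        boost += max(0.0, contribution)
    return boost


# Matrix entries toward partners 1..3 and their expertise in the task type
strengths = {1: 1.5, 2: 2.0, 3: 1.0}
expertise = {1: 0.8, 2: 0.6, 3: 0.9}

boost = collaboration_boost(strengths, expertise, partners=[1, 2, 3])
# 1.0 + 0.5*0.8*0.3 + 1.0*0.6*0.3 + 0.0 = 1.30

base_progress = 0.6
effective_progress = min(0.95, base_progress * boost)
print(round(boost, 2), round(effective_progress, 2))
```

Partner 3 has high expertise but no shared history (strength 1.0), so it adds nothing: expertise is only "borrowed" in proportion to how far the pair's matrix entry has grown.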

### Configurable Parameters

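- `collab_growth_rate` (default: 1.02): Multiplier applied to a pair's matrix entry on each step in which both agents signal collaboration (entries are capped at 3.0)
- `collab_decay_rate` (default: 0.998): Per-step factor applied to the excess above 1.0 for pairs that did not collaborate, so unused entries decay toward neutral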
- `collab_reward_scale` (default: 0.5): Base multiplier for collaboration rewards
- `expertise_share_factor` (default: 0.3): How much expertise can be shared between collaborators
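
To get a feel for the time scales these defaults imply, the sketch below computes how many consecutive collaborating steps it takes for an entry to reach the cap, and how many idle steps it takes for the excess above 1.0 to halve. Both helper names are hypothetical, and the calculation assumes uninterrupted growth or uninterrupted decay, which a trained policy will not necessarily produce.

```python
import math


def steps_to_cap(growth_rate: float = 1.02, start: float = 1.0,
                 cap: float = 3.0) -> int:
    """Smallest n such that start * growth_rate**n >= cap."""
    return math.ceil(math.log(cap / start) / math.log(growth_rate))


def excess_half_life(decay_rate: float = 0.998) -> int:
    """Smallest n such that decay_rate**n <= 0.5."""
    return math.ceil(math.log(0.5) / math.log(decay_rate))


print(steps_to_cap())       # 56 collaborating steps from 1.0 to the 3.0 cap
print(excess_half_life())   # 347 idle steps for the excess to halve
```

With `max_steps = 200`, an entry can reach the cap well within one episode, whereas decay is slow enough that a partnership loses less than half of its excess even if it is left idle for the rest of the episode.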

### Expected Emergent Behaviors

1. **Team Formation**: Agents learn to form consistent pairs/teams
2. **Specialization**: Complementary agents collaborate more (researcher + analyzer)
3. **Strategic Collaboration**: Agents learn WHEN to collaborate (complex tasks) vs solo work
4. **Relationship Maintenance**: Agents maintain valuable partnerships to prevent decay
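
One possible way to check for team formation after a run is to look for pairs of agents that are each other's strongest partner in the collaboration matrix. This heuristic is an assumption for illustration and is not part of the notebook; `mutual_best_pairs` is a hypothetical helper that works on a plain nested list (for a NumPy matrix, pass `env.collaboration_matrix.tolist()`).

```python
from typing import List, Optional, Set, Tuple


def mutual_best_pairs(matrix: List[List[float]],
                      min_strength: float = 1.0) -> Set[Tuple[int, int]]:
    """Return pairs (i, j), i < j, that are each other's strongest partner.

    Entries at or below min_strength (the neutral value) are ignored.
    """
    n = len(matrix)
    best: List[Optional[int]] = []
    for i in range(n):
        candidates = [(matrix[i][j], j) for j in range(n)
                      if j != i and matrix[i][j] > min_strength]
        # max() on (strength, index) tuples picks the strongest partner
        best.append(max(candidates)[1] if candidates else None)
    return {(i, j) for i, j in enumerate(best)
            if j is not None and i < j and best[j] == i}


matrix = [[1.0, 2.4, 1.1, 1.0],
          [2.4, 1.0, 1.0, 1.2],
          [1.1, 1.0, 1.0, 1.9],
          [1.0, 1.2, 1.9, 1.0]]
pairs = mutual_best_pairs(matrix)
print(sorted(pairs))
```

A freshly reset environment has every entry at 1.0, so the function returns an empty set until some pair has actually collaborated.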
