# Tutorial 3: Building Agents

**Goal:** Create hierarchical agents (Field and Coordinator) with timing attributes.

**Time:** ~15 minutes

---

## Agent Hierarchy

HERON uses a 3-level hierarchy where each level has different timing:

```
Level 3: SystemAgent (grid operator)      — tick_interval: 300s
    │
Level 2: CoordinatorAgent (microgrid)     — tick_interval: 60s
    │
Level 1: FieldAgent (battery, generator)  — tick_interval: 1s
```

**Why timing matters:**
- In real systems, devices update quickly while coordinators update more slowly
- HERON captures this with `tick_interval` on each agent
- Enables realistic event-driven testing (Tutorial 06)
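
To see how the three rates interleave, here is a minimal event-queue sketch in plain Python. It is not HERON's scheduler. The `simulate_ticks` helper and the agent names are hypothetical, and the sketch only assumes that each agent fires every `tick_interval` seconds starting at t = 0.

```python
import heapq
from collections import Counter


def simulate_ticks(tick_intervals, horizon_s):
    """Count how often each agent ticks in [0, horizon_s).

    tick_intervals maps a (hypothetical) agent name to its tick_interval in
    seconds. Agents sit in a min-heap ordered by their next tick time, a
    common way to drive an event-driven simulation.
    """
    queue = [(0.0, name) for name in tick_intervals]
    heapq.heapify(queue)
    counts = Counter()
    while queue:
        t, name = heapq.heappop(queue)
        if t >= horizon_s:
            continue  # past the horizon: this agent is not rescheduled
        counts[name] += 1
        heapq.heappush(queue, (t + tick_intervals[name], name))
    return counts


intervals = {'system': 300.0, 'coordinator': 60.0, 'battery': 1.0}
print(dict(simulate_ticks(intervals, horizon_s=300.0)))
# {'battery': 300, 'coordinator': 5, 'system': 1}
```

Over one 300 s window the battery ticks 300 times for every 5 coordinator ticks and 1 system tick; that multi-rate pattern is what `tick_interval` is meant to capture.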

## Step 1: Building a Field Agent (Battery)

Field agents form the lowest level: they control physical devices.

**Key responsibilities:**
- Maintain device state (SOC, power output)
- Execute actions (charge/discharge)
- Report state to coordinator

In [None]:
import numpy as np
from typing import Any, Dict, Optional
from dataclasses import dataclass

from heron.agents.field_agent import FieldAgent
from heron.core.action import Action
from heron.core.state import FieldAgentState
from heron.core.feature import FeatureProvider


# First, define our feature (from Tutorial 2)
@dataclass
class BatterySOC(FeatureProvider):
    """Battery state of charge."""
    visibility = ['owner', 'upper_level']
    
    soc: float = 0.5
    soc_min: float = 0.1
    soc_max: float = 0.9
    capacity_mwh: float = 1.0
    
    def vector(self) -> np.ndarray:
        return np.array([self.soc, self.soc_min, self.soc_max], dtype=np.float32)
    
    def names(self):
        return ['soc', 'soc_min', 'soc_max']
    
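    def to_dict(self):
        # Include every field so from_dict() can rebuild the feature exactly
        return {'soc': self.soc, 'soc_min': self.soc_min, 'soc_max': self.soc_max,
                'capacity_mwh': self.capacity_mwh}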
    
    @classmethod
    def from_dict(cls, d):
        return cls(**d)
    
    def set_values(self, **kwargs):
        for k, v in kwargs.items():
            if hasattr(self, k):
                setattr(self, k, float(v))

In [None]:
class SimpleBatteryAgent(FieldAgent):
    """A simple battery agent for the tutorial.
    
    Actions: Continuous power setpoint [-1, 1]
        - Negative = discharge (supply power)
        - Positive = charge (consume power)
    """
    
    def __init__(
        self,
        agent_id: str,
        capacity_mwh: float = 1.0,
        max_power_mw: float = 0.5,
        initial_soc: float = 0.5,
        upstream_id: Optional[str] = None,
    ):
        """Initialize battery agent.
        
        Args:
            agent_id: Unique identifier
            capacity_mwh: Battery capacity in MWh
            max_power_mw: Max charge/discharge rate in MW
            initial_soc: Starting state of charge [0, 1]
            upstream_id: Parent coordinator ID
        """
        # Device parameters
        self.capacity_mwh = capacity_mwh
        self.max_power_mw = max_power_mw
        self.initial_soc = initial_soc
        
        # Initialize parent class
        super().__init__(
            agent_id=agent_id,
            upstream_id=upstream_id,
            config={'name': agent_id},
            tick_interval=1.0,  # Battery ticks every second
        )
        
        # Initialize state and action
        self._setup_state()
        self._setup_action()
    
    def _setup_state(self):
        """Initialize battery state with SOC feature."""
        self.state = FieldAgentState(
            owner_id=self.agent_id,
            owner_level=1  # Field level
        )
        # Register our SOC feature
        self.state.register_feature(
            'soc',
            BatterySOC(
                soc=self.initial_soc,
                capacity_mwh=self.capacity_mwh
            )
        )
    
    def _setup_action(self):
        """Define action space: continuous power setpoint."""
        from gymnasium.spaces import Box
        
        # Action: power setpoint normalized to [-1, 1]
        self.action = Action()
        self.action.set_specs(
            continuous_space=Box(
                low=-1.0,
                high=1.0,
                shape=(1,),
                dtype=np.float32
            )
        )
        self.action_space = self.action.space
        
        # Observation space = state vector size
        obs_dim = len(self.state.vector())
        self.observation_space = Box(
            low=-np.inf,
            high=np.inf,
            shape=(obs_dim,),
            dtype=np.float32
        )
    
    def observe(self, global_state: Optional[Dict] = None) -> np.ndarray:
        """Get observation vector.
        
        Battery only sees its own state (filtered by visibility).
        """
        return self.state.observed_by(
            requestor_id=self.agent_id,
            requestor_level=1
        )
    
    def act(self, observation: np.ndarray) -> np.ndarray:
        """Compute action from observation.
        
        In training, this is overridden by the RL policy.
        Here we provide a simple heuristic.
        """
        # Simple rule: charge when SOC is low, discharge when high
        soc = self.state.features['soc'].soc
        if soc < 0.3:
            return np.array([0.5])  # Charge
        elif soc > 0.7:
            return np.array([-0.5])  # Discharge
        else:
            return np.array([0.0])  # Hold
    
    def step(self, action: np.ndarray, dt: float = 1.0) -> Dict[str, float]:
        """Execute action and update state.
        
        Args:
            action: Normalized power setpoint in [-1, 1] (positive = charge)
            dt: Step length in hours (a 1 s tick is dt = 1/3600)
            
        Returns:
            Dict with the power actually delivered, the new SOC, and the energy moved
        """
        # Convert the normalized action to a requested power in MW
        requested_mw = float(np.clip(action[0], -1.0, 1.0)) * self.max_power_mw
        
        # Get current SOC
        soc_feature = self.state.features['soc']
        current_soc = soc_feature.soc
        
        # Requested SOC change (positive = charging)
        new_soc = current_soc + (requested_mw * dt) / self.capacity_mwh
        
        # Clip to the allowed SOC window
        new_soc = float(np.clip(new_soc, soc_feature.soc_min, soc_feature.soc_max))
        
        # Energy and power that actually flowed once the SOC limit is applied
        energy_mwh = (new_soc - current_soc) * self.capacity_mwh
        power_mw = energy_mwh / dt
        
        # Update state
        soc_feature.set_values(soc=new_soc)
        
        return {
            'power_mw': power_mw,
            'soc': new_soc,
            'energy_mwh': energy_mwh
        }
    
    def reset(self, seed: Optional[int] = None):
        """Reset battery to initial state."""
        self.state.features['soc'].set_values(soc=self.initial_soc)
        return self.observe()

In [None]:
# Test our battery agent
battery = SimpleBatteryAgent(
    agent_id='battery_1',
    capacity_mwh=2.0,
    max_power_mw=1.0,
    initial_soc=0.5,
    upstream_id='microgrid_1'
)

print(f"Agent ID: {battery.agent_id}")
print(f"Upstream: {battery.upstream_id}")
print(f"Observation space: {battery.observation_space}")
print(f"Action space: {battery.action_space}")
print(f"Initial observation: {battery.observe()}")

# Simulate a few steps
print("\nSimulating discharge...")
for i in range(3):
    result = battery.step(np.array([-0.5]), dt=1.0)  # Discharge at 50%
    print(f"  Step {i+1}: SOC={result['soc']:.3f}, Power={result['power_mw']:.2f} MW")
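
The SOC update is plain arithmetic, so it can be checked without HERON. The standalone `soc_step` function below (an illustrative name) uses the battery's conventions (positive power = charging, `dt` in hours) and reports the power that actually flowed once the SOC limit clips the request.

```python
def soc_step(soc, action, max_power_mw, capacity_mwh, dt_h,
             soc_min=0.1, soc_max=0.9):
    """Return (new_soc, actual_power_mw) for one battery step.

    action is the normalized setpoint in [-1, 1]; positive charges.
    """
    requested_mw = action * max_power_mw
    new_soc = soc + requested_mw * dt_h / capacity_mwh
    new_soc = min(max(new_soc, soc_min), soc_max)  # respect SOC limits
    # Power that actually flowed after clipping (MWh per hour = MW)
    actual_mw = (new_soc - soc) * capacity_mwh / dt_h
    return new_soc, actual_mw


soc = 0.5
for i in range(3):
    soc, p = soc_step(soc, -0.5, max_power_mw=1.0, capacity_mwh=2.0, dt_h=1.0)
    print(f"Step {i+1}: SOC={soc:.3f}, Power={p:.2f} MW")
# Step 1: SOC=0.250, Power=-0.50 MW
# Step 2: SOC=0.100, Power=-0.30 MW
# Step 3: SOC=0.100, Power=0.00 MW
```

The second step shows why the requested power is not the delivered power: the battery was asked for 0.5 MW but could only supply 0.3 MW before reaching `soc_min`.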

## Step 2: Building a Simple Generator Agent

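Same pattern as the battery, different physics: the action is a setpoint in [0, 1] scaled by `p_max`, and each step incurs a fuel cost of `cost_per_mwh` per MWh produced.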

In [None]:
@dataclass
class GeneratorOutput(FeatureProvider):
    """Generator power output."""
    visibility = ['owner', 'upper_level', 'system']
    
    p_mw: float = 0.0
    p_max: float = 10.0
    p_min: float = 0.0
    
    def vector(self) -> np.ndarray:
        p_norm = self.p_mw / max(self.p_max, 1e-6)
        return np.array([p_norm, self.p_max], dtype=np.float32)
    
    def names(self):
        return ['p_normalized', 'p_max']
    
    def to_dict(self):
        return {'p_mw': self.p_mw, 'p_max': self.p_max, 'p_min': self.p_min}
    
    @classmethod
    def from_dict(cls, d):
        return cls(**d)
    
    def set_values(self, **kwargs):
        for k, v in kwargs.items():
            if hasattr(self, k):
                setattr(self, k, float(v))


class SimpleGeneratorAgent(FieldAgent):
    """A simple generator agent.
    
    Actions: Continuous power setpoint [0, 1] (normalized to p_max)
    """
    
    def __init__(
        self,
        agent_id: str,
        p_max: float = 5.0,
        cost_per_mwh: float = 50.0,
        upstream_id: Optional[str] = None,
    ):
        self.p_max = p_max
        self.cost_per_mwh = cost_per_mwh
        
        super().__init__(
            agent_id=agent_id,
            upstream_id=upstream_id,
            config={'name': agent_id},
            tick_interval=1.0,
        )
        
        self._setup_state()
        self._setup_action()
    
    def _setup_state(self):
        self.state = FieldAgentState(
            owner_id=self.agent_id,
            owner_level=1
        )
        self.state.register_feature(
            'output',
            GeneratorOutput(p_mw=0.0, p_max=self.p_max)
        )
    
    def _setup_action(self):
        from gymnasium.spaces import Box
        
        self.action = Action()
        self.action.set_specs(
            continuous_space=Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32)
        )
        self.action_space = self.action.space
        
        obs_dim = len(self.state.vector())
        self.observation_space = Box(
            low=-np.inf, high=np.inf, shape=(obs_dim,), dtype=np.float32
        )
    
    def observe(self, global_state: Optional[Dict] = None) -> np.ndarray:
        return self.state.observed_by(self.agent_id, 1)
    
    def act(self, observation: np.ndarray) -> np.ndarray:
        # Simple: run at 50% capacity
        return np.array([0.5])
    
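    def step(self, action: np.ndarray, dt: float = 1.0) -> Dict[str, float]:
        """Set output to action * p_max; dt is in hours, cost in $ per MWh produced."""
        setpoint = float(np.clip(action[0], 0.0, 1.0))  # stay inside the action space
        power_mw = setpoint * self.p_max
        self.state.features['output'].set_values(p_mw=power_mw)
        
        cost = power_mw * dt * self.cost_per_mwh
        return {'power_mw': power_mw, 'cost': cost}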
    
    def reset(self, seed: Optional[int] = None):
        self.state.features['output'].set_values(p_mw=0.0)
        return self.observe()


# Test
gen = SimpleGeneratorAgent('gen_1', p_max=10.0, upstream_id='microgrid_1')
print(f"Generator action space: {gen.action_space}")
result = gen.step(np.array([0.8]))  # 80% output
print(f"Output: {result['power_mw']:.2f} MW, Cost: ${result['cost']:.2f}")
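
The generator step is a linear scaling plus a linear fuel cost: power = setpoint × `p_max` and cost = power × dt × `cost_per_mwh`, with dt in hours. A minimal standalone check of that arithmetic (the `generator_step` name is illustrative, and clamping the setpoint to [0, 1] is a safeguard added here):

```python
def generator_step(action, p_max_mw, cost_per_mwh, dt_h=1.0):
    """Return (power_mw, cost) for a normalized setpoint in [0, 1]."""
    a = min(max(action, 0.0), 1.0)  # keep the setpoint inside the action space
    power_mw = a * p_max_mw
    cost = power_mw * dt_h * cost_per_mwh
    return power_mw, cost


print(generator_step(0.8, p_max_mw=10.0, cost_per_mwh=50.0))  # (8.0, 400.0)
```
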

## Step 3: Building a Coordinator Agent (Microgrid)

Coordinators manage multiple field agents. They:
- Aggregate observations from subordinates
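- Distribute actions to subordinates through a coordination protocol (this simplified example slices its joint action and calls each subordinate's `step()` directly)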
- Have their own (larger) observation/action spaces
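
At its simplest, distributing actions means slicing the coordinator's joint action vector into per-subordinate pieces. A minimal sketch in plain Python, assuming each subordinate has a one-dimensional action whose bounds are listed in a hypothetical `layout` (HERON's protocols may organize this differently):

```python
def split_action(joint_action, layout):
    """Split a flat joint action into per-subordinate actions.

    layout is an ordered list of (agent_id, low, high) tuples, one per
    subordinate with a 1-D action; each value is clipped to its bounds.
    """
    if len(joint_action) != len(layout):
        raise ValueError("joint action length does not match layout")
    return {
        agent_id: [min(max(value, low), high)]
        for value, (agent_id, low, high) in zip(joint_action, layout)
    }


layout = [('mg_1_battery', -1.0, 1.0), ('mg_1_gen', 0.0, 1.0)]
print(split_action([-0.3, 0.7], layout))  # {'mg_1_battery': [-0.3], 'mg_1_gen': [0.7]}
print(split_action([-1.5, 1.2], layout))  # {'mg_1_battery': [-1.0], 'mg_1_gen': [1.0]}
```

This is the same slicing `SimpleMicrogridAgent.step()` performs with `action[0:1]` and `action[1:2]`, with clipping added so an out-of-range joint action cannot push a subordinate outside its own action space.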

In [None]:
from heron.agents.coordinator_agent import CoordinatorAgent
from heron.protocols.vertical import SetpointProtocol
from gymnasium.spaces import Box


class SimpleMicrogridAgent(CoordinatorAgent):
    """A simple microgrid coordinator.
    
    Manages a battery and generator, optimizes for cost and grid balance.
    """
    
    def __init__(
        self,
        agent_id: str,
        load_mw: float = 3.0,  # Fixed load for simplicity
        upstream_id: Optional[str] = None,
    ):
        self.load_mw = load_mw
        
        super().__init__(
            agent_id=agent_id,
            upstream_id=upstream_id,
            protocol=SetpointProtocol(),  # Use setpoint-based coordination
            tick_interval=60.0,  # Coordinator ticks every minute
        )
        
        # Create subordinate agents
        self._create_subordinates()
        
        # Setup spaces after subordinates exist
        self._setup_spaces()
    
    def _create_subordinates(self):
        """Create battery and generator as subordinates."""
        # Battery
        self.battery = SimpleBatteryAgent(
            agent_id=f'{self.agent_id}_battery',
            capacity_mwh=2.0,
            max_power_mw=1.0,
            upstream_id=self.agent_id
        )
        
        # Generator
        self.generator = SimpleGeneratorAgent(
            agent_id=f'{self.agent_id}_gen',
            p_max=5.0,
            upstream_id=self.agent_id
        )
        
        # Register as subordinates
        self.subordinates = {
            self.battery.agent_id: self.battery,
            self.generator.agent_id: self.generator,
        }
    
    def _setup_spaces(self):
        """Define observation and action spaces.
        
        Observation: [soc, soc_min, soc_max, p_norm, p_max, load, imbalance]
        Action: [battery_setpoint, gen_setpoint]
        """
        # Observation: aggregate from subordinates + local info
        # battery: 3 features (soc, soc_min, soc_max)
        # generator: 2 features (p_norm, p_max)
        # local: 2 features (load, imbalance)
        obs_dim = 3 + 2 + 2
        self.observation_space = Box(
            low=-np.inf, high=np.inf, shape=(obs_dim,), dtype=np.float32
        )
        
        # Action: setpoints for both subordinates
        # battery: [-1, 1], generator: [0, 1]
        self.action_space = Box(
            low=np.array([-1.0, 0.0]),
            high=np.array([1.0, 1.0]),
            dtype=np.float32
        )
    
    def observe(self, global_state: Optional[Dict] = None) -> np.ndarray:
        """Aggregate observations from subordinates.
        
        As coordinator (level 2), we can see subordinate (level 1) features
        marked with 'upper_level' visibility.
        """
        # Get battery observation (visible to upper_level)
        battery_obs = self.battery.state.observed_by(
            requestor_id=self.agent_id,
            requestor_level=2
        )
        
        # Get generator observation
        gen_obs = self.generator.state.observed_by(
            requestor_id=self.agent_id,
            requestor_level=2
        )
        
        # Calculate power imbalance from the most recent dispatch
        # (battery power > 0 means charging, so it adds to demand)
        gen_power = self.generator.state.features['output'].p_mw
        battery_power = getattr(self, '_last_battery_power_mw', 0.0)
        imbalance = self.load_mw - (gen_power - battery_power)
        
        local_obs = np.array([self.load_mw / 10.0, imbalance / 10.0], dtype=np.float32)
        
        return np.concatenate([battery_obs, gen_obs, local_obs])
    
    def act(self, observation: np.ndarray) -> np.ndarray:
        """Compute actions for subordinates.
        
        In training, this is overridden by RL policy.
        """
        # Simple heuristic: hold the battery and cover the load with the generator
        return np.array([0.0, 0.6])  # Hold battery, run gen at 60% (3.0 MW)
    
    def step(self, action: np.ndarray, dt: float = 1.0) -> Dict[str, Any]:
        """Distribute actions to subordinates and collect results."""
        battery_action = action[0:1]  # First element
        gen_action = action[1:2]  # Second element
        
        # Execute on subordinates
        battery_result = self.battery.step(battery_action, dt)
        gen_result = self.generator.step(gen_action, dt)
        
        # Remember the battery dispatch so observe() can report the imbalance
        self._last_battery_power_mw = battery_result['power_mw']
        
        # Net supply: generation minus battery charging (discharge adds supply)
        total_generation = gen_result['power_mw'] - battery_result['power_mw']
        imbalance = abs(self.load_mw - total_generation)
        
        return {
            'battery': battery_result,
            'generator': gen_result,
            'total_generation': total_generation,
            'load': self.load_mw,
            'imbalance': imbalance,
            'cost': gen_result['cost'],
        }
    
    def reset(self, seed: Optional[int] = None):
        """Reset all subordinates and the recorded battery dispatch."""
        self.battery.reset(seed)
        self.generator.reset(seed)
        self._last_battery_power_mw = 0.0
        return self.observe()

In [None]:
# Test our microgrid agent
microgrid = SimpleMicrogridAgent(
    agent_id='mg_1',
    load_mw=3.0,
    upstream_id='dso'
)

print(f"Microgrid: {microgrid.agent_id}")
print(f"Subordinates: {list(microgrid.subordinates.keys())}")
print(f"Observation space: {microgrid.observation_space}")
print(f"Action space: {microgrid.action_space}")
print(f"\nInitial observation: {microgrid.observe()}")

# Run a step
action = np.array([-0.3, 0.7])  # Discharge battery, gen at 70%
result = microgrid.step(action)
print("\nStep result:")
print(f"  Generation: {result['total_generation']:.2f} MW")
print(f"  Load: {result['load']:.2f} MW")
print(f"  Imbalance: {result['imbalance']:.2f} MW")
print(f"  Cost: ${result['cost']:.2f}")
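
The step result follows from the battery's sign convention (positive = charging): the microgrid's net supply is generator output minus battery charging power, and the imbalance is load minus that supply. A standalone check with the tutorial's parameters (the `microgrid_balance` name is illustrative):

```python
def microgrid_balance(load_mw, gen_mw, battery_mw):
    """Return (supply_mw, signed_imbalance_mw); battery_mw > 0 means charging."""
    supply_mw = gen_mw - battery_mw
    return supply_mw, load_mw - supply_mw


# Action [-0.3, 0.7] with the tutorial's parameters
battery_mw = -0.3 * 1.0   # discharge at 30% of max_power_mw = 1.0
gen_mw = 0.7 * 5.0        # 70% of p_max = 5.0
supply, imbalance = microgrid_balance(3.0, gen_mw, battery_mw)
print(f"supply={supply:.2f} MW, imbalance={imbalance:.2f} MW, cost=${gen_mw * 50.0:.2f}")
# supply=3.80 MW, imbalance=-0.80 MW, cost=$175.00
```

The signed imbalance is negative because supply exceeds load by 0.8 MW; `step()` reports its absolute value.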

## Why This Design Matters

### 1. Hierarchy is Explicit
```python
battery.upstream_id = 'microgrid_1'  # Battery reports to microgrid
microgrid.subordinates = {'battery_1': battery}  # Microgrid manages battery
```
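
Because the link is stored on both sides, the two can drift apart. A small consistency check in plain Python, using a hypothetical `Node` stand-in and `hierarchy_errors` helper rather than HERON's agent classes:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for an agent: only the hierarchy fields."""
    agent_id: str
    upstream_id: Optional[str] = None
    subordinates: Dict[str, "Node"] = field(default_factory=dict)


def hierarchy_errors(root: Node) -> List[str]:
    """Walk the tree and report links that disagree in either direction."""
    errors = []
    stack = [root]
    while stack:
        parent = stack.pop()
        for key, child in parent.subordinates.items():
            if key != child.agent_id:
                errors.append(f"{parent.agent_id}: key {key!r} != {child.agent_id!r}")
            if child.upstream_id != parent.agent_id:
                errors.append(f"{child.agent_id}: upstream_id is {child.upstream_id!r}, "
                              f"expected {parent.agent_id!r}")
            stack.append(child)
    return errors


battery = Node('battery_1', upstream_id='microgrid_1')
microgrid = Node('microgrid_1', subordinates={'battery_1': battery})
print(hierarchy_errors(microgrid))  # []
battery.upstream_id = 'mg_2'
print(hierarchy_errors(microgrid))  # ["battery_1: upstream_id is 'mg_2', expected 'microgrid_1'"]
```
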

### 2. Visibility is Automatic
```python
# Coordinator sees subordinate state (upper_level visibility)
battery_obs = battery.state.observed_by(requestor_id='microgrid_1', requestor_level=2)
```
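
Conceptually, `observed_by` keeps only the features whose visibility tags admit the requester. The sketch below encodes one plausible reading of the tags and is an assumption, not HERON's implementation: `'owner'` admits the owning agent, `'upper_level'` admits a requester exactly one level above the owner, and `'system'` admits the level-3 system agent. The `can_see` function is hypothetical.

```python
from typing import Sequence


def can_see(tags: Sequence[str], owner_id: str, owner_level: int,
            requestor_id: str, requestor_level: int) -> bool:
    """Assumed visibility rules for illustration; not HERON's implementation."""
    if 'owner' in tags and requestor_id == owner_id:
        return True   # the owning agent always sees its own feature
    if 'upper_level' in tags and requestor_level == owner_level + 1:
        return True   # assumed: only the level directly above the owner
    if 'system' in tags and requestor_level == 3:
        return True   # the system agent sits at level 3
    return False


SOC_TAGS = ['owner', 'upper_level']               # BatterySOC.visibility
OUTPUT_TAGS = ['owner', 'upper_level', 'system']  # GeneratorOutput.visibility

print(can_see(SOC_TAGS, 'battery_1', 1, 'battery_1', 1))    # True: owner
print(can_see(SOC_TAGS, 'battery_1', 1, 'microgrid_1', 2))  # True: coordinator
print(can_see(SOC_TAGS, 'battery_1', 1, 'battery_2', 1))    # False: peer device
print(can_see(SOC_TAGS, 'battery_1', 1, 'dso', 3))          # False: no 'system' tag
print(can_see(OUTPUT_TAGS, 'gen_1', 1, 'dso', 3))           # True: 'system' tag
```

Under this reading, the extra `'system'` tag on `GeneratorOutput` is what lets a grid operator see generator output while battery SOC stays with the microgrid.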

### 3. Protocols are Pluggable
```python
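# Change coordination without touching agent logic. SimpleMicrogridAgent
# hard-codes SetpointProtocol(); to swap it, add a `protocol` parameter to
# __init__ and pass it through to super().__init__(). PriceSignalProtocol
# stands in for any alternative protocol class.
microgrid = SimpleMicrogridAgent(agent_id='mg_1', protocol=PriceSignalProtocol())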
```
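
What makes a protocol swappable is that the coordinator depends only on a narrow interface. A minimal sketch with hypothetical classes (`ToyProtocol`, `ToySetpoint` and `ToyPriceSignal` are illustrative and do not reflect HERON's `SetpointProtocol` API): a setpoint protocol forwards per-device commands unchanged, while a price protocol broadcasts one price and lets each generator decide whether to run.

```python
from typing import Dict, Protocol


class ToyProtocol(Protocol):
    def distribute(self, signal: Dict[str, float]) -> Dict[str, float]:
        """Map a coordinator signal to per-subordinate setpoints."""
        ...


class ToySetpoint:
    """Forward each subordinate's setpoint unchanged."""
    def distribute(self, signal: Dict[str, float]) -> Dict[str, float]:
        return dict(signal)


class ToyPriceSignal:
    """Broadcast one price; a generator runs at full output if price >= its threshold."""
    def __init__(self, thresholds: Dict[str, float]):
        self.thresholds = thresholds  # $/MWh at which each generator turns on

    def distribute(self, signal: Dict[str, float]) -> Dict[str, float]:
        price = signal['price']
        return {aid: (1.0 if price >= th else 0.0) for aid, th in self.thresholds.items()}


def coordinate(protocol: ToyProtocol, signal: Dict[str, float]) -> Dict[str, float]:
    # The coordinator code is identical whichever protocol is plugged in
    return protocol.distribute(signal)


print(coordinate(ToySetpoint(), {'battery': -0.3, 'gen': 0.7}))
print(coordinate(ToyPriceSignal({'gen_1': 40.0, 'gen_2': 70.0}), {'price': 60.0}))
# {'gen_1': 1.0, 'gen_2': 0.0}
```
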

### 4. Timing is Configurable
```python
battery.tick_interval = 1.0    # Fast (device level)
microgrid.tick_interval = 60.0  # Slower (coordinator level)
```
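
`tick_interval` is expressed in seconds, while the `step()` methods in this tutorial take `dt` in hours, so whoever drives the agents has to convert between the two. A short sketch of that conversion and its effect on the 2.0 MWh battery (the helper names are illustrative):

```python
SECONDS_PER_HOUR = 3600.0


def tick_to_dt_hours(tick_interval_s: float) -> float:
    """Convert a tick_interval in seconds to a step length in hours."""
    return tick_interval_s / SECONDS_PER_HOUR


def soc_delta(power_mw: float, dt_h: float, capacity_mwh: float) -> float:
    """SOC change for constant power over dt_h hours (positive = charging)."""
    return power_mw * dt_h / capacity_mwh


for name, tick in [('battery', 1.0), ('microgrid', 60.0), ('system', 300.0)]:
    dt_h = tick_to_dt_hours(tick)
    # Full 1.0 MW charge into a 2.0 MWh battery for one tick of this length
    print(f"{name}: dt={dt_h:.6f} h, SOC change={soc_delta(1.0, dt_h, 2.0):.6f}")
```

At a 1 s tick, even full power moves SOC by only about 0.00014, so passing `dt=1.0` (one hour) on every 1 s tick would overstate SOC movement by a factor of 3600.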

## Key Takeaways

| Agent Type | Level | Responsibility | Typical tick_interval |
|------------|-------|----------------|----------------------|
| `FieldAgent` | 1 | Control physical devices | 1s |
| `CoordinatorAgent` | 2 | Manage subordinates | 60s |
| `SystemAgent` | 3 | System-wide coordination | 300s |

**HERON patterns used:**
1. **Hierarchy via `upstream_id`** — Battery knows its coordinator
2. **Timing via `tick_interval`** — For event-driven mode
3. **Visibility via features** — Coordinators see subordinate state
4. **Protocols** — Swappable coordination (SetpointProtocol, etc.)

---

**Next:** [04_building_environment.ipynb](04_building_environment.ipynb) — Create the environment