# Tutorial 6: Configuration & Datasets

**Goal:** Configure environments with YAML files and time-series datasets.

**Time:** ~10 minutes

---

## Why Configuration Matters

Production environments need:
- **Reproducibility**: Same config → same environment
- **Flexibility**: Change parameters without code changes
- **Time-series data**: Load profiles, prices, renewable generation

HERON uses:
- **YAML files** for environment configuration
- **Pickle files** for time-series datasets

## Step 1: Understanding the Setup Structure

A HERON setup is a directory with configuration and data:

```
powergrid/setups/ieee34_ieee13/
├── config.yml      # Environment configuration
└── data.pkl        # Time-series data (load, solar, wind, prices)
```
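Before loading a setup, it can help to confirm that both files are present. The sketch below assumes the two-file layout shown above; `check_setup_dir` and `REQUIRED_FILES` are hypothetical names used for illustration, not part of HERON.

```python
from pathlib import Path
from typing import List

# File names assumed from the directory tree above
REQUIRED_FILES = ("config.yml", "data.pkl")


def check_setup_dir(setup_path: str) -> List[str]:
    """Return the names of required files missing from a setup directory."""
    setup_dir = Path(setup_path)
    return [name for name in REQUIRED_FILES if not (setup_dir / name).is_file()]
```

An empty list means the directory has the expected layout; anything else names the files that still need to be created.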

In [None]:
import os
import yaml
import pickle
import numpy as np
from pathlib import Path

# Create a sample setup directory
setup_dir = Path('sample_setup')
setup_dir.mkdir(exist_ok=True)

print(f"Created setup directory: {setup_dir}")

## Step 2: Creating a Configuration File

The `config.yml` defines:
- Environment parameters (episode length, reward structure)
- Microgrid configurations (devices, connections)
- Dataset paths

In [None]:
# Define a sample configuration
config = {
    # Dataset reference
    'dataset_path': 'data.pkl',
    
    # Environment parameters
    'train': True,
    'max_episode_steps': 96,  # 4 days at hourly resolution
    'penalty': 10.0,          # Safety violation penalty
    'share_reward': True,     # CTDE: shared rewards
    
    # DSO (Distribution System Operator) configuration
    'dso_config': {
        'name': 'DSO',
        'network': 'ieee34',
        'load_area': 'BANC',
    },
    
    # Microgrid configurations (used to build agent hierarchy)
    'microgrid_configs': [
        {
            'name': 'MG1',
            'connection_bus': 'DSO Bus 850',
            'base_power': 1.0,
            'load_scale': 0.1,
            'devices': [
                {
                    'type': 'Generator',
                    'name': 'gen1',
                    'config': {
                        'bus': 'Bus 633',
                        'p_max_MW': 2.0,
                        'p_min_MW': 0.5,
                        'cost_curve_coefs': [0.02, 10.0, 0.0],  # a*P^2 + b*P + c
                    },
                },
                {
                    'type': 'ESS',
                    'name': 'ess1',
                    'config': {
                        'bus': 'Bus 634',
                        'e_capacity_MWh': 5.0,
                        'p_max_MW': 1.0,
                        'p_min_MW': -1.0,  # Negative = charging
                        'init_soc': 0.5,
                        'soc_min': 0.1,
                        'soc_max': 0.9,
                    },
                },
            ],
        },
        {
            'name': 'MG2',
            'connection_bus': 'DSO Bus 860',
            'devices': [
                {
                    'type': 'Generator',
                    'name': 'gen2',
                    'config': {
                        'bus': 'Bus 633',
                        'p_max_MW': 1.5,
                        'p_min_MW': 0.3,
                    },
                },
            ],
        },
        {
            'name': 'MG3',
            'connection_bus': 'DSO Bus 890',
            'devices': [
                {
                    'type': 'ESS',
                    'name': 'ess3',
                    'config': {
                        'bus': 'Bus 680',
                        'e_capacity_MWh': 3.0,
                    },
                },
            ],
        },
    ],
}

# Save to YAML
config_path = setup_dir / 'config.yml'
with open(config_path, 'w') as f:
    yaml.dump(config, f, default_flow_style=False, sort_keys=False)

print(f"Saved configuration to: {config_path}")
print("\nConfiguration preview:")
print(yaml.dump(config, default_flow_style=False, sort_keys=False)[:500] + "...")
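The `cost_curve_coefs` entry is annotated as `a*P^2 + b*P + c`. As a minimal sketch of how such coefficients could be evaluated, the hypothetical helper `generation_cost` below assumes `P` is in MW and the result is a cost per hour; the config itself does not state the units.

```python
import numpy as np


def generation_cost(p_mw, coefs):
    """Evaluate the quadratic cost a*P^2 + b*P + c for scalar or array P."""
    a, b, c = coefs
    p = np.asarray(p_mw, dtype=float)
    return a * p**2 + b * p + c


# Coefficients of gen1 in the sample configuration
coefs = [0.02, 10.0, 0.0]
cost_at_pmax = generation_cost(2.0, coefs)  # 0.02*4 + 10*2 + 0, about 20.08
print(f"Cost at p_max_MW: {cost_at_pmax:.2f}")
```

Because the function accepts arrays, the same call can price a whole dispatch schedule at once.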

## Step 3: Creating a Time-Series Dataset

Datasets contain time-series profiles for:
- **Load**: Demand profiles (typically daily/seasonal patterns)
- **Solar**: PV generation (follows sun)
- **Wind**: Wind generation (more variable)
- **Price**: Electricity market prices

The dataset is organized by area (e.g., 'BANC', 'NP15') for multi-region support.

In [None]:
import matplotlib.pyplot as plt

# Create synthetic time-series data (seeded so the notebook is reproducible)
np.random.seed(42)
hours_per_year = 8760
hours = np.arange(hours_per_year)

# Daily and seasonal patterns
hour_of_day = hours % 24
day_of_year = hours // 24

# Load profile: daily pattern + seasonal variation + noise
daily_load = 0.6 + 0.3 * np.sin(2 * np.pi * (hour_of_day - 6) / 24)  # Peak at noon
seasonal_load = 1.0 + 0.2 * np.sin(2 * np.pi * (day_of_year - 91) / 365)  # Peaks in early July (summer)
load_profile = daily_load * seasonal_load + 0.05 * np.random.randn(hours_per_year)
load_profile = np.clip(load_profile, 0.3, 1.5)

# Solar profile: daylight hours only
solar_profile = np.maximum(0, np.sin(np.pi * (hour_of_day - 6) / 12))  # 6am to 6pm
solar_profile *= (hour_of_day >= 6) & (hour_of_day <= 18)
solar_profile *= 0.8 + 0.2 * np.random.rand(hours_per_year)  # Cloud variability

# Wind profile: more random, but with some correlation
wind_profile = np.zeros(hours_per_year)
wind_profile[0] = 0.5
for i in range(1, hours_per_year):
    wind_profile[i] = 0.9 * wind_profile[i-1] + 0.1 * np.random.rand()
wind_profile = np.clip(wind_profile, 0.1, 0.9)

# Price profile: correlates with load, higher during peak
base_price = 30  # $/MWh
price_profile = base_price + 20 * load_profile + 5 * np.random.randn(hours_per_year)
price_profile = np.clip(price_profile, 15, 100)

print(f"Created {hours_per_year} hours of data")
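A locally seeded generator keeps random draws repeatable without touching NumPy's global random state. The sketch below rebuilds the wind-style autoregressive series with `np.random.default_rng`; `make_wind_profile` and its default seed are illustrative assumptions, not part of the tutorial's dataset code.

```python
import numpy as np


def make_wind_profile(n_hours: int, seed: int = 0) -> np.ndarray:
    """Generate a smoothed random series in [0.1, 0.9] from a seeded generator."""
    rng = np.random.default_rng(seed)
    shocks = rng.random(n_hours)  # Uniform draws in [0, 1)
    wind = np.empty(n_hours)
    wind[0] = 0.5
    for i in range(1, n_hours):
        # Each value keeps 90% of the previous one, so the series changes slowly
        wind[i] = 0.9 * wind[i - 1] + 0.1 * shocks[i]
    return np.clip(wind, 0.1, 0.9)


# The same seed always yields the same profile
assert np.array_equal(make_wind_profile(48, seed=7), make_wind_profile(48, seed=7))
```

Storing the seed next to the dataset (for example in its `metadata` entry) would let anyone regenerate the same file.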

In [None]:
# Visualize one week of data
week_hours = 168  # 7 days

fig, axes = plt.subplots(2, 2, figsize=(12, 8))

axes[0, 0].plot(load_profile[:week_hours], 'b-')
axes[0, 0].set_title('Load Profile')
axes[0, 0].set_ylabel('Load (p.u.)')

axes[0, 1].plot(solar_profile[:week_hours], 'orange')
axes[0, 1].set_title('Solar Generation')
axes[0, 1].set_ylabel('Solar (p.u.)')

axes[1, 0].plot(wind_profile[:week_hours], 'g-')
axes[1, 0].set_title('Wind Generation')
axes[1, 0].set_ylabel('Wind (p.u.)')
axes[1, 0].set_xlabel('Hour')

axes[1, 1].plot(price_profile[:week_hours], 'r-')
axes[1, 1].set_title('Electricity Price')
axes[1, 1].set_ylabel('Price ($/MWh)')
axes[1, 1].set_xlabel('Hour')

plt.tight_layout()
plt.show()

In [None]:
# Split into train/test and save as pickle
# Dataset is organized by area for multi-region support
train_hours = int(0.8 * hours_per_year)  # 80% for training

dataset = {
    'train': {
        # Load profiles by area
        'load': {
            'BANC': load_profile[:train_hours],
            'AVA': load_profile[:train_hours] * 0.9,  # Slightly different
        },
        # Renewable profiles by region
        'solar': {
            'NP15': solar_profile[:train_hours],
        },
        'wind': {
            'NP15': wind_profile[:train_hours],
        },
        # Price by node
        'price': {
            '0096WD_7_N001': price_profile[:train_hours],
        },
    },
    'test': {
        'load': {
            'BANC': load_profile[train_hours:],
            'AVA': load_profile[train_hours:] * 0.9,
        },
        'solar': {
            'NP15': solar_profile[train_hours:],
        },
        'wind': {
            'NP15': wind_profile[train_hours:],
        },
        'price': {
            '0096WD_7_N001': price_profile[train_hours:],
        },
    },
    'metadata': {
        'resolution': 'hourly',
        'units': {
            'load': 'p.u.',
            'solar': 'p.u. of rated',
            'wind': 'p.u. of rated',
            'price': '$/MWh',
        },
    },
}

# Save to pickle
data_path = setup_dir / 'data.pkl'
with open(data_path, 'wb') as f:
    pickle.dump(dataset, f)

print(f"Saved dataset to: {data_path}")
print(f"Train samples: {len(dataset['train']['load']['BANC'])}")
print(f"Test samples: {len(dataset['test']['load']['BANC'])}")
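All profiles in a split are indexed by the same timestep, so a length check before saving can catch misaligned series. The sketch below assumes the nested `kind -> key -> series` layout used above; `series_lengths` is a hypothetical helper.

```python
from typing import Dict, Sequence, Set


def series_lengths(split: Dict[str, Dict[str, Sequence[float]]]) -> Set[int]:
    """Collect the distinct lengths of every series in one split."""
    return {len(series) for by_key in split.values() for series in by_key.values()}


# Toy split with three aligned series
example_split = {
    'load': {'BANC': [0.5, 0.6, 0.7]},
    'solar': {'NP15': [0.0, 0.2, 0.4]},
    'price': {'0096WD_7_N001': [30.0, 32.0, 31.0]},
}
lengths = series_lengths(example_split)
assert len(lengths) == 1, f"Misaligned series lengths: {lengths}"
```

The same call works on `dataset['train']` and `dataset['test']`, since NumPy arrays also support `len`.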

## Step 4: Loading Setups

Create a loader to manage setups. This is similar to `powergrid.utils.loader.load_dataset()`.

In [None]:
def load_setup(setup_path: str) -> dict:
    """Load a HERON setup from directory.
    
    Args:
        setup_path: Path to setup directory
        
    Returns:
        Configuration dict with resolved dataset
    """
    setup_dir = Path(setup_path)
    
    # Load YAML config
    config_path = setup_dir / 'config.yml'
    with open(config_path) as f:
        config = yaml.safe_load(f)
    
    # Resolve and load dataset
    # Note: pickle.load can execute arbitrary code, so only load data files you trust
    if 'dataset_path' in config:
        data_path = setup_dir / config['dataset_path']
        with open(data_path, 'rb') as f:
            config['dataset'] = pickle.load(f)
        config['dataset_path'] = str(data_path)
    
    return config


# Test loading
loaded_config = load_setup('sample_setup')

print("Loaded configuration:")
print(f"  max_episode_steps: {loaded_config['max_episode_steps']}")
print(f"  share_reward: {loaded_config['share_reward']}")
print(f"  microgrids: {[mg['name'] for mg in loaded_config['microgrid_configs']]}")
print(f"  dataset keys: {list(loaded_config['dataset'].keys())}")
print(f"  train load areas: {list(loaded_config['dataset']['train']['load'].keys())}")
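The configuration carries a `train` flag, and the dataset has `train` and `test` splits. How HERON connects the two is not shown in this tutorial; one plausible reading, sketched below with a hypothetical `select_split` helper, is to choose the split from the flag.

```python
def select_split(config: dict) -> dict:
    """Return the dataset split matching the config's `train` flag."""
    split_name = 'train' if config.get('train', True) else 'test'
    return config['dataset'][split_name]


# Toy configuration with the same nesting as the loaded setup
toy_config = {
    'train': False,
    'dataset': {
        'train': {'load': {'BANC': [0.5, 0.6]}},
        'test': {'load': {'BANC': [0.7]}},
    },
}
split = select_split(toy_config)
print(split['load']['BANC'])
```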

## Step 5: Building Agents from Configuration

Use the configuration to build the agent hierarchy (bottom-up pattern from Tutorial 02).

In [None]:
from typing import Dict, Any

def build_agents_from_config(config: dict) -> Dict[str, Any]:
    """Build agent hierarchy from configuration.
    
    This demonstrates the bottom-up building pattern:
    1. Create DeviceAgents (FieldAgent level)
    2. Create PowerGridAgents (CoordinatorAgent level) with device subordinates
    3. Create GridSystemAgent (SystemAgent level) with microgrid subordinates
    
    Args:
        config: Loaded configuration dict
        
    Returns:
        Dict with 'system_agent' and 'microgrids' keys
    """
    # In a real implementation, you would import from powergrid:
    # from powergrid.agents import Generator, ESS, Transformer, PowerGridAgent
    # from powergrid.agents.grid_system_agent import GridSystemAgent
    
    microgrids = {}
    
    for mg_config in config.get('microgrid_configs', []):
        mg_name = mg_config['name']
        
        # Step 1: Build device agents (FieldAgent level)
        devices = {}
        for dev_cfg in mg_config.get('devices', []):
            device_type = dev_cfg['type']
            device_name = dev_cfg['name']
            device_config = dev_cfg.get('config', {})
            
            # In real code: device = DeviceClass(agent_id=device_name, **device_config)
            devices[device_name] = {
                'type': device_type,
                'config': device_config,
            }
            print(f"  Created {device_type}: {device_name}")
        
        # Step 2: Build coordinator with device subordinates
        # In real code: microgrid = PowerGridAgent(agent_id=mg_name, subordinates=devices)
        microgrids[mg_name] = {
            'name': mg_name,
            'devices': devices,
            'connection_bus': mg_config.get('connection_bus'),
        }
        print(f"Created PowerGridAgent: {mg_name} with {len(devices)} devices")
    
    # Step 3: Build system agent with microgrid subordinates
    # In real code: system = GridSystemAgent(agent_id='system', subordinates=microgrids)
    system_agent = {
        'type': 'GridSystemAgent',
        'subordinates': microgrids,
    }
    print(f"Created GridSystemAgent with {len(microgrids)} microgrids")
    
    return {
        'system_agent': system_agent,
        'microgrids': microgrids,
    }


# Build from loaded config
print("Building agent hierarchy from config:")
print("=" * 50)
agents = build_agents_from_config(loaded_config)
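The placeholder dicts above stand in for real agent classes. One common way to map a `type` string to a class is a registry dict. The sketch below uses two stand-in dataclasses, `GeneratorStub` and `ESSStub`; they are hypothetical and do not reflect the actual `powergrid.agents` constructors.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class GeneratorStub:
    """Stand-in for a generator device agent (hypothetical)."""
    agent_id: str
    params: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ESSStub:
    """Stand-in for an energy storage device agent (hypothetical)."""
    agent_id: str
    params: Dict[str, Any] = field(default_factory=dict)


# Maps the 'type' string from config.yml to a class
DEVICE_REGISTRY = {'Generator': GeneratorStub, 'ESS': ESSStub}


def build_device(dev_cfg: dict):
    """Instantiate the class registered for dev_cfg['type']."""
    device_type = dev_cfg['type']
    try:
        cls = DEVICE_REGISTRY[device_type]
    except KeyError:
        raise ValueError(f"Unknown device type: {device_type!r}") from None
    return cls(agent_id=dev_cfg['name'], params=dict(dev_cfg.get('config', {})))


device = build_device({'type': 'ESS', 'name': 'ess1', 'config': {'e_capacity_MWh': 5.0}})
print(device)
```

With a registry, supporting a new device type means adding one dict entry, and a typo in the YAML fails with a clear error instead of silently creating nothing.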

## Step 6: Using Dataset in Environment Step

Time-series data drives the simulation. The environment reads profiles and pushes them to agents via the ProxyAgent's global state.

In [None]:
class DatasetManager:
    """Manages time-series dataset access during simulation.
    
    This is similar to how HierarchicalMicrogridEnv._update_profiles() works.
    """
    
    def __init__(self, dataset: dict, load_area: str = 'BANC', renew_area: str = 'NP15',
                 price_node: str = '0096WD_7_N001'):
        self.dataset = dataset
        self.load_area = load_area
        self.renew_area = renew_area
        self.price_node = price_node
        self.t = 0

        # Get max timesteps from price data; fail early so step() never divides by zero
        self.max_t = len(dataset.get('price', {}).get(price_node, []))
        if self.max_t == 0:
            raise ValueError(f"No price data found for node '{price_node}'")

    def reset(self, start_timestep: int = 0):
        """Reset to specific timestep (for random day selection)."""
        self.t = start_timestep

    def step(self) -> dict:
        """Get current timestep data and advance."""
        hour = self.t % self.max_t  # Index into the dataset; wraps at the end

        data = {
            'load': float(self.dataset['load'][self.load_area][hour]),
            'solar': float(self.dataset['solar'][self.renew_area][hour]),
            'wind': float(self.dataset['wind'][self.renew_area][hour]),
            'price': float(self.dataset['price'][self.price_node][hour]),
            'timestep': self.t,
            'hour': hour,
        }
        self.t += 1
        return data

    def peek(self, horizon: int = 24) -> dict:
        """Get forecast for up to `horizon` hours ahead (truncated at the end of the data)."""
        start = self.t % self.max_t  # Same wrapping as step()
        end = min(start + horizon, self.max_t)
        return {
            'load': self.dataset['load'][self.load_area][start:end],
            'solar': self.dataset['solar'][self.renew_area][start:end],
            'wind': self.dataset['wind'][self.renew_area][start:end],
            'price': self.dataset['price'][self.price_node][start:end],
        }


# Demo usage with train data
dm = DatasetManager(loaded_config['dataset']['train'])

print("Simulating 5 timesteps:")
for step in range(5):
    data = dm.step()
    print(f"  t={step}: load={data['load']:.3f}, solar={data['solar']:.3f}, "
          f"wind={data['wind']:.3f}, price=${data['price']:.2f}/MWh")
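The `reset` docstring mentions random day selection. As a minimal sketch of how a start index could be drawn so that each episode begins at midnight and fits inside the data, assuming hourly resolution: `random_day_start` is a hypothetical helper, and the episode length of 96 mirrors `max_episode_steps` in the sample config.

```python
import numpy as np


def random_day_start(n_hours: int, episode_steps: int, rng: np.random.Generator) -> int:
    """Pick a midnight-aligned start index such that the episode fits in the data."""
    last_start = n_hours - episode_steps
    if last_start < 0:
        raise ValueError("Episode is longer than the available data")
    n_days = last_start // 24 + 1  # Number of valid starting days
    return int(rng.integers(n_days)) * 24


rng = np.random.default_rng(0)
start = random_day_start(n_hours=7008, episode_steps=96, rng=rng)
print(f"Episode starts at hour {start} (day {start // 24})")
```

Passing the result to `dm.reset(start)` would begin the episode at hour 0 of the chosen day.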

## Key Takeaways

1. **Setup Structure**
   ```
   setup_name/
   ├── config.yml    # YAML configuration
   └── data.pkl      # Pickle dataset
   ```

2. **Configuration File**
   - Environment params: `max_episode_steps`, `penalty`, `share_reward`
   - Microgrid configs: devices, connections, parameters
   - Dataset reference: `dataset_path`

3. **Dataset Format** (area-based for multi-region support)
   ```python
   {
       'train': {
           'load': {'BANC': [...], 'AVA': [...]},
           'solar': {'NP15': [...]},
           'wind': {'NP15': [...]},
           'price': {'0096WD_7_N001': [...]},
       },
       'test': {...},
       'metadata': {'resolution': 'hourly', 'units': {...}},
   }
   ```

4. **Loading and Building Pattern**
   ```python
   config = load_setup('my_setup')
   agents = build_agents_from_config(config)
   env = HierarchicalMicrogridEnv(
       system_agent=agents['system_agent'],
       dataset_path=config['dataset_path'],
   )
   ```

---

**Next:** [07_custom_protocols.ipynb](07_custom_protocols.ipynb) — Create custom coordination protocols

In [None]:
# Cleanup
import shutil
shutil.rmtree('sample_setup')
print("Cleaned up sample_setup directory")