# Tutorial 7: Configuration & Datasets

**Goal:** Configure environments with YAML files and time-series datasets.

**Time:** ~10 minutes

---

## Why Configuration Matters

Production environments need:
- **Reproducibility**: Same config → same environment
- **Flexibility**: Change parameters without code changes
- **Time-series data**: Load profiles, prices, renewable generation

HERON uses:
- **YAML files** for environment configuration
- **Pickle files** for time-series datasets

## Step 1: Understanding the Setup Structure

A HERON setup is a directory with configuration and data:

```
powergrid/setups/ieee34_ieee13/
├── config.yml      # Environment configuration
└── data.pkl        # Time-series data (load, solar, wind, prices)
```
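
Before loading a setup, it can be useful to confirm that the directory follows this layout. The sketch below is a minimal check using only the standard library; `missing_setup_files` and `REQUIRED_FILES` are hypothetical names introduced for illustration and are not part of HERON.

```python
import tempfile
from pathlib import Path

# Hypothetical helper (not a HERON API): file names follow the layout above.
REQUIRED_FILES = ("config.yml", "data.pkl")


def missing_setup_files(setup_path):
    """Return the required file names that are absent from a setup directory."""
    setup_dir = Path(setup_path)
    return [name for name in REQUIRED_FILES if not (setup_dir / name).is_file()]


# Usage: an empty directory is missing both files; adding one leaves the other.
with tempfile.TemporaryDirectory() as tmp:
    missing_before = missing_setup_files(tmp)
    (Path(tmp) / "config.yml").write_text("train: true\n")
    missing_after = missing_setup_files(tmp)

print(missing_before)  # ['config.yml', 'data.pkl']
print(missing_after)   # ['data.pkl']
```

Returning the list of missing names, rather than a boolean, makes the resulting error message easier to act on.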

In [None]:
import os
import yaml
import pickle
import numpy as np
from pathlib import Path

# Create a sample setup directory
setup_dir = Path('sample_setup')
setup_dir.mkdir(exist_ok=True)

print(f"Created setup directory: {setup_dir}")

## Step 2: Creating a Configuration File

The `config.yml` defines:
- Environment parameters (episode length, reward structure)
- Microgrid configurations (devices, connections)
- Dataset paths
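
Because the file is free-form YAML, a small validation pass can catch typos before an environment is built. The following is a sketch only: `find_config_problems` is a hypothetical helper, the required keys are taken from this tutorial's example, and the accepted device types are limited to the two that appear here (HERON itself may accept others).

```python
# Hypothetical validator (not a HERON API); key names mirror this tutorial's config.
REQUIRED_TOP_LEVEL = ("dataset_path", "max_episode_steps", "microgrid_configs")
KNOWN_DEVICE_TYPES = ("Generator", "ESS")  # assumption: only types used in this tutorial


def find_config_problems(config):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = [f"missing key: {key}" for key in REQUIRED_TOP_LEVEL if key not in config]
    for i, mg in enumerate(config.get("microgrid_configs", [])):
        if "name" not in mg:
            problems.append(f"microgrid #{i} has no name")
        for dev in mg.get("devices", []):
            if dev.get("type") not in KNOWN_DEVICE_TYPES:
                problems.append(
                    f"{mg.get('name', i)}: unknown device type {dev.get('type')!r}"
                )
    return problems


good = {
    "dataset_path": "data.pkl",
    "max_episode_steps": 96,
    "microgrid_configs": [{"name": "MG1", "devices": [{"type": "ESS", "name": "ess1"}]}],
}
bad = {"microgrid_configs": [{"name": "MG1", "devices": [{"type": "Turbine"}]}]}

print(find_config_problems(good))  # []
for problem in find_config_problems(bad):
    print(problem)
```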

In [None]:
# Define a sample configuration
config = {
    # Dataset reference
    'dataset_path': 'data.pkl',
    
    # Environment parameters
    'train': True,
    'max_episode_steps': 96,  # 4 days at hourly resolution
    'penalty': 10.0,          # Safety violation penalty
    'share_reward': True,     # CTDE: shared rewards
    
    # DSO (Distribution System Operator) configuration
    'dso_config': {
        'name': 'DSO',
        'network': 'ieee34',
        'load_area': 'BANC',
    },
    
    # Microgrid configurations
    'microgrid_configs': [
        {
            'name': 'MG1',
            'connection_bus': 'DSO Bus 850',
            'base_power': 1.0,
            'load_scale': 0.1,
            'devices': [
                {
                    'type': 'Generator',
                    'name': 'gen1',
                    'device_state_config': {
                        'bus': 'Bus 633',
                        'p_max_MW': 2.0,
                        'p_min_MW': 0.5,
                        'cost_curve_coefs': [0.02, 10.0, 0.0],  # a*P^2 + b*P + c
                    },
                },
                {
                    'type': 'ESS',
                    'name': 'ess1',
                    'device_state_config': {
                        'bus': 'Bus 634',
                        'e_capacity_MWh': 5.0,
                        'p_max_MW': 1.0,
                        'p_min_MW': -1.0,  # Negative = charging
                        'init_soc': 0.5,
                        'soc_min': 0.1,
                        'soc_max': 0.9,
                    },
                },
            ],
        },
        {
            'name': 'MG2',
            'connection_bus': 'DSO Bus 860',
            'devices': [
                {
                    'type': 'Generator',
                    'name': 'gen2',
                    'device_state_config': {
                        'bus': 'Bus 633',
                        'p_max_MW': 1.5,
                        'p_min_MW': 0.3,
                    },
                },
            ],
        },
        {
            'name': 'MG3',
            'connection_bus': 'DSO Bus 890',
            'devices': [
                {
                    'type': 'ESS',
                    'name': 'ess3',
                    'device_state_config': {
                        'bus': 'Bus 680',
                        'e_capacity_MWh': 3.0,
                    },
                },
            ],
        },
    ],
}

# Save to YAML
config_path = setup_dir / 'config.yml'
with open(config_path, 'w') as f:
    yaml.dump(config, f, default_flow_style=False, sort_keys=False)

print(f"Saved configuration to: {config_path}")
print("\nConfiguration preview:")
# Pass sort_keys=False so the preview matches the key order written to disk
print(yaml.dump(config, default_flow_style=False, sort_keys=False)[:500] + "...")
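
The comment next to `cost_curve_coefs` describes a quadratic cost curve, `a*P^2 + b*P + c`. As a worked example, the sketch below evaluates that polynomial for `gen1` at its minimum, a mid-range, and its maximum output. `generation_cost` is a hypothetical helper, and the cost units are not stated in the configuration, so none are assumed here.

```python
def generation_cost(p_mw, coefs):
    """Evaluate a*P^2 + b*P + c for coefficients given as [a, b, c]."""
    a, b, c = coefs
    return a * p_mw**2 + b * p_mw + c


coefs = [0.02, 10.0, 0.0]  # gen1's cost_curve_coefs from the configuration
for p in (0.5, 1.0, 2.0):  # p_min_MW, a mid-range value, p_max_MW
    print(f"P = {p} MW -> cost = {generation_cost(p, coefs):.3f}")
```

With these coefficients the linear term dominates: at 2.0 MW the quadratic term contributes 0.08 out of a total of 20.08.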

## Step 3: Creating a Time-Series Dataset

Datasets contain time-series profiles for:
- **Load**: Demand profiles (typically daily/seasonal patterns)
- **Solar**: PV generation (follows sun)
- **Wind**: Wind generation (more variable)
- **Price**: Electricity market prices
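
A daily demand pattern is often approximated with a sinusoid. The sketch below, using only the standard library, shows how the phase shift controls the hour at which the curve peaks. `daily_load_factor` is a hypothetical helper; its default base and amplitude (0.6 and 0.3) match the values used for the load profile in this tutorial.

```python
import math


def daily_load_factor(hour, base=0.6, amplitude=0.3, peak_hour=12):
    """Sinusoidal daily pattern that reaches base + amplitude at peak_hour."""
    # A sine peaks a quarter period (6 h of a 24 h cycle) after its zero crossing.
    return base + amplitude * math.sin(2 * math.pi * (hour - (peak_hour - 6)) / 24)


profile = [daily_load_factor(h) for h in range(24)]
peak = max(range(24), key=lambda h: profile[h])
trough = min(range(24), key=lambda h: profile[h])

print(peak, round(profile[peak], 3))      # 12 0.9
print(trough, round(profile[trough], 3))  # 0 0.3
```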

In [None]:
import matplotlib.pyplot as plt

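# Create synthetic time-series data with realistic daily and seasonal shapes
np.random.seed(0)  # Fixed seed so the generated profiles are reproducible
hours_per_year = 8760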
hours = np.arange(hours_per_year)

# Daily and seasonal patterns
hour_of_day = hours % 24
day_of_year = hours // 24

# Load profile: daily pattern + seasonal variation + noise
daily_load = 0.6 + 0.3 * np.sin(2 * np.pi * (hour_of_day - 6) / 24)  # Peak at noon
seasonal_load = 1.0 + 0.2 * np.sin(2 * np.pi * (day_of_year - 81) / 365)  # Higher in summer (peaks near day 172)
load_profile = daily_load * seasonal_load + 0.05 * np.random.randn(hours_per_year)
load_profile = np.clip(load_profile, 0.3, 1.5)

# Solar profile: daylight hours only
solar_profile = np.maximum(0, np.sin(np.pi * (hour_of_day - 6) / 12))  # 6am to 6pm
solar_profile *= (hour_of_day >= 6) & (hour_of_day <= 18)
solar_profile *= 0.8 + 0.2 * np.random.rand(hours_per_year)  # Cloud variability

# Wind profile: first-order autoregressive process, so each hour is correlated with the previous one
wind_profile = np.zeros(hours_per_year)
wind_profile[0] = 0.5
for i in range(1, hours_per_year):
    wind_profile[i] = 0.9 * wind_profile[i-1] + 0.1 * np.random.rand()
wind_profile = np.clip(wind_profile, 0.1, 0.9)

# Price profile: correlates with load, higher during peak
base_price = 30  # $/MWh
price_profile = base_price + 20 * load_profile + 5 * np.random.randn(hours_per_year)
price_profile = np.clip(price_profile, 15, 100)

print(f"Created {hours_per_year} hours of data")

In [None]:
# Visualize one week of data
week_hours = 168  # 7 days

fig, axes = plt.subplots(2, 2, figsize=(12, 8))

axes[0, 0].plot(load_profile[:week_hours], 'b-')
axes[0, 0].set_title('Load Profile')
axes[0, 0].set_ylabel('Load (p.u.)')

axes[0, 1].plot(solar_profile[:week_hours], 'orange')
axes[0, 1].set_title('Solar Generation')
axes[0, 1].set_ylabel('Solar (p.u.)')

axes[1, 0].plot(wind_profile[:week_hours], 'g-')
axes[1, 0].set_title('Wind Generation')
axes[1, 0].set_ylabel('Wind (p.u.)')
axes[1, 0].set_xlabel('Hour')

axes[1, 1].plot(price_profile[:week_hours], 'r-')
axes[1, 1].set_title('Electricity Price')
axes[1, 1].set_ylabel('Price ($/MWh)')
axes[1, 1].set_xlabel('Hour')

plt.tight_layout()
plt.show()

In [None]:
# Split into train/test and save as pickle
train_hours = int(0.8 * hours_per_year)  # 80% for training

dataset = {
    'train': {
        'load': load_profile[:train_hours],
        'solar': solar_profile[:train_hours],
        'wind': wind_profile[:train_hours],
        'price': price_profile[:train_hours],
    },
    'test': {
        'load': load_profile[train_hours:],
        'solar': solar_profile[train_hours:],
        'wind': wind_profile[train_hours:],
        'price': price_profile[train_hours:],
    },
    'metadata': {
        'resolution': 'hourly',
        'units': {
            'load': 'p.u.',
            'solar': 'p.u. of rated',
            'wind': 'p.u. of rated',
            'price': '$/MWh',
        },
    },
}

# Save to pickle
data_path = setup_dir / 'data.pkl'
with open(data_path, 'wb') as f:
    pickle.dump(dataset, f)

print(f"Saved dataset to: {data_path}")
print(f"Train samples: {len(dataset['train']['load'])}")
print(f"Test samples: {len(dataset['test']['load'])}")
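
The split above is chronological: the first 80% of the year is used for training and the remainder for testing, with no shuffling, so the test period lies entirely after the training period. A minimal sketch of the same idea on a plain list follows; `chronological_split` is a hypothetical helper.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a sequence into an earlier training part and a later test part."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(train_fraction * len(series))
    return series[:cut], series[cut:]


hours = list(range(10))  # stand-in for an hourly profile
train, test = chronological_split(hours)

print(train)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(test)   # [8, 9]
```

For the 8,760 hourly samples generated in this tutorial, an 80% split gives 7,008 training samples and 1,752 test samples.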

## Step 4: Loading Setups

Define a loader that reads `config.yml` and attaches the dataset it references:

In [None]:
def load_setup(setup_path: str) -> dict:
    """Load a HERON setup from directory.
    
    Args:
        setup_path: Path to setup directory
        
    Returns:
        Configuration dict with resolved dataset
    """
    setup_dir = Path(setup_path)
    
    # Load YAML config
    config_path = setup_dir / 'config.yml'
    with open(config_path) as f:
        config = yaml.safe_load(f)
    
    # Resolve and load dataset
    if 'dataset_path' in config:
        data_path = setup_dir / config['dataset_path']
        # pickle can execute arbitrary code on load; only open trusted files
        with open(data_path, 'rb') as f:
            config['dataset'] = pickle.load(f)
        config['dataset_path'] = str(data_path)
    
    return config


# Test loading
loaded_config = load_setup('sample_setup')

print("Loaded configuration:")
print(f"  max_episode_steps: {loaded_config['max_episode_steps']}")
print(f"  share_reward: {loaded_config['share_reward']}")
print(f"  microgrids: {[mg['name'] for mg in loaded_config['microgrid_configs']]}")
print(f"  dataset keys: {list(loaded_config['dataset'].keys())}")
print(f"  train samples: {len(loaded_config['dataset']['train']['load'])}")
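
After loading, it can help to sanity-check the dataset before handing it to an environment. The sketch below assumes the dataset layout used in this tutorial (`train` and `test` splits, each with `load`, `solar`, `wind`, and `price`); `check_dataset` is a hypothetical helper, not a HERON function.

```python
SERIES_KEYS = ("load", "solar", "wind", "price")


def check_dataset(dataset, splits=("train", "test")):
    """Raise ValueError if a split or series is missing, or if lengths disagree."""
    for split in splits:
        if split not in dataset:
            raise ValueError(f"missing split: {split}")
        missing = [key for key in SERIES_KEYS if key not in dataset[split]]
        if missing:
            raise ValueError(f"{split} is missing series: {missing}")
        lengths = {key: len(dataset[split][key]) for key in SERIES_KEYS}
        if len(set(lengths.values())) != 1:
            raise ValueError(f"{split} series lengths differ: {lengths}")
    return True


toy = {
    "train": {"load": [0.5, 0.6], "solar": [0.0, 0.1], "wind": [0.4, 0.5], "price": [40.0, 42.0]},
    "test": {"load": [0.7], "solar": [0.2], "wind": [0.5], "price": [45.0]},
}
broken = {
    "train": {"load": [0.5, 0.6], "solar": [0.0], "wind": [0.4, 0.5], "price": [40.0, 42.0]},
    "test": toy["test"],
}

print(check_dataset(toy))  # True
try:
    check_dataset(broken)
except ValueError as err:
    print(f"rejected: {err}")
```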

## Step 5: Using Configuration in Environment

The loaded configuration can then be unpacked into the parameters an environment needs:

In [None]:
# Example: Using config in environment creation
def create_env_from_config(config: dict):
    """Create environment from loaded configuration."""
    
    # Extract parameters
    max_steps = config.get('max_episode_steps', 96)
    share_reward = config.get('share_reward', True)
    penalty = config.get('penalty', 10.0)
    
    # Get dataset for current mode (train/test)
    mode = 'train' if config.get('train', True) else 'test'
    dataset = config.get('dataset', {}).get(mode, {})
    
    # Parse microgrid configs
    microgrids = []
    for mg_config in config.get('microgrid_configs', []):
        mg = {
            'name': mg_config['name'],
            'devices': mg_config.get('devices', []),
            'connection_bus': mg_config.get('connection_bus'),
        }
        microgrids.append(mg)
    
    print("Environment config:")
    print(f"  max_steps: {max_steps}")
    print(f"  share_reward: {share_reward}")
    print(f"  mode: {mode}")
    print(f"  microgrids: {len(microgrids)}")
    for mg in microgrids:
        print(f"    - {mg['name']}: {len(mg['devices'])} devices")
    
    return {
        'max_steps': max_steps,
        'share_reward': share_reward,
        'penalty': penalty,
        'microgrids': microgrids,
        'dataset': dataset,
    }


env_config = create_env_from_config(loaded_config)
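
Since the `train` flag selects which split is used, the same setup can be reused for evaluation by overriding that one key rather than editing the file. The sketch below illustrates this under the tutorial's selection rule; `with_overrides` and `active_split` are hypothetical helpers.

```python
import copy


def with_overrides(config, **overrides):
    """Return a deep copy of config with the given top-level keys replaced."""
    updated = copy.deepcopy(config)
    updated.update(overrides)
    return updated


def active_split(config):
    """Mirror the tutorial's rule: use 'train' unless config['train'] is falsy."""
    mode = "train" if config.get("train", True) else "test"
    return mode, config.get("dataset", {}).get(mode, {})


base = {
    "train": True,
    "dataset": {"train": {"load": [0.5, 0.6]}, "test": {"load": [0.7]}},
}
eval_config = with_overrides(base, train=False)

print(active_split(base))         # ('train', {'load': [0.5, 0.6]})
print(active_split(eval_config))  # ('test', {'load': [0.7]})
```

Copying before updating keeps the original configuration untouched, so training and evaluation runs can share one loaded setup.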

## Step 6: Using Dataset in Environment Step

At each environment step, the simulation reads the current load, solar, wind, and price values from the dataset:

In [None]:
class DatasetManager:
    """Manages time-series dataset access during simulation."""
    
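    def __init__(self, dataset: dict):
        self.dataset = dataset
        self.t = 0
        self.max_t = len(dataset.get('load', []))
        if self.max_t == 0:
            # step() indexes with `t % max_t`, which fails on an empty dataset
            raise ValueError("dataset must contain a non-empty 'load' series")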
    
    def reset(self):
        """Reset to start of dataset."""
        self.t = 0
    
    def step(self) -> dict:
        """Get current timestep data and advance."""
        data = {
            'load': self.dataset['load'][self.t % self.max_t],
            'solar': self.dataset['solar'][self.t % self.max_t],
            'wind': self.dataset['wind'][self.t % self.max_t],
            'price': self.dataset['price'][self.t % self.max_t],
        }
        self.t += 1
        return data
    
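    def peek(self, horizon: int = 24) -> dict:
        """Get up to `horizon` upcoming steps without advancing.

        The window is truncated at the end of the dataset; it does not wrap.
        """
        # Use the same wrapped position as step() so peek stays valid after t >= max_t
        start = self.t % self.max_t if self.max_t else 0
        end = min(start + horizon, self.max_t)
        return {
            'load': self.dataset['load'][start:end],
            'solar': self.dataset['solar'][start:end],
            'wind': self.dataset['wind'][start:end],
            'price': self.dataset['price'][start:end],
        }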


# Demo usage
dm = DatasetManager(env_config['dataset'])

print("Simulating 5 timesteps:")
for step in range(5):
    data = dm.step()
    print(f"  t={step}: load={data['load']:.3f}, solar={data['solar']:.3f}, "
          f"wind={data['wind']:.3f}, price=${data['price']:.2f}/MWh")
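
The `DatasetManager` above always starts at the beginning of the dataset and wraps around with a modulo when it reaches the end. One possible extension, sketched here as an assumption rather than HERON behavior, is to extract a fixed-length episode window from an arbitrary start index using the same wrap-around rule. `episode_window` is a hypothetical helper.

```python
def episode_window(series, start, length):
    """Return `length` consecutive values beginning at `start`, wrapping at the end."""
    n = len(series)
    if n == 0:
        raise ValueError("series must not be empty")
    return [series[(start + i) % n] for i in range(length)]


load = [0.4, 0.5, 0.6, 0.7, 0.8]
window = episode_window(load, start=3, length=4)

print(window)  # [0.7, 0.8, 0.4, 0.5]
```

Starting episodes at varied offsets exposes the agent to different parts of the year instead of replaying the first hours of the dataset every time.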

## Key Takeaways

1. **Setup Structure**
   ```
   setup_name/
   ├── config.yml    # YAML configuration
   └── data.pkl      # Pickle dataset
   ```

2. **Configuration File**
   - Environment params: `max_episode_steps`, `penalty`, `share_reward`
   - Microgrid configs: devices, connections, parameters
   - Dataset reference: `dataset_path`

3. **Dataset Format**
   ```python
   {
       'train': {'load': [...], 'solar': [...], 'wind': [...], 'price': [...]},
       'test': {'load': [...], ...},
       'metadata': {'resolution': 'hourly', 'units': {...}},
   }
   ```

4. **Loading Pattern**
   ```python
   config = load_setup('my_setup')
   env = MyEnvironment(config)
   ```

---

**Next:** [08_custom_protocols.ipynb](08_custom_protocols.ipynb) — Create custom coordination protocols

In [None]:
# Cleanup
import shutil
shutil.rmtree('sample_setup')
print("Cleaned up sample_setup directory")