

# Tutorial: Generating a Complete Synthetic Grid from a Pandapower Network

This notebook demonstrates how to use the Bayesian grid synthesis package to generate a complete, synthetic power distribution network. We will start with an existing `pandapower` test case and use its topology to generate stochastic parameters for:

1.  **Bus-Level Data:**
      * Phase Allocation (A, B, C, AB, ABC, etc.)
      * Three-Phase Power (kW)
      * Failure Frequency (CAIFI/FIC)
      * Failure Duration (CAIDI/DIC)
2.  **Line-Level Data:**
      * Resistance (R1)
      * Reactance (X1)

## Imports

First, let's import all the necessary libraries. This includes `pandapower` for network handling, `numpy`/`pandas` for data manipulation, and our custom Bayesian model classes.



In [8]:
import os
import pandapower as pp
import pandapower.networks as pn
import networkx as nx
import numpy as np
import pandas as pd
import warnings
from scipy.stats import mode

# Import our custom Bayesian model classes and the sample-saving helpers
from bayesgrid import (
    BayesianPowerModel,
    BayesianFrequencyModel,
    BayesianDurationModel,
    BayesianImpedanceModel,
    save_bus_metric_samples,
    save_power_phase_samples,
    save_impedance_samples,
)


# Suppress warnings for a cleaner tutorial
warnings.filterwarnings('ignore', category=UserWarning)
warnings.filterwarnings('ignore', category=FutureWarning)

print("Libraries imported successfully.")

Libraries imported successfully.


# Step 1: Set Up and Initialize the Bayesian Models

* Please be aware: the traces for the Bayesian models are loaded in the background when each model is created. This may take some time.

* The loaded traces represent prior knowledge learned from a public database. You can also build a Bayesian model from your own data; to learn how, see notebook tutorial 3, "Learning the BHM from new data".

* Optionally, you can set the total demand for the power model. This is the total active power (in kW) to be split across the buses.
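To make the idea of a total demand concrete, here is a minimal sketch of rescaling per-bus power so that it sums to a target value. The helper `rescale_to_total_demand` and the toy numbers are hypothetical and only illustrate the arithmetic; the package may implement its rescaling differently.

```python
import numpy as np

def rescale_to_total_demand(bus_power_kw, total_demand_kw):
    """Scale per-bus power so it sums to total_demand_kw (hypothetical helper)."""
    bus_power_kw = np.asarray(bus_power_kw, dtype=float)
    current_total = bus_power_kw.sum()
    if current_total <= 0:
        raise ValueError("Total power must be positive to rescale.")
    # Every bus is multiplied by the same factor, so relative shares are preserved
    return bus_power_kw * (total_demand_kw / current_total)

raw_power_kw = np.array([12.0, 30.0, 8.0, 50.0])  # toy per-bus draws (kW)
scaled_power_kw = rescale_to_total_demand(raw_power_kw, total_demand_kw=1e3)
print(scaled_power_kw, scaled_power_kw.sum())
```

A uniform scale factor keeps each bus's share of the total unchanged, which is why only the sum moves to the requested demand.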

In [2]:
# --- Set global generation parameters ---
RANDOM_SEED = 42
N_SAMPLES_TO_GENERATE = 100 # We will generate 100 synthetic network samples
OUTPUT_FOLDER = 'new_pp_synthetic_net' # Folder to save our results

# 1. Power & Phase Model
# total_demand is given in kW; pass total_demand=None to leave the output unscaled
bhm = BayesianPowerModel(total_demand=1e3) # 1e3 kW = 1 MW of total demand

# 2. Frequency Model
bfm = BayesianFrequencyModel()

# 3. Duration Model
bdm = BayesianDurationModel()

# 4. Impedance Model (R and X)
bim = BayesianImpedanceModel()

No trace_path provided. Fetching default pre-trained model...
Loading trace from C:\Users\hoc\AppData\Local\bayesgrid\bayesgrid\Cache\trace_power_and_phase.nc...
Successfully loaded pre-trained model.
Model was trained with 3599 buses and 10 zones.
No trace_path provided. Fetching default pre-trained model...
Loading trace from C:\Users\hoc\AppData\Local\bayesgrid\bayesgrid\Cache\trace_fic.nc...
Successfully loaded pre-trained model.
Model was trained with 9328 buses and 30 zones.
No trace_path provided. Fetching default pre-trained model...
Loading trace from C:\Users\hoc\AppData\Local\bayesgrid\bayesgrid\Cache\trace_dic.nc...
Successfully loaded pre-trained model.
Model was trained with 13547 total buses
(8299 positive) and 30 zones.
Fetching default R-model trace...
Fetching default X-model trace...
Loading R1 trace from C:\Users\hoc\AppData\Local\bayesgrid\bayesgrid\Cache\trace_r.nc...
R1 model loaded. Trained with 10353 lines and 10 zones.
Loading X1 trace from C:\Users\hoc\AppDat


# Step 2: Load Input Pandapower Network

Next, we load the `pandapower` network we want to use as a *topological template*. We'll use its graph structure to calculate the "hop distance" (topological distance) from the substation, which is the main input for our models.

We'll use `case118` for this example, but you can use any `pandapower` network.

In [3]:
# Load a standard pandapower test case
net = pn.case118()

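# Create a NetworkX graph from the pandapower net
# (create_nxgraph is documented in pandapower.topology, not pandapower.networks)
# This graph is essential for calculating distances
import pandapower.topology as top
graph_from_net = top.create_nxgraph(net, include_trafos=True)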

# Find the main source bus (usually the external grid connection)
source_bus = net.ext_grid.bus.iloc[0]

print(f"Loaded network 'case118' with {len(net.bus)} buses and {len(net.line)} lines.")
print(f"Source bus (ext_grid) identified at index: {source_bus}")

Loaded network 'case118' with 118 buses and 173 lines.
Source bus (ext_grid) identified at index: 68


# Step 3: Calculate Hop & Electrical Distances

This is the most important preprocessing step. We convert the network topology into the simple list of *zone indices* that our Bayesian models require.

  * **Hop Distance (Buses):** The number of lines/transformers between a bus and the source bus.
  * **Electrical Distance (Lines):** For lines, we'll use the hop distance of their `from_bus` as an approximation.
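As an illustration of both definitions, the sketch below computes hop distances with a plain breadth-first search on a toy edge list and then assigns each line the hop distance of its `from_bus`. The function `hop_distances` and the toy edges are hypothetical; the tutorial itself relies on `nx.shortest_path_length`, which gives the same counts on an unweighted graph.

```python
from collections import deque

def hop_distances(edges, source):
    """Breadth-first search hop counts from `source` over undirected edges."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in dist:  # first visit is the shortest path in BFS
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

toy_edges = [(0, 1), (1, 2), (1, 3), (3, 4)]  # (from_bus, to_bus) pairs
bus_hops = hop_distances(toy_edges, source=0)
# Line "distance" approximated by the hop distance of its from_bus
line_hops = [bus_hops[from_bus] for from_bus, _ in toy_edges]
print(bus_hops)
print(line_hops)
```

Buses that never appear in `bus_hops` are unreachable from the source, which corresponds to the `NaN` case handled in the cell below.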

We'll calculate two sets of bus zones:

1.  `N_BINS_RELIABILITY = 30` (for Frequency and Duration models)
2.  `N_BINS_POWER_IMPD = 10` (for Power and Impedance models)
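The following sketch shows what equal-width binning with `pd.cut` does to a small set of hop distances. The toy values are made up for illustration. Note that the bins span the observed range, so a shallow network occupies only a few of the 30 reliability zones.

```python
import pandas as pd

# Toy hop distances indexed by bus id (illustrative values only)
toy_hops = pd.Series([0, 1, 2, 3, 5, 8], index=[10, 11, 12, 13, 14, 15])

zones_10 = pd.cut(toy_hops, bins=10, labels=False, include_lowest=True)
zones_30 = pd.cut(toy_hops, bins=30, labels=False, include_lowest=True)

print(zones_10.tolist())   # [0, 1, 2, 3, 6, 9]
print(zones_30.tolist())   # [0, 3, 7, 11, 18, 29]
print(zones_30.nunique())  # 6 of the 30 possible zones are used
```

With `labels=False`, `pd.cut` returns the integer bin index directly, which is the form the zone-index arguments in this tutorial expect.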

In [4]:
print("Calculating hop distances for all buses...")

# --- 1. Calculate Hop Distance for all buses ---
hop_distances_new_net = {}
for bus_idx in net.bus.index:
    try:
        # Calculate shortest path length (hop distance)
        dist = nx.shortest_path_length(graph_from_net, source=source_bus, target=bus_idx)
        hop_distances_new_net[bus_idx] = dist
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        # Handle buses that are not connected to the source
        hop_distances_new_net[bus_idx] = np.nan

# Convert to a Series for easier handling
hop_series = pd.Series(hop_distances_new_net, index=net.bus.index)

# For any disconnected buses (NaN), we'll assume they are "far away"
# by filling them with the maximum observed distance.
max_dist = hop_series.max()
hop_series = hop_series.fillna(max_dist)
print(f"Max hop distance found: {max_dist}")

# --- 2. Discretize Bus Hop Distances for RELIABILITY (30 Bins) ---
N_BINS_RELIABILITY = 30
hop_zone_idx_reliability = pd.cut(
    hop_series,
    bins=N_BINS_RELIABILITY,
    labels=False,
    include_lowest=True
).values
print(f"Created {len(hop_zone_idx_reliability)} bus zone indices for reliability (30 bins).")

# --- 3. Discretize Bus Hop Distances for POWER (10 Bins) ---
N_BINS_POWER_IMPD = 10
hop_zone_series_power = pd.cut(
    hop_series,
    bins=N_BINS_POWER_IMPD,
    labels=False,
    include_lowest=True
)
hop_zone_idx_power = hop_zone_series_power.values
print(f"Created {len(hop_zone_idx_power)} bus zone indices for power (10 bins).")

# --- 4. Discretize LINE Electrical Distances (10 Bins) ---
# We use the zone of the 'from_bus' for each line.
from_buses = net.line.from_bus
line_elec_dist_idx = hop_zone_series_power.loc[from_buses].values
print(f"Created {len(line_elec_dist_idx)} line zone indices for impedance (10 bins).")

Calculating hop distances for all buses...
Max hop distance found: 8
Created 118 bus zone indices for reliability (30 bins).
Created 118 bus zone indices for power (10 bins).
Created 173 line zone indices for impedance (10 bins).


# Step 4: Generate All Synthetic Data

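* Now for the exciting part! We will feed our pre-processed zone indices into each model's generation method: `.generate_consistent_data()` for the power and phase model, and `.generate_data()` for the other three.
* **Note:** This step may take a couple of minutes. Once complete, every model will have produced at least `N_SAMPLES_TO_GENERATE` samples; the surplus is trimmed so that exactly `N_SAMPLES_TO_GENERATE` synthetic networks remain.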
* Since each Bayesian model is independent, you can easily customize this procedure later. For example, you could generate new synthetic power data while keeping the original R and X parameters constant.

In [5]:
print("--- Starting Synthetic Data Generation ---")

# --- 1. Generate Power & Phase ---
# scan_draws is the number of draws per chain, so with the 4 chains used in
# this run the model returns 4 * N_SAMPLES_TO_GENERATE samples (see the output
# shape below); they are truncated in Step 5
print("\nGenerating Power and Phase...")
gen_phases, gen_power = bhm.generate_consistent_data(
    new_hop_zone_idx=hop_zone_idx_power,
    graph=graph_from_net,
    source_bus_idx=source_bus,
    scan_draws=N_SAMPLES_TO_GENERATE, # Draws per chain for the consistency scan
    random_seed=RANDOM_SEED
)
print(f"Power/Phase shape: {gen_power.shape}")

# --- 2. Generate Failure Frequency ---
print("\nGenerating Failure Frequency (CAIFI/FIC)...")
gen_freq_all = bfm.generate_data(
    new_hop_zone_idx=hop_zone_idx_reliability,
    random_seed=RANDOM_SEED
)
# *** Manually truncate to N_SAMPLES_TO_GENERATE ***
gen_freq = gen_freq_all[:N_SAMPLES_TO_GENERATE, :]
print(f"Frequency shape: {gen_freq.shape} (truncated from {gen_freq_all.shape[0]} total samples)")

# --- 3. Generate Failure Duration ---
print("\nGenerating Failure Duration (CAIDI/DIC)...")
gen_dur_all = bdm.generate_data(
    new_hop_zone_idx=hop_zone_idx_reliability,
    random_seed=RANDOM_SEED
)
# *** Manually truncate to N_SAMPLES_TO_GENERATE ***
gen_dur = gen_dur_all[:N_SAMPLES_TO_GENERATE, :]
print(f"Duration shape: {gen_dur.shape} (truncated from {gen_dur_all.shape[0]} total samples)")

# --- 4. Generate R1 and X1 Impedance ---
print("\nGenerating Line Impedance (R1/X1)...")
gen_r_all, gen_x_all = bim.generate_data(
    new_elec_dist_idx=line_elec_dist_idx,
    random_seed=RANDOM_SEED
)
# *** Manually truncate to N_SAMPLES_TO_GENERATE ***
gen_r = gen_r_all[:N_SAMPLES_TO_GENERATE, :]
gen_x = gen_x_all[:N_SAMPLES_TO_GENERATE, :]
print(f"R1 shape: {gen_r.shape}, X1 shape: {gen_x.shape} (truncated from {gen_r_all.shape[0]} total samples)")

print("\n--- All Generation Complete ---")

--- Starting Synthetic Data Generation ---

Generating Power and Phase...
Step 1: Generating unconstrained data for 118 buses...


Sampling: [phase_likelihood, power_observed, probs]


Step 1 complete.
Step 2: Calculating phase probabilities...
Step 2a: Analyzing graph topology...
Found 63 ramification nodes.
Step 3: Building graph-consistency scan model for 63 ramification nodes...


Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha, beta]


Sampling 4 chains for 0 tune and 100 draw iterations (0 + 400 draws total) took 30 seconds.
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters.  A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details


Step 5: Mapping consistent phases to full grid...
--- Generation Complete ---
Step 6: Re-allocating power based on consistent phases...
--- Generation Complete ---
Power/Phase shape: (400, 118, 3)

Generating Failure Frequency (CAIFI/FIC)...
New network is smaller or same size. Using padding...


Sampling: [frequency_likelihood]



--- Simulation Complete ---
Shape of final samples: (4000, 118)
Frequency shape: (100, 118) (truncated from 4000 total samples)

Generating Failure Duration (CAIDI/DIC)...

--- Part 1: Predicting the Hurdle for 118 buses ---
New network is smaller or same size. Using padding...


Sampling: [observed_hurdle]



--- Part 2: Generating Potential Positive Durations ---

--- Part 3: Combining Hurdle and Duration Samples ---

--- Simulation Complete ---
Shape of final samples: (4000, 118)
Duration shape: (100, 118) (truncated from 4000 total samples)

Generating Line Impedance (R1/X1)...
Generating R1 samples for 173 lines...


Sampling: [r1_likelihood]


Generating X1 samples for 173 lines...


Sampling: [x1_likelihood]



--- Simulation Complete ---
R1 samples shape: (4000, 173)
X1 samples shape: (4000, 173)
R1 shape: (100, 173), X1 shape: (100, 173) (truncated from 4000 total samples)

--- All Generation Complete ---
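The generation cell keeps the first `N_SAMPLES_TO_GENERATE` rows of each output. If the rows happen to be stacked chain by chain (an assumption, since the tutorial does not state the ordering), the first rows would all come from a single chain. A hedged alternative is to draw the rows at random without replacement, sketched here on a toy array that stands in for a model output:

```python
import numpy as np

rng = np.random.default_rng(42)
n_total, n_buses, n_keep = 4000, 118, 100

# Toy stand-in for a (samples, buses) array returned by a model
all_samples = rng.gamma(shape=2.0, scale=1.5, size=(n_total, n_buses))

# Option A: keep the first n_keep rows (what the tutorial does)
first_rows = all_samples[:n_keep, :]

# Option B: keep n_keep distinct rows chosen at random
keep_idx = rng.choice(n_total, size=n_keep, replace=False)
random_rows = all_samples[keep_idx, :]

print(first_rows.shape, random_rows.shape)
```

If you use option B and want each `sample_id` to describe one coherent network, reuse the same index selection wherever the arrays are meant to be paired.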




# Step 5: Post-Process and Save All Samples

The models have generated `N_SAMPLES_TO_GENERATE` unique synthetic networks. Instead of summarizing them by taking the `mean` or `mode`, we will save **all generated samples**.

This approach is far more flexible. It allows you to analyze the full distribution of synthetic data or select any single sample (e.g., `sample_id = 5`) to get a complete, consistent synthetic network.

We will save the data in a "long" or "tidy" format, where each row represents a single observation for a specific `sample_id`. This format is ideal for analysis and filtering in `pandas`.
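To show what the long format looks like, here is a minimal sketch that flattens a `(n_samples, n_buses)` array into one row per sample and bus. The helper `to_long_format` is hypothetical and is not the package's `save_bus_metric_samples`; it only illustrates the target layout.

```python
import numpy as np
import pandas as pd

def to_long_format(samples, ids, id_name, value_name):
    """Flatten a (n_samples, n_items) array into one row per (sample, item)."""
    samples = np.asarray(samples)
    n_samples, n_items = samples.shape
    return pd.DataFrame({
        "sample_id": np.repeat(np.arange(n_samples), n_items),
        id_name: np.tile(np.asarray(ids), n_samples),
        value_name: samples.ravel(),  # row-major: all items of sample 0 first
    })

toy = np.array([[0.5, 1.2, 0.0],
                [0.7, 0.9, 2.1]])  # 2 samples x 3 buses, illustrative values
long_df = to_long_format(toy, ids=[10, 11, 12],
                         id_name="bus_id", value_name="CAIFI_FIC")
print(long_df)
```

The resulting table has `n_samples * n_buses` rows, which matches the row counts reported when the files are saved.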



In [10]:
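# Keep exactly N_SAMPLES_TO_GENERATE samples from every model.
# gen_power still holds the 400 draws reported in Step 4, so the power and
# phase arrays are trimmed here; the other arrays were already truncated in
# Step 4, so slicing them again changes nothing.
gen_power = gen_power[:N_SAMPLES_TO_GENERATE, :, :]
gen_phases = gen_phases[:N_SAMPLES_TO_GENERATE, :]
gen_freq = gen_freq[:N_SAMPLES_TO_GENERATE, :]
gen_dur = gen_dur[:N_SAMPLES_TO_GENERATE, :]
gen_r = gen_r[:N_SAMPLES_TO_GENERATE, :]
gen_x = gen_x[:N_SAMPLES_TO_GENERATE, :]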

print(f"Processing all {N_SAMPLES_TO_GENERATE} samples and saving to folder: '{OUTPUT_FOLDER}'")
os.makedirs(OUTPUT_FOLDER, exist_ok=True)

# 1. Save Power and Phase
save_power_phase_samples(
    gen_phases=gen_phases,
    gen_power=gen_power,
    bus_index=net.bus.index,
    phase_map=bhm.get_phase_map(),
    n_samples=N_SAMPLES_TO_GENERATE,
    output_path=os.path.join(OUTPUT_FOLDER, 'bus_power_and_phase_SAMPLES.csv')
)

# 2. Save Frequency
save_bus_metric_samples(
    gen_data=gen_freq,
    col_name='CAIFI_FIC',
    bus_index=net.bus.index,
    n_samples=N_SAMPLES_TO_GENERATE,
    output_path=os.path.join(OUTPUT_FOLDER, 'bus_frequency_SAMPLES.csv')
)

# 3. Save Duration
save_bus_metric_samples(
    gen_data=gen_dur,
    col_name='CAIDI_DIC',
    bus_index=net.bus.index,
    n_samples=N_SAMPLES_TO_GENERATE,
    output_path=os.path.join(OUTPUT_FOLDER, 'bus_duration_SAMPLES.csv')
)

# 4. Save Impedance
save_impedance_samples(
    gen_r=gen_r,
    gen_x=gen_x,
    line_index=net.line.index,
    n_samples=N_SAMPLES_TO_GENERATE,
    output_path=os.path.join(OUTPUT_FOLDER, 'line_impedance_SAMPLES.csv')
)

print("\n--- All Synthetic Data Samples Saved Successfully! ---")

Processing all 100 samples and saving to folder: 'new_pp_synthetic_net'
Processing Power & Phase...
Saved 'bus_power_and_phase_SAMPLES.csv' with 11800 rows.
Processing CAIFI_FIC...
Saved 'bus_frequency_SAMPLES.csv' with 11800 rows.
Processing CAIDI_DIC...
Saved 'bus_duration_SAMPLES.csv' with 11800 rows.
Processing Impedance...
Saved 'line_impedance_SAMPLES.csv' with 17300 rows.

--- All Synthetic Data Samples Saved Successfully! ---


# 📊 Output File Descriptions

Here is a breakdown of the new files in your `OUTPUT_FOLDER`.

#### 1\. `bus_power_and_phase_SAMPLES.csv`

This file contains the complete sampled data for power and phase for all buses.

  * **Columns:**
      * `sample_id`: An integer from `0` to `N_SAMPLES_TO_GENERATE - 1`.
      * `bus_id`: The ID of the bus (from `net.bus.index`).
      * `P_A`, `P_B`, `P_C`: The specific power (kW) for that bus, for that sample.
      * `phase`: The specific phase allocation (e.g., 'A', 'ABC') for that bus, for that sample.
  * **Total Rows:** `N_SAMPLES_TO_GENERATE` x `N_BUSES`

**Example:** To get the complete data for *synthetic network \#5*, you would filter this DataFrame for `sample_id == 5`.
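A minimal sketch of that filter is shown here. The DataFrame is a toy stand-in built in memory with the column names listed above; in practice you would load the CSV with `pd.read_csv` first. All values are invented for illustration.

```python
import pandas as pd

# Toy stand-in for the contents of bus_power_and_phase_SAMPLES.csv
rows = [
    {"sample_id": s, "bus_id": b, "P_A": 1.0 * s + b,
     "P_B": 0.0, "P_C": 0.0, "phase": "A"}
    for s in range(8) for b in range(3)
]
power_phase_df = pd.DataFrame(rows)

# Select synthetic network #5 and index it by bus
network_5 = power_phase_df[power_phase_df["sample_id"] == 5].set_index("bus_id")
print(network_5)
```

Applying the same `sample_id` filter to the other three files gives the matching reliability and impedance values for that network.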

-----

#### 2\. `bus_frequency_SAMPLES.csv`

This file contains the complete sampled failure frequency data for all buses.

  * **Columns:**
      * `sample_id`: An integer from `0` to `N_SAMPLES_TO_GENERATE - 1`.
      * `bus_id`: The ID of the bus.
      * `CAIFI_FIC`: The specific failure frequency (failures/year) for that bus, for that sample.
  * **Total Rows:** `N_SAMPLES_TO_GENERATE` x `N_BUSES`

-----

#### 3\. `bus_duration_SAMPLES.csv`

This file contains the complete sampled failure duration data for all buses.

  * **Columns:**
      * `sample_id`: An integer from `0` to `N_SAMPLES_TO_GENERATE - 1`.
      * `bus_id`: The ID of the bus.
      * `CAIDI_DIC`: The specific failure duration (hours) for that bus, for that sample.
  * **Total Rows:** `N_SAMPLES_TO_GENERATE` x `N_BUSES`
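Because every sample is kept, the spread per bus can be summarized directly from the long table. The sketch below uses a toy table with the same columns as the duration file (random values, for illustration only) and reports the mean plus the 5th, 50th, and 95th percentiles per bus.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_samples, bus_ids = 100, [0, 1, 2]

# Toy long-format table with the same columns as the duration file
duration_df = pd.DataFrame({
    "sample_id": np.repeat(np.arange(n_samples), len(bus_ids)),
    "bus_id": np.tile(bus_ids, n_samples),
    "CAIDI_DIC": rng.exponential(scale=2.0, size=n_samples * len(bus_ids)),
})

# One row per bus: count, mean, std, min, 5%, 50%, 95%, max
summary = duration_df.groupby("bus_id")["CAIDI_DIC"].describe(
    percentiles=[0.05, 0.95]
)
print(summary[["mean", "5%", "50%", "95%"]])
```

The same pattern applies to the frequency file by swapping the value column for `CAIFI_FIC`.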

-----

#### 4\. `line_impedance_SAMPLES.csv`

This file contains the complete sampled impedance data for all lines.

  * **Columns:**
      * `sample_id`: An integer from `0` to `N_SAMPLES_TO_GENERATE - 1`.
      * `line_id`: The ID of the line (from `net.line.index`).
      * `R1_ohm_per_km`: The specific resistance (R1) for that line, for that sample.
      * `X1_ohm_per_km`: The specific reactance (X1) for that line, for that sample.
  * **Total Rows:** `N_SAMPLES_TO_GENERATE` x `N_LINES`
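One possible use of this file is to write a single sample back into a line table. The sketch below uses small toy DataFrames: `impedance_df` mimics the saved file and `line_table` stands in for `net.line`, whose `r_ohm_per_km` and `x_ohm_per_km` columns are standard in pandapower. Whether the generated values suit the voltage level and units of your template network is something to check yourself; the tutorial does not state it.

```python
import pandas as pd

# Toy stand-in for the saved impedance table (illustrative values)
impedance_df = pd.DataFrame({
    "sample_id":     [0, 0, 1, 1],
    "line_id":       [0, 1, 0, 1],
    "R1_ohm_per_km": [0.20, 0.35, 0.22, 0.31],
    "X1_ohm_per_km": [0.10, 0.12, 0.11, 0.13],
})

# Toy stand-in for pandapower's net.line table
line_table = pd.DataFrame(
    {"r_ohm_per_km": [0.0, 0.0], "x_ohm_per_km": [0.0, 0.0]}, index=[0, 1]
)

# Pick one sample and align it to the line table by line id
sample = impedance_df[impedance_df["sample_id"] == 1].set_index("line_id")
line_table.loc[sample.index, "r_ohm_per_km"] = sample["R1_ohm_per_km"]
line_table.loc[sample.index, "x_ohm_per_km"] = sample["X1_ohm_per_km"]
print(line_table)
```

Indexing the sample by `line_id` lets pandas align values by label, so the assignment does not depend on row order in the CSV.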

## Tutorial Complete

You have successfully generated a complete set of synthetic parameters for a `pandapower` network.
