# 02 - Section 150 Validation

This notebook checks that the PMU Reliability Framework reproduces the six key findings of the original Section 150 analysis.

**Expected Values:**
- Event Count: 301
- MTBF (mean time between failures): ~16.4 days
- Network Ratio (Section 150 events / average events per section): 25.1x
- Risk Rank: #1
- Top Cause: Unknown (51 events)
- Peak Hour: 19:00 (7 PM)

In [1]:
import sys
sys.path.insert(0, '../src')

import pandas as pd
import numpy as np

from data_loader import load_pmu_disturbance_data, get_section_events, calculate_event_statistics
from risk_scorer import PMURiskScorer
from temporal_analysis import TemporalAnalyzer

DATA_PATH = '../../data/PMU_disturbance.xlsx'
TARGET_SECTION = 150

## 1. Load Data and Get Section 150 Events

In [2]:
pmu_df, dist_df = load_pmu_disturbance_data(DATA_PATH)
section_150_events = get_section_events(dist_df, TARGET_SECTION)

print(f"Section 150 Event Count: {len(section_150_events)}")
print(f"Expected: 301")
print(f"Match: {'✅' if len(section_150_events) == 301 else '❌'}")

Section 150 Event Count: 301
Expected: 301
Match: ✅


## 2. Validate MTBF

In [3]:
stats = calculate_event_statistics(section_150_events)

print(f"Section 150 MTBF: {stats['mtbf_days']:.2f} days")
print(f"Expected: ~16.4 days")
print(f"Match: {'✅' if 15.5 < stats['mtbf_days'] < 17.5 else '❌'}")

Section 150 MTBF: 16.38 days
Expected: ~16.4 days
Match: ✅
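The internals of `calculate_event_statistics` are not shown in this notebook. One common definition of MTBF is the mean gap between consecutive event timestamps; the sketch below assumes that definition. The helper name `mtbf_days` and the toy timestamps are hypothetical and are not part of the framework or the Section 150 data.

```python
import pandas as pd

def mtbf_days(timestamps: pd.Series) -> float:
    """Mean gap, in days, between consecutive events (hypothetical helper)."""
    ts = pd.to_datetime(timestamps).sort_values()
    gaps = ts.diff().dropna()          # time between each event and the previous one
    if gaps.empty:                     # fewer than two events: MTBF is undefined
        return float("nan")
    return gaps.mean() / pd.Timedelta(days=1)

# Toy data, not the Section 150 events: gaps of 10 and 20 days.
toy = pd.Series(["2020-01-01", "2020-01-11", "2020-01-31"])
toy_mtbf = mtbf_days(toy)
print(f"{toy_mtbf:.1f} days")
```

If the framework uses this definition, 301 events at 16.38 days apart would imply an observation span of roughly 300 × 16.38 ≈ 4,914 days, which is a quick sanity check against the date range of the source data.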


## 3. Validate Network Ratio

In [4]:
# Calculate network average
section_col = [c for c in dist_df.columns if 'section' in c.lower()][0]
events_per_section = dist_df.groupby(section_col).size()
network_avg = events_per_section.mean()

ratio = len(section_150_events) / network_avg

print(f"Network Average: {network_avg:.1f} events/section")
print(f"Section 150: {len(section_150_events)} events")
print(f"Ratio: {ratio:.1f}x")
print(f"Expected: 25.1x")
print(f"Match: {'✅' if 24 < ratio < 26 else '❌'}")

Network Average: 12.0 events/section
Section 150: 301 events
Ratio: 25.1x
Expected: 25.1x
Match: ✅
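Note that `groupby(...).size()` only sees sections that appear in the disturbance table, so the network average above is taken over sections with at least one event. The toy example below shows how the ratio changes when zero-event sections are included. The section IDs and the `all_sections` list are made up for illustration; whether the original analysis included zero-event sections is not stated here.

```python
import pandas as pd

# Toy disturbance log: section 1 has 4 events, section 2 has 2.
events = pd.DataFrame({"SectionID": [1, 1, 1, 1, 2, 2]})
# Hypothetical full section list; section 3 has no events.
all_sections = [1, 2, 3]

counts = events.groupby("SectionID").size()

# Average over sections that have events: (4 + 2) / 2
avg_with_events = counts.mean()
# Average over every section, counting missing ones as zero: (4 + 2 + 0) / 3
avg_all = counts.reindex(all_sections, fill_value=0).mean()

ratio_with_events = counts.loc[1] / avg_with_events
ratio_all = counts.loc[1] / avg_all
print(f"{ratio_with_events:.2f}x vs {ratio_all:.2f}x")
```

Including zero-event sections lowers the average and therefore raises the ratio, so the 25.1x figure should be read as relative to sections that recorded at least one disturbance.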


## 4. Validate Risk Ranking

In [5]:
scorer = PMURiskScorer(pmu_df, dist_df)
risk_results = scorer.calculate_risk_scores()

section_150_rank = risk_results[risk_results['SectionID'] == TARGET_SECTION]['rank'].values[0]

print(f"Section 150 Risk Rank: #{int(section_150_rank)}")
print(f"Expected: #1")
print(f"Match: {'✅' if section_150_rank == 1 else '❌'}")

# Show top 10
print("\nTop 10 Highest Risk Sections:")
risk_results.head(10)[['SectionID', 'risk_score', 'rank', 'category']]

Section 150 Risk Rank: #1
Expected: #1
Match: ✅

Top 10 Highest Risk Sections:


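|     | SectionID | risk_score | rank | category |
|-----|-----------|------------|------|----------|
| 196 | 150 | 70.798459 | 1 | High |
| 236 | 1441 | 55.67836 | 2 | Medium |
| 191 | 495 | 54.027437 | 3 | Medium |
| 195 | 80 | 49.942974 | 4 | Medium |
| 73 | 886 | 43.899436 | 5 | Medium |
| 22 | 244 | 43.878314 | 6 | Medium |
| 65 | 54 | 43.85006 | 7 | Medium |
| 81 | 624 | 43.769223 | 8 | Medium |
| 52 | 562 | 43.292264 | 9 | Medium |
| 78 | 228 | 43.130108 | 10 | Medium |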


## 5. Validate Top Cause

In [6]:
cause_col = [c for c in section_150_events.columns if 'cause' in c.lower()][0]
cause_counts = section_150_events[cause_col].value_counts()
top_cause = cause_counts.index[0]
top_count = cause_counts.iloc[0]

print(f"Top Cause: {top_cause}")
print(f"Count: {top_count}")
print(f"Expected: Unknown with ~51 events")
print(f"Match: {'✅' if 'unknown' in top_cause.lower() else '❌'}")

# Show top 5 causes
print("\nTop 5 Causes for Section 150:")
cause_counts.head()

Top Cause: Unknown
Count: 51
Expected: Unknown with ~51 events
Match: ✅

Top 5 Causes for Section 150:


Cause
Unknown                                                                                 51
Weather                                                                                 29
Weather, excluding lightning                                                            25
Lightning                                                                               18
Weather, excluding lightning - Severe weather in the area. Notified DCC, PSO & OMPA.     5
Name: count, dtype: int64

## 6. Validate Peak Hour

In [7]:
analyzer = TemporalAnalyzer(section_150_events)
peaks = analyzer.calculate_peak_periods()

print(f"Peak Hour: {peaks['peak_hour']}:00")
print(f"Expected: 19:00 (7 PM)")
print(f"Match: {'✅' if peaks['peak_hour'] == 19 else '❌'}")

# Show hourly distribution
hourly = analyzer.calculate_hourly_pattern()
print("\nHourly Distribution:")
hourly

Peak Hour: 19:00
Expected: 19:00 (7 PM)
Match: ✅

Hourly Distribution:


Timestamp
0      6
1      5
2     11
3     10
4     12
5     12
6     16
7     11
8     15
9     19
10    16
11    20
12     7
13    10
14    11
15    13
16    10
17    13
18    20
19    21
20    19
21    12
22     6
23     6
Name: count, dtype: int64
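As an independent cross-check that does not rely on `TemporalAnalyzer`, the hourly counts printed above can be loaded into a plain pandas Series. This is only a sketch: the counts are copied by hand from the output, and the 18:00 to 20:59 window is an illustrative choice, not something defined by the framework.

```python
import pandas as pd

# Hourly event counts for Section 150, copied from the output above (hours 0-23).
counts = [6, 5, 11, 10, 12, 12, 16, 11, 15, 19, 16, 20,
          7, 10, 11, 13, 10, 13, 20, 21, 19, 12, 6, 6]
hourly = pd.Series(counts, index=range(24), name="count")

peak_hour = int(hourly.idxmax())      # hour with the most events
total_events = int(hourly.sum())      # should equal the section's event count
# .loc slicing is label-based and inclusive, so this covers hours 18, 19 and 20.
evening_share = hourly.loc[18:20].sum() / total_events

print(f"Peak hour: {peak_hour}:00 ({hourly.max()} events)")
print(f"Total events: {total_events}")
print(f"Share in 18:00-20:59: {evening_share:.1%}")
```

The hourly counts sum to 301, matching the event count from step 1, so no events were dropped for lack of a timestamp. The peak is narrow: 19:00 leads 18:00 and 11:00 by a single event.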

## Summary

| Metric | Expected | Actual | Status |
|--------|----------|--------|--------|
| Event Count | 301 | 301 | ✅ |
| MTBF | ~16.4 days | 16.38 days | ✅ |
| Network Ratio | 25.1x | 25.1x | ✅ |
| Risk Rank | #1 | #1 | ✅ |
| Top Cause | Unknown | Unknown (51 events) | ✅ |
| Peak Hour | 19:00 | 19:00 | ✅ |

Every check above passes, so the framework reproduces the original Section 150 analysis.
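The Actual and Status columns can be produced from the computed values instead of being typed in by hand. The sketch below is one possible way to do that; `build_summary` is a hypothetical helper, and the actual values are hard-coded from the cell outputs above so the block runs on its own.

```python
def build_summary(rows):
    """Render (metric, expected, actual, passed) tuples as a Markdown table."""
    lines = ["| Metric | Expected | Actual | Status |",
             "|--------|----------|--------|--------|"]
    for metric, expected, actual, passed in rows:
        status = "✅" if passed else "❌"
        lines.append(f"| {metric} | {expected} | {actual} | {status} |")
    return "\n".join(lines)

# Values copied from the cell outputs above. In the notebook these would come
# from len(section_150_events), stats['mtbf_days'], ratio, section_150_rank,
# top_cause and peaks['peak_hour'], using the same pass conditions as each cell.
rows = [
    ("Event Count", "301", "301", 301 == 301),
    ("MTBF", "~16.4 days", "16.38 days", 15.5 < 16.38 < 17.5),
    ("Network Ratio", "25.1x", "25.1x", 24 < 25.1 < 26),
    ("Risk Rank", "#1", "#1", 1 == 1),
    ("Top Cause", "Unknown", "Unknown (51 events)", "unknown" in "Unknown".lower()),
    ("Peak Hour", "19:00", "19:00", 19 == 19),
]

summary_md = build_summary(rows)
print(summary_md)
```

In a notebook, passing the string to `IPython.display.Markdown` renders it as a table, which keeps the summary in sync with the checks whenever the data or the framework changes.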