## Table of Contents
1. [Setup and Imports](#setup)
2. [Discover Robustness Artifacts](#discover)
3. [Load and Inspect Robustness Data](#load)
4. [Plot Robustness Curves](#curves)
5. [Compute Summary Metrics](#metrics)
6. [Critical Nodes Analysis](#critical)
7. [Interpretation](#interpretation)
8. [Write Report Outputs](#write-outputs)
9. [Reproducibility Notes](#reproducibility)

<a id="setup"></a>
## 1. Setup and Imports

In [None]:
# ============================================================================
# SETUP AND IMPORTS
# ============================================================================

import json
from pathlib import Path
from datetime import datetime
import warnings

import pandas as pd
import polars as pl
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Project paths
REPO_ROOT = Path.cwd().parent.parent
RESULTS_DIR = REPO_ROOT / "results"
ANALYSIS_DIR = RESULTS_DIR / "analysis"
TABLES_REPORT_DIR = RESULTS_DIR / "tables" / "report"
FIGURES_REPORT_DIR = RESULTS_DIR / "figures" / "report"
WARNINGS_LOG = TABLES_REPORT_DIR / "_warnings.log"

# Notebook identity
NOTEBOOK_ID = "nb05"
NOTEBOOK_NAME = "robustness__percolation_and_hub_dependence"

# Plotting settings
plt.style.use("seaborn-v0_8-whitegrid")
sns.set_palette("husl")

# Ensure output directories exist
TABLES_REPORT_DIR.mkdir(parents=True, exist_ok=True)
FIGURES_REPORT_DIR.mkdir(parents=True, exist_ok=True)

print(f"Analysis dir exists: {ANALYSIS_DIR.exists()}")

In [None]:
# ============================================================================
# HELPER FUNCTIONS
# ============================================================================

def append_warning(message: str, notebook_id: str = NOTEBOOK_ID):
    """Append a warning to the consolidated warnings log."""
    timestamp = datetime.now().isoformat()
    with open(WARNINGS_LOG, "a") as f:
        f.write(f"[{timestamp}] [{notebook_id}] {message}\n")
    print(f"WARNING: {message}")

def safe_load_parquet(path: Path) -> pl.DataFrame | None:
    """Safely load a parquet file, returning None if it fails."""
    try:
        return pl.read_parquet(path)
    except Exception as e:
        append_warning(f"Failed to load {path.name}: {e}")
        return None

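def compute_auc(x: np.ndarray, y: np.ndarray) -> float:
    """Compute area under curve using trapezoidal rule."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Ensure sorted by x
    sort_idx = np.argsort(x)
    # NumPy 2.0 added np.trapezoid and deprecated np.trapz; support both
    trapezoid = getattr(np, "trapezoid", None) or np.trapz
    return float(trapezoid(y[sort_idx], x[sort_idx]))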

<a id="discover"></a>
## 2. Discover Robustness Artifacts

In [None]:
# ============================================================================
# DISCOVER ROBUSTNESS ARTIFACTS
# ============================================================================

robustness_keywords = ["robust", "percol", "attack", "targeted", "random", "giant", "lcc"]

# Search in analysis directory
analysis_files = list(ANALYSIS_DIR.glob("*.parquet")) + list(ANALYSIS_DIR.glob("*.csv")) + list(ANALYSIS_DIR.glob("*.json"))
robustness_candidates = [
    f for f in analysis_files 
    if any(kw in f.name.lower() for kw in robustness_keywords)
]

print(f"Found {len(robustness_candidates)} robustness-related artifacts:")
for rf in sorted(robustness_candidates):
    print(f"  - {rf.name}")

# Look for primary robustness curves file
robustness_curves_file = ANALYSIS_DIR / "robustness_curves.parquet"
robustness_summary_file = ANALYSIS_DIR / "robustness_summary.json"

print(f"\nRobustness curves exists: {robustness_curves_file.exists()}")
print(f"Robustness summary exists: {robustness_summary_file.exists()}")

<a id="load"></a>
## 3. Load and Inspect Robustness Data

In [None]:
# ============================================================================
# LOAD AND INSPECT ROBUSTNESS DATA
# ============================================================================

robustness_curves = None
robustness_summary = None

# Load curves
if robustness_curves_file.exists():
    robustness_curves = safe_load_parquet(robustness_curves_file)
    if robustness_curves is not None:
        print(f"Robustness curves shape: {robustness_curves.shape}")
        print(f"Columns: {robustness_curves.columns}")
        display(robustness_curves.head(10).to_pandas())
else:
    append_warning("robustness_curves.parquet not found")

# Load summary
if robustness_summary_file.exists():
    with open(robustness_summary_file) as f:
        robustness_summary = json.load(f)
    print(f"\nRobustness summary keys: {list(robustness_summary.keys())}")
else:
    print("\nRobustness summary not found")

<a id="curves"></a>
## 4. Plot Robustness Curves

Visualize network fragmentation under different attack scenarios.

In [None]:
# ============================================================================
# PLOT ROBUSTNESS CURVES
# ============================================================================

if robustness_curves is not None:
    # Identify columns
    x_col = next((c for c in ["fraction_removed", "frac_removed", "step", "k"] 
                  if c in robustness_curves.columns), None)
    y_col = next((c for c in ["lcc_fraction", "giant_fraction", "lcc_size", "giant"] 
                  if c in robustness_curves.columns), None)
    scenario_col = next((c for c in ["scenario", "strategy", "attack_type", "method"] 
                         if c in robustness_curves.columns), None)
    
    print(f"X-axis: {x_col}, Y-axis: {y_col}, Scenario: {scenario_col}")
    
    if x_col and y_col:
        fig, ax = plt.subplots(figsize=(10, 6))
        
        if scenario_col:
            # Plot by scenario
            scenarios = robustness_curves[scenario_col].unique().to_list()
            colors = sns.color_palette("husl", len(scenarios))
            
            for scenario, color in zip(sorted(scenarios), colors):
                # Sort by x so each line is drawn in removal order
                subset = (
                    robustness_curves.filter(pl.col(scenario_col) == scenario)
                    .sort(x_col)
                    .to_pandas()
                )
                ax.plot(subset[x_col], subset[y_col],
                        label=scenario, color=color, linewidth=2, marker='o', markersize=3)
            ax.legend(title="Attack Scenario")
        else:
            # Single curve, sorted by x for the same reason
            df = robustness_curves.sort(x_col).to_pandas()
            ax.plot(df[x_col], df[y_col], linewidth=2, marker='o', markersize=3)
        
        ax.set_xlabel("Fraction of Nodes Removed")
        ax.set_ylabel("Largest Connected Component Fraction")
        ax.set_title("Network Robustness: Random vs Targeted Attacks")
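        # Clamp axes to [0, 1] only when the detected columns hold fractions
        # (fallback columns such as "step", "k", or "lcc_size" are raw counts)
        if "frac" in x_col:
            ax.set_xlim(0, 1)
        if "fraction" in y_col:
            ax.set_ylim(0, 1)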
        ax.grid(True, alpha=0.3)
        
        plt.tight_layout()
        fig_path = FIGURES_REPORT_DIR / f"{NOTEBOOK_ID}_robustness_curves.png"
        plt.savefig(fig_path, dpi=150)
        plt.show()
        print(f"✅ Saved: {fig_path.name}")
    else:
        append_warning(f"Could not identify x ({x_col}) or y ({y_col}) columns for robustness curves")
else:
    print("Not available: robustness curves data not loaded")

<a id="metrics"></a>
## 5. Compute Summary Metrics

Derive per-scenario robustness metrics: area under the curve (AUC), the critical 50% drop point, the initial slope, and the final LCC value.

In [None]:
# ============================================================================
# COMPUTE SUMMARY METRICS
# ============================================================================

summary_metrics = []

def summarize_curve(scenario, curve: pd.DataFrame) -> dict:
    """Summarize one robustness curve; rows are sorted by x before use."""
    curve = curve.sort_values(x_col)
    x_vals = curve[x_col].to_numpy(dtype=float)
    y_vals = curve[y_col].to_numpy(dtype=float)

    # Critical drop: smallest fraction removed at which LCC is below 50%
    # (1.0 if the curve never drops below 50%)
    below_50 = curve[curve[y_col] < 0.5]
    critical_point = below_50[x_col].min() if len(below_50) > 0 else 1.0

    # Initial drop rate: finite-difference slope between the first two points
    if len(x_vals) > 1 and x_vals[1] != x_vals[0]:
        initial_slope = (y_vals[1] - y_vals[0]) / (x_vals[1] - x_vals[0])
    else:
        initial_slope = 0.0

    return {
        "scenario": scenario,
        "auc_robustness": compute_auc(x_vals, y_vals),  # area under robustness curve
        "critical_50pct_drop": critical_point,
        "initial_slope": initial_slope,
        "final_lcc": y_vals[-1] if len(y_vals) > 0 else 0,
    }

if robustness_curves is not None and x_col and y_col:
    if scenario_col:
        for scenario in sorted(robustness_curves[scenario_col].unique().to_list()):
            subset = robustness_curves.filter(pl.col(scenario_col) == scenario).to_pandas()
            summary_metrics.append(summarize_curve(scenario, subset))
    else:
        # No scenario column: treat the whole table as a single curve
        summary_metrics.append(summarize_curve("default", robustness_curves.to_pandas()))

summary_df = pd.DataFrame(summary_metrics)
if len(summary_df) > 0:
    print("\nROBUSTNESS SUMMARY METRICS:")
    display(summary_df)
    
    # Highlight key finding: compare the random scenario with the first targeted one
    scenario_names = summary_df["scenario"].astype(str)
    random_mask = scenario_names.str.contains("random", case=False)
    targeted_mask = scenario_names.str.contains("targeted|degree|betweenness", case=False) & ~random_mask
    if random_mask.any() and targeted_mask.any():
        random_auc = summary_df.loc[random_mask, "auc_robustness"].iloc[0]
        targeted_auc = summary_df.loc[targeted_mask, "auc_robustness"].iloc[0]
        auc_diff = random_auc - targeted_auc
        print(f"\n📊 HUB DEPENDENCE INDICATOR: Random AUC - Targeted AUC = {auc_diff:.3f}")
        print("   (Larger positive values indicate greater vulnerability to targeted attacks)")
else:
    print("Not available: could not compute summary metrics")

<a id="critical"></a>
## 6. Critical Nodes Analysis

Identify which nodes are most critical for network connectivity.

In [None]:
# ============================================================================
# CRITICAL NODES ANALYSIS
# ============================================================================

# Check for critical nodes table from pipeline
critical_nodes_file = RESULTS_DIR / "tables" / "robustness_critical_nodes.csv"

if critical_nodes_file.exists():
    critical_nodes = pd.read_csv(critical_nodes_file)
    print("CRITICAL NODES (from pipeline):")
    display(critical_nodes.head(20))
else:
    print("Critical nodes table not found in pipeline outputs.")
    print("This analysis requires the robustness script to output removal order.")
    
    # Fallback: use centrality as proxy for criticality
    centrality_file = ANALYSIS_DIR / "airport_centrality.parquet"
    if centrality_file.exists():
        print("\nUsing centrality as proxy for critical nodes:")
        centrality = safe_load_parquet(centrality_file)
        if centrality is not None:
            # Rank by betweenness if available, else degree
            rank_col = next((c for c in centrality.columns if "betweenness" in c.lower()), None)
            if not rank_col:
                rank_col = next((c for c in centrality.columns if "degree" in c.lower() or "strength" in c.lower()), None)
            
            if rank_col:
                id_col = next((c for c in ["airport", "node", "id"] if c in centrality.columns), centrality.columns[0])
                top_critical = centrality.sort(rank_col, descending=True).head(20)
                print(f"\nTop 20 by {rank_col}:")
                display(top_critical.to_pandas()[[id_col, rank_col]])

<a id="interpretation"></a>
## 7. Interpretation

### Key Findings (Evidence-Grounded)

*(Populated after running cells above)*

### Mechanistic Explanation

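- **Random failure resilience**: Scale-free networks tolerate random node failures because most nodes have low degree, so a randomly chosen removal rarely hits a hub
- **Targeted attack vulnerability**: Removing high-degree hubs first fragments the network quickly, because hubs carry a disproportionate share of the connections
- **Percolation threshold**: The fraction of nodes removed at which the giant component collapses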
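
The contrast between random failure and targeted attack can be illustrated on a toy graph. The sketch below is a standard-library illustration only, not the pipeline's robustness script: the hub-and-spoke graph, the helper names `lcc_fraction`, `removal_curve`, and `mean_random_lcc`, and the choice to rank targets once by initial degree are all assumptions made for the example.

```python
import random
from collections import deque


def lcc_fraction(adj, removed, n_original):
    """Largest surviving component size divided by the ORIGINAL node count."""
    seen = set(removed)  # removed nodes are never visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over surviving nodes
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best / n_original


def removal_curve(adj, order):
    """Return (fraction_removed, lcc_fraction) pairs for a removal order."""
    n = len(adj)
    removed = set()
    curve = [(0.0, lcc_fraction(adj, removed, n))]
    for i, node in enumerate(order, start=1):
        removed.add(node)
        curve.append((i / n, lcc_fraction(adj, removed, n)))
    return curve


def mean_random_lcc(adj, n_removed, trials=50, seed=0):
    """Average LCC fraction after n_removed uniformly random removals."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        order = list(adj)
        rng.shuffle(order)
        total += lcc_fraction(adj, set(order[:n_removed]), len(adj))
    return total / trials


# Hypothetical toy graph: 3 interconnected hubs, each serving 9 spokes (30 nodes).
hubs = ["H0", "H1", "H2"]
adj = {h: {other for other in hubs if other != h} for h in hubs}
for i, hub in enumerate(hubs):
    for j in range(9):
        spoke = f"S{i}_{j}"
        adj[spoke] = {hub}
        adj[hub].add(spoke)

# Targeted attack: rank once by initial degree, highest first (hubs come first).
targeted_order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
targeted_curve = removal_curve(adj, targeted_order)

# After the 3 hubs are gone every spoke is isolated, so the LCC is 1 of 30 nodes.
print("targeted, 3 removed:", targeted_curve[3])
random_mean_at_3 = mean_random_lcc(adj, n_removed=3)
print("random,   3 removed (mean LCC):", random_mean_at_3)
```

On this toy graph, removing 10% of the nodes in degree order leaves only isolated spokes, while random removals of the same size mostly hit spokes and leave the hub core connected.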

### Alternative Explanations
1. Weight definition affects which nodes appear most critical
2. Dynamic recomputation of centrality during attacks may change results
3. Airline-specific analysis might reveal different vulnerability patterns
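
The second point, static versus dynamically recomputed rankings, can be made concrete with degree as the centrality measure. This is a minimal sketch under stated assumptions: the function names `static_degree_order` and `adaptive_degree_order` and the small edge list are hypothetical, and degree stands in for whatever centrality the pipeline actually uses.

```python
def static_degree_order(adj):
    """Rank nodes once by their degree in the intact graph."""
    return sorted(adj, key=lambda v: len(adj[v]), reverse=True)


def adaptive_degree_order(adj):
    """Repeatedly remove the node with the highest degree in the remaining graph."""
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}  # working copy
    order = []
    while remaining:
        target = max(remaining, key=lambda v: len(remaining[v]))
        order.append(target)
        # Drop the node and update its neighbours' degrees
        for nbr in remaining.pop(target):
            remaining[nbr].discard(target)
    return order


# Hypothetical toy graph: A (degree 5) is linked to B; B and C both have degree 4.
edges = [
    ("A", "B"), ("A", "a1"), ("A", "a2"), ("A", "a3"), ("A", "a4"),
    ("B", "b1"), ("B", "b2"), ("B", "b3"),
    ("C", "c1"), ("C", "c2"), ("C", "c3"), ("C", "c4"),
]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

static_order = static_degree_order(adj)
adaptive_order = adaptive_degree_order(adj)

# Removing A lowers B's degree to 3, so the adaptive attack prefers C next.
print("static  :", static_order[:3])
print("adaptive:", adaptive_order[:3])
```

The two orders agree on the first target and diverge afterwards, which is why curves from a precomputed ranking and from a recomputed ranking are not directly comparable.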

### Evidence Links
- Figure: `results/figures/report/nb05_robustness_curves.png`
- Table: `results/tables/report/nb05_robustness_summary_metrics.csv`

<a id="write-outputs"></a>
## 8. Write Report Outputs

In [None]:
# ============================================================================
# WRITE REPORT OUTPUTS
# ============================================================================

# Write summary metrics
if len(summary_df) > 0:
    metrics_path = TABLES_REPORT_DIR / f"{NOTEBOOK_ID}_robustness_summary_metrics.csv"
    summary_df.to_csv(metrics_path, index=False)
    print(f"✅ Wrote: {metrics_path}")

print(f"\n📋 All {NOTEBOOK_ID} outputs written.")

<a id="reproducibility"></a>
## 9. Reproducibility Notes

### Input Files Consumed
- `results/analysis/robustness_curves.parquet`
- `results/analysis/robustness_summary.json`
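- `results/tables/robustness_critical_nodes.csv` (optional)
- `results/analysis/airport_centrality.parquet` (optional; fallback when the critical nodes table is missing)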

### Assumptions Made
1. Robustness computed on the largest connected component
2. Targeted attacks use pre-computed centrality rankings (not recomputed dynamically)
3. LCC fraction is relative to original network size

### Metrics Definitions
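- **AUC**: Trapezoidal area under the robustness curve (higher = more robust)
- **Critical 50% drop**: Smallest fraction of nodes removed at which the LCC fraction is below 50% (reported as 1.0 if the curve never drops below 50%)
- **Initial slope**: Change in LCC fraction per unit fraction removed, measured between the first two points of the attack sequence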
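
As a worked example of these definitions, the sketch below evaluates all three metrics on a made-up five-point curve. The function name `curve_metrics` and the sample values are illustrative assumptions; the notebook itself computes the metrics with NumPy and pandas in Section 5.

```python
def curve_metrics(x, y, threshold=0.5):
    """Compute AUC, critical drop point, and initial slope for one curve."""
    pairs = sorted(zip(x, y))  # order points by fraction removed
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]

    # Trapezoidal rule: sum of interval width times mean height
    auc = sum(
        (xs[i + 1] - xs[i]) * (ys[i + 1] + ys[i]) / 2
        for i in range(len(xs) - 1)
    )

    # Smallest x whose y is below the threshold; 1.0 if it never happens
    critical = next((xv for xv, yv in pairs if yv < threshold), 1.0)

    # Finite-difference slope between the first two points
    if len(xs) > 1 and xs[1] != xs[0]:
        slope = (ys[1] - ys[0]) / (xs[1] - xs[0])
    else:
        slope = 0.0

    return {"auc": auc, "critical_50pct_drop": critical, "initial_slope": slope}


# Made-up curve: LCC fraction after removing 0%, 25%, 50%, 75%, 100% of nodes
x = [0.0, 0.25, 0.5, 0.75, 1.0]
y = [1.0, 0.8, 0.4, 0.1, 0.0]

metrics = curve_metrics(x, y)
print(metrics)
```

For this curve the AUC is 0.25 × (0.9 + 0.6 + 0.25 + 0.05) = 0.45, the critical 50% drop is 0.5, and the initial slope is (0.8 − 1.0) / 0.25 = −0.8.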

### Outputs Generated
| Artifact | Path |
|----------|------|
| Robustness Curves | `results/figures/report/nb05_robustness_curves.png` |
| Summary Metrics | `results/tables/report/nb05_robustness_summary_metrics.csv` |