# 🚨 NOTEBOOK 5: Systemic Risk & Inequality Synthesis

## Purpose

Notebook 5 performs **system-level synthesis** by combining independent,
pre-validated analytical lenses built in earlier notebooks.

It studies **how environmental stress, health burden, digital divide,
and risk exposure co-exist and compound across countries and over time**.

This notebook is **descriptive**, **non-causal**, and **non-predictive**.


## Scope & Boundaries

### This notebook IS
- A multi-lens system synthesis
- An analysis of compound and overlapping stress
- A bridge from domain analytics to structural insight

### This notebook IS NOT
- A causal model
- A prediction engine
- A governance or development scorecard
- A policy prescription


## Inputs (Immutable Contracts)

| Lens | Source Notebook | Artifact |
|----|----|----|
| Environmental Stress | N1 | environment_stress_index.csv |
| Health Burden | N2 | health_burden_index.csv |
| Digital Divide | N3 | digital_divide_index.csv |
| Risk Exposure | N4 | risk_exposure_index.csv |

All inputs are treated as **final and immutable**.


## Phase Structure

1. System Ingestion & Contract Validation  
2. Alignment Feasibility & Synthesis Regimes  
3. Compound Risk Characterization  
4. Inequality & Concentration Analysis  
5. System-Level Typologies  
6. Limitations & Boundary Enforcement  
7. Notebook Closure & Research Handoff


## 🟦 Phase 1: System Ingestion & Contract Validation

### Purpose
This phase ingests final index artifacts from Notebooks 1–4 and validates
that they are structurally compatible for system-level synthesis.

The objective is to:
- Load all index outputs exactly as produced
- Verify schema, grain, scale, and coverage
- Confirm key consistency and uniqueness rules
- Detect alignment issues before any synthesis

This phase performs **no merging, no comparison, and no interpretation**.


### What This Phase DOES NOT Do
- ❌ No merging across indices
- ❌ No correlation or comparison
- ❌ No weighting or aggregation
- ❌ No conclusions about inequality or systems


### 1.1 Index Artifacts to Ingest (Authoritative Inputs)
| Notebook | Index | Path |
|--------|------|------|
| N1 | Environmental Stress Index | datasets/processed/climate/environment_stress_index.csv |
| N2 | Health Burden Index | datasets/processed/health/health_burden_index.csv |
| N3 | Digital Divide Index | datasets/processed/eco-digital/digital_divide_index.csv |
| N4 | Risk Exposure Index | datasets/processed/risk/risk_exposure_index.csv |

All artifacts are treated as **final and immutable**.
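
One way to make the immutability contract checkable is to record a content hash of each artifact at hand-off and compare it again before ingestion. The sketch below is an assumption, not part of the notebook: `file_sha256` is a hypothetical helper, and the demo runs on a throwaway temporary file rather than a real index artifact.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a temporary file (hypothetical stand-in for an index artifact).
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "example_index.csv"
    artifact.write_text("country,year,value\nIndia,2021,0.5\n")
    recorded = file_sha256(artifact)               # digest stored at hand-off
    unchanged = file_sha256(artifact) == recorded  # same bytes, same digest
    artifact.write_text("country,year,value\nIndia,2021,0.6\n")
    modified = file_sha256(artifact) != recorded   # any edit changes the digest
```

If the recorded digests were stored next to the artifacts, a mismatch would signal that an upstream notebook was re-run after hand-off.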

### 1.2 Load Index Artifacts

In [1]:
from pathlib import Path
import pandas as pd
from utils.path_setup import setup_project_path

PROJECT_ROOT = setup_project_path()

paths = {
    "environment": PROJECT_ROOT / "datasets/processed/climate/environment_stress_index.csv",
    "health": PROJECT_ROOT / "datasets/processed/health/health_burden_index.csv",
    "digital": PROJECT_ROOT / "datasets/processed/eco-digital/digital_divide_index.csv",
    "risk": PROJECT_ROOT / "datasets/processed/risk/risk_exposure_index.csv",
}

df_env = pd.read_csv(paths["environment"])
df_health = pd.read_csv(paths["health"])
df_digital = pd.read_csv(paths["digital"])
df_risk = pd.read_csv(paths["risk"])

(df_env.shape, df_health.shape, df_digital.shape, df_risk.shape)


((8, 3), (3, 3), (17195, 4), (75, 4))

### 1.3 Schema Validation

In [2]:
def schema_check(df, name):
    print(f"\n{name} schema")
    print(df.dtypes)
    print("Columns:", df.columns.tolist())
    print("Nulls:\n", df.isna().sum())

schema_check(df_env, "Environment Index")
schema_check(df_health, "Health Index")
schema_check(df_digital, "Digital Divide Index")
schema_check(df_risk, "Risk Exposure Index")



Environment Index schema
country                      object
year                          int64
environment_stress_index    float64
dtype: object
Columns: ['country', 'year', 'environment_stress_index']
Nulls:
 country                     0
year                        0
environment_stress_index    0
dtype: int64

Health Index schema
country                 object
year                     int64
health_burden_index    float64
dtype: object
Columns: ['country', 'year', 'health_burden_index']
Nulls:
 country                0
year                   0
health_burden_index    0
dtype: int64

Digital Divide Index schema
country                  object
country_code             object
year                      int64
digital_divide_index    float64
dtype: object
Columns: ['country', 'country_code', 'year', 'digital_divide_index']
Nulls:
 country                     0
country_code                0
year                        0
digital_divide_index    12883
dtype: int64

Risk Exposure Index schema

#### 📌 Interpretation note:

Missing values in the Digital Divide Index are expected and preserved:
12,883 of 17,195 rows (about 75%) have no `digital_divide_index` value.
They reflect genuine data absence and are not corrected in this notebook.
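
To report missingness without touching the data, a small helper can return null counts and shares per column. This is a minimal sketch on made-up rows: `missing_report` and the `toy` frame are illustrative names, not objects from the notebook.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column null count and null share; the input frame is not modified."""
    return pd.DataFrame({
        "nulls": df.isna().sum(),
        "share": df.isna().mean().round(4),
    })


# Hypothetical rows shaped like the Digital Divide artifact.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2020, 2021, 2020, 2021],
    "digital_divide_index": [0.4, None, None, None],
})

report = missing_report(toy)
```

Here `report.loc["digital_divide_index", "share"]` is 0.75, and the three missing values in `toy` are left in place.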


### 1.4 Grain & Key Validation

In [3]:
def grain_check(df, name):
    duplicates = df.duplicated(subset=["country", "year"]).sum()
    print(f"{name} duplicate country-year rows:", duplicates)
grain_check(df_env, "Environment")
grain_check(df_health, "Health")
grain_check(df_digital, "Digital")
grain_check(df_risk, "Risk")

Environment duplicate country-year rows: 0
Health duplicate country-year rows: 0
Digital duplicate country-year rows: 0
Risk duplicate country-year rows: 0
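
The check above prints counts, which depends on someone reading them. A hedged alternative is to raise as soon as the grain contract is broken. `assert_unique_grain` is a hypothetical helper and both frames are toy data.

```python
import pandas as pd


def assert_unique_grain(df: pd.DataFrame, keys: list[str], name: str) -> None:
    """Raise ValueError if any combination of `keys` appears more than once."""
    dupes = df[df.duplicated(subset=keys, keep=False)]
    if not dupes.empty:
        key_label = "-".join(keys)
        raise ValueError(f"{name}: {len(dupes)} rows share a {key_label} key")


toy_ok = pd.DataFrame({"country": ["A", "A"], "year": [2020, 2021]})
toy_bad = pd.DataFrame({"country": ["A", "A"], "year": [2021, 2021]})

assert_unique_grain(toy_ok, ["country", "year"], "ok")  # passes silently

try:
    assert_unique_grain(toy_bad, ["country", "year"], "bad")
    raised = False
except ValueError:
    raised = True
```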


### Phase 1 Summary: System Inputs Validated

#### What Was Accomplished
- Final index artifacts from Notebooks 1–4 ingested successfully
- Schemas and key structures confirmed (normalization is inherited from the source notebooks)
- Country–year grain validated across all indices
- Expected missingness preserved without correction
- Interpretation and boundary contracts verified

#### What Was Explicitly Deferred
- Cross-index merging
- Alignment or filtering of years
- Correlation or comparison
- Any synthesis or inequality analysis

#### Phase Boundary Statement
Phase 1 confirms that all system-level inputs are structurally compatible.
No system-level analysis has occurred.
All synthesis decisions are deferred to Phase 2.


## 🟦 Phase 2: Alignment Feasibility & Synthesis Regimes


### Purpose
Independent analytical lenses rarely share a single global
country–year intersection in real-world systems.

This phase therefore:
- Evaluates alignment feasibility across lenses
- Documents why a strict global intersection fails
- Defines explicit, valid synthesis regimes

The objective is to preserve analytical integrity while enabling
system-level synthesis without artificial overlap.


### What This Phase DOES NOT Do
- ❌ No index combination
- ❌ No aggregation or weighting
- ❌ No correlation or comparison
- ❌ No inequality or system conclusions


### 2.1 Load Phase 1-Validated Index Artifacts

In [4]:
import pandas as pd
from utils.path_setup import setup_project_path

PROJECT_ROOT = setup_project_path()

df_env = pd.read_csv(
    PROJECT_ROOT / "datasets/processed/climate/environment_stress_index.csv"
)
df_health = pd.read_csv(
    PROJECT_ROOT / "datasets/processed/health/health_burden_index.csv"
)
df_digital = pd.read_csv(
    PROJECT_ROOT / "datasets/processed/eco-digital/digital_divide_index.csv"
)
df_risk = pd.read_csv(
    PROJECT_ROOT / "datasets/processed/risk/risk_exposure_index.csv"
)


### 2.2 Coverage Reality Check

In [5]:
coverage_summary = pd.DataFrame({
    "Index": [
        "Environment Stress",
        "Health Burden",
        "Digital Divide",
        "Risk Exposure"
    ],
    "Countries": [
        df_env["country"].nunique(),
        df_health["country"].nunique(),
        df_digital["country"].nunique(),
        df_risk["country"].nunique()
    ],
    "Min Year": [
        df_env["year"].min(),
        df_health["year"].min(),
        df_digital["year"].min(),
        df_risk["year"].min()
    ],
    "Max Year": [
        df_env["year"].max(),
        df_health["year"].max(),
        df_digital["year"].max(),
        df_risk["year"].max()
    ],
    "Years Covered": [
        df_env["year"].nunique(),
        df_health["year"].nunique(),
        df_digital["year"].nunique(),
        df_risk["year"].nunique()
    ]
})

coverage_summary


| Index | Countries | Min Year | Max Year | Years Covered |
|----|----|----|----|----|
| Environment Stress | 1 | 2016 | 2025 | 8 |
| Health Burden | 1 | 2000 | 2021 | 3 |
| Digital Divide | 265 | 1960 | 2024 | 65 |
| Risk Exposure | 75 | 2021 | 2021 | 1 |


📌 Interpretation note:

For Environment Stress and Health Burden, "Countries = 1" reflects intentional national-scope analysis (India-only), **not** data loss or filtering.

These lenses are **not** designed for cross-country synthesis.


### 2.2.1 Lens Scope Classification (Critical)

Not all lenses operate at the same spatial scope.

Each index is explicitly classified below:

| Lens | Scope | Notes |
|----|----|----|
| Environmental Stress | India-only | National longitudinal analysis |
| Health Burden | India-only | Sparse temporal coverage |
| Digital Divide | Global | Longitudinal, uneven completeness |
| Risk Exposure | Global | Single-year snapshot (2021) |

⚠️ Implication:
System-wide synthesis is **only valid for lenses with overlapping scope**.
India-only lenses must never be implicitly generalized.


### 2.3 Attempted Global Four-Lens Intersection (Diagnostic)

The first alignment attempt enforces a strict system-wide intersection:
only country–year pairs observed across **all four lenses simultaneously**.

This represents the theoretical ideal:
a single, fully observed system snapshot.


In [6]:
def coverage_set(df):
    return set(zip(df["country"], df["year"]))

cov_env = coverage_set(df_env)
cov_health = coverage_set(df_health)
cov_digital = coverage_set(df_digital)
cov_risk = coverage_set(df_risk)

intersection_all = cov_env & cov_health & cov_digital & cov_risk
len(intersection_all)


0

**Result:** No country–year pairs satisfy full four-lens overlap.

This outcome reflects:
- Different spatial scopes across lenses
- Different temporal resolutions
- Domain-specific data availability constraints

The absence of a global intersection is a **data reality**, not an error.
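
An empty four-way intersection does not show which lens pairs still overlap. A pairwise count makes that visible. The sketch below uses invented coverage sets (not the notebook's real coverage), and `pairwise_overlap` is a hypothetical helper.

```python
from itertools import combinations


def pairwise_overlap(coverage: dict[str, set]) -> dict[tuple[str, str], int]:
    """Count shared (country, year) keys for every pair of lenses."""
    return {
        (a, b): len(coverage[a] & coverage[b])
        for a, b in combinations(coverage, 2)
    }


# Toy coverage sets, for illustration only.
toy_coverage = {
    "environment": {("India", 2016), ("India", 2017)},
    "health": {("India", 2000), ("India", 2021)},
    "digital": {("India", 2017), ("India", 2021), ("Kenya", 2021)},
    "risk": {("India", 2021), ("Kenya", 2021)},
}

overlaps = pairwise_overlap(toy_coverage)
full_intersection = set.intersection(*toy_coverage.values())
```

In this toy case the full intersection is empty while `("digital", "risk")` shares two keys. That is the kind of partial overlap a regime-based approach can use.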


### 2.4 Why a Single Global Intersection Is Invalid

Enforcing a single global intersection would:
- Eliminate entire analytical lenses
- Discard valid, policy-relevant signal
- Produce an empty synthesis set

An empty synthesis set supports no analysis at all, so a strict intersection would defeat the purpose of this notebook.

Therefore, a different alignment strategy is required.


### 2.5 Regime-Based Alignment Strategy

Instead of enforcing one global frame, Notebook 5 adopts a
**regime-based synthesis approach**.

Each regime:
- Operates only where lens overlap is structurally valid
- Preserves temporal and spatial integrity
- Explicitly documents which lenses are active or absent

No extrapolation or imputation is performed.


### 2.6 Defined Synthesis Regimes

#### Regime A: India Multi-Lens Diagnostic (Non-Executable)

- Country: India
- Lenses:
  - Environmental Stress ✅
  - Health Burden ✅
  - Digital Divide ✅
  - Risk Exposure ❌ (not applicable)

This regime does NOT represent a full system.
It is a diagnostic multi-lens view, not a system snapshot.

---

#### Regime B: Global Risk–Digital Exposure
- Countries: Risk Exposure coverage (~75)
- Year: 2021
- Lenses:
  - Risk Exposure ✅
  - Digital Divide ✅
  - Environmental Stress ❌
  - Health Burden ❌

Use case:
Global inequality and exposure pattern analysis.

---

#### Regime C: India Partial-Lens Longitudinal Signals
- Country: India
- Years: All available (2000–2025, lens-dependent)
- Lenses:
  - Environmental Stress ✅
  - Health Burden ✅
  - Digital Divide ✅
  - Risk Exposure ❌

Use case:
Temporal evolution of systemic stress.


### 2.7 Alignment Rules (Final)

- No forced year harmonization
- No cross-regime merging
- No imputation for missing lenses
- Each synthesis explicitly states its regime
- Missing lenses are treated as **structural absence**, not zero

This preserves analytical validity and transparency.
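
The rule that missing lenses are structural absence rather than zero matters numerically. The toy sketch below (invented values, hypothetical frame names) keeps the gap as `NaN` after a left merge and shows how zero-filling would distort a simple mean.

```python
import pandas as pd

# Toy frames, for illustration only.
risk = pd.DataFrame({
    "country": ["A", "B", "C"],
    "risk_exposure_index": [0.1, 0.2, 0.3],
})
digital = pd.DataFrame({
    "country": ["A", "B"],
    "digital_divide_index": [0.4, 0.6],
})

aligned = risk.merge(digital, on="country", how="left")

# Record absence explicitly instead of encoding it as a number.
aligned["digital_observed"] = aligned["digital_divide_index"].notna()

mean_preserved = aligned["digital_divide_index"].mean()              # NaN is skipped
mean_zero_filled = aligned["digital_divide_index"].fillna(0).mean()  # absence counted as 0
```

With the gap preserved the mean is 0.5. Zero-filling pulls it down to about 0.33 because an unobserved country is treated as a measured zero.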


### Phase 2 Summary: Alignment Strategy Locked

#### What Was Accomplished
- Coverage constraints explicitly documented
- Global intersection failure diagnosed
- Regime-based alignment strategy defined
- Valid synthesis regimes locked

#### What Was Explicitly Deferred
- Attaching index values
- Compound risk computation
- Inequality or concentration analysis
- System-level conclusions

#### Phase Boundary Statement
All system-level synthesis in subsequent phases must operate
within the regimes defined here.


## 🟦 Phase 3: Compound Risk Characterization

### Purpose
This phase characterizes how **multiple independent stresses co-exist**
within each valid synthesis regime defined in Phase 2.

The objective is to:
- Attach index values only where alignment is valid
- Observe compound exposure patterns
- Identify co-occurrence of high and low stress signals

This phase remains **descriptive only**.
No causality, ranking, or prediction is performed.


### What This Phase DOES NOT Do
- ❌ No causal inference
- ❌ No composite mega-index
- ❌ No country ranking
- ❌ No policy prescription


### 3.1 Load Index Artifacts

In [7]:
import pandas as pd
from utils.path_setup import setup_project_path

PROJECT_ROOT = setup_project_path()

df_env = pd.read_csv(PROJECT_ROOT / "datasets/processed/climate/environment_stress_index.csv")
df_health = pd.read_csv(PROJECT_ROOT / "datasets/processed/health/health_burden_index.csv")
df_digital = pd.read_csv(PROJECT_ROOT / "datasets/processed/eco-digital/digital_divide_index.csv")
df_risk = pd.read_csv(PROJECT_ROOT / "datasets/processed/risk/risk_exposure_index.csv")


### 3.2 Regime A: India Multi-Lens Diagnostic (Non-Executable)

This regime examines **co-existing stress signals** for India in a single year,
without attributing causality.


In [8]:
# Identify overlapping India years across available lenses
years_env = set(df_env[df_env["country"] == "India"]["year"])
years_health = set(df_health[df_health["country"] == "India"]["year"])
years_digital = set(df_digital[df_digital["country"] == "India"]["year"])

common_years = sorted(years_env & years_health & years_digital)
common_years


[]

📌 Result:
No year exists where Environmental Stress, Health Burden,
and Digital Divide are simultaneously observed for India.

Therefore, Regime A cannot be instantiated empirically
and is **documented but not executed** in Phase 3.


### 3.3 Regime B: Global Risk–Digital Co-Exposure (2021)

This regime studies how **risk exposure and digital divide**
co-occur across countries in the same year.


In [9]:
df_risk_digital = (
    df_risk[df_risk["year"] == 2021]
    .merge(df_digital[df_digital["year"] == 2021],
           on=["country", "year"],
           how="inner")
)

df_risk_digital.head(), df_risk_digital.shape


(  iso3      country  year  risk_exposure_index country_code  \
 0  AFG  Afghanistan  2021             0.340558          AFG   
 1  ALB      Albania  2021             0.124782          ALB   
 2  ARG    Argentina  2021             0.108597          ARG   
 3  AUS    Australia  2021             0.050802          AUS   
 4  AUT      Austria  2021             0.051075          AUT   
 
    digital_divide_index  
 0                   NaN  
 1              0.446383  
 2              0.504573  
 3              0.637459  
 4              0.569320  ,
 (65, 6))
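
Phase 1 reported 75 Risk Exposure rows, while the inner merge above keeps 65, so 10 risk rows found no Digital Divide match on `country` and `year`. One hedged way to audit such drops is a left merge with pandas' `indicator=True` and `validate="one_to_one"` options. The frames below are toy data and the variable names are illustrative.

```python
import pandas as pd

# Toy frames, for illustration only.
risk = pd.DataFrame({
    "country": ["A", "B", "C"],
    "year": [2021, 2021, 2021],
    "risk_exposure_index": [0.10, 0.20, 0.30],
})
digital = pd.DataFrame({
    "country": ["A", "B", "D"],
    "year": [2021, 2021, 2021],
    "digital_divide_index": [0.40, None, 0.50],
})

audit = risk.merge(
    digital,
    on=["country", "year"],
    how="left",
    validate="one_to_one",  # raises MergeError if either side repeats a key
    indicator=True,         # adds a `_merge` column: "both" or "left_only"
)

unmatched = audit.loc[audit["_merge"] == "left_only", "country"].tolist()
matched_rows = int((audit["_merge"] == "both").sum())
```

Listing `unmatched` separates rows that never joined from rows that joined but carry a missing index value (country "B" here).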

### 3.4 Compound Exposure Categorization

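Countries are classified by relative exposure on two axes, written as risk–digital:
- High–High
- High–Low
- Low–High
- Low–Low

"High" means at or above the regime median for that index; "Low" means below it.
Thresholds are median-based and regime-specific.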


In [10]:
def classify_exposure(df, col_x, col_y):
    df_valid = df.dropna(subset=[col_x, col_y]).copy()
    x_med = df_valid[col_x].median()
    y_med = df_valid[col_y].median()

    df_valid["exposure_type"] = (
        (df_valid[col_x] >= x_med).map({True: "High", False: "Low"}) + "-" +
        (df_valid[col_y] >= y_med).map({True: "High", False: "Low"})
    )
    return df_valid


df_risk_digital = classify_exposure(
    df_risk_digital,
    "risk_exposure_index",
    "digital_divide_index"
)

df_risk_digital["exposure_type"].value_counts()


exposure_type
Low-High     21
High-Low     21
High-High     7
Low-Low       6
Name: count, dtype: int64

⚠️ Countries with missing Digital Divide data are excluded from
compound classification and treated as structurally unavailable;
classification covers 55 of the 65 merged rows.
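
The same median split can be viewed as a 2×2 contingency table. The sketch below uses five invented rows and `pd.crosstab`; it is an alternative view, not the notebook's own method. Because the split uses `>=`, a value equal to the median is counted as "High", which is why 55 countries divide 28/27 on each axis in the output above.

```python
import pandas as pd

# Toy values, for illustration only.
toy = pd.DataFrame({
    "risk_exposure_index": [0.05, 0.10, 0.20, 0.40, 0.15],
    "digital_divide_index": [0.60, 0.30, 0.55, 0.35, 0.45],
})

levels = {True: "High", False: "Low"}
risk_level = (
    toy["risk_exposure_index"] >= toy["risk_exposure_index"].median()
).map(levels)
digital_level = (
    toy["digital_divide_index"] >= toy["digital_divide_index"].median()
).map(levels)

# Rows are the risk level, columns are the digital level.
table = pd.crosstab(risk_level, digital_level, rownames=["risk"], colnames=["digital"])
```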


### Regime C Handling Note (Explicit Exclusion)

Regime C (India Partial-Lens Longitudinal Signals) is **not analyzed in Phase 3**
because Phase 3 is restricted to **simultaneous, multi-lens co-existence**.

Regime C is temporal rather than compound and will be addressed
only in later distributional or narrative analysis phases.


## Phase 3 Summary: Compound Stress Characterized

### What Was Accomplished
- Index values attached within valid regimes
- Compound stress co-existence observed
- High–low exposure patterns identified

### What Was Explicitly Avoided
- Causal interpretation
- System ranking
- Governance or policy claims

### Phase Boundary Statement
Phase 3 establishes **what stresses co-exist**,
not **why they co-exist**.


## 🟦 Phase 4: Inequality & Concentration Analysis
### 🎯 Purpose

Phase 4 examines how **compound stress is distributed**, not why it exists.

This phase answers:
- Is compound exposure evenly distributed or concentrated?
- Are high-stress conditions clustered among a few countries?
- Does inequality arise from risk concentration, digital exclusion, or both?

This phase is **descriptive**, **distributional**, and **non-causal**.

---

### What This Phase DOES
- Analyze distribution shape (spread, skew, concentration)
- Quantify inequality of exposure
- Identify stress concentration patterns
- Compare within-regime inequality

---

### What This Phase DOES NOT Do
- ❌ No causal explanations
- ❌ No governance judgments
- ❌ No country rankings for performance
- ❌ No policy prescriptions

---

### 📌 Valid Regimes Used in This Phase

| Regime   | Included    | Reason                             |
| -------- | ----------- | ---------------------------------- |
| Regime B | ✅           | Global, compound (Risk + Digital)  |
| Regime A | ❌           | No empirical overlap               |
| Regime C | ⚠️ Deferred | Temporal (handled later if needed) |

**Phase 4 operates only on Regime B.**

### 4.1 Load Phase 3 Outputs (Regime B Only)

In [11]:
import pandas as pd
import numpy as np
from utils.path_setup import setup_project_path

PROJECT_ROOT = setup_project_path()

df_risk = pd.read_csv(
    PROJECT_ROOT / "datasets/processed/risk/risk_exposure_index.csv"
)
df_digital = pd.read_csv(
    PROJECT_ROOT / "datasets/processed/eco-digital/digital_divide_index.csv"
)

# Regime B: Year = 2021
df_rb = (
    df_risk[df_risk["year"] == 2021]
    .merge(
        df_digital[df_digital["year"] == 2021],
        on=["country", "year"],
        how="inner"
    )
)

df_rb = df_rb.dropna(
    subset=["risk_exposure_index", "digital_divide_index"]
).reset_index(drop=True)

df_rb.shape


(55, 6)

📌 Result: **Valid compound dataset for inequality analysis** (55 countries with both indices observed)

### 4.2 Distribution Diagnostics

#### 4.2.1 Summary Statistics

In [12]:
df_rb[["risk_exposure_index", "digital_divide_index"]].describe()


| Statistic | risk_exposure_index | digital_divide_index |
|----|----|----|
| count | 55.0 | 55.0 |
| mean | 0.146614 | 0.469087 |
| std | 0.101478 | 0.098237 |
| min | 0.028063 | 0.136719 |
| 25% | 0.064039 | 0.422991 |
| 50% | 0.12614 | 0.460119 |
| 75% | 0.195356 | 0.52047 |
| max | 0.601737 | 0.71188 |


📌 Interpretation guardrail:
- This describes **spread and skew**, not performance
- No normative ranking implied

#### 4.2.2 Distribution Skewness

In [18]:
df_rb[["risk_exposure_index", "digital_divide_index"]].skew()


risk_exposure_index     1.832834
digital_divide_index   -0.420444
dtype: float64
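
For reference, `Series.skew()` in pandas returns the bias-adjusted sample skewness (the adjusted Fisher-Pearson coefficient). The sketch below recomputes it by hand on an invented sample to make the statistic explicit. `adjusted_skew` is a hypothetical helper, and the values are not from the notebook's data.

```python
import numpy as np
import pandas as pd


def adjusted_skew(values) -> float:
    """Bias-adjusted sample skewness: sqrt(n(n-1)) / (n-2) * m3 / m2**1.5."""
    x = np.asarray(values, dtype=float)
    n = x.size
    dev = x - x.mean()
    m2 = np.mean(dev ** 2)  # biased second central moment
    m3 = np.mean(dev ** 3)  # biased third central moment
    return float(np.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5)


# Invented right-tailed sample: one value far above the rest.
sample = [0.03, 0.05, 0.06, 0.08, 0.12, 0.15, 0.60]

manual = adjusted_skew(sample)
from_pandas = float(pd.Series(sample).skew())
```

A positive value indicates a long upper tail; a negative value indicates a long lower tail.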

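📌 Risk exposure is strongly right-skewed (1.83): a few countries carry much higher exposure than the rest, so this stress is **not evenly distributed**. The Digital Divide Index is mildly left-skewed (-0.42) and shows no comparable upper tail.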

### 4.3 Concentration Analysis (Top-Heavy Exposure)

#### 4.3.1 Share of Exposure Held by Top Quantiles

In [14]:
def top_share(series, q=0.2):
    cutoff = series.quantile(1 - q)
    return series[series >= cutoff].sum() / series.sum()

concentration = pd.DataFrame({
    "Metric": ["Risk Exposure", "Digital Divide"],
    "Top 20% Share": [
        top_share(df_rb["risk_exposure_index"], 0.2),
        top_share(df_rb["digital_divide_index"], 0.2)
    ]
})

concentration


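| Metric | Top 20% Share |
|----|----|
| Risk Exposure | 0.407093 |
| Digital Divide | 0.257524 |


📌 Interpretation:
- An even distribution would put the top 20% share near 0.20
- Risk Exposure (0.41) is well above that baseline, so risk is concentrated in **a minority of countries**
- Digital Divide (0.26) is only modestly above it, indicating weak concentration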
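
One property of the quantile-based `top_share` is worth knowing when reading these numbers: because it keeps every value at or above the cutoff, ties at the cutoff inflate the share. The sketch below contrasts it with a rank-based variant on invented series. `top_share_by_rank` is a hypothetical alternative, not the notebook's method, and with continuous indices such ties are unlikely.

```python
import pandas as pd


def top_share(series: pd.Series, q: float = 0.2) -> float:
    """Notebook definition: share held by values at or above the (1 - q) quantile."""
    cutoff = series.quantile(1 - q)
    return series[series >= cutoff].sum() / series.sum()


def top_share_by_rank(series: pd.Series, q: float = 0.2) -> float:
    """Hypothetical variant: share held by exactly the k largest values."""
    k = max(1, round(q * len(series)))
    return series.nlargest(k).sum() / series.sum()


equal = pd.Series([1.0] * 10)                # perfectly even
skewed = pd.Series([1.0] * 8 + [6.0, 6.0])   # two dominant values

equal_quantile = top_share(equal)            # every value ties with the cutoff
equal_rank = top_share_by_rank(equal)
skewed_quantile = top_share(skewed)
skewed_rank = top_share_by_rank(skewed)
```

On the even series the quantile version returns 1.0 while the rank version returns the expected 0.2; on the skewed series both give 0.6.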

### 4.4 Inequality Metrics (Gini Coefficient)

#### 4.4.1 Gini Function

In [15]:
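def gini(array):
    """Gini coefficient of non-negative values (0 = equal, near 1 = concentrated)."""
    array = np.sort(array)          # ascending order, as the Lorenz curve requires
    n = len(array)
    cumulative = np.cumsum(array)   # running totals (unnormalized Lorenz curve)
    # Discrete Lorenz identity: G = (n + 1 - 2 * sum(cumulative) / total) / n
    return (n + 1 - 2 * np.sum(cumulative) / cumulative[-1]) / n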


#### 4.4.2 Compute Inequality

In [16]:
gini_results = pd.DataFrame({
    "Metric": ["Risk Exposure", "Digital Divide"],
    "Gini": [
        gini(df_rb["risk_exposure_index"].values),
        gini(df_rb["digital_divide_index"].values)
    ]
})

gini_results


| Metric | Gini |
|----|----|
| Risk Exposure | 0.353569 |
| Digital Divide | 0.11138 |


📌 Interpretation guardrail:
- Gini measures **distribution inequality**, not fairness
- High Gini ≠ moral judgment
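
A few boundary cases help calibrate what these Gini values mean. The sketch below repeats the notebook's formula (with an added `np.asarray` conversion so plain lists work, which is an assumption of this example) and evaluates it on invented inputs.

```python
import numpy as np


def gini(array) -> float:
    """Gini coefficient via the discrete Lorenz-curve identity."""
    array = np.sort(np.asarray(array, dtype=float))
    n = len(array)
    cumulative = np.cumsum(array)
    return float((n + 1 - 2 * np.sum(cumulative) / cumulative[-1]) / n)


equal = gini([1, 1, 1, 1])          # everyone holds the same amount
concentrated = gini([0, 0, 0, 1])   # one holder has everything
linear = gini([1, 2, 3, 4])
rescaled = gini([10, 20, 30, 40])   # same shape, different units
```

Perfect equality gives 0, full concentration among n = 4 gives (n - 1)/n = 0.75, and rescaling leaves the coefficient unchanged (0.25 in both cases). Scale invariance is what allows the two indices' Gini values to be placed side by side despite different magnitudes.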

### 4.5 Compound Inequality Interaction

**Question**

Do **high-risk countries also experience higher digital exclusion**?

#### 4.5.1 Correlation (Descriptive Only)

In [19]:
df_rb[["risk_exposure_index", "digital_divide_index"]].corr()


| | risk_exposure_index | digital_divide_index |
|----|----|----|
| risk_exposure_index | 1.0 | -0.532086 |
| digital_divide_index | -0.532086 | 1.0 |


📌 Guardrail:
- Correlation ≠ causation
- No directional inference allowed

📌 Additional guardrail:

The Pearson coefficient of -0.53 reflects co-distribution within a filtered regime (2021 only, 55 countries), not a structural relationship between risk exposure and digital access. Selection effects and regime constraints dominate this signal.


📌 Scale compatibility note:
- Both indices are normalized to bounded, unitless scales in their source notebooks.
- This enables distributional comparison but does not imply equivalence of magnitude or impact across domains.
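
Because risk exposure is strongly right-skewed, a Pearson coefficient can be driven by a few extreme countries. A rank-based (Spearman) coefficient is a common descriptive cross-check. The sketch below computes it as the Pearson correlation of ranks on invented data; it is a suggestion, not a result from the notebook.

```python
import pandas as pd

# Toy values with one extreme risk observation, for illustration only.
toy = pd.DataFrame({
    "risk_exposure_index": [0.03, 0.06, 0.10, 0.15, 0.60],
    "digital_divide_index": [0.70, 0.55, 0.50, 0.45, 0.40],
})

cols = ("risk_exposure_index", "digital_divide_index")

pearson = toy.corr().loc[cols]           # sensitive to the 0.60 outlier
spearman = toy.rank().corr().loc[cols]   # Pearson on ranks equals Spearman
```

In this toy case the ordering is perfectly monotone, so the rank coefficient is -1 while the Pearson value is weaker. Reporting both shows whether an association depends on magnitudes or only on ordering.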


### Phase 4 Summary: Inequality & Concentration Assessed
#### What This Phase Establishes
- Compound stress is unevenly distributed
- Risk exposure is markedly concentrated (Gini 0.35, top 20% share 0.41); the Digital Divide Index is far more even (Gini 0.11, top 20% share 0.26)
- A subset of countries bears disproportionate compound stress

#### What This Phase Does NOT Claim
- Why inequality exists
- Whether inequality is avoidable
- Whether governance is responsible
- Whether outcomes are unjust or inefficient

#### Phase Boundary Statement

Phase 4 describes **how stress is distributed**,
not **why it is distributed that way**.

All causal, structural, or normative interpretations are explicitly out of scope.

## 🟦 Phase 5: System-Level Typologies

### 🎯 Purpose
Phase 5 converts **distributional and inequality signals** (Phase 4) into **system-level typologies** that describe how compound stress manifests, without explaining why it manifests.

This phase answers:
- What **structural stress patterns** exist in the system?
- How do **risk exposure and digital exclusion co-configure**?
- Are there **distinct systemic profiles**, not rankings?

This phase is:
- **Descriptive**
- **Non-causal**
- **Non-predictive**
- **Non-normative**

---

### What This Phase DOES
- Define system stress typologies
- Classify countries into structural profiles
- Reveal patterned co-existence, not performance

---

### What This Phase DOES NOT Do
- ❌ No causality
- ❌ No country ranking
- ❌ No governance judgment
- ❌ No policy prescription
- ❌ No future inference

---

### 📌 Valid Regimes Used in This Phase
| **Regime**   | **Used** | **Reason**                            |
| -------- | ---- | --------------------------------- |
| Regime B | ✅    | Global, compound (Risk + Digital) |
| Regime A | ❌    | No empirical overlap              |
| Regime C | ❌    | Temporal, not compound            |


**Phase 5 operates exclusively on Regime B (2021).**

### 5.1 Load Regime B Compound Dataset

In [20]:
import pandas as pd
import numpy as np
from utils.path_setup import setup_project_path

PROJECT_ROOT = setup_project_path()

df_risk = pd.read_csv(
    PROJECT_ROOT / "datasets/processed/risk/risk_exposure_index.csv"
)
df_digital = pd.read_csv(
    PROJECT_ROOT / "datasets/processed/eco-digital/digital_divide_index.csv"
)

df_rb = (
    df_risk[df_risk["year"] == 2021]
    .merge(
        df_digital[df_digital["year"] == 2021],
        on=["country", "year"],
        how="inner"
    )
    .dropna(subset=["risk_exposure_index", "digital_divide_index"])
    .reset_index(drop=True)
)

df_rb.shape


(55, 6)

📌 **Result**: Clean compound dataset for typology construction

### 5.2 Typology Design Logic

Typologies are defined along **two independent axes**:

| **Axis**           | **Meaning**                         |
| -------------- | ------------------------------- |
| Risk Exposure  | Adverse event concentration     |
| Digital Divide | Access and inclusion constraint |

Each axis is split using **regime-specific medians**.

This yields **four structural system types**.
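
The two-axis median split can be sketched in vectorized form with `np.select`, as an alternative to a row-wise function. The data below are invented and the variable names are illustrative; the labels follow the four profile names used in this phase.

```python
import numpy as np
import pandas as pd

# Toy values, for illustration only.
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E"],
    "risk_exposure_index": [0.05, 0.10, 0.20, 0.40, 0.15],
    "digital_divide_index": [0.60, 0.30, 0.55, 0.35, 0.45],
})

# "High" means at or above the median on that axis.
risk_high = toy["risk_exposure_index"] >= toy["risk_exposure_index"].median()
digital_high = toy["digital_divide_index"] >= toy["digital_divide_index"].median()

conditions = [
    risk_high & digital_high,
    risk_high & ~digital_high,
    ~risk_high & digital_high,
]
labels = [
    "High Risk – High Digital Exclusion",
    "High Risk – Low Digital Exclusion",
    "Low Risk – High Digital Exclusion",
]
default_label = "Low Risk – Low Digital Exclusion"

# The first matching condition wins; rows matching none get the default.
toy["system_typology"] = np.select(conditions, labels, default=default_label)
```

The vectorized form evaluates each threshold once and avoids per-row Python calls, which matters little at 55 rows but keeps the split logic in one place.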

### 5.3 Compute Typology Thresholds

In [21]:
risk_median = df_rb["risk_exposure_index"].median()
digital_median = df_rb["digital_divide_index"].median()

risk_median, digital_median


(np.float64(0.1261401151327937), np.float64(0.4601187488225796))

📌 Thresholds are **descriptive**, not normative.

### 5.4 Assign System-Level Typologies

In [22]:
def assign_typology(row):
    if row["risk_exposure_index"] >= risk_median and row["digital_divide_index"] >= digital_median:
        return "High Risk – High Digital Exclusion"
    if row["risk_exposure_index"] >= risk_median and row["digital_divide_index"] < digital_median:
        return "High Risk – Low Digital Exclusion"
    if row["risk_exposure_index"] < risk_median and row["digital_divide_index"] >= digital_median:
        return "Low Risk – High Digital Exclusion"
    return "Low Risk – Low Digital Exclusion"

df_rb["system_typology"] = df_rb.apply(assign_typology, axis=1)

df_rb["system_typology"].value_counts()


system_typology
Low Risk – High Digital Exclusion     21
High Risk – Low Digital Exclusion     21
High Risk – High Digital Exclusion     7
Low Risk – Low Digital Exclusion       6
Name: count, dtype: int64

📌 These are **system profiles**, not country labels.

### 5.5 Typology Distribution

In [23]:
typology_summary = (
    df_rb
    .groupby("system_typology")
    .agg(
        country_count=("country", "count"),
        avg_risk=("risk_exposure_index", "mean"),
        avg_digital=("digital_divide_index", "mean")
    )
    .reset_index()
    .sort_values("country_count", ascending=False)
)

typology_summary


| system_typology | country_count | avg_risk | avg_digital |
|----|----|----|----|
| High Risk – Low Digital Exclusion | 21 | 0.238377 | 0.38571 |
| Low Risk – High Digital Exclusion | 21 | 0.065678 | 0.551302 |
| High Risk – High Digital Exclusion | 7 | 0.162257 | 0.499803 |
| Low Risk – Low Digital Exclusion | 6 | 0.090472 | 0.437325 |


📌 Shows **how stress clusters structurally**, not geographically

### 5.6 Typology Interpretation (Strictly Structural)
#### 1️⃣ High Risk – High Digital Exclusion
- Compound stress concentration
- Structural vulnerability amplification
- Multiple constraints coexist

#### 2️⃣ High Risk – Low Digital Exclusion
- Exposure present, access comparatively stronger
- Potential buffering capacity (not asserted)

#### 3️⃣ Low Risk – High Digital Exclusion
- Lower exposure but constrained access
- Latent fragility profile

#### 4️⃣ Low Risk – Low Digital Exclusion
- Lowest compound stress concentration
- Structural advantage **within this regime only**

⚠️ These are **descriptions**, not value judgments.

### 5.7 System Typologies vs Inequality Signals

In [24]:
df_rb.groupby("system_typology")[["risk_exposure_index", "digital_divide_index"]].std()


| system_typology | risk_exposure_index | digital_divide_index |
|----|----|----|
| High Risk – High Digital Exclusion | 0.028062 | 0.040279 |
| High Risk – Low Digital Exclusion | 0.0993 | 0.078118 |
| Low Risk – High Digital Exclusion | 0.028325 | 0.06635 |
| Low Risk – Low Digital Exclusion | 0.024835 | 0.011883 |


📌 Within-type standard deviations show:
- Typologies capture **structure**
- Countries within a type still differ (most visibly in High Risk – Low Digital Exclusion, risk std 0.099), so types are not a fine-grained ranking

### Phase 5 Summary: System Typologies Established

#### What This Phase Achieved
- Converted compound stress into system-level profiles
- Identified distinct structural configurations
- Preserved non-causal, non-normative framing

---

#### What This Phase Does NOT Claim
- That typologies imply governance quality
- That any profile is "better" or "worse"
- That transitions between types are predictable
- That causes are known

---

#### Phase Boundary Statement

Phase 5 defines **how compound stress configurations differ**,
not **why they differ**, not **who is responsible**,
and not **what should be done**.

All interpretation remains **structural and descriptive only**.

---