# <center>Site-level Study on EU ETS Policy Effects:</center>
## <center>Dual-Outcome Analysis (ETS CO₂ + Satellite NOx)</center>

## Research question / Hypotheses

**Core question:**
> How do EU ETS carbon market stringency interventions affect both verified CO₂ emissions and satellite-derived NOx emission proxies around major industrial emitters?

**Dual Outcomes:**
1. **ETS CO₂** (ground-truth): Verified annual emissions from EU ETS registry
2. **Satellite NOx** (proxy): Beirle-style flux-divergence NOx estimates from TROPOMI

**Sub-questions / hypotheses:**
- *H1:* Allocation shortfall (ratio < 1) induces facilities to reduce combustion, lowering both CO₂ and NOx.
- *H2:* The magnitude of emission reduction correlates with plant characteristics (fuel type, capacity) and local geography.
- *H3:* Treatment effects may differ between ground-truth CO₂ and noisy satellite NOx due to measurement error attenuation.

## Variables and Causal Structure

We work with a **plant–year panel** indexed by facility *i* and year *t*.

### Dual Outcomes
- **$Y^{CO2}_{it}$**: Verified annual CO₂ emissions from EU ETS registry (tCO₂/yr, log-transformed).
  Ground-truth administrative data with high reliability.
- **$Y^{NOx}_{it}$**: Satellite-derived NOx emission proxy (kg/s) via Beirle-style flux-divergence.
  Noisy but independent measure of combustion intensity.

### Treatment (Causal Target)
- **Policy $P_{it}$**: EU ETS stringency at the plant–year level, measured by the
  allocation ratio (free allocation / verified emissions). Values < 1 indicate the
  facility must purchase additional allowances, creating abatement incentives.

### Observed Covariates
- **Plant characteristics $X_{it}$**: Time-varying capacity (MW) and fuel mix shares.
  Used as controls for **BOTH outcomes**.
- **Geographic context $G_i$**: High-dimensional AlphaEarth embeddings (64-dim) encoding
  land use, infrastructure, and climate patterns. Used as controls for **NOx only** (see below).

### Unobserved Variables
- **$U_i$**: Plant-level time-invariant unobservables (baseline technology, combustion efficiency).
- **$U_{it}$**: **Plant-level time-varying unobservables**—the key identification challenge.
  Includes dispatch/utilization, maintenance status, operational efficiency changes.
- **$U_{rt}$**: Region–time effects (electricity demand, fuel prices, regional policy enforcement).

### Why Dual Outcomes?

1. **ETS CO₂** is ground-truth but may be subject to reporting incentives
2. **Satellite NOx** is independent but noisy (~35-45% uncertainty)
3. Consistent effects across both outcomes strengthen causal claims
4. Divergent effects may reveal measurement error attenuation in NOx

The DAG below encodes the asymmetry in the geographic controls: $G_i \to Y^{NOx}_{it}$ only, not $G_i \to Y^{CO2}_{it}$.

```mermaid
flowchart LR
  %% === UNOBSERVED ===
  subgraph Unobserved
    direction TB
    Ui["U_i: time-invariant (technology, efficiency)"]
    Uit["U_it: time-varying (dispatch, maintenance)"]
    Urt["U_rt: region–time (demand, prices)"]
  end

  %% === OBSERVED ===
  subgraph Observed Controls
    direction TB
    X["X_it: capacity, fuel mix"]
    G["G_i: AlphaEarth embeddings"]
    W["W_it: wind (measurement only)"]
  end

  %% === TREATMENT ===
  subgraph Treatment
    A["A_it: Free allocation (predetermined Feb)"]
    P["P_it: Allocation ratio = A/Y"]
  end

  %% === DUAL OUTCOMES ===
  subgraph "Dual Outcomes"
    Y_CO2["Y^CO2_it: Verified ETS emissions"]
    Y_NOx["Y^NOx_it: Satellite NOx proxy"]
  end

  %% === CAUSAL ARROWS ===

  %% Time-invariant unobserved → absorbed by facility FE
  Ui --> X
  Ui --> Y_CO2
  Ui --> Y_NOx
  Ui -.->|"absorbed by facility FE"| A

  %% Plant-level time-varying unobserved (KEY CHALLENGE)
  Uit --> Y_CO2
  Uit --> Y_NOx
  Urt --> Uit

  %% Region-time effects → absorbed by Region×Year FE
  Urt -.->|"absorbed by region×year FE"| A
  Urt --> Y_CO2
  Urt --> Y_NOx

  %% Observed controls
  X --> Y_CO2
  X --> Y_NOx
  
  %% AlphaEarth: affects satellite retrieval quality (background, terrain AMF)
  %% Does NOT affect administrative ETS reporting
  G --> Y_NOx
  
  %% Wind: ONLY affects satellite measurement (advection calculation)
  %% Does NOT affect ETS verified emissions (administrative data)
  W --> Y_NOx

  %% === TREATMENT PATHWAY (NO CYCLE) ===
  %% Key insight: A_it is PREDETERMINED (granted by Feb 28)
  %% Emissions Y_it occur DURING the year, AFTER allocation is known
  %% So: A_it → P_it → Y_it (no reverse causation in same period)
  
  A --> P
  Y_CO2 -.->|"ex-post denominator"| P
  
  %% CAUSAL EFFECTS OF INTEREST
  P ==>|"β_CO2 (causal)"| Y_CO2
  P ==>|"β_NOx (causal)"| Y_NOx

  %% Mediation: policy affects dispatch (potential mechanism)
  P -.->|"mediation via dispatch"| Uit
```

---

## Controls Included in Each Outcome

| Variable | ETS CO₂ | Satellite NOx | Reason |
|----------|------------------|------------------------|--------|
| **AlphaEarth** $G_i$ | ❌ No | ✅ Yes | ETS is administrative data; satellite retrieval depends on terrain, land use (AMF), urban background |
| **Wind** $W_{it}$ | ❌ No | ✅ Yes | ETS is reported mass balance; Beirle flux-divergence uses wind for advection calculation |
| **Capacity/Fuel** $X_{it}$ | ✅ Yes | ✅ Yes | Both outcomes reflect combustion intensity |
| **Dispatch** $U_{it}$ | ❌ No (unobserved) | ❌ No (unobserved) | Both outcomes respond to operational activity, but $U_{it}$ is unobserved and deliberately left uncontrolled |

---

## Identification Strategy

### What We Control For

| Variable | Absorbed By | Rationale |
|----------|-------------|-----------|
| $U_i$ (time-invariant) | Facility FE | Technology, location, baseline efficiency |
| $U_{rt}$ (region-time) | Region×Year FE | Demand shocks, fuel prices, regional policy |
| $X_{it}$ (observed) | Controls | Capacity, fuel mix |
| $G_i$ (geography) | AlphaEarth embeddings | Satellite retrieval context (NOx only) |

### What We Do NOT Control For: $U_{it}$

Plant-level time-varying unobservables ($U_{it}$) include dispatch, maintenance, and efficiency changes.
We deliberately do not control for these because:

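1. **Dispatch as confounder**: Regional demand → higher dispatch → more emissions → lower $P_{it}$
   (verified emissions are the ratio's denominator). The same demand → more combustion → higher NOx.
   This creates a spurious negative correlation between the allocation ratio and both outcomes.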

2. **Dispatch as mediator**: Policy affects dispatch via carbon costs in bids (higher costs → higher bids →
   lower dispatch probability). Thus $P_{it} \to U_{it} \to Y_{it}$. Controlling for $U_{it}$ blocks
   this pathway and biases $\hat{\beta}$ toward zero.

3. **Facility×Year FE infeasible**: Would absorb all within-facility-year variation, including treatment.
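
The mediator argument in point 2 can be illustrated with a small simulation. This is a sketch on synthetic data, not the study's data: the coefficients `a`, `b`, and `direct` and all variable names are assumptions chosen for illustration. It compares the policy coefficient with and without the mediator in the regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical structural coefficients (illustrative only).
a = 0.5       # policy -> dispatch
b = 0.8       # dispatch -> outcome
direct = 0.3  # policy -> outcome, not via dispatch

policy = rng.normal(size=n)
dispatch = a * policy + rng.normal(scale=0.5, size=n)
outcome = direct * policy + b * dispatch + rng.normal(scale=0.5, size=n)


def ols(y, *regressors):
    """Return OLS slope coefficients (an intercept is fitted, then dropped)."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


# Leaving dispatch out keeps the policy -> dispatch -> outcome path open.
total_effect = ols(outcome, policy)[0]
# Adding dispatch as a regressor blocks that path.
controlled_effect = ols(outcome, policy, dispatch)[0]

print(f"total effect      ~ {total_effect:.2f} (truth {direct + a * b:.2f})")
print(f"controlling for U ~ {controlled_effect:.2f} (truth {direct:.2f})")
```

In this setup the regression that conditions on dispatch recovers only the direct component, which is smaller in magnitude than the total effect.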

### Our Solution: Region×Year FE

Region×Year FE absorbs the *common regional component* of $U_{it}$ (since dispatch responds to
regional demand/prices) without estimating facility-specific parameters. The identifying variation:

> *Within the same region and year, do facilities with different allocation ratios show different emissions?*

This leaves **facility-specific deviations** in $U_{it}$ as residual confounding (e.g., idiosyncratic
outages, plant-specific demand). These are plausibly second-order and orthogonal to the allocation ratio
conditional on the capacity and fuel mix controls.
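
A minimal sketch of this identifying variation on synthetic data follows. It uses explicit dummy variables with `numpy` least squares rather than a dedicated fixed-effects package, so it is a stand-in for the estimator used in the notebook and not a copy of it. The panel dimensions, the value of `true_beta`, and the column names `region`, `ratio`, and `log_co2` are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_fac, n_regions, years = 40, 4, range(2018, 2024)
true_beta = -0.2  # assumed effect, for illustration only

facility_fe = rng.normal(size=n_fac)
region_year_fe = {(r, y): rng.normal() for r in range(n_regions) for y in years}

rows = []
for i in range(n_fac):
    region = i % n_regions
    for y in years:
        ratio = rng.uniform(0.5, 1.5)
        log_co2 = (facility_fe[i] + region_year_fe[(region, y)]
                   + true_beta * ratio + rng.normal(scale=0.02))
        rows.append({"idx": i, "region": region, "year": y,
                     "ratio": ratio, "log_co2": log_co2})
df = pd.DataFrame(rows)

# Dummy-variable version of facility FE + region x year FE.
region_year = df["region"].astype(str) + "_" + df["year"].astype(str)
X = np.column_stack([
    df["ratio"].to_numpy(),
    pd.get_dummies(df["idx"], dtype=float).to_numpy(),
    pd.get_dummies(region_year, dtype=float).to_numpy(),
])

# lstsq tolerates the collinear dummy columns; the ratio coefficient stays identified.
coef, *_ = np.linalg.lstsq(X, df["log_co2"].to_numpy(), rcond=None)
beta_hat = coef[0]
print(f"beta_hat = {beta_hat:.3f} (true {true_beta})")
```

Because region×year shocks are absorbed by their own dummies, the coefficient is estimated only from differences between facilities in the same region and year, net of each facility's own mean.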

---

## Analysis Methodology

### Treatment Definition

**Continuous Treatment**: `eu_alloc_ratio = allocated_allowances / verified_emissions`
- Ratio < 1 → Facility must purchase additional allowances (treated)
- Ratio ≥ 1 → Facility has sufficient free allocation (control/less treated)
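
As a rough illustration of the definition above, the ratio, the binary treatment flag, and the first-treatment cohort could be derived as follows. The toy values are invented, and this sketch is not the notebook's `build_treatment_variables` helper, whose internals are not shown here.

```python
import pandas as pd

# Toy plant-year records; values are invented for illustration.
df = pd.DataFrame({
    "idx":  [1, 1, 1, 2, 2, 2],
    "year": [2018, 2019, 2020, 2018, 2019, 2020],
    "allocated_allowances": [90.0, 80.0, 70.0, 120.0, 110.0, 95.0],
    "verified_emissions":   [100.0, 100.0, 100.0, 100.0, 100.0, 100.0],
})

df["eu_alloc_ratio"] = df["allocated_allowances"] / df["verified_emissions"]
df["treated"] = (df["eu_alloc_ratio"] < 1.0).astype(int)

# Cohort = first year with ratio < 1; a facility never below 1 would get NaN.
first_treated = df.loc[df["treated"] == 1].groupby("idx")["year"].min()
df["cohort"] = df["idx"].map(first_treated)

print(df[["idx", "year", "eu_alloc_ratio", "treated", "cohort"]])
```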

### Clustering Strategy

We use **NUTS2 regions** (Eurostat administrative units) for spatial clustering:
1. **Clustered standard errors** — accounts for within-region correlation in errors
2. **Region × Year fixed effects** — absorbs region-specific time shocks (electricity demand, fuel prices, policy enforcement)
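
To make the first point concrete, here is a hedged sketch of a cluster-robust (sandwich) variance estimator on synthetic data. It omits the small-sample corrections that regression packages usually apply, and the function name `cluster_robust_se` is hypothetical.

```python
import numpy as np


def cluster_robust_se(X, resid, clusters):
    """Sandwich standard errors clustered on `clusters` (no small-sample correction)."""
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        mask = clusters == g
        score = X[mask].T @ resid[mask]  # sum of x_i * u_i within the cluster
        meat += np.outer(score, score)
    return np.sqrt(np.diag(bread @ meat @ bread))


rng = np.random.default_rng(2)
n_clusters, per_cluster = 30, 10
clusters = np.repeat(np.arange(n_clusters), per_cluster)

# Regressor and error both share a cluster-level component.
x = rng.normal(size=n_clusters)[clusters] + rng.normal(size=clusters.size)
u = rng.normal(size=n_clusters)[clusters] + rng.normal(size=clusters.size)
y = 1.0 + 0.5 * x + u

X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

se_cluster = cluster_robust_se(X, resid, clusters)
# Treating every observation as its own cluster gives the heteroskedasticity-robust (HC0) case.
se_hc0 = cluster_robust_se(X, resid, np.arange(y.size))
print("clustered SE:", se_cluster)
print("HC0 SE:      ", se_hc0)
```

When errors and regressors are correlated within regions, the clustered standard errors typically exceed the unclustered ones, which is the reason for clustering at NUTS2 level.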

### Sample Definitions

| Sample | Description | Used For |
|--------|-------------|----------|
| **Full ETS** | All facilities with valid ETS emissions | TWFE (ETS CO₂) |
| **Permissive NOx** | Satellite NOx ≥ 0.03 kg/s (detection limit) | NOx outcome (main) |
| **Conservative NOx** | Satellite NOx ≥ 0.11 kg/s (Beirle standard) | NOx outcome (robustness) |

**IMPORTANT**: NOx analysis never uses unfiltered samples; every NOx specification requires detection-limit (DL) filtering.
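
The two NOx samples could be flagged as in the sketch below. The thresholds come from the table above and the flag names match those passed to the analysis functions later in the notebook, but how the flags are actually computed upstream is an assumption.

```python
import pandas as pd

DL_PERMISSIVE = 0.03    # kg/s, from the sample table
DL_CONSERVATIVE = 0.11  # kg/s, from the sample table

# Toy satellite panel; values are made up for illustration.
nox = pd.DataFrame({
    "idx": [1, 2, 3, 4],
    "beirle_nox_kg_s": [0.01, 0.05, 0.20, None],
})

# Comparisons against NaN evaluate to False, so missing retrievals are never kept.
nox["above_dl_0_03"] = nox["beirle_nox_kg_s"] >= DL_PERMISSIVE
nox["above_dl_0_11"] = nox["beirle_nox_kg_s"] >= DL_CONSERVATIVE

permissive = nox[nox["above_dl_0_03"]]
conservative = nox[nox["above_dl_0_11"]]
print(len(nox), "->", len(permissive), "(permissive) ->", len(conservative), "(conservative)")
```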

### Five Core Specifications (TWFE Only)

| # | Outcome | Sample | Embedding | Estimator |
|---|---------|--------|-----------|-----------|
| **1** | ETS CO₂ | Full | None | TWFE |
| **2** | Satellite NOx | DL ≥0.03 | PCA (10 dims) | TWFE |
| **3** | Satellite NOx | DL ≥0.03 | PLS (10 dims) | TWFE |
| **4** | Satellite NOx | DL ≥0.11 | PCA (10 dims) | TWFE |
| **5** | Satellite NOx | DL ≥0.11 | PLS (10 dims) | TWFE |
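
For the PCA rows, the 64-dimensional embeddings are reduced to 10 components before entering the regression. The sketch below shows one way to do this with a plain SVD on synthetic data. It is not the notebook's `reduce_embeddings` helper, and the function name `pca_reduce` and the simulated low-rank structure are assumptions.

```python
import numpy as np


def pca_reduce(embeddings, n_components=10):
    """Project rows onto the top principal components; return scores and variance share."""
    centered = embeddings - embeddings.mean(axis=0)
    # SVD of the centered matrix: rows of vt are the principal directions.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained = (singular_values[:n_components] ** 2).sum() / (singular_values ** 2).sum()
    return scores, explained


rng = np.random.default_rng(3)
# Synthetic stand-in for 64-dim embeddings: a low-rank signal plus noise.
latent = rng.normal(size=(200, 5))
embeddings = latent @ rng.normal(size=(5, 64)) + 0.1 * rng.normal(size=(200, 64))

scores, explained = pca_reduce(embeddings, n_components=10)
print(scores.shape, f"variance explained: {explained:.1%}")
```

PCA ignores the outcome when choosing directions, whereas PLS picks directions that covary with the NOx target, which is why the specifications report both.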

### Heterogeneity Analysis

**Split-sample** (separate regressions per group):
- Electricity sector vs other
- Urban vs rural
- By dominant fuel type (Coal, Gas, Oil, Biomass)
- Top 5 countries
- Top 5 PyPSA clusters (electricity only)

**Continuous interactions** (treatment × characteristic):
- Fuel shares: `Treatment × share_coal`, `Treatment × share_gas`, etc.
- Capacity: `Treatment × capacity_mw` (standardized)
- Urbanization: `Treatment × urban` (binary)
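
The interaction regressors listed above might be constructed as follows. This is a sketch with invented values, and the derived column names (`capacity_z`, `ratio_x_*`) are hypothetical.

```python
import pandas as pd

# Toy rows; values are invented for illustration.
df = pd.DataFrame({
    "eu_alloc_ratio": [0.8, 1.1, 0.6, 0.9],
    "share_coal":     [1.0, 0.0, 0.5, 0.2],
    "capacity_mw":    [100.0, 300.0, 500.0, 700.0],
    "in_urban_area":  [1, 0, 0, 1],
})

# Standardize capacity so the main treatment effect is read at mean capacity.
cap = df["capacity_mw"]
df["capacity_z"] = (cap - cap.mean()) / cap.std()

df["ratio_x_share_coal"] = df["eu_alloc_ratio"] * df["share_coal"]
df["ratio_x_capacity_z"] = df["eu_alloc_ratio"] * df["capacity_z"]
df["ratio_x_urban"] = df["eu_alloc_ratio"] * df["in_urban_area"]

print(df.filter(like="ratio_x_"))
```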

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

# =============================================================================
# COLUMN CONSTANTS
# =============================================================================
FAC_ID_COL = "idx"
YEAR_COL = "year"
CLUSTER_COL = "nuts2_region"
PYPSA_CLUSTER_COL = "pypsa_cluster"
ALLOC_RATIO_COL = "eu_alloc_ratio"
ETS_CO2_COL = "eu_verified_tco2"
LOG_ETS_CO2_COL = "log_ets_co2"
NOX_OUTCOME_COL = "beirle_nox_kg_s"
NOX_SE_COL = "beirle_nox_kg_s_se"
IN_URBAN_AREA_COL = "in_urban_area"
URBANIZATION_DEGREE_COL = "urbanization_degree"
IS_ELECTRICITY_COL = "is_electricity"

# =============================================================================
# DATA PATHS (adjust if running from different directory)
# =============================================================================
DATA_DIR = Path("../data/out")
STATIC_PATH = DATA_DIR / "facilities_static.parquet"
YEARLY_PATH = DATA_DIR / "facilities_yearly.parquet"
BEIRLE_PATH = DATA_DIR / "beirle_panel.parquet"

# =============================================================================
# ANALYSIS PARAMETERS
# =============================================================================

# Time range
START_YEAR = 2018
END_YEAR = 2023

# Treatment
TREATMENT_COL = ALLOC_RATIO_COL        # "eu_alloc_ratio"
TREATMENT_THRESHOLD = 1.0               # Treated if ratio < 1

# Dual Outcomes
OUTCOMES = {
    "ETS_CO2": ETS_CO2_COL,              # Verified Emissions (tCO2e)
    "Satellite_NOx": NOX_OUTCOME_COL,    # Beirle-style NOx (Yearly Mean kg/s)
}

# =============================================================================
# CONTROLS
# =============================================================================
# Base controls: applied to BOTH outcomes
CONTROLS = ["capacity_mw", "share_coal", "share_gas"]

# NOTE: AlphaEarth embeddings (emb_*) are stored in beirle_panel.parquet and are
# automatically added as controls for NOx outcome ONLY by run_dual_outcome_analysis().
# They are NOT used for ETS CO₂ because:
#   1. ETS CO₂ is administrative data - geographic context doesn't affect measurement
#   2. Geographic confounders are absorbed by Facility FE and Region×Year FE
#   3. Embeddings would control for something irrelevant to the ETS measurement
# See DAG: G_i → Y^NOx only, not G_i → Y^CO2

# Clustering: NUTS2 regions (primary)
# PyPSA clusters available for electricity heterogeneity via PYPSA_CLUSTER_COL

# Sample restrictions
MIN_YEARS_PER_FACILITY = 3

# Satellite-specific filters
SATELLITE_DETECTION_LIMIT = "permissive"  # "conservative" (0.11 kg/s) or "permissive" (0.03 kg/s)

# Heterogeneity dimensions
HETEROGENEITY_VARS = {
    "urbanization": IN_URBAN_AREA_COL,    # Urban vs rural
    "electricity": IS_ELECTRICITY_COL,     # Electricity generators (activity codes 1, 20)
    "fuel_coal": "share_coal",             # High coal (>0.5) vs other
    "capacity": "capacity_mw",             # Above/below median
}

print("Parameters configured:")
print(f"  Years: {START_YEAR}-{END_YEAR}")
print(f"  Treatment: {TREATMENT_COL} < {TREATMENT_THRESHOLD}")
print(f"  Outcomes: {list(OUTCOMES.keys())}")
print(f"  Base controls: {CONTROLS}")
print(f"  NOx-only controls: AlphaEarth embeddings (64 dims, auto-added)")
print(f"  Primary clustering: NUTS2 regions ({CLUSTER_COL})")
print(f"  Satellite DL: {SATELLITE_DETECTION_LIMIT}")
print(f"  Data paths: {DATA_DIR}")

Parameters configured:
  Years: 2018-2023
  Treatment: eu_alloc_ratio < 1.0
  Outcomes: ['ETS_CO2', 'Satellite_NOx']
  Base controls: ['capacity_mw', 'share_coal', 'share_gas']
  NOx-only controls: AlphaEarth embeddings (64 dims, auto-added)
  Primary clustering: NUTS2 regions (nuts2_region)
  Satellite DL: permissive
  Data paths: ../data/out


### Load Data (ETS + Satellite outcomes merged)

In [2]:
from data import load_analysis_panel, load_facilities_static

# Load static attributes (includes is_electricity, nuts2_region, etc.)
static = load_facilities_static(STATIC_PATH)

# Load panel with satellite outcomes included
panel = load_analysis_panel(
    static_path=STATIC_PATH,
    yearly_path=YEARLY_PATH,
    beirle_path=BEIRLE_PATH,
    include_satellite=True
)

print(f"\nPanel: {len(panel)} obs, {panel[FAC_ID_COL].nunique()} facilities")
print(f"Years: {panel[YEAR_COL].min()} - {panel[YEAR_COL].max()}")

print(f"\nStatic attributes:")
print(f"  is_electricity: {static[IS_ELECTRICITY_COL].sum()} / {len(static)} are electricity generators")
if CLUSTER_COL in static.columns:
    print(f"  {CLUSTER_COL}: {static[CLUSTER_COL].nunique()} unique regions")

print(f"\nAvailable outcomes:")
print(f"  ETS CO₂: {ETS_CO2_COL} → {panel[ETS_CO2_COL].notna().sum()} valid obs")
if NOX_OUTCOME_COL in panel.columns:
    print(f"  Satellite NOx: {NOX_OUTCOME_COL} → {panel[NOX_OUTCOME_COL].notna().sum()} valid obs")

Loaded 521 facilities from facilities_static.parquet
Loaded 521 facilities from facilities_static.parquet
Loaded 2819 obs from facilities_yearly.parquet
Loaded 1213 satellite obs from beirle_panel.parquet
Satellite outcomes: 1122 obs with valid NOx data
AlphaEarth embeddings: 64 dims, 914 obs (NOx analysis only)
Panel: 2819 obs, 521 facilities, 2018-2023

Panel: 2819 obs, 521 facilities
Years: 2018 - 2023

Static attributes:
  is_electricity: 421 / 521 are electricity generators
  nuts2_region: 82 unique regions

Available outcomes:
  ETS CO₂: eu_verified_tco2 → 2819 valid obs
  Satellite NOx: beirle_nox_kg_s → 1122 valid obs


## 2. Data Preparation and Treatment Construction

### Build Treatment Variables

In [3]:
from data import build_treatment_variables, apply_sample_filters

# Continuous: eu_alloc_ratio (already in data)
# Discrete: treated = 1 if ratio < 1, cohort = first year treated
panel = build_treatment_variables(
    panel, 
    treatment_col=TREATMENT_COL,
    threshold=TREATMENT_THRESHOLD
)

# Apply sample filters for ETS CO₂ analysis
# NOTE: apply_ets_filter=True would apply the emissions filter (≥100 ktCO₂/yr).
# That filter is specific to the ETS CO₂ outcome and should NOT be used for satellite NOx.
# It is left disabled here, so no observations are dropped on emissions size.
panel = apply_sample_filters(
    panel, 
    min_years=MIN_YEARS_PER_FACILITY,
    year_range=(START_YEAR, END_YEAR),
    require_outcome=True,
    outcome_col=ETS_CO2_COL,
    apply_ets_filter=False,  # Emissions filter disabled (see note above)
)

# Map is_electricity from static to panel
if IS_ELECTRICITY_COL in static.columns and IS_ELECTRICITY_COL not in panel.columns:
    elec_map = static.set_index(FAC_ID_COL)[IS_ELECTRICITY_COL]
    panel[IS_ELECTRICITY_COL] = panel[FAC_ID_COL].map(elec_map) # type: ignore
    print(f"\nMapped is_electricity to panel: {panel[IS_ELECTRICITY_COL].sum()} electricity facility-years")

Treatment: 457 ever-treated, 64 never-treated
Cohorts: 6
Filters: 2819 → 2819 obs (521 facilities)


In [4]:
# =============================================================================
# Panel Summary
# =============================================================================
print(f"\nFinal panel:")
print(f"  Observations: {len(panel)}")
print(f"  Facilities: {panel[FAC_ID_COL].nunique()}")
print(f"  Years: {panel[YEAR_COL].min()}-{panel[YEAR_COL].max()}")
print(f"  Treated (ever): {(panel.groupby(FAC_ID_COL)['treated'].max() > 0).sum()}") # type: ignore
print(f"  Never treated: {(panel.groupby(FAC_ID_COL)['treated'].max() == 0).sum()}") # type: ignore


Final panel:
  Observations: 2819
  Facilities: 521
  Years: 2018-2023
  Treated (ever): 457
  Never treated: 64


## 3. TWFE Continuous Specifications

Two TWFE specifications for each outcome:
1. **Spec 1**: Facility + Year FE, clustered SEs by NUTS2 region
2. **Spec 2**: Facility + Region×Year FE, clustered SEs by NUTS2 region

In [5]:
from continuous import run_ets_analysis, run_nox_analysis

# =============================================================================
# ETS CO₂ (Full Sample) — Run once
# =============================================================================
print("\n" + "=" * 70)
print("ETS CO₂ ANALYSIS (Full Sample)")
print("=" * 70)
ets_results = run_ets_analysis(
    panel, treatment_col=TREATMENT_COL, controls=CONTROLS,
    cluster_col=CLUSTER_COL, ets_col=LOG_ETS_CO2_COL
)

# =============================================================================
# Satellite NOx — Permissive DL (≥0.03 kg/s) — Main Specification
# =============================================================================
print("\n" + "=" * 70)
print("SATELLITE NOx: Permissive Detection Limit (≥0.03 kg/s)")
print("=" * 70)
nox_perm = run_nox_analysis(
    panel, treatment_col=TREATMENT_COL, controls=CONTROLS,
    cluster_col=CLUSTER_COL, nox_col=NOX_OUTCOME_COL,
    nox_dl_col="above_dl_0_03"
)

# =============================================================================
# Satellite NOx — Conservative DL (≥0.11 kg/s) — Robustness
# =============================================================================
print("\n" + "=" * 70)
print("SATELLITE NOx: Conservative Detection Limit (≥0.11 kg/s) [Robustness]")
print("=" * 70)
nox_cons = run_nox_analysis(
    panel, treatment_col=TREATMENT_COL, controls=CONTROLS,
    cluster_col=CLUSTER_COL, nox_col=NOX_OUTCOME_COL,
    nox_dl_col="above_dl_0_11"
)

# =============================================================================
# Combined Summary Table
# =============================================================================
print("\n" + "=" * 70)
print("TWFE RESULTS SUMMARY")
print("=" * 70)

# Build single combined table
rows = []

# ETS (once)
if "coefficient" in ets_results:
    rows.append({
        "Outcome": "ETS CO₂ (Full Sample)",
        "Coefficient": ets_results["coefficient"],
        "SE": ets_results["se"],
        "P-value": ets_results["pvalue"],
        "95% CI": f"[{ets_results['ci_lower']:.4f}, {ets_results['ci_upper']:.4f}]",
        "N": ets_results["n_obs"]
    })

# NOx Permissive
for key in ["satellite_pca", "satellite_pls"]:
    if key in nox_perm and "coefficient" in nox_perm[key]:
        r = nox_perm[key]
        label = "PCA" if "pca" in key else "PLS"
        rows.append({
            "Outcome": f"Satellite NOx ({label}, DL ≥0.03)",
            "Coefficient": r["coefficient"],
            "SE": r["se"],
            "P-value": r["pvalue"],
            "95% CI": f"[{r['ci_lower']:.4f}, {r['ci_upper']:.4f}]",
            "N": r["n_obs"]
        })

# NOx Conservative
for key in ["satellite_pca", "satellite_pls"]:
    if key in nox_cons and "coefficient" in nox_cons[key]:
        r = nox_cons[key]
        label = "PCA" if "pca" in key else "PLS"
        rows.append({
            "Outcome": f"Satellite NOx ({label}, DL ≥0.11)",
            "Coefficient": r["coefficient"],
            "SE": r["se"],
            "P-value": r["pvalue"],
            "95% CI": f"[{r['ci_lower']:.4f}, {r['ci_upper']:.4f}]",
            "N": r["n_obs"]
        })

display(pd.DataFrame(rows))


ETS CO₂ ANALYSIS (Full Sample)

ETS VERIFIED CO2 EMISSIONS (GROUND TRUTH)
  Sample: Full ETS panel
  Controls: base only (embeddings NOT used - absorbed by FE)

######################################################################
# Outcome: ETS CO2 (tCO2/yr, log)
######################################################################
TWFE: eu_alloc_ratio -> log_ets_co2

TWFE (Facility + Region×Year FE)
  Formula: log_ets_co2 ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas | idx + nuts2_region^year
  Cluster: nuts2_region




  Coef: -0.186267 (SE: 0.030313, p=0.0000)

SATELLITE NOx: Permissive Detection Limit (≥0.03 kg/s)

NOx Detection Limit Filter (≥0.03 kg/s): 2819 → 827 obs
PCA reduction: 64 → 10 dimensions
  Variance explained: 89.8%
  Valid observations: 677 / 827

SATELLITE NOx (PCA) — Detection Limit: ≥0.03 kg/s
  Controls: base + embeddings PCA (10 dims)

######################################################################
# Outcome: Satellite NOx (PCA, DL ≥0.03 kg/s)
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + pca_emb_06 + pca_emb_07 + pca_emb_08 + pca_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -0.000049 (SE: 0.000243, p=0.8411)
PLS training: 200 facilities with valid embeddings + target
PLS reduction: 64 → 10 dim



  Coef: -0.000114 (SE: 0.000245, p=0.6465)

SATELLITE NOx: Conservative Detection Limit (≥0.11 kg/s) [Robustness]

NOx Detection Limit Filter (≥0.11 kg/s): 2819 → 187 obs
PCA reduction: 64 → 10 dimensions
  Variance explained: 94.5%
  Valid observations: 154 / 187

SATELLITE NOx (PCA) — Detection Limit: ≥0.11 kg/s
  Controls: base + embeddings PCA (10 dims)

######################################################################
# Outcome: Satellite NOx (PCA, DL ≥0.11 kg/s)
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + pca_emb_06 + pca_emb_07 + pca_emb_08 + pca_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -0.002816 (SE: 0.000589, p=0.0174)
PLS training: 46 facilities with valid embeddings + target
PLS reductio



| | Outcome | Coefficient | SE | P-value | 95% CI | N |
|---|---------|-------------|----|---------|--------|---|
| 0 | ETS CO₂ (Full Sample) | -0.186267 | 0.030313 | 4.960499e-08 | [-0.2457, -0.1269] | 2723 |
| 1 | Satellite NOx (PCA, DL ≥0.03) | -4.9e-05 | 0.000243 | 0.841075 | [-0.0005, 0.0004] | 577 |
| 2 | Satellite NOx (PLS, DL ≥0.03) | -0.000114 | 0.000245 | 0.6464791 | [-0.0006, 0.0004] | 577 |
| 3 | Satellite NOx (PCA, DL ≥0.11) | -0.002816 | 0.000589 | 0.01740139 | [-0.0040, -0.0017] | 140 |
| 4 | Satellite NOx (PLS, DL ≥0.11) | -0.003008 | 0.000256 | 0.001324554 | [-0.0035, -0.0025] | 140 |


## 4. Callaway-Sant'Anna DiD: NOT POSSIBLE
### **These cells have been kept for completeness/future work.**

We initially planned to complement the TWFE analysis with the **Callaway & Sant'Anna (2021)** estimator, which addresses potential TWFE bias when treatment timing varies and effects are heterogeneous. However, this approach is **not feasible** for this dataset.

### The Problem: No Pre-Treatment Observations

CS-DiD requires observing units *before* they first receive treatment to establish parallel pre-trends. Defining treatment as `eu_alloc_ratio < 1`, the "cohort" is the first year a facility crosses this threshold.

**The EU ETS has operated since 2005.** By the start of our panel (2018), most facilities that would ever face allocation shortfalls were already treated:

| Cohort (First Treatment Year) | Facilities | % of Ever-Treated |
|------------------------------|-----------|-------------------|
| **2018** (first panel year) | 386 | **84.5%** |
| 2019 | 20 | 4.4% |
| 2020 | 9 | 2.0% |
| 2021 | 38 | 8.3% |
| 2022-2023 | 4 | <1% |

With **84.5% of treated facilities already treated in 2018**, there are no pre-treatment observations for the vast majority of units. The CS-DiD estimator cannot identify treatment effects without this variation.
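
The shares in the table follow directly from the cohort counts, as this short arithmetic check shows. The counts are copied from the table; the variable names are illustrative.

```python
# Facility counts per first-treatment cohort, copied from the cohort table.
cohort_counts = {"2018": 386, "2019": 20, "2020": 9, "2021": 38, "2022-2023": 4}

ever_treated = sum(cohort_counts.values())
shares = {cohort: n / ever_treated for cohort, n in cohort_counts.items()}

for cohort, share in shares.items():
    print(f"{cohort}: {share:.1%}")

# Only facilities first treated after the first panel year
# contribute any pre-treatment observations.
late_cohorts = ever_treated - cohort_counts["2018"]
print(f"ever treated: {ever_treated}, with pre-treatment years: {late_cohorts}")
```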

### Why TWFE Still Works

Our continuous TWFE specification does **not** require discrete cohort definitions. It exploits:
- **Within-facility variation** in allocation ratios over time
- **Continuous treatment intensity** rather than binary treatment status

This is valid because:
1. The allocation ratio varies year-to-year within facilities
2. Fixed effects absorb time-invariant confounders
3. Region×Year FE absorbs regional time shocks
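
Whether point 1 holds can be checked by looking at the within-facility spread of the ratio. The sketch below uses a toy panel with invented values. A facility whose ratio never changes contributes nothing to the estimate once facility fixed effects are included.

```python
import pandas as pd

# Toy panel; facility 2 has a constant ratio and so adds no identifying variation.
df = pd.DataFrame({
    "idx":  [1, 1, 1, 2, 2, 2],
    "year": [2018, 2019, 2020] * 2,
    "eu_alloc_ratio": [0.9, 0.7, 1.1, 0.8, 0.8, 0.8],
})

# Within transformation: deviation of each ratio from its facility mean.
facility_mean = df.groupby("idx")["eu_alloc_ratio"].transform("mean")
df["ratio_within"] = df["eu_alloc_ratio"] - facility_mean

within_sd = df.groupby("idx")["ratio_within"].std()
print(within_sd)
```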

### Future Work

Extending the panel backward to **2013-2017** (the early years of EU ETS Phase 3, which ran 2013-2020) would capture actual first-treatment timing for most facilities, enabling proper event-study analysis with pre-treatment observations.

In [6]:
# =============================================================================
# Validate Cohorts for Callaway-Sant'Anna
# =============================================================================
from discrete import validate_cohorts

# Ensure cohorts are properly constructed (already done in build_treatment_variables)
cohort_validation = validate_cohorts(panel, cohort_col="cohort")

Valid: 6 cohorts, 457 treated, 64 control


In [7]:
from discrete import estimate_callaway_santanna, validate_cohorts
from data import get_absorbing_treatment_sample, identify_treatment_reversers
from embedding_reduction import reduce_embeddings, get_reduced_embedding_cols

# Filter to Absorbing Treatment Sample (drop reversers)
# =============================================================================
# CS-DiD assumes absorbing treatment (once treated, always treated).
# Facilities with treatment reversals violate this assumption and must be dropped.

print("=" * 70)
print("CS-DiD SAMPLE PREPARATION: Absorbing Treatment Filter")
print("=" * 70)

# Show reversal distribution
reversal_info = identify_treatment_reversers(panel, treatment_col=TREATMENT_COL)
print("\nTreatment reversal distribution:")
print(reversal_info["n_reversals"].value_counts().sort_index().to_string())

# Filter to absorbing treatment sample
panel_csdid = get_absorbing_treatment_sample(panel, treatment_col=TREATMENT_COL)

# Validate cohorts on filtered sample
cohort_validation = validate_cohorts(panel_csdid, cohort_col="cohort")

# CS-DiD: ETS CO₂ (Absorbing Treatment Sample)
# =============================================================================
print("\n" + "=" * 70)
print("CS-DiD: ETS CO₂ (Absorbing Treatment Sample)")
print("=" * 70)

csdid_ets = estimate_callaway_santanna(
    panel_csdid, outcome_col=LOG_ETS_CO2_COL, cohort_col="cohort",
    control_group="notyettreated"  # Use not-yet-treated as controls
)

# CS-DiD: Satellite NOx — Permissive DL (≥0.03 kg/s)
# =============================================================================
print("\n" + "=" * 70)
print("CS-DiD: Satellite NOx — Permissive DL (≥0.03 kg/s)")
print("=" * 70)

# Filter to permissive DL (on absorbing treatment sample)
nox_perm_panel = panel_csdid[panel_csdid["above_dl_0_03"] == True].copy() if "above_dl_0_03" in panel_csdid.columns else panel_csdid.dropna(subset=[NOX_OUTCOME_COL])
print(f"NOx sample (DL ≥0.03, absorbing): {len(nox_perm_panel)} obs, {nox_perm_panel[FAC_ID_COL].nunique()} facilities") # type: ignore

csdid_nox_perm = {}
for emb_method in ["pca", "pls"]:
    print(f"\n--- {emb_method.upper()} Embeddings ---")
    df_reduced = reduce_embeddings(
        nox_perm_panel.copy(), method=emb_method, n_components=10, # type: ignore
        target_col=NOX_OUTCOME_COL if emb_method == "pls" else None
    )
    emb_cols = get_reduced_embedding_cols(df_reduced, method=emb_method)
    xformla = "~ " + " + ".join(emb_cols) if emb_cols else None
    
    csdid_nox_perm[emb_method] = estimate_callaway_santanna(
        df_reduced, outcome_col=NOX_OUTCOME_COL, cohort_col="cohort",
        xformla=xformla, control_group="notyettreated"
    )

# CS-DiD: Satellite NOx — Conservative DL (≥0.11 kg/s) [Robustness]
# =============================================================================
print("\n" + "=" * 70)
print("CS-DiD: Satellite NOx — Conservative DL (≥0.11 kg/s) [Robustness]")
print("=" * 70)

# Filter to conservative DL (on absorbing treatment sample)
nox_cons_panel = panel_csdid[panel_csdid["above_dl_0_11"] == True].copy() if "above_dl_0_11" in panel_csdid.columns else nox_perm_panel
print(f"NOx sample (DL ≥0.11, absorbing): {len(nox_cons_panel)} obs, {nox_cons_panel[FAC_ID_COL].nunique()} facilities") # type: ignore

csdid_nox_cons = {}
for emb_method in ["pca", "pls"]:
    print(f"\n--- {emb_method.upper()} Embeddings ---")
    df_reduced = reduce_embeddings(
        nox_cons_panel.copy(), method=emb_method, n_components=10, # type: ignore
        target_col=NOX_OUTCOME_COL if emb_method == "pls" else None
    )
    emb_cols = get_reduced_embedding_cols(df_reduced, method=emb_method)
    xformla = "~ " + " + ".join(emb_cols) if emb_cols else None
    
    csdid_nox_cons[emb_method] = estimate_callaway_santanna(
        df_reduced, outcome_col=NOX_OUTCOME_COL, cohort_col="cohort",
        xformla=xformla, control_group="notyettreated"
    )

# CS-DiD Summary Table
# =============================================================================
print("\n" + "=" * 70)
print("CS-DiD RESULTS SUMMARY (Absorbing Treatment Sample, Not-Yet-Treated Controls)")
print("=" * 70)

rows = []

# ETS
if csdid_ets and "agg_simple" in csdid_ets:
    agg = csdid_ets["agg_simple"]
    rows.append({
        "Outcome": "ETS CO₂ (Absorbing Sample)",
        "ATT": agg["att"],
        "SE": agg["se"],
        # Test for None explicitly so an ATT of exactly 0.0 still gets its CI
        "95% CI": f"[{agg['ci'][0]:.4f}, {agg['ci'][1]:.4f}]" if agg["att"] is not None else "N/A",
        "N": csdid_ets.get("n_obs", "")
    })

# NOx Permissive
for emb_method, res in csdid_nox_perm.items():
    if res and "agg_simple" in res:
        agg = res["agg_simple"]
        rows.append({
            "Outcome": f"Satellite NOx ({emb_method.upper()}, DL ≥0.03)",
            "ATT": agg["att"],
            "SE": agg["se"],
            # Test for None explicitly so an ATT of exactly 0.0 still gets its CI
            "95% CI": f"[{agg['ci'][0]:.4f}, {agg['ci'][1]:.4f}]" if agg["att"] is not None else "N/A",
            "N": res.get("n_obs", "")
        })

# NOx Conservative
for emb_method, res in csdid_nox_cons.items():
    if res and "agg_simple" in res:
        agg = res["agg_simple"]
        rows.append({
            "Outcome": f"Satellite NOx ({emb_method.upper()}, DL ≥0.11)",
            "ATT": agg["att"],
            "SE": agg["se"],
            # Test for None explicitly so an ATT of exactly 0.0 still gets its CI
            "95% CI": f"[{agg['ci'][0]:.4f}, {agg['ci'][1]:.4f}]" if agg["att"] is not None else "N/A",
            "N": res.get("n_obs", "")
        })

display(pd.DataFrame(rows))

CS-DiD SAMPLE PREPARATION: Absorbing Treatment Filter

Treatment reversal distribution:
n_reversals
0    390
1     65
2     53
3     12
4      1

Absorbing treatment filter (CS-DiD requirement):
  Total facilities: 521
  Reversers dropped: 131 (25.1%)
  Non-reversers kept: 390 (74.9%)
  Observations: 2819 → 2101
Valid: 4 cohorts, 326 treated, 64 control

CS-DiD: ETS CO₂ (Absorbing Treatment Sample)

  Dropping small cohorts (<10 units): [2019, 2020, 2021]
  Remaining cohorts: [np.int64(2018)]

Estimating CS-DiD:
  Outcome: log_ets_co2
  Control group: notyettreated
  Covariates: None
  N obs: 2101, N units: 390
Dropped 1692 units that were already treated in the first period.
  ERROR: exceptions must derive from BaseException

CS-DiD: Satellite NOx — Permissive DL (≥0.03 kg/s)
NOx sample (DL ≥0.03, absorbing): 633 obs, 162 facilities

--- PCA Embeddings ---
PCA reduction: 64 → 10 dimensions
  Variance explained: 89.9%
  Valid observations: 517 / 633

  Dropping small cohorts (<10 units

In [8]:
# =============================================================================
# Event Study Plots (CS-DiD)
# =============================================================================

if 'csdid_ets' in dir() and csdid_ets:
    print("\n" + "=" * 70)
    print("EVENT STUDY: ETS CO₂")
    print("=" * 70)
    
    if "att_gt" in csdid_ets:
        ets_model = csdid_ets["att_gt"]
        try:
            ets_model.aggte("dynamic")
            ets_model.plot_aggte()
            plt.title("Event Study: ETS CO₂ (CS-DiD)")
            plt.tight_layout()
            plt.show()
        except Exception as e:
            print(f"Could not plot ETS event study: {e}")

if 'csdid_nox_perm' in dir() and csdid_nox_perm.get("pca"):
    print("\n" + "=" * 70)
    print("EVENT STUDY: Satellite NOx (PCA, DL ≥0.03)")
    print("=" * 70)
    
    if "att_gt" in csdid_nox_perm["pca"]:
        nox_model = csdid_nox_perm["pca"]["att_gt"]
        try:
            nox_model.aggte("dynamic")
            nox_model.plot_aggte()
            plt.title("Event Study: Satellite NOx PCA (CS-DiD, DL ≥0.03)")
            plt.tight_layout()
            plt.show()
        except Exception as e:
            print(f"Could not plot NOx event study: {e}")


EVENT STUDY: ETS CO₂

EVENT STUDY: Satellite NOx (PCA, DL ≥0.03)
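
For orientation, a "dynamic" aggregation averages group-time effects ATT(g, t) by event time e = t − g. The sketch below shows that idea with cohort-size weights. The numbers and the helper `aggregate_dynamic` are hypothetical, and this is not the internals of the `aggte("dynamic")` call used above.

```python
from collections import defaultdict


def aggregate_dynamic(att_gt, cohort_sizes):
    """Average ATT(g, t) by event time e = t - g, weighting cohorts by size.

    att_gt:       {(cohort_year, calendar_year): estimated effect}
    cohort_sizes: {cohort_year: number of treated units in that cohort}
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for (g, t), att in att_gt.items():
        e = t - g
        w = cohort_sizes[g]
        weighted_sum[e] += w * att
        weight_total[e] += w
    return {e: weighted_sum[e] / weight_total[e] for e in sorted(weighted_sum)}


# Hypothetical group-time effects for two cohorts (illustrative values only).
att_gt = {
    (2018, 2017): 0.00,
    (2018, 2018): -0.10,
    (2018, 2019): -0.20,
    (2019, 2018): 0.02,
    (2019, 2019): -0.05,
}
cohort_sizes = {2018: 30, 2019: 10}

event_study = aggregate_dynamic(att_gt, cohort_sizes)
for e, att in event_study.items():
    print(f"e = {e:+d}: ATT = {att:.4f}")
```

Event times e < 0 serve as a pre-trend check, while e ≥ 0 traces how the effect evolves after treatment starts.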


## 5. Heterogeneity Analysis

Examine whether treatment effects vary systematically across facility characteristics.

### Split-Sample Analysis
Separate regressions for each subgroup:
1. **Electricity Sector** (`is_electricity`): EU ETS activity codes 1 or 20
2. **Urbanization**: Urban (SMOD ≥21) vs Rural facilities
3. **Dominant Fuel Type**: Coal, Gas, Oil, Biomass (whichever share is highest)
4. **Country**: Top 5 countries by facility count
5. **PyPSA Clusters** (electricity only): Top 5 grid regions
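
The subgroup definitions above can be sketched as follows. The record layout, field names (`smod`, `shares`, `country`), and helper names are assumptions made for illustration; only the rules themselves (highest fuel share, SMOD ≥ 21 for urban, top countries by facility count) come from the list.

```python
from collections import Counter, defaultdict

FUELS = ("coal", "gas", "oil", "biomass")


def dominant_fuel(shares):
    """Return the fuel with the highest share (ties go to the earlier entry in FUELS)."""
    return max(FUELS, key=lambda fuel: shares.get(fuel, 0.0))


def top_countries(facilities, n=5):
    """Return the n countries with the most facilities."""
    counts = Counter(f["country"] for f in facilities)
    return [country for country, _ in counts.most_common(n)]


# Hypothetical facility records (illustrative values only).
facilities = [
    {"id": 1, "country": "DE", "smod": 30, "shares": {"coal": 0.7, "gas": 0.3}},
    {"id": 2, "country": "DE", "smod": 11, "shares": {"gas": 0.9, "oil": 0.1}},
    {"id": 3, "country": "PL", "smod": 21, "shares": {"coal": 1.0}},
    {"id": 4, "country": "ES", "smod": 13, "shares": {"biomass": 0.6, "gas": 0.4}},
]

for f in facilities:
    f["dominant_fuel"] = dominant_fuel(f["shares"])
    f["urban"] = f["smod"] >= 21  # urban threshold from the list above

# Group facility ids by dominant fuel; each group would get its own regression.
by_fuel = defaultdict(list)
for f in facilities:
    by_fuel[f["dominant_fuel"]].append(f["id"])

print(dict(by_fuel))
print(top_countries(facilities, n=1))
```

Because each subgroup is estimated separately, a minimum-size check per subgroup may be worth adding before fitting; small cells leave little within-facility variation for the fixed effects to work with.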

### Continuous Interaction Analysis
Single regression with treatment × characteristic interactions:
- **Fuel shares**: Treatment effect varies continuously with coal/gas/oil/biomass composition
- **Capacity**: Treatment effect varies with facility size (standardized for interpretability)
- **Urbanization**: Treatment effect differs for urban vs rural (binary interaction)

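**Interpretation**: Given a negative baseline effect, a positive interaction coefficient means the treatment effect is weaker (less negative) for facilities with more of that characteristic; a negative interaction coefficient means it is stronger.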
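
As a worked illustration of this reading, the treatment effect at a given value z of a standardized characteristic is the baseline coefficient plus the interaction coefficient times z. The coefficients and capacities below are hypothetical, and `conditional_effect` and `standardize` are illustrative helpers, not functions from the notebook.

```python
def standardize(values):
    """Convert raw values to z-scores using the sample standard deviation."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    sd = variance ** 0.5
    return [(v - mean) / sd for v in values]


def conditional_effect(beta_treat, beta_interaction, z):
    """Treatment effect at characteristic value z in
    y = ... + beta_treat * T + beta_interaction * (T * z) + ...
    """
    return beta_treat + beta_interaction * z


# Hypothetical coefficients: negative baseline effect, positive interaction.
beta_treat = -0.20
beta_interaction = 0.05

# Hypothetical capacities in MW, standardized for interpretability.
capacities_mw = [100.0, 300.0, 500.0]
z_scores = standardize(capacities_mw)

effects = [conditional_effect(beta_treat, beta_interaction, z) for z in z_scores]
for mw, z, eff in zip(capacities_mw, z_scores, effects):
    print(f"{mw:.0f} MW (z = {z:+.1f}): effect = {eff:.2f}")
```

With these illustrative numbers the effect shrinks from -0.25 at one standard deviation below mean capacity to -0.15 at one standard deviation above, which is what "weaker (less negative)" means here.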

### Split-Sample Dimensions

In [9]:
from diagnostics import run_full_heterogeneity_analysis, display_heterogeneity_results

# Run heterogeneity for all specs: ETS CO₂ + NOx (PCA/PLS × DL permissive/conservative)
het_results = run_full_heterogeneity_analysis(
    panel,
    ets_col=LOG_ETS_CO2_COL,
    nox_col=NOX_OUTCOME_COL,
    treatment_col=TREATMENT_COL,
    base_controls=CONTROLS,
    cluster_col=CLUSTER_COL,
    electricity_col=IS_ELECTRICITY_COL,
    urban_col=IN_URBAN_AREA_COL
)

# Display results
display_heterogeneity_results(het_results)

Running heterogeneity: ETS CO₂...

######################################################################
# Outcome: log_ets_co2
######################################################################
TWFE: eu_alloc_ratio -> log_ets_co2

TWFE (Facility + Region×Year FE)
  Formula: log_ets_co2 ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas | idx + nuts2_region^year
  Cluster: nuts2_region




  Coef: -0.210786 (SE: 0.040150, p=0.0000)

######################################################################
# Outcome: log_ets_co2
######################################################################
TWFE: eu_alloc_ratio -> log_ets_co2

TWFE (Facility + Region×Year FE)
  Formula: log_ets_co2 ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -0.090196 (SE: 0.026928, p=0.0029)

######################################################################
# Outcome: log_ets_co2
######################################################################
TWFE: eu_alloc_ratio -> log_ets_co2

TWFE (Facility + Region×Year FE)
  Formula: log_ets_co2 ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas | idx + nuts2_region^year
  Cluster: nuts2_region




  Coef: -0.154435 (SE: 0.038877, p=0.0002)

######################################################################
# Outcome: log_ets_co2
######################################################################
TWFE: eu_alloc_ratio -> log_ets_co2

TWFE (Facility + Region×Year FE)
  Formula: log_ets_co2 ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -0.239520 (SE: 0.024007, p=0.0000)

######################################################################
# Outcome: log_ets_co2
######################################################################
TWFE: eu_alloc_ratio -> log_ets_co2

TWFE (Facility + Region×Year FE)
  Formula: log_ets_co2 ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas | idx + nuts2_region^year
  Cluster: nuts2_region




  Coef: -0.206039 (SE: 0.058580, p=0.0011)

######################################################################
# Outcome: log_ets_co2
######################################################################
TWFE: eu_alloc_ratio -> log_ets_co2

TWFE (Facility + Region×Year FE)
  Formula: log_ets_co2 ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -0.444630 (SE: 0.327496, p=0.2076)

######################################################################
# Outcome: log_ets_co2
######################################################################
TWFE: eu_alloc_ratio -> log_ets_co2

TWFE (Facility + Region×Year FE)
  Formula: log_ets_co2 ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas | idx + nuts2_region^year
  Cluster: nuts2_region


            1 variables dropped due to multicollinearity.
            The following variables are dropped: ['share_coal'].
            


  Coef: -0.981538 (SE: 0.174035, p=0.0000)

######################################################################
# Outcome: log_ets_co2
######################################################################
TWFE: eu_alloc_ratio -> log_ets_co2

TWFE (Facility + Region×Year FE)
  Formula: log_ets_co2 ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -0.104904 (SE: 0.012065, p=0.0000)

######################################################################
# Outcome: log_ets_co2
######################################################################
TWFE: eu_alloc_ratio -> log_ets_co2

TWFE (Facility + Region×Year FE)
  Formula: log_ets_co2 ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas | idx + nuts2_region^year
  Cluster: nuts2_region




  Coef: -0.250225 (SE: 0.034293, p=0.0000)

######################################################################
# Outcome: log_ets_co2
######################################################################
TWFE: eu_alloc_ratio -> log_ets_co2

TWFE (Facility + Region×Year FE)
  Formula: log_ets_co2 ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -0.326573 (SE: 0.048863, p=0.0000)

######################################################################
# Outcome: log_ets_co2
######################################################################
TWFE: eu_alloc_ratio -> log_ets_co2

TWFE (Facility + Region×Year FE)
  Formula: log_ets_co2 ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -0.109272 (SE: 0.014617, p=0.0001)





######################################################################
# Outcome: log_ets_co2
######################################################################
TWFE: eu_alloc_ratio -> log_ets_co2

TWFE (Facility + Region×Year FE)
  Formula: log_ets_co2 ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -0.241734 (SE: 0.059378, p=0.0096)

######################################################################
# Outcome: log_ets_co2
######################################################################
TWFE: eu_alloc_ratio -> log_ets_co2

TWFE (Facility + Region×Year FE)
  Formula: log_ets_co2 ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -1.253513 (SE: 0.169227, p=0.0003)

######################################################################
# Outcome: log_ets_co2
######################################################################
TWFE: eu_alloc_rat



  Coef: -1.089565 (SE: 0.038705, p=0.0001)

######################################################################
# Outcome: log_ets_co2
######################################################################
TWFE: eu_alloc_ratio -> log_ets_co2

TWFE (Facility + Region×Year FE)
  Formula: log_ets_co2 ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -0.261709 (SE: 0.060588, p=0.0125)
Running heterogeneity: NOx (PCA, DL≥0.03)...
PCA reduction: 64 → 10 dimensions
  Variance explained: 89.8%
  Valid observations: 677 / 827

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + pca



  Coef: -0.000027 (SE: 0.000273, p=0.9230)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + pca_emb_06 + pca_emb_07 + pca_emb_08 + pca_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -0.004022 (SE: 0.007495, p=0.6288)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + 

            1 variables dropped due to multicollinearity.
            The following variables are dropped: ['share_coal'].
            


  Coef: 0.000472 (SE: 0.000529, p=0.3806)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + pca_emb_06 + pca_emb_07 + pca_emb_08 + pca_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: 0.001821 (SE: 0.000000, p=0.0000)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + pc

            4 variables dropped due to multicollinearity.
            The following variables are dropped: ['share_coal', 'pca_emb_07', 'pca_emb_08', 'pca_emb_09'].
            
            13 variables dropped due to multicollinearity.
            The following variables are dropped: 
    capacity_mw
    share_coal
    share_gas
    pca_emb_00
    pca_emb_01
    ....
            


  Coef: -0.129673 (SE: inf, p=nan)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + pca_emb_06 + pca_emb_07 + pca_emb_08 + pca_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: 0.000055 (SE: 0.000240, p=0.8208)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + pca_emb_0

            1 variables dropped due to multicollinearity.
            The following variables are dropped: ['share_coal'].
            


  Coef: -0.000091 (SE: 0.000580, p=0.9005)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + pca_emb_06 + pca_emb_07 + pca_emb_08 + pca_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: 0.040739 (SE: 0.008334, p=0.0009)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + p




######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + pca_emb_06 + pca_emb_07 + pca_emb_08 + pca_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -0.000349 (SE: 0.000425, p=0.4379)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + pca_emb_06 + pca_emb_07 + pca_emb_08 + pca_

            3 variables dropped due to multicollinearity.
            The following variables are dropped: ['pca_emb_07', 'pca_emb_08', 'pca_emb_09'].
            


  Coef: 0.023187 (SE: 0.000000, p=0.0000)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + pca_emb_06 + pca_emb_07 + pca_emb_08 + pca_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -0.005288 (SE: 0.015939, p=0.7961)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + p

            1 variables dropped due to multicollinearity.
            The following variables are dropped: ['share_coal'].
            
            1 variables dropped due to multicollinearity.
            The following variables are dropped: ['share_gas'].
            



######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + pca_emb_06 + pca_emb_07 + pca_emb_08 + pca_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: 0.025250 (SE: 0.002698, p=0.0026)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + pca_emb_06 + pca_emb_07 + pca_emb_08 + pca_e



  Coef: 0.000424 (SE: inf, p=nan)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + pca_emb_06 + pca_emb_07 + pca_emb_08 + pca_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -0.275716 (SE: inf, p=nan)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + pca_emb_06 + pca_

            11 variables dropped due to multicollinearity.
            The following variables are dropped: 
    share_gas
    pca_emb_00
    pca_emb_01
    pca_emb_02
    pca_emb_03
    ....
            
            1 variables dropped due to multicollinearity.
            The following variables are dropped: ['share_coal'].
            


  Coef: 0.001526 (SE: 0.012911, p=0.9251)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + pca_emb_06 + pca_emb_07 + pca_emb_08 + pca_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -0.001031 (SE: inf, p=nan)
Running heterogeneity: NOx (PLS, DL≥0.03)...
PLS training: 200 facilities with valid embeddings + target
PLS reduction: 64 → 10 dimensions
  Training R² (facility-level mean beirle_nox_kg_s): 0.627
  Facilities used for training: 200
  Panel observations with valid embeddings: 677 / 827

######################################################################
# Outcome: beirle_nox_kg_s
#############################

            6 variables dropped due to multicollinearity.
            The following variables are dropped: 
    capacity_mw
    pca_emb_05
    pca_emb_06
    pca_emb_07
    pca_emb_08
    ....
            


  Coef: -0.000104 (SE: 0.000262, p=0.6943)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pls_emb_00 + pls_emb_01 + pls_emb_02 + pls_emb_03 + pls_emb_04 + pls_emb_05 + pls_emb_06 + pls_emb_07 + pls_emb_08 + pls_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -0.012772 (SE: 0.010455, p=0.3091)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pls_emb_00 + pls_emb_01 + pls_emb_02 + pls_emb_03 + pls_emb_04 + pls_emb_05 + 

            1 variables dropped due to multicollinearity.
            The following variables are dropped: ['share_coal'].
            


  Coef: 0.000435 (SE: 0.000461, p=0.3549)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pls_emb_00 + pls_emb_01 + pls_emb_02 + pls_emb_03 + pls_emb_04 + pls_emb_05 + pls_emb_06 + pls_emb_07 + pls_emb_08 + pls_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: 0.001821 (SE: 0.000000, p=0.0000)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pls_emb_00 + pls_emb_01 + pls_emb_02 + pls_emb_03 + pls_emb_04 + pls_emb_05 + pl

            4 variables dropped due to multicollinearity.
            The following variables are dropped: ['share_coal', 'pls_emb_07', 'pls_emb_08', 'pls_emb_09'].
            
            13 variables dropped due to multicollinearity.
            The following variables are dropped: 
    capacity_mw
    share_coal
    share_gas
    pls_emb_00
    pls_emb_01
    ....
            



######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pls_emb_00 + pls_emb_01 + pls_emb_02 + pls_emb_03 + pls_emb_04 + pls_emb_05 + pls_emb_06 + pls_emb_07 + pls_emb_08 + pls_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -0.000002 (SE: 0.000254, p=0.9940)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pls_emb_00 + pls_emb_01 + pls_emb_02 + pls_emb_03 + pls_emb_04 + pls_emb_05 + pls_emb_06 + pls_emb_07 + pls_emb_08 + pls_



  Coef: 0.000345 (SE: 0.000689, p=0.6255)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pls_emb_00 + pls_emb_01 + pls_emb_02 + pls_emb_03 + pls_emb_04 + pls_emb_05 + pls_emb_06 + pls_emb_07 + pls_emb_08 + pls_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -0.003070 (SE: 0.002029, p=0.2695)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pls_emb_00 + pls_emb_01 + pls_emb_02 + pls_emb_03 + pls_emb_04 + pls_emb_05 + p

            1 variables dropped due to multicollinearity.
            The following variables are dropped: ['share_coal'].
            



######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pls_emb_00 + pls_emb_01 + pls_emb_02 + pls_emb_03 + pls_emb_04 + pls_emb_05 + pls_emb_06 + pls_emb_07 + pls_emb_08 + pls_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: 0.040740 (SE: 0.008525, p=0.0010)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pls_emb_00 + pls_emb_01 + pls_emb_02 + pls_emb_03 + pls_emb_04 + pls_emb_05 + pls_emb_06 + pls_emb_07 + pls_emb_08 + pls_e



  Coef: -0.002278 (SE: 0.002013, p=0.2841)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pls_emb_00 + pls_emb_01 + pls_emb_02 + pls_emb_03 + pls_emb_04 + pls_emb_05 + pls_emb_06 + pls_emb_07 + pls_emb_08 + pls_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: 0.000234 (SE: 0.000525, p=0.6695)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pls_emb_00 + pls_emb_01 + pls_emb_02 + pls_emb_03 + pls_emb_04 + pls_emb_05 + p

            3 variables dropped due to multicollinearity.
            The following variables are dropped: ['pls_emb_07', 'pls_emb_08', 'pls_emb_09'].
            
            1 variables dropped due to multicollinearity.
            The following variables are dropped: ['share_coal'].
            


  Coef: -0.032346 (SE: 0.000000, p=0.0000)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pls_emb_00 + pls_emb_01 + pls_emb_02 + pls_emb_03 + pls_emb_04 + pls_emb_05 + pls_emb_06 + pls_emb_07 + pls_emb_08 + pls_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: 0.007446 (SE: 0.032982, p=0.8586)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pls_emb_00 + pls_emb_01 + pls_emb_02 + pls_emb_03 + pls_emb_04 + pls_emb_05 + p

            1 variables dropped due to multicollinearity.
            The following variables are dropped: ['share_gas'].
            



######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pls_emb_00 + pls_emb_01 + pls_emb_02 + pls_emb_03 + pls_emb_04 + pls_emb_05 + pls_emb_06 + pls_emb_07 + pls_emb_08 + pls_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: 0.027636 (SE: 0.004496, p=0.0087)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pls_emb_00 + pls_emb_01 + pls_emb_02 + pls_emb_03 + pls_emb_04 + pls_emb_05 + pls_emb_06 + pls_emb_07 + pls_emb_08 + pls_e

            11 variables dropped due to multicollinearity.
            The following variables are dropped: 
    share_gas
    pls_emb_00
    pls_emb_01
    pls_emb_02
    pls_emb_03
    ....
            
            1 variables dropped due to multicollinearity.
            The following variables are dropped: ['share_coal'].
            


  Coef: 0.001022 (SE: inf, p=nan)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pls_emb_00 + pls_emb_01 + pls_emb_02 + pls_emb_03 + pls_emb_04 + pls_emb_05 + pls_emb_06 + pls_emb_07 + pls_emb_08 + pls_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -0.275716 (SE: inf, p=nan)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pls_emb_00 + pls_emb_01 + pls_emb_02 + pls_emb_03 + pls_emb_04 + pls_emb_05 + pls_emb_06 + pls_

            6 variables dropped due to multicollinearity.
            The following variables are dropped: 
    capacity_mw
    pls_emb_05
    pls_emb_06
    pls_emb_07
    pls_emb_08
    ....
            
  G_adj_value = G / (G - 1)
  _adj_factor = (_N - _has_intercept) / (_N - _k - _k_fe)
  _adj_factor_within = (_N - _k_fe) / (_N - _k - _k_fe)
  _adj_factor = (_N - _has_intercept) / (_N - _k - _k_fe)
  _adj_factor_within = (_N - _k_fe) / (_N - _k - _k_fe)


  Coef: 0.059349 (SE: inf, p=nan)
Running heterogeneity: NOx (PCA, DL≥0.11)...
PCA reduction: 64 → 10 dimensions
  Variance explained: 94.5%
  Valid observations: 154 / 187

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + pca_emb_06 + pca_emb_07 + pca_emb_08 + pca_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -0.003341 (SE: 0.000386, p=0.0033)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_a



  Coef: 0.006902 (SE: 0.001480, p=0.0186)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + pca_emb_06 + pca_emb_07 + pca_emb_08 + pca_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -0.002816 (SE: 0.000589, p=0.0174)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + p



  Coef: 0.011171 (SE: 0.015400, p=0.6005)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + pca_emb_06 + pca_emb_07 + pca_emb_08 + pca_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: 0.065264 (SE: 0.000000, p=0.0000)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + pc

            4 variables dropped due to multicollinearity.
            The following variables are dropped: ['share_coal', 'pca_emb_07', 'pca_emb_08', 'pca_emb_09'].
            
  G_adj_value = G / (G - 1)
  _adj_factor = (_N - _has_intercept) / (_N - _k - _k_fe)
  _adj_factor_within = (_N - _k_fe) / (_N - _k - _k_fe)
  _adj_factor = (_N - _has_intercept) / (_N - _k - _k_fe)
  _adj_factor_within = (_N - _k_fe) / (_N - _k - _k_fe)


  Coef: 0.011425 (SE: inf, p=nan)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + pca_emb_06 + pca_emb_07 + pca_emb_08 + pca_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -0.002801 (SE: 0.000184, p=0.0417)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + pca_emb_0

  G_adj_value = G / (G - 1)


  Coef: 0.056655 (SE: 0.011230, p=0.1246)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + pca_emb_06 + pca_emb_07 + pca_emb_08 + pca_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -0.002940 (SE: inf, p=nan)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + pca_emb_0

  G_adj_value = G / (G - 1)


  Coef: 0.036190 (SE: inf, p=nan)
Running heterogeneity: NOx (PLS, DL≥0.11)...
PLS training: 46 facilities with valid embeddings + target
PLS reduction: 64 → 10 dimensions
  Training R² (facility-level mean beirle_nox_kg_s): 0.936
  Facilities used for training: 46
  Panel observations with valid embeddings: 154 / 187

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pls_emb_00 + pls_emb_01 + pls_emb_02 + pls_emb_03 + pls_emb_04 + pls_emb_05 + pls_emb_06 + pls_emb_07 + pls_emb_08 + pls_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -0.003222 (SE: 0.000283, p=0.0015)

######################################################################
# Outcome: beirle_nox_kg_s
###############################



  Coef: 0.007015 (SE: 0.002117, p=0.0453)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pls_emb_00 + pls_emb_01 + pls_emb_02 + pls_emb_03 + pls_emb_04 + pls_emb_05 + pls_emb_06 + pls_emb_07 + pls_emb_08 + pls_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -0.003008 (SE: 0.000256, p=0.0013)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pls_emb_00 + pls_emb_01 + pls_emb_02 + pls_emb_03 + pls_emb_04 + pls_emb_05 + p



  Coef: -0.003345 (SE: 0.021509, p=0.9018)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pls_emb_00 + pls_emb_01 + pls_emb_02 + pls_emb_03 + pls_emb_04 + pls_emb_05 + pls_emb_06 + pls_emb_07 + pls_emb_08 + pls_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: 0.050720 (SE: 0.000065, p=0.0008)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pls_emb_00 + pls_emb_01 + pls_emb_02 + pls_emb_03 + pls_emb_04 + pls_emb_05 + p

            4 variables dropped due to multicollinearity.
            The following variables are dropped: ['share_coal', 'pls_emb_07', 'pls_emb_08', 'pls_emb_09'].
            
  G_adj_value = G / (G - 1)
  _adj_factor = (_N - _has_intercept) / (_N - _k - _k_fe)
  _adj_factor_within = (_N - _k_fe) / (_N - _k - _k_fe)
  _adj_factor = (_N - _has_intercept) / (_N - _k - _k_fe)
  _adj_factor_within = (_N - _k_fe) / (_N - _k - _k_fe)



######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pls_emb_00 + pls_emb_01 + pls_emb_02 + pls_emb_03 + pls_emb_04 + pls_emb_05 + pls_emb_06 + pls_emb_07 + pls_emb_08 + pls_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -0.003596 (SE: 0.000047, p=0.0084)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pls_emb_00 + pls_emb_01 + pls_emb_02 + pls_emb_03 + pls_emb_04 + pls_emb_05 + pls_emb_06 + pls_emb_07 + pls_emb_08 + pls_

  G_adj_value = G / (G - 1)


  Coef: 0.078480 (SE: 0.002547, p=0.0207)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pls_emb_00 + pls_emb_01 + pls_emb_02 + pls_emb_03 + pls_emb_04 + pls_emb_05 + pls_emb_06 + pls_emb_07 + pls_emb_08 + pls_emb_09 | idx + nuts2_region^year
  Cluster: nuts2_region
  Coef: -0.003488 (SE: inf, p=nan)

######################################################################
# Outcome: beirle_nox_kg_s
######################################################################
TWFE: eu_alloc_ratio -> beirle_nox_kg_s

TWFE (Facility + Region×Year FE)
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + capacity_mw + share_coal + share_gas + pls_emb_00 + pls_emb_01 + pls_emb_02 + pls_emb_03 + pls_emb_04 + pls_emb_05 + pls_emb_0

  G_adj_value = G / (G - 1)


### ETS CO₂

| Dimension | Group | Coef | SE | P-value | N | Sig |
|---|---|---|---|---|---|---|
| Sector | Electricity | -0.2108 | 0.0402 | 0.0 | 2173 | *** |
| Sector | Other Sectors | -0.0902 | 0.0269 | 0.0029 | 443 | *** |
| Location | Urban | -0.1544 | 0.0389 | 0.0002 | 1943 | *** |
| Location | Rural | -0.2395 | 0.024 | 0.0 | 650 | *** |
| Fuel | Gas | -0.206 | 0.0586 | 0.0011 | 1197 | *** |
| Fuel | Oil | -0.4446 | 0.3275 | 0.2076 | 129 | |
| Fuel | Coal | -0.9815 | 0.174 | 0.0 | 653 | *** |
| Fuel | Biomass | -0.1049 | 0.0121 | 0.0 | 438 | *** |
| Country | FR | -0.2502 | 0.0343 | 0.0 | 920 | *** |
| Country | PL | -0.3266 | 0.0489 | 0.0 | 743 | *** |



### NOx (PCA, DL≥0.03)


| Dimension | Group | Coef | SE | P-value | N | Sig |
|---|---|---|---|---|---|---|
| Sector | Electricity | -0.0 | 0.0003 | 0.923 | 492 | |
| Sector | Other Sectors | -0.004 | 0.0075 | 0.6288 | 52 | |
| Location | Urban | 0.0005 | 0.0005 | 0.3806 | 513 | |
| Location | Rural | 0.0018 | 0.0 | 0.0 | 25 | *** |
| Interference | Isolated (<20km) | -0.1297 | inf | | 4 | |
| Interference | Interfered (≥20km) | 0.0001 | 0.0002 | 0.8208 | 559 | |
| Fuel | Gas | 0.0004 | 0.0006 | 0.516 | 237 | |
| Fuel | Oil | 0.0034 | 0.0 | 0.0 | 33 | *** |
| Fuel | Biomass | -0.0001 | 0.0006 | 0.9005 | 40 | |
| Fuel | Coal | 0.0407 | 0.0083 | 0.0009 | 166 | *** |



### NOx (PLS, DL≥0.03)


| Dimension | Group | Coef | SE | P-value | N | Sig |
|---|---|---|---|---|---|---|
| Sector | Electricity | -0.0001 | 0.0003 | 0.6943 | 492 | |
| Sector | Other Sectors | -0.0128 | 0.0105 | 0.3091 | 52 | |
| Location | Urban | 0.0004 | 0.0005 | 0.3549 | 513 | |
| Location | Rural | 0.0018 | 0.0 | 0.0 | 25 | *** |
| Interference | Isolated (<20km) | -0.1297 | inf | | 4 | |
| Interference | Interfered (≥20km) | -0.0 | 0.0003 | 0.994 | 559 | |
| Fuel | Gas | 0.0003 | 0.0007 | 0.6255 | 237 | |
| Fuel | Oil | -0.0031 | 0.002 | 0.2695 | 33 | |
| Fuel | Biomass | 0.0018 | 0.001 | 0.322 | 40 | |
| Fuel | Coal | 0.0407 | 0.0085 | 0.001 | 166 | *** |



### NOx (PCA, DL≥0.11)


| Dimension | Group | Coef | SE | P-value | N | Sig |
|---|---|---|---|---|---|---|
| Sector | Electricity | -0.0033 | 0.0004 | 0.0033 | 132 | *** |
| Location | Urban | 0.0069 | 0.0015 | 0.0186 | 128 | ** |
| Interference | Interfered (≥20km) | -0.0028 | 0.0006 | 0.0174 | 140 | ** |
| Fuel | Gas | 0.0112 | 0.0154 | 0.6005 | 68 | |
| Fuel | Coal | 0.0653 | 0.0 | 0.0 | 38 | *** |
| Fuel | Biomass | 0.0114 | inf | | 19 | |
| Country | FR | -0.0028 | 0.0002 | 0.0417 | 96 | ** |
| Country | PL | 0.0567 | 0.0112 | 0.1246 | 44 | |
| PyPSA (Elec) | FR0 9 | -0.0029 | inf | | 82 | |
| PyPSA (Elec) | PL0 2 | 0.0362 | inf | | 34 | |



### NOx (PLS, DL≥0.11)


| Dimension | Group | Coef | SE | P-value | N | Sig |
|---|---|---|---|---|---|---|
| Sector | Electricity | -0.0032 | 0.0003 | 0.0015 | 132 | *** |
| Location | Urban | 0.007 | 0.0021 | 0.0453 | 128 | ** |
| Interference | Interfered (≥20km) | -0.003 | 0.0003 | 0.0013 | 140 | *** |
| Fuel | Gas | -0.0033 | 0.0215 | 0.9018 | 68 | |
| Fuel | Coal | 0.0507 | 0.0001 | 0.0008 | 38 | *** |
| Fuel | Biomass | 0.0114 | inf | | 19 | |
| Country | FR | -0.0036 | 0.0 | 0.0084 | 96 | *** |
| Country | PL | 0.0785 | 0.0025 | 0.0207 | 44 | ** |
| PyPSA (Elec) | FR0 9 | -0.0035 | inf | | 82 | |
| PyPSA (Elec) | PL0 2 | 0.0362 | inf | | 34 | |



----------------------------------------------------------------------
Note: Negative coef = policy stringency reduces emissions
      * p<0.1, ** p<0.05, *** p<0.01
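
Several subgroup rows above report `SE: inf` with no p-value, and the log echoes the line `G_adj_value = G / (G - 1)`. One plausible reading, offered as an assumption rather than a confirmed diagnosis, is that those subsamples contain a single `nuts2_region` cluster (G = 1), so the small-sample cluster correction divides by zero; exhausted degrees of freedom in the `_adj_factor` terms would have a similar effect. The sketch below reproduces only the cluster arithmetic in plain Python. `cluster_adjustment` and `adjusted_se` are hypothetical helper names, not functions from the estimation library.

```python
import math


def cluster_adjustment(n_clusters: int) -> float:
    """Small-sample cluster correction G / (G - 1); infinite with one cluster."""
    if n_clusters < 1:
        raise ValueError("need at least one cluster")
    if n_clusters == 1:
        return math.inf  # G - 1 == 0, so the correction is undefined/infinite
    return n_clusters / (n_clusters - 1)


def adjusted_se(raw_se: float, n_clusters: int) -> float:
    """Scale a raw standard error by the square root of the variance correction."""
    return raw_se * math.sqrt(cluster_adjustment(n_clusters))


# Illustrative values only: the raw SE of 0.01 is made up for the demonstration.
for g in (1, 2, 5, 30):
    print(g, cluster_adjustment(g), adjusted_se(0.01, g))
```

If this reading is right, rows with an infinite standard error carry no inferential information, and their point estimates should not be compared with the other subgroups.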


### Continuous Interactions
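
The models in this subsection include `treat_x_*` terms and a `capacity_std` control, as the printed formulas show. The construction of those columns happens inside `run_continuous_interaction_analysis` and is not visible here, so the following is only a minimal sketch of one way such terms could be built: capacity is z-scored and each moderator is multiplied by the treatment. The column name `in_urban_area`, the use of the sample standard deviation, and the helper `add_interactions` are assumptions made for illustration.

```python
from statistics import mean, stdev


def add_interactions(rows, treatment="eu_alloc_ratio"):
    """Return copies of the rows with capacity_std and treat_x_* columns added."""
    caps = [r["capacity_mw"] for r in rows]
    mu, sd = mean(caps), stdev(caps)  # sample SD; the notebook's choice is not shown
    out = []
    for r in rows:
        new = dict(r)
        new["capacity_std"] = (r["capacity_mw"] - mu) / sd
        new["treat_x_capacity"] = r[treatment] * new["capacity_std"]
        new["treat_x_urban"] = r[treatment] * int(r["in_urban_area"])
        for fuel in ("coal", "gas", "oil", "biomass"):
            new[f"treat_x_share_{fuel}"] = r[treatment] * r[f"share_{fuel}"]
        out.append(new)
    return out


# Two made-up plant-year rows, purely for demonstration.
plants = [
    {"eu_alloc_ratio": 0.8, "capacity_mw": 100.0, "in_urban_area": True,
     "share_coal": 1.0, "share_gas": 0.0, "share_oil": 0.0, "share_biomass": 0.0},
    {"eu_alloc_ratio": 1.2, "capacity_mw": 300.0, "in_urban_area": False,
     "share_coal": 0.0, "share_gas": 1.0, "share_oil": 0.0, "share_biomass": 0.0},
]
for row in add_interactions(plants):
    print(row["capacity_std"], row["treat_x_capacity"], row["treat_x_share_coal"])
```

Standardizing capacity before interacting it is what lets the baseline coefficient be read as the effect at mean capacity.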

In [10]:
from diagnostics import run_continuous_interaction_analysis
from embedding_reduction import reduce_embeddings, get_reduced_embedding_cols

print("=" * 70)
print("CONTINUOUS INTERACTION ANALYSIS: All Specifications")
print("=" * 70)

interaction_results = {}

# 1. ETS CO₂
print("\n### [1/5] ETS CO₂")
interaction_results["ETS CO₂"] = run_continuous_interaction_analysis(
    panel, outcome_col=LOG_ETS_CO2_COL, treatment_col=TREATMENT_COL,
    controls=CONTROLS, cluster_col=CLUSTER_COL, urban_col=IN_URBAN_AREA_COL
)

# 2-5. NOx Specifications
for dl_col, dl_label in [("above_dl_0_03", "DL≥0.03"), ("above_dl_0_11", "DL≥0.11")]:
    if dl_col not in panel.columns:
        continue
    nox_sample = panel[panel[dl_col] == True].copy()
    if len(nox_sample) < 50:
        print(f"  Skipping {dl_label}: insufficient data")
        continue
    
    for emb_method in ["pca", "pls"]:
        spec_name = f"NOx ({emb_method.upper()}, {dl_label})"
        print(f"\n### [{2 + (dl_col == 'above_dl_0_11') * 2 + (emb_method == 'pls')}/5] {spec_name}")
        
        try:
            df_reduced = reduce_embeddings(
                nox_sample.copy(), method=emb_method, n_components=10, # type: ignore
                target_col=NOX_OUTCOME_COL if emb_method == "pls" else None
            )
            emb_cols = get_reduced_embedding_cols(df_reduced, method=emb_method)
            controls_with_emb = CONTROLS + emb_cols
            
            interaction_results[spec_name] = run_continuous_interaction_analysis(
                df_reduced, outcome_col=NOX_OUTCOME_COL, treatment_col=TREATMENT_COL,
                controls=controls_with_emb, cluster_col=CLUSTER_COL, urban_col=IN_URBAN_AREA_COL
            )
        except Exception as e:
            print(f"  Error: {e}")

# Display all results
print("\n" + "=" * 70)
print("INTERACTION RESULTS SUMMARY")
print("=" * 70)

for spec_name, df in interaction_results.items():
    if df is not None and len(df) > 0:
        print(f"\n### {spec_name}")
        df_display = df.copy()
        df_display["Coef"] = df_display["Coef"].apply(lambda x: f"{x:.6f}")
        df_display["SE"] = df_display["SE"].apply(lambda x: f"{x:.6f}")
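        # Missing p-values arrive as NaN in a float column, so test for NaN as well as None.
        df_display["P-value"] = df_display["P-value"].apply(
            lambda x: f"{x:.4f}" if x is not None and x == x else "N/A"  # x == x is False for NaN
        )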
        display(df_display)

print("\n" + "-" * 70)
print("Interpretation:")
print("  • Treatment (baseline): Effect at mean capacity, zero fuel shares, rural")
print("  • × Fuel: How effect changes per 1-unit (100%) increase in that fuel share")
print("  • × Capacity (std): How effect changes per 1 SD increase in capacity")
print("  • × Urban: Difference in treatment effect for urban vs rural")
print("  • Positive interaction = weaker (less negative) treatment effect")

CONTINUOUS INTERACTION ANALYSIS: All Specifications

### [1/5] ETS CO₂

Continuous Interaction Model:
  Formula: log_ets_co2 ~ eu_alloc_ratio + treat_x_share_coal + treat_x_share_gas + treat_x_share_oil + treat_x_share_biomass + treat_x_capacity + treat_x_urban + share_coal + share_gas + share_oil + share_biomass + capacity_std | idx + nuts2_region^year





### [2/5] NOx (PCA, DL≥0.03)
PCA reduction: 64 → 10 dimensions
  Variance explained: 89.8%
  Valid observations: 677 / 827

Continuous Interaction Model:
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + treat_x_share_coal + treat_x_share_gas + treat_x_share_oil + treat_x_share_biomass + treat_x_capacity + treat_x_urban + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + pca_emb_06 + pca_emb_07 + pca_emb_08 + pca_emb_09 + share_coal + share_gas + share_oil + share_biomass + capacity_std | idx + nuts2_region^year

### [3/5] NOx (PLS, DL≥0.03)
PLS training: 200 facilities with valid embeddings + target
PLS reduction: 64 → 10 dimensions
  Training R² (facility-level mean beirle_nox_kg_s): 0.627
  Facilities used for training: 200
  Panel observations with valid embeddings: 677 / 827

Continuous Interaction Model:
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + treat_x_share_coal + treat_x_share_gas + treat_x_share_oil + treat_x_share_biomass + treat_x_capacity + tre




### [4/5] NOx (PCA, DL≥0.11)
PCA reduction: 64 → 10 dimensions
  Variance explained: 94.5%
  Valid observations: 154 / 187

Continuous Interaction Model:
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + treat_x_share_coal + treat_x_share_gas + treat_x_share_oil + treat_x_share_biomass + treat_x_capacity + treat_x_urban + pca_emb_00 + pca_emb_01 + pca_emb_02 + pca_emb_03 + pca_emb_04 + pca_emb_05 + pca_emb_06 + pca_emb_07 + pca_emb_08 + pca_emb_09 + share_coal + share_gas + share_oil + share_biomass + capacity_std | idx + nuts2_region^year

### [5/5] NOx (PLS, DL≥0.11)
PLS training: 46 facilities with valid embeddings + target
PLS reduction: 64 → 10 dimensions
  Training R² (facility-level mean beirle_nox_kg_s): 0.936
  Facilities used for training: 46
  Panel observations with valid embeddings: 154 / 187

Continuous Interaction Model:
  Formula: beirle_nox_kg_s ~ eu_alloc_ratio + treat_x_share_coal + treat_x_share_gas + treat_x_share_oil + treat_x_share_biomass + treat_x_capacity + treat



### ETS CO₂

| Variable | Coef | SE | P-value | Sig |
|---|---|---|---|---|
| Treatment (baseline) | -0.04554 | 0.113433 | 0.6894 | |
| × Fuel: coal | -0.631108 | 0.237774 | 0.0099 | *** |
| × Fuel: gas | -0.233971 | 0.117956 | 0.0514 | * |
| × Fuel: oil | -0.480866 | 0.17017 | 0.0062 | *** |
| × Fuel: biomass | -0.121782 | 0.11072 | 0.2753 | |
| × Capacity (std) | 0.01517 | 0.049613 | 0.7607 | |
| × Urban | 0.070837 | 0.032371 | 0.0321 | ** |



### NOx (PCA, DL≥0.03)


| Variable | Coef | SE | P-value | Sig |
|---|---|---|---|---|
| Treatment (baseline) | -0.005689 | 0.005385 | 0.2998 | |
| × Fuel: coal | 0.020721 | 0.022743 | 0.37 | |
| × Fuel: gas | 0.001877 | 0.005524 | 0.7365 | |
| × Fuel: oil | 0.00053 | 0.005751 | 0.9273 | |
| × Fuel: biomass | 0.001488 | 0.00552 | 0.7895 | |
| × Capacity (std) | -0.001385 | 0.000328 | 0.0002 | *** |
| × Urban | 0.004496 | 0.001073 | 0.0003 | *** |



### NOx (PLS, DL≥0.03)


| Variable | Coef | SE | P-value | Sig |
|---|---|---|---|---|
| Treatment (baseline) | -0.001833 | 0.007738 | 0.8145 | |
| × Fuel: coal | 0.020418 | 0.015686 | 0.2036 | |
| × Fuel: gas | -0.001739 | 0.00751 | 0.8185 | |
| × Fuel: oil | -0.003249 | 0.007509 | 0.6685 | |
| × Fuel: biomass | -0.002277 | 0.007605 | 0.7669 | |
| × Capacity (std) | -0.001364 | 0.000453 | 0.0055 | *** |
| × Urban | 0.004228 | 0.000638 | 0.0 | *** |



### NOx (PCA, DL≥0.11)


| Variable | Coef | SE | P-value | Sig |
|---|---|---|---|---|
| Treatment (baseline) | -0.023292 | 0.172231 | 0.901 | |
| × Fuel: coal | 0.038645 | 0.161696 | 0.8265 | |
| × Fuel: gas | 0.070852 | 0.171505 | 0.7073 | |
| × Fuel: oil | 0.030733 | 0.164576 | 0.8638 | |
| × Fuel: biomass | 0.032869 | 0.18011 | 0.8668 | |
| × Capacity (std) | 0.028236 | 0.01667 | 0.1889 | |
| × Urban | 0.005217 | 0.003589 | 0.242 | |



### NOx (PLS, DL≥0.11)


| Variable | Coef | SE | P-value | Sig |
|---|---|---|---|---|
| Treatment (baseline) | -0.038115 | 0.134717 | 0.7956 | |
| × Fuel: coal | 0.065214 | 0.123089 | 0.6329 | |
| × Fuel: gas | 0.07372 | 0.137221 | 0.6284 | |
| × Fuel: oil | 0.02743 | 0.142896 | 0.86 | |
| × Fuel: biomass | 0.04562 | 0.143796 | 0.7718 | |
| × Capacity (std) | 0.024213 | 0.018051 | 0.2723 | |
| × Urban | 0.006977 | 0.003847 | 0.1674 | |



----------------------------------------------------------------------
Interpretation:
  • Treatment (baseline): Effect at mean capacity, zero fuel shares, rural
  • × Fuel: How effect changes per 1-unit (100%) increase in that fuel share
  • × Capacity (std): How effect changes per 1 SD increase in capacity
  • × Urban: Difference in treatment effect for urban vs rural
  • Positive interaction = weaker (less negative) treatment effect
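
To read the interaction tables as plant-level effects, the coefficients can be combined linearly: the implied effect of `eu_alloc_ratio` for a given plant is the baseline plus each interaction coefficient times that plant's fuel share, standardized capacity, and urban indicator. The sketch below does this with the point estimates from the first interaction table above (the `log_ets_co2` model). The function name and the two plant profiles are illustrative assumptions, and the calculation ignores the uncertainty around each coefficient.

```python
# Point estimates copied from the log_ets_co2 interaction table (standard errors ignored).
ETS_CO2_COEFS = {
    "baseline": -0.045540,
    "share_coal": -0.631108,
    "share_gas": -0.233971,
    "share_oil": -0.480866,
    "share_biomass": -0.121782,
    "capacity_std": 0.015170,
    "urban": 0.070837,
}


def implied_effect(coefs, profile):
    """Baseline plus each interaction coefficient times the plant's characteristic."""
    effect = coefs["baseline"]
    for name, value in profile.items():
        effect += coefs[name] * value
    return effect


# Hypothetical plant profiles: capacity_std = 0 means mean capacity.
rural_coal = {"share_coal": 1.0, "capacity_std": 0.0, "urban": 0}
urban_gas = {"share_gas": 1.0, "capacity_std": 0.0, "urban": 1}

print(f"rural coal plant: {implied_effect(ETS_CO2_COEFS, rural_coal):.6f}")
print(f"urban gas plant:  {implied_effect(ETS_CO2_COEFS, urban_gas):.6f}")
```

For a fully coal-fired rural plant at mean capacity the implied coefficient is about -0.677, against about -0.209 for a fully gas-fired urban plant, which shows how much of the ETS CO₂ effect is carried by the fuel interactions rather than by the baseline term.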
