# Tutorial: Causal Inference 06 - Confounding and Conditions

Audience:
- Learners who want to know when a causal estimate is trustworthy.

Prerequisites:
- Notebooks 01 to 05.

Learning goals:
- Detect imbalance between treatment and control groups.
- Read propensity-overlap diagnostics.
- Explain key conditions required for causal claims.


## Outline

1. Run covariate balance checks (SMD).
2. Run propensity diagnostics (AUC + overlap).
3. Connect diagnostics to causal conditions.
4. Exercise + extension.


In [None]:
from pathlib import Path
import sys

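# Locate the project root: fall back to the parent directory when the
# current working directory does not contain src/.
project_root = Path.cwd().resolve()
if not (project_root / "src").exists():
    project_root = project_root.parent

# Make the local package importable without installing it.
sys.path.insert(0, str(project_root / "src"))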
import pandas as pd

from causal_showcase.data import load_marketing_ab_data
from causal_showcase.diagnostics import covariate_balance_table, propensity_diagnostics

data_path = project_root / "data" / "raw" / "marketing_ab.csv"
prepared = load_marketing_ab_data(data_path)

balance_df = covariate_balance_table(prepared.X, prepared.treatment)
prop_diag = propensity_diagnostics(prepared.X, prepared.treatment)

print(f"Propensity AUC: {prop_diag.auc:.4f}")
print(f"Overlap share in [0.05, 0.95]: {prop_diag.overlap_share:.4f}")


## Step 1 - Covariate balance table

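Rule of thumb for the standardized mean difference (SMD), the gap between group means expressed in pooled standard-deviation units: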
- `|SMD| < 0.10`: usually acceptable balance.
- `|SMD| >= 0.10`: potential imbalance worth investigating.
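
To make the threshold concrete, here is a minimal sketch of one common SMD definition for a single numeric covariate. It is an assumption that `covariate_balance_table` uses this exact formula; the helper name `standardized_mean_difference` and the toy ages are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """SMD = (mean_t - mean_c) / sqrt((var_t + var_c) / 2)."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        # Both groups are constant, so the ratio is undefined.
        return float("nan")
    return (mean(treated) - mean(control)) / pooled_sd


# Toy values, invented for illustration only.
age_treated = [34, 41, 29, 50, 38]
age_control = [33, 40, 30, 48, 39]

smd = standardized_mean_difference(age_treated, age_control)
print(f"SMD for age: {smd:+.3f}")
print("Flag for review:", abs(smd) >= 0.10)
```

Dividing by the pooled standard deviation makes the value unit-free, so the same `0.10` cutoff can be applied to covariates measured on different scales.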


In [None]:
balance_df.head(15)


## Step 2 - Quick diagnostic interpretation

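AUC interpretation (for a model that predicts treatment from covariates):
- Near `0.50`: treatment is hard to predict from observed `X`, as expected under randomization.
- Well above `0.60`: treatment assignment may depend on observed features.

Overlap share is the fraction of units whose propensity score falls in `[0.05, 0.95]`; a low share points to a positivity problem.

These cutoffs are heuristics, not formal tests. In observational data, a high AUC combined with covariate imbalance often signals confounding risk.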
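
As a rough illustration of what these two numbers measure, the sketch below computes an AUC and an overlap share from a list of propensity scores. It assumes the scores are already estimated; the helper names `rank_auc` and `overlap_share` and the toy data are hypothetical, and `propensity_diagnostics` may compute its values differently.

```python
def rank_auc(scores, treatment):
    """AUC as the probability that a random treated unit has a higher
    score than a random control unit (ties count as one half)."""
    treated = [s for s, t in zip(scores, treatment) if t == 1]
    control = [s for s, t in zip(scores, treatment) if t == 0]
    wins = sum(
        1.0 if st > sc else 0.5 if st == sc else 0.0
        for st in treated
        for sc in control
    )
    return wins / (len(treated) * len(control))


def overlap_share(scores, low=0.05, high=0.95):
    """Share of units whose propensity score lies in [low, high]."""
    return sum(low <= s <= high for s in scores) / len(scores)


# Hypothetical propensity scores and treatment flags, invented for illustration.
scores = [0.48, 0.52, 0.97, 0.45, 0.44, 0.50, 0.03, 0.51]
treatment = [0, 1, 1, 0, 1, 0, 0, 1]

print(f"AUC: {rank_auc(scores, treatment):.4f}")
print(f"Overlap share: {overlap_share(scores):.4f}")
```

The pairwise comparison is quadratic in sample size, which is fine for a teaching example but too slow for large datasets.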


In [None]:
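def diagnostic_summary(balance: pd.DataFrame, auc: float, overlap_share: float) -> pd.DataFrame:
    """Summarize balance, AUC, and overlap with heuristic labels."""
    # Count features whose absolute SMD reaches the 0.10 rule-of-thumb threshold.
    high_imbalance = int((balance["abs_smd"] >= 0.10).sum())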
    if auc < 0.55:
        auc_flag = "low dependence"
    elif auc < 0.70:
        auc_flag = "moderate dependence"
    else:
        auc_flag = "high dependence"

    if overlap_share > 0.9:
        overlap_flag = "good overlap"
    elif overlap_share > 0.7:
        overlap_flag = "moderate overlap"
    else:
        overlap_flag = "poor overlap"

    return pd.DataFrame(
        {
            "metric": ["high_imbalance_features", "propensity_auc", "overlap_share"],
            "value": [high_imbalance, auc, overlap_share],
            "interpretation": [
                "count of features with |SMD| >= 0.10",
                auc_flag,
                overlap_flag,
            ],
        }
    )

diagnostic_summary(balance_df, prop_diag.auc, prop_diag.overlap_share)


## Conditions checklist (plain language)

Before trusting causal conclusions, ask:
1. **Consistency**: does treatment mean the same thing for everyone?
2. **Exchangeability**: after controlling for measured `X`, are the treated and control groups comparable?
3. **Positivity**: for each type of user, are both treatment and control possible?
4. **No interference**: one person's treatment should not directly change another person's outcome.
5. **Correct temporal order**: treatment and the covariates you adjust for must be measured before the outcome.
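
Of the conditions above, positivity is the easiest to check directly in data. The sketch below lists strata that contain only treated or only control units. The helper name `strata_missing_an_arm` and the segment labels are hypothetical and not part of `causal_showcase`.

```python
from collections import defaultdict


def strata_missing_an_arm(strata, treatment):
    """Return strata in which only treated or only control units appear."""
    arms_seen = defaultdict(set)
    for stratum, t in zip(strata, treatment):
        arms_seen[stratum].add(t)
    return sorted(s for s, arms in arms_seen.items() if arms != {0, 1})


# Hypothetical user segments and treatment flags, invented for illustration.
segment = ["new", "new", "returning", "returning", "vip", "vip"]
treated = [1, 0, 1, 0, 1, 1]

violations = strata_missing_an_arm(segment, treated)
print("Strata without both arms:", violations)
```

A stratum that appears in the result has no comparison group, so any effect estimate for it rests on extrapolation rather than observed data.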


In [None]:
# Exercise scaffold:
# Pick one feature with high |SMD| and propose one strategy to reduce confounding risk.

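# Sort explicitly so the first row is the most imbalanced feature,
# regardless of how balance_df is ordered.
candidate_feature = balance_df.sort_values("abs_smd", ascending=False).iloc[0]["feature"]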
print("Feature to investigate:", candidate_feature)
print("Your strategy: <write here>")
