# 02_rebate_costs_2024.ipynb

## Part A — What we are doing

We compute the **statewide cost** of the VAT rebate for California in 2024 and break it down by:
- **No phase-out** vs **With phase-out**
- **Equivalized income deciles** (AGI ÷ household_size)
- **Filing status** (Single vs. Married households)

**Outputs**
- `outputs/vat/rebate_cost_2024.csv` — statewide totals (no-phase & with-phase)
- `outputs/vat/rebate_cost_by_decile_2024.csv` — totals by decile
- `outputs/vat/rebate_cost_by_status_2024.csv` — totals by Single vs Married

**Why this matters**
- These are the primary **cost figures** used to calibrate VAT rates and to report budget impacts.

---

## Part B — Inputs & dependencies

- **Reads:** `intermediate/ca_panel_2024.(parquet|csv)` from notebook `01`.
- Requires the fields created in `01`:
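  - `household_weight` (the code also accepts `weight` or `hh_weight`), `household_agi`, `household_size`
  - `consumption_allowance`, `rebate_after_phaseout` (if either is absent, this notebook recomputes it with the `vat_rebate` helpers; the allowance then needs `size_bucket` and `is_married_couple`, and the phase-out needs `household_agi`)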
  - `filing_status` (for the by-status table)
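A quick way to fail early when a Step 01 panel is incomplete is to compare its columns against the list above. The sketch below uses a hypothetical helper (`check_panel_columns` is not part of `vat_rebate`); it is stricter than the notebook code, which also accepts `weight`/`hh_weight` and can recompute the allowance and phase-out columns.

```python
from __future__ import annotations

import pandas as pd

# Columns Part B says Step 01 must provide (hypothetical constant for this sketch).
REQUIRED_COLUMNS = {
    "household_weight", "household_agi", "household_size",
    "consumption_allowance", "rebate_after_phaseout",
}
# filing_status only feeds the by-status table, so it is treated as optional.
OPTIONAL_COLUMNS = {"filing_status"}


def check_panel_columns(df: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Return (missing_required, missing_optional), each sorted alphabetically."""
    cols = set(df.columns)
    return sorted(REQUIRED_COLUMNS - cols), sorted(OPTIONAL_COLUMNS - cols)
```

Raising on a non-empty first list and merely logging the second mirrors how the notebook treats `filing_status` as optional.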

---

## Part C — Equivalized deciles (weighted)

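We define **equivalized income** as:

$$\mathrm{equiv\_income} = \frac{\mathrm{household\_agi}}{\max(\mathrm{household\_size},\ 1)}$$

Household size is floored at 1 to avoid division by zero. We then compute **weighted deciles (10 bins)** using `household_weight`. The method: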
1. Sort households by `equiv_income`.
2. Compute cumulative weight share.
3. Cut at 10 equal-weight breakpoints (0.1, 0.2, …, 1.0).

We assign each household a decile label from `1` (lowest equivalized income) to `10` (highest).
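The three steps above can be sketched in plain NumPy/pandas as follows. This is an illustrative stand-in, not the `vr.add_weighted_deciles` helper the notebook actually calls; both function names are hypothetical, and the sketch assumes incomes have no missing values and total weight is positive. A household gets bin *k* when its cumulative weight share falls in ((k-1)/10, k/10].

```python
import numpy as np
import pandas as pd


def equivalized_income(agi: pd.Series, size: pd.Series) -> pd.Series:
    """AGI divided by household size, with size floored at 1."""
    return agi.astype(float) / np.maximum(size.astype(float), 1.0)


def weighted_deciles(income: pd.Series, weight: pd.Series, n_bins: int = 10) -> pd.Series:
    """Assign bins 1..n_bins so each bin holds about 1/n_bins of total weight."""
    w = weight.to_numpy(dtype=float)
    total = w.sum()
    if total <= 0:
        raise ValueError("Total weight must be positive.")
    # Step 1: sort households by income (stable sort, so ties keep their row order).
    order = np.argsort(income.to_numpy(dtype=float), kind="mergesort")
    # Step 2: cumulative weight share in sorted order, ending at 1.0.
    cum_share = np.cumsum(w[order]) / total
    # Step 3: cut at 1/n, 2/n, ..., 1.0; a share in ((k-1)/n, k/n] maps to bin k.
    # The tiny epsilon stops rounding noise (e.g. 3.0000000000000004) from pushing
    # a household sitting exactly on a breakpoint into the next bin.
    sorted_bins = np.clip(np.ceil(cum_share * n_bins - 1e-9), 1, n_bins).astype(int)
    # Scatter bins back to the original row order.
    bins = np.empty(len(w), dtype=int)
    bins[order] = sorted_bins
    return pd.Series(bins, index=income.index, name="decile")
```

On a toy panel of ten equally weighted households, each decile receives exactly one household. Households with tied incomes that straddle a breakpoint can be split across two adjacent deciles; the real helper may treat ties differently.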

---

## Part D — Statewide totals & consistency checks

We aggregate, using `household_weight`:
- **No-phase total**: sum of `consumption_allowance` (weighted).  
- **With-phase total**: sum of `rebate_after_phaseout` (weighted).

We also compute these totals **by decile** and **by filing status**.
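The weighted aggregation amounts to multiplying each household's amount by its weight and summing, either over the whole panel or within groups. A minimal sketch, assuming the Part B column names; `weighted_totals` is a hypothetical helper, and it computes the products directly rather than through `vr.weighted_sum`.

```python
import pandas as pd


def weighted_totals(df: pd.DataFrame, by=None, weight_col: str = "household_weight"):
    """Weighted sums of both rebate measures, statewide (by=None) or per group."""
    w = df[weight_col].astype(float)
    tmp = pd.DataFrame({
        "total_no_phaseout": df["consumption_allowance"].astype(float) * w,
        "total_phaseout": df["rebate_after_phaseout"].astype(float) * w,
        "households_weighted": w,
    })
    if by is None:
        return tmp.sum()  # Series holding the three statewide totals
    tmp[by] = df[by]
    # Note: groupby drops rows whose grouping value is missing (dropna=True default).
    return tmp.groupby(by, as_index=False).sum()
```

Passing `by="decile"` or `by="filing_status"` gives the two breakdown tables. Because `groupby` drops rows with a missing key, a household without a decile silently disappears from the decile table, which is one way decile totals can fall short of the statewide total.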

**Consistency checks**
- **With-phase ≤ No-phase** statewide and by decile.
- **Sum across deciles equals statewide totals** (both measures).
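The two consistency checks can be written as a function that collects every failure instead of stopping at the first one, which makes a failing run easier to diagnose. This is a sketch that assumes the column names of this notebook's decile table; `consistency_failures` and its tolerance defaults are assumptions, not part of `vat_rebate`.

```python
import numpy as np
import pandas as pd


def consistency_failures(statewide, by_decile: pd.DataFrame,
                         rtol: float = 1e-9, atol: float = 1e-6) -> list:
    """Return human-readable failures for the Part D checks (empty list if all pass)."""
    failures = []
    # Check 1a: with-phase <= no-phase statewide.
    if statewide["total_phaseout"] > statewide["total_no_phaseout"] + atol:
        failures.append("statewide: with-phase exceeds no-phase")
    # Check 1b: with-phase <= no-phase within every decile.
    over = by_decile["total_phaseout"] > by_decile["total_no_phaseout"] + atol
    for d in by_decile.loc[over, "decile"]:
        failures.append(f"decile {d}: with-phase exceeds no-phase")
    # Check 2: decile totals add back to the statewide totals, for both measures.
    for col in ("total_no_phaseout", "total_phaseout"):
        dec_sum = float(by_decile[col].sum())
        if not np.isclose(dec_sum, statewide[col], rtol=rtol, atol=atol):
            failures.append(f"{col}: deciles sum to {dec_sum:,.2f}, statewide is {statewide[col]:,.2f}")
    return failures
```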

---

## Part E — Deliverables & acceptance checks

**Files written**
- `outputs/vat/rebate_cost_2024.csv`
- `outputs/vat/rebate_cost_by_decile_2024.csv`
- `outputs/vat/rebate_cost_by_status_2024.csv`

**Acceptance checks**
- No missing values in output totals.
- Decile totals **match** statewide totals (to floating-point tolerance).
- With-phase is **never larger** than no-phase.
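The acceptance checks can also be run against the tables as written to disk, so they catch problems introduced at save time. A sketch, assuming the column names the Step 02 cell writes (`no_phaseout_total`/`phaseout_total` in the statewide file, `total_no_phaseout`/`total_phaseout` in the decile file); `acceptance_failures` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def acceptance_failures(statewide: pd.DataFrame, by_decile: pd.DataFrame) -> list:
    """Check the Part E acceptance rules on the saved output tables."""
    failures = []
    # Rule 1: no missing values in any output total.
    for name, frame in (("statewide", statewide), ("by_decile", by_decile)):
        if frame.isna().to_numpy().any():
            failures.append(f"{name}: contains missing values")
    # The statewide file names its columns differently from the decile file.
    pairs = {"no_phaseout_total": "total_no_phaseout", "phaseout_total": "total_phaseout"}
    # Rule 2: decile totals match statewide totals to floating-point tolerance.
    for s_col, d_col in pairs.items():
        state_val = float(statewide[s_col].iloc[0])
        dec_sum = float(by_decile[d_col].sum())  # NaNs are skipped by .sum()
        if not np.isclose(dec_sum, state_val):
            failures.append(f"{d_col}: deciles sum to {dec_sum:,.2f} vs statewide {state_val:,.2f}")
    # Rule 3: with-phase is never larger than no-phase, statewide or in any decile.
    if (statewide["phaseout_total"] > statewide["no_phaseout_total"]).any():
        failures.append("statewide: with-phase exceeds no-phase")
    if (by_decile["total_phaseout"] > by_decile["total_no_phaseout"]).any():
        failures.append("by_decile: with-phase exceeds no-phase in at least one decile")
    return failures
```

To check the real outputs, load `rebate_cost_2024.csv` and `rebate_cost_by_decile_2024.csv` with `pd.read_csv` from `../outputs/vat/` and confirm the returned list is empty.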

---

## Part F — Troubleshooting

- **Totals don’t add up**:  
  Ensure decile assignment uses the same weight column as the statewide totals and that no households are dropped after deciles are computed (pandas `groupby` silently excludes rows whose `decile` is missing).
- **Unexpectedly high totals**:  
  Re-check the negative-AGI exclusion in `01`, confirm that no households are double-counted, and confirm that weights are household-level.
- **Missing panel file**:  
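  The code looks only at relative paths under `../intermediate/`; if the file lives elsewhere, point `parq`/`csv` at an absolute path (Windows example):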
  `C:\Users\Ali.Melad\Dropbox\Ali Work\Kyle\California VAT\policy_engile_cali_v2\intermediate\ca_panel_2024.csv`
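For the first symptom, it helps to measure how much weight and rebate cost sit in households without a decile, since pandas `groupby` excludes rows with a missing key by default. A diagnostic sketch; `decile_gap_report` is a hypothetical helper and assumes the notebook's normalized `weight` column.

```python
import pandas as pd


def decile_gap_report(
    df: pd.DataFrame,
    weight_col: str = "weight",
    decile_col: str = "decile",
    value_cols: tuple = ("consumption_allowance", "rebate_after_phaseout"),
) -> dict:
    """Summarize households that a groupby on the decile column would silently drop."""
    unassigned = df[decile_col].isna()
    w = df[weight_col].astype(float)
    total_weight = float(w.sum())
    missing_weight = float(w[unassigned].sum())
    report = {
        "rows_unassigned": int(unassigned.sum()),
        "weight_unassigned": missing_weight,
        "weight_share_unassigned": missing_weight / total_weight if total_weight > 0 else float("nan"),
    }
    # Weighted cost missing from the decile table, per rebate measure.
    for col in value_cols:
        report[f"{col}_unassigned"] = float((df.loc[unassigned, col].astype(float) * w[unassigned]).sum())
    return report
```

If `rows_unassigned` is zero but totals still disagree, the gap is more likely a weight mismatch between the statewide and decile calculations.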


```python
# 02 — Rebate costs (2024 only; no reweighting here — weights already deflated in Step 01)
import importlib.util
import os
import time

import numpy as np
import pandas as pd

t0 = time.time()
print("Step 02 start.")

# Load the vat_rebate helpers straight from the project's policy/ folder
vat_path = os.path.abspath("../policy/vat_rebate.py")
print("Loading vat_rebate from:", vat_path)
spec = importlib.util.spec_from_file_location("vat_rebate", vat_path)
vr = importlib.util.module_from_spec(spec)
spec.loader.exec_module(vr)
print("Loaded:", vr.__file__)

# Load the panel from Step 01 (prefer parquet, fall back to CSV)
parq = "../intermediate/ca_panel_2024.parquet"
csv = "../intermediate/ca_panel_2024.csv"
panel_path = parq if os.path.exists(parq) else (csv if os.path.exists(csv) else None)
if panel_path is None:
    raise FileNotFoundError("Missing panel; run Step 01 first to create ca_panel_2024.(parquet|csv)")
print("Reading:", panel_path)

df = pd.read_parquet(panel_path) if panel_path.endswith(".parquet") else pd.read_csv(panel_path)
print("Panel shape:", df.shape)
print("Columns:", list(df.columns))

# Normalize the weight column into df["weight"]; unparseable or missing weights become 0
if "weight" in df.columns:
    wcol = "weight"
else:
    wcol = next((c for c in df.columns if c.lower() in ("household_weight", "weight", "hh_weight")), None)
    if wcol is None:
        raise KeyError("No weight column found (looked for household_weight/weight/hh_weight).")
df["weight"] = pd.to_numeric(df[wcol], errors="coerce").fillna(0.0)

# Quick diagnostic: weighted household count after Step 01's 11% deflator
hh_total = float(df["weight"].sum())
print(f"[diag] Weighted CA households (2024; after Step 01 deflator): {hh_total:,.0f}")

# Ensure the allowance and phase-out columns exist (compute them if Step 01 did not)
if "consumption_allowance" not in df.columns:
    missing = sorted({"size_bucket", "is_married_couple"} - set(df.columns))
    if missing:
        raise KeyError(f"Missing {missing} required to compute allowance.")
    df = vr.compute_allowance(df)

if "rebate_after_phaseout" not in df.columns:
    if "household_agi" not in df.columns:
        raise KeyError("household_agi missing; cannot compute phaseout.")
    df = vr.apply_phaseout(df)

# Statewide totals (weighted)
w = df["weight"].astype(float)
total_no = vr.weighted_sum(df["consumption_allowance"].astype(float), w)
total_ph = vr.weighted_sum(df["rebate_after_phaseout"].astype(float), w)
print(f"Totals — No phase: ${total_no:,.0f} | With phase: ${total_ph:,.0f}")

OUT_DIR = "../outputs/vat"
os.makedirs(OUT_DIR, exist_ok=True)
pd.DataFrame(
    {"year": [2024], "no_phaseout_total": [total_no], "phaseout_total": [total_ph]}
).to_csv(f"{OUT_DIR}/rebate_cost_2024.csv", index=False)
print(f"Saved {OUT_DIR}/rebate_cost_2024.csv")

# Deciles by equivalized income (AGI / household size, with size floored at 1)
need_dec = {"household_agi", "household_size"} - set(df.columns)
if need_dec:
    raise KeyError(f"Missing columns for deciles: {sorted(need_dec)}")

df["equiv_income"] = (
    df["household_agi"].astype(float) / np.maximum(df["household_size"].astype(float), 1.0)
)
df = vr.add_weighted_deciles(df, income_col="equiv_income", weight_col="weight", label="decile")


def rebate_totals(g: pd.DataFrame) -> pd.Series:
    """Weighted rebate totals and weighted household count for one group."""
    return pd.Series({
        "total_no_phaseout": vr.weighted_sum(g["consumption_allowance"].astype(float), g["weight"]),
        "total_phaseout": vr.weighted_sum(g["rebate_after_phaseout"].astype(float), g["weight"]),
        "households_weighted": float(g["weight"].sum()),
    })


# Select the value columns before .apply so the grouping column is not passed to
# rebate_totals; this avoids pandas' warning about apply operating on grouping columns.
value_cols = ["consumption_allowance", "rebate_after_phaseout", "weight"]

by_dec = df.groupby("decile")[value_cols].apply(rebate_totals).reset_index()
by_dec.to_csv(f"{OUT_DIR}/rebate_cost_by_decile_2024.csv", index=False)
print(f"Saved {OUT_DIR}/rebate_cost_by_decile_2024.csv")
print(by_dec.head().to_string(index=False))

# By filing status (if present); keeps the original two-total column layout
if "filing_status" in df.columns:
    by_fs = (
        df.groupby("filing_status")[value_cols]
          .apply(rebate_totals)
          .reset_index()
          .drop(columns="households_weighted")
    )
    by_fs.to_csv(f"{OUT_DIR}/rebate_cost_by_status_2024.csv", index=False)
    print(f"Saved {OUT_DIR}/rebate_cost_by_status_2024.csv")
    print(by_fs.to_string(index=False))
else:
    print("filing_status not in panel; skipping by-status table.")

# Integrity checks (Parts D and E)
assert np.isfinite([total_no, total_ph]).all(), "Statewide totals contain missing values."
assert not by_dec[["total_no_phaseout", "total_phaseout"]].isna().any().any(), "Decile totals contain missing values."
assert total_ph <= total_no + 1e-9, "Statewide phase-out total should not exceed no-phase total."
assert (by_dec["total_phaseout"] <= by_dec["total_no_phaseout"] + 1e-9).all(), "Phase-out should not exceed no-phase totals."
assert np.isclose(by_dec["total_no_phaseout"].sum(), total_no), "Decile totals (no-phase) do not sum to statewide total."
assert np.isclose(by_dec["total_phaseout"].sum(), total_ph), "Decile totals (phase-out) do not sum to statewide total."

print(f"✅ Step 02 complete. Elapsed {time.time()-t0:.2f}s")
```


```text
Step 02 start.
Loading vat_rebate from: c:\Users\Ali.Melad\Dropbox\Ali Work\Kyle\California VAT\policy_engile_cali_v2\policy\vat_rebate.py
Loaded: c:\Users\Ali.Melad\Dropbox\Ali Work\Kyle\California VAT\policy_engile_cali_v2\policy\vat_rebate.py
Reading: ../intermediate/ca_panel_2024.csv
Panel shape: (1747, 15)
Columns: ['state_code', 'household_size', 'household_weight', 'household_agi', 'employment_income', 'fed_income_tax', 'ca_income_tax', 'filing_status', 'is_married_couple', 'size_bucket', 'consumption_allowance', 'rebate_after_phaseout', 'excess_over_threshold', 'allowance_no_phaseout', 'allowance_phaseout']
[diag] Weighted CA households (2024; after Step 01 deflator): 14,431,591
Totals — No phase: $391,504,607,385 | With phase: $283,828,897,363
Saved ../outputs/vat/rebate_cost_2024.csv
Saved ../outputs/vat/rebate_cost_by_decile_2024.csv
decile  total_no_phaseout  total_phaseout  households_weighted
     1       4.356717e+10    4.356717e+10         1.682220e+06
     2       4.72
```

