# 07_poverty_times_vat_rebate

## Part A — What we’re doing
We compute a **Poverty × VAT** household rebate on the calibrated Step-01 panel.

- **Base**: `poverty_threshold(hh_size) × vat_rate`.
- **Phase-out (AGI basis)**: full rebate while AGI ≤ 150% of the poverty threshold; tapers linearly to **0** at 200% and stays 0 above that.
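- **No marriage adjustment**: the threshold depends on household size only; `filing_status` is carried into the output but does not enter the formula.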
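
The base and phase-out rules above can be sketched in a few lines. This is a minimal illustration, not the code in `policy/rebates/poverty_times_vat.py`: the function name `sketch_rebate` is hypothetical, the poverty threshold is passed in directly instead of being looked up by household size, and the figures in the usage lines are made-up round numbers.

```python
def sketch_rebate(agi: float, threshold: float, vat_rate: float) -> float:
    """Illustrative Poverty x VAT rebate with a 150%-200% linear phase-out."""
    base = threshold * vat_rate   # rebate before any phase-out
    full_until = 1.5 * threshold  # full rebate up to 150% of poverty
    zero_from = 2.0 * threshold   # no rebate from 200% of poverty
    if agi <= full_until:
        return base
    if agi >= zero_from:
        return 0.0
    # Linear taper between the two bounds
    return base * (zero_from - agi) / (zero_from - full_until)


# Made-up threshold of 20,000 and a 10% VAT rate
for agi in (10_000, 30_000, 35_000, 40_000, 60_000):
    print(agi, round(sketch_rebate(agi, 20_000, 0.10), 2))
```

With these inputs the rebate is flat up to an AGI of 30,000, halves at 35,000, and is zero from 40,000 onward.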

**Outputs**
- `outputs/rebates/poverty_times_vat/rebate_records_2024.csv` — household-level rebates
- `outputs/rebates/poverty_times_vat/summary_2024.csv` — totals & key breakdowns
- `outputs/rebates/poverty_times_vat/by_decile_2024.csv` — decile totals (equivalized income)
- `outputs/rebates/poverty_times_vat/by_size_2024.csv` — totals by size bucket
- `outputs/rebates/poverty_times_vat/plots/deciles_2024.png` — decile bar chart

---

## Part B — Inputs
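- Step-01 panel: `intermediate/ca_panel_2024.(parquet|csv)`, falling back to `intermediate/ca_panel_2024_2025.(parquet|csv)`; Parquet is tried before CSV. Required columns:
  `household_agi`, `household_size`, `filing_status`, `household_weight`.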

- Policy modules:
  - `policy/constants.py` (hardcoded 1–7+ poverty thresholds; 7 used for 7+)
  - `policy/rebates/poverty_times_vat.py` (rebate formula)

- Config (if present): `vat.rate`, read from `config.json` or `config/config.json`; defaults to `0.10` if missing. The notebook never writes to the repo config.
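
The notebook reads the rate as a nested key, so a config that sets it is assumed to have the shape `{"vat": {"rate": ...}}`. The sketch below shows that lookup and its fallback on inline JSON strings; `read_vat_rate` is a hypothetical helper, and the `0.08` value is only an example.

```python
import json


def read_vat_rate(config_text: str, default: float = 0.10) -> float:
    """Hypothetical helper: pull vat.rate out of a JSON config string."""
    cfg = json.loads(config_text) if config_text.strip() else {}
    # A missing "vat" block or missing "rate" key both fall back to the default
    return float(cfg.get("vat", {}).get("rate", default))


print(read_vat_rate('{"vat": {"rate": 0.08}}'))  # rate set explicitly
print(read_vat_rate('{"other": 1}'))             # no vat block: default applies
```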

---

## Part C — Methods
1) Load panel & parameters  
2) Compute **record-level** rebate (and the **no-phaseout** base)  
3) Aggregate totals and breakdowns (deciles via AGI per capita; size buckets 1..7+)  
4) Save CSVs + a simple decile plot
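
Step 3 ranks households by AGI per capita and cuts the ranking at each 10% of cumulative household weight, so each decile holds roughly a tenth of the weighted population. The sketch below shows that idea in plain Python on toy inputs. `weighted_deciles` is a hypothetical helper name, and it assumes non-negative weights; the notebook's own cell works on numpy arrays.

```python
from bisect import bisect_left
from itertools import accumulate


def weighted_deciles(values, weights):
    """Return a decile label (1..10) for each value, using weighted cut points."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    xs = [values[i] for i in order]
    cw = list(accumulate(weights[i] for i in order))  # cumulative weight
    if not cw or cw[-1] <= 0:
        return [1] * len(values)
    # Value found at the 10%, 20%, ..., 90% points of cumulative weight
    inner_edges = []
    for k in range(1, 10):
        pos = min(bisect_left(cw, cw[-1] * k / 10), len(xs) - 1)
        inner_edges.append(xs[pos])
    # Right-closed bins: a value equal to an edge falls in the lower decile.
    # Repeated edges are tolerated; they simply leave some deciles empty.
    return [bisect_left(inner_edges, v) + 1 for v in values]


print(weighted_deciles(list(range(1, 11)), [1] * 10))
print(weighted_deciles([0, 0, 0, 0, 5], [1, 1, 1, 1, 1]))
```

The second call illustrates the tied case: when many records share the same income (for example zero AGI), they all land in one decile and the deciles in between stay empty.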

---

## Part D — Acceptance checks
- Rebates **≥ 0** for all households  
- **With-phaseout ≤ no-phaseout** (overall and by groups)  
- Breakdown sums ≈ overall total (within rounding)
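
The three checks above can be expressed as assertions over weighted totals. The following is a dependency-free sketch on toy records, not the notebook's own code; the function name `check_rebates`, the field names, and the tolerance are assumptions made for illustration.

```python
def check_rebates(records, tol=1e-6):
    """Run the acceptance checks on dicts with weight, rebate, base, group."""
    # 1) Rebates are non-negative for every household
    assert all(r["rebate"] >= 0 for r in records), "Negative rebate found."

    # 2) With-phaseout total does not exceed the no-phaseout total
    total = sum(r["rebate"] * r["weight"] for r in records)
    total_base = sum(r["base"] * r["weight"] for r in records)
    assert total <= total_base + tol, "With-phaseout exceeds base."

    # Same comparison within each group
    by_group = {}
    for r in records:
        sums = by_group.setdefault(r["group"], [0.0, 0.0])
        sums[0] += r["rebate"] * r["weight"]
        sums[1] += r["base"] * r["weight"]
    assert all(w <= b + tol for w, b in by_group.values())

    # 3) Group totals add back up to the overall total
    group_sum = sum(w for w, _ in by_group.values())
    assert abs(group_sum - total) <= tol * max(1.0, abs(total))
    return total, total_base


toy = [
    {"weight": 100.0, "rebate": 2000.0, "base": 2000.0, "group": 1},
    {"weight": 50.0, "rebate": 1000.0, "base": 2000.0, "group": 1},
    {"weight": 80.0, "rebate": 0.0, "base": 3000.0, "group": 2},
]
print(check_rebates(toy))
```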

---

## Part E — Troubleshooting
- If the panel is missing columns, re-run Step-01.  
- If all rebates are zero, check that `vat.rate` > 0 and that the thresholds are defined in `policy/constants.py`.  
- If deciles look odd, ensure AGI is numeric and that `household_size` is clipped to ≥ 1.
- If the cell stops with `ModuleNotFoundError: No module named 'matplotlib'`, install matplotlib in the notebook environment and re-run.

---


In [1]:
import json, time
from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from policy.constants import poverty_threshold
from policy.rebates.poverty_times_vat import poverty_times_vat_rebate

START_TS = time.time()

# ---------- Paths ----------
INTERMEDIATE = Path("intermediate")
OUT_DIR = Path("outputs/rebates/poverty_times_vat")
(OUT_DIR / "plots").mkdir(parents=True, exist_ok=True)

# ---------- Config (optional) ----------
def _load_config():
    for p in ["config.json", "config/config.json"]:
        f = Path(p)
        if f.exists():
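            try:
                with open(f, "r") as fh:
                    return json.load(fh)
            except (OSError, ValueError) as exc:
                # Unreadable or malformed config: report it and try the next path
                print(f"Skipping config {f}: {exc}")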
    return {}

CFG = _load_config()
VAT_RATE = CFG.get("vat", {}).get("rate", 0.10)  # default if not set

print("Parameters:", {"vat.rate": VAT_RATE})

# ---------- Load Step-01 panel ----------
panel = None
for fname in ["ca_panel_2024.parquet", "ca_panel_2024.csv",
              "ca_panel_2024_2025.parquet", "ca_panel_2024_2025.csv"]:
    p = INTERMEDIATE / fname
    if p.exists():
        panel = pd.read_parquet(p) if p.suffix == ".parquet" else pd.read_csv(p)
        break
if panel is None:
    raise FileNotFoundError("Step 01 panel not found in intermediate/.")

req = ["household_agi","household_size","household_weight","filing_status"]
miss = [c for c in req if c not in panel.columns]
if miss:
    raise KeyError(f"Panel missing columns: {miss}")

# Types & derived
panel["household_agi"] = pd.to_numeric(panel["household_agi"], errors="coerce").fillna(0.0)
panel["household_size"] = pd.to_numeric(panel["household_size"], errors="coerce").fillna(1).astype(int).clip(lower=1)
panel["household_weight"] = pd.to_numeric(panel["household_weight"], errors="coerce").fillna(0.0)
panel["filing_status"] = panel["filing_status"].astype(str)
panel["size_bucket"] = np.where(panel["household_size"] >= 7, 7, panel["household_size"]).astype(int)

# ---------- Compute rebate (record level) ----------
# Base (no phase-out): poverty_threshold * VAT_RATE
panel["rebate_ptv_base"] = [poverty_threshold(sz) * float(VAT_RATE) for sz in panel["household_size"]]

# With phase-out
panel["rebate_ptv"] = [poverty_times_vat_rebate(agi, sz, VAT_RATE)
                       for agi, sz in zip(panel["household_agi"], panel["household_size"])]

# Acceptance: non-negativity
assert (panel["rebate_ptv"] >= 0).all(), "Negative rebate found."

# ---------- Aggregations ----------
def wsum(x, w):
    """Weighted sum of x with weights w, returned as a plain float."""
    return float((x.astype(float) * w.astype(float)).sum())

# Equivalized income deciles (AGI per capita), weighted by household_weight
inc_pc = panel["household_agi"].astype(float) / panel["household_size"].clip(lower=1).astype(float)
x = inc_pc.to_numpy()
w = panel["household_weight"].astype(float).to_numpy()
idx = np.argsort(x, kind="stable")
xs, ws = x[idx], w[idx]
cw = np.cumsum(ws)
if len(ws) > 0 and cw[-1] > 0:
    # Income value at the 10%, 20%, ..., 90% points of cumulative weight
    cuts = cw[-1] * np.arange(1, 10) / 10
    pos = np.minimum(np.searchsorted(cw, cuts, side="left"), len(xs) - 1)
    inner_edges = xs[pos]
    # Right-closed bins, matching pd.cut over [-inf, edges..., inf], but tolerant
    # of repeated edges (e.g. many zero-AGI households), which make pd.cut raise.
    panel["decile"] = np.searchsorted(inner_edges, x, side="left") + 1
else:
    panel["decile"] = 1

# Totals
total_with = wsum(panel["rebate_ptv"], panel["household_weight"])
total_base = wsum(panel["rebate_ptv_base"], panel["household_weight"])
assert total_with <= total_base + 1e-6, "With-phaseout exceeds base."

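# Weighted rebate and base per record; the group sums below reuse these
w_rebate = panel["rebate_ptv"].astype(float) * panel["household_weight"].astype(float)
w_base = panel["rebate_ptv_base"].astype(float) * panel["household_weight"].astype(float)

# By size bucket
by_size = (w_rebate.groupby(panel["size_bucket"]).sum()
                   .reset_index(name="weighted_total"))

# By decile
by_dec = (w_rebate.groupby(panel["decile"]).sum()
                  .reset_index(name="weighted_total")
                  .sort_values("decile"))

# Acceptance: per group, with-phaseout <= no-phaseout, and breakdowns sum to the total
for key, tbl in [("size_bucket", by_size), ("decile", by_dec)]:
    base_tot = w_base.groupby(panel[key]).sum()
    assert (tbl.set_index(key)["weighted_total"] <= base_tot + 1e-6).all(), f"With-phaseout exceeds base within {key}."
    assert np.isclose(tbl["weighted_total"].sum(), total_with), f"{key} breakdown does not sum to total."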

# ---------- Save outputs ----------
panel.loc[:, ["household_agi","household_size","filing_status","household_weight",
              "rebate_ptv","rebate_ptv_base"]].to_csv(OUT_DIR / "rebate_records_2024.csv", index=False)

pd.DataFrame([{
    "policy": "poverty_times_vat",
    "vat_rate": VAT_RATE,
    "total_with_phaseout": total_with,
    "total_no_phaseout": total_base
}]).to_csv(OUT_DIR / "summary_2024.csv", index=False)

by_dec.to_csv(OUT_DIR / "by_decile_2024.csv", index=False)
by_size.to_csv(OUT_DIR / "by_size_2024.csv", index=False)

# ---------- Plot ----------
plt.figure()
plt.bar(by_dec["decile"].astype(str), by_dec["weighted_total"].astype(float))
plt.title("Poverty × VAT rebate by equivalized-income decile (2024)")
plt.xlabel("Decile")
plt.ylabel("Weighted rebate total")
plt.tight_layout()
plt.savefig(OUT_DIR / "plots" / "deciles_2024.png", dpi=150)
plt.close()

print(f"✅ 07_poverty_times_vat_rebate complete in {time.time()-START_TS:,.1f}s")


ModuleNotFoundError: No module named 'matplotlib'