In [None]:
import statsmodels.formula.api as smf
from heterogenity_clusters import get_trade_code_clusters
import pandas as pd

In [None]:

df = pd.read_csv("pricing_treatment.csv", parse_dates=["date"])
# Step 1: Get the last date per trade_code
last_dates = df.groupby("trade_code")["date"].max().reset_index()
last_dates.columns = ["trade_code", "last_date"]

# Step 2: Check how many unique last dates there are
unique_last_dates = last_dates["last_date"].nunique()
print(f"\nNumber of unique last dates: {unique_last_dates}")

# Step 3: Show distribution of last dates
print("\nDistribution of last dates per trade_code:")
print(last_dates["last_date"].value_counts().sort_index())

# Step 4: Show min and max last date across all trade_codes
min_date = last_dates["last_date"].min()
max_date = last_dates["last_date"].max()
print(f"\nEarliest last date: {min_date}")
print(f"Latest last date  : {max_date}")
print(f"Difference between earliest and latest last date: {(max_date - min_date).days} days")

# Optional: View full per-trade breakdown
# print(last_dates.sort_values("last_date"))

# DiD with Cluster Heterogeneity and Variable Margins

In [None]:
df = pd.read_csv("pricing_treatment.csv", parse_dates=["date"])
df["date"] = df["date"].dt.tz_localize(None)

# Step 1: Get clusters
trade_clusters = get_trade_code_clusters(df, n_clusters=3)
df["cluster"] = df["trade_code"].map(trade_clusters)

# Step 2: DiD variables
cutoff = df["date"].max() - pd.Timedelta(days=60)
df["treated"] = (df["treatment"] == "treatment").astype(int)
df["post"] = (df["date"] >= cutoff).astype(int)
df["did"] = df["treated"] * df["post"]

# Step 3: Define profit margin percentages (percentage drop from selling price to cost price)
profit_percentages = [0.10, 0.15, 0.20, 0.30, 0.40, 0.50]  # 10%, 15%, etc.

results = []

# Loop over profit percentage scenarios
for margin in profit_percentages:
    # Calculate gross profit column for this scenario
    df["gross_profit"] = (df["price"] - (df["price"] * (1 - margin))) * df["bookings"]
    # This simplifies to: df["gross_profit"] = df["price"] * margin * df["bookings"]

    # For each cluster, run DiD on gross_profit
    for cluster_id in sorted(df["cluster"].unique()):
        df_cluster = df[df["cluster"] == cluster_id]

        # Basic group diff method (ATE sanity check)
        try:
            g = df_cluster.groupby(["treated", "post"])["gross_profit"].mean().unstack()
            ate_gp = (g.loc[1, 1] - g.loc[1, 0]) - (g.loc[0, 1] - g.loc[0, 0])
        except KeyError:  # a treated/post cell is missing in this cluster
            ate_gp = float("nan")

        # TWFE model
        try:
            mod = smf.ols('gross_profit ~ did + C(trade_code) + C(date)', data=df_cluster)
            res = mod.fit(cov_type="cluster", cov_kwds={"groups": df_cluster["trade_code"]})
            did_coef = res.params.get("did", float("nan"))
            p_val = res.pvalues.get("did", float("nan"))
        except Exception:  # keep the loop running if the fit fails for this cluster
            did_coef, p_val = float("nan"), float("nan")

        results.append({
            "margin_pct": f"{round(margin * 100)}%",  # round, not int, to avoid float truncation
            "cluster": cluster_id,
            "n_trades": df_cluster["trade_code"].nunique(),
            "ate_gross_profit": round(ate_gp, 2) if pd.notna(ate_gp) else None,
            "did_coef": round(did_coef, 2) if pd.notna(did_coef) else None,
            "p_value": round(p_val, 4) if pd.notna(p_val) else None
        })

# Step 4: Summary table
results_df = pd.DataFrame(results)
results_df = results_df.sort_values(["margin_pct", "cluster"]).reset_index(drop=True)
print(results_df)


   margin_pct  cluster  n_trades  ate_gross_profit  did_coef  p_value
0         10%        0        32            385.72    385.72   0.6514
1         10%        1         6         -19385.61 -19385.61   0.0000
2         10%        2        16          -2108.01  -2108.01   0.3269
3         15%        0        32            578.58    578.58   0.6514
4         15%        1         6         -29078.41 -29078.41   0.0000
5         15%        2        16          -3162.02  -3162.02   0.3269
6         20%        0        32            771.44    771.44   0.6514
7         20%        1         6         -38771.22 -38771.22   0.0000
8         20%        2        16          -4216.02  -4216.02   0.3269
9         30%        0        32           1157.16   1157.16   0.6514
10        30%        1         6         -58156.83 -58156.83   0.0000
11        30%        2        16          -6324.04  -6324.04   0.3269
12        40%        0        32           1542.88   1542.88   0.6514
13        40%       

## Treatment Effect on Gross Profit by Assumed Margin %

The table below shows the estimated **Difference-in-Differences** (DiD) effect of the price intervention on **gross profit** for each cluster, under different **uniform profit margin assumptions** (cost price = price × (1 − margin)). With bookings $B$, selling price $P$, and assumed margin $m$, gross profit is:

$$
GP = B \times \left( P \times m \right)
$$

| **Margin %** | **Cluster** | **n_trades** | **ATE Gross Profit** | **p-value** | **Significance** |
|--------------|-------------|--------------|----------------------|-------------|------------------|
| 10%          | 0           | 32           | +385.72              | 0.6514      | Not significant  |
| 10%          | 1           | 6            | −19,385.61           | 0.0000      | ***              |
| 10%          | 2           | 16           | −2,108.01            | 0.3269      | Not significant  |
| 15%          | 0           | 32           | +578.58              | 0.6514      | Not significant  |
| 15%          | 1           | 6            | −29,078.41           | 0.0000      | ***              |
| 15%          | 2           | 16           | −3,162.02            | 0.3269      | Not significant  |
| 20%          | 0           | 32           | +771.44              | 0.6514      | Not significant  |
| 20%          | 1           | 6            | −38,771.22           | 0.0000      | ***              |
| 20%          | 2           | 16           | −4,216.02            | 0.3269      | Not significant  |
| 30%          | 0           | 32           | +1,157.16            | 0.6514      | Not significant  |
| 30%          | 1           | 6            | −58,156.83           | 0.0000      | ***              |
| 30%          | 2           | 16           | −6,324.04            | 0.3269      | Not significant  |
| 40%          | 0           | 32           | +1,542.88            | 0.6514      | Not significant  |
| 40%          | 1           | 6            | −77,542.44           | 0.0000      | ***              |
| 40%          | 2           | 16           | −8,432.05            | 0.3269      | Not significant  |
| 50%          | 0           | 32           | +1,928.60            | 0.6514      | Not significant  |
| 50%          | 1           | 6            | −96,928.05           | 0.0000      | ***              |
| 50%          | 2           | 16           | −10,540.06           | 0.3269      | Not significant  |
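
For reference, the ATE column is a 2×2 difference-in-differences of mean gross profit: the pre-to-post change in the treated group minus the same change in the control group. The sketch below computes it in plain Python on made-up numbers. The `rows` data and the `did_from_means` helper are hypothetical and only illustrate the arithmetic, not the notebook's actual data.

```python
from collections import defaultdict
from statistics import mean

def did_from_means(rows):
    """2x2 DiD: (treated post - treated pre) - (control post - control pre)."""
    cells = defaultdict(list)
    for treated, post, gross_profit in rows:
        cells[(treated, post)].append(gross_profit)
    m = {key: mean(values) for key, values in cells.items()}
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])

# Hypothetical (treated, post, gross_profit) observations.
rows = [
    (0, 0, 100.0), (0, 0, 120.0),   # control, pre:  mean 110
    (0, 1, 130.0), (0, 1, 150.0),   # control, post: mean 140
    (1, 0, 200.0), (1, 0, 220.0),   # treated, pre:  mean 210
    (1, 1, 210.0), (1, 1, 230.0),   # treated, post: mean 220
]

did = did_from_means(rows)
print(did)  # (220 - 210) - (140 - 110) = -20.0
```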

---

### Cluster 0 (32 trade codes)
- **Pattern**: Small positive ATE in every margin scenario (+385.72 to +1,928.60), with **p ≈ 0.65** → not statistically significant.
- **Interpretation**: The price intervention has no detectable effect on gross profit for this cluster. The margin assumption cannot change this conclusion, because it only rescales the outcome.

---

### Cluster 1 (6 trade codes)
- **Pattern**: Large negative ATE at every margin (−19.4k to −96.9k), with **p < 0.001** in all cases.
- **Interpretation**: A strong, statistically significant drop in gross profit. The absolute loss grows in proportion to the assumed margin, while the significance level stays the same.
- **Recommendation**: The intervention is harmful here. Consider immediate rollback or redesign for these trade codes.

---

### Cluster 2 (16 trade codes)
- **Pattern**: Modest negative ATE (−2.1k to −10.5k), with **p ≈ 0.33** → not statistically significant.
- **Interpretation**: The estimated loss cannot be distinguished from zero and could be noise.

---

## Overall Takeaways
1. **Cluster 1 is clearly hurt** by the intervention: the loss is both economically large and statistically significant.
2. **Clusters 0 & 2** show no statistically significant change in gross profit at any assumed margin from 10% to 50%.
3. Effect size **scales linearly with assumed margin** because gross profit per booking is proportional to the margin. For the same reason, the p-values are identical across margin scenarios within each cluster.
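
The linear scaling in takeaway 3 can be checked numerically. The sketch below fits a plain OLS DiD regression with NumPy on synthetic data and compares the interaction coefficient and its t-statistic across margins. The data, the seed, the `revenue` variable (standing in for price × bookings), and the `did_ols` helper are all made up for illustration. It also uses classical (non-clustered) standard errors rather than the cluster-robust TWFE fit from the notebook; the scaling argument is the same in both cases.

```python
import numpy as np

def did_ols(y, treated, post):
    """Return (coef, t_stat) of the interaction in y ~ treated + post + treated*post."""
    X = np.column_stack([np.ones_like(y), treated, post, treated * post])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof              # classical residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # classical OLS covariance matrix
    se = np.sqrt(np.diag(cov))
    return beta[3], beta[3] / se[3]

# Synthetic data (assumption): a true DiD effect of -80 on revenue.
rng = np.random.default_rng(0)
n = 400
treated = rng.integers(0, 2, n).astype(float)
post = rng.integers(0, 2, n).astype(float)
revenue = (1000 + 50 * treated + 30 * post
           - 80 * treated * post + rng.normal(0, 100, n))

base_coef, base_t = did_ols(revenue, treated, post)   # margin = 1.0
for margin in [0.10, 0.20, 0.50]:
    coef, t_stat = did_ols(margin * revenue, treated, post)
    print(f"margin={margin:.2f}  coef/base={coef / base_coef:.2f}  "
          f"t={t_stat:.3f}  base_t={base_t:.3f}")
```

Multiplying the outcome by a constant multiplies both the coefficient and its standard error by that constant, so the ratio printed in each row equals the margin and the t-statistic does not move. That is why the p-values in the table repeat across margin scenarios.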

---
**Legend**:  
- *** p < 0.001 → Highly significant  
- "Not significant" → p ≥ 0.05
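
The legend can be written as a small helper. This is a sketch only: `significance_label` is a hypothetical name, and the `**` and `*` tiers for p-values between 0.001 and 0.05 follow the common star convention, which is an assumption here because the legend does not define that range.

```python
def significance_label(p_value: float) -> str:
    """Map a p-value to a label using the legend's thresholds."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"      # assumed tier, not defined in the legend
    if p_value < 0.05:
        return "*"       # assumed tier, not defined in the legend
    return "Not significant"

for p in (0.0000, 0.3269, 0.6514):   # p-values reported in the results table
    print(p, significance_label(p))
```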


**Gross Profit Formula (with variable cost price from margin assumption)**

Let:  
- \( B \) = number of bookings  
- \( P \) = selling price per booking  
- \( m \) = assumed profit margin (e.g., 0.10 for 10%)  
- \( C \) = cost price per booking  

We define the cost price as:  
$$
C = P \times (1 - m)
$$

Then gross profit (\(GP\)) is:  
$$
GP = B \times (P - C)
$$

Substituting \( C \) from above:  
$$
GP = B \times \left( P - P \times (1 - m) \right)
$$

Simplifying:  
$$
GP = B \times \left( P \times m \right)
$$
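
As a quick numeric check of the simplification, the sketch below evaluates both forms of the formula for a single row. The values of `B`, `P`, and `m` are made up for illustration.

```python
import math

B = 12        # bookings (hypothetical)
P = 250.0     # selling price per booking (hypothetical)
m = 0.15      # assumed profit margin

C = P * (1 - m)            # cost price implied by the margin
gp_long = B * (P - C)      # GP = B * (P - C)
gp_short = B * (P * m)     # GP = B * (P * m)

print(gp_long, gp_short)
assert math.isclose(gp_long, gp_short)   # equal up to floating-point rounding
```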
