# Temporary Impact Modeling & Optimal Execution

**Author**: Rahul  
**Date**: July 2025  

**Abstract.**  
In this notebook we will:
1. Load L2 order‑book snapshots for three tickers (CRWV, FROG, SOUN).
2. Compute the *temporary impact* function \(g_t(x)\) by simulating market‐order execution up to size \(x\).
3. Fit and compare:
   - **Linear model**: \(g_{\rm lin}(x)=\beta\,x\)
   - **Power‐law model**: \(g_{\rm pow}(x)=\eta\,x^\alpha\)
4. Visualize fits and compare RMSE.
5. Summarize why the power‐law model best captures concave impact.


## Field Definitions (MBP‑10 L2 Order‑Book Snapshot)

| Field            | Description                                                                                               |
|------------------|-----------------------------------------------------------------------------------------------------------|
| **ts_recv**      | Capture‑server receive timestamp (ns since UNIX epoch).                                                   |
| **ts_event**     | Matching‑engine event timestamp (ns since UNIX epoch).                                                    |
| **rtype**        | Record type (always `10` for MBP‑10).                                                                     |
| **publisher_id** | Databento publisher ID (identifies venue/dataset).                                                        |
| **instrument_id**| Numeric instrument ID (unique per ticker).                                                                |
| **action**       | Event action:  
- `A` = add order  
- `C` = cancel order  
- `M` = modify order  
- `R` = clear book  
- `T` = trade.  
| **side**         | Aggressor side:  
- `Ask` = sell side  
- `Bid` = buy side  
- `None` = no side (e.g. book‑clear).  
| **depth**        | Book level updated (0 = top of book, 1 = second, … up to 9).                                              |
| **price**        | Price of the event in integer units (1 unit = 1e‑9, so divide by 1e9 to get actual price).               |
| **size**         | Order quantity associated with the event.                                                                 |
| **flags**        | Bitfield for event characteristics and data quality flags.                                                |
| **ts_in_delta**  | Latency: ns from event timestamp to capture‑server receive time (`ts_recv − ts_event`).                  |
| **sequence**     | Venue‑assigned per‑message sequence number (monotonic counter).                                           |
| **bid_px_00**    | Best‑bid price (level 0).                                                                                 |
| **ask_px_00**    | Best‑ask price (level 0).                                                                                 |
| **bid_sz_00**    | Aggregate bid size at level 0.                                                                            |
| **ask_sz_00**    | Aggregate ask size at level 0.                                                                            |
| **bid_ct_00**    | Number of orders contributing to bid size at level 0.                                                     |
| **ask_ct_00**    | Number of orders contributing to ask size at level 0.                                                     |
| **bid_px_01**…   | Bid price at level 1 through level 9 (`bid_px_01` … `bid_px_09`).                                         |
| **ask_px_01**…   | Ask price at level 1 through level 9 (`ask_px_01` … `ask_px_09`).                                         |
| **bid_sz_01**…   | Bid size at level 1 through level 9 (`bid_sz_01` … `bid_sz_09`).                                          |
| **ask_sz_01**…   | Ask size at level 1 through level 9 (`ask_sz_01` … `ask_sz_09`).                                          |
| **bid_ct_01**…   | Bid order count at level 1 through level 9 (`bid_ct_01` … `bid_ct_09`).                                   |
| **ask_ct_01**…   | Ask order count at level 1 through level 9 (`ask_ct_01` … `ask_ct_09`).                                   |
| **symbol**       | Ticker symbol string (e.g. `CRWV`, `FROG`, `SOUN`).                                                       |

**Usage notes:**
- To compute the mid‑price:  
  \[
    \text{mid} = \tfrac{\text{bid\_px\_00} + \text{ask\_px\_00}}{2}
  \]
- To simulate a buy market‑order of size \(X\), “eat” through `ask_px_00..ask_px_09` using corresponding `ask_sz_00..ask_sz_09` until \(X\) is filled, summing \((p-\text{mid})\times\text{quantity}\) to get total slippage.
- Likewise, for a sell market‑order, use the bid side levels and \((\text{mid}-p)\times\text{quantity}\).

This should clarify every column in your CSV snapshots.


In [6]:
import os, glob
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit


plot_dir = '/home/rahul/visualizations/train/BlockHouse/plots'
os.makedirs(plot_dir, exist_ok=True)


def compute_temporary_impact(row, side='ask', levels=10, X_vals=None):
    bid0, ask0 = row['bid_px_00'], row['ask_px_00']
    mid = (bid0 + ask0) / 2
    if side == 'ask':
        pxs = [row[f'ask_px_{i:02d}'] for i in range(levels)]
        szs = [row[f'ask_sz_{i:02d}'] for i in range(levels)]
    else:
        pxs = [row[f'bid_px_{i:02d}'] for i in range(levels)]
        szs = [row[f'bid_sz_{i:02d}'] for i in range(levels)]
    total_liq = sum(szs)
    X_vals = X_vals or np.linspace(1, total_liq, 50, dtype=int)
    g = []
    for X in X_vals:
        rem, cost = X, 0.0
        for p, s in zip(pxs, szs):
            take = min(rem, s)
            cost += take * (p - mid)
            rem -= take
            if rem <= 0: break
        g.append(cost / X)
    return X_vals, np.array(g)

def fit_models(X, g):
    # Linear fit
    beta, _ = np.polyfit(X, g, 1)
    lin_pred = beta * X
    # Power‑law fit
    def pow_law(x, eta, alpha): return eta * x**alpha
    (eta, alpha), _ = curve_fit(pow_law, X, g, p0=(g[0], 0.5), maxfev=5000)
    pow_pred = pow_law(X, eta, alpha)
    return beta, eta, alpha, lin_pred, pow_pred


base = "/home/rahul/visualizations/train/BlockHouse/Dataset"
ticker_patterns = {
    'CRWV': os.path.join(base, 'CRWW/CRWW', '*.csv'),
    'FROG': os.path.join(base, 'FROG/FROG', '*.csv'),
    'SOUN': os.path.join(base, 'SOUN/SOUN', '*.csv'),
}

for ticker, pattern in ticker_patterns.items():
    for path in sorted(glob.glob(pattern)):
        
        date = os.path.basename(path).split('_')[1]
        
        
        df = pd.read_csv(path)
        row = df.iloc[0]  
        
        
        X, g = compute_temporary_impact(row)
        beta, eta, alpha, lin_pred, pow_pred = fit_models(X, g)
        
        # Plot
        plt.figure()
        plt.plot(X, g, 'o', label='Empirical')
        plt.plot(X, lin_pred, '--', label=f'Linear (β={beta:.3g})')
        plt.plot(X, pow_pred, '-', label=f'Power‐law (η={eta:.3g}, α={alpha:.3g})')
        plt.title(f'{ticker} {date}: Impact Fit')
        plt.xlabel('Trade Size X')
        plt.ylabel('Slippage g(X)')
        plt.legend()
        
        # Save
        outfile = os.path.join(plot_dir, f'{ticker}_{date}.png')
        plt.savefig(outfile, dpi=150, bbox_inches='tight')
        plt.close()


In [7]:
import os, glob
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit


def compute_temporary_impact(row, levels=10, X_vals=None):
    mid = (row['bid_px_00'] + row['ask_px_00'])/2
    pxs = [row[f'ask_px_{i:02d}'] for i in range(levels)]
    szs = [row[f'ask_sz_{i:02d}'] for i in range(levels)]
    total_liq = sum(szs)
    X_vals = X_vals or np.linspace(1, total_liq, 50, dtype=int)
    g = []
    for X in X_vals:
        rem, cost = X, 0.0
        for p, s in zip(pxs, szs):
            take = min(rem, s)
            cost += take * (p-mid)
            rem -= take
            if rem<=0: break
        g.append(cost/X)
    return X_vals, np.array(g)

def fit_rmse(X, g):
    beta, _ = np.polyfit(X, g, 1)
    lin_rmse = np.sqrt(np.mean((g - beta*X)**2))
    def pow_law(x, eta, alpha): return eta*x**alpha
    (eta, alpha), _ = curve_fit(pow_law, X, g, p0=(g[0],0.5), maxfev=5000)
    pow_rmse = np.sqrt(np.mean((g - pow_law(X,eta,alpha))**2))
    return lin_rmse, pow_rmse


base = "/home/rahul/visualizations/train/BlockHouse/Dataset"
patterns = {
    'CRWV': os.path.join(base, 'CRWW/CRWW', '*.csv'),
    'FROG': os.path.join(base, 'FROG/FROG', '*.csv'),
    'SOUN': os.path.join(base, 'SOUN/SOUN', '*.csv'),
}
rows = []
for ticker, pat in patterns.items():
    for path in sorted(glob.glob(pat)):
        date = os.path.basename(path).split('_')[1]
        df = pd.read_csv(path)
        row0 = df.iloc[0]
        X, g = compute_temporary_impact(row0)
        lin_rmse, pow_rmse = fit_rmse(X, g)
        better = 'power-law' if pow_rmse < lin_rmse else 'linear'
        rows.append({
            'Ticker': ticker,
            'Date': date,
            'RMSE Linear': lin_rmse,
            'RMSE Power‑Law': pow_rmse,
            'Better Model': better
        })

summary_df = pd.DataFrame(rows)

win_counts = summary_df['Better Model'].value_counts().to_dict()
print(f"Out of {len(summary_df)} fits: {win_counts}")


print(summary_df.to_markdown(index=False))


Out of 63 fits: {'power-law': 63}
| Ticker   | Date          |   RMSE Linear |   RMSE Power‑Law | Better Model   |
|:---------|:--------------|--------------:|-----------------:|:---------------|
| CRWV     | 2025-04-03 00 |    0.0945249  |       0.0504396  | power-law      |
| CRWV     | 2025-04-04 00 |    0.35298    |       0.0910437  | power-law      |
| CRWV     | 2025-04-07 00 |    0.500627   |       0.0797326  | power-law      |
| CRWV     | 2025-04-08 00 |    0.666603   |       0.0296038  | power-law      |
| CRWV     | 2025-04-09 00 |    0.234813   |       0.0657233  | power-law      |
| CRWV     | 2025-04-10 00 |    0.139214   |       0.0357708  | power-law      |
| CRWV     | 2025-04-11 00 |    0.0417682  |       0.0317581  | power-law      |
| CRWV     | 2025-04-14 00 |    0.158619   |       0.0103914  | power-law      |
| CRWV     | 2025-04-15 00 |    0.491929   |       0.0339894  | power-law      |
| CRWV     | 2025-04-16 00 |    0.164451   |       0.0444163  | power-law  