# Day 2: Market Impact Models & Price Impact Estimation

## Week 21: Market Microstructure

---

## Learning Objectives

1. Understand **market impact** and why it matters for trading
2. Learn **temporary vs permanent** price impact
3. Implement **Almgren-Chriss** market impact model
4. Explore **Kyle's Lambda** and price impact estimation
5. Build **empirical impact models** from trade data
6. Understand **square-root law** of market impact

---

## 1. Introduction to Market Impact

### What is Market Impact?

**Market impact** is the effect that a trade has on the price of an asset. When you buy, prices tend to rise; when you sell, prices tend to fall.

### Components of Market Impact

1. **Temporary Impact**: Price movement that reverts after the trade
   - Caused by liquidity demand
   - Affects execution price but not future prices

2. **Permanent Impact**: Lasting price change from information revelation
   - Market learns from order flow
   - Affects all future prices
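
The split above can be illustrated with a small numeric sketch. The helper `decompose_impact` and the three prices are hypothetical, chosen only to show how the total move divides into a part that persists and a part that reverts.

```python
def decompose_impact(pre_trade_price, execution_price, post_trade_price, side="buy"):
    """Split the total impact of one trade into permanent and temporary parts.

    All three prices are hypothetical inputs: the price just before the trade,
    the price paid, and the price once the temporary pressure has faded.
    """
    direction = 1 if side == "buy" else -1
    total = direction * (execution_price - pre_trade_price)
    permanent = direction * (post_trade_price - pre_trade_price)
    temporary = total - permanent  # the part that reverts after the trade
    return {"total": total, "permanent": permanent, "temporary": temporary}


# A buy that pays 100.30 when the price was 100.00 and later settles at 100.10
example = decompose_impact(100.00, 100.30, 100.10, side="buy")
print(example)
```

In this example roughly one third of the move is permanent and two thirds is temporary; real data rarely separates this cleanly.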

### Why Market Impact Matters

- **Transaction costs**: For large orders, impact is often the largest component of total trading cost
- **Strategy capacity**: Limits how much capital a strategy can deploy
- **Optimal execution**: Need impact models to minimize costs

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.optimize import minimize, curve_fit
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings('ignore')

# Set style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('husl')

# Random seed for reproducibility
np.random.seed(42)

print("Market Impact Models - Day 2")
print("="*50)

## 2. Generate Synthetic Market Data

We'll create realistic synthetic data that exhibits market impact characteristics.

In [None]:
def generate_market_data_with_impact(n_trades=5000, 
                                      base_price=100,
                                      daily_volume=1_000_000,
                                      volatility=0.02,
                                      permanent_impact=0.1,
                                      temporary_impact=0.2):
    """
    Generate synthetic trade data with market impact effects.
    
    Parameters:
    -----------
    n_trades : int
        Number of trades to simulate
    base_price : float
        Starting price
    daily_volume : int
        Average daily volume
    volatility : float
        Daily volatility
    permanent_impact : float
        Permanent impact coefficient
    temporary_impact : float
        Temporary impact coefficient
    """
    
    # Generate trade sizes (log-normal, clipped to [100 shares, 10% of ADV])
    # Note: avg_trade_size is the median of the log-normal, not its mean
    avg_trade_size = daily_volume / 500  # ~500 trades per day
    trade_sizes = np.random.lognormal(
        mean=np.log(avg_trade_size),
        sigma=1.0,
        size=n_trades
    )
    # Clip first, then cast, so sizes stay integers
    trade_sizes = np.clip(trade_sizes, 100, daily_volume * 0.1).astype(int)
    
    # Generate trade directions (+1 buy, -1 sell)
    # Slight autocorrelation (order splitting)
    directions = np.zeros(n_trades)
    directions[0] = np.random.choice([-1, 1])
    for i in range(1, n_trades):
        if np.random.random() < 0.6:  # 60% same direction
            directions[i] = directions[i-1]
        else:
            directions[i] = np.random.choice([-1, 1])
    
    # Signed trade sizes
    signed_sizes = trade_sizes * directions
    
    # Generate prices with impact
    prices = np.zeros(n_trades)
    fundamental_price = base_price
    
    for i in range(n_trades):
        # Random walk for fundamental price
        fundamental_price += np.random.normal(0, volatility * base_price / np.sqrt(500))
        
        # Normalized trade size
        q = signed_sizes[i] / daily_volume
        
        # Permanent impact (affects fundamental)
        permanent = permanent_impact * volatility * base_price * np.sign(q) * np.sqrt(np.abs(q))
        fundamental_price += permanent
        
        # Temporary impact (only affects this trade)
        temporary = temporary_impact * volatility * base_price * np.sign(q) * np.sqrt(np.abs(q))
        
        # Observed price
        prices[i] = fundamental_price + temporary
    
    # Create DataFrame
    df = pd.DataFrame({
        'trade_id': range(n_trades),
        'price': prices,
        'size': trade_sizes,
        'direction': directions.astype(int),
        'signed_size': signed_sizes.astype(int),
        'daily_volume': daily_volume
    })
    
    # Add derived features
    df['volume_fraction'] = df['size'] / df['daily_volume']
    df['price_change'] = df['price'].diff()
    df['return'] = df['price'].pct_change()
    # Rolling mean of trade prices as a rough mid-price proxy (no quote data here)
    df['mid_price'] = df['price'].rolling(5).mean()
    
    return df.dropna().reset_index(drop=True)

# Generate data
market_data = generate_market_data_with_impact()
print(f"Generated {len(market_data)} trades")
print("\nSample data:")
market_data.head(10)

In [None]:
# Visualize the data
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Price path
ax1 = axes[0, 0]
ax1.plot(market_data['trade_id'], market_data['price'], linewidth=0.5, alpha=0.8)
ax1.set_xlabel('Trade Number')
ax1.set_ylabel('Price')
ax1.set_title('Price Evolution with Market Impact')

# Trade size distribution
ax2 = axes[0, 1]
ax2.hist(market_data['size'], bins=50, edgecolor='black', alpha=0.7)
ax2.set_xlabel('Trade Size')
ax2.set_ylabel('Frequency')
ax2.set_title('Trade Size Distribution (Log-Normal)')
ax2.set_yscale('log')

# Price change vs signed volume
ax3 = axes[1, 0]
ax3.scatter(market_data['signed_size'], market_data['price_change'], 
            alpha=0.3, s=10)
ax3.axhline(y=0, color='r', linestyle='--', alpha=0.5)
ax3.axvline(x=0, color='r', linestyle='--', alpha=0.5)
ax3.set_xlabel('Signed Trade Size')
ax3.set_ylabel('Price Change')
ax3.set_title('Price Change vs Signed Volume')

# Returns distribution
ax4 = axes[1, 1]
ax4.hist(market_data['return']*100, bins=50, edgecolor='black', alpha=0.7, density=True)
x = np.linspace(market_data['return'].min()*100, market_data['return'].max()*100, 100)
ax4.plot(x, stats.norm.pdf(x, market_data['return'].mean()*100, market_data['return'].std()*100), 
         'r-', linewidth=2, label='Normal')
ax4.set_xlabel('Return (%)')
ax4.set_ylabel('Density')
ax4.set_title('Return Distribution')
ax4.legend()

plt.tight_layout()
plt.show()

## 3. Kyle's Lambda - Linear Price Impact

### Kyle (1985) Model

Kyle's model introduces **lambda (λ)** - the price impact coefficient:

$$\Delta P = \lambda \cdot Q + \epsilon$$

Where:
- $\Delta P$ = price change
- $\lambda$ = Kyle's lambda (the inverse of market depth)
- $Q$ = signed order flow
- $\epsilon$ = noise

### Interpretation

- **Higher λ**: Less liquid market, larger impact
- **Lower λ**: More liquid market, smaller impact
- λ measures **information content** of order flow
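
As a rough illustration of the regression above, the OLS slope with an intercept equals $\text{Cov}(\Delta P, Q) / \text{Var}(Q)$. The sketch below checks this on hypothetical data; `true_lambda`, the noise level, and the flow distribution are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: signed order flow and price changes with a known lambda
true_lambda = 2e-6
signed_flow = rng.normal(0, 5_000, size=2_000)
price_change = true_lambda * signed_flow + rng.normal(0, 0.01, size=2_000)

# OLS slope with an intercept equals Cov(dP, Q) / Var(Q)
lambda_hat = np.cov(signed_flow, price_change, ddof=1)[0, 1] / np.var(signed_flow, ddof=1)

# Cross-check against a degree-1 least-squares fit
slope, intercept = np.polyfit(signed_flow, price_change, 1)

print(f"lambda_hat = {lambda_hat:.3e}, polyfit slope = {slope:.3e}")
```

Both estimates should agree closely and land near `true_lambda`, since the simulated relationship is exactly linear.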

In [None]:
class KyleLambdaEstimator:
    """
    Estimate Kyle's Lambda from trade data.
    """
    
    def __init__(self):
        self.lambda_ = None
        self.intercept_ = None
        self.r_squared_ = None
        self.model = None
    
    def fit(self, price_changes, signed_volumes):
        """
        Estimate lambda using OLS regression.
        
        Parameters:
        -----------
        price_changes : array-like
            Price changes (ΔP)
        signed_volumes : array-like
            Signed trade volumes (positive=buy, negative=sell)
        """
        X = np.array(signed_volumes).reshape(-1, 1)
        y = np.array(price_changes)
        
        self.model = LinearRegression()
        self.model.fit(X, y)
        
        self.lambda_ = self.model.coef_[0]
        self.intercept_ = self.model.intercept_
        self.r_squared_ = self.model.score(X, y)
        
        return self
    
    def predict_impact(self, volume):
        """
        Predict price impact for a given volume.
        """
        return self.lambda_ * volume + self.intercept_
    
    def summary(self):
        """Print estimation summary."""
        print("Kyle's Lambda Estimation")
        print("="*40)
        print(f"Lambda (λ): {self.lambda_:.2e}")
        print(f"Intercept:  {self.intercept_:.6f}")
        print(f"R-squared:  {self.r_squared_:.4f}")
        print(f"\nInterpretation:")
        print(f"  - A 10,000 share buy moves price by {self.predict_impact(10000):.4f}")
        print(f"  - A 10,000 share sell moves price by {self.predict_impact(-10000):.4f}")

# Estimate Kyle's Lambda
kyle_estimator = KyleLambdaEstimator()
kyle_estimator.fit(market_data['price_change'], market_data['signed_size'])
kyle_estimator.summary()

In [None]:
# Visualize Kyle's Lambda fit
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Scatter with regression line
ax1 = axes[0]
ax1.scatter(market_data['signed_size'], market_data['price_change'], 
            alpha=0.3, s=10, label='Observed')

# Regression line
x_line = np.linspace(market_data['signed_size'].min(), market_data['signed_size'].max(), 100)
y_line = kyle_estimator.predict_impact(x_line)
ax1.plot(x_line, y_line, 'r-', linewidth=2, label=f'λ = {kyle_estimator.lambda_:.2e}')

ax1.axhline(y=0, color='gray', linestyle='--', alpha=0.5)
ax1.axvline(x=0, color='gray', linestyle='--', alpha=0.5)
ax1.set_xlabel('Signed Volume')
ax1.set_ylabel('Price Change')
ax1.set_title("Kyle's Lambda: Linear Price Impact")
ax1.legend()

# Binned analysis
ax2 = axes[1]
bins = pd.qcut(market_data['signed_size'], q=20, duplicates='drop')
binned = market_data.groupby(bins).agg({
    'signed_size': 'mean',
    'price_change': 'mean'
})

ax2.scatter(binned['signed_size'], binned['price_change'], s=100, alpha=0.8, label='Binned Means')
ax2.plot(x_line, y_line, 'r-', linewidth=2, alpha=0.7, label='Linear Fit')
ax2.axhline(y=0, color='gray', linestyle='--', alpha=0.5)
ax2.axvline(x=0, color='gray', linestyle='--', alpha=0.5)
ax2.set_xlabel('Mean Signed Volume (by bin)')
ax2.set_ylabel('Mean Price Change')
ax2.set_title('Binned Analysis: Clearer Impact Pattern')
ax2.legend()

plt.tight_layout()
plt.show()

## 4. Square-Root Law of Market Impact

### Empirical Finding

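Empirical studies of large orders find that impact grows roughly with the square root of order size relative to daily volume, a pattern known as the **square-root law**: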

$$\text{Impact} = \eta \cdot \sigma \cdot \text{sign}(Q) \cdot \sqrt{\frac{|Q|}{V}}$$

Where:
- $\eta$ = impact coefficient
- $\sigma$ = daily volatility (as a return, so impact is a fraction of price)
- $Q$ = order size
- $V$ = average daily volume

### Why Square-Root?

1. **Order splitting**: Large orders are broken into smaller pieces
2. **Market resilience**: Price partially reverts between trades
3. **Information leakage**: Market learns gradually
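
A short sketch of the formula's main consequence, concavity: quadrupling the order size only doubles the impact. The function `sqrt_impact` and the inputs (2% daily volatility, 1,000,000 shares ADV, $\eta = 1$) are illustrative assumptions, not calibrated values.

```python
import numpy as np


def sqrt_impact(order_size, adv, sigma, eta=1.0):
    """Square-root impact as a fraction of price: eta * sigma * sign(Q) * sqrt(|Q| / V)."""
    q = np.asarray(order_size, dtype=float)
    return eta * sigma * np.sign(q) * np.sqrt(np.abs(q) / adv)


# Hypothetical inputs: 2% daily volatility, 1,000,000 shares ADV, eta = 1
sizes = [10_000, 40_000, 160_000]
impacts_bps = sqrt_impact(sizes, adv=1_000_000, sigma=0.02) * 1e4

for size, bps in zip(sizes, impacts_bps):
    print(f"{size:>8,} shares ({size / 1_000_000:.0%} of ADV): {bps:.1f} bps")
```

Under these assumptions the three orders cost about 20, 40, and 80 bps, whereas a linear model would scale them 1 : 4 : 16.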

In [None]:
class SquareRootImpactModel:
    """
    Square-root market impact model.
    
    Impact = η * σ * sign(Q) * sqrt(|Q|/V)
    """
    
    def __init__(self):
        self.eta = None
        self.sigma = None
        self.daily_volume = None
        self.r_squared_ = None
    
    def fit(self, price_changes, signed_volumes, volatility, daily_volume):
        """
        Fit the square-root impact model.
        """
        self.sigma = volatility
        self.daily_volume = daily_volume
        
        # Transform: sqrt(|Q|/V) * sign(Q)
        Q = np.array(signed_volumes)
        X = np.sign(Q) * np.sqrt(np.abs(Q) / daily_volume)
        y = np.array(price_changes)
        
        # Fit: y = η * σ * X
        # Rearrange: y / σ = η * X
        # Note: y is in price units here, so the fitted η absorbs the price level
        y_scaled = y / volatility
        
        # OLS estimation
        X_reshaped = X.reshape(-1, 1)
        model = LinearRegression(fit_intercept=False)
        model.fit(X_reshaped, y_scaled)
        
        self.eta = model.coef_[0]
        
        # R-squared
        y_pred = self.predict_impact(signed_volumes)
        ss_res = np.sum((price_changes - y_pred)**2)
        ss_tot = np.sum((price_changes - np.mean(price_changes))**2)
        self.r_squared_ = 1 - (ss_res / ss_tot)
        
        return self
    
    def predict_impact(self, volume):
        """
        Predict impact for given volume.
        """
        Q = np.array(volume)
        impact = self.eta * self.sigma * np.sign(Q) * np.sqrt(np.abs(Q) / self.daily_volume)
        return impact
    
    def summary(self):
        """Print model summary."""
        print("Square-Root Impact Model")
        print("="*40)
        print(f"Impact Coefficient (η): {self.eta:.4f}")
        print(f"Volatility (σ):         {self.sigma:.4f}")
        print(f"Daily Volume:           {self.daily_volume:,.0f}")
        print(f"R-squared:              {self.r_squared_:.4f}")
        print(f"\nExample impacts:")
        for pct in [0.1, 0.5, 1.0, 5.0]:
            vol = int(pct/100 * self.daily_volume)
            impact = self.predict_impact(vol)
            print(f"  - {pct}% of ADV ({vol:,} shares): {impact:.4f}")

# Estimate volatility from returns
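# Assumes ~500 trades per day, matching the generator; trade-to-trade returns
# include impact-driven moves, so this tends to overstate fundamental volatility
volatility = market_data['return'].std() * np.sqrt(500)  # Per-day volatility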
daily_vol = market_data['daily_volume'].iloc[0]

# Fit square-root model
sqrt_model = SquareRootImpactModel()
sqrt_model.fit(market_data['price_change'], market_data['signed_size'], 
               volatility, daily_vol)
sqrt_model.summary()

In [None]:
# Compare Linear vs Square-Root models
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Sort by volume for plotting
sorted_data = market_data.sort_values('signed_size')

# Linear predictions
linear_pred = kyle_estimator.predict_impact(sorted_data['signed_size'])

# Square-root predictions
sqrt_pred = sqrt_model.predict_impact(sorted_data['signed_size'])

ax1 = axes[0]
ax1.scatter(sorted_data['signed_size'], sorted_data['price_change'], 
            alpha=0.2, s=5, label='Observed')
ax1.plot(sorted_data['signed_size'], linear_pred, 'r-', 
         linewidth=2, label=f'Linear (R²={kyle_estimator.r_squared_:.3f})')
ax1.plot(sorted_data['signed_size'], sqrt_pred, 'g-', 
         linewidth=2, label=f'Square-Root (R²={sqrt_model.r_squared_:.3f})')
ax1.set_xlabel('Signed Volume')
ax1.set_ylabel('Price Change')
ax1.set_title('Linear vs Square-Root Impact Models')
ax1.legend()

# Focus on large trades (where difference is clearer)
ax2 = axes[1]
large_trades = market_data[market_data['size'] > market_data['size'].quantile(0.9)]
large_sorted = large_trades.sort_values('signed_size')

ax2.scatter(large_sorted['signed_size'], large_sorted['price_change'], 
            alpha=0.5, s=20, label='Large Trades')
ax2.plot(large_sorted['signed_size'], 
         kyle_estimator.predict_impact(large_sorted['signed_size']), 
         'r-', linewidth=2, label='Linear')
ax2.plot(large_sorted['signed_size'], 
         sqrt_model.predict_impact(large_sorted['signed_size']), 
         'g-', linewidth=2, label='Square-Root')
ax2.set_xlabel('Signed Volume')
ax2.set_ylabel('Price Change')
ax2.set_title('Model Comparison: Large Trades Only')
ax2.legend()

plt.tight_layout()
plt.show()

## 5. Almgren-Chriss Market Impact Model

### The Model (Almgren & Chriss, 2000)

Separates impact into **temporary** and **permanent** components:

$$\text{Total Impact} = \underbrace{g(v)}_{\text{permanent}} + \underbrace{h(v)}_{\text{temporary}}$$

### Functional Forms

**Permanent Impact** (per unit time):
$$g(v) = \gamma \cdot v$$

**Temporary Impact** (execution price vs mid):
$$h(v) = \epsilon \cdot \text{sign}(v) + \eta \cdot v$$

Where:
- $v$ = trading rate (shares per unit time)
- $\gamma$ = permanent impact coefficient
- $\epsilon$ = fixed cost (half spread)
- $\eta$ = temporary impact coefficient
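
To make the two functional forms concrete, the sketch below evaluates $g(v)$ and $h(v)$ at a few trading rates. The coefficient values are hypothetical and chosen only so the numbers are easy to read.

```python
import numpy as np


def permanent_impact(v, gamma):
    """g(v) = gamma * v: shift in the price per unit time at trading rate v."""
    return gamma * v


def temporary_impact(v, epsilon, eta):
    """h(v) = epsilon * sign(v) + eta * v: premium paid on shares traded at rate v."""
    return epsilon * np.sign(v) + eta * v


# Hypothetical coefficients chosen only for illustration
gamma, epsilon, eta = 2.5e-7, 0.01, 1.0e-6

for rate in (5_000, 10_000, 20_000):  # shares per unit time
    g = permanent_impact(rate, gamma)
    h = temporary_impact(rate, epsilon, eta)
    print(f"v = {rate:>6,}: g(v) = {g:.4f}, h(v) = {h:.4f}")
```

Doubling the rate doubles $g(v)$ and the $\eta v$ part of $h(v)$, while the fixed cost $\epsilon$ is paid regardless of speed.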

In [None]:
class AlmgrenChrissModel:
    """
    Almgren-Chriss Market Impact Model.
    
    Decomposes impact into permanent and temporary components.
    """
    
    def __init__(self, sigma, daily_volume, price):
        """
        Initialize model parameters.
        
        Parameters:
        -----------
        sigma : float
            Daily volatility
        daily_volume : float
            Average daily volume
        price : float
            Current price
        """
        self.sigma = sigma
        self.V = daily_volume
        self.S = price
        
        # Impact parameters (to be estimated or calibrated)
        self.gamma = None  # Permanent impact
        self.eta = None    # Temporary impact (linear)
        self.epsilon = None  # Fixed cost (spread)
    
    def fit(self, trades_df):
        """
        Estimate model parameters from trade data.
        """
        # Estimate spread from data (fixed cost)
        # Using price changes at direction changes
        direction_changes = trades_df['direction'].diff().abs() > 0
        spread_estimate = trades_df.loc[direction_changes, 'price_change'].abs().median()
        self.epsilon = spread_estimate / 2
        
        # For permanent impact, look at price drift in direction of trades
        # Group into windows and measure drift
        window_size = 50
        n_windows = len(trades_df) // window_size
        
        window_flows = []
        window_drifts = []
        
        for i in range(n_windows):
            start = i * window_size
            end = start + window_size
            window = trades_df.iloc[start:end]
            
            # Net order flow in window
            net_flow = window['signed_size'].sum()
            
            # Price drift
            price_drift = window['price'].iloc[-1] - window['price'].iloc[0]
            
            window_flows.append(net_flow)
            window_drifts.append(price_drift)
        
        # Regress drift on flow for permanent impact
        X = np.array(window_flows).reshape(-1, 1)
        y = np.array(window_drifts)
        model = LinearRegression()
        model.fit(X, y)
        self.gamma = model.coef_[0]
        
        # For temporary impact, look at immediate price move vs volume
        # Remove the permanent component first
        temp_impact = trades_df['price_change'] - self.gamma * trades_df['signed_size']
        
        # Regress on signed volume
        X = trades_df['signed_size'].values.reshape(-1, 1)
        y = temp_impact.values
        model = LinearRegression(fit_intercept=False)
        model.fit(X, y)
        self.eta = model.coef_[0]
        
        return self
    
    def permanent_impact(self, volume):
        """Calculate permanent impact."""
        return self.gamma * volume
    
    def temporary_impact(self, volume):
        """Calculate temporary impact."""
        return self.epsilon * np.sign(volume) + self.eta * volume
    
    def total_impact(self, volume):
        """Calculate total market impact."""
        return self.permanent_impact(volume) + self.temporary_impact(volume)
    
    def execution_cost(self, total_shares, n_trades, side='buy'):
        """
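        Calculate expected impact cost for an order split into equal trades.
        
        Only impact costs are modeled; there is no timing-risk term, so with
        positive γ and η the cost falls as n_trades grows.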
        
        Parameters:
        -----------
        total_shares : int
            Total shares to execute
        n_trades : int
            Number of trades to split into
        side : str
            'buy' or 'sell'
        """
        direction = 1 if side == 'buy' else -1
        shares_per_trade = total_shares / n_trades
        
        total_cost = 0
        cumulative_permanent = 0
        
        for i in range(n_trades):
            # Permanent impact accumulates
            cumulative_permanent += self.permanent_impact(shares_per_trade * direction)
            
            # Temporary impact at each trade
            temp = self.temporary_impact(shares_per_trade * direction)
            
            # Cost at this trade
            trade_cost = (cumulative_permanent + temp) * shares_per_trade
            total_cost += trade_cost
        
        return abs(total_cost)
    
    def summary(self):
        """Print model summary."""
        print("Almgren-Chriss Impact Model")
        print("="*50)
        print(f"Market Parameters:")
        print(f"  - Volatility (σ):    {self.sigma:.4f}")
        print(f"  - Daily Volume:      {self.V:,.0f}")
        print(f"  - Price:             ${self.S:.2f}")
        print(f"\nImpact Parameters:")
        print(f"  - Permanent (γ):     {self.gamma:.2e}")
        print(f"  - Temporary (η):     {self.eta:.2e}")
        print(f"  - Fixed Cost (ε):    {self.epsilon:.4f}")

# Fit Almgren-Chriss model
ac_model = AlmgrenChrissModel(
    sigma=volatility,
    daily_volume=daily_vol,
    price=market_data['price'].mean()
)
ac_model.fit(market_data)
ac_model.summary()

In [None]:
# Visualize permanent vs temporary impact
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

volumes = np.linspace(-50000, 50000, 100)

# Permanent impact
ax1 = axes[0]
perm_impact = [ac_model.permanent_impact(v) for v in volumes]
ax1.plot(volumes, perm_impact, 'b-', linewidth=2)
ax1.axhline(y=0, color='gray', linestyle='--', alpha=0.5)
ax1.axvline(x=0, color='gray', linestyle='--', alpha=0.5)
ax1.set_xlabel('Volume')
ax1.set_ylabel('Price Impact')
ax1.set_title('Permanent Impact: g(v) = γv')
ax1.fill_between(volumes, 0, perm_impact, alpha=0.3)

# Temporary impact
ax2 = axes[1]
temp_impact = [ac_model.temporary_impact(v) for v in volumes]
ax2.plot(volumes, temp_impact, 'r-', linewidth=2)
ax2.axhline(y=0, color='gray', linestyle='--', alpha=0.5)
ax2.axvline(x=0, color='gray', linestyle='--', alpha=0.5)
ax2.set_xlabel('Volume')
ax2.set_ylabel('Price Impact')
ax2.set_title('Temporary Impact: h(v) = ε·sign(v) + ηv')
ax2.fill_between(volumes, 0, temp_impact, alpha=0.3, color='red')

# Total impact
ax3 = axes[2]
total_impact = [ac_model.total_impact(v) for v in volumes]
ax3.plot(volumes, total_impact, 'g-', linewidth=2, label='Total')
ax3.plot(volumes, perm_impact, 'b--', linewidth=1, alpha=0.7, label='Permanent')
ax3.plot(volumes, temp_impact, 'r--', linewidth=1, alpha=0.7, label='Temporary')
ax3.axhline(y=0, color='gray', linestyle='--', alpha=0.5)
ax3.axvline(x=0, color='gray', linestyle='--', alpha=0.5)
ax3.set_xlabel('Volume')
ax3.set_ylabel('Price Impact')
ax3.set_title('Total Impact Decomposition')
ax3.legend()

plt.tight_layout()
plt.show()

## 6. Impact Decay Analysis

### Temporary Impact Decay

After a trade, temporary impact decays over time. Understanding this decay is crucial for:
- Optimal execution timing
- Estimating reversion opportunities
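
One simple way to picture decay is an exponential kernel, sketched below. The exponential shape, the half-life, and the impact sizes are all assumptions for illustration; the text above does not commit to a particular decay form.

```python
import numpy as np


def impact_after_trade(t, permanent, temporary, half_life):
    """Residual impact t time units after a trade, assuming exponential decay.

    The exponential form and the half-life are illustrative assumptions.
    """
    t = np.asarray(t, dtype=float)
    return permanent + temporary * 0.5 ** (t / half_life)


# Hypothetical trade: 0.10 permanent and 0.20 temporary impact, half-life of 5 trades
lags = [0, 5, 10, 20]
residual = impact_after_trade(lags, permanent=0.10, temporary=0.20, half_life=5)

for lag, value in zip(lags, residual):
    print(f"lag {lag:>2}: residual impact = {value:.4f}")
```

The residual starts at the full 0.30 and converges toward the permanent 0.10 as the temporary part halves every five trades.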

In [None]:
def analyze_impact_decay(trades_df, lags=20):
    """
    Analyze how price reverts after trades (impact decay).
    
    Parameters:
    -----------
    trades_df : DataFrame
        Trade data
    lags : int
        Number of trades to look forward
    """
    # Focus on large trades
    large_trades = trades_df[trades_df['size'] > trades_df['size'].quantile(0.9)].copy()
    
    # Calculate forward returns at various lags
    decay_results = []
    
    for lag in range(1, lags + 1):
        # Forward price change
        forward_change = trades_df['price'].shift(-lag) - trades_df['price']
        large_trades[f'fwd_{lag}'] = forward_change.loc[large_trades.index]
        
        # Correlation with trade direction
        corr = large_trades[f'fwd_{lag}'].corr(large_trades['direction'])
        
        # Mean reversion (negative correlation = reversion)
        mean_fwd_buy = large_trades[large_trades['direction'] > 0][f'fwd_{lag}'].mean()
        mean_fwd_sell = large_trades[large_trades['direction'] < 0][f'fwd_{lag}'].mean()
        
        decay_results.append({
            'lag': lag,
            'correlation': corr,
            'mean_fwd_buy': mean_fwd_buy,
            'mean_fwd_sell': mean_fwd_sell
        })
    
    return pd.DataFrame(decay_results)

# Analyze decay
decay_df = analyze_impact_decay(market_data)

# Plot decay
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Correlation decay
ax1 = axes[0]
ax1.bar(decay_df['lag'], decay_df['correlation'], alpha=0.7)
ax1.axhline(y=0, color='red', linestyle='--')
ax1.set_xlabel('Lag (trades)')
ax1.set_ylabel('Correlation')
ax1.set_title('Price-Direction Correlation Decay')

# Forward returns
ax2 = axes[1]
ax2.plot(decay_df['lag'], decay_df['mean_fwd_buy'], 'g-', marker='o', 
         linewidth=2, label='After Large Buy')
ax2.plot(decay_df['lag'], decay_df['mean_fwd_sell'], 'r-', marker='o', 
         linewidth=2, label='After Large Sell')
ax2.axhline(y=0, color='gray', linestyle='--')
ax2.set_xlabel('Lag (trades)')
ax2.set_ylabel('Mean Forward Price Change')
ax2.set_title('Price Reversion After Large Trades')
ax2.legend()

plt.tight_layout()
plt.show()

print("\nImpact Decay Analysis:")
print(decay_df.head(10).to_string(index=False))

## 7. Execution Cost Analysis

### Trade-Off: Speed vs Cost

- **Trade quickly**: High temporary impact, low timing risk
- **Trade slowly**: Low temporary impact, high timing risk

The optimal execution strategy balances these factors.
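
A toy version of this trade-off, assuming a constant trading rate, linear temporary impact only, and a mean-variance objective: expected cost is $\eta X^2 / T$, cost variance is $\sigma^2 X^2 T / 3$, and the horizon minimizing their weighted sum is $T^* = \sqrt{3\eta / (\lambda \sigma^2)}$. All parameter values, including the risk-aversion weight `risk_aversion`, are hypothetical.

```python
import numpy as np


def expected_cost(T, shares, eta):
    """Temporary-impact cost of trading `shares` at a constant rate over horizon T."""
    return eta * shares**2 / T


def price_risk(T, shares, sigma):
    """Variance of cost from price moves while inventory declines linearly to zero."""
    return sigma**2 * shares**2 * T / 3


# Hypothetical parameters: sigma is in price units per sqrt(time unit);
# eta and risk_aversion are illustrative, not calibrated
shares, eta, sigma, risk_aversion = 100_000, 1e-6, 0.5, 1e-6

horizons = np.linspace(0.05, 10, 2_000)
objective = (expected_cost(horizons, shares, eta)
             + risk_aversion * price_risk(horizons, shares, sigma))

best_T = horizons[np.argmin(objective)]
closed_form_T = np.sqrt(3 * eta / (risk_aversion * sigma**2))
print(f"grid optimum T = {best_T:.2f}, closed form T* = {closed_form_T:.2f}")
```

Without the risk term the objective would keep falling as the horizon grows, which is why an impact-only cost model always favors the slowest schedule.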

In [None]:
def analyze_execution_costs(model, total_shares, n_trades_range, side='buy'):
    """
    Analyze execution costs for different trade schedules.
    
    Parameters:
    -----------
    model : AlmgrenChrissModel
        Fitted impact model
    total_shares : int
        Total shares to execute
    n_trades_range : range
        Range of number of trades to analyze
    side : str
        'buy' or 'sell'
    """
    results = []
    
    for n_trades in n_trades_range:
        cost = model.execution_cost(total_shares, n_trades, side)
        cost_bps = (cost / (total_shares * model.S)) * 10000  # In basis points
        
        results.append({
            'n_trades': n_trades,
            'shares_per_trade': total_shares / n_trades,
            'total_cost': cost,
            'cost_bps': cost_bps,
            'pct_adv': (total_shares / n_trades) / model.V * 100
        })
    
    return pd.DataFrame(results)

# Analyze for a 100,000 share order
order_size = 100000
execution_analysis = analyze_execution_costs(
    ac_model, 
    order_size, 
    range(1, 51),
    side='buy'
)

# Plot
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Cost vs number of trades
ax1 = axes[0]
ax1.plot(execution_analysis['n_trades'], execution_analysis['cost_bps'], 
         'b-', linewidth=2, marker='o', markersize=4)
ax1.set_xlabel('Number of Trades')
ax1.set_ylabel('Execution Cost (bps)')
ax1.set_title(f'Execution Cost vs Trade Schedule\n(Order: {order_size:,} shares)')
ax1.axhline(y=execution_analysis['cost_bps'].min(), color='green', 
            linestyle='--', alpha=0.7, label=f"Min: {execution_analysis['cost_bps'].min():.1f} bps")
ax1.legend()

# Cost vs participation rate
ax2 = axes[1]
ax2.plot(execution_analysis['pct_adv'], execution_analysis['cost_bps'], 
         'r-', linewidth=2, marker='o', markersize=4)
ax2.set_xlabel('Participation Rate (% of ADV per trade)')
ax2.set_ylabel('Execution Cost (bps)')
ax2.set_title('Cost vs Participation Rate')
ax2.invert_xaxis()  # Higher participation = fewer trades

plt.tight_layout()
plt.show()

# Lowest-cost schedule in the tested range (impact only, no timing-risk penalty)
optimal_idx = execution_analysis['cost_bps'].idxmin()
optimal = execution_analysis.loc[optimal_idx]
print(f"\nLowest-Impact Schedule for {order_size:,} shares:")
print(f"  - Number of trades: {optimal['n_trades']:.0f}")
print(f"  - Shares per trade: {optimal['shares_per_trade']:,.0f}")
print(f"  - Cost: {optimal['cost_bps']:.1f} bps (${optimal['total_cost']:,.2f})")
print(f"  - Participation: {optimal['pct_adv']:.2f}% of ADV per trade")

## 8. Non-Linear Impact Models

### Power-Law Impact

More general form that nests both linear and square-root:

$$\text{Impact} = \alpha \cdot \text{sign}(Q) \cdot |Q|^\beta$$

Where:
- $\beta = 1$: Linear impact
- $\beta = 0.5$: Square-root impact
- Empirically: $\beta$ is typically between 0.5 and 0.7
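
Because $\log(\text{Impact}) = \log \alpha + \beta \log |Q|$, the exponent can be read off as the slope of a straight-line fit in log space. The sketch below recovers known parameters from noise-free, hypothetical data; `alpha_true` and `beta_true` are assumptions for the example.

```python
import numpy as np

# Noise-free hypothetical impacts generated with known alpha and beta
alpha_true, beta_true = 1e-3, 0.6
sizes = np.logspace(2, 5, 50)  # 100 to 100,000 shares
impacts = alpha_true * sizes**beta_true

# log(impact) = log(alpha) + beta * log(|Q|) is a straight line in log space
beta_hat, log_alpha_hat = np.polyfit(np.log(sizes), np.log(impacts), 1)
alpha_hat = np.exp(log_alpha_hat)

print(f"beta_hat = {beta_hat:.3f}, alpha_hat = {alpha_hat:.2e}")
```

With noisy trade data many observed impacts are negative and have no logarithm; dropping them biases the fit, so a non-linear least-squares fit in the original space (for example with `scipy.optimize.curve_fit`) may be a safer choice.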

In [None]:
class PowerLawImpactModel:
    """
    Power-law market impact model.
    
    Impact = α * sign(Q) * |Q|^β
    """
    
    def __init__(self):
        self.alpha = None
        self.beta = None
        self.r_squared_ = None
    
    def fit(self, price_changes, volumes):
        """
        Fit power-law model using non-linear regression.
        """
        # Work with NumPy arrays so boolean masks and reshape behave consistently
        # (a pandas Series has no .reshape method)
        volumes_arr = np.asarray(volumes, dtype=float)
        changes_arr = np.asarray(price_changes, dtype=float)
        
        # Use |volume| and the price change measured in the trade's direction
        abs_vol = np.abs(volumes_arr)
        signed_impact = changes_arr * np.sign(volumes_arr)
        
        # Filter zero volumes
        valid = abs_vol > 0
        abs_vol = abs_vol[valid]
        signed_impact = signed_impact[valid]
        
        # Log-transform for linear regression
        # log(impact) = log(α) + β * log(|Q|)
        # Only positive impacts have a logarithm; dropping the rest biases the fit
        pos_impact = signed_impact > 0
        
        log_vol = np.log(abs_vol[pos_impact])
        log_impact = np.log(signed_impact[pos_impact])
        
        # Linear regression in log space
        model = LinearRegression()
        model.fit(log_vol.reshape(-1, 1), log_impact)
        
        self.alpha = np.exp(model.intercept_)
        self.beta = model.coef_[0]
        
        # Calculate R-squared in original space
        y_pred = self.predict_impact(volumes)
        y_true = np.array(price_changes)
        ss_res = np.sum((y_true - y_pred)**2)
        ss_tot = np.sum((y_true - np.mean(y_true))**2)
        self.r_squared_ = 1 - (ss_res / ss_tot)
        
        return self
    
    def predict_impact(self, volume):
        """Predict impact."""
        Q = np.array(volume)
        return self.alpha * np.sign(Q) * np.power(np.abs(Q), self.beta)
    
    def summary(self):
        """Print summary."""
        print("Power-Law Impact Model")
        print("="*40)
        print(f"Alpha (α):  {self.alpha:.2e}")
        print(f"Beta (β):   {self.beta:.4f}")
        print(f"R-squared:  {self.r_squared_:.4f}")
        print(f"\nInterpretation:")
        if self.beta < 0.6:
            print("  - β < 0.6: Concave impact (closer to square-root)")
        elif self.beta > 0.8:
            print("  - β > 0.8: Nearly linear impact")
        else:
            print(f"  - β ≈ {self.beta:.2f}: Between linear and square-root")

# Fit power-law model
power_model = PowerLawImpactModel()
power_model.fit(market_data['price_change'], market_data['signed_size'])
power_model.summary()

In [None]:
# Compare all models
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

volumes = np.linspace(1000, 50000, 100)

# Model predictions for buys
ax1 = axes[0]
ax1.plot(volumes, kyle_estimator.predict_impact(volumes), 
         'b-', linewidth=2, label=f'Linear (Kyle) R²={kyle_estimator.r_squared_:.3f}')
ax1.plot(volumes, sqrt_model.predict_impact(volumes), 
         'g-', linewidth=2, label=f'Square-Root R²={sqrt_model.r_squared_:.3f}')
ax1.plot(volumes, power_model.predict_impact(volumes), 
         'r-', linewidth=2, label=f'Power-Law (β={power_model.beta:.2f}) R²={power_model.r_squared_:.3f}')
ax1.set_xlabel('Buy Volume')
ax1.set_ylabel('Price Impact')
ax1.set_title('Impact Model Comparison: Buys')
ax1.legend()

# Log-log plot (power law appears linear)
ax2 = axes[1]
ax2.loglog(volumes, kyle_estimator.predict_impact(volumes), 
           'b-', linewidth=2, label='Linear')
ax2.loglog(volumes, sqrt_model.predict_impact(volumes), 
           'g-', linewidth=2, label='Square-Root')
ax2.loglog(volumes, power_model.predict_impact(volumes), 
           'r-', linewidth=2, label=f'Power-Law (β={power_model.beta:.2f})')
ax2.set_xlabel('Buy Volume (log)')
ax2.set_ylabel('Price Impact (log)')
ax2.set_title('Log-Log Plot: Power Law is Linear')
ax2.legend()

plt.tight_layout()
plt.show()

## 9. Implementation Shortfall

### Definition

**Implementation Shortfall (IS)** is the difference between the P&L of a paper portfolio that trades instantly at the decision price and the P&L actually realized:

$$IS = \text{Paper P\&L} - \text{Actual P\&L}$$

### Components

1. **Market Impact**: Price moved against us
2. **Timing Cost**: Price drifted during execution
3. **Opportunity Cost**: Shares not executed
4. **Commission**: Explicit trading costs
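
A minimal worked example of the definition, assuming a fully filled order so that opportunity cost is zero. The function name, the fills, and the commission amount are hypothetical.

```python
def implementation_shortfall(decision_price, fills, side="buy", commission=0.0):
    """Return (shortfall in currency, shortfall in bps) for a fully filled order.

    `fills` is a list of (price, shares) tuples; `commission` is a total amount.
    Opportunity cost is ignored because the order is assumed to fill completely.
    """
    direction = 1 if side == "buy" else -1
    total_shares = sum(shares for _, shares in fills)
    paid = sum(price * shares for price, shares in fills)
    paper = decision_price * total_shares
    shortfall = direction * (paid - paper) + commission
    return shortfall, shortfall / paper * 1e4


# Hypothetical buy order: decision price 100.00, two fills, 5.00 total commission
cost, cost_bps = implementation_shortfall(
    100.00, [(100.05, 400), (100.10, 600)], side="buy", commission=5.00
)
print(f"shortfall = {cost:.2f} ({cost_bps:.1f} bps)")
```

Here the fills average 100.08 against a decision price of 100.00, so price slippage contributes 80.00 and the commission adds 5.00.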

In [None]:
class ImplementationShortfallAnalyzer:
    """
    Analyze implementation shortfall from execution data.
    """
    
    def __init__(self):
        self.results = None
    
    def analyze_order(self, decision_price, executions, side='buy'):
        """
        Analyze implementation shortfall for an order.
        
        Parameters:
        -----------
        decision_price : float
            Price when decision was made
        executions : list of dict
            List of {'price': float, 'shares': int, 'time': int}
        side : str
            'buy' or 'sell'
        """
        direction = 1 if side == 'buy' else -1
        
        total_shares = sum(e['shares'] for e in executions)
        avg_price = sum(e['price'] * e['shares'] for e in executions) / total_shares
        final_price = executions[-1]['price']
        
        # Paper P&L: If we could have bought at decision price
        paper_cost = decision_price * total_shares
        
        # Actual cost
        actual_cost = sum(e['price'] * e['shares'] for e in executions)
        
        # Implementation Shortfall
        if side == 'buy':
            shortfall = actual_cost - paper_cost
        else:
            shortfall = paper_cost - actual_cost
        
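        # Decomposition
        # The first fill is used as a proxy for the arrival price
        arrival_price = executions[0]['price']
        
        # Delay cost (decision to arrival)
        delay_cost = (arrival_price - decision_price) * total_shares * direction
        
        # Market impact (arrival to avg execution)
        impact_cost = (avg_price - arrival_price) * total_shares * direction
        
        # Timing cost (residual). With every share filled, delay + impact
        # already add up to the total shortfall, so this term is zero up to
        # floating-point error; it only becomes meaningful once costs such as
        # unfilled shares or commissions are added to the shortfall.
        timing_cost = shortfall - delay_cost - impact_cost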
        
        self.results = {
            'decision_price': decision_price,
            'arrival_price': arrival_price,
            'avg_execution_price': avg_price,
            'total_shares': total_shares,
            'total_shortfall': shortfall,
            'shortfall_bps': (shortfall / paper_cost) * 10000,
            'delay_cost': delay_cost,
            'impact_cost': impact_cost,
            'timing_cost': timing_cost
        }
        
        return self.results
    
    def report(self):
        """Print IS report."""
        if self.results is None:
            print("No analysis run yet.")
            return
        
        r = self.results
        print("Implementation Shortfall Analysis")
        print("="*50)
        print(f"\nPrices:")
        print(f"  - Decision Price:  ${r['decision_price']:.2f}")
        print(f"  - Arrival Price:   ${r['arrival_price']:.2f}")
        print(f"  - Avg Execution:   ${r['avg_execution_price']:.2f}")
        print(f"\nExecution:")
        print(f"  - Total Shares:    {r['total_shares']:,}")
        print(f"\nCosts:")
        print(f"  - Total IS:        ${r['total_shortfall']:,.2f} ({r['shortfall_bps']:.1f} bps)")
        print(f"  - Delay Cost:      ${r['delay_cost']:,.2f}")
        print(f"  - Impact Cost:     ${r['impact_cost']:,.2f}")
        print(f"  - Timing Cost:     ${r['timing_cost']:,.2f}")

# Simulate an order execution
np.random.seed(42)

decision_price = 100.00
order_shares = 50000
n_executions = 10

# Simulate executions with impact
executions = []
current_price = decision_price + 0.05  # Small delay cost

for i in range(n_executions):
    shares = order_shares // n_executions
    # Price increases due to impact
    current_price += 0.02 * np.sqrt(shares/1000)  # Impact
    current_price += np.random.normal(0, 0.01)    # Noise
    
    executions.append({
        'price': current_price,
        'shares': shares,
        'time': i
    })

# Analyze
is_analyzer = ImplementationShortfallAnalyzer()
is_analyzer.analyze_order(decision_price, executions, side='buy')
is_analyzer.report()

In [None]:
# Visualize execution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Execution prices over time
ax1 = axes[0]
times = [e['time'] for e in executions]
prices = [e['price'] for e in executions]
shares = [e['shares'] for e in executions]

ax1.scatter(times, prices, s=[s/100 for s in shares], alpha=0.7, label='Executions')
ax1.axhline(y=decision_price, color='green', linestyle='--', label='Decision Price')
ax1.axhline(y=is_analyzer.results['avg_execution_price'], color='red', 
            linestyle='--', label='Avg Execution')
ax1.fill_between(times, decision_price, prices, alpha=0.3, color='red')
ax1.set_xlabel('Trade Number')
ax1.set_ylabel('Price')
ax1.set_title('Order Execution Path')
ax1.legend()

# Cost breakdown
ax2 = axes[1]
r = is_analyzer.results
costs = [r['delay_cost'], r['impact_cost'], r['timing_cost']]
labels = ['Delay Cost', 'Impact Cost', 'Timing Cost']
colors = ['#ff9999', '#ff6666', '#ff3333']

bars = ax2.bar(labels, costs, color=colors, edgecolor='black')
ax2.axhline(y=0, color='gray', linestyle='-')
ax2.set_ylabel('Cost ($)')
ax2.set_title('Implementation Shortfall Decomposition')

# Add value labels (above positive bars, below negative ones)
for bar, cost in zip(bars, costs):
    height = bar.get_height()
    ax2.text(bar.get_x() + bar.get_width()/2., height,
             f'${cost:,.0f}', ha='center',
             va='bottom' if height >= 0 else 'top', fontsize=12)

plt.tight_layout()
plt.show()

## 10. Practical Application: Impact-Aware Backtesting

### Why It Matters

Backtests that ignore market impact **overestimate** strategy performance. We need to incorporate realistic impact models.

In [None]:
class ImpactAwareBacktester:
    """
    Backtester that accounts for market impact.
    """
    
    def __init__(self, impact_model, daily_volume, volatility):
        """
        Initialize with impact model.
        
        Parameters:
        -----------
        impact_model : object
            Impact model with predict_impact method
        daily_volume : float
            Average daily volume
        volatility : float
            Daily volatility
        """
        self.impact_model = impact_model
        self.daily_volume = daily_volume
        self.volatility = volatility
    
    def calculate_execution_price(self, mid_price, trade_size, side):
        """
        Calculate realistic execution price including impact.
        """
        direction = 1 if side == 'buy' else -1
        signed_size = trade_size * direction
        
        impact = self.impact_model.predict_impact(signed_size)
        
        # Execution price is mid + impact
        execution_price = mid_price + impact
        
        return execution_price
    
    def backtest_strategy(self, prices, signals, position_size=10000):
        """
        Backtest a strategy with and without impact.
        
        Parameters:
        -----------
        prices : array
            Price series
        signals : array
            Trading signals (+1 buy, -1 sell, 0 hold)
        position_size : int
            Shares per trade
        """
        n = len(prices)
        
        # No impact backtest
        no_impact_pnl = np.zeros(n)
        position = 0
        entry_price = 0
        
        for i in range(1, n):
            if signals[i] != 0 and position == 0:
                # Enter position
                position = signals[i] * position_size
                entry_price = prices[i]
            elif position != 0 and signals[i] == -np.sign(position):
                # Exit position
                no_impact_pnl[i] = position * (prices[i] - entry_price)
                position = 0
        
        # With impact backtest
        impact_pnl = np.zeros(n)
        position = 0
        entry_price = 0
        
        for i in range(1, n):
            if signals[i] != 0 and position == 0:
                # Enter position with impact
                side = 'buy' if signals[i] > 0 else 'sell'
                exec_price = self.calculate_execution_price(
                    prices[i], position_size, side
                )
                position = signals[i] * position_size
                entry_price = exec_price
            elif position != 0 and signals[i] == -np.sign(position):
                # Exit position with impact
                side = 'sell' if position > 0 else 'buy'
                exec_price = self.calculate_execution_price(
                    prices[i], abs(position), side
                )
                impact_pnl[i] = position * (exec_price - entry_price)
                position = 0
        
        return {
            'no_impact_pnl': np.cumsum(no_impact_pnl),
            'impact_pnl': np.cumsum(impact_pnl),
            'total_no_impact': no_impact_pnl.sum(),
            'total_impact': impact_pnl.sum(),
            'impact_cost': no_impact_pnl.sum() - impact_pnl.sum()
        }

# Generate a simple momentum signal
np.random.seed(42)
n_days = 252
prices = 100 * np.exp(np.cumsum(np.random.normal(0.0002, 0.02, n_days)))

# Simple momentum signal: buy when 5-day return > 1%, sell when < -1%
returns_5d = pd.Series(prices).pct_change(5).values
signals = np.zeros(n_days)
signals[returns_5d > 0.01] = 1   # Buy signal
signals[returns_5d < -0.01] = -1  # Sell signal

# Run backtest
backtester = ImpactAwareBacktester(
    impact_model=sqrt_model,
    daily_volume=daily_vol,
    volatility=volatility
)

results = backtester.backtest_strategy(prices, signals, position_size=50000)

print("Backtest Results")
print("="*50)
print(f"Without Impact: ${results['total_no_impact']:,.2f}")
print(f"With Impact:    ${results['total_impact']:,.2f}")
print(f"Impact Cost:    ${results['impact_cost']:,.2f}")
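# Express the cost relative to gross P&L only when gross P&L is positive
gross = results['total_no_impact']
if gross > 0:
    print(f"Cost as % of Gross: {results['impact_cost']/gross*100:.1f}%")
else:
    print("Cost as % of Gross: n/a (gross P&L is not positive)")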

In [None]:
# Visualize backtest comparison
fig, axes = plt.subplots(2, 1, figsize=(14, 8))

# Cumulative P&L
ax1 = axes[0]
ax1.plot(results['no_impact_pnl'], 'b-', linewidth=2, label='Without Impact')
ax1.plot(results['impact_pnl'], 'r-', linewidth=2, label='With Impact')
ax1.fill_between(range(len(results['no_impact_pnl'])), 
                  results['impact_pnl'], results['no_impact_pnl'],
                  alpha=0.3, color='red', label='Impact Cost')
ax1.set_xlabel('Day')
ax1.set_ylabel('Cumulative P&L ($)')
ax1.set_title('Impact-Aware Backtesting: P&L Comparison')
ax1.legend()
ax1.axhline(y=0, color='gray', linestyle='--')

# Price and signals
ax2 = axes[1]
ax2.plot(prices, 'k-', linewidth=1, alpha=0.7, label='Price')
buy_signals = np.where(signals == 1)[0]
sell_signals = np.where(signals == -1)[0]
ax2.scatter(buy_signals, prices[buy_signals], marker='^', color='green', 
            s=100, label='Buy Signal', zorder=5)
ax2.scatter(sell_signals, prices[sell_signals], marker='v', color='red', 
            s=100, label='Sell Signal', zorder=5)
ax2.set_xlabel('Day')
ax2.set_ylabel('Price')
ax2.set_title('Price Series with Trading Signals')
ax2.legend()

plt.tight_layout()
plt.show()

## 11. Summary & Key Takeaways

### Market Impact Models Covered

| Model | Formula | Use Case |
|-------|---------|----------|
| **Kyle's Lambda** | $\Delta P = \lambda Q$ | Simple linear impact |
| **Square-Root** | Impact $\propto \sqrt{Q/V}$ | Empirically validated |
| **Almgren-Chriss** | Permanent + Temporary | Optimal execution |
| **Power-Law** | Impact $\propto \lvert Q \rvert^\beta$ | Flexible, general |

### Key Insights

1. **Market impact is significant**: Often the largest transaction cost
2. **Two components**: Permanent (information) and Temporary (liquidity)
3. **Square-root law**: Impact scales with $\sqrt{Q/V}$, the square root of order size relative to daily volume
4. **Impact decays**: Temporary impact reverts over time
5. **Backtesting matters**: Ignoring impact overestimates returns
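
The square-root scaling has a simple arithmetic consequence: quadrupling the order size only doubles the impact per share, so the total dollar cost grows like $Q^{3/2}$. A minimal sketch, assuming a hypothetical impact function of the form `coeff * sigma * sqrt(Q / V)` with an illustrative coefficient of 1 and 2% daily volatility (these values are not calibrated to the notebook's data):

```python
import math

def sqrt_impact_bps(order_shares, adv, daily_vol_bps, coeff=1.0):
    """Hypothetical square-root impact in basis points: coeff * sigma * sqrt(Q / V)."""
    return coeff * daily_vol_bps * math.sqrt(order_shares / adv)

adv = 1_000_000          # assumed average daily volume (shares)
daily_vol_bps = 200      # assumed 2% daily volatility

small = sqrt_impact_bps(10_000, adv, daily_vol_bps)   # 1% of ADV
large = sqrt_impact_bps(40_000, adv, daily_vol_bps)   # 4% of ADV

print(f"Impact at 1% of ADV: {small:.1f} bps")
print(f"Impact at 4% of ADV: {large:.1f} bps")
print(f"Impact ratio: {large / small:.2f}x for 4x the size")
```

Under these assumptions the impact goes from 20 bps to 40 bps, which is why per-share cost rises more slowly than order size while total cost still rises faster than linearly.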

### Best Practices

- Always include impact costs in strategy evaluation
- Use participation rate limits (e.g., < 5% of average daily volume, ADV)
- Split large orders to reduce impact
- Monitor implementation shortfall
- Recalibrate impact models regularly
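
A participation cap translates directly into a minimum execution horizon. The sketch below splits a parent order into daily child orders that each stay at or below a chosen fraction of ADV. `participation_schedule` and its arguments are hypothetical names for illustration, and the even split is the simplest possible schedule rather than an optimized one.

```python
def participation_schedule(total_shares, adv, max_participation=0.05):
    """Split a parent order into daily child orders capped at max_participation * ADV.

    Hypothetical helper: total_shares and adv are share counts (integers).
    """
    if total_shares <= 0 or adv <= 0 or not 0 < max_participation <= 1:
        raise ValueError("inputs must be positive and participation in (0, 1]")
    cap = int(adv * max_participation)       # largest child order allowed per day
    if cap == 0:
        raise ValueError("participation cap rounds down to zero shares")
    n_full, remainder = divmod(total_shares, cap)
    schedule = [cap] * n_full                # full-size days
    if remainder:
        schedule.append(remainder)           # final partial day
    return schedule

schedule = participation_schedule(220_000, adv=1_000_000, max_participation=0.05)
print(f"Days needed: {len(schedule)}")
print(f"Child orders: {schedule}")
```

With these illustrative numbers the cap is 50,000 shares per day, so a 220,000-share order needs four full days plus a final 20,000-share child order.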

## 12. Exercises

### Exercise 1: Kyle's Lambda Sensitivity
How does Kyle's lambda change with:
- Different volatility levels?
- Different daily volumes?

### Exercise 2: Optimal Trade Size
Given the Almgren-Chriss model, find the optimal number of trades to minimize total cost for a 200,000 share order.

### Exercise 3: Impact Decay
Fit an exponential decay model to the impact reversion data. What is the half-life?
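
One possible starting point, sketched under stated assumptions: if the remaining impact decays as $I(t) = I_0 e^{-\kappa t}$, then $\log I(t)$ is linear in $t$ and the half-life is $\ln 2 / \kappa$. The example below uses made-up synthetic data (not the notebook's reversion data) and an ordinary least-squares fit on the log values, which is a simpler substitute for a nonlinear fit such as `scipy.optimize.curve_fit`; it only works while the measured impact stays strictly positive.

```python
import math
import random

def fit_half_life(times, impacts):
    """Fit log(impact) = a - kappa * t by least squares; return (kappa, half_life)."""
    logs = [math.log(x) for x in impacts]    # requires strictly positive impacts
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    kappa = -sxy / sxx                       # decay rate is minus the slope
    return kappa, math.log(2) / kappa

# Synthetic reversion path: assumed true decay rate of 0.2 per period,
# with small multiplicative noise
rng = random.Random(0)
true_kappa = 0.2
times = list(range(20))
impacts = [0.5 * math.exp(-true_kappa * t) * math.exp(rng.gauss(0, 0.05))
           for t in times]

kappa_hat, half_life = fit_half_life(times, impacts)
print(f"Estimated kappa: {kappa_hat:.3f}")
print(f"Estimated half-life: {half_life:.2f} periods "
      f"(true value {math.log(2) / true_kappa:.2f})")
```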

### Exercise 4: Strategy Capacity
Given a strategy with 10 bps expected return per trade, what is the maximum position size before impact eliminates the alpha?
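
Under the square-root law the break-even size has a closed form, which gives a useful sanity check for a numerical search. Assuming (for illustration only) an impact of $Y \sigma \sqrt{Q/V}$ in basis points, with $\sigma$ the daily volatility in bps, $V$ the daily volume, and $Y$ a constant of order one, the impact equals the alpha $\alpha$ when:

$$Y \sigma \sqrt{\frac{Q^*}{V}} = \alpha \quad\Longrightarrow\quad Q^* = V \left(\frac{\alpha}{Y \sigma}\right)^2$$

For example, with $\alpha = 10$ bps, $\sigma = 200$ bps, $Y = 1$, and $V = 1{,}000{,}000$ shares, $Q^* = 1{,}000{,}000 \times (0.05)^2 = 2{,}500$ shares, or 0.25% of daily volume. The fitted `sqrt_model` may use a different parametrization, so its numerical answer need not match these illustrative values.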

In [None]:
# Exercise space - try the exercises here!

# Exercise 4 starter:
def find_capacity_limit(impact_model, expected_alpha_bps=10, price=100):
    """
    Find maximum trade size where alpha > impact cost.
    """
    expected_alpha = expected_alpha_bps / 10000 * price
    
    for size in range(1000, 1000000, 1000):
        impact = abs(impact_model.predict_impact(size))
        if impact > expected_alpha:
            return size - 1000
    
    return 1000000  # Max tested

capacity = find_capacity_limit(sqrt_model, expected_alpha_bps=10)
print(f"Strategy capacity at 10 bps alpha: {capacity:,} shares")
print(f"As % of ADV: {capacity/daily_vol*100:.2f}%")

---

## References

1. **Kyle, A.S. (1985)** - "Continuous Auctions and Insider Trading"
2. **Almgren, R. & Chriss, N. (2000)** - "Optimal Execution of Portfolio Transactions"
3. **Bouchaud, J.P. et al. (2009)** - "How Markets Slowly Digest Changes in Supply and Demand"
4. **Gatheral, J. (2010)** - "No-Dynamic-Arbitrage and Market Impact"
5. **Cont, R. et al. (2014)** - "The Price Impact of Order Book Events"

---

**Next**: Day 3 - Order Book Dynamics & LOB Modeling