# Day 05: Hawkes Processes for Order Flow Modeling

## Week 21: Market Microstructure

---

## Learning Objectives

By the end of this notebook, you will:

1. **Understand Hawkes Processes**: Learn the mathematical foundation of self-exciting point processes
2. **Simulate Hawkes Processes**: Implement Ogata's thinning algorithm for simulation
3. **Estimate Parameters**: Use Maximum Likelihood Estimation (MLE) to fit Hawkes models
4. **Model Order Flow**: Apply Hawkes processes to simulated market microstructure data (trade arrivals)
5. **Analyze Market Dynamics**: Extract insights about order clustering and market activity

---

## Table of Contents

1. [Introduction to Hawkes Processes](#1-introduction)
2. [Mathematical Foundation](#2-math-foundation)
3. [Simulating Hawkes Processes](#3-simulation)
4. [Parameter Estimation](#4-estimation)
5. [Order Flow Modeling](#5-order-flow)
6. [Multivariate Hawkes for Buy/Sell Orders](#6-multivariate)
7. [Market Microstructure Applications](#7-applications)
8. [Summary & Interview Questions](#8-summary)

---

## 1. Introduction to Hawkes Processes <a id='1-introduction'></a>

### What is a Hawkes Process?

A **Hawkes process** is a self-exciting point process: each event temporarily raises the rate (intensity) at which further events arrive, and this extra rate then decays. This is particularly relevant in financial markets, where:

- **Order clustering**: One trade often triggers more trades
- **News events**: Breaking news causes bursts of trading activity
- **Algorithmic trading**: Algorithms react to each other, creating feedback loops
- **Market maker behavior**: Inventory adjustments lead to correlated order flow

### Why Hawkes Processes for Finance?

| Property | Financial Interpretation |
|----------|-------------------------|
| Self-excitation | One order triggers more orders |
| Clustering | Bursts of trading activity |
| Memory | Past events influence future arrival rates |
| Tractability | Closed-form likelihood for estimation |

In [None]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from scipy.stats import expon, kstest
from typing import Tuple, List, Optional
import warnings
warnings.filterwarnings('ignore')

# Set style
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 11

# Set random seed for reproducibility
np.random.seed(42)

print("Libraries loaded successfully!")

---

## 2. Mathematical Foundation <a id='2-math-foundation'></a>

### Intensity Function

The **conditional intensity** of a univariate Hawkes process is:

$$\lambda(t) = \mu + \sum_{t_i < t} \phi(t - t_i)$$

where:
- $\mu > 0$: baseline intensity (exogenous rate)
- $\phi(\cdot)$: kernel function (excitation kernel)
- $t_i$: past event times

### Exponential Kernel

The most common choice is the **exponential kernel**:

$$\phi(t) = \alpha \cdot e^{-\beta t}$$

This gives the intensity:

$$\lambda(t) = \mu + \alpha \sum_{t_i < t} e^{-\beta(t - t_i)}$$
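As a quick worked check of the exponential-kernel formula, take $\mu = 0.5$, $\alpha = 0.8$, $\beta = 1.2$ (the parameters used later in this notebook) and two hypothetical past events at $t = 1.0$ and $t = 1.5$. At $t = 2.0$ the intensity is $\lambda(2) = 0.5 + 0.8\,(e^{-1.2} + e^{-0.6}) \approx 1.18$. The snippet below evaluates the same sum directly; the event times are made up for illustration.

```python
import numpy as np

def exp_kernel_intensity(t, events, mu, alpha, beta):
    """lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i))."""
    events = np.asarray(events, dtype=float)
    past = events[events < t]  # strict inequality: an event exactly at t does not count yet
    return mu + alpha * np.sum(np.exp(-beta * (t - past)))

# Hypothetical event times chosen for illustration
lam = exp_kernel_intensity(2.0, [1.0, 1.5], mu=0.5, alpha=0.8, beta=1.2)
print(round(lam, 4))  # 1.18
```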

### Parameters

| Parameter | Description | Interpretation |
|-----------|-------------|----------------|
| $\mu$ | Baseline intensity | Rate of exogenous events |
| $\alpha$ | Jump size | Immediate impact of an event |
| $\beta$ | Decay rate | How quickly influence fades |
| $\alpha/\beta$ | Branching ratio | Average number of direct offspring (triggered events) per event |

### Stationarity Condition

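For the process to be stationary, the branching ratio must satisfy $\frac{\alpha}{\beta} < 1$.

This ensures that each event triggers only a finite cascade of descendants on average, so the intensity does not explode over time.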
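The stationary mean intensity used by `expected_intensity` below follows from taking expectations in the intensity equation. Assuming stationarity, so that $\mathbb{E}[\lambda(t)] = \Lambda$ is constant and the expected number of events in $[s, s + ds)$ is $\Lambda\,ds$:

$$\Lambda = \mu + \int_0^\infty \phi(s)\,\Lambda\,ds = \mu + \Lambda \int_0^\infty \alpha e^{-\beta s}\,ds = \mu + \Lambda\,\frac{\alpha}{\beta} \quad\Longrightarrow\quad \Lambda = \frac{\mu}{1 - \alpha/\beta}$$

With $\mu = 0.5$, $\alpha = 0.8$, $\beta = 1.2$ (branching ratio $2/3$) this gives $\Lambda = 1.5$, so two thirds of all events are triggered rather than exogenous. The same branching argument gives the expected cluster size (one exogenous event plus all its descendants) as $\sum_{k \ge 0} (\alpha/\beta)^k = 1/(1 - \alpha/\beta)$, here 3, which diverges as $\alpha/\beta \to 1$.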

In [None]:
class HawkesProcess:
    """
    Univariate Hawkes Process with Exponential Kernel
    
    Intensity: λ(t) = μ + α * Σ exp(-β(t - t_i))
    """
    
    def __init__(self, mu: float, alpha: float, beta: float):
        """
        Initialize Hawkes process parameters.
        
        Parameters:
        -----------
        mu : float
            Baseline intensity (> 0)
        alpha : float
            Jump size (> 0)
        beta : float
            Decay rate (> 0)
        """
        assert mu > 0, "mu must be positive"
        assert alpha > 0, "alpha must be positive"
        assert beta > 0, "beta must be positive"
        
        self.mu = mu
        self.alpha = alpha
        self.beta = beta
        
    @property
    def branching_ratio(self) -> float:
        """Calculate the branching ratio (criticality parameter)."""
        return self.alpha / self.beta
    
    @property
    def is_stationary(self) -> bool:
        """Check if the process is stationary."""
        return self.branching_ratio < 1
    
    @property
    def expected_intensity(self) -> float:
        """Expected intensity in stationary state."""
        if not self.is_stationary:
            return np.inf
        return self.mu / (1 - self.branching_ratio)
    
    def intensity(self, t: float, events: np.ndarray) -> float:
        """
        Calculate intensity at time t given past events.
        
        Parameters:
        -----------
        t : float
            Current time
        events : np.ndarray
            Array of past event times
            
        Returns:
        --------
        float : Intensity at time t
        """
        past_events = events[events < t]
        if len(past_events) == 0:
            return self.mu
        
        excitation = self.alpha * np.sum(np.exp(-self.beta * (t - past_events)))
        return self.mu + excitation
    
    def intensity_path(self, t_grid: np.ndarray, events: np.ndarray) -> np.ndarray:
        """
        Calculate intensity over a time grid.
        
        Parameters:
        -----------
        t_grid : np.ndarray
            Time points to evaluate intensity
        events : np.ndarray
            Event times
            
        Returns:
        --------
        np.ndarray : Intensity values at each time point
        """
        return np.array([self.intensity(t, events) for t in t_grid])
    
    def __repr__(self):
        return (f"HawkesProcess(μ={self.mu:.4f}, α={self.alpha:.4f}, β={self.beta:.4f}, "
                f"branching_ratio={self.branching_ratio:.4f})")


# Example: Create a Hawkes process
hp = HawkesProcess(mu=0.5, alpha=0.8, beta=1.2)
print(hp)
print(f"Stationary: {hp.is_stationary}")
print(f"Expected intensity: {hp.expected_intensity:.4f}")

---

## 3. Simulating Hawkes Processes <a id='3-simulation'></a>

### Ogata's Thinning Algorithm

The most common method to simulate Hawkes processes is **Ogata's thinning algorithm**:

1. Start at time $t = 0$ with an empty event list
2. Choose an upper bound $\lambda^* \geq \lambda(s)$ valid for all $s$ from the current time up to the next candidate
3. Draw a candidate inter-arrival time $\Delta t \sim \text{Exp}(\lambda^*)$ and set $t \leftarrow t + \Delta t$
4. Accept $t$ as an event with probability $\lambda(t) / \lambda^*$
5. Whether or not the candidate was accepted, update $\lambda^*$ and repeat until $t > T$

### Why Thinning Works

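- We simulate candidate points from a homogeneous Poisson process with rate $\lambda^*$
- Then "thin" them: a candidate at time $t$ is kept with probability $\lambda(t)/\lambda^*$, so the accepted points have intensity $\lambda(t)$
- For the exponential kernel, $\lambda(t)$ only decays between events, so its current value (right after the most recent event or rejected candidate) is a valid $\lambda^*$ until the next candidate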
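Thinning itself does not depend on the Hawkes structure. As a minimal sketch (the intensity $\lambda(t) = 2 + \sin t$ and the bound $\lambda^* = 3$ are arbitrary choices for illustration), the snippet below thins a homogeneous Poisson process into an inhomogeneous one and compares the event count with $\int_0^T \lambda(s)\,ds = 2T + 1 - \cos T$. Because the bound here is fixed, all candidates can be drawn at once (a Poisson count plus sorted uniforms, which is equivalent to summing $\text{Exp}(\lambda^*)$ gaps); Ogata's version for Hawkes processes must generate candidates one at a time because $\lambda^*$ changes after every accepted event.

```python
import numpy as np

def thin_poisson(intensity, lam_star, T, rng):
    """Inhomogeneous Poisson process on [0, T] by thinning.

    Assumes intensity(t) <= lam_star for every t in [0, T].
    """
    # Homogeneous Poisson candidates with rate lam_star on [0, T]
    n_candidates = rng.poisson(lam_star * T)
    candidates = np.sort(rng.uniform(0.0, T, size=n_candidates))
    # Keep each candidate with probability intensity(t) / lam_star
    keep = rng.uniform(size=n_candidates) <= intensity(candidates) / lam_star
    return candidates[keep]


rng_thin = np.random.default_rng(0)
T_demo = 2000.0
demo_events = thin_poisson(lambda s: 2.0 + np.sin(s), lam_star=3.0, T=T_demo, rng=rng_thin)

expected_count = 2.0 * T_demo + 1.0 - np.cos(T_demo)  # integral of 2 + sin(s) over [0, T]
print(f"simulated {len(demo_events)} events, expected about {expected_count:.0f}")
```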

In [None]:
def simulate_hawkes(mu: float, alpha: float, beta: float, 
                    T: float, seed: Optional[int] = None) -> np.ndarray:
    """
    Simulate a univariate Hawkes process using Ogata's thinning algorithm.
    
    Parameters:
    -----------
    mu : float
        Baseline intensity
    alpha : float
        Jump size
    beta : float
        Decay rate
    T : float
        End time for simulation
    seed : int, optional
        Random seed
        
    Returns:
    --------
    np.ndarray : Array of event times
    """
    if seed is not None:
        np.random.seed(seed)
    
    events = []
    t = 0
    
    # Initial intensity is just baseline
    lambda_current = mu
    
    while t < T:
        # Upper bound: the intensity only decays until the next event,
        # so its current value bounds it over the next candidate gap
        lambda_bar = lambda_current
        
        # Generate candidate inter-arrival time
        u = np.random.uniform()
        dt = -np.log(u) / lambda_bar  # Exponential with rate lambda_bar
        
        t = t + dt
        
        if t >= T:
            break
        
        # Calculate actual intensity at new time
        # Intensity decays between events
        if len(events) == 0:
            lambda_t = mu
        else:
            past_events = np.array(events)
            lambda_t = mu + alpha * np.sum(np.exp(-beta * (t - past_events)))
        
        # Accept with probability lambda_t / lambda_bar
        if np.random.uniform() <= lambda_t / lambda_bar:
            events.append(t)
            # Update current intensity (jumps up by alpha)
            lambda_current = lambda_t + alpha
        else:
            # Update current intensity (continued decay)
            lambda_current = lambda_t
    
    return np.array(events)


# Simulate a Hawkes process
mu, alpha, beta = 0.5, 0.8, 1.2
T = 100

events = simulate_hawkes(mu, alpha, beta, T, seed=42)
print(f"Number of events: {len(events)}")
print(f"First 10 event times: {events[:10].round(3)}")
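The `simulate_hawkes` function above recomputes the full excitation sum at every candidate, which costs $O(n)$ per step. For the exponential kernel the intensity is Markov, so it can be advanced in $O(1)$ with $\lambda(t + \Delta t) = \mu + (\lambda(t) - \mu)\,e^{-\beta \Delta t}$. The sketch below uses that update (with NumPy's `Generator` API instead of the global seed) and checks a long simulation against the stationary rate $\mu / (1 - \alpha/\beta)$; the horizon of 20,000 time units is an arbitrary choice that keeps the sampling error small.

```python
import numpy as np

def simulate_hawkes_fast(mu, alpha, beta, T, rng):
    """Ogata thinning for the exponential kernel with an O(1) intensity update."""
    t, lam = 0.0, mu   # lam = intensity just after the current time t (the thinning bound)
    times = []
    while True:
        dt = rng.exponential(1.0 / lam)   # candidate gap at rate lam
        t += dt
        if t >= T:
            break
        lam_t = mu + (lam - mu) * np.exp(-beta * dt)   # true intensity at the candidate
        if rng.uniform() <= lam_t / lam:
            times.append(t)
            lam = lam_t + alpha   # accepted: intensity jumps by alpha
        else:
            lam = lam_t           # rejected: tighten the bound to the decayed value
    return np.array(times)


mu_chk, alpha_chk, beta_chk, T_chk = 0.5, 0.8, 1.2, 20_000.0
sim = simulate_hawkes_fast(mu_chk, alpha_chk, beta_chk, T_chk, np.random.default_rng(1))

empirical_rate = len(sim) / T_chk
theoretical_rate = mu_chk / (1 - alpha_chk / beta_chk)   # 1.5 for these parameters
print(f"empirical {empirical_rate:.3f} vs theoretical {theoretical_rate:.3f}")
```

Event counts of a Hawkes process are over-dispersed, so expect the empirical rate to differ from 1.5 by a couple of percent rather than match it exactly.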

In [None]:
def plot_hawkes_simulation(events: np.ndarray, mu: float, alpha: float, 
                           beta: float, T: float, title: str = "Hawkes Process Simulation"):
    """
    Visualize a Hawkes process simulation with intensity.
    """
    hp = HawkesProcess(mu, alpha, beta)
    
    # Create time grid for intensity
    t_grid = np.linspace(0, T, 1000)
    intensity = hp.intensity_path(t_grid, events)
    
    fig, axes = plt.subplots(2, 1, figsize=(14, 8), height_ratios=[2, 1])
    
    # Plot intensity
    ax1 = axes[0]
    ax1.plot(t_grid, intensity, 'b-', linewidth=1, label='Intensity λ(t)')
    ax1.axhline(y=mu, color='gray', linestyle='--', alpha=0.7, label=f'Baseline μ = {mu}')
    ax1.axhline(y=hp.expected_intensity, color='red', linestyle=':', 
                alpha=0.7, label=f'E[λ] = {hp.expected_intensity:.2f}')
    
    # Mark events on intensity plot
    event_intensities = [hp.intensity(t, events) for t in events]
    ax1.scatter(events, event_intensities, color='red', s=20, alpha=0.5, zorder=5)
    
    ax1.set_xlabel('Time')
    ax1.set_ylabel('Intensity')
    ax1.set_title(title)
    ax1.legend(loc='upper right')
    ax1.set_xlim(0, T)
    
    # Plot event arrivals
    ax2 = axes[1]
    ax2.eventplot([events], colors='blue', lineoffsets=0.5, linelengths=0.8)
    ax2.set_xlabel('Time')
    ax2.set_ylabel('Events')
    ax2.set_yticks([])
    ax2.set_xlim(0, T)
    ax2.set_title('Event Arrivals')
    
    plt.tight_layout()
    plt.show()
    
    return fig


# Visualize the simulation
fig = plot_hawkes_simulation(events, mu, alpha, beta, T)

In [None]:
# Zoom into a smaller window to see clustering
T_zoom = 20
events_zoom = events[events < T_zoom]

fig = plot_hawkes_simulation(events_zoom, mu, alpha, beta, T_zoom, 
                             title="Hawkes Process - Zoomed View (Showing Clustering)")

In [None]:
# Compare different parameter regimes
fig, axes = plt.subplots(3, 2, figsize=(14, 10))

params = [
    (0.5, 0.3, 1.0, "Low excitation (α/β = 0.3)"),
    (0.5, 0.7, 1.0, "Medium excitation (α/β = 0.7)"),
    (0.5, 0.95, 1.0, "High excitation (α/β = 0.95, near-critical)"),
    (1.0, 0.5, 1.0, "High baseline (μ = 1.0)"),
    (0.5, 0.4, 0.5, "Slow decay (β = 0.5, α/β = 0.8)"),  # α = 0.8 would give α/β = 1.6: non-stationary, explodes
    (0.5, 0.8, 2.0, "Fast decay (β = 2.0)"),
]

T_sim = 50

for ax, (mu_i, alpha_i, beta_i, title) in zip(axes.flat, params):
    events_i = simulate_hawkes(mu_i, alpha_i, beta_i, T_sim, seed=42)
    hp_i = HawkesProcess(mu_i, alpha_i, beta_i)
    
    t_grid = np.linspace(0, T_sim, 500)
    intensity = hp_i.intensity_path(t_grid, events_i)
    
    ax.plot(t_grid, intensity, 'b-', linewidth=0.8)
    ax.axhline(y=mu_i, color='gray', linestyle='--', alpha=0.5)
    ax.set_title(f"{title}\n({len(events_i)} events)")
    ax.set_xlabel('Time')
    ax.set_ylabel('λ(t)')

plt.tight_layout()
plt.suptitle('Effect of Different Parameters on Hawkes Process', y=1.02, fontsize=14)
plt.show()

---

## 4. Parameter Estimation <a id='4-estimation'></a>

### Maximum Likelihood Estimation (MLE)

For a Hawkes process observed over $[0, T]$ with events $\{t_1, ..., t_n\}$, the log-likelihood is:

$$\log L = \sum_{i=1}^{n} \log \lambda(t_i) - \int_0^T \lambda(s) ds$$

For the exponential kernel, the integral has a closed form:

$$\int_0^T \lambda(s) ds = \mu T + \frac{\alpha}{\beta} \sum_{i=1}^{n} \left(1 - e^{-\beta(T - t_i)}\right)$$
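The closed-form compensator can be checked numerically. The sketch below integrates $\lambda(s)$ with a midpoint rule on a fine grid and compares it with the formula above; the event times and parameters are hypothetical.

```python
import numpy as np

def compensator_closed_form(event_times, T, mu, alpha, beta):
    """mu*T + (alpha/beta) * sum_i (1 - exp(-beta * (T - t_i)))."""
    t = np.asarray(event_times, dtype=float)
    return mu * T + (alpha / beta) * np.sum(1.0 - np.exp(-beta * (T - t)))


def compensator_numeric(event_times, T, mu, alpha, beta, n_steps=200_000):
    """Midpoint-rule approximation of the integral of lambda(s) over [0, T]."""
    t = np.asarray(event_times, dtype=float)
    h = T / n_steps
    s = (np.arange(n_steps) + 0.5) * h                 # midpoints of the grid cells
    lag = s[:, None] - t[None, :]                      # s - t_i for every (s, t_i) pair
    excitation = np.where(lag > 0, np.exp(-beta * np.clip(lag, 0.0, None)), 0.0)
    lam = mu + alpha * excitation.sum(axis=1)          # intensity at each midpoint
    return lam.sum() * h


ev_demo = [0.7, 1.1, 1.2, 3.4, 6.0]          # hypothetical event times
comp_args = (ev_demo, 10.0, 0.5, 0.8, 1.2)   # events, T, mu, alpha, beta
print(compensator_closed_form(*comp_args), compensator_numeric(*comp_args))
```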

### Recursive Computation of Intensity

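Define $R_i = \sum_{j<i} e^{-\beta(t_i - t_j)}$, so that $R_1 = 0$ and $\lambda(t_i) = \mu + \alpha R_i$. Then:

$$R_i = e^{-\beta(t_i - t_{i-1})}(1 + R_{i-1})$$

This gives all $\lambda(t_i)$ in $O(n)$ time instead of the $O(n^2)$ needed to evaluate each sum directly.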
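A short check that the recursion reproduces the direct sum. The event times are arbitrary (exponential gaps with mean 0.5), `r_direct` is deliberately the naive $O(n^2)$ version, and `r_recursive` is the same loop used in `hawkes_log_likelihood` below.

```python
import numpy as np

def r_recursive(event_times, beta):
    """R_1 = 0, R_i = exp(-beta (t_i - t_{i-1})) * (1 + R_{i-1}): O(n)."""
    t = np.asarray(event_times, dtype=float)
    R = np.zeros(len(t))
    for i in range(1, len(t)):
        R[i] = np.exp(-beta * (t[i] - t[i - 1])) * (1.0 + R[i - 1])
    return R


def r_direct(event_times, beta):
    """R_i = sum_{j < i} exp(-beta (t_i - t_j)): O(n^2) reference."""
    t = np.asarray(event_times, dtype=float)
    return np.array([np.sum(np.exp(-beta * (t[i] - t[:i]))) for i in range(len(t))])


rng_demo = np.random.default_rng(7)
t_demo = np.cumsum(rng_demo.exponential(0.5, size=200))  # arbitrary increasing times
print(np.allclose(r_recursive(t_demo, 1.2), r_direct(t_demo, 1.2)))  # True
```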

In [None]:
def hawkes_log_likelihood(params: np.ndarray, events: np.ndarray, T: float) -> float:
    """
    Compute negative log-likelihood for Hawkes process with exponential kernel.
    Uses efficient recursive computation.
    
    Parameters:
    -----------
    params : np.ndarray
        Parameters [mu, alpha, beta]
    events : np.ndarray
        Event times
    T : float
        Observation window end time
        
    Returns:
    --------
    float : Negative log-likelihood (for minimization)
    """
    mu, alpha, beta = params
    
    # Check parameter validity
    if mu <= 0 or alpha <= 0 or beta <= 0:
        return np.inf
    if alpha / beta >= 1:  # Non-stationary
        return np.inf
    
    n = len(events)
    if n == 0:
        return mu * T  # Just baseline
    
    # Compute R values recursively (efficient O(n) computation)
    R = np.zeros(n)
    R[0] = 0
    for i in range(1, n):
        dt = events[i] - events[i-1]
        R[i] = np.exp(-beta * dt) * (1 + R[i-1])
    
    # Log-likelihood: sum of log(lambda(t_i))
    lambda_vals = mu + alpha * R
    if np.any(lambda_vals <= 0):
        return np.inf
    
    log_likelihood = np.sum(np.log(lambda_vals))
    
    # Compensator: integral of lambda from 0 to T
    # = mu*T + (alpha/beta) * sum(1 - exp(-beta*(T - t_i)))
    compensator = mu * T + (alpha / beta) * np.sum(1 - np.exp(-beta * (T - events)))
    
    # Negative log-likelihood for minimization
    neg_ll = -(log_likelihood - compensator)
    
    return neg_ll


def fit_hawkes_mle(events: np.ndarray, T: float, 
                  init_params: Optional[np.ndarray] = None) -> Tuple[np.ndarray, float]:
    """
    Fit Hawkes process parameters using MLE.
    
    Parameters:
    -----------
    events : np.ndarray
        Event times
    T : float
        Observation window end time
    init_params : np.ndarray, optional
        Initial parameter guess [mu, alpha, beta]
        
    Returns:
    --------
    Tuple[np.ndarray, float] : Estimated parameters and negative log-likelihood
    """
    n = len(events)
    
    # Initial guesses if not provided
    if init_params is None:
        mu_init = n / (2 * T)  # Half the average rate
        alpha_init = 0.5
        beta_init = 1.0
        init_params = np.array([mu_init, alpha_init, beta_init])
    
    # Bounds: all positive, alpha/beta < 1 enforced in likelihood
    bounds = [(1e-6, None), (1e-6, None), (1e-6, None)]
    
    result = minimize(
        hawkes_log_likelihood,
        init_params,
        args=(events, T),
        method='L-BFGS-B',
        bounds=bounds,
        options={'disp': False}
    )
    
    return result.x, result.fun


# Fit the model to our simulated data
true_params = np.array([mu, alpha, beta])
estimated_params, neg_ll = fit_hawkes_mle(events, T)

print("Parameter Estimation Results:")
print("=" * 50)
print(f"{'Parameter':<12} {'True':<12} {'Estimated':<12} {'Error %':<12}")
print("-" * 50)
param_names = ['mu', 'alpha', 'beta']
for name, true, est in zip(param_names, true_params, estimated_params):
    error = abs(est - true) / true * 100
    print(f"{name:<12} {true:<12.4f} {est:<12.4f} {error:<12.2f}")

print("-" * 50)
print(f"Branching ratio: True = {alpha/beta:.4f}, Estimated = {estimated_params[1]/estimated_params[2]:.4f}")

In [None]:
# Goodness-of-fit: Residual analysis using time transformation
def compute_transformed_times(events: np.ndarray, mu: float, 
                              alpha: float, beta: float) -> np.ndarray:
    """
    Compute transformed times using the compensator.
    If model is correct, transformed times should be unit exponential.
    
    Λ(t_i) - Λ(t_{i-1}) should be Exp(1) distributed.
    """
    n = len(events)
    if n == 0:
        return np.array([])
    
    # Compute compensator at each event time
    compensator = np.zeros(n)
    
    for i, t in enumerate(events):
        # Λ(t) = mu*t + (alpha/beta) * sum_{t_j < t}(1 - exp(-beta*(t - t_j)))
        past_events = events[:i]
        comp = mu * t
        if len(past_events) > 0:
            comp += (alpha / beta) * np.sum(1 - np.exp(-beta * (t - past_events)))
        compensator[i] = comp
    
    # Inter-compensator times
    transformed = np.diff(compensator)
    transformed = np.insert(transformed, 0, compensator[0])  # First interval
    
    return transformed


# Compute transformed times with estimated parameters
mu_est, alpha_est, beta_est = estimated_params
transformed_times = compute_transformed_times(events, mu_est, alpha_est, beta_est)

# Plot QQ-plot against exponential distribution
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Histogram
ax1 = axes[0]
ax1.hist(transformed_times, bins=30, density=True, alpha=0.7, label='Transformed times')
x = np.linspace(0, max(transformed_times), 100)
ax1.plot(x, expon.pdf(x), 'r-', linewidth=2, label='Exp(1) PDF')
ax1.set_xlabel('Transformed inter-arrival time')
ax1.set_ylabel('Density')
ax1.set_title('Residual Analysis: Histogram')
ax1.legend()

# QQ-plot
ax2 = axes[1]
sorted_transformed = np.sort(transformed_times)
n_resid = len(sorted_transformed)
# Plotting positions (i - 0.5)/n give one theoretical quantile per observation
theoretical_quantiles = expon.ppf((np.arange(1, n_resid + 1) - 0.5) / n_resid)
empirical_quantiles = sorted_transformed

ax2.scatter(theoretical_quantiles, empirical_quantiles, alpha=0.5, s=10)
max_val = max(max(theoretical_quantiles), max(empirical_quantiles))
ax2.plot([0, max_val], [0, max_val], 'r--', linewidth=2, label='45° line')
ax2.set_xlabel('Theoretical Quantiles (Exp(1))')
ax2.set_ylabel('Empirical Quantiles')
ax2.set_title('Residual Analysis: QQ-Plot')
ax2.legend()

plt.tight_layout()
plt.show()

# KS test (p-values are optimistic here, because the parameters were
# estimated from the same events being tested)
ks_stat, ks_pval = kstest(transformed_times, 'expon')
print(f"\nKolmogorov-Smirnov test:")
print(f"  Statistic: {ks_stat:.4f}")
print(f"  P-value: {ks_pval:.4f}")
print(f"  Result: {'Consistent with Exp(1)' if ks_pval > 0.05 else 'Rejects Exp(1)'} at 5% significance level")
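`compute_transformed_times` above recomputes a sum over all past events for every event, which is $O(n^2)$ and becomes slow for samples with tens of thousands of events. Because $\sum_{j<k}(1 - e^{-\beta(t_k - t_j)}) = k - R_k$ (with $k$ past events and $R_k$ the same recursive sum used in the likelihood), the compensator at every event can be computed in $O(n)$. A sketch, compared against an $O(n^2)$ reference on arbitrary event times:

```python
import numpy as np

def transformed_times_fast(event_times, mu, alpha, beta):
    """Time-rescaled gaps Lambda(t_k) - Lambda(t_{k-1}) in O(n) (0-based k)."""
    t = np.asarray(event_times, dtype=float)
    n = len(t)
    if n == 0:
        return np.array([])
    R = np.zeros(n)
    for k in range(1, n):
        R[k] = np.exp(-beta * (t[k] - t[k - 1])) * (1.0 + R[k - 1])
    Lam = mu * t + (alpha / beta) * (np.arange(n) - R)   # compensator at each event
    return np.diff(Lam, prepend=0.0)                      # first gap is Lambda(t_1)


def transformed_times_slow(event_times, mu, alpha, beta):
    """O(n^2) reference: same formula as compute_transformed_times."""
    t = np.asarray(event_times, dtype=float)
    Lam = np.array([mu * tk + (alpha / beta) * np.sum(1 - np.exp(-beta * (tk - t[:k])))
                    for k, tk in enumerate(t)])
    return np.diff(Lam, prepend=0.0)


t_chk = np.cumsum(np.random.default_rng(3).exponential(0.4, size=300))
print(np.allclose(transformed_times_fast(t_chk, 0.5, 0.8, 1.2),
                  transformed_times_slow(t_chk, 0.5, 0.8, 1.2)))  # True
```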

---

## 5. Order Flow Modeling <a id='5-order-flow'></a>

### Financial Interpretation

In market microstructure, Hawkes processes model:

1. **Trade arrivals**: Each trade increases the probability of subsequent trades
2. **Order submissions**: New orders trigger algorithmic responses
3. **Quote updates**: Market makers adjust quotes after trades

### Generating Realistic Order Flow Data

In [None]:
def simulate_order_flow(T: float, seed: int = 42) -> pd.DataFrame:
    """
    Simulate realistic order flow using Hawkes process.
    
    Generates trade times, prices, and volumes.
    """
    np.random.seed(seed)
    
    # Market parameters
    mu_trade = 2.0       # Base trade arrival rate (per second)
    alpha_trade = 1.5    # Excitation strength
    beta_trade = 2.5     # Decay rate
    
    # Simulate trade times
    trade_times = simulate_hawkes(mu_trade, alpha_trade, beta_trade, T, seed=seed)
    n_trades = len(trade_times)
    
    # Generate trade characteristics
    # Side: more likely to be same as previous during clusters
    sides = np.zeros(n_trades, dtype=int)
    sides[0] = np.random.choice([-1, 1])  # -1 = sell, 1 = buy
    
    for i in range(1, n_trades):
        dt = trade_times[i] - trade_times[i-1]
        # Probability of same side increases with clustering
        prob_same = 0.5 + 0.3 * np.exp(-2 * dt)
        sides[i] = sides[i-1] if np.random.uniform() < prob_same else -sides[i-1]
    
    # Generate prices (random walk with jumps at trades)
    initial_price = 100.0
    tick_size = 0.01
    
    prices = np.zeros(n_trades)
    prices[0] = initial_price
    
    for i in range(1, n_trades):
        # Price impact + noise
        impact = sides[i] * tick_size * (1 + np.random.exponential(0.5))
        noise = np.random.normal(0, tick_size * 0.5)
        prices[i] = prices[i-1] + impact + noise
    
    # Generate volumes (clustered trades have correlated volumes)
    base_volume = 100
    volumes = np.zeros(n_trades)
    volumes[0] = base_volume * np.random.lognormal(0, 0.5)
    
    for i in range(1, n_trades):
        dt = trade_times[i] - trade_times[i-1]
        # Volume clustering
        if dt < 0.5:  # Clustered
            volumes[i] = volumes[i-1] * np.random.lognormal(0, 0.3)
        else:
            volumes[i] = base_volume * np.random.lognormal(0, 0.5)
    
    # Create DataFrame
    df = pd.DataFrame({
        'timestamp': trade_times,
        'price': prices,
        'volume': volumes.astype(int),
        'side': np.where(sides == 1, 'buy', 'sell')
    })
    
    return df


# Generate order flow data
T_market = 3600  # 1 hour of trading
order_flow = simulate_order_flow(T_market, seed=42)

print(f"Generated {len(order_flow)} trades over {T_market/60:.0f} minutes")
print(f"\nFirst 10 trades:")
order_flow.head(10)

In [None]:
# Visualize the order flow
from matplotlib.patches import Patch

fig, axes = plt.subplots(3, 1, figsize=(14, 10), sharex=True)

# Price path
ax1 = axes[0]
ax1.plot(order_flow['timestamp'], order_flow['price'], 'b-', linewidth=0.5)
ax1.set_ylabel('Price')
ax1.set_title('Order Flow Simulation (1 Hour)')

# Volume
ax2 = axes[1]
colors = ['green' if s == 'buy' else 'red' for s in order_flow['side']]
ax2.bar(order_flow['timestamp'], order_flow['volume'], color=colors, width=2, alpha=0.6)
ax2.set_ylabel('Volume')
# One bar() call yields a single legend handle, so build the buy/sell entries explicitly
ax2.legend(handles=[Patch(color='green', alpha=0.6, label='Buy'),
                    Patch(color='red', alpha=0.6, label='Sell')], loc='upper right')

# Trade inter-arrival times
ax3 = axes[2]
inter_arrivals = np.diff(order_flow['timestamp'].values)
ax3.scatter(order_flow['timestamp'].values[1:], inter_arrivals, s=1, alpha=0.3)
ax3.set_ylabel('Inter-arrival time (s)')
ax3.set_xlabel('Time (seconds)')
ax3.set_yscale('log')

plt.tight_layout()
plt.show()

In [None]:
# Fit Hawkes model to order flow data
trade_times = order_flow['timestamp'].values

# Fit the model
params_order_flow, neg_ll = fit_hawkes_mle(trade_times, T_market)
mu_fit, alpha_fit, beta_fit = params_order_flow

print("Hawkes Process Fit to Order Flow:")
print("=" * 50)
print(f"Baseline intensity (μ): {mu_fit:.4f} trades/sec")
print(f"Jump size (α): {alpha_fit:.4f}")
print(f"Decay rate (β): {beta_fit:.4f}")
print(f"Branching ratio: {alpha_fit/beta_fit:.4f}")
print(f"Expected intensity: {mu_fit/(1 - alpha_fit/beta_fit):.4f} trades/sec")
print(f"\nHalf-life of excitation: {np.log(2)/beta_fit:.4f} seconds")
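To judge the fit, it helps to know what the simulation's own parameters imply. `simulate_order_flow` hard-codes $\mu = 2.0$, $\alpha = 1.5$, $\beta = 2.5$, so the stationary quantities below are what the estimates above should be close to, up to sampling error:

```python
import numpy as np

# Parameters hard-coded in simulate_order_flow
mu_true, alpha_true, beta_true = 2.0, 1.5, 2.5
horizon_s = 3600.0  # one hour, matching T_market

branching_true = alpha_true / beta_true             # 0.6 triggered trades per trade
mean_rate_true = mu_true / (1 - branching_true)     # 5 trades/sec on average
expected_trades = mean_rate_true * horizon_s        # about 18,000 trades per hour
half_life_true = np.log(2) / beta_true              # about 0.277 s

print(f"branching ratio {branching_true:.2f}, mean rate {mean_rate_true:.2f}/s, "
      f"expected trades {expected_trades:.0f}, half-life {half_life_true:.3f} s")
```

The fit only sees timestamps, so the side and volume patterns added by `simulate_order_flow` do not enter these estimates.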

In [None]:
# Analyze clustering patterns
def analyze_order_clustering(events: np.ndarray, window_sizes: List[float]) -> pd.DataFrame:
    """
    Analyze order clustering at different time scales.
    """
    T = events[-1]
    results = []
    
    for window in window_sizes:
        # Count events in each window
        n_windows = int(T / window)
        counts = np.zeros(n_windows)
        
        for i in range(n_windows):
            start = i * window
            end = (i + 1) * window
            counts[i] = np.sum((events >= start) & (events < end))
        
        # Fano factor: var/mean (=1 for Poisson, >1 for clustered)
        fano = np.var(counts) / np.mean(counts) if np.mean(counts) > 0 else np.nan
        
        results.append({
            'window_size': window,
            'mean_count': np.mean(counts),
            'var_count': np.var(counts),
            'fano_factor': fano
        })
    
    return pd.DataFrame(results)


# Analyze clustering
windows = [1, 5, 10, 30, 60, 120, 300]  # seconds
clustering_analysis = analyze_order_clustering(trade_times, windows)

print("Order Clustering Analysis:")
print("="*60)
print(clustering_analysis.to_string(index=False))
print("\nFano factor > 1 indicates clustering (over-dispersion)")
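For comparison, the Fano factor of a stationary exponential-kernel Hawkes process can be written in closed form. Writing $\gamma = \beta - \alpha$, a derivation from the process's spectral density $\Lambda / |1 - \hat\phi(\omega)|^2$ gives

$$\frac{\operatorname{Var}[N(\tau)]}{\mathbb{E}[N(\tau)]} = 1 + \frac{\alpha(2\beta - \alpha)}{\gamma^2}\left[1 - \frac{1 - e^{-\gamma\tau}}{\gamma\tau}\right],$$

which rises from 1 for short windows to $1/(1 - \alpha/\beta)^2$ for long ones. The sketch below evaluates it for the parameters hard-coded in `simulate_order_flow`; treat it as a rough benchmark, since the empirical Fano factors above come from a single hour (only a handful of windows at the 300 s scale).

```python
import numpy as np

def hawkes_fano_factor(window, alpha, beta):
    """Var[N(window)] / E[N(window)] for a stationary exponential-kernel
    Hawkes process (assumes 0 < alpha < beta and window > 0)."""
    gamma = beta - alpha                          # decay rate of the covariance density
    amplitude = alpha * (2 * beta - alpha) / gamma**2
    x = gamma * np.asarray(window, dtype=float)
    return 1.0 + amplitude * (1.0 - (1.0 - np.exp(-x)) / x)


# Parameters hard-coded in simulate_order_flow, same windows as above
windows_demo = np.array([1, 5, 10, 30, 60, 120, 300], dtype=float)
fano_theory = hawkes_fano_factor(windows_demo, alpha=1.5, beta=2.5)
print(np.round(fano_theory, 2))
print(f"long-window limit 1/(1 - n)^2 = {1 / (1 - 1.5 / 2.5) ** 2:.2f}")
```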

---

## 6. Multivariate Hawkes for Buy/Sell Orders <a id='6-multivariate'></a>

### Bivariate Hawkes Process

In practice, buy and sell orders influence each other. A **bivariate Hawkes process** models this:

$$\lambda_1(t) = \mu_1 + \alpha_{11} \sum_{t_j^{(1)} < t} e^{-\beta(t - t_j^{(1)})} + \alpha_{12} \sum_{t_j^{(2)} < t} e^{-\beta(t - t_j^{(2)})}$$

$$\lambda_2(t) = \mu_2 + \alpha_{21} \sum_{t_j^{(1)} < t} e^{-\beta(t - t_j^{(1)})} + \alpha_{22} \sum_{t_j^{(2)} < t} e^{-\beta(t - t_j^{(2)})}$$

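Where:
- $\alpha_{ij}$: jump in $\lambda_i$ caused by an event of type $j$ (type 1 = buy, type 2 = sell); both dimensions share the decay rate $\beta$
- $\alpha_{11}, \alpha_{22}$: self-excitation (same side triggers same side)
- $\alpha_{12}, \alpha_{21}$: cross-excitation (an event on the opposite side raises this side's intensity)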
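The univariate condition $\alpha/\beta < 1$ generalizes to a matrix condition. With a shared decay $\beta$, entry $(i, j)$ of $A/\beta$ (where $A = [\alpha_{ij}]$) is the expected number of type-$i$ events directly triggered by one type-$j$ event; the process is stationary when the spectral radius of $A/\beta$ is below 1, and the mean intensities then solve $\Lambda = \mu + (A/\beta)\Lambda$. A sketch using the buy/sell parameters from the simulation cell below:

```python
import numpy as np

mu_vec = np.array([1.0, 1.0])            # baseline buy/sell rates
alpha_mat = np.array([[0.4, 0.3],        # alpha_mat[i, j]: jump in lambda_i from a type-j event
                      [0.3, 0.4]])
beta_shared = 1.5

# Branching matrix: (i, j) = expected type-i events directly triggered by one type-j event
branching_mat = alpha_mat / beta_shared
spectral_radius = np.max(np.abs(np.linalg.eigvals(branching_mat)))

# Stationary means solve Lambda = mu + branching_mat @ Lambda
mean_intensity = np.linalg.solve(np.eye(2) - branching_mat, mu_vec)

print(f"spectral radius = {spectral_radius:.4f}")   # 0.4667 < 1, so stationary
print("stationary intensities:", mean_intensity)
```

Both intensities come out at 1.875 events per second here, so over `T_bi = 1000` seconds the simulation should produce roughly 1,875 buys and 1,875 sells.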

In [None]:
def simulate_bivariate_hawkes(
    mu: np.ndarray,  # [mu1, mu2]
    alpha: np.ndarray,  # [[alpha11, alpha12], [alpha21, alpha22]]
    beta: float,
    T: float,
    seed: int = 42
) -> Tuple[np.ndarray, np.ndarray, Tuple[np.ndarray, np.ndarray]]:
    """
    Simulate bivariate Hawkes process (e.g., buy/sell orders).
    
    Returns:
    --------
    Tuple: (event_times, event_types, (type_0_times, type_1_times))
    """
    np.random.seed(seed)
    
    events = []  # (time, type)
    t = 0
    
    # R[j] = sum over past type-j events of exp(-beta * (t - t_k)),
    # kept decayed to the current time t
    R = np.zeros(2)
    
    while t < T:
        # Intensities only decay between events (alpha >= 0), so the current
        # total intensity bounds it until the next candidate
        lambda_bar = np.sum(mu + alpha @ R)
        
        # Candidate time from a homogeneous Poisson process with rate lambda_bar
        dt = np.random.exponential(1 / lambda_bar)
        t = t + dt
        
        if t >= T:
            break
        
        # Decay the excitation state to the candidate time
        R = R * np.exp(-beta * dt)
        
        # Actual intensities at the candidate time
        lambda_vec = mu + alpha @ R
        lambda_total = np.sum(lambda_vec)
        
        # Accept/reject
        if np.random.uniform() <= lambda_total / lambda_bar:
            # Assign the event to a type in proportion to its intensity
            probs = lambda_vec / lambda_total
            event_type = np.random.choice([0, 1], p=probs)
            
            events.append((t, event_type))
            R[event_type] += 1
    
    # Process output
    events = np.array(events)
    if len(events) == 0:
        return np.array([]), np.array([]), (np.array([]), np.array([]))
    
    times = events[:, 0]
    types = events[:, 1].astype(int)
    
    times_by_type = (times[types == 0], times[types == 1])
    
    return times, types, times_by_type


# Parameters for buy/sell order flow
mu_bi = np.array([1.0, 1.0])  # Baseline rates
alpha_bi = np.array([
    [0.4, 0.3],  # Buy orders: self-excite 0.4, cross-excite from sells 0.3
    [0.3, 0.4]   # Sell orders: cross-excite from buys 0.3, self-excite 0.4
])
beta_bi = 1.5

T_bi = 1000
times_bi, types_bi, (buy_times, sell_times) = simulate_bivariate_hawkes(
    mu_bi, alpha_bi, beta_bi, T_bi, seed=42
)

print(f"Bivariate Hawkes Simulation:")
print(f"Total events: {len(times_bi)}")
print(f"Buy orders: {len(buy_times)}")
print(f"Sell orders: {len(sell_times)}")

In [None]:
# Visualize bivariate process
fig, axes = plt.subplots(3, 1, figsize=(14, 8), sharex=True)

# Buy orders
ax1 = axes[0]
ax1.eventplot([buy_times], colors='green', lineoffsets=0.5, linelengths=0.8)
ax1.set_ylabel('Buy Orders')
ax1.set_yticks([])
ax1.set_title('Bivariate Hawkes Process: Buy and Sell Orders')

# Sell orders
ax2 = axes[1]
ax2.eventplot([sell_times], colors='red', lineoffsets=0.5, linelengths=0.8)
ax2.set_ylabel('Sell Orders')
ax2.set_yticks([])

# Combined order imbalance
ax3 = axes[2]
window = 5  # 5 second windows
n_windows = int(T_bi / window)
buy_counts = np.array([np.sum((buy_times >= i*window) & (buy_times < (i+1)*window)) 
                       for i in range(n_windows)])
sell_counts = np.array([np.sum((sell_times >= i*window) & (sell_times < (i+1)*window)) 
                        for i in range(n_windows)])
imbalance = (buy_counts - sell_counts) / (buy_counts + sell_counts + 1)
time_bins = np.arange(n_windows) * window + window/2

ax3.bar(time_bins, imbalance, width=window*0.8, 
        color=['green' if x > 0 else 'red' for x in imbalance], alpha=0.6)
ax3.axhline(y=0, color='black', linestyle='-', linewidth=0.5)
ax3.set_ylabel('Order Imbalance')
ax3.set_xlabel('Time (seconds)')
ax3.set_xlim(0, min(200, T_bi))  # Show first 200 seconds

plt.tight_layout()
plt.show()

In [None]:
# Cross-correlation analysis
def compute_cross_intensity(events1: np.ndarray, events2: np.ndarray, 
                            max_lag: float, n_bins: int = 50) -> Tuple[np.ndarray, np.ndarray]:
    """
    Compute cross-intensity function between two event sequences.
    Shows how events of type 2 cluster around events of type 1.
    """
    lags = []
    
    for t1 in events1:
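        # Find events of type 2 within max_lag of t1, excluding t1 itself
        # (otherwise the self-excitation plots get a spurious spike at lag 0)
        nearby = events2[(events2 > t1 - max_lag) & (events2 < t1 + max_lag) & (events2 != t1)]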
        lags.extend(nearby - t1)
    
    lags = np.array(lags)
    
    # Histogram
    counts, bins = np.histogram(lags, bins=n_bins, range=(-max_lag, max_lag))
    bin_centers = (bins[:-1] + bins[1:]) / 2
    
    # Normalize by number of type 1 events and bin width
    bin_width = bins[1] - bins[0]
    intensity = counts / (len(events1) * bin_width)
    
    return bin_centers, intensity


# Compute cross-intensities
max_lag = 10

fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# Buy -> Buy (autocorrelation)
lags, intensity = compute_cross_intensity(buy_times, buy_times, max_lag)
axes[0, 0].bar(lags, intensity, width=lags[1]-lags[0], alpha=0.7, color='green')
axes[0, 0].set_title('Buy → Buy (Self-excitation)')
axes[0, 0].set_xlabel('Lag (seconds)')
axes[0, 0].set_ylabel('Intensity')

# Sell -> Sell (autocorrelation)
lags, intensity = compute_cross_intensity(sell_times, sell_times, max_lag)
axes[0, 1].bar(lags, intensity, width=lags[1]-lags[0], alpha=0.7, color='red')
axes[0, 1].set_title('Sell → Sell (Self-excitation)')
axes[0, 1].set_xlabel('Lag (seconds)')
axes[0, 1].set_ylabel('Intensity')

# Buy -> Sell (cross-correlation)
lags, intensity = compute_cross_intensity(buy_times, sell_times, max_lag)
axes[1, 0].bar(lags, intensity, width=lags[1]-lags[0], alpha=0.7, color='purple')
axes[1, 0].set_title('Buy → Sell (Cross-excitation)')
axes[1, 0].set_xlabel('Lag (seconds)')
axes[1, 0].set_ylabel('Intensity')

# Sell -> Buy (cross-correlation)
lags, intensity = compute_cross_intensity(sell_times, buy_times, max_lag)
axes[1, 1].bar(lags, intensity, width=lags[1]-lags[0], alpha=0.7, color='orange')
axes[1, 1].set_title('Sell → Buy (Cross-excitation)')
axes[1, 1].set_xlabel('Lag (seconds)')
axes[1, 1].set_ylabel('Intensity')

plt.suptitle('Cross-Intensity Functions (Order Flow Dynamics)', fontsize=14)
plt.tight_layout()
plt.show()

---

## 7. Market Microstructure Applications <a id='7-applications'></a>

### Application 1: Measuring Market Excitability

In [None]:
def measure_market_excitability(events: np.ndarray, window: float, T: float) -> pd.DataFrame:
    """
    Measure how market excitability changes over time.
    Fit Hawkes model to rolling windows.
    """
    results = []
    
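    # 50% overlap; include every window that fits entirely inside [0, T]
    step = window / 2
    n_starts = int((T - window) // step) + 1
    start_times = np.arange(n_starts) * step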
    
    for start in start_times:
        end = start + window
        window_events = events[(events >= start) & (events < end)] - start
        
        if len(window_events) < 10:
            continue
        
        try:
            params, _ = fit_hawkes_mle(window_events, window)
            mu, alpha, beta = params
            branching = alpha / beta
            
            results.append({
                'time': start + window / 2,
                'n_events': len(window_events),
                'mu': mu,
                'alpha': alpha,
                'beta': beta,
                'branching_ratio': branching,
                'half_life': np.log(2) / beta
            })
        except Exception:
            # Skip windows where the optimizer fails (e.g., degenerate samples)
            continue
    
    return pd.DataFrame(results)


# Analyze excitability over time (10-minute windows with 50% overlap)
window_sec = 600
excitability = measure_market_excitability(trade_times, window=window_sec, T=T_market)

if len(excitability) > 0:
    # Trades per window converted to a per-minute rate
    trade_rate = excitability['n_events'] / (window_sec / 60)
    
    fig, axes = plt.subplots(2, 2, figsize=(12, 8))
    
    axes[0, 0].plot(excitability['time']/60, excitability['branching_ratio'], 'b-o')
    axes[0, 0].set_xlabel('Time (minutes)')
    axes[0, 0].set_ylabel('Branching Ratio')
    axes[0, 0].set_title('Market Excitability Over Time')
    axes[0, 0].axhline(y=1, color='red', linestyle='--', label='Critical threshold')
    axes[0, 0].legend()
    
    axes[0, 1].plot(excitability['time']/60, excitability['half_life'], 'g-o')
    axes[0, 1].set_xlabel('Time (minutes)')
    axes[0, 1].set_ylabel('Half-life (seconds)')
    axes[0, 1].set_title('Excitation Decay Speed')
    
    axes[1, 0].plot(excitability['time']/60, trade_rate, 'r-o')
    axes[1, 0].set_xlabel('Time (minutes)')
    axes[1, 0].set_ylabel('Trade Rate (per minute)')
    axes[1, 0].set_title('Trading Activity')
    
    axes[1, 1].scatter(trade_rate, excitability['branching_ratio'], alpha=0.6)
    axes[1, 1].set_xlabel('Trade Rate (per minute)')
    axes[1, 1].set_ylabel('Branching Ratio')
    axes[1, 1].set_title('Activity vs Excitability')
    
    plt.tight_layout()
    plt.show()
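
A quick sanity check on each window's fit is to compare its observed count with the count its own parameters imply. The stationary mean intensity $E[\lambda] = \mu/(1 - \alpha/\beta)$ implies an expected count of $\mu W/(1 - \alpha/\beta)$ in a window of length $W$. The helper `implied_window_count` below is a hypothetical name; it is a sketch that assumes the exponential kernel used throughout this notebook.

```python
def implied_window_count(mu: float, alpha: float, beta: float, window: float) -> float:
    """Expected events in a window for a stationary exponential-kernel Hawkes process."""
    n_star = alpha / beta  # branching ratio
    if n_star >= 1.0:
        return float('inf')  # no finite stationary mean at or above criticality
    # E[lambda] = mu / (1 - n*), so E[N(window)] = window * mu / (1 - n*)
    return window * mu / (1.0 - n_star)
```

Dividing each row's `n_events` by this value gives a rough goodness-of-fit ratio. Values far from 1, or an infinite expected count from a fitted α/β ≥ 1, suggest that the window fit is unreliable or that the window is not stationary. Because each window is fitted from an empty history, observed counts can sit slightly below the stationary expectation.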

### Application 2: Price Impact Estimation

In [None]:
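def estimate_hawkes_price_impact(order_flow: pd.DataFrame,
                                 horizon: float = 10.0) -> tuple[pd.DataFrame, pd.DataFrame]:
    """
    Estimate price impact using Hawkes-based order flow clustering.
    
    Key insight: Orders during high-intensity periods may have
    different price impact than isolated orders.
    
    horizon : float
        Look-ahead window (seconds) for the forward return of each trade.
    
    Returns the impact table by intensity quintile and the order flow
    with intensity and return columns added.
    """
    # Sort by time (this also copies, so the caller's frame is not modified)
    order_flow = order_flow.sort_values('timestamp').reset_index(drop=True)
    
    # Fit Hawkes to get intensity at each trade
    trade_times = order_flow['timestamp'].values
    params, _ = fit_hawkes_mle(trade_times, trade_times[-1])
    mu, alpha, beta = params
    
    hp = HawkesProcess(mu, alpha, beta)
    
    # Calculate intensity at each trade
    order_flow['intensity'] = hp.intensity_path(trade_times, trade_times)
    
    # Categorize trades by intensity quintile (duplicate bin edges are dropped if intensities tie)
    order_flow['intensity_quintile'] = pd.qcut(order_flow['intensity'], 5,
                                               labels=False, duplicates='drop')
    
    # Forward return over `horizon` seconds: last trade price at or before t + horizon
    prices = order_flow['price'].values.astype(float)
    future_idx = np.searchsorted(trade_times, trade_times + horizon, side='right') - 1
    return_forward = (prices[future_idx] / prices - 1.0) * 10000  # bps
    # Trades whose horizon runs past the end of the sample get NaN
    return_forward[trade_times + horizon > trade_times[-1]] = np.nan
    order_flow['return_forward'] = return_forward
    
    # Sign the return by trade direction
    order_flow['signed_return'] = order_flow['return_forward'] * order_flow['side'].map({'buy': 1, 'sell': -1})
    
    # Impact by intensity quintile
    impact_by_intensity = order_flow.groupby('intensity_quintile').agg({
        'signed_return': ['mean', 'std', 'count'],
        'intensity': 'mean'
    }).round(4)
    
    return impact_by_intensity, order_flow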


# Analyze price impact
impact_analysis, flow_with_intensity = estimate_hawkes_price_impact(order_flow)

print("Price Impact by Order Flow Intensity:")
print("="*60)
print(impact_analysis)
print("\nInterpretation: Higher intensity periods may show different")
print("price impact characteristics due to clustering effects.")
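
The quintile table reports the mean, standard deviation, and count of signed returns per group. That is enough to ask whether the top- and bottom-intensity quintiles differ by more than noise. Below is a minimal sketch using a Welch t-statistic; `welch_t` is a hypothetical helper. It assumes independent observations. Overlapping forward-return horizons and autocorrelated trades violate that assumption, so treat the resulting statistic as optimistic.

```python
import math

def welch_t(mean_a: float, std_a: float, n_a: int,
            mean_b: float, std_b: float, n_b: int) -> float:
    """Welch t-statistic for the difference of two group means (unequal variances)."""
    standard_error = math.sqrt(std_a ** 2 / n_a + std_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error

# Possible usage with the table above (MultiIndex columns from the groupby/agg call):
# sr = impact_analysis['signed_return']
# top, bottom = sr.index.max(), sr.index.min()
# t_stat = welch_t(sr.loc[top, 'mean'], sr.loc[top, 'std'], sr.loc[top, 'count'],
#                  sr.loc[bottom, 'mean'], sr.loc[bottom, 'std'], sr.loc[bottom, 'count'])
```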

### Application 3: Detecting Regime Changes

In [None]:
def detect_intensity_anomalies(events: np.ndarray, mu: float, alpha: float,
                               beta: float, threshold: float = 3.0
                               ) -> tuple[np.ndarray, np.ndarray, float]:
    """
    Detect events at which the fitted intensity is abnormally high (potential flash events).
    
    Parameters:
    -----------
    threshold : float
        Number of standard deviations above the mean intensity for an anomaly
    
    Returns:
    --------
    Anomalous event times, their intensities, and the intensity cutoff used
    """
    hp = HawkesProcess(mu, alpha, beta)
    
    # Compute intensity at each event (O(n^2) loop; fine for moderate sample sizes)
    intensities = np.array([hp.intensity(t, events) for t in events])
    
    # Center on the stationary expected intensity; scale by the empirical spread at events
    mean_intensity = hp.expected_intensity
    std_intensity = np.std(intensities)
    
    # Find anomalies
    anomaly_threshold = mean_intensity + threshold * std_intensity
    anomaly_mask = intensities > anomaly_threshold
    
    return events[anomaly_mask], intensities[anomaly_mask], anomaly_threshold


# Detect anomalies in our simulated data
anomaly_times, anomaly_intensities, threshold = detect_intensity_anomalies(
    trade_times, *params_order_flow, threshold=2.5
)

print(f"Detected {len(anomaly_times)} anomalous trades (intensity > {threshold:.2f})")
print(f"That's {len(anomaly_times)/len(trade_times)*100:.2f}% of all trades")

# Visualize
hp_fitted = HawkesProcess(*params_order_flow)
t_grid = np.linspace(0, T_market, 2000)
intensity_path = hp_fitted.intensity_path(t_grid, trade_times)

fig, ax = plt.subplots(figsize=(14, 5))

ax.plot(t_grid/60, intensity_path, 'b-', linewidth=0.5, alpha=0.7, label='Intensity')
ax.axhline(y=threshold, color='red', linestyle='--', label=f'Anomaly threshold ({threshold:.2f})')
ax.scatter(anomaly_times/60, anomaly_intensities, color='red', s=20, alpha=0.5, label='Anomalies')
ax.set_xlabel('Time (minutes)')
ax.set_ylabel('Intensity')
ax.set_title('Intensity Anomaly Detection')
ax.legend()

plt.tight_layout()
plt.show()
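
The list comprehension in `detect_intensity_anomalies` re-sums over the whole history for every event, which is O(n²). For the exponential kernel, the excitation at consecutive events satisfies $A_i = e^{-\beta(t_i - t_{i-1})}(1 + A_{i-1})$, so all event-time intensities fit in one O(n) pass. Here is a sketch; `intensity_at_events` is a hypothetical name. It returns the left limit $\lambda(t_i^-)$, the intensity just before event $i$ without event $i$'s own jump. This section does not show whether `HawkesProcess.intensity` includes the event at $t$ itself, so check the convention before swapping this in.

```python
import numpy as np

def intensity_at_events(events, mu: float, alpha: float, beta: float) -> np.ndarray:
    """Left-limit intensity lambda(t_i-) at each sorted event time, via an O(n) recursion."""
    events = np.asarray(events, dtype=float)
    lam = np.empty_like(events)
    A = 0.0  # A_i = sum_{j<i} exp(-beta (t_i - t_j))
    for i in range(len(events)):
        if i > 0:
            # Decay the previous excitation and add the jump from event i-1
            A = np.exp(-beta * (events[i] - events[i - 1])) * (1.0 + A)
        lam[i] = mu + alpha * A
    return lam
```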

### Application 4: Order Flow Prediction

In [None]:
def predict_next_event_time(events: np.ndarray, t_current: float,
                            mu: float, alpha: float, beta: float,
                            n_samples: int = 1000, seed=None) -> dict:
    """
    Predict the distribution of time until the next event using Monte Carlo
    (Ogata thinning), given the history `events` observed up to t_current.
    """
    rng = np.random.default_rng(seed)
    past = events[events <= t_current]
    
    def intensity(t: float) -> float:
        # Exponential-kernel intensity driven only by the fixed history `past`
        return mu + alpha * np.sum(np.exp(-beta * (t - past)))
    
    # Current intensity
    lambda_current = intensity(t_current)
    
    # Simulate next arrival times
    next_times = np.empty(n_samples)
    
    for i in range(n_samples):
        t = t_current
        # Until the next event the intensity only decays, so lambda(t) bounds it from above
        lambda_bar = lambda_current
        
        while True:
            # Candidate time from a Poisson process with rate lambda_bar
            t = t + rng.exponential(1 / lambda_bar)
            
            # New intensity (with decay)
            lambda_new = intensity(t)
            
            # Accept with probability lambda(t) / lambda_bar
            if rng.uniform() <= lambda_new / lambda_bar:
                next_times[i] = t - t_current
                break
            
            # Rejected: the intensity at the candidate time is a tighter bound
            lambda_bar = lambda_new
    
    return {
        'mean': np.mean(next_times),
        'median': np.median(next_times),
        'std': np.std(next_times),
        'quantile_5': np.percentile(next_times, 5),
        'quantile_95': np.percentile(next_times, 95),
        'samples': next_times
    }


# Predict next event at different states
print("Next Event Time Predictions:")
print("="*60)

# After a cluster (high intensity): evaluate just before the last trade of the
# tightest six-trade run (the five trades before it are in the history)
burst_idx = np.argmin(trade_times[5:] - trade_times[:-5]) + 5
t_high = trade_times[burst_idx]
pred_high = predict_next_event_time(trade_times[:burst_idx], t_high, *params_order_flow, n_samples=500)

# During a quiet period (low intensity): evaluate just before the trade that
# ends the longest inter-arrival gap
inter_arrivals = np.diff(trade_times)
quiet_idx = np.argmax(inter_arrivals) + 1
t_quiet = trade_times[quiet_idx]
pred_quiet = predict_next_event_time(trade_times[:quiet_idx], t_quiet, *params_order_flow, n_samples=500)

print("After a cluster (high intensity):")
print(f"  Expected next event in: {pred_high['mean']:.3f} ± {pred_high['std']:.3f} seconds")
print(f"  90% prediction interval: [{pred_high['quantile_5']:.3f}, {pred_high['quantile_95']:.3f}]")

print("\nDuring a quiet period (low intensity):")
print(f"  Expected next event in: {pred_quiet['mean']:.3f} ± {pred_quiet['std']:.3f} seconds")
print(f"  90% prediction interval: [{pred_quiet['quantile_5']:.3f}, {pred_quiet['quantile_95']:.3f}]")
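
For the exponential kernel, the Monte Carlo result can be cross-checked analytically. If no new event arrives after $t$, the intensity decays as $\lambda(t+u) = \mu + (\lambda(t) - \mu)e^{-\beta u}$. The probability of waiting longer than $s$ is therefore $S(s) = \exp\left(-\left[\mu s + \frac{\lambda(t) - \mu}{\beta}(1 - e^{-\beta s})\right]\right)$. Below is a sketch of that survival function plus a bisection for its median. The function names are hypothetical, and `lam_t` should be the same current intensity the simulation starts from.

```python
import numpy as np

def waiting_time_survival(s, lam_t: float, mu: float, beta: float):
    """P(no event in (t, t+s]) for an exponential-kernel Hawkes process with intensity lam_t at t."""
    s = np.asarray(s, dtype=float)
    # Compensator over (t, t+s]: baseline part plus the integrated decaying excitation
    Lambda = mu * s + (lam_t - mu) / beta * (1.0 - np.exp(-beta * s))
    return np.exp(-Lambda)

def median_waiting_time(lam_t: float, mu: float, beta: float, tol: float = 1e-10) -> float:
    """Solve S(s) = 0.5 by bisection (S is strictly decreasing when mu > 0)."""
    lo, hi = 0.0, 1.0
    while waiting_time_survival(hi, lam_t, mu, beta) > 0.5:
        hi *= 2.0  # expand the bracket until it contains the median
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if waiting_time_survival(mid, lam_t, mu, beta) > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Set `lam_t` to the intensity at the prediction time and use the fitted μ and β. With those inputs, `median_waiting_time` should roughly match the simulated medians, up to Monte Carlo noise. The mean waiting time is the integral of `waiting_time_survival` over $s$, which can be evaluated numerically.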

---

## 8. Summary & Interview Questions <a id='8-summary'></a>

### Key Takeaways

| Concept | Description |
|---------|-------------|
| **Self-excitation** | Events trigger more events - fundamental to order flow |
| **Branching ratio** | $\alpha/\beta$: mean number of events triggered by each event (< 1 for stationarity) |
| **Exponential kernel** | Tractable, captures decaying influence |
| **Maximum likelihood** | Closed-form compensator enables efficient fitting |
| **Multivariate** | Buy/sell cross-excitation reveals market dynamics |

### Financial Applications

1. **Order flow modeling**: Capture clustering in trade arrivals
2. **Market stress detection**: High branching ratio = fragile market
3. **Execution algorithms**: Predict when to trade based on intensity
4. **Flash crash analysis**: Understand cascading effects
5. **Market making**: Adjust quotes based on predicted order flow

In [None]:
# Summary visualization
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# 1. Basic Hawkes simulation
ax1 = axes[0, 0]
events_demo = simulate_hawkes(0.5, 0.8, 1.2, 30, seed=123)
hp_demo = HawkesProcess(0.5, 0.8, 1.2)
t_demo = np.linspace(0, 30, 300)
ax1.plot(t_demo, hp_demo.intensity_path(t_demo, events_demo), 'b-')
ax1.axhline(y=0.5, color='gray', linestyle='--', alpha=0.5)
for t in events_demo:
    ax1.axvline(x=t, color='red', alpha=0.2, linewidth=0.5)
ax1.set_xlabel('Time')
ax1.set_ylabel('Intensity')
ax1.set_title('1. Hawkes Process Intensity\n(Events shown as vertical lines)')

# 2. Branching ratio effect
ax2 = axes[0, 1]
branching_ratios = [0.3, 0.6, 0.9]
for br in branching_ratios:
    events_br = simulate_hawkes(0.5, br, 1.0, 50, seed=42)
    ax2.hist(np.diff(events_br), bins=30, alpha=0.5, density=True, 
             label=f'α/β = {br}')
ax2.set_xlabel('Inter-arrival time')
ax2.set_ylabel('Density')
ax2.set_title('2. Effect of Branching Ratio on\nInter-arrival Distribution')
ax2.legend()

# 3. Cross-excitation
ax3 = axes[1, 0]
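# Restrict both series to a common time span so the cross-intensity is not biased
t_cut = min(buy_times[:500].max(), sell_times[:500].max())
lags, intensity = compute_cross_intensity(buy_times[buy_times <= t_cut],
                                          sell_times[sell_times <= t_cut], 5)
ax3.bar(lags, intensity, width=lags[1]-lags[0], alpha=0.7)
ax3.set_xlabel('Lag (seconds)')
ax3.set_ylabel('Cross-intensity')
ax3.set_title('3. Cross-Excitation: Buy → Sell\n(Sell arrivals following buys)')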

# 4. Market applications
ax4 = axes[1, 1]
applications = ['Order Flow\nModeling', 'Flash Crash\nDetection', 
                'Execution\nOptimization', 'Market\nMaking']
importance = [0.9, 0.85, 0.8, 0.75]  # illustrative scores, not measured quantities
colors = plt.cm.Blues(np.linspace(0.4, 0.8, 4))
bars = ax4.barh(applications, importance, color=colors)
ax4.set_xlabel('Relevance (illustrative)')
ax4.set_title('4. Hawkes Process Applications\nin Market Microstructure')
ax4.set_xlim(0, 1)

plt.tight_layout()
plt.show()
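
Panel 2 compares inter-arrival histograms by eye. A single summary statistic makes the same point: the coefficient of variation (CV) of inter-arrival times equals 1 for a Poisson process and exceeds 1 when arrivals cluster. The sketch below is a standalone Ogata-thinning simulator for the exponential kernel. It is written independently of the notebook's `simulate_hawkes`, whose internals are not shown here. `simulate_hawkes_exp` and `interarrival_cv` are hypothetical names.

```python
import numpy as np

def simulate_hawkes_exp(mu: float, alpha: float, beta: float, T: float, seed=None) -> np.ndarray:
    """Ogata thinning for lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta (t - t_i)) on [0, T]."""
    rng = np.random.default_rng(seed)
    events = []
    t = 0.0
    excitation = 0.0  # alpha * sum exp(-beta (t - t_i)) at the current time t
    while True:
        lam_bar = mu + excitation  # valid bound: intensity only decays until the next event
        w = rng.exponential(1.0 / lam_bar)
        t += w
        if t > T:
            break
        excitation *= np.exp(-beta * w)  # decay the excitation to the candidate time
        if rng.uniform() * lam_bar <= mu + excitation:
            events.append(t)
            excitation += alpha  # an accepted event adds a jump of size alpha
    return np.array(events)

def interarrival_cv(events: np.ndarray) -> float:
    """Coefficient of variation of inter-arrival times (1 for a Poisson process)."""
    gaps = np.diff(events)
    return gaps.std() / gaps.mean()

# Same baseline, increasing branching ratio (beta = 1, so alpha/beta = alpha)
for br in [0.0, 0.3, 0.6, 0.9]:
    ev = simulate_hawkes_exp(0.5, br, 1.0, 1000.0, seed=42)
    print(f"alpha/beta = {br}: {len(ev)} events, inter-arrival CV = {interarrival_cv(ev):.2f}")
```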

### Common Interview Questions

**Q1: What is a Hawkes process and why is it used in finance?**
> A Hawkes process is a self-exciting point process where each event increases the probability of future events. In finance, it models order flow clustering - one trade often triggers more trades due to algorithmic reactions, market maker adjustments, and information cascades.

**Q2: Explain the branching ratio and its significance.**
> The branching ratio α/β represents the average number of "offspring" events triggered by each event. For stationarity, it must be < 1. Values close to 1 indicate a fragile market prone to cascading events. It's analogous to the reproduction number R in epidemiology.

**Q3: How would you detect flash crashes using Hawkes processes?**
> Monitor the real-time intensity and branching ratio. Abnormally high intensity or a branching ratio approaching 1 signals potential instability. Additionally, track regime changes in parameters across rolling windows to detect shifts toward critical regimes.

**Q4: What are limitations of exponential kernels?**
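> Exponential kernels assume a constant decay rate, so influence fades on a single time scale, whereas real markets show excitation that decays over many time scales. Power-law kernels (long memory) may better capture these slow-decaying effects. Also, real markets have intraday seasonality that a basic Hawkes model with a constant baseline μ doesn't capture; a time-varying baseline μ(t) is a common remedy.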

**Q5: How would you use Hawkes processes for execution optimization?**
> Estimate the current intensity to forecast near-term order flow, then treat timing as a trade-off. Low-intensity periods carry less risk of trading into a self-exciting cascade, which can reduce market impact. In high-intensity periods your order is "hidden" among many others, but it may face more price volatility.
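
The "offspring" reading of α/β in Q2 comes from the cluster (branching) representation of the Hawkes process. Background events arrive as a Poisson process with rate μ. Every event then independently produces a Poisson(α/β) number of children, with delays drawn from the normalized kernel, here an Exponential(β) distribution. Below is a minimal sketch for the exponential kernel; `simulate_hawkes_cluster` is a hypothetical name. Averaging counts over replications should land near μT/(1 - α/β), consistent with E[λ] = μ/(1 - n*). Offspring that fall after T produce a small edge effect.

```python
import numpy as np

def simulate_hawkes_cluster(mu: float, alpha: float, beta: float, T: float, seed=None) -> np.ndarray:
    """Cluster (branching) construction of an exponential-kernel Hawkes process on [0, T]."""
    rng = np.random.default_rng(seed)
    n_star = alpha / beta  # mean number of children per event (must be < 1)
    # Generation 0: background (immigrant) events from a Poisson process with rate mu
    generation = rng.uniform(0.0, T, rng.poisson(mu * T))
    all_events = [generation]
    while generation.size > 0:
        # Each event has Poisson(n_star) children, delayed by Exponential(beta) waiting times
        n_children = rng.poisson(n_star, generation.size)
        parents = np.repeat(generation, n_children)
        children = parents + rng.exponential(1.0 / beta, parents.size)
        generation = children[children <= T]  # later offspring (and their descendants) fall outside [0, T]
        all_events.append(generation)
    return np.sort(np.concatenate(all_events))

counts = [len(simulate_hawkes_cluster(1.0, 0.5, 1.0, 1000.0, seed=s)) for s in range(20)]
print(f"mean count {np.mean(counts):.0f} vs mu*T/(1 - n*) = {1.0 * 1000.0 / (1 - 0.5):.0f}")
```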

In [None]:
# Quick reference: Key formulas
print("""
╔══════════════════════════════════════════════════════════════════╗
║               HAWKES PROCESS - QUICK REFERENCE                   ║
╠══════════════════════════════════════════════════════════════════╣
║                                                                  ║
║  INTENSITY:     λ(t) = μ + α Σ_{tᵢ<t} exp(-β(t - tᵢ))            ║
║                                                                  ║
║  PARAMETERS:                                                     ║
║    • μ = baseline intensity (exogenous rate)                     ║
║    • α = jump size (excitation magnitude)                        ║
║    • β = decay rate (how fast influence fades)                   ║
║                                                                  ║
║  KEY METRICS:                                                    ║
║    • Branching ratio: n* = α/β (must be < 1)                     ║
║    • Expected intensity: E[λ] = μ/(1 - n*)                       ║
║    • Half-life: t₁/₂ = ln(2)/β                                   ║
║                                                                  ║
║  LOG-LIKELIHOOD:                                                 ║
║    log L = Σ log λ(tᵢ) - ∫₀ᵀ λ(s) ds                            ║
║                                                                  ║
║  COMPENSATOR (closed form):                                      ║
║    ∫₀ᵀ λ(s) ds = μT + (α/β) Σ (1 - exp(-β(T - tᵢ)))             ║
║                                                                  ║
╚══════════════════════════════════════════════════════════════════╝
""")

print("\n✅ Notebook completed! Key skills learned:")
print("   • Hawkes process theory and simulation")
print("   • Maximum likelihood estimation")
print("   • Order flow modeling and analysis")
print("   • Market microstructure applications")
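
The compensator formula in the quick reference can be checked numerically by integrating λ(s) over a fine grid and comparing the result with the closed form. Below is a self-contained sketch under the same exponential-kernel assumption. The helper names are hypothetical. The trapezoid rule is written out by hand to avoid depending on a version-specific NumPy function name.

```python
import numpy as np

def hawkes_exp_intensity(t, events, mu: float, alpha: float, beta: float) -> np.ndarray:
    """lambda(t) = mu + alpha * sum over t_i < t of exp(-beta (t - t_i)), vectorized over t."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    dt = t[:, None] - np.asarray(events, dtype=float)[None, :]
    past = dt > 0  # only strictly earlier events contribute
    # Zero out future lags before exponentiating to avoid overflow warnings
    contrib = np.where(past, np.exp(-beta * np.where(past, dt, 0.0)), 0.0)
    return mu + alpha * contrib.sum(axis=1)

def compensator_closed_form(T: float, events, mu: float, alpha: float, beta: float) -> float:
    """Integral of lambda over [0, T]: mu*T + (alpha/beta) * sum(1 - exp(-beta (T - t_i)))."""
    events = np.asarray(events, dtype=float)
    return mu * T + (alpha / beta) * np.sum(1.0 - np.exp(-beta * (T - events)))

def hawkes_exp_loglik(T: float, events, mu: float, alpha: float, beta: float) -> float:
    """log L = sum log lambda(t_i) - compensator, as in the quick reference."""
    lam_events = hawkes_exp_intensity(events, events, mu, alpha, beta)
    return np.sum(np.log(lam_events)) - compensator_closed_form(T, events, mu, alpha, beta)

# Numerical check on a small hypothetical event set
demo_events = np.array([0.5, 1.2, 1.3, 4.0, 7.5])
demo_params = (0.5, 0.8, 1.2)  # (mu, alpha, beta)
T_demo = 10.0
grid = np.linspace(0.0, T_demo, 200_001)
lam_grid = hawkes_exp_intensity(grid, demo_events, *demo_params)
numeric = np.sum(0.5 * (lam_grid[1:] + lam_grid[:-1]) * np.diff(grid))  # trapezoid rule
closed = compensator_closed_form(T_demo, demo_events, *demo_params)
print(f"numerical integral {numeric:.5f} vs closed form {closed:.5f}")
print(f"log-likelihood {hawkes_exp_loglik(T_demo, demo_events, *demo_params):.5f}")
```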