## Historical Fee Simulation (Uniswap v3) — Outline

Goal: Given Uniswap v3 on-chain datasets (swaps, slot0, mints, burns) and token metadata, estimate the fees a position would have earned by providing liquidity in a specific price range over a historical interval.

### 1) Data inputs and assumptions
- **Required files**
  - `dune_pipeline/swaps_YYYY_MM_DD_to_YYYY_MM_DD_*.csv` (has `amount0`, `amount1`, `sqrtPriceX96`, `tick`, `liquidity` per swap)
  - `dune_pipeline/slot0_*.csv` (periodic `sqrtPriceX96`, `tick`, `feeProtocol` snapshots)
  - `dune_pipeline/mints_*.csv`, `dune_pipeline/burns_*.csv` (position changes; optional for v1)
  - `dune_pipeline/token_metadata_*.csv` and `dune_pipeline/pool_config_*.csv`
- **Fee model (whitepaper)**
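  - Each swap pays `fee_rate` on its input-token amount. The fee accrues to the liquidity active at that instant, pro rata to each position's share of that liquidity.
  - Protocol fee (if enabled) is carved out of the swap fee: the protocol keeps `1/N` of it, where `N` is the per-token denominator packed in `feeProtocol`, and LPs keep the rest. We'll read it from `slot0.output_feeProtocol` when available; v1 assumes 0 otherwise.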
- **Active liquidity**
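  - The pool's `liquidity` variable is the liquidity active at the current tick. We use `swaps.liquidity` as per-swap active liquidity. It is the post-swap value (as are the row's `tick` and `sqrtPriceX96`), so it can differ from the liquidity in effect during a swap that crossed initialized ticks.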
- **First version constraints**
  - Event-based accrual at swap instants only; ignore intra-swap tick-cross nuance (acceptable approximation for analysis).
  - No compounding; fees are summed over time. No gas or price impact modeled.
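
A minimal sketch of the v1 fee model for a single swap. It assumes the Uniswap v3 convention that the pool `fee` is expressed in hundredths of a basis point, and that the protocol-fee denominator for the input token has already been extracted (0 meaning off). The helper name `lp_fee_for_swap` is hypothetical.

```python
def lp_fee_for_swap(input_amount: float, fee_pips: int, fee_protocol: int = 0) -> float:
    """Fee kept by LPs on one swap (hypothetical helper).

    input_amount: amount the pool received (gross input, fee included).
    fee_pips: pool `fee` in hundredths of a basis point (3000 -> 0.30%).
    fee_protocol: protocol-fee denominator for the input token; 0 = off,
        otherwise the protocol keeps 1/fee_protocol of the fee.
    """
    fee_rate = fee_pips / 1_000_000
    fee_amount = fee_rate * abs(input_amount)
    if fee_protocol:
        fee_amount -= fee_amount / fee_protocol  # remove the protocol's cut
    return fee_amount


# 1,000 units paid into a 0.30% pool, without and with a 1/4 protocol cut.
print(lp_fee_for_swap(1_000, 3000))     # 3.0
print(lp_fee_for_swap(1_000, 3000, 4))  # 2.25
```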

### 2) User-configurable inputs
- Pool: from `pool_config_*` (includes `token0`, `token1`, `fee` in hundredths of a basis point, e.g. 3000 = 0.30%, so `fee_rate = fee / 1e6`, and `tickSpacing`).
- Time window: `start_time`, `end_time`.
- Range: specify by ticks (`tickLower`, `tickUpper`) or by prices. Prices are converted to raw (decimal-adjusted) prices, mapped to ticks, and snapped to the nearest multiple of `tickSpacing`.
- Initial position sizing (one of):
  - Provide the liquidity `L_position` directly (advanced), or
  - Provide deposit amounts (`amount0`, `amount1`) at the initial price; we compute `L_position` with the whitepaper formulas in section (3).
- Reporting currency: raw token0/token1 and optional USD (if external price source is available; out of scope v1).
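
A sketch of the price-to-tick mapping, assuming prices are quoted in human units as token1 per token0. It uses `tick = floor(log_1.0001(raw_price))` and then snaps to the nearest usable multiple of `tickSpacing` within the Uniswap v3 tick bounds. The function names follow the planned `range_to_ticks()` helper but are otherwise hypothetical; floating-point `log` can misplace a price that sits exactly on a tick boundary.

```python
import math

MIN_TICK, MAX_TICK = -887272, 887272  # Uniswap v3 tick bounds


def price_to_tick(price: float, decimals0: int, decimals1: int) -> int:
    """Human price (token1 per token0) -> tick = floor(log_1.0001(raw price))."""
    raw_price = price * 10 ** (decimals1 - decimals0)
    return math.floor(math.log(raw_price) / math.log(1.0001))


def snap_to_spacing(tick: int, tick_spacing: int) -> int:
    """Round to the nearest usable tick (a multiple of tick_spacing, within bounds)."""
    snapped = round(tick / tick_spacing) * tick_spacing
    lowest = math.ceil(MIN_TICK / tick_spacing) * tick_spacing
    highest = math.floor(MAX_TICK / tick_spacing) * tick_spacing
    return max(lowest, min(highest, snapped))


def range_to_ticks(price_lower, price_upper, decimals0, decimals1, tick_spacing):
    """Map a human price range to (tickLower, tickUpper)."""
    tick_lower = snap_to_spacing(price_to_tick(price_lower, decimals0, decimals1), tick_spacing)
    tick_upper = snap_to_spacing(price_to_tick(price_upper, decimals0, decimals1), tick_spacing)
    if tick_upper <= tick_lower:  # keep the range non-empty after snapping
        tick_upper = tick_lower + tick_spacing
    return tick_lower, tick_upper
```

For two 18-decimal tokens and `tickSpacing = 60`, `range_to_ticks(0.5, 2.0, 18, 18, 60)` yields a symmetric range around tick 0.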

### 3) Math recap (whitepaper)
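- Let `P = sqrtPrice`, `a = sqrtPriceLower`, `b = sqrtPriceUpper` as real numbers (i.e. `sqrtPriceX96 / 2^96`, not the Q64.96 integer), with `a = 1.0001^(tickLower/2)` and `b = 1.0001^(tickUpper/2)`. With these raw sqrt prices, the formulas below give amounts in raw token units (before dividing by `10^decimals`), and `L` is on the same scale as `swaps.liquidity`.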
- Amounts vs liquidity (key formulas):
  - If `P <= a`: `amount0 = L * (b - a) / (a * b)`, `amount1 = 0`.
  - If `a < P < b`: `amount0 = L * (b - P) / (P * b)`, `amount1 = L * (P - a)`.
  - If `P >= b`: `amount0 = 0`, `amount1 = L * (b - a)`.
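- Inversion to get `L` from raw `amount0, amount1` at `P`: below the range `L = amount0 * a * b / (b - a)`; above it `L = amount1 / (b - a)`; inside it `L = min(amount0 * P * b / (b - P), amount1 / (P - a))`, since the scarcer token binds and the excess of the other stays undeposited.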
- Price from `sqrtPriceX96`: `P = sqrtPriceX96 / 2^96`; token1 per token0 price is `P^2 * 10^(decimals0 - decimals1)`.
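
The formulas above translate directly into code. This is a sketch of the planned `ticks_to_sqrts()` and `compute_liquidity_from_deposit()` helpers plus a hypothetical `amounts_for_liquidity()`, assuming all amounts are raw token units and all sqrt prices are real numbers (not Q64.96 integers). It uses floats, so results approximate the contracts' fixed-point rounding.

```python
Q96 = 2 ** 96


def human_price(sqrt_price_x96: int, decimals0: int, decimals1: int) -> float:
    """token1 per token0 in human units: P^2 * 10^(decimals0 - decimals1)."""
    P = sqrt_price_x96 / Q96
    return P * P * 10 ** (decimals0 - decimals1)


def ticks_to_sqrts(tick_lower: int, tick_upper: int):
    """Raw sqrt-price bounds a = 1.0001^(tickLower/2), b = 1.0001^(tickUpper/2)."""
    return 1.0001 ** (tick_lower / 2), 1.0001 ** (tick_upper / 2)


def amounts_for_liquidity(L, P, a, b):
    """Raw (amount0, amount1) held by liquidity L at raw sqrt price P."""
    if P <= a:
        return L * (b - a) / (a * b), 0.0
    if P >= b:
        return 0.0, L * (b - a)
    return L * (b - P) / (P * b), L * (P - a)


def compute_liquidity_from_deposit(amount0, amount1, P, a, b):
    """Invert amounts_for_liquidity for a raw deposit."""
    if P <= a:
        return amount0 * a * b / (b - a)
    if P >= b:
        return amount1 / (b - a)
    # Inside the range the scarcer token binds; any excess of the other stays idle.
    return min(amount0 * P * b / (b - P), amount1 / (P - a))
```

A round trip (`L` to amounts and back) should recover `L` up to float error, which makes a convenient unit test.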

### 4) Fee attribution per swap (v1)
For each swap row in time window:
- Determine direction and input token:
  - If `amount0 > 0`, input is token0; else if `amount1 > 0`, input is token1. Use absolute input amount in human units.
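- Fee amount on input: `fee_amount = fee_rate * input_amount`. If a protocol fee is enabled for the input token with denominator `N`, LPs receive `fee_amount * (1 - 1/N)`.
- Check active status: the position is active if `tickLower <= tick < tickUpper`, using the swap row's `tick`; otherwise its fee share is 0.
- If active, `share = L_position / liquidity_at_swap`. If the simulated position is not part of the historical liquidity, use `share = L_position / (liquidity_at_swap + L_position)` instead, so the share stays below 1. Fees to position:
  - Add `share * fee_amount` to `fees_token0` or `fees_token1`, according to the input token.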
- Optionally compute notional value using price at swap for illustrations.
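
A per-swap sketch of the attribution rule above. `SwapRow` and `attribute_swap_fee` are hypothetical names. Amounts follow the Swap event's pool perspective (positive means paid into the pool), and `position_in_pool` selects between the two share formulas.

```python
from dataclasses import dataclass


@dataclass
class SwapRow:
    amount0: float    # pool perspective: positive = paid into the pool
    amount1: float
    tick: int         # tick recorded with the swap (post-swap)
    liquidity: float  # active liquidity recorded with the swap (post-swap)


def attribute_swap_fee(swap, L_position, tick_lower, tick_upper, fee_rate,
                       position_in_pool=False):
    """Return (fee0, fee1) credited to the position for one swap."""
    if not (tick_lower <= swap.tick < tick_upper):
        return 0.0, 0.0  # out of range: no share of this swap's fee
    # A hypothetical position adds its own liquidity to the historical total.
    denom = swap.liquidity if position_in_pool else swap.liquidity + L_position
    if denom <= 0:
        return 0.0, 0.0
    share = L_position / denom
    if swap.amount0 > 0:  # token0 was the input
        return share * fee_rate * swap.amount0, 0.0
    if swap.amount1 > 0:  # token1 was the input
        return 0.0, share * fee_rate * swap.amount1
    return 0.0, 0.0
```

The `position_in_pool` flag makes the denominator choice explicit rather than silently assuming one interpretation of `L_position`.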

### 5) Processing pipeline
1. Load CSVs; parse datetimes to UTC; lower-case addresses; sort by time.
2. Read pool config (`fee` in hundredths of a basis point, `tickSpacing`) and token metadata (decimals, symbols).
3. Build a swap-timeseries dataframe with: `evt_block_time`, `tick`, `sqrtPriceX96`, `liquidity`, `amount0`, `amount1`.
4. Prepare user range:
   - If given prices → convert to raw prices (`price * 10^(decimals1 - decimals0)`), take `tick = floor(log(raw_price) / log(1.0001))`, and snap to `tickSpacing`; otherwise accept ticks directly.
   - Compute `a`, `b` sqrt boundaries; compute `L_position` from provided deposit or accept given `L`.
5. Iterate swaps in window; compute per-swap fee attribution per section (4); accumulate token0/token1 fee totals and cumulative timeseries.
6. Summaries: totals (token0, token1), per-day aggregates, and simple APR estimate vs notional at start (optional).
7. Visualizations: price line, active range shading, and cumulative fee lines. Scatter swaps sized by input volume.
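
Steps 5 and 6 can be sketched with the standard library alone. This assumes swap rows are dicts using the column names from step 3, with `evt_block_time` already parsed to a timezone-aware UTC `datetime` and amounts already in human units. It also assumes the simulated position sits on top of the historical liquidity. `simulate_fees` mirrors the planned helper name; `simple_apr` is hypothetical and needs fees and start notional in one numeraire.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def simulate_fees(swaps, L_position, tick_lower, tick_upper, fee_rate,
                  start_time, end_time):
    """Accumulate fees for a hypothetical position over [start_time, end_time)."""
    total0 = total1 = 0.0
    cumulative = []                          # (time, cum_fee0, cum_fee1) per swap
    daily = defaultdict(lambda: [0.0, 0.0])  # UTC date -> [fee0, fee1]
    for s in swaps:
        t = s["evt_block_time"]
        if not (start_time <= t < end_time):
            continue
        fee0 = fee1 = 0.0
        denom = s["liquidity"] + L_position  # position added on top of the pool
        if tick_lower <= s["tick"] < tick_upper and denom > 0:
            share = L_position / denom
            if s["amount0"] > 0:             # token0 was the input
                fee0 = share * fee_rate * s["amount0"]
            elif s["amount1"] > 0:           # token1 was the input
                fee1 = share * fee_rate * s["amount1"]
        total0 += fee0
        total1 += fee1
        cumulative.append((t, total0, total1))
        daily[t.date()][0] += fee0
        daily[t.date()][1] += fee1
    return {"fees_token0": total0, "fees_token1": total1,
            "cumulative": cumulative, "daily": dict(daily)}


def simple_apr(fees_value, start_value, start_time, end_time):
    """Non-compounded annualised fee return vs. notional at start."""
    years = (end_time - start_time).total_seconds() / (365 * 24 * 3600)
    return fees_value / start_value / years


# Example: one in-range token0 swap where the position holds 10% of active
# liquidity, so it earns about 10% of the 3.0 fee.
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
demo = [{"evt_block_time": t0, "amount0": 1000.0, "amount1": -1.0,
         "tick": 10, "liquidity": 9.0}]
print(simulate_fees(demo, 1.0, 0, 60, 0.003, t0, t0 + timedelta(days=1))["fees_token0"])
```

The `cumulative` list feeds the cumulative-fee plot in step 7, and `daily` gives the per-day aggregates directly.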

### 6) Validation plan
- Sanity: per token, summed per-swap pool fees should equal `fee_rate * sum(inputs)` across the dataset; with a protocol fee of 0, LP fees should match this total exactly.
- Spot-check a window where `tick` stays inside range; compare manual calculation for a few swaps.
- Edge: windows where `tick` exits or enters the range; fees should cease or resume accordingly.
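
Two of these checks can be automated. A sketch with hypothetical helper names, assuming per-swap fees are available as `(fee0, fee1)` tuples aligned with the swap rows. The gating check also assumes `L_position > 0` and that every swap has a positive input, so an in-range swap should always credit a non-zero fee.

```python
import math


def check_pool_fee_totals(swaps, per_swap_fees, fee_rate, rel_tol=1e-9):
    """Per token, summed pool fees should equal fee_rate * total input."""
    in0 = sum(s["amount0"] for s in swaps if s["amount0"] > 0)
    in1 = sum(s["amount1"] for s in swaps if s["amount1"] > 0)
    f0 = sum(f[0] for f in per_swap_fees)
    f1 = sum(f[1] for f in per_swap_fees)
    return (math.isclose(f0, fee_rate * in0, rel_tol=rel_tol)
            and math.isclose(f1, fee_rate * in1, rel_tol=rel_tol))


def check_active_gating(ticks, position_fees, tick_lower, tick_upper):
    """Fees must be non-zero exactly when tick is in [tick_lower, tick_upper)."""
    for tick, (f0, f1) in zip(ticks, position_fees):
        active = tick_lower <= tick < tick_upper
        if active != bool(f0 or f1):
            return False  # fees leaked out of range, or failed to resume in range
    return True
```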

### 7) Caveats and extensions
- v1 ignores intra-swap tick crossing and per-tick fee growth accounting; accuracy is high when swaps do not traverse boundaries materially.
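- Protocol fee: `slot0.output_feeProtocol` is packed: the low 4 bits hold the token0 denominator and the high 4 bits the token1 denominator (0 = off; otherwise the protocol takes `1/denominator` of the fee). Implement token-wise extraction if non-zero.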
- Advanced: Use `mints`/`burns` to reconcile liquidity jumps; cross-check `swaps.liquidity` vs derived.
- Multi-range and rebalancing strategies: simulate periodic re-mint/burn; track inventory and fee reinvestment.
- PnL in a single numeraire (USD): integrate external price feeds if desired.
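
A sketch of the token-wise extraction, assuming the packed value is read as an integer. The denominator that applies to a swap depends on its input token: a token0 input (`amount0 > 0`, a zeroForOne swap) uses the low nibble, and a token1 input uses the high nibble. The helper names are hypothetical.

```python
from typing import Tuple


def decode_fee_protocol(fee_protocol: int) -> Tuple[int, int]:
    """Split packed slot0.feeProtocol into (token0, token1) denominators; 0 = off."""
    return fee_protocol % 16, fee_protocol >> 4


def protocol_denominator_for_swap(fee_protocol: int, amount0: float) -> int:
    """Pick the denominator for one swap from its input token."""
    d0, d1 = decode_fee_protocol(fee_protocol)
    return d0 if amount0 > 0 else d1  # amount0 > 0: token0 was paid in


def lp_fee_after_protocol(fee_amount: float, denominator: int) -> float:
    """LPs keep the fee minus 1/denominator of it (all of it when denominator is 0)."""
    return fee_amount - fee_amount / denominator if denominator else fee_amount
```

For example, a packed value of `0x54` means a 1/4 protocol cut on token0 inputs and a 1/5 cut on token1 inputs.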

### 8) Implementation skeleton (to follow after your review)
- `load_pool_context()` → pool config, tokens, fee rate, tickSpacing
- `load_swaps()` → cleaned swaps df with derived columns (input token, input amount, fee amount)
- `range_to_ticks()` and `ticks_to_sqrts()` helpers
- `compute_liquidity_from_deposit(amount0, amount1, P, a, b)`
- `simulate_fees(swaps_df, L_position, tickLower, tickUpper)` → per-swap and aggregated outputs
- `plot_results()`
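
A possible shape for `load_pool_context()`, sketched with the standard `csv` module. The column names (`token0`, `token1`, `fee`, `tickSpacing` in the pool config; `address`, `decimals` in the token metadata) are assumptions about the CSV layout, and `PoolContext` is a hypothetical container. The loader takes any iterable of CSV lines rather than paths, so it can be tested without files.

```python
import csv
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PoolContext:
    token0: str
    token1: str
    fee_pips: int       # hundredths of a basis point, e.g. 3000 = 0.30%
    tick_spacing: int
    decimals0: int
    decimals1: int

    @property
    def fee_rate(self) -> float:
        return self.fee_pips / 1_000_000


def load_pool_context(pool_config_lines: Iterable[str],
                      token_metadata_lines: Iterable[str]) -> PoolContext:
    """Build a PoolContext from CSV text (an open file or a list of lines).

    Assumed (hypothetical) columns: pool_config has token0, token1, fee,
    tickSpacing (first row used); token_metadata has address, decimals.
    """
    cfg = next(csv.DictReader(pool_config_lines))
    decimals = {row["address"].lower(): int(row["decimals"])
                for row in csv.DictReader(token_metadata_lines)}
    t0, t1 = cfg["token0"].lower(), cfg["token1"].lower()  # match lower-cased metadata
    return PoolContext(t0, t1, int(cfg["fee"]), int(cfg["tickSpacing"]),
                       decimals[t0], decimals[t1])
```

With real files, pass handles opened with `open(path, newline="")`.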

After you review/approve, I will implement v1 (sections 1–5) and add validation/plots.
