# 02 - Forward Return Calculation

Test `vf.forward_return()`, which adds forward-return (`y_*`) columns to trade data.

**Process:**
1. Load trade and alpha data
2. Parse timestamps
3. Add mid price
4. Calculate forward returns at multiple horizons

**VizFlow v0.5.0 Note:**
Zero prices return `null` instead of `inf` for y_* columns.

In [12]:
import polars as pl
import vizflow as vf
from pathlib import Path

print(f"VizFlow version: {vf.__version__}")

VizFlow version: 0.5.0


In [13]:
# Load config
import sys
sys.path.insert(0, str(Path.cwd().parent / "configs"))
from default import config

vf.set_config(config)

## 1. Load and Prepare Data

In [None]:
DATE = "11110101"  # Use same date for trade and alpha

# Load trade data
df_trade = vf.scan_trade(DATE)
df_trade = vf.parse_time(df_trade, timestamp_col="alpha_ts")
df_trade = df_trade.with_columns(
    ((pl.col("bid_px0") + pl.col("ask_px0")) / 2).alias("mid")
)

# Load alpha data (same date)
df_alpha = vf.scan_alpha(DATE)
df_alpha = vf.parse_time(df_alpha, timestamp_col="ticktime")
df_alpha = df_alpha.with_columns(
    ((pl.col("bid_px0") + pl.col("ask_px0")) / 2).alias("mid")
)

print(f"Trade rows: {df_trade.select(pl.len()).collect().item():,}")
print(f"Alpha rows: {df_alpha.select(pl.len()).collect().item():,}")

## 2. Calculate Forward Returns

In [15]:
# Calculate forward returns at 60s, 3m, 30m horizons
HORIZONS = [60, 180, 1800]  # seconds

df_result = vf.forward_return(
    df_trade,
    df_alpha,
    horizons=HORIZONS,
    trade_time_col="elapsed_alpha_ts",
    alpha_time_col="elapsed_ticktime",
    price_col="mid",
    symbol_col="ukey",
)

print("New columns added:")
for col in df_result.collect_schema().names():
    if col.startswith("forward_") or col.startswith("y_"):
        print(f"  {col}")

New columns added:
  forward_mid_60s
  y_60s
  forward_mid_3m
  y_3m
  forward_mid_30m
  y_30m
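
To make the `forward_mid_*` and `y_*` columns concrete, here is a minimal pure-Python sketch of a forward lookup for a single symbol. It is only an illustration. It assumes the forward price is the first alpha tick at or after `trade_time + horizon` (VizFlow may instead use the last tick at or before that time), and that the elapsed times are in milliseconds. The helper names `forward_price` and `simple_return` are hypothetical. The return formula `(forward_mid - mid) / mid` is the one checked in the verification section.

```python
from bisect import bisect_left


def forward_price(tick_times, tick_mids, trade_time, horizon):
    """Mid of the first tick at or after trade_time + horizon, else None.

    tick_times must be sorted ascending and share a unit with trade_time.
    """
    i = bisect_left(tick_times, trade_time + horizon)
    return tick_mids[i] if i < len(tick_times) else None


def simple_return(mid, forward_mid):
    """(forward_mid - mid) / mid, or None if a price is missing or mid is zero."""
    if forward_mid is None or not mid:
        return None
    return (forward_mid - mid) / mid


# Toy ticks for one symbol; times in milliseconds (assumed unit).
tick_times = [0, 30_000, 60_000, 90_000]
tick_mids = [22.0, 22.1, 22.2, 21.9]
HORIZON_MS = 60 * 1000

fwd = forward_price(tick_times, tick_mids, 0, HORIZON_MS)
y = simple_return(22.0, fwd)

# A trade at 45 s needs a tick at 105 s, which is past the last tick.
late = forward_price(tick_times, tick_mids, 45_000, HORIZON_MS)
print(fwd, y, late)
```

In this sketch a trade whose target time falls after the last available tick gets no forward price, so its return is `None`.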


In [16]:
# View results
result = df_result.select([
    "ukey",
    "order_side",
    "elapsed_alpha_ts",
    "mid",
    "forward_mid_60s",
    "forward_mid_3m",
    "forward_mid_30m",
    "y_60s",
    "y_3m",
    "y_30m",
]).collect()

print(result.head(10))

shape: (10, 10)
┌──────────┬────────────┬────────────┬────────┬───┬────────────┬───────────┬───────────┬───────────┐
│ ukey     ┆ order_side ┆ elapsed_al ┆ mid    ┆ … ┆ forward_mi ┆ y_60s     ┆ y_3m      ┆ y_30m     │
│ ---      ┆ ---        ┆ pha_ts     ┆ ---    ┆   ┆ d_30m      ┆ ---       ┆ ---       ┆ ---       │
│ i64      ┆ str        ┆ ---        ┆ f64    ┆   ┆ ---        ┆ f64       ┆ f64       ┆ f64       │
│          ┆            ┆ i64        ┆        ┆   ┆ f64        ┆           ┆           ┆           │
╞══════════╪════════════╪════════════╪════════╪═══╪════════════╪═══════════╪═══════════╪═══════════╡
│ 11000408 ┆ Sell       ┆ 6300845    ┆ 22.355 ┆ … ┆ 21.38      ┆ -0.022366 ┆ -0.028853 ┆ -0.043614 │
│ 11000408 ┆ Buy        ┆ 2734044    ┆ 22.65  ┆ … ┆ null       ┆ null      ┆ null      ┆ null      │
│ 11000408 ┆ Buy        ┆ 4843824    ┆ 22.28  ┆ … ┆ null       ┆ null      ┆ null      ┆ null      │
│ 11000408 ┆ Sell       ┆ 5613695    ┆ 22.365 ┆ … ┆ null       ┆ null      

## 3. Analyze Forward Returns

In [17]:
# Summary statistics
stats = result.select([
    pl.col("y_60s").mean().alias("y_60s_mean"),
    pl.col("y_60s").std().alias("y_60s_std"),
    pl.col("y_3m").mean().alias("y_3m_mean"),
    pl.col("y_3m").std().alias("y_3m_std"),
    pl.col("y_30m").mean().alias("y_30m_mean"),
    pl.col("y_30m").std().alias("y_30m_std"),
])
print("Forward Return Statistics:")
print(stats)

Forward Return Statistics:
shape: (1, 6)
┌────────────┬───────────┬───────────┬──────────┬────────────┬───────────┐
│ y_60s_mean ┆ y_60s_std ┆ y_3m_mean ┆ y_3m_std ┆ y_30m_mean ┆ y_30m_std │
│ ---        ┆ ---       ┆ ---       ┆ ---      ┆ ---        ┆ ---       │
│ f64        ┆ f64       ┆ f64       ┆ f64      ┆ f64        ┆ f64       │
╞════════════╪═══════════╪═══════════╪══════════╪════════════╪═══════════╡
│ -0.020882  ┆ 0.026382  ┆ -0.020097 ┆ 0.024476 ┆ -0.019683  ┆ 0.031715  │
└────────────┴───────────┴───────────┴──────────┴────────────┴───────────┘


In [18]:
# Check null rates (trades near market close may not have forward prices)
null_rates = result.select([
    (pl.col("y_60s").null_count() / pl.len()).alias("y_60s_null_rate"),
    (pl.col("y_3m").null_count() / pl.len()).alias("y_3m_null_rate"),
    (pl.col("y_30m").null_count() / pl.len()).alias("y_30m_null_rate"),
])
print("Null rates (trades without forward price):")
print(null_rates)

Null rates (trades without forward price):
shape: (1, 3)
┌─────────────────┬────────────────┬─────────────────┐
│ y_60s_null_rate ┆ y_3m_null_rate ┆ y_30m_null_rate │
│ ---             ┆ ---            ┆ ---             │
│ f64             ┆ f64            ┆ f64             │
╞═════════════════╪════════════════╪═════════════════╡
│ 0.726415        ┆ 0.91195        ┆ 0.933962        │
└─────────────────┴────────────────┴─────────────────┘
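
Null rates between 73% and 93% are high, so it may be worth checking how much of that the market-close explanation accounts for. The sketch below counts the trades whose target time `t + horizon` lies beyond the last alpha tick. The function name `beyond_last_tick` and the toy numbers are hypothetical, and trade times, the last tick time, and horizons are assumed to share one unit.

```python
def beyond_last_tick(trade_times, last_tick_time, horizons):
    """Per horizon, the fraction of trades with t + horizon after the last tick."""
    fractions = {}
    for h in horizons:
        n_beyond = sum(1 for t in trade_times if t + h > last_tick_time)
        fractions[h] = n_beyond / len(trade_times)
    return fractions


# Toy example: four trades, alpha data ending at t = 1000 (same unit throughout).
trade_times = [100, 450, 850, 950]
fractions = beyond_last_tick(trade_times, last_tick_time=1000, horizons=[60, 180, 600])
print(fractions)
```

If the fractions computed this way on the real data are well below the observed null rates, other causes are worth checking, for example symbols missing from the alpha data or a unit mismatch between the two time columns.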


In [19]:
# Forward return by order side
by_side = result.group_by("order_side").agg([
    pl.len().alias("count"),
    pl.col("y_60s").mean().alias("y_60s_mean"),
    pl.col("y_3m").mean().alias("y_3m_mean"),
    pl.col("y_30m").mean().alias("y_30m_mean"),
]).sort("order_side")  # group_by does not guarantee row order
print("Forward return by order side:")
print(by_side)

Forward return by order side:
shape: (2, 5)
┌────────────┬───────┬────────────┬───────────┬────────────┐
│ order_side ┆ count ┆ y_60s_mean ┆ y_3m_mean ┆ y_30m_mean │
│ ---        ┆ ---   ┆ ---        ┆ ---       ┆ ---        │
│ str        ┆ u32   ┆ f64        ┆ f64       ┆ f64        │
╞════════════╪═══════╪════════════╪═══════════╪════════════╡
│ Buy        ┆ 137   ┆ -0.015877  ┆ -0.022409 ┆ -0.008115  │
│ Sell       ┆ 181   ┆ -0.025141  ┆ -0.018813 ┆ -0.025467  │
└────────────┴───────┴────────────┴───────────┴────────────┘


## 4. Verify Calculation

Manually recompute `y_60s` for the first row and compare it with the value returned by `vf.forward_return()`.

In [20]:
# Take first trade and verify
first_trade = result.head(1)
print("First trade:")
print(first_trade)

# Manual calculation
mid = first_trade["mid"][0]
forward_mid_60s = first_trade["forward_mid_60s"][0]
# None when either price is missing or zero (a zero mid would raise ZeroDivisionError)
expected_y_60s = (
    (forward_mid_60s - mid) / mid
    if forward_mid_60s and mid
    else None
)

print("\nManual verification:")
print(f"  mid = {mid}")
print(f"  forward_mid_60s = {forward_mid_60s}")
print(f"  expected y_60s = {expected_y_60s}")
print(f"  actual y_60s = {first_trade['y_60s'][0]}")

First trade:
shape: (1, 10)
┌──────────┬────────────┬────────────┬────────┬───┬────────────┬───────────┬───────────┬───────────┐
│ ukey     ┆ order_side ┆ elapsed_al ┆ mid    ┆ … ┆ forward_mi ┆ y_60s     ┆ y_3m      ┆ y_30m     │
│ ---      ┆ ---        ┆ pha_ts     ┆ ---    ┆   ┆ d_30m      ┆ ---       ┆ ---       ┆ ---       │
│ i64      ┆ str        ┆ ---        ┆ f64    ┆   ┆ ---        ┆ f64       ┆ f64       ┆ f64       │
│          ┆            ┆ i64        ┆        ┆   ┆ f64        ┆           ┆           ┆           │
╞══════════╪════════════╪════════════╪════════╪═══╪════════════╪═══════════╪═══════════╪═══════════╡
│ 11000408 ┆ Sell       ┆ 6300845    ┆ 22.355 ┆ … ┆ 21.38      ┆ -0.022366 ┆ -0.028853 ┆ -0.043614 │
└──────────┴────────────┴────────────┴────────┴───┴────────────┴───────────┴───────────┴───────────┘

Manual verification:
  mid = 22.355
  forward_mid_60s = 21.855
  expected y_60s = -0.022366360993066426
  actual y_60s = -0.022366360993066426
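
The single-row check can be extended to every row. The sketch below is a hypothetical helper, not part of VizFlow. It takes rows as dictionaries and returns those whose stored `y_*` value disagrees with `(forward_mid - mid) / mid`. It assumes a missing forward price or a zero `mid` should produce a null return. The first two toy rows reuse values printed above, and the third is deliberately wrong.

```python
import math


def check_returns(rows, label="60s", rel_tol=1e-9):
    """Return the rows whose y_<label> does not match the recomputed return."""
    bad = []
    for row in rows:
        mid = row["mid"]
        fwd = row[f"forward_mid_{label}"]
        y = row[f"y_{label}"]
        if fwd is None or not mid:
            # No usable prices: the stored return is expected to be null.
            if y is not None:
                bad.append(row)
            continue
        expected = (fwd - mid) / mid
        if y is None or not math.isclose(y, expected, rel_tol=rel_tol, abs_tol=1e-12):
            bad.append(row)
    return bad


rows = [
    {"mid": 22.355, "forward_mid_60s": 21.855, "y_60s": (21.855 - 22.355) / 22.355},
    {"mid": 22.65, "forward_mid_60s": None, "y_60s": None},
    {"mid": 22.28, "forward_mid_60s": 22.5, "y_60s": 0.5},  # deliberately wrong
]
mismatches = check_returns(rows)
print(f"{len(mismatches)} mismatching row(s)")
```

On the real data the rows could be supplied with `result.iter_rows(named=True)`, and an empty list would indicate that all three horizons agree with the manual formula when the helper is called once per label.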


## Next Steps

Forward returns calculated. Continue to:
- **03_toxicity_analysis.ipynb**: Tag trades with conditions and analyze toxicity