# QuantDL Tutorial

A comprehensive guide to using QuantDL for alpha research.

**Contents:**
1. Setup & Configuration
2. Data Fetching (Daily Prices, Fundamentals, Metrics)
3. Element-wise Operators (15 arithmetic + 11 logical)
4. Time-Series Operators (26 operators)
5. Cross-Sectional Operators (6 operators)
6. Group Operators (6 operators)
7. Vector & Transformational Operators (4 operators)
8. Alpha Factor Examples
9. Caching & Cleanup
10. Summary (68 operators total)

---
## 1. Setup & Configuration

In [None]:
# Load environment variables (for AWS credentials)
from dotenv import load_dotenv
load_dotenv()

# Import quantdl
from quantdl import QuantDLClient
from quantdl.operators import (
    # Time-series (basic)
    ts_mean, ts_sum, ts_std, ts_min, ts_max, ts_delta, ts_delay,
    # Time-series (rolling)
    ts_product, ts_count_nans, ts_zscore, ts_scale, ts_av_diff, ts_step,
    # Time-series (arg)
    ts_arg_max, ts_arg_min,
    # Time-series (lookback)
    ts_backfill, kth_element, last_diff_value, days_from_last_change,
    # Time-series (stateful)
    hump, ts_decay_linear, ts_rank,
    # Time-series (two-variable)
    ts_corr, ts_covariance, ts_quantile, ts_regression,
    # Cross-sectional
    rank, zscore, normalize, scale, quantile, winsorize,
    # Group operators
    group_rank, group_zscore, group_scale, group_neutralize, group_mean, group_backfill,
    # Vector operators
    vec_avg, vec_sum,
    # Arithmetic operators
    add, subtract, multiply, divide, inverse, log, power, signed_power, sqrt, sign, reverse, densify,
    # Logical operators
    and_, or_, not_, if_else, is_nan, lt, le, gt, ge, eq, ne,
    # Transformational operators
    bucket, trade_when,
)
# Note: abs, max, and min shadow Python builtins, so import them under aliases
from quantdl.operators import abs as ops_abs, max as ops_max, min as ops_min

import polars as pl
from datetime import date
import nest_asyncio
nest_asyncio.apply()  # Allow nested event loops in Jupyter

import quantdl
print(f"QuantDL version: {quantdl.__version__}")

In [None]:
# Initialize QuantDL client
# This connects to the us-equity-datalake S3 bucket
client = QuantDLClient()
print("Client initialized - connected to S3")

---
## 2. Data Fetching

QuantDL fetches data from S3 and returns **wide tables** (rows = dates, columns = symbols).

**Available data:**
- `ticks()`: Daily OHLCV price data
- `fundamentals()`: SEC filing data (revenue, net income, etc.)
- `metrics()`: Derived metrics (PE ratio, ROE, etc.)
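
To make the wide layout concrete, here is a plain-Python sketch of pivoting long `(date, symbol, value)` records into one row per date and one column per symbol. The records are made up and this is not QuantDL's internal code; it only illustrates why a symbol with no observation on a given date appears as a null cell.

```python
from datetime import date

# Hypothetical long-format records (date, symbol, close); the prices are made up
records = [
    (date(2024, 1, 2), "IBM", 100.0),
    (date(2024, 1, 2), "JNJ", 50.0),
    (date(2024, 1, 3), "IBM", 101.0),
    # JNJ has no record on 2024-01-03, so its wide-table cell stays None
]

def to_wide(rows):
    """Pivot (date, symbol, value) rows into {date: {symbol: value}}, using None for gaps."""
    sym_list = sorted({sym for _, sym, _ in rows})
    dates = sorted({d for d, _, _ in rows})
    table = {d: {s: None for s in sym_list} for d in dates}
    for d, sym, value in rows:
        table[d][sym] = value
    return sym_list, table

sym_list, wide = to_wide(records)
print(["date"] + sym_list)
for d in sorted(wide):
    print([d.isoformat()] + [wide[d][s] for s in sym_list])
```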

### 2.1 Security Resolution

Resolve symbols, CIKs, or security IDs to `SecurityInfo` with point-in-time accuracy.

In [None]:
# Resolve single symbol
info = client.resolve("IBM")
if info:
    print(f"Symbol: {info.symbol}")
    print(f"Security ID: {info.security_id}")
    print(f"Company: {info.company}")
    print(f"CIK: {info.cik}")

In [None]:
# Point-in-time resolution matters when a ticker changes
# Meta Platforms traded as FB until June 2022, when its ticker became META
print("META today:", client.resolve("META"))
print("META in 2020:", client.resolve("META", as_of=date(2020, 1, 1)))
print("FB in 2020:", client.resolve("FB", as_of=date(2020, 1, 1)))
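
Point-in-time resolution can be pictured as a table of ticker spans keyed to one permanent security ID. The sketch below is a hypothetical model for illustration: `TickerSpan`, `resolve_as_of`, and the `SEC-0001` ID are invented names, and QuantDL's real `SecurityInfo` lookup may be structured differently.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class TickerSpan:
    symbol: str
    security_id: str       # hypothetical permanent ID, for illustration only
    start: date
    end: Optional[date]    # None means the ticker is still active

# Illustrative history: the same security traded as FB, then as META from 2022-06-09
HISTORY = [
    TickerSpan("FB", "SEC-0001", date(2012, 5, 18), date(2022, 6, 8)),
    TickerSpan("META", "SEC-0001", date(2022, 6, 9), None),
]

def resolve_as_of(symbol: str, as_of: date) -> Optional[TickerSpan]:
    """Return the span whose symbol matches and whose date range covers as_of."""
    for span in HISTORY:
        active = span.start <= as_of and (span.end is None or as_of <= span.end)
        if span.symbol == symbol and active:
            return span
    return None

print(resolve_as_of("META", date(2020, 1, 1)))  # None: the META ticker did not exist yet
print(resolve_as_of("FB", date(2020, 1, 1)))    # the FB span, same security_id as META
```

Keying on a permanent ID rather than the ticker is what lets a backtest join FB-era and META-era prices into one continuous column.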

### 2.2 Daily Price Data

Fetch OHLCV data as wide tables. We use symbols with reliable data coverage in 2024.

In [None]:
# Define symbols with good S3 data coverage in 2024
# These were verified to have 100+ trading days in 2024 H1
symbols = ["IBM", "TXN", "NOW", "BMY", "LMT", "META", "JNJ", "GD", "SO", "NEE"]

# Fetch daily close prices
prices = client.ticks(
    symbols,
    field="close",
    start="2024-01-01",
    end="2024-06-30"
)
print(f"Shape: {prices.shape}")
print(prices.head())

In [None]:
# Fetch volume data (for correlation examples later)
volume = client.ticks(
    symbols,
    field="volume",
    start="2024-01-01",
    end="2024-06-30"
)
print(f"Volume shape: {volume.shape}")

In [None]:
# Preview the data
print("Price data tail:")
print(prices.tail())

### 2.3 Fundamentals Data

Fetch SEC filing fundamentals. Available concepts include:
- `rev`: Revenue
- `net_inc`: Net Income
- `ta`: Total Assets
- `tl`: Total Liabilities
- And more...

In [None]:
# Fetch revenue data (quarterly filings)
revenue = client.fundamentals(["IBM", "JNJ"], concept="rev", start="2022-01-01", end="2024-12-31")
print("Revenue data (quarterly):")
print(revenue.drop_nulls())

In [None]:
# Fetch net income
net_income = client.fundamentals(["IBM", "JNJ"], concept="net_inc", start="2022-01-01", end="2024-12-31")
print("Net Income data:")
print(net_income.drop_nulls())

### 2.4 Derived Metrics

Fetch pre-computed metrics (PE ratio, ROE, etc.).

In [None]:
# Fetch PE ratio (if available)
try:
    pe = client.metrics(["IBM", "JNJ"], metric="pe_ratio", start="2022-01-01", end="2024-12-31")
    print("PE Ratio:")
    print(pe.drop_nulls())
except Exception as e:
    print(f"Metrics not available: {e}")

---
## 3. Element-wise Operators

Element-wise operators transform values at each cell independently.

**When to use:**
- Arithmetic: Build composite features (returns, ratios, signals)
- Logical: Conditional alpha logic (filters, masks, branching)

### 3.1 Arithmetic Operators (15 operators)

| Operator | Description |
|----------|-------------|
| `abs` | Absolute value |
| `add` | Element-wise addition (variadic) |
| `subtract` | Element-wise subtraction |
| `multiply` | Element-wise multiplication (variadic) |
| `divide` | Safe division (null on div-by-zero) |
| `inverse` | 1/x (null on zero) |
| `log` | Natural log (null on <=0) |
| `max` | Element-wise max across DataFrames |
| `min` | Element-wise min across DataFrames |
| `power` | x^y |
| `signed_power` | sign(x) * \|x\|^y |
| `sqrt` | Square root (null on negative) |
| `sign` | Sign function (-1, 0, 1) |
| `reverse` | Negation (-x) |
| `densify` | Remap unique values to 0..n-1 |
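
The null-on-invalid rules in the table can be mimicked on single values. This plain-Python sketch assumes invalid results become nulls (`None` here); it illustrates the rules for `divide`, `inverse`, `log`, `sqrt`, and `signed_power`, and is not the library's implementation. The function names are hypothetical and chosen so they do not shadow the QuantDL imports.

```python
import math
from typing import Optional

Num = Optional[float]

def safe_divide(x: Num, y: Num) -> Num:
    """x / y, returning None when either input is None or y == 0."""
    if x is None or y is None or y == 0:
        return None
    return x / y

def safe_inverse(x: Num) -> Num:
    """1 / x, None on zero."""
    return safe_divide(1.0, x)

def safe_log(x: Num) -> Num:
    """Natural log, None for x <= 0."""
    if x is None or x <= 0:
        return None
    return math.log(x)

def safe_sqrt(x: Num) -> Num:
    """Square root, None for negative inputs."""
    if x is None or x < 0:
        return None
    return math.sqrt(x)

def signed_power_scalar(x: Num, y: float) -> Num:
    """sign(x) * |x|**y, so the direction of x is preserved."""
    if x is None:
        return None
    return math.copysign(abs(x) ** y, x)

print(safe_divide(1.0, 0.0), safe_log(-1.0), safe_sqrt(-4.0))  # None None None
print(signed_power_scalar(-4.0, 0.5))  # -2.0
```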

In [None]:
# Setup: column names for later use
date_col = prices.columns[0]
value_cols = prices.columns[1:]

In [None]:
# abs: Absolute value
daily_change = ts_delta(prices, 1)
abs_change = ops_abs(daily_change)
print("Absolute daily change:")
print(abs_change.head())

In [None]:
# add: Variadic addition (sum multiple DataFrames)
# Example: combine price and volume signals
price_signal = ts_zscore(prices, 20)
vol_signal = ts_zscore(volume, 20)
combined = add(price_signal, vol_signal)  # Can add more: add(a, b, c, d)
print("Combined signal (price zscore + volume zscore):")
print(combined.tail())

In [None]:
# subtract: Element-wise subtraction
price_momentum = ts_delta(prices, 5)
price_momentum_10 = ts_delta(prices, 10)
momentum_diff = subtract(price_momentum, price_momentum_10)
print("Momentum spread (5d - 10d):")
print(momentum_diff.tail())

In [None]:
# multiply: Variadic multiplication
# Example: volume-weighted price change
weighted = multiply(daily_change, volume)
print("Volume-weighted price change:")
print(weighted.tail())

In [None]:
# divide: Safe division (handles div-by-zero as null)
lagged_prices = ts_delay(prices, 1)
daily_return = divide(daily_change, lagged_prices)  # (P_t - P_{t-1}) / P_{t-1}
print("Daily returns (safe division):")
print(daily_return.tail())

In [None]:
# inverse: 1/x with null handling
inv_prices = inverse(prices)
print("Inverse of prices (1/price):")
print(inv_prices.head())

In [None]:
# log: Natural log (null for <=0 values)
log_prices = log(prices)
print("Log prices (for log-returns):")
print(log_prices.head())

In [None]:
# max/min: Element-wise max/min across DataFrames
ma_5 = ts_mean(prices, 5)
ma_20 = ts_mean(prices, 20)
ma_upper = ops_max(ma_5, ma_20)  # Higher of 5d and 20d MA
ma_lower = ops_min(ma_5, ma_20)  # Lower of 5d and 20d MA
print("Upper envelope (max of MA5, MA20):")
print(ma_upper.tail())

In [None]:
# power: x^y element-wise
# Create exponent DataFrame (constant 2.0 for squaring)
exponent = prices.select(pl.col(date_col), *[pl.lit(2.0).alias(c) for c in value_cols])
squared = power(prices, exponent)
print("Prices squared:")
print(squared.head())

In [None]:
# signed_power: sign(x) * |x|^y - preserves sign
# Useful for non-linear transformations that preserve direction
returns = ts_delta(log_prices, 1)
exp_half = prices.select(pl.col(date_col), *[pl.lit(0.5).alias(c) for c in value_cols])
sqrt_returns = signed_power(returns, exp_half)
print("Signed sqrt of returns (preserves direction):")
print(sqrt_returns.tail())

In [None]:
# sqrt: Square root (null for negative)
sqrt_prices = sqrt(prices)
print("Sqrt of prices:")
print(sqrt_prices.head())

In [None]:
# sign: Returns -1, 0, or 1
sign_change = sign(daily_change)
print("Sign of daily change:")
print(sign_change.head())

In [None]:
# reverse: Negation (-x)
neg_momentum = reverse(price_momentum)
print("Negative momentum (for mean reversion):")
print(neg_momentum.tail())

In [None]:
# densify: Remap unique values to consecutive integers per row
# Useful for categorical encoding
ranked = rank(prices)
bucketed = bucket(ranked, range_spec="0,1,0.2")  # 5 buckets
dense = densify(bucketed)
print("Densified bucket indices:")
print(dense.head())
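
For a single row, `densify` can be sketched as "sort the distinct non-null values and replace each value with its position". The table only says values are remapped to `0..n-1`; ascending order and leaving nulls as nulls are assumptions in this hypothetical `densify_row` helper.

```python
from typing import List, Optional

def densify_row(row: List[Optional[float]]) -> List[Optional[int]]:
    """Map each distinct non-null value in a row to 0..n-1, ordered by value (assumed)."""
    distinct = sorted({v for v in row if v is not None})
    index = {v: i for i, v in enumerate(distinct)}
    return [None if v is None else index[v] for v in row]

# Bucket ids 0, 2, and 4 are in use; densify compresses them to 0, 1, 2
print(densify_row([4.0, 0.0, 2.0, 4.0, None]))  # [2, 0, 1, 2, None]
```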

### 3.2 Logical Operators (11 operators)

| Operator | Description |
|----------|-------------|
| `and_` | Logical AND |
| `or_` | Logical OR |
| `not_` | Logical NOT |
| `if_else` | Conditional selection |
| `is_nan` | Detect NaN/null |
| `lt` | Less than (<) |
| `le` | Less than or equal (<=) |
| `gt` | Greater than (>) |
| `ge` | Greater than or equal (>=) |
| `eq` | Equal (==) |
| `ne` | Not equal (!=) |

In [None]:
# Comparison operators accept a DataFrame or a scalar as the second argument
# gt: Greater than
above_ma = gt(prices, ma_20)  # Price > 20-day MA
print("Price above 20-day MA (True/False):")
print(above_ma.head())

In [None]:
# lt, le, ge: Other comparisons
below_ma = lt(prices, ma_20)  # Price < MA
at_or_above = ge(prices, ma_20)  # Price >= MA
print("Price below MA:")
print(below_ma.head())

In [None]:
# eq, ne: Equality comparisons (useful for categorical or discrete data)
# Compare with a scalar: sign_change holds -1, 0, or 1
is_up_day = eq(sign_change, 1)    # Up day
not_up_day = ne(sign_change, 1)   # Flat or down day
print("Is up day:")
print(is_up_day.head())

In [None]:
# and_: Logical AND
# Buy signal: price above MA AND positive momentum
pos_momentum = gt(price_momentum, 0)
buy_signal = and_(above_ma, pos_momentum)
print("Buy signal (above MA AND positive momentum):")
print(buy_signal.head())

In [None]:
# or_: Logical OR
# Volatility signal: big move up OR big move down
big_up = gt(daily_return, 0.02)    # > 2% return
big_down = lt(daily_return, -0.02)  # < -2% return
volatile = or_(big_up, big_down)
print("Volatile day (|return| > 2%):")
print(volatile.tail())

In [None]:
# not_: Logical NOT
not_volatile = not_(volatile)
print("Not volatile:")
print(not_volatile.tail())

In [None]:
# is_nan: Detect NaN/null values
has_nan = is_nan(daily_return)
print("Is NaN (first row has NaN from delta):")
print(has_nan.head())

In [None]:
# if_else: Conditional selection with scalar branches
# Example: Cap large returns at +/-5%
capped_return = if_else(
    gt(daily_return, 0.05),  # condition
    0.05,                     # then (scalar)
    if_else(
        lt(daily_return, -0.05),
        -0.05,
        daily_return           # else (DataFrame)
    )
)
print("Capped returns (+/-5%):")
print(capped_return.tail())
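
The nested `if_else` above is a symmetric clip at +/-5%. On plain Python floats the same logic reads as follows; `cap` and `clip` are hypothetical helpers, and propagating `None` for a missing return is an assumption about how a null condition flows through the nested calls.

```python
from typing import Optional

def cap(r: Optional[float], limit: float = 0.05) -> Optional[float]:
    """Nested conditional mirroring the if_else cell: cap above, then floor below."""
    if r is None:
        return None
    if r > limit:
        return limit
    return -limit if r < -limit else r

def clip(r: Optional[float], limit: float = 0.05) -> Optional[float]:
    """The same result written with min/max."""
    return None if r is None else max(-limit, min(limit, r))

sample_returns = [0.08, -0.07, 0.01, None, -0.05]
print([cap(r) for r in sample_returns])  # [0.05, -0.05, 0.01, None, -0.05]
```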

In [None]:
# if_else with DataFrame branches
# Example: Use momentum alpha when trend is up, mean-reversion when down
momentum_alpha = rank(ts_delta(prices, 20))
mean_rev_alpha = reverse(rank(ts_delta(prices, 5)))
adaptive_alpha = if_else(above_ma, momentum_alpha, mean_rev_alpha)
print("Adaptive alpha (trend-following when above MA, mean-reversion when below):")
print(adaptive_alpha.tail())

---
## 4. Time-Series Operators (26 operators)

Time-series operators work **column-wise** (down each column over time).

**When to use:** Moving averages, momentum, volatility, trend signals.
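
"Column-wise" means each symbol's column is processed on its own, oldest row first. The sketch below reproduces `ts_delay`, `ts_delta`, and `ts_mean` on one plain-Python column. Returning `None` until a full window of `d` rows exists is an assumption about QuantDL's warm-up behavior, and the `_col` helper names are hypothetical.

```python
from typing import List, Optional

Col = List[Optional[float]]

def ts_delay_col(col: Col, d: int) -> Col:
    """Value from d rows earlier; the first d rows have no history and are None."""
    return [None] * min(d, len(col)) + col[: max(len(col) - d, 0)]

def ts_delta_col(col: Col, d: int) -> Col:
    """x_t - x_{t-d}, None when either side is missing."""
    lagged = ts_delay_col(col, d)
    return [None if a is None or b is None else a - b for a, b in zip(col, lagged)]

def ts_mean_col(col: Col, d: int) -> Col:
    """Mean of the last d rows (current row included); None until the window is full."""
    out: Col = []
    for t in range(len(col)):
        window = col[max(0, t - d + 1): t + 1]
        if len(window) < d or any(v is None for v in window):
            out.append(None)
        else:
            out.append(sum(window) / d)
    return out

ibm = [10.0, 11.0, 12.0, 14.0]  # one symbol's column, oldest first (made-up prices)
print(ts_mean_col(ibm, 2))   # [None, 10.5, 11.5, 13.0]
print(ts_delta_col(ibm, 1))  # [None, 1.0, 1.0, 2.0]
```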

### 4.1 Basic Rolling (7 operators)

| Operator | Description |
|----------|-------------|
| `ts_mean` | Rolling mean |
| `ts_sum` | Rolling sum |
| `ts_std` | Rolling standard deviation |
| `ts_min` | Rolling minimum |
| `ts_max` | Rolling maximum |
| `ts_delta` | Difference from d days ago |
| `ts_delay` | Lag values by d days |

In [None]:
# ts_mean: Moving average
ma_20 = ts_mean(prices, 20)
print("20-day moving average:")
print(ma_20.tail())

In [None]:
# ts_sum: Rolling sum (e.g., total volume over the last 20 days)
vol_20d = ts_sum(volume, 20)
print("20-day rolling total volume:")
print(vol_20d.tail())

In [None]:
# ts_std: Rolling volatility
volatility = ts_std(daily_return, 20)
print("20-day rolling volatility:")
print(volatility.tail())

In [None]:
# ts_min, ts_max: Rolling min/max (support/resistance)
rolling_high = ts_max(prices, 20)
rolling_low = ts_min(prices, 20)
print("20-day high/low:")
print(rolling_high.tail())

In [None]:
# ts_delta: Price momentum (difference from d days ago)
momentum_20d = ts_delta(prices, 20)
print("20-day price change:")
print(momentum_20d.tail())

In [None]:
# ts_delay: Lagged values (for computing returns)
prices_5d_ago = ts_delay(prices, 5)
print("Prices 5 days ago:")
print(prices_5d_ago.tail())

### 4.2 Advanced Rolling (6 operators)

| Operator | Description |
|----------|-------------|
| `ts_product` | Rolling product |
| `ts_count_nans` | Count nulls in window |
| `ts_zscore` | Rolling z-score |
| `ts_scale` | Rolling min-max scale |
| `ts_av_diff` | Deviation from rolling mean |
| `ts_step` | Row counter |
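
Three of these operators differ only in how they compare the latest value with its window. The sketch below evaluates them on a single window using the textbook definitions suggested by the table (z-score, min-max scale, deviation from mean). It uses the sample standard deviation from Python's `statistics` module; whether QuantDL uses sample or population std is an assumption, and returning `None` for a flat window is a hypothetical choice.

```python
import statistics
from typing import List

def window_stats(window: List[float]) -> dict:
    """Window-level versions of three rolling operators, evaluated at the last value."""
    x = window[-1]
    mean = statistics.mean(window)
    std = statistics.stdev(window)  # sample std; QuantDL's ddof may differ (assumption)
    lo, hi = min(window), max(window)
    return {
        "ts_zscore": (x - mean) / std if std > 0 else None,
        "ts_scale": (x - lo) / (hi - lo) if hi > lo else None,
        "ts_av_diff": x - mean,
    }

print(window_stats([10.0, 12.0, 11.0, 15.0]))
```

Here the last value 15.0 is the window high, so `ts_scale` is 1.0, and it sits 3.0 above the window mean of 12.0.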

In [None]:
# ts_product: Cumulative returns
# First compute 1 + daily_return
one_df = prices.select(pl.col(date_col), *[pl.lit(1.0).alias(c) for c in value_cols])
return_factor = add(daily_return, one_df)
cum_return = ts_product(return_factor, 5)
print("5-day cumulative return factor:")
print(cum_return.tail())

In [None]:
# ts_count_nans: Count missing values in window
nan_count = ts_count_nans(daily_return, 10)
print("Count of NaN in 10-day window:")
print(nan_count.head())

In [None]:
# ts_zscore: Rolling z-score (normalized deviation)
price_zscore = ts_zscore(prices, 20)
print("20-day rolling z-score:")
print(price_zscore.tail())

In [None]:
# ts_scale: Rolling min-max normalization [0, 1]
scaled_price = ts_scale(prices, 20)
print("20-day scaled price [0,1]:")
print(scaled_price.tail())

In [None]:
# ts_av_diff: Deviation from rolling mean
price_dev = ts_av_diff(prices, 20)
print("Deviation from 20-day mean:")
print(price_dev.tail())

In [None]:
# ts_step: Row counter (time index)
time_idx = ts_step(prices)
print("Row counter:")
print(time_idx.head())

### 4.3 Arg and Lookback (6 operators)

| Operator | Description |
|----------|-------------|
| `ts_arg_max` | Days since window max |
| `ts_arg_min` | Days since window min |
| `ts_backfill` | Fill nulls with last valid |
| `kth_element` | K-th element in lookback |
| `last_diff_value` | Last different value |
| `days_from_last_change` | Days since value changed |

In [None]:
# ts_arg_max: Days since rolling high (0 = today is the high)
days_since_high = ts_arg_max(prices, 20)
print("Days since 20-day high:")
print(days_since_high.tail())

In [None]:
# ts_arg_min: Days since rolling low
days_since_low = ts_arg_min(prices, 20)
print("Days since 20-day low:")
print(days_since_low.tail())

In [None]:
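# ts_backfill: Fill nulls with the last valid value within a d-day lookback (a forward fill)
# Leading nulls have no earlier value to carry forward, so they stay null
sparse = daily_return.head(10)  # First row is null (no prior price)
filled = ts_backfill(sparse, 5)
print("Original (with NaN):")
print(sparse.head(3))
print("After backfill (the leading null row cannot be filled):")
print(filled.head(3))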

In [None]:
# kth_element: Get k-th element in lookback window
third_from_last = kth_element(prices, 5, 3)  # 3rd element in 5-day window
print("3rd element in 5-day lookback:")
print(third_from_last.tail())

In [None]:
# last_diff_value: Last value that was different
discrete_signal = bucket(rank(prices), range_spec="0,1,0.25")  # Discretize to buckets
last_different = last_diff_value(discrete_signal, 10)
print("Discrete signal:")
print(discrete_signal.head())
print("Last different value:")
print(last_different.head())

In [None]:
# days_from_last_change: Days since value changed
days_unchanged = days_from_last_change(discrete_signal)
print("Days since signal changed:")
print(days_unchanged.head())

### 4.4 Stateful Operators (3 operators)

| Operator | Description |
|----------|-------------|
| `hump` | Limit change magnitude |
| `ts_decay_linear` | Linear decay weighted average |
| `ts_rank` | Percentile rank in window |

In [None]:
# hump: Limit how much value can change between rows
# Useful for smoothing signals and preventing whipsaws
smooth_signal = hump(price_zscore, 0.5)  # Max change of 0.5 per period
print("Original z-score:")
print(price_zscore.tail(3))
print("Humped (smoothed) z-score:")
print(smooth_signal.tail(3))

In [None]:
# ts_decay_linear: Weighted average with linear decay (recent weighted more)
# Weights: [1, 2, 3, ..., d] normalized
decay_avg = ts_decay_linear(prices, 10)
print("10-day linear decay weighted average:")
print(decay_avg.tail())
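
The weighting described in the comment above (weights 1, 2, ..., d from oldest to newest, normalized) can be written out for one window. This plain-Python `decay_linear` helper is a hypothetical illustration of that formula, not QuantDL's code.

```python
from typing import List

def decay_linear(window: List[float]) -> float:
    """Weighted mean with weights 1, 2, ..., d from oldest to newest, normalized to sum to 1."""
    d = len(window)
    weights = range(1, d + 1)
    return sum(w * v for w, v in zip(weights, window)) / (d * (d + 1) / 2)

print(decay_linear([10.0, 10.0, 16.0]))  # (1*10 + 2*10 + 3*16) / 6 = 13.0
```

The simple mean of the same window is 12.0; the decay-weighted value leans toward the most recent observation.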

In [None]:
# ts_rank: Percentile rank of current value in rolling window
# Returns 0-1 (1 = highest in window)
percentile = ts_rank(prices, 20)
print("Percentile rank in 20-day window:")
print(percentile.tail())

### 4.5 Two-Variable Operators (4 operators)

| Operator | Description |
|----------|-------------|
| `ts_corr` | Rolling correlation |
| `ts_covariance` | Rolling covariance |
| `ts_quantile` | Rank + inverse CDF transform |
| `ts_regression` | Rolling OLS regression |

In [None]:
# ts_corr: Rolling correlation between two DataFrames
# Correlates matching columns (IBM price with IBM volume, etc.)
price_vol_corr = ts_corr(prices, volume, 20)
print("20-day rolling price-volume correlation:")
print(price_vol_corr.tail())

In [None]:
# ts_covariance: Rolling covariance
price_vol_cov = ts_covariance(prices, volume, 20)
print("20-day rolling covariance:")
print(price_vol_cov.tail())

In [None]:
# ts_quantile: Transform rank to Gaussian via inverse CDF
gaussian_rank = ts_quantile(prices, 20)
print("Gaussian quantile transform:")
print(gaussian_rank.tail())

In [None]:
# ts_regression: Rolling OLS regression (y ~ x)
# Returns beta coefficient by default
beta = ts_regression(prices, volume, 20, rettype="beta")
print("20-day rolling beta (price vs volume):")
print(beta.tail())

In [None]:
# ts_regression with different return types
alpha_reg = ts_regression(prices, volume, 20, rettype="alpha")  # Intercept
resid = ts_regression(prices, volume, 20, rettype="resid")  # Residual (last)
print("Regression alpha (intercept):")
print(alpha_reg.tail())
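
Within one window, the three `rettype` values correspond to the usual OLS quantities for y regressed on x with an intercept: beta = cov(x, y) / var(x), alpha = mean(y) - beta * mean(x), and the residual of the last row. The sketch assumes standard OLS with an intercept; `ols_window` is a hypothetical helper.

```python
from typing import List, Tuple

def ols_window(y: List[float], x: List[float]) -> Tuple[float, float, float]:
    """OLS of y on x over one window: (alpha, beta, residual at the last row).

    Requires x to vary within the window (sxx > 0).
    """
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    beta = sxy / sxx
    alpha = my - beta * mx
    resid = y[-1] - (alpha + beta * x[-1])
    return alpha, beta, resid

# y = 1 + 2x exactly, so the fit is perfect and the last residual is 0
print(ols_window([3.0, 5.0, 7.0, 9.0], [1.0, 2.0, 3.0, 4.0]))  # (1.0, 2.0, 0.0)
```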

---
## 5. Cross-Sectional Operators (6 operators)

Cross-sectional operators work **row-wise** (across symbols at each date).

**When to use:** Ranking stocks, standardizing across universe, portfolio construction.

| Operator | Description |
|----------|-------------|
| `rank` | Rank to [0, 1] across symbols |
| `zscore` | Standardize (mean=0, std=1) |
| `normalize` | Demean (subtract row mean) |
| `scale` | Scale to target abs sum |
| `quantile` | Rank + inverse CDF |
| `winsorize` | Clip to mean +/- n*std |

In [None]:
# rank: Cross-sectional rank [0, 1]
# rate parameter: 1.0 = standard, 2.0 = squared ranks (emphasize extremes)
price_rank = rank(prices)
print("Cross-sectional rank (highest price = 1.0):")
print(price_rank.head())

In [None]:
# rank with rate parameter
rank_squared = rank(prices, rate=2.0)
print("Squared rank (emphasizes extremes):")
print(rank_squared.head())

In [None]:
# zscore: Cross-sectional standardization
cs_zscore = zscore(momentum_20d)
print("Cross-sectional z-score of momentum:")
print(cs_zscore.tail())

In [None]:
# normalize: Demean (row sums to ~0)
demeaned = normalize(momentum_20d)
print("Demeaned momentum:")
print(demeaned.tail())
# Verify row sums
row_sums = demeaned.select(pl.sum_horizontal(pl.exclude(date_col))).to_series()
print(f"Row sums (should be ~0): {row_sums.tail(3).to_list()}")

In [None]:
# scale: Scale to target absolute sum (for portfolio weights)
# longscale/shortscale: separate scaling for long and short positions
weights = scale(demeaned, scale=1.0)
print("Portfolio weights (sum of |w| = 1):")
print(weights.tail())
# Verify
abs_sums = weights.select(pl.sum_horizontal(*[pl.col(c).abs() for c in value_cols])).to_series()
print(f"Abs sums (should be ~1): {abs_sums.tail(3).to_list()}")

In [None]:
# scale with longscale/shortscale
weights_asymmetric = scale(demeaned, longscale=0.6, shortscale=0.4)
print("Asymmetric weights (60% long, 40% short):")
print(weights_asymmetric.tail())
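
One plausible reading of `longscale`/`shortscale` is that positive weights are rescaled to sum to `longscale` and negative weights to sum to `-shortscale`. The sketch below implements that reading on one row; it is an assumption about QuantDL's semantics, and `scale_long_short` is a hypothetical helper.

```python
from typing import List, Optional

def scale_long_short(row: List[Optional[float]], longscale: float,
                     shortscale: float) -> List[Optional[float]]:
    """Scale positive weights to sum to longscale and negative weights to sum to -shortscale."""
    long_sum = sum(v for v in row if v is not None and v > 0)
    short_sum = -sum(v for v in row if v is not None and v < 0)
    out: List[Optional[float]] = []
    for v in row:
        if v is None:
            out.append(None)
        elif v > 0:
            out.append(v * longscale / long_sum)
        elif v < 0:
            out.append(v * shortscale / short_sum)
        else:
            out.append(0.0)
    return out

print(scale_long_short([2.0, 1.0, -1.0, -3.0], longscale=0.6, shortscale=0.4))
```

Under this reading the book carries a net long exposure of 0.6 - 0.4 = 0.2, so it is no longer dollar-neutral.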

In [None]:
# quantile: Rank + inverse CDF transform
# driver: "gaussian" (default), "uniform", "cauchy"
gaussian_quantile = quantile(momentum_20d, driver="gaussian")
print("Gaussian quantile transform:")
print(gaussian_quantile.tail())

In [None]:
# winsorize: Clip outliers to mean +/- n*std
winsorized = winsorize(momentum_20d, std=2.0)
print("Winsorized momentum (clipped to +/-2 std):")
print(winsorized.tail())

---
## 6. Group Operators (6 operators)

Group operators apply cross-sectional operations within defined groups (e.g., sectors).

**When to use:** Sector-neutral alphas, industry-relative signals.

| Operator | Description |
|----------|-------------|
| `group_rank` | Rank within groups |
| `group_zscore` | Z-score within groups |
| `group_scale` | Min-max scale within groups |
| `group_neutralize` | Subtract group mean |
| `group_mean` | Weighted mean within groups |
| `group_backfill` | Fill NaN with group mean |

In [None]:
# Define sector groups
# Tech: IBM, TXN, NOW, META
# Healthcare: BMY, JNJ
# Defense: LMT, GD
# Utilities: SO, NEE
sector_map = {
    "IBM": 1, "TXN": 1, "NOW": 1, "META": 1,  # Tech
    "BMY": 2, "JNJ": 2,                          # Healthcare
    "LMT": 3, "GD": 3,                           # Defense
    "SO": 4, "NEE": 4,                           # Utilities
}

# Create group DataFrame (same structure as prices)
groups = prices.select(
    pl.col(date_col),
    *[pl.lit(sector_map.get(c, 0)).alias(c) for c in value_cols]
)
print("Sector groups:")
print(groups.head(1))

In [None]:
# group_rank: Rank within sector
sector_rank = group_rank(momentum_20d, groups)
print("Momentum rank within sector:")
print(sector_rank.tail())

In [None]:
# group_zscore: Z-score within sector
sector_zscore = group_zscore(momentum_20d, groups)
print("Momentum z-score within sector:")
print(sector_zscore.tail())

In [None]:
# group_scale: Min-max scale within sector [0, 1]
sector_scaled = group_scale(momentum_20d, groups)
print("Momentum scaled within sector:")
print(sector_scaled.tail())

In [None]:
# group_neutralize: Subtract sector mean (sector-neutral alpha)
sector_neutral = group_neutralize(momentum_20d, groups)
print("Sector-neutral momentum:")
print(sector_neutral.tail())
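
On a single date, `group_neutralize` amounts to subtracting each sector's mean from that sector's members, so every sector nets to zero. This is a plain-Python sketch of that idea (`group_neutralize_row` is hypothetical); computing the mean over non-null members only is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Optional

def group_neutralize_row(values: List[Optional[float]], groups: List[int]) -> List[Optional[float]]:
    """Subtract each group's mean (over non-null members) from its members, for one date."""
    members: Dict[int, List[float]] = defaultdict(list)
    for v, g in zip(values, groups):
        if v is not None:
            members[g].append(v)
    means = {g: sum(vs) / len(vs) for g, vs in members.items()}
    return [None if v is None else v - means[g] for v, g in zip(values, groups)]

# Two tech names (group 1) and two utilities (group 4), made-up momentum values
print(group_neutralize_row([5.0, 3.0, -1.0, 1.0], [1, 1, 4, 4]))  # [1.0, -1.0, -1.0, 1.0]
```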

In [None]:
# group_mean: Weighted mean within sector
# Weight by daily dollar volume (price * volume); a true market-cap weight would need shares outstanding
dollar_volume = multiply(prices, volume)
sector_avg = group_mean(momentum_20d, dollar_volume, groups)
print("Dollar-volume weighted sector average momentum:")
print(sector_avg.tail())

In [None]:
# group_backfill: Fill NaN with the winsorized group mean
# Parameters: d = lookback window, std = winsorization threshold
backfilled = group_backfill(daily_return, groups, d=10, std=4.0)
print("Daily returns with gaps filled from the sector mean:")
print(backfilled.head())

---
## 7. Vector & Transformational Operators (4 operators)

### 7.1 Vector Operators (2 operators)

These operators work on list-typed columns, where each cell holds an array of values, and reduce each list to a single number.

| Operator | Description |
|----------|-------------|
| `vec_avg` | Mean of list elements |
| `vec_sum` | Sum of list elements |

In [None]:
# Create data with list-type columns
# Example: multiple analyst price targets per stock
list_data = pl.DataFrame({
    "timestamp": [date(2024, 1, 1), date(2024, 1, 2)],
    "IBM": [[180.0, 185.0, 190.0], [182.0, 187.0]],
    "TXN": [[200.0, 205.0], [210.0, 215.0, 220.0]],
})
print("Data with list columns:")
print(list_data)

In [None]:
# vec_avg: Average of list elements
avg_targets = vec_avg(list_data)
print("Average of list elements:")
print(avg_targets)

In [None]:
# vec_sum: Sum of list elements
sum_targets = vec_sum(list_data)
print("Sum of list elements:")
print(sum_targets)

### 7.2 Transformational Operators (2 operators)

| Operator | Description |
|----------|-------------|
| `bucket` | Discretize to bucket indices |
| `trade_when` | Stateful entry/exit logic |

In [None]:
# bucket with range_spec: evenly spaced boundaries
# range_spec="start,end,step" -> boundaries at start, start+step, ..., end
momentum_buckets = bucket(cs_zscore, range_spec="-2,2,0.5")
print("Momentum z-score buckets (boundaries at -2, -1.5, ..., 2):")
print(momentum_buckets.tail())
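
The `range_spec` string expands into evenly spaced boundaries, and each value is then assigned an index by where it falls among them. This sketch uses `bisect.bisect_right` so index 0 means "below the first boundary"; QuantDL's exact indexing convention, including how values outside `[start, end]` and the skip options are handled, is an assumption here, and both helpers are hypothetical.

```python
import bisect
from typing import List

def boundaries_from_range_spec(spec: str) -> List[float]:
    """Parse "start,end,step" into [start, start+step, ..., end]."""
    start, end, step = (float(p) for p in spec.split(","))
    n = round((end - start) / step)
    return [start + i * step for i in range(n + 1)]

def bucket_index(value: float, bounds: List[float]) -> int:
    """Count of boundaries at or below value (0 = below the first boundary); convention assumed."""
    return bisect.bisect_right(bounds, value)

bounds = boundaries_from_range_spec("-2,2,0.5")
print(bounds)  # [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
print(bucket_index(0.3, bounds), bucket_index(-3.0, bounds), bucket_index(2.5, bounds))  # 5 0 9
```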

In [None]:
# bucket with explicit boundaries (fixed z-score cut points, not equal-count quintiles)
momentum_bands = bucket(cs_zscore, buckets="-1.5,-0.5,0.5,1.5")
print("Momentum z-score bands:")
print(momentum_bands.tail())

In [None]:
# bucket with skipBegin/skipEnd/skipBoth: exclude edge buckets (skipBoth=True skips both ends)
inner_buckets = bucket(cs_zscore, range_spec="-1,1,0.5", skipBoth=True)
print("Inner buckets only (skip extremes):")
print(inner_buckets.tail())

In [None]:
# bucket with NANGroup: assign NaN to separate bucket
with_nan = bucket(daily_return, range_spec="-0.02,0.02,0.01", NANGroup=True)
print("Returns buckets with NANGroup:")
print(with_nan.head())  # First row has NaN -> gets special bucket index

In [None]:
# trade_when: Stateful entry/exit trading logic
# Entry when momentum z-score > 1, exit when < 0
entry_trigger = gt(cs_zscore, 1.0)  # Boolean: enter when zscore > 1
exit_trigger = lt(cs_zscore, 0.0)   # Boolean: exit when zscore < 0

# Convert to numeric (trade_when expects > 0 as True)
entry_numeric = if_else(entry_trigger, 1.0, 0.0)
exit_numeric = if_else(exit_trigger, 1.0, 0.0)

# Alpha to use when in position
alpha_signal = cs_zscore

# Stateful trading signal
trade_signal = trade_when(entry_numeric, alpha_signal, exit_numeric)
print("Trade signal (NaN = no position):")
print(trade_signal.tail(10))
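
The cell above implies a small per-symbol state machine: stay flat (NaN) until the entry trigger fires, then carry a value until the exit trigger fires. The sketch below assumes one common convention: exit > 0 gives NaN; otherwise entry > 0 takes today's alpha; otherwise yesterday's value is held. QuantDL may instead update the alpha on every in-position day, so treat this as an assumption; `trade_when_col` is a hypothetical helper.

```python
import math
from typing import List

def trade_when_col(entry: List[float], alpha: List[float], exit_: List[float]) -> List[float]:
    """Per-symbol state machine (assumed semantics):
    exit > 0 -> NaN (flat); else entry > 0 -> today's alpha; else hold yesterday's value.
    """
    out: List[float] = []
    prev = math.nan  # start with no position
    for e, a, x in zip(entry, alpha, exit_):
        if x > 0:
            prev = math.nan
        elif e > 0:
            prev = a
        # otherwise prev is unchanged: the position is held
        out.append(prev)
    return out

entry_col = [0.0, 1.0, 0.0, 0.0, 1.0]
alpha_col = [0.5, 1.2, 0.9, -0.1, 1.5]
exit_col = [0.0, 0.0, 0.0, 1.0, 0.0]
print(trade_when_col(entry_col, alpha_col, exit_col))  # [nan, 1.2, 1.2, nan, 1.5]
```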

---
## 8. Alpha Factor Examples

### 8.1 Momentum Alpha (Simple)

In [None]:
# Build 20-day momentum factor with dollar-neutral weights
# Step 1: 20-day return, (P_t - P_{t-20}) / P_{t-20}
price_20d_ago = ts_delay(prices, 20)
momentum = divide(subtract(prices, price_20d_ago), price_20d_ago)

# Step 2: Rank cross-sectionally
momentum_ranked = rank(momentum)

# Step 3: Z-score to center around 0
alpha = zscore(momentum_ranked)

# Step 4: Scale so sum(|w|) = 1; the z-scored alpha is already centered, so longs and shorts offset (dollar-neutral)
weights = scale(alpha, scale=1.0)

print("Momentum alpha weights:")
print(weights.tail())

In [None]:
# Verify dollar-neutrality
last_row = weights.tail(1)
weight_values = [last_row[c][0] for c in value_cols]
print(f"Sum of weights: {sum(w for w in weight_values if w is not None):.6f} (should be ~0)")
print(f"Sum of |weights|: {sum(abs(w) for w in weight_values if w is not None):.6f} (should be ~1)")
print(f"Long positions: {sum(1 for w in weight_values if w is not None and w > 0)}")
print(f"Short positions: {sum(1 for w in weight_values if w is not None and w < 0)}")

### 8.2 Combined Alpha (Multi-Factor)

In [None]:
# Combine multiple signals:
# 1. Price momentum (trend-following)
# 2. Volume momentum (liquidity)
# 3. Volatility-adjusted (risk-aware)

# Signal 1: Price momentum (20-day)
price_mom = rank(ts_delta(log_prices, 20))

# Signal 2: Volume trend (positive = increasing interest)
vol_trend = rank(ts_delta(volume, 20))

# Signal 3: Inverse volatility (prefer stable stocks)
inv_vol = rank(reverse(volatility))

# Combine with equal weights
combined_signal = add(price_mom, vol_trend, inv_vol)
combined_alpha = zscore(combined_signal)
combined_weights = scale(combined_alpha, scale=1.0)

print("Combined multi-factor alpha weights:")
print(combined_weights.tail())

In [None]:
# Sector-neutral version
sector_neutral_alpha = group_neutralize(combined_alpha, groups)
sector_neutral_weights = scale(sector_neutral_alpha, scale=1.0)
print("Sector-neutral combined alpha:")
print(sector_neutral_weights.tail())

---
## 9. Caching & Cleanup

In [None]:
# Check cache statistics
stats = client.cache_stats()
print(f"Cached entries: {stats['entries']}")
print(f"Cache size: {stats['total_size_bytes'] / 1024 / 1024:.2f} MB")
print(f"Location: {stats['cache_dir']}")

In [None]:
# Cleanup
client.close()
print("Done!")

---
## Summary: All 68 Operators

### Time-Series (26)
| Operator | Description |
|----------|-------------|
| `ts_mean(x, d)` | Rolling mean |
| `ts_sum(x, d)` | Rolling sum |
| `ts_std(x, d)` | Rolling std |
| `ts_min(x, d)` | Rolling min |
| `ts_max(x, d)` | Rolling max |
| `ts_delta(x, d)` | Difference from d days ago |
| `ts_delay(x, d)` | Lag by d days |
| `ts_product(x, d)` | Rolling product |
| `ts_count_nans(x, d)` | Count nulls in window |
| `ts_zscore(x, d)` | Rolling z-score |
| `ts_scale(x, d)` | Rolling min-max scale |
| `ts_av_diff(x, d)` | Deviation from rolling mean |
| `ts_step(x)` | Row counter |
| `ts_arg_max(x, d)` | Days since window max |
| `ts_arg_min(x, d)` | Days since window min |
| `ts_backfill(x, d)` | Fill nulls with last valid |
| `kth_element(x, d, k)` | k-th element in lookback |
| `last_diff_value(x, d)` | Last different value |
| `days_from_last_change(x)` | Days since value changed |
| `hump(x, hump)` | Limit change magnitude |
| `ts_decay_linear(x, d)` | Linear decay weighted avg |
| `ts_rank(x, d)` | Percentile rank in window |
| `ts_corr(x, y, d)` | Rolling correlation |
| `ts_covariance(x, y, d)` | Rolling covariance |
| `ts_quantile(x, d)` | Rank to Gaussian transform |
| `ts_regression(y, x, d)` | Rolling OLS regression |

### Cross-Sectional (6)
| Operator | Description |
|----------|-------------|
| `rank(x)` | Rank to [0, 1] |
| `zscore(x)` | Standardize (mean=0, std=1) |
| `normalize(x)` | Demean |
| `scale(x, scale)` | Scale to target abs sum |
| `quantile(x)` | Rank + inverse CDF |
| `winsorize(x, std)` | Clip to mean +/- n*std |

### Arithmetic (15)
| Operator | Description |
|----------|-------------|
| `abs(x)` | Absolute value |
| `add(*args)` | Element-wise addition |
| `subtract(x, y)` | Element-wise subtraction |
| `multiply(*args)` | Element-wise multiplication |
| `divide(x, y)` | Safe division |
| `inverse(x)` | 1/x |
| `log(x)` | Natural log |
| `max(*args)` | Element-wise max |
| `min(*args)` | Element-wise min |
| `power(x, y)` | x^y |
| `signed_power(x, y)` | sign(x) * \|x\|^y |
| `sqrt(x)` | Square root |
| `sign(x)` | Sign function |
| `reverse(x)` | Negation |
| `densify(x)` | Remap to 0..n-1 |

### Logical (11)
| Operator | Description |
|----------|-------------|
| `and_(x, y)` | Logical AND |
| `or_(x, y)` | Logical OR |
| `not_(x)` | Logical NOT |
| `if_else(cond, then, else)` | Conditional |
| `is_nan(x)` | Detect NaN |
| `lt(x, y)` | Less than |
| `le(x, y)` | Less than or equal |
| `gt(x, y)` | Greater than |
| `ge(x, y)` | Greater than or equal |
| `eq(x, y)` | Equal |
| `ne(x, y)` | Not equal |

### Group (6)
| Operator | Description |
|----------|-------------|
| `group_rank(x, g)` | Rank within groups |
| `group_zscore(x, g)` | Z-score within groups |
| `group_scale(x, g)` | Scale within groups |
| `group_neutralize(x, g)` | Subtract group mean |
| `group_mean(x, w, g)` | Weighted group mean |
| `group_backfill(x, g, d)` | Fill NaN with group mean |

### Vector (2)
| Operator | Description |
|----------|-------------|
| `vec_avg(x)` | Mean of list elements |
| `vec_sum(x)` | Sum of list elements |

### Transformational (2)
| Operator | Description |
|----------|-------------|
| `bucket(x, ...)` | Discretize to buckets |
| `trade_when(t, a, e)` | Stateful entry/exit |