
ElenYoung/QFeatureLib


QFeatureLib

PyPI version | Python 3.10+ | License: MIT | Code style: black


QFeatureLib is a high-performance, production-grade feature engineering library for quantitative investment. It focuses on financial time-series processing, with strict avoidance of look-ahead bias ("future functions"), computational efficiency, and rigorous sample splitting.

Key Features

  • Zero Future Function: All time-series operations default to shift=1, so each output is computed only from data available before the current moment, which prevents look-ahead leakage. The library raises FutureFunctionError if you accidentally try to use future information.
  • High Performance: Pure NumPy implementation with vectorized operations, 10-100x faster than equivalent pandas code (20-100x in the benchmarks below).
  • Memory Efficient: Uses views instead of copies, supports in-place operations for large-scale panel data.
  • Quantitative Finance Focused: Specialized for financial scenarios such as suspended-stock handling, industry neutralization, and market cap neutralization.
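The memory-efficiency point rests on standard NumPy semantics. The sketch below uses plain NumPy (not QFeatureLib) to show which indexing operations return views, which return copies, and how an in-place update through a view reaches the parent array. The (T, N, F) shape mirrors the Quick Start panel; the data is random.

```python
import numpy as np

# Panel layout used in this README: (T dates, N assets, F features).
values = np.random.default_rng(0).standard_normal((100, 50, 5))

# Basic slicing returns a view: no data is copied.
first_feature = values[..., 0]  # shape (100, 50)
assert np.shares_memory(first_feature, values)

# Fancy (integer-array) indexing, by contrast, always returns a copy.
picked = values[:, [0, 1, 2], 0]
assert not np.shares_memory(picked, values)

# In-place arithmetic through the view updates the parent array:
# here each date is demeaned across assets without allocating a new panel.
first_feature -= first_feature.mean(axis=1, keepdims=True)
```

This is why passing `panel.values[..., 0]` to a function is cheap, and also why writing into such a slice modifies the original panel.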

Installation

pip install qfeaturelib

For development:

pip install "qfeaturelib[dev]"

Quick Start

import numpy as np
from qfeaturelib import PanelData
from qfeaturelib.standardization import rolling_zscore, cs_zscore
from qfeaturelib.splitting import RollingWindowSplitter

# Create panel data (T=100 days, N=50 stocks, F=5 features)
values = np.random.randn(100, 50, 5)
dates = np.arange(100)
tickers = [f'STOCK_{i:02d}' for i in range(50)]

panel = PanelData(values, dates, tickers)

# Time-series standardization (rolling Z-score with shift=1 to prevent leakage)
zscore_values = rolling_zscore(
    panel.values[..., 0],  # First feature
    window=20,
    shift=1,  # Use past 20 days only, excluding current moment
)

# Cross-sectional standardization (Z-score across all stocks each day)
cs_values = cs_zscore(panel.values[..., 0])

# Sample splitting for backtesting
splitter = RollingWindowSplitter(
    n_samples=100,
    train_ratio=0.6,
    val_ratio=0.2,
    test_ratio=0.2,
)

for split in splitter.split():
    train_data = zscore_values[split.train]
    val_data = zscore_values[split.val]
    test_data = zscore_values[split.test]
    # Train your model...

Core Modules

1. Time-Series Standardization

Operations along the time dimension with rolling windows:

from qfeaturelib.standardization import (
    rolling_zscore,      # Rolling Z-Score
    rolling_robust_zscore,  # Robust Z-Score using Median/MAD
    rolling_minmax,      # Rolling Min-Max scaling
)

# Parameters explained
result = rolling_zscore(
    data,
    window=20,      # Rolling window size
    shift=1,        # Window end offset (shift=1 excludes current moment)
    outlier_method="squash",  # Outlier handling: 'truncate' or 'squash'
    outlier_bounds=(0.01, 0.99),  # Quantile bounds for outliers
)
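To make the `window` and `shift` semantics concrete, here is a plain-NumPy sketch of a rolling z-score. It assumes that with `shift=s` the value at row t is standardized against rows t-s-window+1 through t-s, and it implements only clip-style outlier handling (roughly what "truncate" suggests; "squash" is not modeled). `rolling_zscore_sketch` and the local `FutureFunctionError` class are hypothetical stand-ins, not QFeatureLib's code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


class FutureFunctionError(ValueError):
    """Hypothetical stand-in for the library's exception of the same name."""


def rolling_zscore_sketch(data, window=20, shift=1, outlier_bounds=None):
    """Rolling z-score along axis 0 using only rows [t-shift-window+1, t-shift].

    data: array of shape (T, ...) with time on axis 0.
    outlier_bounds: optional (lo, hi) quantiles; the current value is clipped
    to that quantile range of its own past window (an assumed interpretation).
    """
    if shift < 1:
        # shift=0 would let row t enter its own statistics (look-ahead).
        raise FutureFunctionError("shift must be >= 1 to avoid look-ahead")
    data = np.asarray(data, dtype=float)
    T = data.shape[0]
    out = np.full_like(data, np.nan)
    start = window + shift - 1  # first row with a complete past window
    if T <= start:
        return out
    # windows[k] holds rows k .. k+window-1, with the window on the last axis.
    windows = sliding_window_view(data, window, axis=0)
    # Row t uses windows[t - start], so rows start..T-1 map to windows[0:T-start].
    past = windows[: T - start]
    current = data[start:]
    if outlier_bounds is not None:
        lo = np.nanquantile(past, outlier_bounds[0], axis=-1)
        hi = np.nanquantile(past, outlier_bounds[1], axis=-1)
        current = np.clip(current, lo, hi)
    mean = np.nanmean(past, axis=-1)
    std = np.nanstd(past, axis=-1, ddof=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        out[start:] = (current - mean) / std
    return out
```

A quick way to check for leakage with any implementation is to perturb data after some date and confirm that outputs up to that date do not change.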

2. Cross-Sectional Standardization

Operations across all assets at each time point:

from qfeaturelib.standardization import (
    cs_zscore,           # Cross-sectional Z-Score
    cs_robust_zscore,    # Cross-sectional robust Z-Score
    cs_minmax,           # Cross-sectional Min-Max
    cs_rank,             # Cross-sectional rank (percentile)
)

# Support for group-wise operations
result = cs_zscore(data, groups=industry_labels)
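For intuition, the cross-sectional operations can be sketched in plain NumPy: each row (date) is standardized or ranked across assets, optionally within groups. This assumes a (T, N) input and an (N,) array of group labels, which is a guess at the `groups` format; the `*_sketch` functions are hypothetical and the library's percentile definition may differ (here ranks lie in (0, 1] and ties are broken by position).

```python
import numpy as np


def cs_zscore_sketch(data, groups=None):
    """Z-score each row across assets, optionally within each group of columns."""
    data = np.asarray(data, dtype=float)
    if groups is None:
        mean = np.nanmean(data, axis=1, keepdims=True)
        std = np.nanstd(data, axis=1, ddof=1, keepdims=True)
        return (data - mean) / std
    groups = np.asarray(groups)
    out = np.full_like(data, np.nan)
    for g in np.unique(groups):
        cols = groups == g
        block = data[:, cols]
        mean = np.nanmean(block, axis=1, keepdims=True)
        std = np.nanstd(block, axis=1, ddof=1, keepdims=True)
        out[:, cols] = (block - mean) / std
    return out


def cs_rank_sketch(data):
    """Percentile rank in (0, 1] within each row; NaNs stay NaN."""
    data = np.asarray(data, dtype=float)
    out = np.full_like(data, np.nan)
    for t, row in enumerate(data):
        valid = ~np.isnan(row)
        n = valid.sum()
        if n == 0:
            continue
        # Double argsort gives ordinal ranks 0..n-1; shift to 1..n.
        ranks = np.argsort(np.argsort(row[valid], kind="stable"), kind="stable") + 1
        out[t, valid] = ranks / n
    return out
```

Because each row only uses values from the same date, cross-sectional operations cannot leak future data along the time axis.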

3. Sample Splitting Engine

Time-series aware train/validation/test splitting:

from qfeaturelib.splitting import RollingWindowSplitter, ExpandingWindowSplitter

# Rolling window (fixed training size)
rolling_splitter = RollingWindowSplitter(
    n_samples=1000,
    train_ratio=0.6,
    val_ratio=0.2,
    test_ratio=0.2,
    step=100,  # Roll forward 100 samples each iteration
    gap=0,     # Gap between train/val/test to prevent leakage
)

# Expanding window (growing training size)
expanding_splitter = ExpandingWindowSplitter(
    n_samples=1000,
    train_ratio=0.6,
    val_ratio=0.2,
    test_ratio=0.2,
    step=50,   # Expand by 50 samples each iteration
)

# Use split.apply() to split multiple arrays consistently
for split in rolling_splitter.split():
    (X_train, X_val, X_test), (y_train, y_val, y_test) = split.apply([X, y])
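The rolling and expanding schemes can be sketched as index generators. The README does not say what `train_ratio`, `val_ratio`, and `test_ratio` are relative to, so this hypothetical `walk_forward_splits` takes absolute block sizes instead, and it assumes `gap` is a number of samples skipped between consecutive blocks.

```python
import numpy as np


def walk_forward_splits(n_samples, train_size, val_size, test_size,
                        step, gap=0, expanding=False):
    """Yield (train, val, test) index arrays that move forward in time.

    Rolling: the training block keeps a fixed length and slides by `step`.
    Expanding: the training block always starts at 0 and grows by `step`.
    `gap` samples are skipped between train/val and between val/test.
    """
    offset = 0
    while True:
        train_start = 0 if expanding else offset
        train_end = offset + train_size
        val_start = train_end + gap
        val_end = val_start + val_size
        test_start = val_end + gap
        test_end = test_start + test_size
        if test_end > n_samples:
            break  # the next test block would run past the data
        yield (np.arange(train_start, train_end),
               np.arange(val_start, val_end),
               np.arange(test_start, test_end))
        offset += step
```

Indexing every array with the same index arrays (for example `X[train], y[train]`) keeps features and labels aligned, which is what `split.apply([X, y])` does for the library's splitters.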

4. Missing Value Imputation

from qfeaturelib.imputation import (
    ffill,          # Forward fill
    ffill_limit,    # Forward fill at most `limit` consecutive gaps (avoids propagating stale data)
    cs_median_fill, # Cross-sectional median fill
    cs_mean_fill,   # Cross-sectional mean fill
)

# Forward fill with maximum 5 consecutive fills
result = ffill_limit(data, limit=5)
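A limited forward fill can be vectorized with a running maximum of "last valid row" indices. The sketch below assumes time is on axis 0 and that a value may be carried forward at most `limit` rows past its last observation; `ffill_limit_sketch` is a hypothetical illustration, not the library's implementation.

```python
import numpy as np


def ffill_limit_sketch(data, limit):
    """Forward-fill NaNs along axis 0, at most `limit` rows past the last valid value."""
    data = np.asarray(data, dtype=float)
    T = data.shape[0]
    rows = np.arange(T).reshape((T,) + (1,) * (data.ndim - 1))
    valid = ~np.isnan(data)
    # Index of the most recent valid row at or before t (-1 if none yet).
    last_valid = np.maximum.accumulate(np.where(valid, rows, -1), axis=0)
    filled = np.take_along_axis(data, np.maximum(last_valid, 0), axis=0)
    # Keep a value only if one exists and it is at most `limit` rows old.
    age = rows - last_valid
    ok = (last_valid >= 0) & (age <= limit)
    return np.where(ok, filled, np.nan)
```

Values before the first observation stay missing, since filling them would require backward (future) information.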

5. Feature Neutralization

Remove the effect of control factors by keeping the residuals of a regression on them:

from qfeaturelib.neutralization import (
    neutralize,
    industry_neutralize,
    size_neutralize,
)

# Industry neutralization
neutralized = industry_neutralize(feature, industry_labels)

# Size (market cap) neutralization
neutralized = size_neutralize(feature, log_market_cap)

# Custom control factors
neutralized = neutralize(feature, control_factors, method="ols")
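The residual approach can be sketched as one OLS regression per date across assets. This assumes a (T, N) feature and (T, N, K) controls; industry neutralization becomes a regression on industry dummies, and size neutralization a regression on log market cap. `neutralize_sketch` and `industry_dummies` are hypothetical helpers, and only OLS is shown.

```python
import numpy as np


def neutralize_sketch(feature, controls):
    """Per-date OLS residuals of `feature` on an intercept plus `controls`.

    feature: (T, N); controls: (T, N, K). Assets with NaNs are dropped per date.
    """
    feature = np.asarray(feature, dtype=float)
    controls = np.asarray(controls, dtype=float)
    out = np.full_like(feature, np.nan)
    for t in range(feature.shape[0]):
        y = feature[t]
        X = np.column_stack([np.ones_like(y), controls[t]])
        ok = ~np.isnan(y) & ~np.isnan(X).any(axis=1)
        if ok.sum() <= X.shape[1]:
            continue  # too few assets to fit this date
        beta, *_ = np.linalg.lstsq(X[ok], y[ok], rcond=None)
        out[t, ok] = y[ok] - X[ok] @ beta
    return out


def industry_dummies(labels):
    """(N, G-1) one-hot dummies; the first level is dropped to avoid collinearity with the intercept."""
    levels = np.unique(labels)
    return (np.asarray(labels)[:, None] == levels[None, 1:]).astype(float)
```

With an intercept plus industry dummies, the residuals have zero mean within every industry on every date, so the neutralized feature no longer carries industry-level differences.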

6. Macro Indicators

Special handling for macroeconomic indicators, which have no asset dimension:

from qfeaturelib import (
    macro_rolling_zscore,
    adapt_macro_to_panel,
)

# Direct standardization of 1D macro data
gdp_zscore = macro_rolling_zscore(gdp_growth, window=12, shift=1)

# Broadcast to panel format for combination with asset features
gdp_panel = adapt_macro_to_panel(gdp_growth, n_assets=50)  # (T,) -> (T, N)
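The README does not say whether `adapt_macro_to_panel` returns a view or a copy. In plain NumPy, the (T,) -> (T, N) broadcast can be done without copying, as sketched below with a made-up four-period series and a hypothetical zero-valued asset feature.

```python
import numpy as np

gdp_growth = np.array([1.2, 0.8, 1.5, 2.0])  # hypothetical (T,) macro series
n_assets = 3

# Broadcast (T,) -> (T, N) without copying: every asset sees the same value per date.
gdp_panel = np.broadcast_to(gdp_growth[:, None], (gdp_growth.shape[0], n_assets))

# The result is a read-only view; call .copy() before writing into it.
writable = gdp_panel.copy()

# Combine with a hypothetical (T, N) asset-level feature into a (T, N, 2) block.
momentum = np.zeros((gdp_growth.shape[0], n_assets))
combined = np.stack([momentum, gdp_panel], axis=-1)
```

Note that `np.stack` materializes a new array, so the zero-copy benefit applies only until the macro panel is combined with other features.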

Performance Benchmarks

On test data with T=5000 dates, N=1000 assets, and F=50 features:

| Operation | Pandas | QFeatureLib | Speedup |
|---|---|---|---|
| Rolling Z-Score | ~5 s | ~0.1 s | 50x |
| Cross-sectional Z-Score | ~2 s | ~0.02 s | 100x |
| Rolling Rank | ~10 s | ~0.5 s | 20x |

Design Principles

  1. Safety First: The default shift=1 prevents accidental use of future information
  2. Vectorization: All core computations use NumPy vectorized operations
  3. Memory Efficiency: Return views instead of copies, support in-place operations
  4. Type Safety: Full type annotations, passes mypy strict mode
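As an illustration of the vectorization principle (not a description of how QFeatureLib computes its rolling statistics), the sketch below contrasts a per-row Python loop with a cumulative-sum formulation of the same past-only rolling mean. Both assume a 1-D series and the same `shift` convention as above; the function names are hypothetical.

```python
import numpy as np


def rolling_mean_loop(x, window, shift=1):
    """Reference version: one Python-level iteration per time step."""
    out = np.full(x.shape, np.nan)
    for t in range(window + shift - 1, len(x)):
        out[t] = x[t - shift - window + 1 : t - shift + 1].mean()
    return out


def rolling_mean_cumsum(x, window, shift=1):
    """Vectorized version: window sums from differences of one cumulative sum."""
    out = np.full(x.shape, np.nan)
    c = np.concatenate([[0.0], np.cumsum(x)])
    sums = c[window:] - c[:-window]  # sums[k] = x[k : k + window].sum()
    start = window + shift - 1
    out[start:] = sums[: len(x) - start] / window
    return out
```

The cumulative-sum version does O(T) array work regardless of window size, but differences of large running sums can lose floating-point precision on very long series, which is a trade-off any vectorized rolling implementation has to manage.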

Related Projects

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Changelog

See CHANGELOG.md for version history and changes.

Support


Note: This library is part of a quantitative finance ecosystem. When implementing features, consider compatibility with downstream projects.

About

Feature-Engineering for Quant.
