QFeatureLib is a high-performance, production-grade feature engineering library for quantitative investment. It focuses on financial time-series processing with strict avoidance of future functions (look-ahead bias), computational efficiency, and rigorous sample splitting.
- Zero Future Function: All time-series operations use `shift=1` by default to prevent data leakage. The library raises `FutureFunctionError` if you accidentally try to use future information.
- High Performance: Pure NumPy implementation with vectorized operations, 10-100x faster than pandas.
- Memory Efficient: Uses views instead of copies and supports in-place operations for large-scale panel data.
- Quantitative Finance Focused: Specialized for financial scenarios such as suspended-stock handling, industry neutralization, and market-cap neutralization.
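The leakage guarantee can be demonstrated with a small plain-NumPy experiment (illustrative only; `rolling_mean` below is not the library API): with `shift=1`, the window ends at `t-1`, so an observation never influences its own baseline.

```python
import numpy as np

def rolling_mean(x, window, shift):
    """Rolling mean over `window` values ending `shift` steps before t (NaN until enough history)."""
    out = np.full(len(x), np.nan)
    for i in range(len(x)):
        end = i + 1 - shift      # shift=1 -> window ends at t-1; shift=0 -> includes x[t]
        start = end - window
        if start >= 0:
            out[i] = x[start:end].mean()
    return out

x = np.array([1.0, 1.0, 1.0, 1.0, 100.0])   # a shock arrives at t=4
leaky = rolling_mean(x, window=3, shift=0)  # the shock leaks into its own baseline
safe = rolling_mean(x, window=3, shift=1)   # past-only: the shock is not in its own window
```

With `shift=0` the baseline at `t=4` already contains the shock, which is exactly the kind of leakage the default `shift=1` rules out.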
```bash
pip install qfeaturelib
```

For development:

```bash
pip install qfeaturelib[dev]
```

```python
import numpy as np
from qfeaturelib import PanelData
from qfeaturelib.standardization import rolling_zscore, cs_zscore
from qfeaturelib.splitting import RollingWindowSplitter

# Create panel data (T=100 days, N=50 stocks, F=5 features)
values = np.random.randn(100, 50, 5)
dates = np.arange(100)
tickers = [f'STOCK_{i:02d}' for i in range(50)]
panel = PanelData(values, dates, tickers)

# Time-series standardization (rolling Z-score with shift=1 to prevent leakage)
zscore_values = rolling_zscore(
    panel.values[..., 0],  # First feature
    window=20,
    shift=1,  # Use the past 20 days only, excluding the current moment
)

# Cross-sectional standardization (Z-score across all stocks each day)
cs_values = cs_zscore(panel.values[..., 0])

# Sample splitting for backtesting
splitter = RollingWindowSplitter(
    n_samples=100,
    train_ratio=0.6,
    val_ratio=0.2,
    test_ratio=0.2,
)
for split in splitter.split():
    train_data = zscore_values[split.train]
    val_data = zscore_values[split.val]
    test_data = zscore_values[split.test]
    # Train your model...
```

Operations along the time dimension with rolling windows:
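As a plain-NumPy sketch of what a leakage-safe rolling z-score computes (illustrative only; `rolling_zscore_sketch` is not part of the library, whose implementation is vectorized):

```python
import numpy as np

def rolling_zscore_sketch(x, window, shift=1):
    """Score x[t] against the mean/std of the `window` values ending `shift` steps earlier."""
    out = np.full(len(x), np.nan)
    for i in range(len(x)):
        end = i + 1 - shift
        start = end - window
        if start < 0:
            continue                  # not enough history yet
        past = x[start:end]
        sd = past.std()
        if sd > 0:
            out[i] = (x[i] - past.mean()) / sd
    return out

x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
z = rolling_zscore_sketch(x, window=3, shift=1)
```

The first `window + shift - 1` entries stay NaN because there is not yet a full past-only window.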
```python
from qfeaturelib.standardization import (
    rolling_zscore,         # Rolling Z-score
    rolling_robust_zscore,  # Robust Z-score using median/MAD
    rolling_minmax,         # Rolling min-max scaling
)

# Parameters explained
result = rolling_zscore(
    data,
    window=20,                    # Rolling window size
    shift=1,                      # Window end offset (shift=1 excludes the current moment)
    outlier_method="squash",      # Outlier handling: 'truncate' or 'squash'
    outlier_bounds=(0.01, 0.99),  # Quantile bounds for outliers
)
```

Operations across all assets at each time point:
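A plain-NumPy sketch of the idea (illustrative; `cs_zscore_sketch` is not the library function): each date (row) is standardized across assets (columns), and NaNs such as suspended stocks are skipped.

```python
import numpy as np

def cs_zscore_sketch(panel):
    """Standardize each date (row) across assets (columns), ignoring NaNs."""
    mean = np.nanmean(panel, axis=1, keepdims=True)
    std = np.nanstd(panel, axis=1, keepdims=True)
    return (panel - mean) / np.where(std > 0, std, np.nan)

data = np.array([[1.0, 2.0, 3.0],
                 [10.0, 20.0, np.nan]])   # third stock suspended on day 2
z = cs_zscore_sketch(data)
```

Unlike rolling operations, cross-sectional operations use only same-day information, so no `shift` is needed to stay leakage-free.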
```python
from qfeaturelib.standardization import (
    cs_zscore,         # Cross-sectional Z-score
    cs_robust_zscore,  # Cross-sectional robust Z-score
    cs_minmax,         # Cross-sectional min-max
    cs_rank,           # Cross-sectional rank (percentile)
)

# Group-wise operations are supported
result = cs_zscore(data, groups=industry_labels)
```

Time-series aware train/validation/test splitting:
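A rolling split can be sketched in a few lines (illustrative only; the sketch takes explicit sample counts where the library takes ratios): each fold is a contiguous train/val/test block that preserves time order and rolls forward by `step`.

```python
import numpy as np

def rolling_splits_sketch(n_samples, train, val, test, step):
    """Yield contiguous (train, val, test) index arrays that roll forward in time."""
    window = train + val + test
    start = 0
    while start + window <= n_samples:
        idx = np.arange(start, start + window)
        yield idx[:train], idx[train:train + val], idx[train + val:]
        start += step

splits = list(rolling_splits_sketch(n_samples=10, train=4, val=2, test=2, step=2))
```

Keeping each block contiguous and strictly ordered (train before val before test) is what makes the split time-series aware; shuffled splits would leak future information into training.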
```python
from qfeaturelib.splitting import RollingWindowSplitter, ExpandingWindowSplitter

# Rolling window (fixed training size)
rolling_splitter = RollingWindowSplitter(
    n_samples=1000,
    train_ratio=0.6,
    val_ratio=0.2,
    test_ratio=0.2,
    step=100,  # Roll forward 100 samples each iteration
    gap=0,     # Gap between train/val/test to prevent leakage
)

# Expanding window (growing training size)
expanding_splitter = ExpandingWindowSplitter(
    n_samples=1000,
    train_ratio=0.6,
    val_ratio=0.2,
    test_ratio=0.2,
    step=50,  # Expand by 50 samples each iteration
)

# Use split.apply() to split multiple arrays consistently
for split in rolling_splitter.split():
    (X_train, X_val, X_test), (y_train, y_val, y_test) = split.apply([X, y])
```

Fill missing values in financial panel data:

```python
from qfeaturelib.imputation import (
    ffill,           # Forward fill
    ffill_limit,     # Forward fill with a limit (prevents filling with stale data)
    cs_median_fill,  # Cross-sectional median fill
    cs_mean_fill,    # Cross-sectional mean fill
)

# Forward fill with at most 5 consecutive fills
result = ffill_limit(data, limit=5)
```

Remove the effects of control factors via regression residuals:
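The idea can be sketched with ordinary least squares in plain NumPy (illustrative; `neutralize_sketch` is not the library function): regress the feature on the control factors and keep the residuals, which are orthogonal to the controls.

```python
import numpy as np

def neutralize_sketch(feature, controls):
    """Regress feature on control factors (plus intercept) and return the residuals."""
    x = np.column_stack([np.ones(len(feature)), controls])
    beta, *_ = np.linalg.lstsq(x, feature, rcond=None)
    return feature - x @ beta

rng = np.random.default_rng(0)
size = rng.normal(size=200)                  # e.g. log market cap
feature = 2.0 * size + rng.normal(size=200)  # feature loaded on the size factor
resid = neutralize_sketch(feature, size)
```

After neutralization the residual carries whatever signal is left once the size exposure is removed, which is why the residual is uncorrelated with the control by construction.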
```python
from qfeaturelib.neutralization import (
    neutralize,
    industry_neutralize,
    size_neutralize,
)

# Industry neutralization
neutralized = industry_neutralize(feature, industry_labels)

# Size (market cap) neutralization
neutralized = size_neutralize(feature, log_market_cap)

# Custom control factors
neutralized = neutralize(feature, control_factors, method="ols")
```

Special handling for macro-economic indicators without an asset dimension:
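Broadcasting a 1-D macro series to panel shape can be sketched in plain NumPy (illustrative; `adapt_macro_sketch` is not the library's implementation):

```python
import numpy as np

def adapt_macro_sketch(series, n_assets):
    """Broadcast a (T,) macro series to a (T, N) panel by repeating it across assets."""
    return np.broadcast_to(series[:, None], (series.shape[0], n_assets))

gdp = np.array([0.02, 0.025, 0.03])
panel = adapt_macro_sketch(gdp, n_assets=4)  # shape (3, 4)
```

Note that `np.broadcast_to` returns a read-only view rather than materializing `T * N` values, in line with a views-over-copies memory strategy.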
```python
from qfeaturelib import (
    macro_rolling_zscore,
    adapt_macro_to_panel,
)

# Direct standardization of 1D macro data
gdp_zscore = macro_rolling_zscore(gdp_growth, window=12, shift=1)

# Broadcast to panel format for combination with asset features
gdp_panel = adapt_macro_to_panel(gdp_growth, n_assets=50)  # (T,) -> (T, N)
```

On standard test data (T=5000, N=1000, F=50):
| Operation | Pandas | QFeatureLib | Speedup |
|---|---|---|---|
| Rolling Z-Score | ~5s | ~0.1s | 50x |
| Cross-sectional Z-Score | ~2s | ~0.02s | 100x |
| Rolling Rank | ~10s | ~0.5s | 20x |
- Safety First: Default `shift=1` prevents accidental use of future information
- Vectorization: All core computations use NumPy vectorized operations
- Memory Efficiency: Returns views instead of copies; supports in-place operations
- Type Safety: Full type annotations; passes mypy strict mode
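The memory-efficiency principle rests on standard NumPy behavior, which can be seen directly (a plain-NumPy illustration, not library code):

```python
import numpy as np

data = np.arange(12, dtype=float).reshape(3, 4)

view = data[:, 0]   # basic slicing returns a view: no data is copied
view[:] = 0.0       # writing through the view updates `data` in place

np.subtract(data, 1.0, out=data)  # in-place ufunc call: no temporary array allocated
```

For large panels (e.g. thousands of days by thousands of stocks), avoiding copies and temporaries keeps peak memory close to the size of the data itself.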
- AssetPanelForest - Supervised clustering for panel data
- MASFactorMiner - Factor mining and analysis
- GeneralBacktest - Backtesting framework
MIT License - see LICENSE file for details.
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
See CHANGELOG.md for version history and changes.
- GitHub Issues: https://github.com/ElenYoung/QFeatureLib/issues
- Documentation: https://github.com/ElenYoung/QFeatureLib#readme
Note: This library is part of a quantitative finance ecosystem. When implementing features, consider compatibility with downstream projects.