Feature Engineering

Giacomo Saccaggi edited this page Jun 19, 2026 · 1 revision

FeatureEngineer is a scikit-learn-compatible transformer that automates common feature engineering tasks.

Transformations

| Transform | Flag | What it does |
| --- | --- | --- |
| Log transform | `log_transform=True` | Applies `log1p` to numeric features with skewness > threshold (non-negative only) |
| Interactions | `interactions=True` | Creates `col_a_x_col_b` for top 10 numeric column pairs |
| Date extraction | `date_features=True` | Extracts `_year`, `_month`, `_day_of_week`, `_is_weekend`, `_quarter` from datetime columns |
| Target encoding | `target_encode=True` | Replaces high-cardinality categoricals (>threshold unique values) with mean(target) per category |
| Auto-binning | `auto_bin=True` | Creates `_bin` columns with quantile-based bucket labels |
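
To make two of the rows above concrete, here is a minimal pandas sketch of a skew-gated log transform and of date-part extraction. It is not the FeatureEngineer source: the helper names `log_transform_skewed` and `add_date_parts` are hypothetical, and the choices of `Series.skew()` for skewness and Monday=0 for `_day_of_week` are assumptions.

```python
import numpy as np
import pandas as pd


def log_transform_skewed(df: pd.DataFrame, skew_threshold: float = 1.0) -> pd.DataFrame:
    """Apply log1p to non-negative numeric columns whose skewness exceeds the threshold."""
    out = df.copy()
    for col in out.select_dtypes(include="number").columns:
        non_negative = (out[col].dropna() >= 0).all()
        # Assumption: skewness is measured with pandas' sample skewness.
        if non_negative and out[col].skew() > skew_threshold:
            out[col] = np.log1p(out[col])
    return out


def add_date_parts(df: pd.DataFrame) -> pd.DataFrame:
    """Add calendar features for every datetime column, using the suffixes from the table."""
    out = df.copy()
    for col in out.select_dtypes(include="datetime").columns:
        dt = out[col].dt
        out[f"{col}_year"] = dt.year
        out[f"{col}_month"] = dt.month
        out[f"{col}_day_of_week"] = dt.dayofweek  # assumption: Monday=0 ... Sunday=6
        out[f"{col}_is_weekend"] = (dt.dayofweek >= 5).astype(int)
        out[f"{col}_quarter"] = dt.quarter
    return out


df = pd.DataFrame({
    "income": [1.0, 2.0, 2.0, 3.0, 100.0],   # one large outlier, strongly right-skewed
    "age": [30.0, 35.0, 40.0, 45.0, 50.0],   # symmetric, skewness 0
    "signup": pd.to_datetime(
        ["2024-01-06", "2024-04-08", "2024-07-01", "2024-10-12", "2024-12-25"]
    ),
})

engineered = add_date_parts(log_transform_skewed(df, skew_threshold=1.0))
```

In this toy frame only `income` crosses the threshold, so `age` is left untouched, and the two Saturdays in `signup` are flagged as weekends.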

Usage

from scomp_link import FeatureEngineer

fe = FeatureEngineer(
    interactions=True,
    log_transform=True,
    skew_threshold=1.0,        # skewness threshold for log
    date_features=True,
    target_encode=True,
    target_encode_threshold=10, # cardinality threshold
    auto_bin=True,
    n_bins=5,
)

# fit learns: which columns are skewed, date columns, high-cardinality cats, bin edges
fe.fit(X_train, y_train)

# transform applies all learned transformations
X_train_eng = fe.transform(X_train)
X_test_eng = fe.transform(X_test)

# Or in one step
X_eng = fe.fit_transform(X, y)
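
The fit/transform split matters because values such as bin edges must come from the training data only. The sketch below illustrates the idea with a hypothetical stand-in class, `QuantileBinner`, which is not part of scomp_link. It assumes integer bin codes rather than the library's actual bucket labels, and it widens the outer edges to infinity so unseen extremes still land in a bin.

```python
import numpy as np
import pandas as pd


class QuantileBinner:
    """Hypothetical stand-in: learns quantile bin edges in fit, reuses them in transform."""

    def __init__(self, n_bins: int = 5):
        self.n_bins = n_bins

    def fit(self, X: pd.DataFrame, y=None):
        self.edges_ = {}
        for col in X.select_dtypes(include="number").columns:
            # retbins=True returns the learned edges; duplicates="drop" merges tied quantiles.
            _, edges = pd.qcut(X[col], q=self.n_bins, retbins=True, duplicates="drop")
            edges = np.array(edges, dtype=float)
            # Assumption: open the outer bins so out-of-range test values are still binned.
            edges[0], edges[-1] = -np.inf, np.inf
            self.edges_[col] = edges
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        out = X.copy()
        for col, edges in self.edges_.items():
            # Edges come from fit; nothing is re-estimated from the data passed here.
            out[f"{col}_bin"] = pd.cut(out[col], bins=edges, labels=False)
        return out


X_train = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]})
X_test = pd.DataFrame({"x": [0.0, 5.0, 100.0]})

binner = QuantileBinner(n_bins=5).fit(X_train)
X_train_binned = binner.transform(X_train)
X_test_binned = binner.transform(X_test)
```

Here the test values 0.0 and 100.0 fall outside the training range and end up in the first and last training bins, while 5.0 lands in the same bin as it would in the training set.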

CLI

scomp-link engineer --data raw.csv --target y \
  --interactions --log-transform --date-features --target-encode --auto-bin \
  --output engineered.parquet

Notes

  • Fit the transformer once on training data, then apply it unchanged to test and production data
  • Target encoding replaces each category with the mean of the target for that category; categories not seen during fit fall back to the global target mean
  • Interactions are capped at top 10 numeric columns to avoid feature explosion
  • Compatible with sklearn Pipeline and ColumnTransformer
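
The target-encoding note above can be sketched in a few lines of pandas. This is an illustration under assumptions, not the library's implementation: the function names `fit_target_encoding` and `apply_target_encoding` are hypothetical, and any smoothing or cross-fitting the real transformer may apply is not shown.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def fit_target_encoding(X: pd.DataFrame, y: pd.Series, threshold: int = 10):
    """Learn per-category target means for columns with more than `threshold` unique values."""
    global_mean = float(y.mean())
    mappings = {}
    for col in X.columns:
        # Assumption: anything that is neither numeric nor datetime counts as categorical.
        if is_numeric_dtype(X[col]) or is_datetime64_any_dtype(X[col]):
            continue
        if X[col].nunique() > threshold:
            mappings[col] = y.groupby(X[col]).mean()
    return mappings, global_mean


def apply_target_encoding(X: pd.DataFrame, mappings: dict, global_mean: float) -> pd.DataFrame:
    """Replace categories with their learned means; unseen categories get the global mean."""
    out = X.copy()
    for col, means in mappings.items():
        out[col] = out[col].map(means).astype(float).fillna(global_mean)
    return out


X_train = pd.DataFrame({"city": ["a", "a", "b", "b", "c", "c"]})
y_train = pd.Series([10.0, 20.0, 30.0, 50.0, 70.0, 90.0])
X_test = pd.DataFrame({"city": ["a", "c", "z"]})  # "z" never appears in training

# threshold=2 only so that the three-category toy column qualifies as high-cardinality
mappings, global_mean = fit_target_encoding(X_train, y_train, threshold=2)
X_test_enc = apply_target_encoding(X_test, mappings, global_mean)
```

With these toy values the per-category means are 15, 40, and 80, the global mean is 45, and the unseen category "z" is encoded as 45.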
