Feature Engineering
Giacomo Saccaggi edited this page Jun 19, 2026 · 1 revision
FeatureEngineer is a sklearn-compatible transformer that automates common feature engineering tasks.
| Transform | Flag | What it does |
|---|---|---|
| Log transform | `log_transform=True` | Applies `log1p` to numeric features with skewness > threshold (non-negative only) |
| Interactions | `interactions=True` | Creates `col_a_x_col_b` for top 10 numeric column pairs |
| Date extraction | `date_features=True` | Extracts `_year`, `_month`, `_day_of_week`, `_is_weekend`, `_quarter` from datetime columns |
| Target encoding | `target_encode=True` | Replaces high-cardinality categoricals (>threshold unique values) with `mean(target)` per category |
| Auto-binning | `auto_bin=True` | Creates `_bin` columns with quantile-based bucket labels |
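
To make the table concrete, here is a rough sketch of three of these transforms written directly in pandas and NumPy. It is not scomp-link's implementation: the sample data, the in-place overwrite of skewed columns, the Monday=0 weekday convention, and the integer bin labels are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy data (invented for this sketch)
df = pd.DataFrame({
    "amount": [1.0, 2.0, 3.0, 4.0, 500.0],    # strongly right-skewed
    "age": [20.0, 30.0, 40.0, 50.0, 60.0],    # symmetric
    "signup": pd.to_datetime(
        ["2024-01-06", "2024-02-12", "2024-03-20", "2024-07-07", "2024-11-29"]
    ),
})
out = df.copy()

# Log transform: log1p only for non-negative columns whose skewness exceeds
# the threshold. Overwriting the column is an assumption of this sketch.
skew_threshold = 1.0
for col in ["amount", "age"]:
    if (df[col] >= 0).all() and df[col].skew() > skew_threshold:
        out[col] = np.log1p(df[col])

# Date extraction: derive calendar parts using the suffixes from the table.
dt = df["signup"].dt
out["signup_year"] = dt.year
out["signup_month"] = dt.month
out["signup_day_of_week"] = dt.dayofweek           # Monday=0 (assumed convention)
out["signup_is_weekend"] = (dt.dayofweek >= 5).astype(int)
out["signup_quarter"] = dt.quarter

# Auto-binning: learn quantile edges once, then reuse them to assign buckets.
# Integer bucket labels are an assumption; the real label format may differ.
n_bins = 5
_, edges = pd.qcut(df["age"], q=n_bins, retbins=True, duplicates="drop")
out["age_bin"] = pd.cut(df["age"], bins=edges, labels=False, include_lowest=True)

print(out)
```

Learning the bin edges separately from applying them mirrors the fit/transform split: the same `edges` array could be reused with `pd.cut` on test data so that buckets stay consistent across datasets.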
```python
from scomp_link import FeatureEngineer

fe = FeatureEngineer(
    interactions=True,
    log_transform=True,
    skew_threshold=1.0,           # skewness threshold for log
    date_features=True,
    target_encode=True,
    target_encode_threshold=10,   # cardinality threshold
    auto_bin=True,
    n_bins=5,
)

# fit learns: which columns are skewed, date columns, high-cardinality cats, bin edges
fe.fit(X_train, y_train)

# transform applies all learned transformations
X_train_eng = fe.transform(X_train)
X_test_eng = fe.transform(X_test)

# Or in one step
X_eng = fe.fit_transform(X, y)
```

```bash
scomp-link engineer --data raw.csv --target y \
  --interactions --log-transform --date-features --target-encode --auto-bin \
  --output engineered.parquet
```

- The transformer is fitted once on training data and can be applied to test/production data
- Target encoding uses the mean of the target per category, with the global mean as a fallback for unseen categories
- Interactions are capped at top 10 numeric columns to avoid feature explosion
- Compatible with sklearn `Pipeline` and `ColumnTransformer`
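
The notes on fitting once, the global-mean fallback, and `Pipeline` compatibility can be illustrated with a small stand-in transformer. This is a minimal sketch, not scomp-link code: `MeanTargetEncoder`, the toy data, and the choice of `LinearRegression` as the downstream model are all hypothetical, and the sketch encodes a single named column rather than detecting high-cardinality columns automatically.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline


class MeanTargetEncoder(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in: replaces one categorical column with mean(target)."""

    def __init__(self, column):
        self.column = column

    def fit(self, X, y):
        # Align y with X's index so the groupby pairs rows correctly.
        y = pd.Series(np.asarray(y, dtype=float), index=X.index)
        self.global_mean_ = float(y.mean())               # fallback value
        self.means_ = y.groupby(X[self.column]).mean()    # per-category means
        return self

    def transform(self, X):
        X = X.copy()
        # Categories not seen during fit map to NaN, then to the global mean.
        X[self.column] = X[self.column].map(self.means_).fillna(self.global_mean_)
        return X


X_train = pd.DataFrame({
    "city": ["rome", "rome", "milan", "milan", "turin"],
    "size": [50.0, 70.0, 40.0, 60.0, 80.0],
})
y_train = [100.0, 140.0, 120.0, 160.0, 200.0]
X_test = pd.DataFrame({"city": ["milan", "naples"], "size": [55.0, 65.0]})

# The encoder is fitted once inside the pipeline and reused at predict time.
pipe = Pipeline([
    ("encode", MeanTargetEncoder("city")),
    ("model", LinearRegression()),
])
pipe.fit(X_train, y_train)
preds = pipe.predict(X_test)

# "naples" never appeared in training, so it receives the global mean.
encoded_test = pipe.named_steps["encode"].transform(X_test)
print(encoded_test)
```

Because the per-category means are stored at fit time, test and production rows are encoded with training statistics only, which is what keeps the transform consistent between `fit` and later `transform` calls.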