fix(STEF-3054): exclude __-prefixed columns from feature_names #870

Merged
MvLieshout merged 3 commits into release/v4.0.0 from fix/STEF-3054-exclude-internal-columns-from-feature-names on May 7, 2026

@MvLieshout
Collaborator

Problem

The OutlierHandler emits sentinel columns (e.g. __outlier_nan_load_lag_P7D__) when it replaces outlier values with NaN. These columns were included in TimeSeriesDataset.feature_names, so feature-aware transforms (e.g. Scaler) were fitted on them during training. At predict time, when no outliers are detected, the sentinel columns are absent and sklearn's feature-name validation raises:

ValueError: The feature names should match those that were passed during fit.
Feature names seen at fit time, yet now missing:
- __outlier_nan_load_lag_P7D__

Fix

TimeSeriesDataset now treats any column whose name starts with __ as an internal column (the same mechanism already used for the horizon/available_at columns). Internal columns are:

  • Kept in data so transforms can pass them through the pipeline
  • Excluded from feature_names so feature-aware transforms ignore them

This makes the sentinel-column approach work correctly: the sentinel columns flow through the pipeline untouched, are consumed by restore_target at the end, and never interfere with the Scaler or the model.
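
The idea can be sketched roughly as follows. `MiniDataset` is a hypothetical stand-in written for this illustration, not the actual TimeSeriesDataset implementation in openstef-core; only the `__` prefix convention and the `feature_names` / `_internal_columns` names come from this PR, and the column values are made up.

```python
import pandas as pd

INTERNAL_PREFIX = "__"  # prefix convention described in this PR


class MiniDataset:
    """Hypothetical stand-in for TimeSeriesDataset (not the openstef-core API)."""

    def __init__(self, data: pd.DataFrame) -> None:
        self.data = data
        # Any column whose name starts with "__" is treated as internal.
        self._internal_columns = {
            col for col in data.columns if col.startswith(INTERNAL_PREFIX)
        }

    @property
    def feature_names(self) -> list[str]:
        # Internal columns stay in `data` but are hidden from feature-aware transforms.
        return [col for col in self.data.columns if col not in self._internal_columns]


# Training data: the sentinel column is present because an outlier was detected.
train = MiniDataset(
    pd.DataFrame(
        {
            "load_lag_P7D": [1.0, float("nan"), 3.0],
            "temperature": [10.0, 11.0, 12.0],
            "__outlier_nan_load_lag_P7D__": [0.0, 1.0, 0.0],  # placeholder values
        }
    )
)

# Prediction data: no outliers, so no sentinel column.
predict = MiniDataset(pd.DataFrame({"load_lag_P7D": [2.0], "temperature": [9.0]}))

print(train.feature_names)
print(predict.feature_names)
```

With this split, a transform that selects `dataset.data[dataset.feature_names]` sees the same columns at fit and predict time, while the sentinel column remains available in `data` for a later step such as restore_target.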

Changes

  • openstef-core/timeseries_dataset.py: Initialize _internal_columns with any __-prefixed columns; also apply in the non-versioned branch
  • test_outlier_handler.py: Assert sentinel columns are excluded from feature_names

TimeSeriesDataset now treats columns starting with __ as internal columns,
excluded from feature_names. This prevents feature-aware transforms (e.g.
Scaler) from fitting on sentinel columns emitted by OutlierHandler, fixing
a production crash where the Scaler expected sentinel columns at predict
time that were only present during training when outliers were detected.

Signed-off-by: Marnix van Lieshout <marnix.van.lieshout@alliander.com>
@MvLieshout MvLieshout requested a review from a team May 6, 2026 18:34
@github-actions github-actions Bot added the fix Something isn't working label May 6, 2026
Collaborator

@egordm egordm left a comment

LGTM!

@sonarqubecloud
sonarqubecloud Bot commented May 7, 2026

@MvLieshout MvLieshout merged commit 5a46ff8 into release/v4.0.0 May 7, 2026
4 checks passed
@MvLieshout MvLieshout deleted the fix/STEF-3054-exclude-internal-columns-from-feature-names branch May 7, 2026 07:27