### Conceptual Overview of Sequential Feature Selection

- SFS starts with an empty set of selected features in forward selection or the full set in backward selection.
- In each iteration, it considers all remaining candidate features for addition (forward) or removal (backward).
- It fits a model with the current selected features plus one candidate feature (forward) or minus one feature (backward).
- The feature that yields the best improvement in the chosen scoring metric on validation (development) data is permanently added or removed.
- The process continues until a predetermined number of features is selected or until the best available change no longer improves the score by at least a set tolerance.
- This approach drastically reduces the search space compared to exhaustive search. For example, selecting 4 features from 55 by forward selection requires evaluating only 214 candidate models, versus 341,055 four-feature combinations for exhaustive search.
- However, it is not guaranteed to find the globally optimal subset, because each greedy choice is locked in and never revisited in later steps.
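
The greedy loop described above can be sketched from scratch. The version below is a minimal illustration, not scikit-learn's implementation: it assumes ordinary least squares as the model, a single hold-out development split instead of cross-validation, and synthetic data in which only two columns carry signal. The helper names `validation_mse` and `forward_select` are hypothetical.

```python
import numpy as np


def validation_mse(X_train, y_train, X_dev, y_dev, columns):
    """Fit ordinary least squares on the given columns and return the dev-set MSE."""
    # Add an intercept column so the fit is not forced through the origin.
    A_train = np.column_stack([np.ones(len(X_train)), X_train[:, columns]])
    A_dev = np.column_stack([np.ones(len(X_dev)), X_dev[:, columns]])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    residuals = A_dev @ coef - y_dev
    return float(np.mean(residuals ** 2))


def forward_select(X_train, y_train, X_dev, y_dev, n_features):
    """Greedy forward selection: add the single best feature at each step."""
    selected = []
    remaining = list(range(X_train.shape[1]))
    models_fitted = 0
    while len(selected) < n_features:
        scores = {}
        for candidate in remaining:
            scores[candidate] = validation_mse(
                X_train, y_train, X_dev, y_dev, selected + [candidate]
            )
            models_fitted += 1
        best = min(scores, key=scores.get)  # lowest dev MSE wins
        selected.append(best)               # the choice is never revisited
        remaining.remove(best)
    return selected, models_fitted


# Synthetic data (an assumption for illustration): only columns 0 and 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=400)

selected, models_fitted = forward_select(
    X[:300], y[:300], X[300:], y[300:], n_features=2
)
print("Selected columns:", selected)
print("Models fitted:", models_fitted)
```

Because a selected feature is never removed, a poor early pick cannot be undone later, which is why the result may differ from the best subset an exhaustive search would find.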

### SFS Computational Example

Given 55 features and a goal to choose 4 features:

- The first feature is chosen by testing 55 single-feature models.

- The second feature is chosen by testing 54 two-feature models, each adding a different unselected feature to the first.

- The third by testing 53 models.

- The fourth by testing 52 models.

- Total models fitted = 55 + 54 + 53 + 52 = 214.

- This is much more efficient than exhaustive search but can yield different feature sets on different data splits due to sampling variability and feature correlations.
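
The arithmetic above can be checked in a few lines. This is only a counting sketch; the variable names are illustrative.

```python
from math import comb

n_total, n_select = 55, 4

# Forward SFS evaluates one model per remaining candidate at each of the 4 steps.
sfs_models = sum(n_total - step for step in range(n_select))

# Exhaustive search evaluates one model per distinct 4-feature subset.
exhaustive_models = comb(n_total, n_select)

print(sfs_models)         # 214
print(exhaustive_models)  # 341055
```

These counts are candidate subsets. When each candidate is scored with k-fold cross-validation, the number of actual model fits is k times larger for both strategies.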

### Stability and Variants

- SFS results can vary between runs because of data sampling and correlated features (e.g., horsepower and weight are often correlated in vehicle data).

- Variants include backward elimination and mixed strategies that add/remove features iteratively.

- Feature selection can be guided not only by validation error but also by statistical tests or other criteria.

- Preprocessing to remove highly correlated features can improve stability.
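
One possible way to do the correlation preprocessing mentioned above is a simple pairwise filter. The sketch below is an assumption-laden illustration: the helper `drop_correlated` is hypothetical, the 0.9 threshold is arbitrary, and the `weight`, `horsepower`, and `doors` columns are synthetic stand-ins rather than real vehicle data.

```python
import numpy as np


def drop_correlated(X, threshold=0.9):
    """Return indices of columns to keep.

    A column is dropped when its absolute Pearson correlation with any
    previously kept column exceeds the threshold.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return keep


# Synthetic columns (assumed for illustration only).
rng = np.random.default_rng(0)
weight = rng.normal(size=500)
horsepower = weight + rng.normal(scale=0.1, size=500)  # nearly a copy of weight
doors = rng.normal(size=500)                           # unrelated column
X = np.column_stack([weight, horsepower, doors])

keep = drop_correlated(X, threshold=0.9)
print("Columns kept:", keep)
```

The filter keeps the first column of each correlated group, so the column order decides which of two near-duplicates survives. That is a design choice of this sketch, not a recommendation.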

### Scikit-learn Implementation Details

- The class `SequentialFeatureSelector` supports forward or backward selection with parameters:

  - `estimator`: the predictive model to evaluate feature subsets.
  - `n_features_to_select`: how many features to select, given as an integer count, a float fraction of the total, or `'auto'` (which stops according to `tol` when `tol` is set).
  - `direction`: `'forward'` or `'backward'`.
  - `scoring`: metric to evaluate model performance (e.g., `'neg_mean_squared_error'` for regression).
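  - `cv`: cross-validation splitting strategy, e.g., an integer number of folds.
  - `tol`: minimum score improvement required to continue adding/removing features; it takes effect only when `n_features_to_select='auto'`.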

- After fitting, use `.get_support()` to get a boolean mask of selected features.

- Use `.transform()` to reduce data to selected features for further modeling or evaluation.



```python
# Example Python Code: Forward Sequential Feature Selection

from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing
import numpy as np

# Load dataset
X, y = fetch_california_housing(return_X_y=True)

# Split data
X_train, X_dev, y_train, y_dev = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize model
model = LinearRegression()

# Setup sequential forward selection to select 4 features based on neg mean squared error
sfs = SequentialFeatureSelector(
    model,
    n_features_to_select=4,
    direction='forward',
    scoring='neg_mean_squared_error',
    cv=5,
)

# Fit SFS on training data
sfs.fit(X_train, y_train)

# Selected feature mask
selected_features = sfs.get_support()

# Transform data to selected features
X_train_selected = sfs.transform(X_train)
X_dev_selected = sfs.transform(X_dev)

# Fit model on selected features
model.fit(X_train_selected, y_train)

# Evaluate performance on dev set
mse_dev = np.mean((model.predict(X_dev_selected) - y_dev) ** 2)
print(f"Development set MSE with selected features: {mse_dev:.4f}")
```

```
Development set MSE with selected features: 0.5318
```


```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error
from sklearn import set_config
import pandas as pd

# Render the pipeline as a diagram when it is displayed in the notebook
set_config(display="diagram")

# Reload the data as a DataFrame so that feature names are preserved
housing = fetch_california_housing()
X = pd.DataFrame(housing.data, columns=housing.feature_names)
y = housing.target

# Split data (same split settings as the previous cell)
X_train, X_dev, y_train, y_dev = train_test_split(X, y, test_size=0.3, random_state=42)

# Expand all eight original features into degree-3 polynomial terms
poly_features = PolynomialFeatures(degree=3, include_bias=False)
X_train_poly = poly_features.fit_transform(X_train)
X_test_poly = poly_features.transform(X_dev)  # "test" here is the development split
columns = poly_features.get_feature_names_out()
train_df = pd.DataFrame(X_train_poly, columns=columns)
test_df = pd.DataFrame(X_test_poly, columns=columns)

# Reuse the `sfs` selector from the previous cell (forward, 4 features) as the
# first pipeline step; fitting the pipeline refits it on the polynomial features
pipe = Pipeline([('column_selector', sfs),
                 ('linreg', LinearRegression())])
pipe.fit(train_df, y_train)
train_preds = pipe.predict(train_df)
test_preds = pipe.predict(test_df)
train_mse = mean_squared_error(y_train, train_preds)
test_mse = mean_squared_error(y_dev, test_preds)

# Report errors and display the fitted pipeline
print(f'Train MSE: {train_mse: .2f}')
print(f'Test MSE: {test_mse: .2f}')
pipe
```

```
Train MSE:  0.54
Test MSE:  0.53
```


### Summary

Sequential Feature Selection in scikit-learn offers a simple, computationally efficient way to select predictive features: it iteratively fits models and adds or removes one feature at a time according to its contribution to a scoring metric. Although much faster than exhaustive search, SFS can produce different results across runs and data splits, especially when features are correlated. Applied with these caveats in mind, it can help build more generalizable and interpretable models with fewer features, as demonstrated by the Python examples above.
