This activity applies Sequential Feature Selection (SFS), Ridge regression, and LASSO regression to the scikit-learn Diabetes dataset using pipelines.

It uses PolynomialFeatures, StandardScaler, SequentialFeatureSelector, Lasso, Ridge, SelectFromModel, and LinearRegression, then compares the models' dev-set performance and the features that SFS and Lasso select.

In [17]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.feature_selection import SequentialFeatureSelector, SelectFromModel
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

In [18]:
# Load dataset (Diabetes)
data = load_diabetes()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# Split data into train and dev sets
X_train, X_dev, y_train, y_dev = train_test_split(X, y, test_size=0.2, random_state=42)

In [19]:
# 1. Sequential Feature Selector + Linear Regression Pipeline
pipe_sfs = Pipeline([
    ('poly', PolynomialFeatures(degree=2, include_bias=False)),
    ('scaler', StandardScaler()),
    ('sfs', SequentialFeatureSelector(LinearRegression(), n_features_to_select=10, direction='forward')),
    ('lr', LinearRegression())
])
pipe_sfs.fit(X_train, y_train)
y_pred_sfs = pipe_sfs.predict(X_dev)
mse_sfs = mean_squared_error(y_dev, y_pred_sfs)

# After fitting pipe_sfs
poly_feature_names = pipe_sfs.named_steps['poly'].get_feature_names_out(input_features=X.columns)

# Get boolean mask of selected features on polynomial features
mask = pipe_sfs.named_steps['sfs'].get_support()

# Select feature names corresponding to the mask
selected_features_sfs = poly_feature_names[mask]

pipe_sfs

In [20]:
# 2. Ridge Regression Pipeline with PolynomialFeatures and Scaling
pipe_ridge = Pipeline([
    ('poly', PolynomialFeatures(degree=2, include_bias=False)),
    ('scaler', StandardScaler()),
    ('ridge', Ridge(alpha=1.0))
])
pipe_ridge.fit(X_train, y_train)
y_pred_ridge = pipe_ridge.predict(X_dev)
mse_ridge = mean_squared_error(y_dev, y_pred_ridge)
pipe_ridge

In [21]:
# 3. Lasso Regression Pipeline with PolynomialFeatures and Scaling
#    (SelectFromModel is applied separately in the next cell)
pipe_lasso = Pipeline([
    ('poly', PolynomialFeatures(degree=2, include_bias=False)),
    ('scaler', StandardScaler()),
    ('lasso', Lasso(alpha=0.1, max_iter=10000)),
])
pipe_lasso.fit(X_train, y_train)
y_pred_lasso = pipe_lasso.predict(X_dev)
mse_lasso = mean_squared_error(y_dev, y_pred_lasso)
pipe_lasso

In [22]:
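# Using SelectFromModel with Lasso for explicit feature selection.
# Note: this fits a separate Lasso on the polynomial features *without* the
# StandardScaler step, so its selection can differ from the coefficients that
# pipe_lasso's own Lasso keeps on the scaled features.
# 'poly' was already fitted by pipe_lasso.fit, so transform() is enough here.
X_train_poly = pipe_lasso.named_steps['poly'].transform(X_train)
selector = SelectFromModel(Lasso(alpha=0.1, max_iter=10000))
selector.fit(X_train_poly, y_train)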
selected_features_mask = selector.get_support()

# Get polynomial feature names
poly_features = pipe_lasso.named_steps['poly'].get_feature_names_out(X.columns)
selected_features_lasso = poly_features[selected_features_mask]
selected_features_lasso

array(['sex', 'bmi', 'bp', 's1', 's3', 's5', 's6'], dtype=object)

In [23]:
# Display results
print("Sequential Feature Selection + Linear Regression:")
print(f"Development set MSE: {mse_sfs:.3f}")
print(f"Selected features (degree 2 poly expanded): {list(selected_features_sfs)}\n")

print("Ridge Regression:")
print(f"Development set MSE: {mse_ridge:.3f}\n")

print("LASSO Regression:")
print(f"Development set MSE: {mse_lasso:.3f}")
print(f"Number of selected features by LASSO: {len(selected_features_lasso)}")
print(f"Selected features by LASSO: {list(selected_features_lasso)}")

Sequential Feature Selection + Linear Regression:
Development set MSE: 2620.391
Selected features (degree 2 poly expanded): ['bmi', 'bp', 's3', 's5', 'age sex', 'sex^2', 'sex s4', 'bmi bp', 's3^2', 's3 s5']

Ridge Regression:
Development set MSE: 2883.363

LASSO Regression:
Development set MSE: 2761.385
Number of selected features by LASSO: 7
Selected features by LASSO: ['sex', 'bmi', 'bp', 's1', 's3', 's5', 's6']


### Explanation
- We use the Diabetes dataset as the use case.
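- Polynomial expansion of degree 2 adds squared terms and pairwise interactions to capture non-linearities.
- StandardScaler standardizes features to mean=0 and std=1, so the Ridge and Lasso penalties treat all features on the same scale.
- SequentialFeatureSelector (SFS) performs forward selection: it adds one polynomial feature at a time, keeping the one that most improves the cross-validated score of a linear regression, until a fixed number of features (10 here) is selected.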
- Ridge regression is fit with polynomial and scaled features using a fixed alpha.
- Lasso regression is fit likewise; Lasso inherently performs feature selection by pushing some coefficients to zero.
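- SelectFromModel wraps a second Lasso to list explicitly which polynomial features it retains. This selector is fit on the unscaled polynomial features, so its selection reflects that separate fit rather than the Lasso inside the pipeline, which is trained on scaled features.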
- The models' mean squared errors on the dev set are compared alongside the selected features from SFS and Lasso.

This end-to-end example illustrates how different feature selection and regularization methods can be implemented and compared systematically using scikit-learn pipelines.
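
One way to make the comparison less repetitive is to build every pipeline from a shared helper and collect the dev-set errors in a dictionary. This is only a sketch: `build_pipeline`, `candidates`, and `dev_mse` are hypothetical names introduced here, and only the Ridge and Lasso models are included to keep it short (an SFS pipeline could be evaluated in the same loop).

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_dev, y_train, y_dev = train_test_split(
    X, y, test_size=0.2, random_state=42
)


def build_pipeline(model):
    """Hypothetical helper: shared preprocessing followed by the given model."""
    return Pipeline([
        ("poly", PolynomialFeatures(degree=2, include_bias=False)),
        ("scaler", StandardScaler()),
        ("model", model),
    ])


# Same hyperparameters as the pipelines in the activity
candidates = {
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1, max_iter=10000),
}

dev_mse = {}
for name, model in candidates.items():
    pipe = build_pipeline(model).fit(X_train, y_train)
    dev_mse[name] = mean_squared_error(y_dev, pipe.predict(X_dev))

# Report models from lowest to highest dev-set MSE
for name, mse in sorted(dev_mse.items(), key=lambda kv: kv[1]):
    print(f"{name}: dev MSE = {mse:.3f}")
```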

If you want to tune parameters like alpha or the number of selected features, you can integrate GridSearchCV for hyperparameter optimization. This example can be the basis for experimenting with those extensions.
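
As a starting point for that extension, the sketch below tunes the Lasso `alpha` with GridSearchCV using 5-fold cross-validation on the training set. The grid values are arbitrary assumptions chosen for illustration, not recommendations from the activity.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_dev, y_train, y_dev = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipe = Pipeline([
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
    ("scaler", StandardScaler()),
    ("lasso", Lasso(max_iter=10000)),
])

# Step parameters are addressed as "<step name>__<parameter>"
param_grid = {"lasso__alpha": [0.01, 0.1, 1.0, 10.0]}  # illustrative values

search = GridSearchCV(
    pipe,
    param_grid,
    scoring="neg_mean_squared_error",  # higher is better, hence the negation
    cv=5,
)
search.fit(X_train, y_train)

best_alpha = search.best_params_["lasso__alpha"]
# search.predict uses the best pipeline refit on the full training set
dev_mse = mean_squared_error(y_dev, search.predict(X_dev))

print(f"Best alpha: {best_alpha}")
print(f"Development set MSE with best alpha: {dev_mse:.3f}")
```

The same pattern applies to other steps: for the SFS pipeline, a grid key such as `sfs__n_features_to_select` would vary the number of selected features, at the cost of a much longer search.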
