Feature selection is the process of choosing the features (columns) in your dataset that contribute most to predicting the target variable, and dropping the rest.


Why and When to Use Feature Selection

Types of Feature Selection

Filter Methods (e.g., correlation)

Wrapper Methods (e.g., RFE)

Embedded Methods (e.g., feature importance from model)

Practical Code Using Feature Selection Techniques


Tools:
SelectKBest with f_regression

RFE (Recursive Feature Elimination)

Feature Importance from:

DecisionTree

RandomForest

XGBoost

In [2]:
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.datasets import fetch_california_housing  # California housing regression dataset
from sklearn.model_selection import train_test_split
import pandas as pd


In [4]:
data = fetch_california_housing()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [6]:
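# SelectKBest with f_regression: keep the 5 features with the highest univariate F-scores.
# The selector is fitted on the training split only, then applied unchanged to the test split.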
selector = SelectKBest(score_func=f_regression, k=5)
X_train_selected = selector.fit_transform(X_train, y_train)
X_test_selected = selector.transform(X_test)
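
SelectKBest only returns the columns it kept, so it can help to look at the scores behind that choice. The sketch below is one way to do it, using the fitted selector's `scores_` and `pvalues_` attributes. To stay runnable offline it uses a synthetic dataset from `make_regression` with made-up column names `f0` to `f7` (an assumption for illustration, not the notebook's data); in the notebook you would pass `X_train` and `y_train` instead.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic stand-in data (assumption) so the sketch runs without a download;
# in the notebook, use X_train and y_train instead.
X_arr, y_demo = make_regression(
    n_samples=200, n_features=8, n_informative=3, noise=10.0, random_state=0
)
X_demo = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])

selector_demo = SelectKBest(score_func=f_regression, k=5).fit(X_demo, y_demo)

# One row per feature, sorted from highest to lowest F-score
score_table = (
    pd.DataFrame(
        {
            "feature": X_demo.columns,
            "f_score": selector_demo.scores_,
            "p_value": selector_demo.pvalues_,
        }
    )
    .sort_values("f_score", ascending=False)
    .reset_index(drop=True)
)
print(score_table)
```

The first `k` rows of the table should match the columns reported by `get_support()`. Note that `f_regression` scores each feature on its own, so it measures linear association with the target and does not account for interactions between features.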


In [7]:
# get_support() returns a boolean mask over the input columns (True = kept)
selected_features = X_train.columns[selector.get_support()]
print("Top 5 selected features:", selected_features.tolist())

Top 5 selected features: ['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Latitude']
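
RFE (Recursive Feature Elimination) is listed under Tools as the wrapper method. A minimal sketch of how it could be used is shown here: it repeatedly fits an estimator and drops the weakest feature until the requested number remains. The choice of `LinearRegression` as the estimator and the synthetic data with column names `f0` to `f7` are assumptions made so the block runs on its own; in the notebook you would fit on `X_train` and `y_train`.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (assumption); replace with X_train, y_train in the notebook.
X_arr, y_demo = make_regression(
    n_samples=200, n_features=8, n_informative=3, noise=10.0, random_state=0
)
X_demo = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])

# Drop one feature per iteration (step=1) until 5 remain
rfe = RFE(estimator=LinearRegression(), n_features_to_select=5, step=1)
rfe.fit(X_demo, y_demo)

rfe_features = X_demo.columns[rfe.support_].tolist()
# ranking_ is 1 for kept features; larger numbers were eliminated earlier
rfe_ranking = dict(zip(X_demo.columns, rfe.ranking_))

print("RFE selected features:", rfe_features)
print("RFE ranking:", rfe_ranking)
```

With a linear estimator, RFE ranks features by coefficient magnitude, so features on very different scales should usually be standardized first. Unlike SelectKBest, RFE refits the model at every step, which makes it slower but lets it account for how features behave together.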

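The Tools list also mentions feature importance from tree-based models, which is the embedded approach. The following sketch shows one way to read those importances from a DecisionTree and a RandomForest through the `feature_importances_` attribute. The synthetic data, the column names `f0` to `f7`, and the hyperparameters are assumptions for illustration only. XGBoost's scikit-learn wrapper exposes the same attribute, but it is left out here so the block needs only scikit-learn.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption); replace with X_train, y_train in the notebook.
X_arr, y_demo = make_regression(
    n_samples=200, n_features=8, n_informative=3, noise=10.0, random_state=0
)
X_demo = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])

models = {
    "DecisionTree": DecisionTreeRegressor(random_state=0),
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# One column of impurity-based importances per model, indexed by feature name
importances = pd.DataFrame(
    {name: model.fit(X_demo, y_demo).feature_importances_ for name, model in models.items()},
    index=X_demo.columns,
)

# Top 5 features according to the random forest
top5_rf = importances["RandomForest"].nlargest(5).index.tolist()

print(importances.sort_values("RandomForest", ascending=False))
print("Top 5 by RandomForest importance:", top5_rf)
```

Each model's importances sum to 1, so the columns can be compared directly. Impurity-based importances are computed on the training data and can favor features with many distinct values, so it is worth treating them as a rough guide and not a final ranking.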