Filter Method:
The filter method selects features based on their statistical properties, without training any machine learning model. It scores each feature's relevance to the target variable with a statistical measure and keeps the highest-scoring features.

How it Works:
Compute Statistical Metrics: For each feature, compute a statistical metric that measures the relationship between the feature and the target variable. Common metrics include correlation coefficients, mutual information, Chi-square scores, and ANOVA F-values.
Rank Features: Rank the features based on their scores from the statistical metric.
Select Features: Choose the top-ranked features according to a predetermined threshold or a desired number of features.
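
The three steps above can be sketched by hand with pandas. This is only a minimal illustration, assuming absolute Pearson correlation as the scoring metric. The data, the column names (`hours_studied`, `sleep_hours`, `noise`) and the helper `filter_by_correlation` are hypothetical and chosen purely for the example.

```python
import pandas as pd

def filter_by_correlation(X, y, k):
    """Return the k features most correlated with y (hypothetical helper)."""
    # Step 1: score each feature independently against the target
    scores = X.apply(lambda col: abs(col.corr(y)))
    # Step 2: rank the features from highest to lowest score
    ranked = scores.sort_values(ascending=False)
    # Step 3: keep the top k
    return list(ranked.index[:k]), ranked

# Made-up data for illustration only
X = pd.DataFrame({
    'hours_studied': [1, 2, 3, 4, 5, 6],
    'sleep_hours':   [8, 7, 7, 6, 5, 5],
    'noise':         [3, 1, 3, 1, 3, 1],
})
y = pd.Series([52, 55, 61, 68, 74, 80], name='exam_score')

selected, ranked = filter_by_correlation(X, y, k=2)
print(ranked)
print("Selected:", selected)
```

No model is trained at any point: each column is scored on its own, which is what makes the approach cheap.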

In [1]:
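import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Toy dataset: three numeric features and a binary target
X = pd.DataFrame({
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [10, 20, 30, 40, 50],
    'feature3': [5, 4, 3, 2, 1]
})
y = [0, 1, 0, 1, 0]

# Score each feature with the ANOVA F-test and keep the 2 highest-scoring ones.
# Note: both classes have the same mean for every feature here, so all F-scores
# are 0 and the two columns returned are only a tie-break, not a real ranking.
selector = SelectKBest(score_func=f_classif, k=2)
X_new = selector.fit_transform(X, y)

print(X_new)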


[[10  5]
 [20  4]
 [30  3]
 [40  2]
 [50  1]]


Wrapper Method:
The wrapper method involves selecting features based on their performance in a specific machine learning model. It evaluates subsets of features by training and testing a model on them and selecting the subset that yields the best performance.
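
The most literal form of this idea is an exhaustive search that scores every feature subset with the model. The sketch below assumes synthetic data from `make_classification`, logistic regression as the model, and 5-fold cross-validated accuracy as the performance measure. All three are illustrative choices, not requirements of the wrapper method.

```python
from itertools import combinations

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data (an assumption for illustration): 5 features, 2 informative
X, y = make_classification(
    n_samples=200, n_features=5, n_informative=2, n_redundant=0,
    n_clusters_per_class=1, random_state=0
)

model = LogisticRegression(max_iter=1000)
best_score, best_subset = -1.0, None

# Try every non-empty subset of columns and keep the best cross-validated one
for size in range(1, X.shape[1] + 1):
    for subset in combinations(range(X.shape[1]), size):
        score = cross_val_score(model, X[:, list(subset)], y, cv=5).mean()
        if score > best_score:
            best_score, best_subset = score, subset

print("Best subset (column indices):", best_subset)
print("Mean CV accuracy:", round(best_score, 3))
```

With n features there are 2^n - 1 non-empty subsets, so this brute-force search is only practical for a handful of features. Greedy strategies such as recursive feature elimination trade completeness for speed.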

Differences:
Model Involvement: The filter method is model-agnostic and purely statistical, while the wrapper method involves training a machine learning model.
Evaluation: The filter method uses statistical measures to evaluate features, whereas the wrapper method evaluates the performance of feature subsets using model accuracy, precision, recall, or other relevant metrics.
Computational Cost: The wrapper method is typically more computationally intensive than the filter method because it involves training and evaluating multiple models.

In [2]:
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X = pd.DataFrame({
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [10, 20, 30, 40, 50],
    'feature3': [5, 4, 3, 2, 1]
})
y = [0, 1, 0, 1, 0]

# RFE repeatedly fits the model and drops the weakest feature until 2 remain
model = LogisticRegression()
rfe = RFE(model, n_features_to_select=2)
X_new = rfe.fit_transform(X, y)

print(X_new)

[[10  5]
 [20  4]
 [30  3]
 [40  2]
 [50  1]]


Embedded Feature Selection Methods:
Embedded methods perform feature selection during the model training process and are integrated with specific learning algorithms.

Common Techniques:
Lasso Regression (L1 Regularization): Adds a penalty proportional to the sum of the absolute values of the coefficients, which shrinks some coefficients to exactly zero and thereby removes those features.
Ridge Regression (L2 Regularization): Adds a penalty proportional to the sum of the squared coefficients. It shrinks coefficients toward zero but does not set any of them exactly to zero, so on its own it reduces a feature's influence rather than removing the feature.
Tree-Based Methods (e.g., Random Forest, Gradient Boosting): Provide feature importance scores based on how much each feature improves the purity of the splits.
Elastic Net: Combines L1 and L2 regularization, so it can still zero out coefficients like Lasso while handling correlated features more stably.
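
The contrast between L1 and L2 penalties can be seen on a small synthetic problem. This is a sketch under stated assumptions: the data comes from `make_regression` with 2 informative features out of 6, and the `alpha` values are arbitrary rather than tuned.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (assumed for illustration): 6 features, only 2 drive the target
X, y = make_regression(
    n_samples=200, n_features=6, n_informative=2,
    noise=5.0, random_state=0
)

# alpha values are arbitrary choices for this sketch, not tuned
lasso = Lasso(alpha=5.0).fit(X, y)
ridge = Ridge(alpha=5.0).fit(X, y)

print("Lasso coefficients:", np.round(lasso.coef_, 2))
print("Ridge coefficients:", np.round(ridge.coef_, 2))

# Features whose Lasso coefficient is non-zero are the ones "selected"
selected = np.flatnonzero(lasso.coef_)
print("Columns kept by Lasso:", selected)
```

Lasso performs the selection as a side effect of fitting, whereas the Ridge coefficients stay non-zero and would need a separate threshold to be used for selection.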

Drawbacks of the Filter Method:
Ignores Feature Interactions: The filter method evaluates each feature independently, ignoring potential interactions between features.
Model-Agnostic: Since it does not involve a specific machine learning model, it may not select the features that work best for a particular model.
Simplicity: The simplicity of the statistical measures used might lead to suboptimal feature selection compared to more complex, model-based methods.
Potential for Overlooked Features: Relevant features that do not show strong individual correlation with the target variable might be overlooked.
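
The interaction problem is easiest to see with an XOR-style target. In the sketch below (synthetic data, assumed only for illustration) each feature on its own is almost uncorrelated with the target, yet the two features together determine it completely. A decision tree stands in for "a model that can use both features".

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Two binary features; the target is their XOR (synthetic, for illustration)
x1 = rng.integers(0, 2, size=1000)
x2 = rng.integers(0, 2, size=1000)
y = x1 ^ x2
X = np.column_stack([x1, x2])

# Filter view: each feature scored on its own looks nearly useless
corr_x1 = abs(np.corrcoef(x1, y)[0, 1])
corr_x2 = abs(np.corrcoef(x2, y)[0, 1])
print("abs corr(x1, y):", round(corr_x1, 3))
print("abs corr(x2, y):", round(corr_x2, 3))

# Model view: together the two features determine the target
tree_acc = cross_val_score(
    DecisionTreeClassifier(random_state=0), X, y, cv=5
).mean()
print("Tree CV accuracy with both features:", round(tree_acc, 3))
```

A correlation-based filter would rank both features near the bottom here, even though dropping either one makes the target unpredictable.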

Situations for Using the Filter Method:
Large Datasets: When computational efficiency is a priority on large datasets, the filter method is faster and cheaper than wrapper methods because it never trains a model.
Initial Feature Reduction: As a first step to quickly reduce the dimensionality before applying more sophisticated feature selection methods.
High-Dimensional Data: In cases of high-dimensional data (many features), where the computational cost of wrapper methods would be prohibitive.
Model-Agnostic Requirement: When the selected features must not depend on a specific machine learning model, so that the same subset can be reused across different models.
Exploratory Data Analysis: For preliminary analysis to understand the relationships between features and the target variable without training multiple models.
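
The "initial feature reduction" case can be sketched as a two-stage pipeline: a cheap filter first, then a wrapper on the survivors. The data is synthetic, and the stage sizes (50 to 15 to 5 features) and the step names `filter`, `wrapper` and `clf` are arbitrary assumptions for this example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic high-dimensional data (assumed for illustration)
X, y = make_classification(
    n_samples=300, n_features=50, n_informative=5,
    n_redundant=0, random_state=0
)

pipeline = Pipeline([
    # Cheap filter pass: 50 -> 15 features using ANOVA F-scores
    ("filter", SelectKBest(score_func=f_classif, k=15)),
    # Costlier wrapper pass on the survivors: 15 -> 5 features
    ("wrapper", RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Selection is refit inside each CV fold, so held-out data never influences it
scores = cross_val_score(pipeline, X, y, cv=5)
print("Mean CV accuracy:", round(scores.mean(), 3))

# Fit once on all data to inspect how many features survive each stage
pipeline.fit(X, y)
n_after_filter = pipeline.named_steps["filter"].get_support().sum()
n_after_wrapper = pipeline.named_steps["wrapper"].support_.sum()
print("After filter:", n_after_filter, "| after wrapper:", n_after_wrapper)
```

Running the wrapper on 15 columns instead of 50 cuts the number of model fits RFE has to perform, which is the practical payoff of filtering first.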

In [3]:
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Toy housing data: only 5 rows, so the evaluation below is purely illustrative
data = {
    'size': [1500, 1800, 2400, 3000, 3500],
    'location': [5, 3, 4, 2, 1],
    'age': [10, 5, 15, 20, 5],
    'price': [300000, 350000, 400000, 450000, 500000]
}
df = pd.DataFrame(data)

# Separate the features from the target
X = df.drop('price', axis=1)
y = df['price']

# With 5 rows, test_size=0.2 holds out a single sample
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Wrapper selection: RFE repeatedly fits the model and drops the weakest feature
model = LinearRegression()
selector = RFE(estimator=model, n_features_to_select=2)
selector = selector.fit(X_train, y_train)
selected_features = X_train.columns[selector.support_]
print("Selected features:", selected_features)

# Refit the model on the selected features only
model.fit(X_train[selected_features], y_train)

y_pred_train = model.predict(X_train[selected_features])
y_pred_test = model.predict(X_test[selected_features])

# Compare training error with error on the held-out sample
train_mse = mean_squared_error(y_train, y_pred_train)
test_mse = mean_squared_error(y_test, y_pred_test)

print("Train MSE:", train_mse)
print("Test MSE:", test_mse)

Selected features: Index(['size', 'location'], dtype='object')
Train MSE: 8.470329472543003e-22
Test MSE: 1406250000.0
