
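**Filter Methods: Using Mutual Information**
Filter methods score each feature against the target variable independently of any model and keep the highest-scoring ones. Correlation is one common score; the example below uses mutual information, computed with scikit-learn's `mutual_info_classif`, which can also capture non-linear dependence. Features with a score above zero are selected.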

In [18]:
from sklearn.feature_selection import mutual_info_classif
import pandas as pd

# Example data
data = pd.DataFrame({
    'Feature1': [10, 20, 30, 40, 50],
    'Feature2': [5, 10, 15, 20, 25],
    'Feature3': [15, 25, 35, 45, 55],
    'Target': [1, 0, 1, 0, 1]
})

# Calculate mutual information
X = data.drop(columns='Target')
y = data['Target']
mutual_info = mutual_info_classif(X, y)

# Show mutual information values
mi_df = pd.DataFrame({'Feature': X.columns, 'Mutual Information': mutual_info})
selected_features = mi_df[mi_df['Mutual Information'] > 0]['Feature'].tolist()

print("Mutual Information values:\n", mi_df)
print("Selected Features:", selected_features)

Mutual Information values:
     Feature  Mutual Information
0  Feature1                   0
1  Feature2                   0
2  Feature3                   0
Selected Features: []
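
Pearson correlation is another common filter score. The following is a minimal sketch of a correlation-based filter on the same toy data, assuming pandas' `DataFrame.corrwith` for the score. The cutoff of 0.1 is a hypothetical value chosen for illustration, not one taken from the notebook.

```python
import pandas as pd

# Same toy data as in the cell above
data = pd.DataFrame({
    'Feature1': [10, 20, 30, 40, 50],
    'Feature2': [5, 10, 15, 20, 25],
    'Feature3': [15, 25, 35, 45, 55],
    'Target': [1, 0, 1, 0, 1]
})

X = data.drop(columns='Target')
y = data['Target']

# Pearson correlation of each feature with the target
corr_with_target = X.corrwith(y)

# Hypothetical cutoff: keep features whose absolute correlation exceeds 0.1
threshold = 0.1
selected_by_corr = corr_with_target[corr_with_target.abs() > threshold].index.tolist()

print("Correlation with target:\n", corr_with_target)
print("Selected Features:", selected_by_corr)
```

On this toy data the three features are exact linear functions of one another and the target alternates between 1 and 0, so every correlation comes out at essentially zero and nothing passes the cutoff. This is consistent with the mutual information result above.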


**Wrapper Method (Recursive Feature Elimination)**
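As shown earlier, Recursive Feature Elimination (RFE) searches for a good subset of features by repeatedly fitting a model, ranking the features by the model's coefficients or importances, and removing the weakest one until the requested number of features remains.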

In [19]:
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
rfe = RFE(model, n_features_to_select=2)
rfe.fit(data.drop(columns='Target'), data['Target'])

selected_features = data.drop(columns='Target').columns[rfe.support_].tolist()
print("Selected Features with RFE:", selected_features)


Selected Features with RFE: ['Feature1', 'Feature3']
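
To make the elimination loop concrete, here is a rough sketch that performs it by hand and compares the outcome with scikit-learn's `RFE`. The manual loop is an illustration only: it assumes one feature is dropped per round and that the absolute logistic-regression coefficient is used as the ranking criterion. The names `remaining` and `weakest` are introduced here for the example.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Same toy data as in the earlier cells
data = pd.DataFrame({
    'Feature1': [10, 20, 30, 40, 50],
    'Feature2': [5, 10, 15, 20, 25],
    'Feature3': [15, 25, 35, 45, 55],
    'Target': [1, 0, 1, 0, 1]
})
X = data.drop(columns='Target')
y = data['Target']

# Manual elimination: refit, drop the feature with the smallest |coefficient|
remaining = list(X.columns)
n_features_to_select = 2
while len(remaining) > n_features_to_select:
    model = LogisticRegression().fit(X[remaining], y)
    weakest = remaining[int(np.argmin(np.abs(model.coef_[0])))]
    remaining.remove(weakest)

# scikit-learn's RFE for comparison; ranking_ is 1 for every kept feature
rfe = RFE(LogisticRegression(), n_features_to_select=2).fit(X, y)
rfe_selected = X.columns[rfe.support_].tolist()

print("Manual elimination keeps:", remaining)
print("RFE keeps:", rfe_selected)
print("RFE ranking:", dict(zip(X.columns, rfe.ranking_)))
```

Note that neither version measures predictive performance when deciding what to drop. If the number of features should instead be chosen by cross-validated score, scikit-learn also provides `RFECV`.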


**Tree-Based Feature Importance**
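Tree-based models such as decision trees and random forests expose impurity-based feature importances through the `feature_importances_` attribute, which can be used to rank features for selection. Because no `random_state` is set in the cell below, the exact importance values vary from run to run, and a cutoff of zero keeps every feature that was used in at least one split.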

In [20]:
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit(data.drop(columns='Target'), data['Target'])

feature_importances = model.feature_importances_
fi_df = pd.DataFrame({'Feature': data.drop(columns='Target').columns, 'Importance': feature_importances})
selected_features = fi_df[fi_df['Importance'] > 0]['Feature'].tolist()

print("Feature Importances:\n", fi_df)
print("Selected Features:", selected_features)


Feature Importances:
     Feature  Importance
0  Feature1    0.342643
1  Feature2    0.340965
2  Feature3    0.316392
Selected Features: ['Feature1', 'Feature2', 'Feature3']
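
A cutoff of zero keeps all three features here, so a stricter threshold is usually needed for the importances to select anything. One possible approach, sketched below, is scikit-learn's `SelectFromModel` with `threshold="mean"`, which keeps only features whose importance is at least the average importance. The fixed `random_state=0` and the choice of the mean as cutoff are assumptions made for this example.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Same toy data as in the earlier cells
data = pd.DataFrame({
    'Feature1': [10, 20, 30, 40, 50],
    'Feature2': [5, 10, 15, 20, 25],
    'Feature3': [15, 25, 35, 45, 55],
    'Target': [1, 0, 1, 0, 1]
})
X = data.drop(columns='Target')
y = data['Target']

# random_state is fixed (an assumption) so that repeated runs agree
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# Keep features whose importance is at least the mean importance
selector = SelectFromModel(forest, threshold="mean").fit(X, y)

importances = selector.estimator_.feature_importances_
selected_by_forest = X.columns[selector.get_support()].tolist()

print("Feature Importances:\n", pd.DataFrame({'Feature': X.columns, 'Importance': importances}))
print("Selected Features:", selected_by_forest)
```

With only five rows and perfectly collinear features, the importances are close to one another and mostly reflect randomness in the forest, so the selection on this toy data should not be read as meaningful.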
