<h1 style=color:orange> Feature Selection Methods </h1>

### <code>1. Chi-squared</code>
### <code>2. Recursive Feature Elimination (RFE)</code>
### <code>3. LASSO (Least Absolute Shrinkage and Selection Operator)</code>

In [42]:
import pandas as pd
import numpy as np

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

from sklearn.feature_selection import SelectKBest, chi2, RFE, SelectFromModel

In [43]:
np.random.seed(42)  # fix the seed so the same random values are generated on every run

# Create a dictionary of synthetic features and a binary target
data = {
    'Feature1': np.random.uniform(-10, 10, 100),
    'Feature2': np.random.randint(1,500,100),
    'Feature3': np.random.uniform(-20, 20, 100),
    'Feature4': np.random.uniform(1, 50, 100),
    'Feature5': np.random.uniform(10, 150, 100),
    'Target': np.random.randint(0, 2, 100)
}

df = pd.DataFrame(data)  # convert the dictionary into a DataFrame
df.head()

Unnamed: 0,Feature1,Feature2,Feature3,Feature4,Feature5,Target
0,-2.509198,63,1.237383,14.155268,105.754928,1
1,9.014286,352,-2.088673,24.778714,96.219163,1
2,4.639879,231,2.115724,19.261656,142.144825,0
3,1.97317,241,3.707869,20.339882,142.195224,0
4,-6.879627,52,-16.765867,42.366444,131.407851,1


<h4>NOTE: </h4>

* `chi2` requires non-negative feature values and raises an error otherwise.<br>
* Columns that contain negative values are therefore shifted so that their minimum becomes 0.
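
As a small illustration of the note above, the sketch below checks that `chi2` rejects negative input and then rescales the data into `[0, 1]` with `MinMaxScaler`. The names `X_demo`, `y_demo`, and `X_scaled` are made up for this example, and `MinMaxScaler` is only one possible alternative to the manual shift used in this notebook.

```python
import numpy as np
from sklearn.feature_selection import chi2
from sklearn.preprocessing import MinMaxScaler

# Hypothetical demo data: three features that contain negative values
rng = np.random.default_rng(0)
X_demo = rng.uniform(-10, 10, size=(50, 3))
y_demo = rng.integers(0, 2, size=50)

# chi2 refuses input with negative entries
try:
    chi2(X_demo, y_demo)
    raised = False
except ValueError:
    raised = True

# Rescaling every column into [0, 1] makes the data acceptable to chi2
X_scaled = MinMaxScaler().fit_transform(X_demo)
scores, p_values = chi2(X_scaled, y_demo)
print("chi2 raised on negative input:", raised)
print("scores after scaling:", scores)
```

Unlike a plain shift, rescaling also changes the spread of each column, so the resulting chi-squared scores may differ from those of the shifted data.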

In [44]:
X = df.drop(columns=['Target'])
y = df['Target']

X_change = X.apply(lambda x: x - x.min() if x.min() < 0 else x)  # shift columns with negative values so their minimum is 0

In [45]:
X_change.head()  # all values are now non-negative

Unnamed: 0,Feature1,Feature2,Feature3,Feature4,Feature5
0,7.38036,63,20.514369,14.155268,105.754928
1,18.903844,352,17.188312,24.778714,96.219163
2,14.529436,231,21.392709,19.261656,142.144825
3,11.862727,241,22.984854,20.339882,142.195224
4,3.00993,52,2.511119,42.366444,131.407851


<h2 style=color:green> SelectKBest with chi-squared </h2>

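**Description:** `SelectKBest` scores each feature with the chi-squared statistic between that feature and the target, then keeps the `k` highest-scoring features.<br>

**Use Case:** Classification problems with non-negative features such as counts, frequencies, or one-hot encoded categories. The continuous features in this notebook are shifted to be non-negative only so that `chi2` accepts them, so treat the resulting ranking as a demonstration of the API.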
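
To see what drives the selection, it can help to look at the scores themselves. The sketch below is a minimal example on made-up count data (the column names `informative`, `noise_a`, and `noise_b` are hypothetical), reading the `scores_` and `pvalues_` attributes of a fitted `SelectKBest`.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical count data: one column depends on the class, two do not
rng = np.random.default_rng(0)
n = 200
y_demo = rng.integers(0, 2, size=n)
X_demo = pd.DataFrame({
    "informative": rng.poisson(lam=2 + 6 * y_demo),  # mean 2 for class 0, mean 8 for class 1
    "noise_a": rng.poisson(lam=5, size=n),
    "noise_b": rng.poisson(lam=5, size=n),
})

selector = SelectKBest(chi2, k=1).fit(X_demo, y_demo)

# One chi-squared score and p-value per input column
report = pd.DataFrame(
    {"chi2": selector.scores_, "p_value": selector.pvalues_},
    index=X_demo.columns,
)
selected = list(X_demo.columns[selector.get_support()])
print(report)
print("selected:", selected)
```

A higher score (and a lower p-value) indicates a stronger dependence between the feature and the target under the chi-squared test.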

In [46]:
sel_feature = SelectKBest(chi2, k=3)  # k is the number of features to keep
sel_feature.fit(X_change, y)

# Get the selected feature names
# get_support() returns a boolean mask; with indices=True it returns integer column indices instead
selected_features = X.columns[sel_feature.get_support(indices=True)]

# Create a DataFrame with the selected features (original, unshifted values)
Update_X = df[selected_features]

print("Selected features using SelectKBest (chi2):")
Update_X.head()

Selected features using SelectKBest (chi2):


Unnamed: 0,Feature1,Feature3,Feature4
0,-2.509198,1.237383,14.155268
1,9.014286,-2.088673,24.778714
2,4.639879,2.115724,19.261656
3,1.97317,3.707869,20.339882
4,-6.879627,-16.765867,42.366444


<h2 style=color:green> Recursive Feature Elimination (RFE) </h2>

**Description:** This method repeatedly fits a model and removes the least important features, judged by the model's coefficients or feature importances, until the requested number of features remains.<br>

**Use Case:** Works with any estimator that exposes `coef_` or `feature_importances_` after fitting. Suitable for both regression and classification problems.
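
The elimination order can be inspected through the `ranking_` attribute of a fitted `RFE`. The sketch below is a minimal example on synthetic data from `make_classification`; the names `X_demo`, `y_demo`, and `rfe` are chosen for this example, and `LogisticRegression` is used only because it fits quickly.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Hypothetical demo data: 6 features, 2 of them informative
X_demo, y_demo = make_classification(
    n_samples=200, n_features=6, n_informative=2, n_redundant=0, random_state=0
)

# Remove one feature per iteration (step=1) until 2 remain
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=2, step=1)
rfe.fit(X_demo, y_demo)

print("support_ (boolean mask):", rfe.support_)
print("ranking_ (1 = selected):", rfe.ranking_)
```

Selected features have rank 1. Among the eliminated ones, a larger rank means the feature was dropped earlier.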

In [47]:
# Recursive Feature Elimination (RFE) using a Random Forest Classifier
RanFor = RandomForestClassifier(random_state=42)
# Alternative estimator: logReg = LogisticRegression(solver='liblinear')
# For large datasets, try 'saga', 'sag', or 'lbfgs' instead of 'liblinear'

RanFor_Eli = RFE(estimator=RanFor)  # n_features_to_select is optional; by default RFE keeps half of the features
RanFor_Eli.fit(X, y)

# Get the selected feature names
# get_support(indices=True) returns integer column indices;
# the boolean mask is also available as the support_ attribute
selected_features = X.columns[RanFor_Eli.get_support(indices=True)]

# DataFrame with selected features using RFE
Update_X = df[selected_features]

# Display the selected features
print("\nSelected features using RFE (RandomForestClassifier):")
Update_X.head()


Selected features using RFE (RandomForestClassifier):


Unnamed: 0,Feature1,Feature3
0,-2.509198,1.237383
1,9.014286,-2.088673
2,4.639879,2.115724
3,1.97317,3.707869
4,-6.879627,-16.765867


<h2 style=color:green> LASSO (Least Absolute Shrinkage and Selection Operator) </h2>

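**Description:** LASSO adds an L1 penalty to a linear model, which can shrink some coefficients to exactly zero and thereby performs feature selection. Because the target here is a class label, the code below uses logistic regression with an L1 penalty rather than the `Lasso` regressor.

**Use Case:** Suitable for both regression (`Lasso`) and classification (L1-penalized `LogisticRegression`) problems. Works with continuous and encoded categorical data. Features should usually be scaled first, because the penalty acts on the magnitude of the coefficients.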
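
How many features survive depends on the regularization strength. The sketch below is a minimal example on synthetic data from `make_classification`, with a `StandardScaler` added as an assumption of this example (it is not part of the notebook's own cell). It counts the non-zero coefficients for a few values of `C`; the names `X_demo`, `y_demo`, and `kept` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical demo data: 8 features, 2 of them informative
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=2, n_redundant=0, random_state=0
)

kept = {}
for C in (0.001, 0.05, 1.0):
    # Smaller C means a stronger L1 penalty
    pipe = make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="l1", solver="liblinear", C=C),
    )
    pipe.fit(X_demo, y_demo)
    coef = pipe[-1].coef_.ravel()
    kept[C] = int(np.sum(coef != 0))  # features with a non-zero coefficient

print(kept)
```

With a very small `C` the penalty dominates and every coefficient is driven to zero, while larger values of `C` let more features keep a non-zero weight.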

In [48]:
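# L1-penalized logistic regression; a smaller C means stronger regularization
# Note: the features are not scaled here, and the L1 penalty is sensitive to feature scale
lasso = LogisticRegression(penalty='l1', solver='liblinear', C=0.1)
lasso.fit(X, y)
model = SelectFromModel(lasso, prefit=True)  # prefit=True reuses the already fitted model

# Features whose coefficients were not shrunk to (near) zero
selected_features_lasso = X.columns[model.get_support(indices=True)]

print("Selected features by Lasso:")
df[selected_features_lasso]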

Selected features by Lasso:


Unnamed: 0,Feature1,Feature2,Feature3,Feature4,Feature5
0,-2.509198,63,1.237383,14.155268,105.754928
1,9.014286,352,-2.088673,24.778714,96.219163
2,4.639879,231,2.115724,19.261656,142.144825
3,1.973170,241,3.707869,20.339882,142.195224
4,-6.879627,52,-16.765867,42.366444,131.407851
...,...,...,...,...,...
95,-0.124088,225,2.891699,13.783788,59.030555
96,0.454657,385,19.213263,1.749922,109.145367
97,-1.449180,403,-16.986150,46.738379,77.433378
98,-9.491617,126,-7.772119,25.550954,62.918263
