<a href="https://colab.research.google.com/github/AbrahamOtero/MLiB/blob/main/3_FeatureSelection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Feature selection

## Set up

We import the libraries that we are going to need:

In [None]:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

First, we load the iris data set:

In [None]:
url = 'https://raw.githubusercontent.com/AbrahamOtero/MLiB/main/datasets/iris.csv'

iris = pd.read_csv(url)

To implement filter-based feature selection, we can use **SelectKBest**, which keeps the number of attributes indicated in its constructor (parameter `k`), ranked by a scoring function (parameter `score_func`). In this case, the chi-square statistic (`chi2`) will be used; it requires non-negative feature values, which holds for iris. If **mutual_info_classif** is passed as `score_func`, the information gain (mutual information) criterion is used instead. When the target is numeric rather than categorical, the **f_regression** criterion can be used.

In [None]:

from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

#X will be the matrix with the features that we are going to evaluate and y the class
X = iris.drop('class', axis=1)
y = iris['class']

# Apply SelectKBest with chi2 to select the 2 best attributes
best_features = SelectKBest(score_func=chi2, k=2)
fit = best_features.fit(X, y)

# Get the indices of the selected attributes
feature_indices = fit.get_support(indices=True)

# Print the names of the selected attributes
print(X.columns[feature_indices])


Index(['petal.length', 'petal.width'], dtype='object')
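The alternative scoring functions mentioned above can be tried in the same way. The following is a minimal sketch, not part of the original notebook. It assumes scikit-learn's bundled copy of iris (`load_iris`) in place of the CSV file, so the column names differ from the ones above, and it uses a small synthetic data set (made up for illustration) for the numeric-target case. The names `X_demo`, `y_demo`, `X_num` and `y_num` are hypothetical.

```python
from functools import partial

import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_regression, mutual_info_classif

# Assumption: scikit-learn's bundled iris copy stands in for iris.csv
X_demo, y_demo = load_iris(return_X_y=True, as_frame=True)

# mutual_info_classif is randomized, so we fix its seed for repeatable scores
mi_score = partial(mutual_info_classif, random_state=0)
mi_selector = SelectKBest(score_func=mi_score, k=2).fit(X_demo, y_demo)
mi_selected = list(X_demo.columns[mi_selector.get_support(indices=True)])
print(mi_selected)
print(dict(zip(X_demo.columns, mi_selector.scores_.round(3))))

# Numeric target: synthetic data in which only columns 0 and 2 drive the target
rng = np.random.default_rng(0)
X_num = rng.normal(size=(200, 4))
y_num = 3 * X_num[:, 0] - 2 * X_num[:, 2] + rng.normal(scale=0.1, size=200)
reg_selector = SelectKBest(score_func=f_regression, k=2).fit(X_num, y_num)
reg_selected = reg_selector.get_support(indices=True).tolist()
print(reg_selected)
```

The `scores_` attribute holds the score of every attribute, which is useful for checking how close the discarded attributes were to the selected ones.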


To carry out wrapper-based feature selection, we can use the **RFE** (recursive feature elimination) class, to which we must pass the model we want to use for the selection. In the example below, the model will be a decision tree.

In [None]:
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

# The model we will use will be a decision tree
estimator = DecisionTreeClassifier()

# Create RFE object to select 2 attributes based on decision tree
selector = RFE(estimator, n_features_to_select=2)

# Fitting the RFE object to the data
selector = selector.fit(X, y)

# Get the indexes of the selected attributes
feature_indices = selector.get_support(indices=True)

# Print the names of the selected attributes
print(X.columns[feature_indices])
print(selector.support_)
print(selector.ranking_)


Index(['petal.length', 'petal.width'], dtype='object')
[False False  True  True]
[2 3 1 1]
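The `support_` and `ranking_` arrays are easier to read next to the attribute names. The sketch below is an illustration rather than part of the original notebook. It assumes scikit-learn's bundled iris copy (`load_iris`) instead of the CSV file and fixes `random_state` so that repeated runs agree. The names `X_demo`, `y_demo` and `rfe_table` are hypothetical.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's bundled iris copy stands in for iris.csv
X_demo, y_demo = load_iris(return_X_y=True, as_frame=True)

# Fixed seed so the tree, and therefore the ranking, is repeatable
rfe = RFE(DecisionTreeClassifier(random_state=0), n_features_to_select=2)
rfe = rfe.fit(X_demo, y_demo)

# One row per attribute: rank 1 means selected; higher ranks were eliminated earlier
rfe_table = pd.DataFrame(
    {"selected": rfe.support_, "ranking": rfe.ranking_},
    index=X_demo.columns,
).sort_values("ranking")
print(rfe_table)
```

With four attributes and two to keep, the selected attributes share rank 1, the attribute eliminated last has rank 2, and the attribute eliminated first has rank 3.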


RFE starts from all attributes and eliminates them one at a time. If we want to use the opposite strategy (start from an empty set of attributes and add them one at a time), we can use **SequentialFeatureSelector**, whose default direction is forward selection. The following example applies this strategy, also using a decision tree.

In [None]:
from sklearn.feature_selection import SequentialFeatureSelector

# The model we will use will be a decision tree
estimator = DecisionTreeClassifier()

# Create SFS object to select 2 attributes based on decision tree
sfs = SequentialFeatureSelector(estimator, n_features_to_select=2)

# Fitting the SFS object to the data
sfs = sfs.fit(X, y)

# Get the indexes of the selected attributes
feature_indices = sfs.get_support(indices=True)

# Print the names of the selected attributes
print(X.columns[feature_indices])


Index(['petal.length', 'petal.width'], dtype='object')
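To make the forward strategy concrete, the loop below is a hand-written sketch of greedy forward selection. It is not scikit-learn's implementation, only an illustration of the idea: at each step, every remaining attribute is tried together with the ones already chosen, and the one with the best cross-validated accuracy is kept. It assumes scikit-learn's bundled iris copy (`load_iris`) instead of the CSV file, 5-fold cross-validation and a fixed `random_state`. All variable names are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's bundled iris copy stands in for iris.csv
X_demo, y_demo = load_iris(return_X_y=True, as_frame=True)
model = DecisionTreeClassifier(random_state=0)

selected = []                      # attributes chosen so far (starts empty)
remaining = list(X_demo.columns)   # candidates not yet chosen

while len(selected) < 2:
    # Score each candidate when added to the attributes chosen so far
    scores = {
        col: cross_val_score(model, X_demo[selected + [col]], y_demo, cv=5).mean()
        for col in remaining
    }
    best = max(scores, key=scores.get)
    selected.append(best)
    remaining.remove(best)
    print(f"added {best!r} (mean CV accuracy {scores[best]:.3f})")

print(selected)
```

Passing `direction='backward'` to **SequentialFeatureSelector** applies the same cross-validated scoring while removing attributes instead of adding them.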
