# SIB - Portfolio of Machine Learning Algorithms

## Exercise 3: Implementing SelectPercentile

### 3.1) 
Add the SelectPercentile object to the feature_selection sub-package by implementing it in a new module called "select_percentile.py". The SelectPercentile class has a similar architecture to the SelectKBest class. Consider the following structure:

- class SelectPercentile(Transformer):
    - parameters:
        - score_func – analysis of variance function (f_classification by default)
        - percentile – percentage of features to select (e.g. 50 keeps the top half)
    - estimated parameters:
        - F – the F value for each feature estimated by the score_func
        - p – the p value for each feature estimated by the score_func
    - methods:
        - _fit – estimates the F and p values for each feature using the score_func; returns itself (self)
        - _transform – selects the features with the highest F values, up to the specified percentile. For example, for a dataset with 10 features and a percentile of 50%, your transform should select the 5 features with the highest F values. Returns the transformed Dataset object.
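
The selection step in `_transform` can be sketched on its own, outside the class. This is a minimal sketch using only NumPy, not the package's implementation: the helper name `top_percentile_indices`, the choice to round the number of kept features down, and the F values below are all assumptions made for illustration.

```python
import numpy as np


def top_percentile_indices(F: np.ndarray, percentile: float) -> np.ndarray:
    """Return the column indices of the features in the top `percentile` percent by F value.

    Hypothetical helper, not part of the si package.
    """
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    n_features = len(F)
    # Assumption: round the number of kept features down, e.g. 50% of 10 -> 5.
    k = int(n_features * percentile / 100)
    if k == 0:
        return np.array([], dtype=int)
    # argsort is ascending, so the last k positions hold the k largest F values.
    top = np.argsort(F)[-k:]
    # Sort the indices so the selected columns keep their original order.
    return np.sort(top)


# Made-up F values for 10 features (illustration only).
F = np.array([3.1, 0.4, 7.8, 2.2, 9.5, 1.0, 6.3, 0.9, 5.5, 4.7])
idx = top_percentile_indices(F, percentile=50)
print(idx)  # [2 4 6 8 9]
```

Inside the class, the returned indices would be used to slice both the data matrix (`X[:, idx]`) and the list of feature names before building the new Dataset. Tied F values at the cut-off are not handled specially in this sketch.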

### 3.3) 
Test the SelectPercentile class in a Jupyter notebook using the "iris.csv" dataset (classification).

```python
import sys

# Make the local `si` package importable (this path is specific to the author's machine)
sys.path.append('C:/Users/dases/Desktop/SI/repositorio/si-2/src')

from si.io.csv_file import read_csv
from si.feature_selection.select_percentile import SelectPercentile
from si.statistics.f_classification import f_classification

# Load the iris dataset
iris = read_csv("../datasets/iris/iris.csv", features=True, label=True)

# Initialize SelectPercentile to keep the top 50% of features
percentile = 50
selector = SelectPercentile(score_func=f_classification, percentile=percentile)

# Fit the selector (estimates the F and p values for each feature)
selector.fit(iris)

# Transform the dataset, keeping only the selected features
new_iris = selector.transform(iris)

# Display results
print("Original features:", iris.features)
print("Selected features:", new_iris.features)
print("Original shape:", iris.X.shape)
print("New shape:", new_iris.X.shape)
```


```
Original features: Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width'], dtype='object')
Selected features: ['petal_length', 'petal_width']
Original shape: (150, 4)
New shape: (150, 2)
```
