# Exercise 3: Test the SelectPercentile class 
 
## SIB - Intelligent Systems for Bioinformatics

Bárbara Freitas PG55693

In [2]:
import numpy as np
from si.io.csv_file import read_csv 
from si.feature_selection.select_percentile import SelectPercentile 

# --- 1. Load the Dataset ---
path_to_iris = '../datasets/iris/iris.csv' 
iris_dataset = read_csv(path_to_iris, sep=',', label=True)

print("--- Original Dataset ---")
print("Shape (samples, features):", iris_dataset.shape())
print("Original Features:", iris_dataset.features)

# --- 2. Initialize SelectPercentile ---
# Keep the top 50% of features ranked by score. With 4 features, 2 are kept.

PERCENTILE_TO_SELECT = 50
selector = SelectPercentile(percentile=PERCENTILE_TO_SELECT)

# --- 3. Train the selector (Fit) ---
# This calculates the F-scores for all features.
selector.fit(iris_dataset)

print(f"\n--- Calculated F-Scores ({PERCENTILE_TO_SELECT}%) ---")
for feat, score in zip(iris_dataset.features, selector.F):
    print(f"  - {feat}: {score:.4f}")

# --- 4. Transform the Dataset ---
# This returns a new Dataset with only the 2 selected features.
iris_selected = selector.transform(iris_dataset)

print("\n--- Selection Result ---")
print("New Shape (samples, features):", iris_selected.shape())
print("Selected Features:", iris_selected.features)

--- Original Dataset ---
Shape (samples, features): (150, 4)
Original Features: ['feat_0', 'feat_1', 'feat_2', 'feat_3']

--- Calculated F-Scores (50%) ---
  - feat_0: 119.2645
  - feat_1: 47.3645
  - feat_2: 1179.0343
  - feat_3: 959.3244

--- Selection Result ---
New Shape (samples, features): (150, 2)
Selected Features: ['feat_2', 'feat_3']


The SelectPercentile class behaved as expected on the Iris dataset. With the percentile set to 50, the selector kept 2 of the 4 features, reducing the shape from (150, 4) to (150, 2). The retained features, feat_2 and feat_3, are the two with the highest F-scores (1179.0343 and 959.3244), well above feat_0 (119.2645) and feat_1 (47.3645). This shows that the implementation ranks features by F-score and retains the most discriminative ones for classification.
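
To make the ranking step concrete, the sketch below reproduces the selection logic with NumPy alone. It is an illustration under assumptions, not the `si` package's code: `anova_f_scores` and `select_top_percentile` are hypothetical helper names, the scores are assumed to be one-way ANOVA F-statistics, and the rule for converting a percentile into a feature count (round, keep at least one) is a guess that may differ from the class under test.

```python
import numpy as np


def anova_f_scores(X, y):
    """One-way ANOVA F-statistic per feature (hypothetical helper).

    Assumes the F-scores in the notebook are of this kind.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n_samples, n_classes = X.shape[0], len(classes)
    grand_mean = X.mean(axis=0)

    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        group = X[y == c]
        group_mean = group.mean(axis=0)
        # Spread of the class means around the overall mean.
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        # Spread of the samples around their own class mean.
        ss_within += ((group - group_mean) ** 2).sum(axis=0)

    # Ratio of mean squares: between-class over within-class.
    return (ss_between / (n_classes - 1)) / (ss_within / (n_samples - n_classes))


def select_top_percentile(scores, names, percentile):
    """Names of the top `percentile`% of features by score (hypothetical helper)."""
    scores = np.asarray(scores, dtype=float)
    # Assumed rounding rule: round to the nearest count, keep at least one.
    k = max(1, int(round(len(scores) * percentile / 100)))
    # Indices of the k largest scores, restored to original column order.
    top_idx = np.sort(np.argsort(scores)[::-1][:k])
    return [names[i] for i in top_idx]


# F-scores copied from the notebook output above.
f_scores = [119.2645, 47.3645, 1179.0343, 959.3244]
feature_names = ["feat_0", "feat_1", "feat_2", "feat_3"]

selected = select_top_percentile(f_scores, feature_names, 50)
print(selected)  # ['feat_2', 'feat_3']

# Tiny made-up dataset: column 0 separates the classes, column 1 does not.
X_demo = [[1, 1], [2, 2], [3, 3], [4, 1], [5, 2], [6, 3]]
y_demo = [0, 0, 0, 1, 1, 1]
demo_f = anova_f_scores(X_demo, y_demo)
print(demo_f)
```

Sorting the kept indices preserves the original column order, which matches the order of the selected features in the notebook output. Whether the real class orders its output the same way for other inputs is not confirmed here.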