<h2><div style="font-family: Trebuchet MS; background-color: red; color: #FFFFFF; padding: 12px; line-height: 1.5;"> Filter Method</div></h2>

<h10><div style="font-family: Trebuchet MS; background-color:Black; color: #FFFFFF; padding: 15px; line-height: 1.;">Importing the Necessary Libraries 📈:</div></h10>

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, f_classif, VarianceThreshold
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

    
<h10><div style="font-family: Trebuchet MS; background-color:Black; color: #FFFFFF; padding: 15px; line-height: 1.;">Setting the Random Seed and Loading the Breast Cancer Dataset 📈:</div></h10>

# Set seed for reproducibility
seed = 42
np.random.seed(seed)

# Load the Breast Cancer dataset
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target
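Before selecting features it can help to confirm what was loaded. A minimal check, assuming the standard scikit-learn copy of the dataset: it has 569 samples and 30 numeric features, and in `cancer.target` label 0 means malignant and 1 means benign. The classes are imbalanced (212 vs 357), which is worth keeping in mind when reading accuracy figures.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
X, y = cancer.data, cancer.target

# Shape of the feature matrix: (n_samples, n_features)
print(X.shape)  # (569, 30)

# Label names, indexed by class label (0 -> malignant, 1 -> benign)
print(cancer.target_names.tolist())  # ['malignant', 'benign']

# Number of samples in each class
class_counts = {str(name): int(count) for name, count in zip(cancer.target_names, np.bincount(y))}
print(class_counts)  # {'malignant': 212, 'benign': 357}
```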

    
<h10><div style="font-family: Trebuchet MS; background-color:Black; color: #FFFFFF; padding: 15px; line-height: 1.;">Splitting the Dataset into Training (80%) and Test (20%) Sets:</div></h10>

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=seed)
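The split above is random but not stratified. An alternative that this notebook does not use is to pass `stratify=y`, which keeps the benign/malignant ratio nearly identical in the training and test parts. This is only a sketch; switching to it would change the accuracies reported later. The `_s` variable names are hypothetical, chosen so they do not overwrite the notebook's own splits.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

seed = 42
X, y = load_breast_cancer(return_X_y=True)

# stratify=y preserves the class proportions of y in both the train and test parts
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=seed, stratify=y
)

# 20% of 569 rounds up to 114 test samples, leaving 455 for training
print(X_train_s.shape, X_test_s.shape)  # (455, 30) (114, 30)

# Share of benign samples (label 1) overall, in train, and in test
print(f"{y.mean():.3f} {y_train_s.mean():.3f} {y_test_s.mean():.3f}")
```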

<h10><div style="font-family: Trebuchet MS; background-color:Black; color: #FFFFFF; padding: 15px; line-height: 1.;">Filter Method with the ANOVA F-test</div></h10>

# Filter method with ANOVA: keep the k features with the highest F-scores (f_classif) against the target
k_best_features = 10
anova_selector = SelectKBest(f_classif, k=k_best_features)
X_train_anova = anova_selector.fit_transform(X_train, y_train)
X_test_anova = anova_selector.transform(X_test)
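`f_classif` scores each feature with a one-way ANOVA F-statistic: the between-class mean square divided by the within-class mean square, so a feature whose class means lie far apart relative to its spread inside each class scores highly. The sketch below recomputes that statistic with NumPy to show what `SelectKBest` is ranking. It runs on the full dataset for simplicity (the notebook fits on `X_train` only), and `anova_f` is a hypothetical helper, not part of scikit-learn.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import f_classif


def anova_f(X, y):
    """One-way ANOVA F-statistic of every column of X against class labels y."""
    classes = np.unique(y)
    n, k = X.shape[0], len(classes)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        # Between-class: class size times squared distance of the class mean from the grand mean
        ss_between += len(Xc) * (class_mean - grand_mean) ** 2
        # Within-class: squared deviations of samples from their own class mean
        ss_within += ((Xc - class_mean) ** 2).sum(axis=0)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


X, y = load_breast_cancer(return_X_y=True)
f_manual = anova_f(X, y)
f_sklearn, p_values = f_classif(X, y)
print(np.allclose(f_manual, f_sklearn))  # True

# Column indices of the 10 largest F-scores, i.e. the columns SelectKBest(f_classif, k=10) keeps
top10 = np.sort(np.argsort(f_manual)[-10:])
print(top10)
```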

<h10><div style="font-family: Trebuchet MS; background-color:Black; color: #FFFFFF; padding: 15px; line-height: 1.;">Filter Method with Variance Threshold (Applied to the ANOVA-Selected Features)</div></h10>

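# Filter method with Variance Threshold: drop columns whose variance is not above 0.01.
# It is fitted on the ANOVA output, so it only chooses among those 10 features.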
variance_threshold_value = 0.01
variance_selector = VarianceThreshold(threshold=variance_threshold_value)
X_train_filtered = variance_selector.fit_transform(X_train_anova)
X_test_filtered = variance_selector.transform(X_test_anova)
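`VarianceThreshold` is unsupervised: it ignores `y` and drops every column whose variance is not above the threshold. Because variance depends on a feature's units, a fixed threshold such as 0.01 on raw features removes features measured on small scales regardless of how predictive they are. The sketch below, run on all 30 raw features rather than on the ANOVA output, reproduces the selector's mask with NumPy and then shows that after standardization the same threshold keeps every column. `StandardScaler` is added here only for illustration; the notebook does not scale its features.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import VarianceThreshold
from sklearn.preprocessing import StandardScaler

X, _ = load_breast_cancer(return_X_y=True)
threshold = 0.01

# Fit on raw (unscaled) features, as the notebook does
selector = VarianceThreshold(threshold=threshold).fit(X)

# Same rule by hand: keep columns whose population variance (ddof=0) is strictly above the threshold
manual_mask = X.var(axis=0) > threshold
print(np.array_equal(manual_mask, selector.get_support()))  # True
print(f"kept on raw features: {manual_mask.sum()} of {X.shape[1]}")

# After standardization every column has variance 1 (up to rounding), so all columns pass
X_scaled = StandardScaler().fit_transform(X)
n_kept_scaled = VarianceThreshold(threshold=threshold).fit(X_scaled).get_support().sum()
print(f"kept on standardized features: {n_kept_scaled} of {X.shape[1]}")  # 30 of 30
```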


<h10><div style="font-family: Trebuchet MS; background-color:Black; color: #FFFFFF; padding: 15px; line-height: 1.;">Training and Evaluating a Random Forest on Each Feature Subset</div></h10>

# Function to train and evaluate a model
def train_and_evaluate(X_train, X_test, y_train, y_test):
    model = RandomForestClassifier(random_state=seed)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    return accuracy

# Train and evaluate the model with ANOVA and Variance Threshold
accuracy_anova = train_and_evaluate(X_train_anova, X_test_anova, y_train, y_test)
accuracy_variance = train_and_evaluate(X_train_filtered, X_test_filtered, y_train, y_test)

# Get selected feature indices
selected_indices_variance = np.where(variance_selector.get_support())[0]

# Print results
print("\nFilter Method with ANOVA")
print(f"Number of Features Selected (ANOVA): {k_best_features}")
print(f"Selected Feature Indices (ANOVA): {np.where(anova_selector.get_support())[0]}")
print(f"Selected Feature Names (ANOVA): {cancer.feature_names[anova_selector.get_support()]}")
print(f"Accuracy (ANOVA): {accuracy_anova:.4f}")

# variance_selector was fitted on the 10 ANOVA columns, so selected_indices_variance holds
# positions within that subset; map them back to indices of the original 30 features
anova_indices = np.where(anova_selector.get_support())[0]
variance_original_indices = anova_indices[selected_indices_variance]

print("\nFilter Method with ANOVA + Variance Threshold")
print(f"Number of Features Selected (Variance Threshold): {len(variance_original_indices)}")
print(f"Selected Feature Indices (Variance Threshold): {variance_original_indices}")
print(f"Selected Feature Names (Variance Threshold): {cancer.feature_names[variance_original_indices]}")
print(f"Accuracy (Variance Threshold): {accuracy_variance:.4f}")


Filter Method with ANOVA
Number of Features Selected (ANOVA): 10
Selected Feature Indices (ANOVA): [ 0  2  3  6  7 20 22 23 26 27]
Selected Feature Names (ANOVA): ['mean radius' 'mean perimeter' 'mean area' 'mean concavity'
 'mean concave points' 'worst radius' 'worst perimeter' 'worst area'
 'worst concavity' 'worst concave points']
Accuracy (ANOVA): 0.9561

Filter Method with ANOVA + Variance Threshold
Number of Features Selected (Variance Threshold): 7
Selected Feature Indices (Variance Threshold): [ 0  2  3 20 22 23 26]
Selected Feature Names (Variance Threshold): ['mean radius' 'mean perimeter' 'mean area' 'worst radius'
 'worst perimeter' 'worst area' 'worst concavity']
Accuracy (Variance Threshold): 0.9737
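Both accuracies come from a single split with 114 test samples, so 0.9561 versus 0.9737 amounts to 109 versus 111 correct predictions, a difference of two samples. A sketch of a less noisy comparison, assuming 5-fold stratified cross-validation is acceptable here (the notebook itself does not use it): wrapping the selectors and the classifier in a `Pipeline` refits the selectors inside each fold, so no held-out fold influences which features are chosen. The dictionary keys and `cv_means` are illustrative names.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

seed = 42
X, y = load_breast_cancer(return_X_y=True)

# Selectors sit inside each pipeline, so every fold fits them on that fold's training part only
pipelines = {
    "ANOVA": make_pipeline(
        SelectKBest(f_classif, k=10),
        RandomForestClassifier(random_state=seed),
    ),
    "ANOVA + Variance Threshold": make_pipeline(
        SelectKBest(f_classif, k=10),
        VarianceThreshold(threshold=0.01),
        RandomForestClassifier(random_state=seed),
    ),
}

# 5 stratified folds, shuffled reproducibly
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)

cv_means = {}
for name, pipe in pipelines.items():
    scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
    cv_means[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.4f} (std {scores.std():.4f})")
```

Reporting the mean together with the fold-to-fold standard deviation makes it easier to judge whether a gap like the one above is larger than the noise between splits.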
