## Feature Selection
- Select a high-dimensional dataset (such as the Boston housing dataset or the breast cancer dataset) from a data repository such as scikit-learn, the UCI Machine Learning Repository, or Kaggle Datasets.
- Write a program that performs the following operations on the selected dataset and displays the results:
  1. Reduce dimensions using the SelectKBest method
  2. Reduce dimensions using the SelectPercentile method
  3. Reduce dimensions using PCA
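The three operations can be compared side by side on one dataset. The sketch below is illustrative only: it assumes the scikit-learn breast cancer dataset, MinMax scaling for the `chi2` selectors (which only accept non-negative inputs), `k=10`, `percentile=20`, and 10 principal components, all chosen here for illustration rather than taken from the assignment. Note that PCA builds new combined features instead of keeping a subset of the original columns.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, SelectPercentile, chi2
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # 569 samples, 30 features

# chi2 scores require non-negative inputs, so rescale each feature to [0, 1]
X_minmax = MinMaxScaler().fit_transform(X)

# 1. SelectKBest: keep the 10 highest-scoring features (assumed k)
X_kbest = SelectKBest(score_func=chi2, k=10).fit_transform(X_minmax, y)

# 2. SelectPercentile: keep the top 20 percent of features (20% of 30 = 6)
X_pct = SelectPercentile(score_func=chi2, percentile=20).fit_transform(X_minmax, y)

# 3. PCA: project standardized data onto 10 principal components (assumed count)
pca = PCA(n_components=10)
X_pca = pca.fit_transform(StandardScaler().fit_transform(X))

print("Original:        ", X.shape)
print("SelectKBest:     ", X_kbest.shape)
print("SelectPercentile:", X_pct.shape)
print("PCA:             ", X_pca.shape)
print("Variance explained by 10 components:", pca.explained_variance_ratio_.sum())
```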

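```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, SelectPercentile, chi2
from sklearn.preprocessing import MinMaxScaler
```

### i. Reduce dimensions using SelectKBest method
```python
data = load_breast_cancer()
X_scaled = MinMaxScaler().fit_transform(data.data)  # assumed scaling: chi2 needs non-negative values
y = data.target
k_best = SelectKBest(score_func=chi2, k=10)
X_k_best = k_best.fit_transform(X_scaled, y)
selected_features_k_best = np.array(data.feature_names)[k_best.get_support()]
```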
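SelectKBest scores every feature independently and keeps the `k` largest scores. As a sanity check, the sketch below (assuming the same MinMax-scaled breast cancer data as above) recomputes the selection directly from `chi2` and confirms that it matches `get_support()`. The names `top10` and `manual_mask` are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

data = load_breast_cancer()
X_scaled = MinMaxScaler().fit_transform(data.data)  # chi2 needs non-negative inputs
y = data.target

k_best = SelectKBest(score_func=chi2, k=10).fit(X_scaled, y)

# chi2 returns (scores, p-values); SelectKBest stores the scores in scores_
scores, p_values = chi2(X_scaled, y)
top10 = np.argsort(scores)[::-1][:10]  # indices of the 10 largest scores

# Build the boolean mask by hand and compare it with SelectKBest's mask
manual_mask = np.zeros(X_scaled.shape[1], dtype=bool)
manual_mask[top10] = True
print("Masks match:", np.array_equal(manual_mask, k_best.get_support()))

# List the selected features from highest to lowest chi2 score
for idx in top10:
    print(f"{data.feature_names[idx]:<25} {scores[idx]:.2f}")
```

Printing the scores this way also shows how far apart the retained and discarded features are, which can help when choosing `k`.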


```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load breast cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Get column names
column_names = data.feature_names

print("Column Names:")
for col in column_names:
    print(col)

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize Logistic Regression classifier
clf = LogisticRegression(max_iter=10000, random_state=42)

# Train and evaluate the model without dimensionality reduction
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy_no_reduction = accuracy_score(y_test, y_pred)
print("\nAccuracy without dimensionality reduction:", accuracy_no_reduction)

# Perform dimensionality reduction using SelectKBest
k = 10  # Number of features to select
selector = SelectKBest(score_func=f_classif, k=k)
X_train_reduced = selector.fit_transform(X_train, y_train)
X_test_reduced = selector.transform(X_test)

# Train and evaluate the model with dimensionality reduction
clf.fit(X_train_reduced, y_train)
y_pred_reduced = clf.predict(X_test_reduced)
accuracy_with_reduction = accuracy_score(y_test, y_pred_reduced)
print("\nAccuracy with dimensionality reduction:", accuracy_with_reduction)

# Display selected features
selected_features = np.array(data.feature_names)[selector.get_support()]
print("\nSelected Features:", selected_features)
```


```text
Column Names:
mean radius
mean texture
mean perimeter
mean area
mean smoothness
mean compactness
mean concavity
mean concave points
mean symmetry
mean fractal dimension
radius error
texture error
perimeter error
area error
smoothness error
compactness error
concavity error
concave points error
symmetry error
fractal dimension error
worst radius
worst texture
worst perimeter
worst area
worst smoothness
worst compactness
worst concavity
worst concave points
worst symmetry
worst fractal dimension

Accuracy without dimensionality reduction: 0.956140350877193

Accuracy with dimensionality reduction: 0.9912280701754386

Selected Features: ['mean radius' 'mean perimeter' 'mean area' 'mean concavity'
 'mean concave points' 'worst radius' 'worst perimeter' 'worst area'
 'worst concavity' 'worst concave points']
```
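These accuracies come from a single 80/20 split, so the test set holds 114 samples and the gap between 0.956 and 0.991 corresponds to 4 samples (109 vs. 113 correct). One hedged way to check whether keeping 10 features genuinely helps is to cross-validate both models. In the sketch below, SelectKBest sits inside a `Pipeline` so it is refit on each training fold only, mirroring how the code above fits the selector on `X_train` alone. The 5-fold stratified scheme and its shuffling seed are assumptions chosen for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)  # assumed CV scheme

# Baseline: logistic regression on all 30 features
full_model = LogisticRegression(max_iter=10000, random_state=42)

# Reduced: feature selection is a pipeline step, so within each fold it is
# fit on the training part only and never sees that fold's test data
reduced_model = make_pipeline(
    SelectKBest(score_func=f_classif, k=10),
    LogisticRegression(max_iter=10000, random_state=42),
)

full_scores = cross_val_score(full_model, X, y, cv=cv, scoring="accuracy")
reduced_scores = cross_val_score(reduced_model, X, y, cv=cv, scoring="accuracy")

print(f"All features: {full_scores.mean():.3f} +/- {full_scores.std():.3f}")
print(f"10 features:  {reduced_scores.mean():.3f} +/- {reduced_scores.std():.3f}")
```

If the two means lie within roughly one standard deviation of each other, the single-split improvement is probably not a reliable difference.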
