# 🧠 SVM + Feature Selection + PCA

This pipeline performs classification using a Support Vector Machine (SVM) with a radial basis function (RBF) kernel. The workflow includes:
	1.	Standardization of the radiomics data using StandardScaler, so that every feature has zero mean and unit variance and no feature dominates simply because of its scale.
	2.	Feature selection using SelectKBest with mutual_info_classif, which keeps the 50 features with the highest estimated mutual information with the target class. This relevance-only ranking is used here as a simple stand-in for mRMR (Minimum Redundancy Maximum Relevance); unlike mRMR, it does not penalize redundancy among the selected features.
	3.	Dimensionality reduction via Principal Component Analysis (PCA), which projects the 50 selected features onto the 10 principal components that capture the most variance.
	4.	Classification with an SVM (with balanced class weights), evaluated using stratified 5-fold cross-validation. Because all steps sit in a single Pipeline, scaling, feature selection, and PCA are refit on the training portion of each fold.
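
One caveat about step 2 can be checked on toy data: univariate mutual-information ranking scores each feature on its own, so a near-duplicate of an informative feature tends to be selected alongside it. The sketch below is only an illustration under stated assumptions. The data are synthetic (not the radiomics dataset), and the names `X_demo`, `informative`, `near_copy`, and `noise` are hypothetical.

```python
from functools import partial

import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

rng = np.random.default_rng(0)
n_samples = 200

# Synthetic stand-in data (assumption: NOT the radiomics table).
y_demo = rng.integers(0, 2, size=n_samples)
informative = y_demo + 0.3 * rng.normal(size=n_samples)       # strongly tied to the class
near_copy = informative + 0.01 * rng.normal(size=n_samples)   # almost identical, hence redundant
noise = rng.normal(size=(n_samples, 8))                       # unrelated to the class
X_demo = np.column_stack([informative, near_copy, noise])

# Seed the mutual-information estimator so the ranking is repeatable.
score_func = partial(mutual_info_classif, random_state=0)
selector = SelectKBest(score_func=score_func, k=2).fit(X_demo, y_demo)

selected = selector.get_support(indices=True)
print("Selected column indices:", selected)
print("Mutual information scores:", np.round(selector.scores_, 3))
```

On this toy data the two selected columns should be the informative feature and its copy, which adds no new information. A full mRMR procedure would instead trade relevance against redundancy when choosing the second feature.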

In [1]:
import pandas as pd
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.metrics import make_scorer, accuracy_score

# Load the dataset
df = pd.read_csv("../datasets/ACDC_radiomics.csv")

# Separate features and target
X = df.drop(columns=["class"])
y = df["class"]

# Define the pipeline: scaling, feature selection, PCA, SVM
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("select", SelectKBest(score_func=mutual_info_classif, k=50)),  # top 50 by mutual information (relevance only, not full mRMR)
    ("pca", PCA(n_components=10)),
    ("svm", SVC(kernel="rbf", class_weight="balanced"))
])

# Configure stratified K-Fold cross-validation
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Run cross-validation using accuracy as the metric
scores = cross_val_score(pipeline, X, y, cv=cv, scoring=make_scorer(accuracy_score))

# Display results
print("Accuracy scores for each fold:", scores)
print("Mean accuracy:", scores.mean())
print("Standard deviation:", scores.std())

Accuracy scores for each fold: [0.85 0.85 0.8  0.9  0.9 ]
Mean accuracy: 0.86
Standard deviation: 0.03741657386773941
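
A note on reproducibility: `mutual_info_classif` adds a small amount of random noise to continuous features, so the selected feature set, and therefore the fold scores, can vary between runs unless its `random_state` is fixed. The sketch below shows one possible way to seed it inside the pipeline with `functools.partial`. It runs on synthetic data from `make_classification` as a stand-in for the radiomics table. The helper `build_pipeline` and the data sizes are assumptions for illustration, not part of the original notebook.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: sizes chosen arbitrarily for the demo).
X_demo, y_demo = make_classification(
    n_samples=120, n_features=80, n_informative=10,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)


def build_pipeline(seed):
    """Same steps as the SVM pipeline, with a seeded mutual-information scorer."""
    seeded_mi = partial(mutual_info_classif, random_state=seed)
    return Pipeline([
        ("scaler", StandardScaler()),
        ("select", SelectKBest(score_func=seeded_mi, k=50)),
        ("pca", PCA(n_components=10)),
        ("svm", SVC(kernel="rbf", class_weight="balanced")),
    ])


cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Two independent runs with the same seed should give identical fold scores.
scores_a = cross_val_score(build_pipeline(0), X_demo, y_demo, cv=cv, scoring="accuracy")
scores_b = cross_val_score(build_pipeline(0), X_demo, y_demo, cv=cv, scoring="accuracy")

print("Run A:", scores_a)
print("Run B:", scores_b)
print("Identical:", np.array_equal(scores_a, scores_b))
```

The string `scoring="accuracy"` is equivalent to `make_scorer(accuracy_score)` and is used here only for brevity.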


# 🧠 Neural Network (NN) + Feature Selection + PCA
This pipeline uses the same preprocessing steps as the SVM pipeline but replaces the classifier with a neural network (MLPClassifier):
	1.	Standardization is applied with StandardScaler to center and scale the data.
	2.	Feature selection is done using SelectKBest with mutual_info_classif to retain the 50 features with the highest mutual information with the target class.
	3.	PCA is applied to reduce the 50 selected features to 10 components.
	4.	Classification is carried out using a Multi-layer Perceptron (MLP) with one hidden layer of 100 neurons, trained for up to 1000 iterations with a fixed random seed. Like the SVM pipeline, it is evaluated with stratified 5-fold cross-validation on the same folds (random_state=42).
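
Step 3 fixes the number of components at 10, and it can be useful to check how much variance those components actually retain. The sketch below shows one way to inspect this through `explained_variance_ratio_`. It uses synthetic data as a stand-in, so the printed fractions say nothing about the radiomics dataset, and the variable names are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with 50 columns, mimicking the 50 selected features.
X_demo, _ = make_classification(
    n_samples=100, n_features=50, n_informative=10, random_state=0
)

# Scale first, as in the pipeline, then fit PCA with 10 components.
X_scaled = StandardScaler().fit_transform(X_demo)
pca = PCA(n_components=10).fit(X_scaled)

# Cumulative share of total variance captured by the first k components.
cumulative = np.cumsum(pca.explained_variance_ratio_)
for k, fraction in enumerate(cumulative, start=1):
    print(f"{k:2d} components: {fraction:.3f} of total variance")
```

If a target share of variance matters more than a fixed dimensionality, scikit-learn's PCA also accepts a float between 0 and 1 for `n_components` (for example `PCA(n_components=0.95)`), which keeps as many components as are needed to reach that share.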

In [2]:
import pandas as pd
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.metrics import make_scorer, accuracy_score

# Load dataset
df = pd.read_csv("../datasets/ACDC_radiomics.csv")

# Split features and target
X = df.drop(columns=["class"])
y = df["class"]

# Define pipeline: scaling, feature selection, PCA, and NN classifier
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("select", SelectKBest(score_func=mutual_info_classif, k=50)),
    ("pca", PCA(n_components=10)),
    ("nn", MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, random_state=42))
])

# Stratified K-Fold CV setup
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Cross-validation with accuracy
scores = cross_val_score(pipeline, X, y, cv=cv, scoring=make_scorer(accuracy_score))

# Results
print("Accuracy scores for each fold:", scores)
print("Mean accuracy:", scores.mean())
print("Standard deviation:", scores.std())

Accuracy scores for each fold: [0.9  0.85 0.95 0.9  0.75]
Mean accuracy: 0.8699999999999999
Standard deviation: 0.06782329983125268
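
The two mean accuracies reported above (0.86 for the SVM, about 0.87 for the MLP) differ by less than either fold-to-fold standard deviation, so five folds alone do not separate the two models. One way to look at this more closely is to score both classifiers on identical folds and compare them fold by fold. The sketch below does this on synthetic data. The helper `make_pipeline_with`, the step name `clf`, and the data sizes are assumptions for illustration only.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: NOT the radiomics table).
X_demo, y_demo = make_classification(
    n_samples=120, n_features=80, n_informative=10,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)


def make_pipeline_with(classifier):
    """Shared preprocessing (scale, select 50, PCA to 10) plus the given classifier."""
    seeded_mi = partial(mutual_info_classif, random_state=0)
    return Pipeline([
        ("scaler", StandardScaler()),
        ("select", SelectKBest(score_func=seeded_mi, k=50)),
        ("pca", PCA(n_components=10)),
        ("clf", classifier),
    ])


classifiers = {
    "svm": SVC(kernel="rbf", class_weight="balanced"),
    "nn": MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, random_state=42),
}

# A seeded splitter yields the same folds for both models.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
results = {
    name: cross_val_score(make_pipeline_with(clf), X_demo, y_demo, cv=cv, scoring="accuracy")
    for name, clf in classifiers.items()
}

for name, fold_scores in results.items():
    print(f"{name}: mean={fold_scores.mean():.3f}, std={fold_scores.std():.3f}")
print("Per-fold difference (nn - svm):", results["nn"] - results["svm"])
```

Looking at the per-fold differences, rather than only the two means, makes it easier to see whether one model is consistently ahead or whether the gap flips sign from fold to fold.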
