# Elaheh Beheshti
Feature Combinations

For a dataset with 4 features, the possible non-empty feature combinations are:

Single Features:

{Feature 1} | {Feature 2} | {Feature 3} | {Feature 4}


Pairs of Features:

{Feature 1, Feature 2}  |  {Feature 1, Feature 3}  |  {Feature 1, Feature 4}

{Feature 2, Feature 3}  |  {Feature 2, Feature 4}  |  {Feature 3, Feature 4}

Triplets of Features:

{Feature 1, Feature 2, Feature 3}  |  {Feature 1, Feature 2, Feature 4}

{Feature 1, Feature 3, Feature 4}  |  {Feature 2, Feature 3, Feature 4}

All Four Features:

{Feature 1, Feature 2, Feature 3, Feature 4}

In total, there are 4 + 6 + 4 + 1 = 15 different non-empty feature subsets, which is 2^4 − 1.
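As a quick check of this count, here is a short sketch that enumerates the non-empty subsets by size with `itertools.combinations` and compares the total with the closed form 2^n − 1. The feature names are the placeholders used in the lists above.

```python
from itertools import combinations
from math import comb

features = ["Feature 1", "Feature 2", "Feature 3", "Feature 4"]

# Enumerate subsets of every size, from single features (k=1) to all four (k=4)
subsets = [c for k in range(1, len(features) + 1) for c in combinations(features, k)]

# There are C(4, k) subsets of size k: 4 singles, 6 pairs, 4 triplets, 1 full set
per_size = [comb(len(features), k) for k in range(1, len(features) + 1)]

print(per_size)      # [4, 6, 4, 1]
print(len(subsets))  # 15
assert len(subsets) == sum(per_size) == 2 ** len(features) - 1
```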

SVM Kernels

We’ll be testing 3 different SVM kernels:

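Linear Kernel (linear): Suitable for data that is (approximately) linearly separable; it learns a hyperplane decision boundary in the original feature space.

Radial Basis Function Kernel (rbf): Captures non-linear relationships by implicitly mapping the data into a higher-dimensional space, where a linear separator corresponds to a curved boundary in the original space.

Polynomial Kernel (poly): Models interactions between features as polynomial terms (degree 3 by default in scikit-learn), giving curved decision boundaries of that polynomial degree.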
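For reference, these are the kernel functions as given in scikit-learn's `SVC` documentation, where γ is the `gamma` parameter, r is `coef0` (default 0) and d is `degree` (default 3):

```latex
\begin{aligned}
K_{\text{linear}}(x, x') &= \langle x, x' \rangle \\
K_{\text{rbf}}(x, x')    &= \exp\!\left(-\gamma \,\lVert x - x' \rVert^{2}\right) \\
K_{\text{poly}}(x, x')   &= \left(\gamma \,\langle x, x' \rangle + r\right)^{d}
\end{aligned}
```

Only the rbf and poly formulas contain γ, so varying gamma has no effect on the linear kernel.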


C Values

C is the regularization parameter that controls the trade-off between maximizing the margin and minimizing classification error. Common values to try are:

0.1

1

10

100
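To make the trade-off concrete, this is the standard soft-margin SVM objective (the primal form shown in scikit-learn's SVM documentation), where the slack variables ξ_i measure how far each training point violates the margin, φ is the feature map implied by the kernel, and y_i ∈ {−1, +1}:

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad
y_i\left(w^{\top}\phi(x_i) + b\right) \ge 1 - \xi_i,\qquad \xi_i \ge 0
```

A small C makes margin violations cheap and favors a wider margin; a large C penalizes them heavily and favors a narrower margin that fits the training data more closely.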

Gamma Values

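Gamma is the kernel coefficient used by the rbf and poly kernels. For rbf, it defines how far the influence of a single training example reaches: low values mean 'far' influence, and high values mean 'close' influence. scikit-learn's SVC ignores gamma when the kernel is linear. Common values to test are: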

0.01

0.1

1

10
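A small sketch of what 'far' and 'close' mean for the rbf kernel: it evaluates the kernel value exp(−γ·d²) between two points a fixed distance d apart, for each gamma in the grid. The helper `rbf_similarity` is hypothetical, and the distance of 1.0 (one standard deviation after scaling) is an arbitrary illustrative choice.

```python
import math

def rbf_similarity(distance: float, gamma: float) -> float:
    """Hypothetical helper: RBF kernel value for two points `distance` apart."""
    return math.exp(-gamma * distance ** 2)

gamma_values = [0.01, 0.1, 1, 10]
distance = 1.0  # illustrative distance between two standardized samples

for gamma in gamma_values:
    print(f"gamma={gamma:<5} similarity={rbf_similarity(distance, gamma):.4f}")
```

With gamma = 0.01 the two points still look almost identical (similarity of about 0.99), while with gamma = 10 the similarity is essentially zero (about 4.5e-5), so each training example only influences its immediate neighbourhood.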

Total Combinations

For each feature subset, we’re testing:

3 kernels (linear, rbf, poly)

4 C values (0.1, 1, 10, 100)

4 gamma values (0.01, 0.1, 1, 10)


Thus, for each feature subset, we’re trying:

$$3\ (\text{kernels}) \times 4\ (C \text{ values}) \times 4\ (\text{gamma values}) = 48 \text{ combinations}$$

Since there are 15 different feature subsets, the total number of trials is:

$$15\ (\text{feature combinations}) \times 48\ (\text{parameter combinations}) = 720 \text{ trials}$$
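The same arithmetic can be checked by enumerating the grid directly with `itertools.product`. This sketch also counts how many trials are genuinely distinct models: because the linear kernel ignores gamma, its four gamma settings for a given C train the same model.

```python
from itertools import combinations, product

num_features = 4
kernels = ["linear", "rbf", "poly"]
C_values = [0.1, 1, 10, 100]
gamma_values = [0.01, 0.1, 1, 10]

# All 15 non-empty feature subsets, as tuples of column indices
feature_subsets = [c for k in range(1, num_features + 1)
                   for c in combinations(range(num_features), k)]

# Every (subset, kernel, C, gamma) tuple visited by the notebook's nested loops
trials = list(product(feature_subsets, kernels, C_values, gamma_values))

# Count distinct models by treating gamma as irrelevant for the linear kernel
distinct = {(f, k, C, None if k == "linear" else g) for f, k, C, g in trials}

print(len(trials))    # 720
print(len(distinct))  # 540
```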

Finding the Best Configuration

For each trial, the model's accuracy is measured on a held-out test split (30% of the rows). After all 720 trials, the configuration with the highest test accuracy is reported as the best configuration; if several configurations tie, the first one encountered is kept. This configuration includes:

Selected Feature Subset (e.g., {Feature 1, Feature 2, Feature 4})

Kernel (e.g., rbf)

C value (e.g., 10)

Gamma value (e.g., 0.1)

# Import Libraries

In [26]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
from itertools import combinations


# Load and Prepare the Dataset

In [11]:
# The data file has no header row, so column names are supplied explicitly
columns = ['Variance', 'Skewness', 'Kurtosis', 'Entropy', 'Class']
df = pd.read_csv('/Users/elahehbeheshti/Desktop/Fall2024/Machine Learning/Algorithm/SVM/data_banknote_authentication.txt', delimiter=",", names=columns)
print(df.head(6))

   Variance  Skewness  Kurtosis  Entropy  Class
0   3.62160    8.6661   -2.8073 -0.44699      0
1   4.54590    8.1674   -2.4586 -1.46210      0
2   3.86600   -2.6383    1.9242  0.10645      0
3   3.45660    9.5228   -4.0112 -3.59440      0
4   0.32924   -4.4552    4.5718 -0.98880      0
5   4.36840    9.6718   -3.9606 -3.16250      0


In [12]:
# Split features and target
X = df.drop('Class', axis=1)
y = df['Class']


# Standardize the Data

In [13]:
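# Rescale each feature to zero mean and unit variance (fit on all rows before
# the train/test split, so test-set statistics slightly influence the scaling)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)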
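`StandardScaler` rescales each column to z = (x − mean) / std, using the population standard deviation (equivalent to `numpy.std` with `ddof=0`). A minimal sketch that reproduces this with NumPy on a small, made-up array (`X_demo` is illustrative data, not the banknote dataset) and checks it against scikit-learn:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up matrix: 4 samples, 2 features (illustrative values only)
X_demo = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0],
                   [4.0, 40.0]])

# Column-wise z-score using the population standard deviation (ddof=0)
manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

scaled = StandardScaler().fit_transform(X_demo)

print(np.allclose(manual, scaled))  # True
print(scaled.mean(axis=0), scaled.std(axis=0))
```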


# Defining Feature Combinations

Generate every non-empty combination of the features (15 subsets for 4 features) to try with the SVM.

In [27]:

# Generate all feature combinations
num_features = X.shape[1]  # Number of features in the dataset
feature_combinations = []
for i in range(1, num_features + 1):
    feature_combinations.extend(list(combinations(range(num_features), i)))


# Defining SVM Parameters

In [28]:
# SVM kernels and hyperparameters
kernels = ['linear', 'rbf', 'poly']
C_values = [0.1, 1, 10, 100]
gamma_values = [0.01, 0.1, 1, 10]
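These three lists define a full grid of 3 × 4 × 4 = 48 settings. As an alternative sketch (not what this notebook does), scikit-learn's `ParameterGrid` can express the same search and, when given a list of dicts, can leave `gamma` out of the linear-kernel branch so that no redundant settings are generated:

```python
from sklearn.model_selection import ParameterGrid

kernels = ['linear', 'rbf', 'poly']
C_values = [0.1, 1, 10, 100]
gamma_values = [0.01, 0.1, 1, 10]

# Full grid, equivalent to the notebook's nested kernel/C/gamma loops
full_grid = ParameterGrid({'kernel': kernels, 'C': C_values, 'gamma': gamma_values})

# Reduced grid: gamma is only varied for the kernels that actually use it
reduced_grid = ParameterGrid([
    {'kernel': ['linear'], 'C': C_values},
    {'kernel': ['rbf', 'poly'], 'C': C_values, 'gamma': gamma_values},
])

print(len(full_grid))     # 48
print(len(reduced_grid))  # 36
print(reduced_grid[0])    # one setting: a dict of SVC keyword arguments
```

Each setting is a plain dict, so it can be passed straight to the model as `SVC(**params)`.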


# Creating the Evaluation Function

Create a function that evaluates the SVM model for a given set of features and parameters:

In [29]:

def evaluate_svm(X, y, feature_indices, kernel, C, gamma):
    """
    Train and evaluate an SVM model with a given set of features and parameters.
    
    Args:
    - X (numpy array): Feature matrix.
    - y (numpy array): Target vector.
    - feature_indices (tuple): Indices of features to include.
    - kernel (str): SVM kernel type.
    - C (float): Regularization parameter.
    - gamma (float): Kernel coefficient.
    
    Returns:
    - float: Accuracy score on the test data.
    """
    # Subset the features based on selected indices
    X_subset = X[:, feature_indices]
    
    # Split into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X_subset, y, test_size=0.3, random_state=42)
    
    # Train the SVM model
    model = SVC(kernel=kernel, C=C, gamma=gamma)
    model.fit(X_train, y_train)
    
    # Predict on the test set
    y_pred = model.predict(X_test)
    
    # Return the accuracy score
    return accuracy_score(y_test, y_pred)
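`evaluate_svm` scores each configuration on one fixed 70/30 split, and the matrix it receives was standardized using all rows. As an alternative sketch (a swapped-in technique, not the notebook's method), the hypothetical `evaluate_svm_cv` below wraps the scaler and the SVM in a pipeline and returns the mean 5-fold cross-validated accuracy, so the scaler is refit on the training folds only. It assumes it is given the unscaled features.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def evaluate_svm_cv(X, y, feature_indices, kernel, C, gamma, cv=5):
    """Hypothetical variant of evaluate_svm: mean cross-validated accuracy.

    X is assumed to be the *unscaled* feature matrix; scaling is done
    inside each fold by the pipeline, using only that fold's training rows.
    """
    X_subset = np.asarray(X)[:, list(feature_indices)]
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=C, gamma=gamma))
    # For a classifier, an integer cv gives stratified k-fold splits
    scores = cross_val_score(model, X_subset, y, cv=cv, scoring="accuracy")
    return scores.mean()
```

In this notebook it would be called with the unscaled frame, e.g. `evaluate_svm_cv(X.values, y.values, (0, 1, 2), 'rbf', 1, 1)`. Averaging over folds makes the comparison between the 720 configurations less dependent on one particular split.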


# Running the SVM with Different Feature Combinations and Parameters

In [30]:
# Track the best configuration
best_accuracy = 0
best_config = None

# Try all combinations
for feature_indices in feature_combinations:
    for kernel in kernels:
        for C in C_values:
            for gamma in gamma_values:
                # Calculate the accuracy for the given configuration
                accuracy = evaluate_svm(X_scaled, y.values, feature_indices, kernel, C, gamma)
                
                # Store the best configuration
                if accuracy > best_accuracy:
                    best_accuracy = accuracy
                    best_config = (feature_indices, kernel, C, gamma)

                # Print the configuration and its accuracy
                print(f"Features: {feature_indices}, Kernel: {kernel}, C: {C}, Gamma: {gamma}, Accuracy: {accuracy:.4f}")


Features: (0,), Kernel: linear, C: 0.1, Gamma: 0.01, Accuracy: 0.8252
Features: (0,), Kernel: linear, C: 0.1, Gamma: 0.1, Accuracy: 0.8252
Features: (0,), Kernel: linear, C: 0.1, Gamma: 1, Accuracy: 0.8252
Features: (0,), Kernel: linear, C: 0.1, Gamma: 10, Accuracy: 0.8252
Features: (0,), Kernel: linear, C: 1, Gamma: 0.01, Accuracy: 0.8252
Features: (0,), Kernel: linear, C: 1, Gamma: 0.1, Accuracy: 0.8252
Features: (0,), Kernel: linear, C: 1, Gamma: 1, Accuracy: 0.8252
Features: (0,), Kernel: linear, C: 1, Gamma: 10, Accuracy: 0.8252
Features: (0,), Kernel: linear, C: 10, Gamma: 0.01, Accuracy: 0.8228
Features: (0,), Kernel: linear, C: 10, Gamma: 0.1, Accuracy: 0.8228
Features: (0,), Kernel: linear, C: 10, Gamma: 1, Accuracy: 0.8228
Features: (0,), Kernel: linear, C: 10, Gamma: 10, Accuracy: 0.8228
Features: (0,), Kernel: linear, C: 100, Gamma: 0.01, Accuracy: 0.8228
Features: (0,), Kernel: linear, C: 100, Gamma: 0.1, Accuracy: 0.8228
Features: (0,), Kernel: linear, C: 100, Gamma: 1, A
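With 720 printed lines, ties and near-ties are hard to spot. One option, sketched below, assumes each trial is also appended to a list as a dict (a hypothetical `results` list that the notebook does not build) and uses a hypothetical helper `rank_results` to sort the trials by accuracy with pandas.

```python
import pandas as pd

def rank_results(results, top_n=10):
    """Hypothetical helper: return the top_n trials, highest accuracy first.

    `results` is assumed to be a list of dicts with keys 'features',
    'kernel', 'C', 'gamma' and 'accuracy', in the order the trials ran.
    """
    df_results = pd.DataFrame(results)
    # Record run order so that ties are broken the same way as the notebook's
    # strict `>` comparison: the first configuration encountered ranks first
    df_results["trial"] = range(len(df_results))
    ranked = df_results.sort_values(["accuracy", "trial"], ascending=[False, True])
    return ranked.head(top_n).reset_index(drop=True)
```

Inside the innermost loop one would add `results.append({'features': feature_indices, 'kernel': kernel, 'C': C, 'gamma': gamma, 'accuracy': accuracy})` and call `rank_results(results)` after the loops finish.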

In [31]:
print("\nBest Configuration:")
print(f"Features: {best_config[0]}, Kernel: {best_config[1]}, C: {best_config[2]}, Gamma: {best_config[3]}")
print(f"Best Accuracy: {best_accuracy:.4f}")



Best Configuration:
Features: (0, 1, 2), Kernel: rbf, C: 1, Gamma: 1
Best Accuracy: 1.0000
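The indices in `best_config[0]` refer to columns of `X`, in the order given by `columns` when the data was loaded. A small sketch that translates them back to names; the helper `indices_to_names` is hypothetical, and the column list is repeated so the snippet runs on its own.

```python
columns = ['Variance', 'Skewness', 'Kurtosis', 'Entropy', 'Class']
feature_names = [name for name in columns if name != 'Class']  # same order as X

def indices_to_names(feature_indices):
    """Hypothetical helper: map column indices from feature_combinations to names."""
    return [feature_names[i] for i in feature_indices]

print(indices_to_names((0, 1, 2)))  # ['Variance', 'Skewness', 'Kurtosis']
```

Inside the notebook, `list(X.columns[list(best_config[0])])` gives the same result directly from the DataFrame.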
