# Lab Assignment - SVM Experiments

1. Build SVM models on the **voice.csv** data under the following conditions:

   a. Split the data using 70:30 and 80:20 ratios, and for each split train:

      - a model with a linear kernel,
      - a model with a polynomial kernel,
      - a model with an RBF kernel.

   b. Tabulate the accuracy of every split and kernel combination.

2. Use the data from practical session 5 to build a daytime vs. nighttime classifier using an SVM with an RBF kernel on histogram features. Use an 80:20 split. You may experiment with hyperparameter tuning for the RBF kernel. Record the accuracy.

## Step 1 - Import libraries & load voice.csv

In [16]:
import os
import cv2
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.metrics import accuracy_score, classification_report

df_voice = pd.read_csv("voice.csv")
# Encode textual labels to integers
label_encoder = LabelEncoder()
df_voice['label'] = label_encoder.fit_transform(df_voice['label'])

# Split features / labels
X = df_voice.drop('label', axis=1)
y = df_voice['label']

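# Scale features for SVM
# Note: the scaler is fitted on the full dataset before splitting,
# so test-set statistics influence the scaling used during training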
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
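
`LabelEncoder` assigns integer codes in sorted order of the class names, so it can help to print the mapping before interpreting predictions. The snippet below is a minimal sketch that assumes hypothetical label values `"male"` and `"female"`; the actual values stored in voice.csv are not shown in this notebook.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical label values; in the notebook, use df_voice['label'] before encoding
sample_labels = ["male", "female", "female", "male"]

encoder = LabelEncoder()
codes = encoder.fit_transform(sample_labels)

# classes_ is sorted, and the position of each class is its integer code
mapping = {cls: code for code, cls in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform recovers the original text labels from the codes
print(encoder.inverse_transform(codes))
```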


## Step 2 - Define SVM evaluation function

In [17]:
def evaluate_svm(X, y, test_size, kernel):
    # Stratified split keeps class proportions in train/test
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=42, stratify=y
    )
    # Create and train SVM with the requested kernel
    model = SVC(kernel=kernel, random_state=42)
    model.fit(X_train, y_train)
    # Predict on test set and compute accuracy
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    return acc
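
One possible refinement is to scale inside the evaluation function, so that the scaler only ever sees the training portion of each split. The sketch below is a hypothetical variant named `evaluate_svm_pipeline`, not the function used for the results in this notebook, and it runs on synthetic stand-in data from `make_classification`.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate_svm_pipeline(X, y, test_size, kernel):
    """Hypothetical variant of evaluate_svm that scales inside a pipeline."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=42, stratify=y
    )
    # The scaler is fitted on X_train only; X_test reuses those statistics
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


# Synthetic stand-in data (an assumption); pass the unscaled voice.csv features instead
X_demo, y_demo = make_classification(n_samples=300, n_features=20, random_state=0)
acc_demo = evaluate_svm_pipeline(X_demo, y_demo, test_size=0.2, kernel="rbf")
print(f"Demo accuracy: {acc_demo:.3f}")
```

With this variant the unscaled `X` would be passed in place of `X_scaled`. The accuracies may differ slightly from those obtained with a scaler fitted on the full dataset.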


## Step 3 - Run experiments for 70:30 and 80:20 splits with linear/poly/rbf kernels

In [18]:
# Run experiments for both split ratios and kernels
results = []
splits = [0.3, 0.2]  # 70:30 and 80:20
kernels = ['linear', 'poly', 'rbf']

for split in splits:
    for kernel in kernels:
        acc = evaluate_svm(X_scaled, y, split, kernel)
        results.append({
            # round() avoids off-by-one labels from floating-point error
            'Split': f"{round((1 - split) * 100)}:{round(split * 100)}",
            'Kernel': kernel,
            'Accuracy': acc
        })

df_results = pd.DataFrame(results)
print(df_results)

   Split  Kernel  Accuracy
0  70:30  linear  0.978970
1  70:30    poly  0.958991
2  70:30     rbf  0.983176
3  80:20  linear  0.974763
4  80:20    poly  0.957413
5  80:20     rbf  0.982650
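
For the tabulation asked for in task 1b, the long-format results can be pivoted so that each kernel is a row and each split ratio is a column. This is a small sketch that rebuilds the results frame from the accuracies printed above; the names `table` and `best_per_split` are introduced here for illustration.

```python
import pandas as pd

# Accuracy values copied from the printed results
df_results = pd.DataFrame({
    "Split": ["70:30"] * 3 + ["80:20"] * 3,
    "Kernel": ["linear", "poly", "rbf"] * 2,
    "Accuracy": [0.978970, 0.958991, 0.983176, 0.974763, 0.957413, 0.982650],
})

# One row per kernel, one column per split ratio
table = df_results.pivot(index="Kernel", columns="Split", values="Accuracy")
print(table.round(4))

# Kernel with the highest accuracy for each split
best_per_split = table.idxmax()
print(best_per_split)
```

In these results the RBF kernel scores highest for both splits, the linear kernel is within about one percentage point of it, and the polynomial kernel trails by roughly two to three points.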


## Step 4 - Daytime vs. nighttime classification (histogram features) using SVM RBF + tuning (80:20)

In [19]:
train_dir = "images/training/"
test_dir = "images/test/"


In [20]:
def extract_hist_features(directory):
    features, labels = [], []
    for label in ['day', 'night']:
        folder = os.path.join(directory, label)
        # Skip if folder missing to avoid errors
        if not os.path.isdir(folder):
            continue
        for file in os.listdir(folder):
            path = os.path.join(folder, file)
            img = cv2.imread(path)
            if img is None:
                # skip unreadable files
                continue
            # Resize to a consistent size for histogram extraction
            img = cv2.resize(img, (128, 128))
            # Compute a 3D color histogram and normalize it
            hist = cv2.calcHist([img], [0, 1, 2], None, [8, 8, 8],
                                [0, 256, 0, 256, 0, 256])
            hist = cv2.normalize(hist, hist).flatten()
            features.append(hist)
            labels.append(label)
    return np.array(features), np.array(labels)

# Extract features from provided train/test image folders
X_train, y_train = extract_hist_features(train_dir)
X_test, y_test = extract_hist_features(test_dir)
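
To make the feature layout concrete, the following NumPy-only sketch reproduces the idea of a joint 8x8x8 colour histogram on a random stand-in image. It is an illustration rather than a replacement for `cv2.calcHist`, and it uses L2 normalization on the assumption that this matches the default behaviour of `cv2.normalize`.

```python
import numpy as np

# Random stand-in for a resized 128x128 BGR image (an assumption, not a real photo)
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(128, 128, 3), dtype=np.uint8)

# Each pixel is one point in 3-D colour space; count the points in each 8x8x8 cell
pixels = img.reshape(-1, 3).astype(np.float64)
hist, _ = np.histogramdd(pixels, bins=(8, 8, 8), range=[(0, 256)] * 3)

# Scale to unit L2 norm and flatten into a single feature vector
features = (hist / np.linalg.norm(hist)).flatten()
print(features.shape)  # (512,)
```

Each image therefore contributes a vector of 8 * 8 * 8 = 512 values, regardless of its original resolution.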


In [21]:
# Scale image features before SVM
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Encode labels to integers for training
le = LabelEncoder()
y_train_enc = le.fit_transform(y_train)
y_test_enc = le.transform(y_test)

# Small grid for RBF hyperparameters to tune
param_grid = {'C': [0.1, 1, 10, 100],
              'gamma': ['scale', 0.1, 0.01, 0.001]}
# Run GridSearch with 5-fold CV to find best RBF parameters
grid = GridSearchCV(SVC(kernel='rbf', random_state=42),
                    param_grid, cv=5, scoring='accuracy')
grid.fit(X_train_scaled, y_train_enc)

# Evaluate best model on the held-out test set
best_model = grid.best_estimator_
y_pred = best_model.predict(X_test_scaled)

print("Best Parameters:", grid.best_params_)
print("Accuracy:", accuracy_score(y_test_enc, y_pred))
print("\nClassification Report:")
print(classification_report(y_test_enc, y_pred, target_names=le.classes_))


Best Parameters: {'C': 1, 'gamma': 0.001}
Accuracy: 0.925

Classification Report:
              precision    recall  f1-score   support

         day       0.96      0.89      0.92        80
       night       0.90      0.96      0.93        80

    accuracy                           0.93       160
   macro avg       0.93      0.93      0.92       160
weighted avg       0.93      0.93      0.92       160
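
The run above relies on the train/test folders as provided. If an explicit 80:20 ratio is required, one option is to pool the features from both folders and make a single stratified split. The sketch below assumes that approach and uses synthetic stand-in features; the names `X_a`, `X_b`, `pipe` and the reduced parameter grid are illustrative only.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-ins for the histogram features of the two folders (an assumption)
rng = np.random.default_rng(42)
X_a, y_a = rng.random((120, 16)), np.array(["day", "night"] * 60)
X_b, y_b = rng.random((80, 16)), np.array(["day", "night"] * 40)

# Pool both folders, then make a single stratified 80:20 split
X_all = np.vstack([X_a, X_b])
y_all = np.concatenate([y_a, y_b])
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.2, random_state=42, stratify=y_all
)

# Scaling inside the pipeline is refitted on the training part of every CV fold
pipe = Pipeline([("scale", StandardScaler()), ("svm", SVC(kernel="rbf"))])
param_grid = {"svm__C": [0.1, 1, 10], "svm__gamma": ["scale", 0.01]}
grid = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
grid.fit(X_tr, y_tr)

print("Best parameters:", grid.best_params_)
print("Test accuracy:", grid.score(X_te, y_te))
```

Because the stand-in features are random, the accuracy printed by this sketch is not meaningful; only the splitting and tuning pattern carries over to the real histogram features.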

