Lab Assignment 6: SVM

Machine Learning (UML501)

KRISH KHAJURIA (102317023)

1) The Iris dataset is a classic example for demonstrating classification algorithms. It consists of 150 samples of iris flowers belonging to three species: Setosa, Versicolor, and Virginica, with four input features (sepal and petal length/width). Use SVC from sklearn.svm on the Iris dataset and follow the steps below:
a) Load the dataset and perform train–test split (80:20).
b) Train three different SVM models using the following kernels:
Linear, Polynomial (degree=3), RBF
c) Evaluate each model using:
• Accuracy
• Precision
• Recall
• F1-Score
d) Display the confusion matrix for each kernel.
e) Identify which kernel performs the best and why.
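
The macro-averaged metrics in (c) can all be derived from the confusion matrix in (d). As a minimal sketch, the snippet below recomputes them by hand from the 3x3 matrix that the notebook output further down reports (rows are true classes, columns are predicted classes). The helper name `macro_scores` is illustrative and not part of the assignment.

```python
# Confusion matrix as reported in the notebook output (rows = true, cols = predicted).
cm = [
    [10, 0, 0],
    [0, 10, 0],
    [0, 1, 9],
]

def macro_scores(cm):
    """Return (accuracy, macro precision, macro recall, macro F1) for a square confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum: everything predicted as k
        actual_k = sum(cm[k])                          # row sum: everything truly k
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    # Macro averaging: unweighted mean of the per-class scores.
    return correct / total, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

acc, prec, rec, f1 = macro_scores(cm)
print(round(acc, 4), round(prec, 4), round(rec, 4), round(f1, 4))
# prints: 0.9667 0.9697 0.9667 0.9666
```

The single off-diagonal entry (one class-2 sample predicted as class 1) lowers the precision of class 1 to 10/11 and the recall of class 2 to 9/10, which is where the macro values come from.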

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
import pandas as pd

# a) load and split (80:20)
iris = load_iris()
X = iris.data
y = iris.target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1, stratify=y)

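# helper: fit one model, print its test-set metrics, and return them
def fit_check(kern_name, model):
    """Fit `model` on the training split and report macro-averaged test metrics."""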
    model.fit(X_tr, y_tr)
    y_pr = model.predict(X_te)

    acc = accuracy_score(y_te, y_pr)
    pre = precision_score(y_te, y_pr, average="macro")
    rec = recall_score(y_te, y_pr, average="macro")
    f1  = f1_score(y_te, y_pr, average="macro")
    cm  = confusion_matrix(y_te, y_pr)

    print(f"\nKernel: {kern_name}")
    print("Accuracy :", acc)
    print("Precision:", pre)
    print("Recall   :", rec)
    print("F1-score :", f1)
    print("Confusion matrix:\n", cm)

    return acc, pre, rec, f1

# b) 3 SVM models
# random_state is ignored by SVC unless probability=True, so it is omitted here
svc_lin = SVC(kernel="linear")
svc_poly = SVC(kernel="poly", degree=3)
svc_rbf = SVC(kernel="rbf")

res = {}
res["linear"] = fit_check("linear", svc_lin)
res["poly3"] = fit_check("poly (3)", svc_poly)
res["rbf"]   = fit_check("rbf", svc_rbf)

# put scores in a small table
cols = ["acc", "prec", "rec", "f1"]
df_res = pd.DataFrame(res, index=cols).T
print("\nScores table:\n", df_res)



Kernel: linear
Accuracy : 0.9666666666666667
Precision: 0.9696969696969697
Recall   : 0.9666666666666667
F1-score : 0.9665831244778613
Confusion matrix:
 [[10  0  0]
 [ 0 10  0]
 [ 0  1  9]]

Kernel: poly (3)
Accuracy : 0.9666666666666667
Precision: 0.9696969696969697
Recall   : 0.9666666666666667
F1-score : 0.9665831244778613
Confusion matrix:
 [[10  0  0]
 [ 0 10  0]
 [ 0  1  9]]

Kernel: rbf
Accuracy : 0.9666666666666667
Precision: 0.9696969696969697
Recall   : 0.9666666666666667
F1-score : 0.9665831244778613
Confusion matrix:
 [[10  0  0]
 [ 0 10  0]
 [ 0  1  9]]

Scores table:
              acc      prec       rec        f1
linear  0.966667  0.969697  0.966667  0.966583
poly3   0.966667  0.969697  0.966667  0.966583
rbf     0.966667  0.969697  0.966667  0.966583
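
All three kernels produce the same confusion matrix on this split, so a 30-sample test set cannot separate them for part (e). One possible way to get a finer comparison is stratified k-fold cross-validation on the full dataset. The sketch below assumes 5 shuffled folds and default `C` and `gamma`; these settings and the names `cv`, `kernels`, and `cv_scores` are illustrative choices, not part of the assignment.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Assumed setup: 5 stratified folds, shuffled with a fixed seed for repeatability.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

kernels = {
    "linear": SVC(kernel="linear"),
    "poly3": SVC(kernel="poly", degree=3),
    "rbf": SVC(kernel="rbf"),
}

cv_scores = {}
for name, model in kernels.items():
    # Accuracy on each held-out fold; mean and std summarise the 5 folds.
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    cv_scores[name] = (scores.mean(), scores.std())
    print(f"{name:6s} mean acc = {scores.mean():.4f} (std {scores.std():.4f})")
```

If the fold means differ by less than their standard deviations, the kernels are effectively tied on this dataset, and the simpler linear kernel would be a reasonable default.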


2) SVM models are highly sensitive to the scale of input features. When features have different ranges, the algorithm may incorrectly assign higher importance to variables with larger magnitudes, affecting the placement of the separating hyperplane. Feature scaling ensures that all attributes contribute equally to distance-based computations, which is especially crucial for kernels like RBF or polynomial.
A) Use the Breast Cancer dataset from sklearn.datasets.load_breast_cancer.
B) Train an SVM (RBF kernel) model with and without feature scaling (StandardScaler). Compare both results using:
• Training accuracy
• Testing accuracy
C) Discuss the effect of feature scaling on SVM performance.
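
To make the scaling argument concrete, the sketch below shows how one large-range feature dominates the squared Euclidean distance that the RBF kernel is built on, and how z-scoring rebalances it. The four samples are hypothetical values chosen for illustration, and the helpers `standardize` and `sq_dist_parts` are not part of the assignment. The standardization uses the population standard deviation, which is what StandardScaler computes.

```python
import math

# Hypothetical samples: (large-range feature, small-range feature).
samples = [
    (1000.0, 0.10),
    (1005.0, 0.20),  # close to sample 0 in feature 0, far from it in feature 1
    (1500.0, 0.10),
    (500.0, 0.15),
]

def standardize(rows):
    """Z-score each column: subtract the column mean, divide by the population std."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
        for c, m in zip(cols, means)
    ]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]

def sq_dist_parts(a, b):
    """Per-feature contributions to the squared Euclidean distance between a and b."""
    return [(x - y) ** 2 for x, y in zip(a, b)]

raw_parts = sq_dist_parts(samples[0], samples[1])
scaled = standardize(samples)
scaled_parts = sq_dist_parts(scaled[0], scaled[1])

# Fraction of the squared distance that comes from the small-range feature.
raw_share = raw_parts[1] / sum(raw_parts)
scaled_share = scaled_parts[1] / sum(scaled_parts)
print(f"feature 1 share of squared distance, raw:    {raw_share:.6f}")
print(f"feature 1 share of squared distance, scaled: {scaled_share:.6f}")
```

On the raw values the small-range feature contributes almost nothing to the distance, even though the two samples differ in it by more than two standard deviations. After standardization that difference drives the distance, so the kernel can use it.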

In [None]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score

# load data
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1, stratify=y)

# model 1: RBF SVM WITHOUT scaling
svm_raw = SVC(kernel="rbf", gamma="scale")  # random_state has no effect without probability=True
svm_raw.fit(X_tr, y_tr)

y_tr_raw = svm_raw.predict(X_tr)
y_te_raw = svm_raw.predict(X_te)

acc_tr_raw = accuracy_score(y_tr, y_tr_raw)
acc_te_raw = accuracy_score(y_te, y_te_raw)

print("Without scaling - train acc:", acc_tr_raw)
print("Without scaling - test  acc:", acc_te_raw)

# model 2: RBF SVM WITH StandardScaler
svm_scaled = Pipeline([
    ("sc", StandardScaler()),
    ("svm", SVC(kernel="rbf", gamma="scale"))
])

svm_scaled.fit(X_tr, y_tr)

y_tr_sc = svm_scaled.predict(X_tr)
y_te_sc = svm_scaled.predict(X_te)

acc_tr_sc = accuracy_score(y_tr, y_tr_sc)
acc_te_sc = accuracy_score(y_te, y_te_sc)

print("\nWith scaling - train acc:", acc_tr_sc)
print("With scaling - test  acc:", acc_te_sc)


