Libraries

In [3]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

1) The Iris dataset is a classic example for demonstrating classification algorithms. It consists of 150 samples of iris flowers belonging to three species (Setosa, Versicolor, and Virginica), with four input features (sepal length, sepal width, petal length, and petal width). Use SVC from sklearn.svm on the Iris dataset and follow the steps below:

a) Load the dataset and perform a train-test split (80:20).
b) Train three different SVM models using the following kernels: Linear, Polynomial (degree=3), RBF.
c) Evaluate each model using:
• Accuracy
• Precision
• Recall
• F1-Score
d) Display the confusion matrix for each kernel.
e) Identify which kernel performs the best and why.

In [2]:
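# Load the Iris features (150 samples x 4 features) and the species labels (0, 1, 2)
iris = load_iris()
X = iris.data
y = iris.target

# Hold out 20% of the samples for testing; the fixed seed makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

kernels = ['linear', 'poly', 'rbf']

for k in kernels:
    # The degree parameter only applies to the polynomial kernel
    if k == 'poly':
        model = SVC(kernel=k, degree=3)
    else:
        model = SVC(kernel=k)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(f"\nKernel: {k}")
    print("Accuracy:", accuracy_score(y_test, y_pred))
    # Macro averaging gives each of the three classes equal weight
    print("Precision:", precision_score(y_test, y_pred, average='macro'))
    print("Recall:", recall_score(y_test, y_pred, average='macro'))
    print("F1-Score:", f1_score(y_test, y_pred, average='macro'))
    # Rows are true classes, columns are predicted classes
    print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))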


Kernel: linear
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1-Score: 1.0
Confusion Matrix:
 [[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]

Kernel: poly
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1-Score: 1.0
Confusion Matrix:
 [[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]

Kernel: rbf
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1-Score: 1.0
Confusion Matrix:
 [[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]


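Best Kernel: On this split the linear, polynomial, and RBF kernels are tied: each classifies all 30 test samples correctly, so the test metrics alone do not single out a winner. The linear kernel is the natural choice because it is the simplest model that reaches this score.
Reason: Setosa is linearly separable from the other two species, and Versicolor and Virginica overlap only slightly, so a linear boundary already fits Iris well. The RBF and polynomial kernels can model non-linear boundaries, but that extra flexibility is not needed here.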
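
A single 30-sample test set cannot separate kernels that all score 1.0 on it. One way to get a finer comparison is k-fold cross-validation over the full dataset. The sketch below is only an illustration and not part of the original exercise: the choice of 5 stratified folds and the names cv and cv_results are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Assumption: 5 stratified folds, so each fold keeps the three species balanced
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

cv_results = {}
for k in ['linear', 'poly', 'rbf']:
    # degree=3 is used by 'poly' and ignored by the other kernels
    model = SVC(kernel=k, degree=3)
    scores = cross_val_score(model, X, y, cv=cv, scoring='accuracy')
    cv_results[k] = scores
    print(f"{k}: mean accuracy = {scores.mean():.3f} (std = {scores.std():.3f})")
```

Comparing the mean and spread of the fold scores gives a more stable ranking than one split, although on a dataset this small the differences between kernels may still be within noise.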

2) SVM models are highly sensitive to the scale of the input features. When features have different ranges, those with larger magnitudes dominate the kernel computation, so the algorithm effectively gives them more weight and the separating hyperplane shifts accordingly. Feature scaling puts all attributes on a comparable scale so that each contributes similarly, which matters most for kernels such as RBF (distance-based) and polynomial (dot-product-based).
 
A) Use the Breast Cancer dataset from sklearn.datasets.load_breast_cancer.
B) Train an SVM (RBF kernel) model with and without feature scaling (StandardScaler). Compare both results using:
• Training accuracy 
• Testing accuracy
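
To make the scale argument concrete, the following sketch looks at how much each feature contributes to the squared Euclidean distance between two samples, which is the quantity the RBF kernel is built on. It is an illustrative check rather than part of the assignment: using the first two rows of the dataset and the names share_raw and share_std are arbitrary assumptions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X = data.data
names = data.feature_names

# Per-feature standard deviations show how different the feature ranges are
stds = X.std(axis=0)
print("Largest std: ", names[np.argmax(stds)], stds.max())
print("Smallest std:", names[np.argmin(stds)], stds.min())

# Share of the squared distance between samples 0 and 1 owed to each raw feature
sq_diff_raw = (X[0] - X[1]) ** 2
share_raw = sq_diff_raw / sq_diff_raw.sum()
print("Raw: top feature", names[np.argmax(share_raw)], "share", share_raw.max())

# The same computation after standardizing every feature to zero mean, unit variance
X_std = (X - X.mean(axis=0)) / stds
sq_diff_std = (X_std[0] - X_std[1]) ** 2
share_std = sq_diff_std / sq_diff_std.sum()
print("Standardized: top feature", names[np.argmax(share_std)], "share", share_std.max())
```

If a single raw feature accounts for most of the distance, the RBF kernel is effectively ignoring the others; after standardization the contributions are spread across more features.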

In [4]:
data = load_breast_cancer()
X = data.data
y = data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Baseline: RBF SVM trained on the raw, unscaled features
model1 = SVC(kernel='rbf')
model1.fit(X_train, y_train)
train_acc1 = model1.score(X_train, y_train)
test_acc1 = model1.score(X_test, y_test)

# Fit the scaler on the training set only, then reuse its statistics on the test set
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Same model, trained on the standardized features
model2 = SVC(kernel='rbf')
model2.fit(X_train_scaled, y_train)
train_acc2 = model2.score(X_train_scaled, y_train)
test_acc2 = model2.score(X_test_scaled, y_test)

print("\nWithout Scaling:")
print("Training Accuracy:", train_acc1)
print("Testing Accuracy:", test_acc1)

print("\nWith Scaling:")
print("Training Accuracy:", train_acc2)
print("Testing Accuracy:", test_acc2)


Without Scaling:
Training Accuracy: 0.9142857142857143
Testing Accuracy: 0.9473684210526315

With Scaling:
Training Accuracy: 0.989010989010989
Testing Accuracy: 0.9824561403508771
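
In the results above, standardizing the features raises the test accuracy from about 0.947 to about 0.982. The scaling and the classifier can also be bundled into one estimator so that the scaler is always fitted on the training data only. The sketch below uses sklearn.pipeline.make_pipeline for this; the names scaled_svm, pipe_train_acc, and pipe_test_acc are assumptions introduced for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# The pipeline fits StandardScaler on the training data, then trains the SVM on the result
scaled_svm = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
scaled_svm.fit(X_train, y_train)

# score() applies the already-fitted scaler before predicting
pipe_train_acc = scaled_svm.score(X_train, y_train)
pipe_test_acc = scaled_svm.score(X_test, y_test)
print("Pipeline Training Accuracy:", pipe_train_acc)
print("Pipeline Testing Accuracy:", pipe_test_acc)
```

Since the pipeline performs the same steps as the manual version, its accuracies would be expected to agree with the "With Scaling" figures. It also makes the model safe to pass to cross-validation utilities, which refit the scaler inside each fold.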
