**Q1**	The Iris dataset is a classic example for demonstrating classification algorithms. It consists of 150 samples of iris flowers belonging to three species: Setosa, Versicolor, and Virginica, with four input features (sepal and petal length/width).


a)	Load the dataset and perform train–test split (80:20).

b)	Train three different SVM models using the following kernels:
Linear, Polynomial (degree=3), RBF

c)	Evaluate each model using:
•	Accuracy
•	Precision
•	Recall
•	F1-Score

d)	Display the confusion matrix for each kernel.

e)	Identify which kernel performs the best and why.


In [3]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# 1. Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# 2. Train-test split (80:20)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# List of kernels to evaluate
kernels = ['linear', 'poly', 'rbf']

print("--- SVM Classification Results on Iris Dataset ---")

for k in kernels:
    # Initialize model (degree=3 is default for poly, but specified for clarity)
    if k == 'poly':
        model = SVC(kernel=k, degree=3)
    else:
        model = SVC(kernel=k)

    # Train the model
    model.fit(X_train, y_train)

    # Make predictions
    y_pred = model.predict(X_test)

    # Calculate metrics
    # Note: Iris has three classes, so precision, recall, and F1 need an averaging
    # strategy; average='weighted' weights each class's score by its support in y_test
    acc = accuracy_score(y_test, y_pred)
    prec = precision_score(y_test, y_pred, average='weighted')
    rec = recall_score(y_test, y_pred, average='weighted')
    f1 = f1_score(y_test, y_pred, average='weighted')
    cm = confusion_matrix(y_test, y_pred)

    # Output results
    print(f"\nKernel: {k.upper()}")
    print(f"Accuracy:  {acc:.4f}")
    print(f"Precision: {prec:.4f}")
    print(f"Recall:    {rec:.4f}")
    print(f"F1-Score:  {f1:.4f}")
    print("Confusion Matrix:")
    print(cm)

--- SVM Classification Results on Iris Dataset ---

Kernel: LINEAR
Accuracy:  1.0000
Precision: 1.0000
Recall:    1.0000
F1-Score:  1.0000
Confusion Matrix:
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]

Kernel: POLY
Accuracy:  1.0000
Precision: 1.0000
Recall:    1.0000
F1-Score:  1.0000
Confusion Matrix:
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]

Kernel: RBF
Accuracy:  1.0000
Precision: 1.0000
Recall:    1.0000
F1-Score:  1.0000
Confusion Matrix:
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
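
All three kernels score perfectly on this 30-sample test set, so the single split cannot separate them for part (e). One way to get a finer comparison is k-fold cross-validation. The sketch below is an illustration, not part of the original solution; it assumes 5 stratified folds with a fixed seed, and the names `cv` and `cv_scores` are ours.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Assumption: 5 stratified folds, shuffled with a fixed seed for repeatability
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

cv_scores = {}
for k in ['linear', 'poly', 'rbf']:
    # degree only affects the polynomial kernel; the other kernels ignore it
    model = SVC(kernel=k, degree=3)
    cv_scores[k] = cross_val_score(model, X, y, cv=cv, scoring='accuracy')

for k, scores in cv_scores.items():
    print(f"{k:>6}: mean={scores.mean():.4f}  std={scores.std():.4f}")
```

If the mean accuracies still lie within about one standard deviation of each other, the kernels are effectively tied on Iris, and the linear kernel is a reasonable pick because it is the simplest of the three.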


**Q2**	SVM models are highly sensitive to the scale of input features. When features have different ranges, the algorithm may incorrectly assign higher importance to variables with larger magnitudes, affecting the placement of the separating hyperplane. Feature scaling ensures that all attributes contribute equally to distance-based computations, which is especially crucial for kernels like RBF or polynomial.

A) Use the Breast Cancer dataset from sklearn.datasets.load_breast_cancer.

B) Train an SVM (RBF kernel) model with and without feature scaling (StandardScaler). Compare both results using:

•	Training accuracy

•	Testing accuracy


In [None]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler

# 1. Load dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("--- Effect of Feature Scaling on SVM (RBF) ---")

# --- Case A: WITHOUT Scaling ---
model_no_scale = SVC(kernel='rbf')
model_no_scale.fit(X_train, y_train)

print("\n1. Without Feature Scaling:")
print(f"Training Accuracy: {model_no_scale.score(X_train, y_train):.4f}")
print(f"Testing Accuracy:  {model_no_scale.score(X_test, y_test):.4f}")

# --- Case B: WITH Scaling ---
# Initialize Scaler
scaler = StandardScaler()

# Fit on training data and transform both train and test
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model_with_scale = SVC(kernel='rbf')
model_with_scale.fit(X_train_scaled, y_train)

print("\n2. With Feature Scaling (StandardScaler):")
print(f"Training Accuracy: {model_with_scale.score(X_train_scaled, y_train):.4f}")
print(f"Testing Accuracy:  {model_with_scale.score(X_test_scaled, y_test):.4f}")

--- Effect of Feature Scaling on SVM (RBF) ---

1. Without Feature Scaling:
Training Accuracy: 0.9143
Testing Accuracy:  0.9474

2. With Feature Scaling (StandardScaler):
Training Accuracy: 0.9890
Testing Accuracy:  0.9825
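
The manual fit/transform steps in Case B can also be bundled into a scikit-learn pipeline, which learns the scaler statistics from the training data only and reapplies them at prediction time. A minimal sketch, assuming the same split as above (the names `pipe`, `train_acc`, and `test_acc` are ours):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The pipeline standardizes, then classifies; fit() learns the scaler
# statistics from X_train only, so the test set never leaks into scaling
pipe = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
pipe.fit(X_train, y_train)

train_acc = pipe.score(X_train, y_train)
test_acc = pipe.score(X_test, y_test)
print(f"Training Accuracy: {train_acc:.4f}")
print(f"Testing Accuracy:  {test_acc:.4f}")
```

Because the pipeline performs the same two steps in the same order, it should reproduce the scaled-model accuracies while removing the risk of forgetting to transform the test set.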


**Q3** Discuss the effect of feature scaling on SVM performance.

**The Effect of Feature Scaling**

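**Without Scaling**: On this split the RBF model reaches 91.43% training accuracy and 94.74% testing accuracy. Features with large numeric ranges, such as "mean area" (values in the hundreds to thousands), dominate the kernel's distance computation, while features such as "mean smoothness" (values around 0.1) contribute almost nothing.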
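
To see how uneven the raw feature scales are, one can compare the per-feature spread directly. The sketch below is illustrative only: it uses each feature's share of the total variance as a rough proxy for its influence on the average squared Euclidean distance between samples. The names `raw_share` and `scaled_share` are ours.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = data.data
names = list(data.feature_names)

# Share of total variance per feature: a proxy for how much each feature
# contributes to the average squared Euclidean distance between samples
raw_var = X.var(axis=0)
raw_share = raw_var / raw_var.sum()

# After standardization every feature has unit variance, so shares equalize
X_scaled = StandardScaler().fit_transform(X)
scaled_var = X_scaled.var(axis=0)
scaled_share = scaled_var / scaled_var.sum()

for feature in ['mean area', 'mean smoothness']:
    j = names.index(feature)
    print(f"{feature:>16}: std={X[:, j].std():.4f}  "
          f"raw share={raw_share[j]:.6f}  scaled share={scaled_share[j]:.6f}")
```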

**With Scaling**: After standardizing the features with StandardScaler, training accuracy rises to 98.90% and testing accuracy to 98.25%.


**Conclusion**: Feature scaling puts all attributes on a comparable scale so that no single feature dominates the kernel computation. For SVMs with RBF or polynomial kernels, which depend on distances or dot products between samples, scaling is crucial for finding a good separating hyperplane.
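
The comparison above rests on a single 80:20 split. One hedged way to check that the gap is not an artifact of that split is to cross-validate both variants. The sketch assumes 5 stratified folds with a fixed seed and wraps the scaler in a pipeline so that it is refit inside each fold; the names `unscaled`, `scaled`, `scores_unscaled`, and `scores_scaled` are ours.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Assumption: 5 stratified folds, shuffled with a fixed seed for repeatability
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

unscaled = SVC(kernel='rbf')
# The scaler is refit on the training portion of every fold
scaled = make_pipeline(StandardScaler(), SVC(kernel='rbf'))

scores_unscaled = cross_val_score(unscaled, X, y, cv=cv)
scores_scaled = cross_val_score(scaled, X, y, cv=cv)

print(f"Unscaled: mean={scores_unscaled.mean():.4f}  std={scores_unscaled.std():.4f}")
print(f"Scaled:   mean={scores_scaled.mean():.4f}  std={scores_scaled.std():.4f}")
```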