7)	Implement a Support Vector Machine model for the built-in iris dataset

Support Vector Machine (SVM) is a supervised machine learning algorithm
used for both classification and regression. Although it can handle regression
problems, it is best suited to classification. The main objective of the SVM algorithm
is to find the optimal hyperplane in an N-dimensional space (N being the number of
features) that separates the data points of different classes. The hyperplane is
chosen so that the margin between the closest points of different classes is as
large as possible. The dimension of the hyperplane depends on the number of
features. If the number of input features is two, the hyperplane is just a line. If the
number of input features is three, the hyperplane becomes a 2-D plane. It becomes
difficult to visualize when the number of features exceeds three. As an example,
consider two independent variables x1 and x2 and one dependent variable, the class,
which is either a blue circle or a red circle.

Hyperplane: The hyperplane is the decision boundary used to separate the data points of
different classes in a feature space. In the case of linear classification, it is given by a
linear equation, i.e. w·x + b = 0, where w is the weight vector and b is the bias.
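
To make the role of the hyperplane equation concrete, the following is a minimal sketch of how the sign of w·x + b assigns a point to one side of the boundary. The values of w and b and the sample points are hypothetical, chosen only for illustration; they are not learned from any data.

```python
# Hypothetical 2-D hyperplane (a line): w·x + b = 0 with w = (1, -1), b = 0
w = (1.0, -1.0)
b = 0.0

def decision_value(x):
    """Return w·x + b for a point x = (x1, x2)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(x):
    """Assign +1 or -1 depending on which side of the hyperplane x lies.
    Points exactly on the hyperplane are assigned +1 by convention here."""
    return 1 if decision_value(x) >= 0 else -1

# Illustrative points: one on each side and one exactly on the line
points = [(2.0, 1.0), (1.0, 3.0), (0.5, 0.5)]
for p in points:
    print(p, decision_value(p), classify(p))
```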

Support Vectors: These are the data points closest to the hyperplane. The separating line
is defined with the help of these data points.

Margin: It is the distance between the hyperplane and the observations closest to the
hyperplane (the support vectors). In SVM, a large margin is considered a good margin. There
are two types of margins: hard margin and soft margin. A hard margin requires every training
point to lie on the correct side and outside the margin, while a soft margin tolerates some
violations in exchange for a wider margin.
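
The margin definition above can be checked numerically. The sketch below computes the perpendicular distance |w·x + b| / ||w|| of a few points to a hyperplane and takes the smallest distance as the margin. The hyperplane and the points are hypothetical values picked so that the arithmetic is easy to follow; a real SVM would learn w and b from the training data.

```python
import math

# Hypothetical hyperplane w·x + b = 0 in 2-D (values chosen for illustration)
w = (3.0, 4.0)
b = -5.0

def distance_to_hyperplane(x):
    """Perpendicular distance |w·x + b| / ||w|| from point x to the hyperplane."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w

# Hypothetical observations; the closest ones play the role of support vectors
points = [(3.0, 4.0), (2.0, 1.0), (0.0, 0.0), (1.0, 3.0)]
distances = {p: distance_to_hyperplane(p) for p in points}

# The margin is the distance from the hyperplane to the closest observation(s)
margin = min(distances.values())
support = [p for p, d in distances.items() if math.isclose(d, margin)]

print("Distances:", distances)
print("Margin:", margin)
print("Closest points:", support)
```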
What does SVM do?
Given a set of training examples, each marked as belonging to one or the other of two
categories, an SVM training algorithm builds a model that assigns new examples to one
category or the other, making it a non-probabilistic binary linear classifier.
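
As a small illustration of this behaviour, the sketch below trains a linear SVM on a hypothetical toy data set with two categories and then assigns two new examples to one category or the other. The data points are made up for the example; only `SVC`, `fit`, `predict` and `decision_function` come from scikit-learn.

```python
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated categories in 2-D
X_toy = [[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]]
y_toy = [0, 0, 1, 1]

# Train a linear SVM on the labelled examples
clf = SVC(kernel='linear')
clf.fit(X_toy, y_toy)

# Assign new, unseen examples to one of the two categories
X_new = [[0.5, 0.5], [3.5, 3.5]]
predictions = clf.predict(X_new)

# The sign of the decision function tells which side of the hyperplane a point is on
scores = clf.decision_function(X_new)

print("Predicted categories:", predictions)
print("Decision values:", scores)
```

The output is a category label, not a probability, which is why the classifier is described as non-probabilistic.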
Pros and Cons
● Pros:
○ It works really well when there is a clear margin of separation between the classes.
○ It is effective in high dimensional spaces and in cases where the number of
dimensions is greater than the number of samples.
○ It uses a subset of training points in the decision function (called support
vectors), so it is also memory efficient.

● Cons:
○ It doesn’t perform well on large data sets (the required training time is
high) or on noisy data sets, i.e. when the target classes overlap.
○ SVM doesn’t directly provide probability estimates; these are calculated using
an expensive five-fold cross-validation.
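
Two of the points above can be observed directly on the iris data. The sketch below is one possible check, assuming a scikit-learn version in which `SVC` accepts the `probability` parameter: it counts the support vectors kept by the fitted model and requests probability estimates, which scikit-learn obtains through an internal cross-validation when `probability=True`. Scaling the whole data set up front is a simplification made for this illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = load_iris()
X = StandardScaler().fit_transform(iris.data)  # scaled once, for illustration only
y = iris.target

# probability=True must be set before fit; it makes training slower
clf = SVC(kernel='rbf', probability=True, random_state=42)
clf.fit(X, y)

# Only a subset of the training points is kept as support vectors
print("Training samples:", X.shape[0])
print("Support vectors per class:", clf.n_support_)
print("Total support vectors:", clf.support_vectors_.shape[0])

# Probability estimates for the first three samples (one column per class)
proba = clf.predict_proba(X[:3])
print(proba)
```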

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, classification_report
import seaborn as sns

# Load the iris dataset
iris = load_iris()
X, y = iris.data, iris.target

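# i., ii. Data scaling and train/test split
# The split is made first so that the scaler is fitted on the training data only
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # learn mean and std from the training set
X_test = scaler.transform(X_test)        # apply the same statistics to the test set
X_scaled = scaler.transform(X)           # full dataset on the same scale, used further below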

# iii. Create the SVM model
model = SVC(kernel='rbf', random_state=42)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# iv. Display confusion matrix and classification report
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(cm)

print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))

# Visualize the confusion matrix
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=iris.target_names,
            yticklabels=iris.target_names)
plt.title('Confusion Matrix')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()

# v. Display k-fold cross-validation score
cv_scores = cross_val_score(model, X_scaled, y, cv=5)
print("\nCross-validation scores:", cv_scores)
print("Mean CV score:", cv_scores.mean())
print("Standard deviation of CV scores:", cv_scores.std())

# Function to plot decision boundaries
def plot_decision_boundaries(X, y, feature_idx=(0, 1), ax=None):
    """Fit an RBF SVM on two features and plot its decision regions."""
    # A separate 2-D model is trained here; it is not the 4-feature model above
    clf = SVC(kernel='rbf', random_state=42)
    clf.fit(X, y)

    h = .02  # step size in the mesh
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    # Predict the class at every point of the mesh
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    if ax is None:
        ax = plt.gca()

    ax.contourf(xx, yy, Z, alpha=0.8, cmap=plt.cm.RdYlBu)
    scatter = ax.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdYlBu, edgecolor='black')
    # Label the axes with the two features that are actually plotted
    ax.set_xlabel(iris.feature_names[feature_idx[0]])
    ax.set_ylabel(iris.feature_names[feature_idx[1]])
    return scatter

# Visualize decision boundaries for different feature pairs
fig, axs = plt.subplots(3, 2, figsize=(15, 20))
feature_pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
for ax, pair in zip(axs.flatten(), feature_pairs):
    X_pair = X_scaled[:, list(pair)]
    plot_decision_boundaries(X_pair, y, feature_idx=pair, ax=ax)
    ax.set_title(f'Decision Boundaries (Features {pair[0]} and {pair[1]})')
plt.tight_layout()
plt.show()
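
The cross-validation step above scores the model on features that were scaled once, before the folds were created, so the held-out fold of each round has already influenced the scaler. One possible way to avoid this, sketched below, is to wrap the scaler and the classifier in a pipeline so that scaling is re-fitted on the training part of every fold. The use of `make_pipeline` is an assumption of this sketch and is not part of the original code.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Unscaled features: the pipeline takes care of scaling inside each fold
X, y = load_iris(return_X_y=True)

# Scaler and SVM are fitted together on the training part of every fold
pipeline = make_pipeline(StandardScaler(), SVC(kernel='rbf', random_state=42))

scores = cross_val_score(pipeline, X, y, cv=5)
print("Cross-validation scores:", scores)
print("Mean CV score:", scores.mean())
print("Standard deviation of CV scores:", scores.std())
```

On a small, clean data set such as iris the difference from the simpler approach is expected to be minor, but the pipeline form carries over unchanged to data sets where it matters.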