# quest 1

In [1]:
# f(x)=w⋅x+b

# Where:

# w is the weight vector
# x is the input feature vector
# b is the bias term
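
As a minimal sketch of how this function is evaluated, the snippet below computes f(x) for a few points and takes the sign as the predicted class. The values of `w` and `b` and the sample points are illustrative assumptions, not parameters learned anywhere in this notebook.

```python
import numpy as np

# Hypothetical parameters of an already-trained linear SVM (illustrative values only)
w = np.array([0.0, 2.0])   # weight vector
b = -7.0                   # bias term


def decision_function(X, w, b):
    """Return f(x) = w . x + b for every row of X."""
    return X @ w + b


# Three assumed sample points, one per row
X = np.array([[3.0, 4.0], [3.0, 3.0], [1.0, 1.0]])

scores = decision_function(X, w, b)      # signed scores: 1, -1, -5
labels = np.where(scores >= 0, 1, -1)    # predicted class is the sign of f(x)

print("scores:", scores)
print("labels:", labels)
```

Points with a positive score fall on the positive side of the boundary, and the magnitude of the score grows with the distance from it.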

# quest 2

In [2]:
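# The objective of a linear Support Vector Machine (SVM) is to maximize the margin between the decision boundary and the closest training points (the support vectors) while penalizing classification errors.
# In its soft-margin form it is written as:
#   minimize over w, b:  (1/2) * ||w||^2 + C * sum_i max(0, 1 - y_i * (w⋅x_i + b))
# Minimizing the first term widens the margin (the margin width is 2 / ||w||); the second term is the hinge loss, weighted by the regularization parameter C.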
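
To make the two competing terms concrete, here is a small sketch that evaluates the regularized hinge-loss form of the objective, 0.5 * ||w||^2 + C * sum(max(0, 1 - y_i * (w·x_i + b))), for two candidate boundaries. The function name `svm_objective`, the two data points, and the choice C = 10 are assumptions made for illustration.

```python
import numpy as np


def svm_objective(w, b, X, y, C=1.0):
    """Soft-margin primal objective: 0.5 * ||w||^2 + C * sum of hinge losses."""
    margins = y * (X @ w + b)                 # y_i * f(x_i) for every sample
    hinge = np.maximum(0.0, 1.0 - margins)    # zero when the sample is outside the margin
    return 0.5 * np.dot(w, w) + C * hinge.sum()


# Two assumed training points with labels in {-1, +1}
X = np.array([[3.0, 4.0], [3.0, 3.0]])
y = np.array([1, -1])

# Candidate A: both points sit exactly on the margin, so the hinge loss is zero
value_a = svm_objective(np.array([0.0, 2.0]), -7.0, X, y, C=10.0)   # 2.0

# Candidate B: smaller ||w||, but both points fall inside the margin
value_b = svm_objective(np.array([0.0, 1.0]), -3.5, X, y, C=10.0)   # 10.5

print("candidate A:", value_a)
print("candidate B:", value_b)
```

With this C, candidate A wins even though its weight vector is larger, because candidate B pays a hinge penalty for both points.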

# quest 3

In [3]:

 
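# The kernel trick in Support Vector Machines (SVM) is a technique that lets SVMs handle non-linear classification tasks efficiently by implicitly mapping input features into a higher-dimensional space. The mapping is never computed explicitly: the SVM only needs inner products between mapped points, and a kernel function k(x, z) returns those inner products directly from the original features.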
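
The sketch below illustrates the idea with a degree-2 homogeneous polynomial kernel, chosen here only as an example. It compares the inner product computed after an explicit feature map `phi` with the value the kernel gives directly in the original 2D space. The function names and sample vectors are assumptions.

```python
import numpy as np


def phi(x):
    """Explicit degree-2 feature map for a 2D input (no constant term)."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])


def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: k(x, z) = (x . z)^2."""
    return np.dot(x, z) ** 2


x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))   # inner product in the 3D mapped space
implicit = poly_kernel(x, z)        # same value, computed in the original 2D space

print("explicit:", explicit)
print("implicit:", implicit)
```

Both values agree (121 for these vectors), which is why the SVM can work in the mapped space without ever constructing it.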

# quest 4

In [4]:

# Support vectors play a crucial role in Support Vector Machines (SVM). They are the data points that lie closest to the decision boundary (hyperplane) and have the most influence on determining the position and orientation of the boundary. Support vectors are critical because they directly influence the margin, which is the distance between the decision boundary and the nearest data point of any class.

# Let's illustrate the role of support vectors with a simple example:

# Imagine a binary classification problem in which we classify an email as spam (positive class) or not spam (negative class) based on two features: the number of times the words "buy" and "discount" appear in the email. We have the following data:

# Positive class (spam): (3, 4), (4, 5), (5, 6)
# Negative class (not spam): (1, 1), (2, 2), (3, 3)

# Now, let's train a linear SVM on this data. The decision boundary will be a line that separates the positive and negative classes. However, only the data points closest to this boundary, i.e., the support vectors, determine where it lies.

# In this example, the closest pair of points from opposite classes is (3, 4) and (3, 3), which are 1 unit apart, so the support vectors are:

# For the positive class (spam): (3, 4)
# For the negative class (not spam): (3, 3)

# The maximum-margin boundary is the horizontal line x2 = 3.5, halfway between them (w = (0, 2), b = -7). The remaining four points lie farther from the boundary, and removing them would not change it. The margin of the SVM is the perpendicular distance from the decision boundary to the nearest support vector, which is 0.5 on each side here.
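
One way to check which points end up as support vectors is to fit scikit-learn's `SVC` on the six points above and inspect its `support_vectors_`, `coef_`, and `intercept_` attributes. This is a sketch rather than part of the original exercise. The very large `C` is an assumption used to approximate a hard margin, and the labels +1 and -1 stand for spam and not spam.

```python
import numpy as np
from sklearn.svm import SVC

# The six emails from the example: (count of "buy", count of "discount")
X = np.array([[3, 4], [4, 5], [5, 6], [1, 1], [2, 2], [3, 3]], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1])   # +1 = spam, -1 = not spam

# A very large C (assumed value) makes the soft-margin SVM behave like a hard-margin one
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

support_vectors = clf.support_vectors_        # training points that define the boundary
w = clf.coef_[0]                              # weight vector of the separating line
b = clf.intercept_[0]                         # bias term
margin_width = 2.0 / np.linalg.norm(w)        # distance between the two marginal lines

print("support vectors:\n", support_vectors)
print("w =", w, "b =", b)
print("margin width =", margin_width)
```

Refitting after deleting any point that is not listed in `support_vectors_` should leave `w` and `b` essentially unchanged.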

# quest 5


In [5]:
# Hyperplane:
# In SVM, the hyperplane is the decision boundary that separates the data points of different classes in the feature space.
# In a d-dimensional feature space, the hyperplane is a (d-1)-dimensional affine subspace defined by w⋅x + b = 0.
# Example: Consider a simple 2D dataset with two classes, denoted by red and blue points. The hyperplane is a line that separates these two classes.
# Marginal Plane:
# The marginal planes are the two boundaries parallel to the hyperplane, one on each side, given by w⋅x + b = +1 and w⋅x + b = -1. Each lies at a distance of 1/||w|| from the hyperplane, so the full margin width is 2/||w||.
# In a hard margin SVM, the support vectors lie exactly on these planes.
# Example: In the same 2D dataset, the marginal planes are two lines parallel to the hyperplane and equidistant from it.
# Hard Margin:
# In a hard margin SVM, the decision boundary (hyperplane) is required to perfectly separate the classes without any misclassification.
# It means no data points are allowed to fall within the margin.
# Example: In the 2D dataset, a hard margin SVM would find a line that perfectly separates the red and blue points without any overlap.
# Soft Margin:
# In a soft margin SVM, a certain degree of misclassification is allowed to find a better generalization of the decision boundary, especially in cases where data is not perfectly separable.
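# Soft margin SVM introduces a penalty parameter (C) that controls the trade-off between maximizing the margin and minimizing misclassification: a large C penalizes margin violations heavily (narrower margin, closer to a hard margin), while a small C tolerates more violations (wider margin).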
# Example: In the 2D dataset, a soft margin SVM may allow some data points to fall within the margin or even on the wrong side of the decision boundary to achieve better generalization.
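
The difference between the two regimes can be sketched by fitting the same linear SVM with a very small and a very large C and comparing the margin width 2/||w|| and the number of support vectors. The six-point toy dataset and the two C values below are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Assumed toy dataset: two linearly separable classes in 2D
X = np.array([[3, 4], [4, 5], [5, 6], [1, 1], [2, 2], [3, 3]], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1])

results = {}
for C in (0.01, 1e6):   # small C = very soft margin, large C = nearly hard margin
    clf = SVC(kernel="linear", C=C).fit(X, y)
    width = 2.0 / np.linalg.norm(clf.coef_[0])   # margin width 2 / ||w||
    n_sv = len(clf.support_)                     # number of support vectors
    results[C] = (width, n_sv)
    print(f"C={C:g}: margin width={width:.3f}, support vectors={n_sv}")
```

The small-C model accepts points inside the margin in exchange for a much wider margin, so more training points become support vectors. The large-C model keeps the margin narrow enough that no point violates it.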

# quest 6

In [6]:
# First, we'll load the Iris dataset, split it into training and testing sets, and train a linear SVM classifier using scikit-learn. Then, we'll implement a linear SVM classifier from scratch, train it on the training set, and evaluate its performance on the testing set. Finally, we'll plot the decision boundaries of the trained models and observe the effect of different values of the regularization parameter C on the performance.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2]  # Using only the first two features for visualization
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize features (fit the scaler on the training set only to avoid leakage)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train linear SVM classifier using scikit-learn
svm_clf_sklearn = SVC(kernel='linear')
svm_clf_sklearn.fit(X_train, y_train)

# Predict labels for the testing set
y_pred_sklearn = svm_clf_sklearn.predict(X_test)

# Compute accuracy of the model
accuracy_sklearn = accuracy_score(y_test, y_pred_sklearn)
print("Accuracy of scikit-learn SVM:", accuracy_sklearn)

# Plot the decision regions of a fitted classifier over the two (standardized) features
def plot_decision_boundary(clf, X, y):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                         np.arange(y_min, y_max, 0.01))
    # Predict a class for every grid point, then reshape back to the grid
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm)
    plt.xlabel('Sepal length (standardized)')
    plt.ylabel('Sepal width (standardized)')
    # Use the classifier's class name so the title is correct for any model
    plt.title(f'Decision Boundary ({type(clf).__name__})')
    plt.show()

plot_decision_boundary(svm_clf_sklearn, X_train, y_train)


In [None]:
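# Next, let's implement a linear SVM classifier from scratch and compare its performance with scikit-learn's implementation.
# Note: this scratch model is a binary classifier, so it is trained on a two-class version of the
# problem (setosa vs. the other two species). Its accuracy is therefore not directly comparable
# with the three-class scikit-learn model above.

class LinearSVM:
    """Binary linear SVM trained by sub-gradient descent on the regularized hinge loss."""

    def __init__(self, lr=0.01, lambda_param=0.001, n_iters=1000):
        self.lr = lr
        self.lambda_param = lambda_param  # regularization strength
        self.n_iters = n_iters
        self.w = None
        self.b = None

    def fit(self, X, y):
        n_samples, n_features = X.shape

        # The hinge loss expects labels in {-1, +1}: map label 0 to -1 and any other label to +1
        y_ = np.where(y <= 0, -1, 1)

        self.w = np.zeros(n_features)
        self.b = 0.0

        for _ in range(self.n_iters):
            for idx, x_i in enumerate(X):
                # True when the sample is correctly classified and outside the margin
                condition = y_[idx] * (np.dot(x_i, self.w) - self.b) >= 1
                if condition:
                    # Only the regularization term contributes to the gradient
                    self.w -= self.lr * (2 * self.lambda_param * self.w)
                else:
                    # Margin violated: include the hinge-loss sub-gradient as well
                    self.w -= self.lr * (2 * self.lambda_param * self.w - y_[idx] * x_i)
                    self.b -= self.lr * y_[idx]

    def predict(self, X):
        # Decision function f(x) = w.x - b (the bias enters with a minus sign in this class)
        linear_output = np.dot(X, self.w) - self.b
        # Return labels in {0, 1} so they match the binary labels used for training
        return np.where(linear_output >= 0, 1, 0)

# Binary labels: 0 = setosa, 1 = versicolor or virginica
y_train_bin = (y_train != 0).astype(int)
y_test_bin = (y_test != 0).astype(int)

# Train linear SVM classifier from scratch
svm_clf_scratch = LinearSVM()
svm_clf_scratch.fit(X_train, y_train_bin)

# Predict labels for the testing set
y_pred_scratch = svm_clf_scratch.predict(X_test)

# Compute accuracy of the model
accuracy_scratch = accuracy_score(y_test_bin, y_pred_scratch)
print("Accuracy of scratch SVM (setosa vs. rest):", accuracy_scratch)

# Plot decision boundaries for scratch SVM
plot_decision_boundary(svm_clf_scratch, X_train, y_train_bin)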
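
The effect of the regularization parameter C can be observed with a short sweep. The sketch below is self-contained, so it reloads and rescales the data, then trains scikit-learn's linear `SVC` for several values of C and records test accuracy and the number of support vectors. The grid of C values is an assumption, and the `_s` variable names are used only to keep this cell independent of the ones above.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Same data preparation as above: first two Iris features, 70/30 split, standardized
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.3, random_state=42)
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train_s)
X_test_s = scaler.transform(X_test_s)

c_values = [0.01, 0.1, 1, 10, 100]   # assumed grid of regularization strengths
accuracies = {}
for C in c_values:
    clf = SVC(kernel='linear', C=C).fit(X_train_s, y_train_s)
    accuracies[C] = accuracy_score(y_test_s, clf.predict(X_test_s))
    n_sv = len(clf.support_)   # fewer support vectors usually means a narrower margin
    print(f"C={C:g}: test accuracy={accuracies[C]:.3f}, support vectors={n_sv}")
```

Small values of C favor a wide margin and tolerate more training errors, while large values fit the training set more tightly. Comparing the printed accuracies shows which setting generalizes best on this particular split.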
