# Support Vector Classifiers

Support Vector Machine (SVM) classifiers are a widely used machine learning tool for classifying data. They work by finding a hyperplane (a line in two dimensions, a plane in three, and the analogous boundary in higher dimensions) that best separates the classes of data points. The key idea is to maximize the margin, the distance between this decision boundary and the closest data points from each class, which are called support vectors. Because the boundary is determined by those support vectors alone, SVMs can remain effective even when there are more dimensions than data points, which makes them a versatile classification technique.

$$
\begin{array}{ll}
\text{minimize:} & \quad \frac{1}{2}||\mathbf{w}||^2 \\
\text{subject to:} & \quad y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1, \quad \forall i
\end{array}
$$
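
The margin width equals $2 / ||\mathbf{w}||$, so a wider margin corresponds to a smaller $||\mathbf{w}||$. The following is a minimal sketch of that relationship, assuming a small hand-made, linearly separable toy dataset (not the data used in this notebook), labels of -1 and +1, and a linear kernel with a very large `C` to approximate the hard-margin problem. The names `X_toy`, `y_toy`, and `clf` are illustrative only.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two linearly separable clusters labelled -1 and +1
X_toy = np.array([[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]], dtype=float)
y_toy = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin formulation
clf = SVC(kernel="linear", C=1e6)
clf.fit(X_toy, y_toy)

# For a linear kernel the fitted hyperplane is available as coef_ and intercept_
w = clf.coef_[0]
b = clf.intercept_[0]

# Margin width and the left-hand side of the constraint y_i (w . x_i + b) >= 1
margin_width = 2 / np.linalg.norm(w)
constraints = y_toy * (X_toy @ w + b)

print("w =", w, "b =", b)
print("margin width =", margin_width)
print("y_i (w . x_i + b) =", constraints)
print("support vectors:\n", clf.support_vectors_)
```

The points whose constraint value sits at (approximately) 1 are the support vectors; every other point satisfies the constraint with room to spare.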

In [1]:
# importing libraries

import pandas as pd
import numpy as np
import plotly.express as px
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

In [2]:
# Creating Data with Somewhat Random Numbers

mean1 = 55
std_dev1 = 10
num_samples = 500

column1_numbers = np.random.normal(mean1, std_dev1, num_samples)
# Clip to [12, 26]; these bounds lie well below mean1 = 55, so nearly all values become 26
column1_numbers = np.clip(column1_numbers, 12, 26)
column1_numbers = np.round(column1_numbers).astype(int)

mean2 = 18
std_dev2 = 3

column2_numbers = np.random.normal(mean2, std_dev2, num_samples)
column2_numbers = np.clip(column2_numbers, 12, 26)
column2_numbers = np.round(column2_numbers).astype(int)

# Start from random 0/1 labels, then force label 1 where weekly mileage exceeds mean1
column3_numbers = np.random.randint(2, size = num_samples)
column3_numbers[column1_numbers > mean1] = 1

data = {
    'Miles_Per_week': column1_numbers,
    'Farthest_run': column2_numbers,
    'Qualified_Boston_Marathon': column3_numbers
}

df = pd.DataFrame(data)

In [3]:
df.describe()

| | Miles_Per_week | Farthest_run | Qualified_Boston_Marathon |
|---|---|---|---|
| count | 500.0 | 500.0 | 500.0 |
| mean | 26.0 | 17.97 | 0.516 |
| std | 0.0 | 2.968286 | 0.500244 |
| min | 26.0 | 12.0 | 0.0 |
| 25% | 26.0 | 16.0 | 0.0 |
| 50% | 26.0 | 18.0 | 1.0 |
| 75% | 26.0 | 20.0 | 1.0 |
| max | 26.0 | 26.0 | 1.0 |
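
The zero standard deviation for `Miles_Per_week` follows from how `np.clip` interacts with the sampling distribution: draws from a normal distribution with mean 55 and standard deviation 10 almost never fall below 26, so clipping to `[12, 26]` pins them at the upper bound. The sketch below illustrates this on freshly drawn samples. The seed and the wider bounds `(20, 90)` are assumptions chosen only for comparison and do not appear in the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so this sketch is repeatable
samples = rng.normal(55, 10, 500)

# Same bounds as the notebook: the whole range lies far below the mean of 55
narrow = np.round(np.clip(samples, 12, 26)).astype(int)

# Hypothetical wider bounds, used here only to show the contrast
wide = np.round(np.clip(samples, 20, 90)).astype(int)

print("share equal to 26 with clip(12, 26):", np.mean(narrow == 26))
print("std with clip(12, 26):", narrow.std())
print("std with clip(20, 90):", wide.std())
```

A feature that is constant across all rows carries no information for the classifier, which is worth keeping in mind when reading the model results further down.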


In [4]:
# Making a Visualization of the Data

fig = px.scatter(df,
                 x = 'Miles_Per_week',
                 y = 'Qualified_Boston_Marathon',
                 marginal_x='histogram',
                 marginal_y='violin',
                 color='Qualified_Boston_Marathon',
                 color_continuous_scale=px.colors.sequential.Electric)
fig.update_layout(title = 'Miles Per Week Data',
                  xaxis_title = 'Miles Per Week',
                  yaxis_title = 'Qualified Boston Marathon')
fig.show()

In [5]:
# Visualizing the Farthest Run Data

fig = px.scatter(df,
                 x = 'Farthest_run',
                 y = 'Qualified_Boston_Marathon',
                 marginal_x='histogram',
                 marginal_y='violin',
                 color='Qualified_Boston_Marathon',
                 color_continuous_scale=px.colors.sequential.Electric)
fig.update_layout(title = 'Farthest Run Scatter',
                  xaxis_title = 'Farthest Run',
                  yaxis_title = 'Qualified Boston Marathon')
fig.show()

In [6]:
# Feature / Label Selection: the first two columns are features, the third is the label

X = df.iloc[:, 0:2]
y = df.iloc[:, 2]

In [20]:
# Splitting the Data and making the model

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# C is the inverse regularization strength: a large value such as 1000 means weak regularization
sv_classifier = SVC(C=1000)
sv_classifier.fit(X_train, y_train)
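
One way to judge whether a value such as `C=1000` is appropriate is to compare several values with cross-validation. The following is a minimal sketch, assuming synthetic data from `make_classification` (not this notebook's DataFrame), a `StandardScaler` step in front of the classifier, and an arbitrary set of candidate `C` values. The names `X_demo`, `y_demo`, and `mean_scores` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical two-feature dataset standing in for real training data
X_demo, y_demo = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=0
)

mean_scores = {}
for C in [0.1, 1, 1000]:
    # Scaling happens inside the pipeline so each CV fold is scaled on its own training part
    model = make_pipeline(StandardScaler(), SVC(C=C))
    scores = cross_val_score(model, X_demo, y_demo, cv=5)
    mean_scores[C] = scores.mean()
    print(f"C={C}: mean CV accuracy = {scores.mean():.3f}")
```

Smaller `C` values tolerate more margin violations in exchange for a wider margin, while larger values fit the training points more tightly. Which setting generalizes better depends on the data.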

In [30]:
# Making Predictions

y_pred = sv_classifier.predict(X_test)
print(y_pred)

[1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1
 0 0 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1]


In [32]:
# Model Evaluation

cm = confusion_matrix(y_test, y_pred)
print(cm)

cr = classification_report(y_test, y_pred)
print(cr)

[[12 36]
 [ 4 48]]
              precision    recall  f1-score   support

           0       0.75      0.25      0.38        48
           1       0.57      0.92      0.71        52

    accuracy                           0.60       100
   macro avg       0.66      0.59      0.54       100
weighted avg       0.66      0.60      0.55       100
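
The figures in the classification report can be reproduced by hand from the confusion matrix, where rows are true labels and columns are predicted labels. The sketch below hard-codes the matrix printed above and recomputes precision, recall, F1, and accuracy. The variable names are illustrative.

```python
import numpy as np

# Confusion matrix from the output above: rows = true class, columns = predicted class
cm = np.array([[12, 36],
               [4, 48]])

# For a binary problem, ravel() yields TN, FP, FN, TP (class 1 treated as positive)
tn, fp, fn, tp = cm.ravel()

precision_1 = tp / (tp + fp)
recall_1 = tp / (tp + fn)
precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)
accuracy = (tp + tn) / cm.sum()

f1_0 = 2 * precision_0 * recall_0 / (precision_0 + recall_0)
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(f"class 0: precision={precision_0:.2f} recall={recall_0:.2f} f1={f1_0:.2f}")
print(f"class 1: precision={precision_1:.2f} recall={recall_1:.2f} f1={f1_1:.2f}")
print(f"accuracy={accuracy:.2f}")
```

Read this way, the matrix shows that 36 of the 48 true class 0 samples were predicted as class 1, which is why recall for class 0 is only 0.25 even though overall accuracy is 0.60.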

