# Support Vector Machines

- This is supplementary material for the [Machine Learning Simplified](https://themlsbook.com) book. It covers the Python implementations of the topics discussed there; all detailed explanations can be found in the book. 
- I also assume you know Python syntax and how the language works. If you don't, I highly recommend taking a break to get familiar with the language before going forward with my code. 
- This material can be downloaded as a Jupyter notebook (Download button in the upper-right corner -> `.ipynb`) to reproduce the code and play around with it. 



## 1. Required Libraries & Functions

Before we start, we need to import a few libraries and functions that we will use in this notebook. You don't need to understand what those functions do for now.

In [12]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from tqdm import tqdm

In [14]:
def conv_target(x: str) -> int:
    """
    Function transforming target value to -1 and 1.
    """
    if x == 'Mandarin':
        return 1
    else:
        return -1

## 2. Problem Representation

Let's recall Chapter X of [the Machine Learning Simplified book](https://themlsbook.com). We have a hypothetical dataset (Table XX in the MLS book) containing 8 data points of two classes, `Mandarin` and `Apple`. 


| height | width | fruit |
| ----------- | ----------- | ----------- | 
| 4.6 | 7.0 | Mandarin | 
| 5.3  | 6.2 | Mandarin | 
| 5.6 | 7.3 | Mandarin |
| 6.3 | 6.0 | Mandarin | 
| 6.9 | 9.4 | Apple | 
| 7.5 | 8.2 | Apple | 
| 8.3 | 8.9 | Apple | 
| 8.6 | 8.0 | Apple | 

### 2.1. Create Hypothetical Dataset

Let's re-create the table above in Python. We use the `pandas` library, a library that manages **PAN**el **DA**ta **S**ets, to do so. Note that we have already imported it at the beginning of this notebook.

In [5]:
#re-create a hypothetical dataset
data = {'height': [4.6, 5.3, 5.6, 6.3, 6.9, 7.5, 8.3, 8.6],
'width': [7.0, 6.2, 7.3, 6.0, 9.4, 8.2, 8.9, 8.0],
'fruit': ["Mandarin","Mandarin","Mandarin","Mandarin","Apple","Apple","Apple","Apple"]
}

#transform dataset into a DataFrame df using pandas library
df = pd.DataFrame(data)

#print the output
df

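|   | height | width | fruit |
| ----------- | ----------- | ----------- | ----------- |
| 0 | 4.6 | 7.0 | Mandarin |
| 1 | 5.3 | 6.2 | Mandarin |
| 2 | 5.6 | 7.3 | Mandarin |
| 3 | 6.3 | 6.0 | Mandarin |
| 4 | 6.9 | 9.4 | Apple |
| 5 | 7.5 | 8.2 | Apple |
| 6 | 8.3 | 8.9 | Apple |
| 7 | 8.6 | 8.0 | Apple |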


In [17]:
# Plot dataset
fig = px.scatter(df, x="width", y="height", color="fruit", color_continuous_scale='Bluered_r')
fig.update_traces(marker_size=8)
fig.show()

### 2.2. Defining Variables

In [15]:
# To implement SVM we need to transform target value to -1 and 1 values.
df['target'] = df['fruit'].apply(conv_target)

# Print the result
df

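|   | height | width | fruit | target |
| ----------- | ----------- | ----------- | ----------- | ----------- |
| 0 | 4.6 | 7.0 | Mandarin | 1 |
| 1 | 5.3 | 6.2 | Mandarin | 1 |
| 2 | 5.6 | 7.3 | Mandarin | 1 |
| 3 | 6.3 | 6.0 | Mandarin | 1 |
| 4 | 6.9 | 9.4 | Apple | -1 |
| 5 | 7.5 | 8.2 | Apple | -1 |
| 6 | 8.3 | 8.9 | Apple | -1 |
| 7 | 8.6 | 8.0 | Apple | -1 |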


In [16]:
# Also we split our dataset into target and independent variables
X, y = df[['height', 'width']].values, df['target'].values

## 3. Plotting Random Hyperplane

Now we are going to see how an arbitrary line splits the dataset into two parts. Let's plot the following line, where $x$ is the width and $y$ is the height:

$$ y = - 2x + 19 $$

Or in vector form, with the point written as $(y, x)$ (height first, then width, matching the column order of `X`):

$$ (y, x)^T (1, 2) - 19 = 0 $$

$$ w = (1, 2), b = -19 $$

In [9]:
# Defining line
def init_line(a: float, b: float, x: float) -> float:
    return  a * x + b

In [10]:
fig = px.scatter(df, x="width", y="height", color="fruit", color_continuous_scale='Bluered_r')
fig.update_traces(marker_size=8)

x_0 = min(df.width.values)
x_1 = max(df.width.values)

fig.add_shape(type="line",
              x0=x_0, 
              y0=init_line(-2, 19, x_0), 
              x1=x_1, 
              y1=init_line(-2, 19, x_1))
fig.show()

This line neither splits our dataset properly nor satisfies the properties of an SVM hyperplane.

Even if we plot a line that properly divides our dataset, there are infinitely many such lines.

In [18]:
fig = px.scatter(df, x="width", y="height", color="fruit", color_continuous_scale='Bluered_r')
fig.update_traces(marker_size=8)

x_0 = min(df.width.values)
x_1 = max(df.width.values)



fig.add_shape(type="line",
              x0=x_0, 
              y0=init_line(-3, 30, x_0), 
              x1=x_1, 
              y1=init_line(-3, 30, x_1))

fig.add_shape(type="line",
              x0=x_0, 
              y0=init_line(-1.8, 21, x_0), 
              x1=x_1, 
              y1=init_line(-1.8, 21, x_1))

fig.add_shape(type="line",
              x0=x_0, 
              y0=init_line(-1, 15, x_0), 
              x1=x_1, 
              y1=init_line(-1, 15, x_1))
fig.show()
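
The claim that the first line fails to split the dataset while the other three succeed can also be checked numerically. Below is a minimal sketch; the helper names `is_above` and `separates` are hypothetical and not part of the notebook, and the data points are copied from the table in section 2.

```python
# Dataset from the table above, as (height, width) pairs per class
mandarins = [(4.6, 7.0), (5.3, 6.2), (5.6, 7.3), (6.3, 6.0)]
apples = [(6.9, 9.4), (7.5, 8.2), (8.3, 8.9), (8.6, 8.0)]

def is_above(a, b, height, width):
    """True if the point lies above the line height = a * width + b."""
    return height > a * width + b

def separates(a, b):
    """True if all mandarins fall on one side of the line and all apples on the other."""
    mandarin_sides = {is_above(a, b, h, w) for h, w in mandarins}
    apple_sides = {is_above(a, b, h, w) for h, w in apples}
    return (len(mandarin_sides) == 1
            and len(apple_sides) == 1
            and mandarin_sides != apple_sides)

for a, b in [(-2, 19), (-3, 30), (-1.8, 21), (-1, 15)]:
    print(f"y = {a}x + {b}: separates = {separates(a, b)}")
```

Only the first line should report `False`; the other three all separate the classes, which is exactly why we need an extra criterion (the margin) to pick one of them.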

## 4. Hard-margin SVM

In [None]:
# to be done


## 5. Soft-margin SVM

Recall the optimization problem for the soft-margin SVM:

$$ \min_{w, b, \zeta} \left[ \frac{1}{2} || w ||^2 + C \cdot \sum_i \zeta_i \right] $$

subject to:

$$ y_i(w^T \cdot x_i + b) \ge 1 - \zeta_i, \quad \zeta_i \ge 0 $$

A more natural, unconstrained form:

$$ f(w, b) = \frac{\lambda}{2} ||w||^2 + \frac{1}{2} \sum_{i=1}^m \max [0, 1 - y_i(w^T \cdot x_i + b)] $$

$$ f(w, b) \to \min $$

Let's see what loss values different hyperplanes produce. To do so, we will implement a function that computes the loss. For simplicity we will assume that $ \lambda = 1 $.

In [12]:
def loss_value(
    w: np.array,
    b: float,
    y: np.array,
    X: np.array,
    lambda_param=1):
    
    # Init loss with zero
    loss = 0
    
    # Sum of max's terms
    for x_i, y_i in zip(X, y):
        loss += max(0, 1 - y_i * (w.T @ x_i + b))
    
    # Multiplying sum by 1/2
    loss *= 1 / 2
    
    # Adding first summand
    loss += lambda_param / 2 * w.T @ w

    return loss
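
For larger datasets the explicit loop can be replaced by array operations. The following is a sketch of a vectorized variant of the same formula; the names `loss_value_vectorized`, `X_demo` and `y_demo` are hypothetical and introduced only for this illustration, with the data copied from the table in section 2.

```python
import numpy as np

def loss_value_vectorized(w, b, y, X, lambda_param=1.0):
    """lambda/2 * ||w||^2 + 1/2 * sum_i max(0, 1 - y_i * (w . x_i + b))."""
    margins = y * (X @ w + b)                # y_i * (w . x_i + b) for every sample
    hinge = np.maximum(0.0, 1.0 - margins)   # element-wise hinge term
    return lambda_param / 2 * (w @ w) + 0.5 * hinge.sum()

# Columns are (height, width); Mandarin = +1, Apple = -1
X_demo = np.array([[4.6, 7.0], [5.3, 6.2], [5.6, 7.3], [6.3, 6.0],
                   [6.9, 9.4], [7.5, 8.2], [8.3, 8.9], [8.6, 8.0]])
y_demo = np.array([1, 1, 1, 1, -1, -1, -1, -1])

# The line y = -2x + 19 written as w = (1, 2), b = -19
demo_loss = loss_value_vectorized(np.array([1.0, 2.0]), -19.0, y_demo, X_demo)
print(f"loss={demo_loss:.2f}")  # loss=19.35
```

Both versions compute the same quantity; the vectorized one simply lets `numpy` handle the sum over samples.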

Also recall that in the 2D case there are different ways to describe a line:

**Slope-intercept form**:
$$ y(x) = ax + b_1 $$

**Standard form**, with $z = (y, x)^T$:
$$ w^T z + b_2 = 0 $$

(There are other forms, but these two are enough for us.)

It is easy to convert one form into the other:

$$ y = ax + b_1 $$

$$ y - ax - b_1 = 0 $$

$$ w = (1, -a), b_2 = - b_1 $$

And vice versa (assuming $w_1 \ne 0$):

$$ w^T z + b_2 = 0 $$

$$ w_1 y + w_2 x + b_2 = 0 $$

$$ y = - \frac{w_2}{w_1} x - \frac{b_2}{w_1} $$

In [13]:
def conv_stand_to_slope(w: np.array, b_2: float):
    """
    This function converts line equation from standard to slope-intercept form:
    """
    a = - w[1] / w[0]
    b_1 = - b_2 / w[0]
    return a, b_1

def conv_slope_to_stand(a: float, b_1: float):
    """
    This function converts line equation from slope-intercept to standard form.
    """
    w = np.array([1, -a])
    b_2 = - b_1
    return w, b_2
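
A quick way to convince ourselves that the two conversions are inverses of each other is a round trip. The sketch below re-implements both conversions with plain tuples so that it runs on its own; the names `slope_to_standard`, `standard_to_slope` and the `*_demo` variables are hypothetical and used only here.

```python
def slope_to_standard(a, b_1):
    """y = a*x + b_1  ->  w = (1, -a), b_2 = -b_1, for points z = (y, x)."""
    return (1.0, -a), -b_1

def standard_to_slope(w, b_2):
    """w_1*y + w_2*x + b_2 = 0  ->  a = -w_2/w_1, b_1 = -b_2/w_1 (needs w_1 != 0)."""
    return -w[1] / w[0], -b_2 / w[0]

# Round trip for the line y = -2x + 19
w_demo, b2_demo = slope_to_standard(-2, 19)
a_demo, b1_demo = standard_to_slope(w_demo, b2_demo)
print(w_demo, b2_demo)   # standard form
print(a_demo, b1_demo)   # back to slope-intercept form

# A point on the line (x = 7 gives y = 5) should satisfy w_1*y + w_2*x + b_2 = 0
x_pt, y_pt = 7.0, 5.0
residual = w_demo[0] * y_pt + w_demo[1] * x_pt + b2_demo
print(residual)
```

Note that the standard form is not unique: multiplying `w` and `b_2` by any non-zero constant describes the same line. A vertical line ($w_1 = 0$) has no slope-intercept form, which is why the second conversion divides by `w[0]`.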

Let's start with our first example line: $ y = -2x + 19 $.

In [20]:
# list to store losses
losses = []

w, b = conv_slope_to_stand(a = -2, 
                           b_1 = 19)

loss = loss_value(w, b, y, X)

print('y = -2x + 19', f'loss={loss}')
losses.append(loss)

y = -2x + 19 loss=19.35


Now let's look at the loss values of the following lines:
$$ y = -3.0x + 30 $$
$$ y = -1.8x + 21 $$
$$ y = -1.0x + 15 $$

In [24]:
for a, b_1 in [(-3, 30), (-1.8, 21), (-1, 15)]:
    # Keep the slope-intercept bias in b_1 so that the printed label
    # is not overwritten by the standard-form bias returned below
    w, b = conv_slope_to_stand(a, b_1)
    loss = loss_value(w, b, y, X)
    print(f'y = {a}x + {b_1}', f'loss={loss:.2f}')
    losses.append(loss)

y = -3x + 30 loss=25.75
y = -1.8x + 21 loss=18.07
y = -1x + 15 loss=13.75
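
One detail is worth checking here: $(w, b)$ and $(-w, -b)$ describe the same line, yet the hinge term depends on which side of the line is labelled $+1$. `conv_slope_to_stand` gives points above the line a positive score, while our targets label Mandarins, which lie below the separating lines, as $+1$. The sketch below compares both orientations for $y = -3x + 30$; the names `X_demo`, `y_demo` and `hinge_objective` are hypothetical and not part of the notebook. This may explain why a line that separates the classes can still show a large loss.

```python
import numpy as np

# Columns are (height, width); Mandarin = +1, Apple = -1
X_demo = np.array([[4.6, 7.0], [5.3, 6.2], [5.6, 7.3], [6.3, 6.0],
                   [6.9, 9.4], [7.5, 8.2], [8.3, 8.9], [8.6, 8.0]])
y_demo = np.array([1, 1, 1, 1, -1, -1, -1, -1])

def hinge_objective(w, b, lambda_param=1.0):
    """Same formula as the loss above: lambda/2 * ||w||^2 + 1/2 * sum of hinge terms."""
    margins = y_demo * (X_demo @ w + b)
    return lambda_param / 2 * (w @ w) + 0.5 * np.maximum(0.0, 1.0 - margins).sum()

# y = -3x + 30 in standard form: w = (1, 3), b = -30
w_up, b_up = np.array([1.0, 3.0]), -30.0
loss_up = hinge_objective(w_up, b_up)

# The same line with the opposite orientation
loss_down = hinge_objective(-w_up, -b_up)

print(f"loss with (w, b):   {loss_up:.2f}")    # 25.75
print(f"loss with (-w, -b): {loss_down:.2f}")  # 5.00
```

With the flipped orientation every point satisfies $y_i(w^T x_i + b) \ge 1$, so only the $\frac{\lambda}{2} ||w||^2$ term remains.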


Now we plot the results to see the difference.

In [25]:
fig = px.scatter(df, x="width", y="height", color="fruit", color_continuous_scale='Bluered_r')
fig.update_traces(marker_size=8)

x_0 = min(df.width.values)
x_1 = max(df.width.values)

fig.add_trace(
    go.Scatter(
        x = [x_0, x_1],
        y = [init_line(-2, 19, x_0), init_line(-2, 19, x_1)],
        mode='lines',
        line=dict(color='green', width=2),
        name = f'y = -2x + 19, loss={round(losses[0], 4)}'
    )
)

fig.add_trace(
    go.Scatter(
        x = [x_0, x_1],
        y = [init_line(-3.0, 30, x_0), init_line(-3.0, 30, x_1)],
        mode='lines',
        line=dict(color='black', width=2),
        name = f'y = -3.0x + 30, loss={round(losses[1], 4)}'
    )
)

fig.add_trace(
    go.Scatter(
        x = [x_0, x_1],
        y = [init_line(-1.8, 21, x_0), init_line(-1.8, 21, x_1)],
        mode='lines',
        line=dict(color='purple', width=2),
        name = f'y = -1.8x + 21, loss={round(losses[2], 4)}'
    )
)

fig.add_trace(
    go.Scatter(
        x = [x_0, x_1],
        y = [init_line(-1.0, 15, x_0), init_line(-1.0, 15, x_1)],
        mode='lines',
        line=dict(color='pink', width=2),
        name = f'y = -1.0x + 15, loss={round(losses[3], 4)}'
    )
)

fig.show()

## 6. Implementing Soft-margin SVM

We are going to implement subgradient descent to find the weights and bias that minimize our loss function. It is much the same as plain SGD, except that we compute a subgradient instead of a gradient, because the hinge term is not differentiable where $ y_i(w^T \cdot x_i + b) = 1 $.

Subgradient of our loss function `f`:

$$ f(w, b) = \frac{\lambda}{2} ||w||^2 + \frac{1}{2} \sum_{i=1}^m \max [0, 1 - y_i(w^T \cdot x_i + b)] $$

For a single sample $i$ (up to the constant factor in front of the sum), if $ y_i(w^T \cdot x_i + b) < 1 $:
$$ \frac{\partial f(w, b)}{\partial w} = \lambda w - y_i x_i $$

$$ \frac{\partial f(w, b)}{\partial b} = - y_i $$

Otherwise:
$$ \frac{\partial f(w, b)}{\partial w} = \lambda w + 0 $$

$$ \frac{\partial f(w, b)}{\partial b} = 0 $$
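
A subgradient formula is easy to get wrong, so it can help to compare it against finite differences. Below is a minimal sketch for the full objective $f$, including the $\frac{1}{2}$ factor on the sum. All names (`objective`, `subgradient`, `numeric_gradient`, `X_demo`, `y_demo`, `w0`, `b0`) are hypothetical and used only for this check, and the test point is chosen so that no sample sits exactly on the kink $y_i(w^T x_i + b) = 1$.

```python
import numpy as np

# Columns are (height, width); Mandarin = +1, Apple = -1
X_demo = np.array([[4.6, 7.0], [5.3, 6.2], [5.6, 7.3], [6.3, 6.0],
                   [6.9, 9.4], [7.5, 8.2], [8.3, 8.9], [8.6, 8.0]])
y_demo = np.array([1, 1, 1, 1, -1, -1, -1, -1])

def objective(w, b, lam=1.0):
    margins = y_demo * (X_demo @ w + b)
    return lam / 2 * (w @ w) + 0.5 * np.maximum(0.0, 1.0 - margins).sum()

def subgradient(w, b, lam=1.0):
    """Analytic subgradient: only samples with margin < 1 contribute to the sum."""
    active = y_demo * (X_demo @ w + b) < 1
    grad_w = lam * w - 0.5 * (y_demo[active, None] * X_demo[active]).sum(axis=0)
    grad_b = -0.5 * y_demo[active].sum()
    return grad_w, grad_b

def numeric_gradient(w, b, eps=1e-6):
    """Central finite differences, valid away from the kinks of the hinge term."""
    grad_w = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = eps
        grad_w[j] = (objective(w + step, b) - objective(w - step, b)) / (2 * eps)
    grad_b = (objective(w, b + eps) - objective(w, b - eps)) / (2 * eps)
    return grad_w, grad_b

w0, b0 = np.array([1.0, 2.0]), -19.0
analytic_w, analytic_b = subgradient(w0, b0)
numeric_w, numeric_b = numeric_gradient(w0, b0)
print(analytic_w, analytic_b)
print(numeric_w, numeric_b)
```

If the two printed pairs agree to several decimal places, the analytic expression matches the objective at that point.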

In [26]:
class SVM:
    def __init__(
        self,
        C=1,
        epochs=40000,
        batch_size=8
    ):
        self.C = C
        self.batch_size = batch_size
        self.epochs = epochs
        
    def calc_subgrad(
        self,
        X: np.array,
        y: np.array,
        w: np.array,
        b: float
    ):
        '''
        This function calculates subgradient of loss function at point (w, b).
        '''
        
        # Init subgrad for w and b by 0
        subgrad_w = 0
        subgrad_b = 0
        
        # Iterate over all samples of a given dataset
        for x_i, y_i in zip(X, y):
            # Calculating condition defined in the above cell
            decision_cond = y_i * (w.T @ x_i + b)
            
            # If condition met -> add -x_i*y_i to subgrad
            if decision_cond < 1:
                subgrad_w += - y_i * x_i
                subgrad_b += - 1 * y_i
        
        # Multiply subgrad by C (C = 1/lambda)
        subgrad_w *= self.C
        subgrad_b *= self.C
        
        return subgrad_w + w, subgrad_b
        
    def fit(self, X: np.array, y: np.array) -> None:
        C = self.C
        batch_size = self.batch_size
        
        number_samples = X.shape[0]
        number_features = X.shape[1]
        
        # Init starting points for stochastic subgrad descent
        w = np.ones((number_features, ))
        b = 0
        
        # Get ids of dataset and randomly shuffle them
        ids = list(range(number_samples))
        np.random.shuffle(ids)
        
        # Loop over epochs
        for epoch in tqdm(range(1, self.epochs + 1)):
            # Learning rate is defined as 1/epoch_number 
            lr = 1 / epoch
            
            # Loop over batches of dataset
            for batch_start in range(0, number_samples, batch_size):
                X_batch = X[ids[batch_start : min(batch_start + batch_size, number_samples)]]
                y_batch = y[ids[batch_start : min(batch_start + batch_size, number_samples)]]
                
                # Calculating subgrad at current point (point = (w, b))
                subgrad_w, subgrad_b = self.calc_subgrad(X_batch, y_batch, w, b)
                
                # Subtracting subgrad multiplied by lr from current point (point = (w, b))
                w = w - lr * subgrad_w
                b = b - lr * subgrad_b

        # Assigning resulting weights to class fields
        self.w = w
        self.b = b
    
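    def get_line(self, w: np.array, b: float, x: float):
        # Solve w[0] * first_feature + w[1] * second_feature + b = 0
        # for the first feature, given the value x of the second feature
        k = - w[1] / w[0]
        b /= - w[0]
        
        return k * x + b
    
    def plot_res(self, X, y):
        # get_line returns the first feature (height) as a function of the second (width),
        # so the second feature goes on the x-axis, as in the plots above
        fig = px.scatter(x=X[:, 1], y=X[:, 0], color=y, color_continuous_scale='Bluered_r')
        fig.update_traces(marker_size=8)

        x_0 = min(X[:, 1])
        x_1 = max(X[:, 1])

        fig.add_shape(type="line",
                      x0=x_0, 
                      y0=self.get_line(self.w, self.b, x_0), 
                      x1=x_1, 
                      y1=self.get_line(self.w, self.b, x_1))
        fig.show()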

In [27]:
svm = SVM()

In [28]:
svm.fit(X, y)

100%|████████████████████████████████████████████████████████████| 40000/40000 [00:00<00:00, 49281.54it/s]


In [29]:
svm.plot_res(X, y)

In [30]:
# Resulting params
print(f'w={svm.w}')
print(f'b={svm.b}')
print(f'loss={loss_value(svm.w, svm.b, y, X)}')

w=[-0.39563   -0.4186175]
b=5.749394719514705
loss=0.7717296193457828
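
Once `w` and `b` are known, a new point is classified by the side of the hyperplane it falls on. The sketch below plugs in the parameters printed above; the `predict` function and the names `w_fit`, `b_fit`, `X_demo`, `y_demo` are hypothetical additions and not part of the `SVM` class. Since the training uses a random shuffle, a fresh run may give slightly different numbers.

```python
import numpy as np

# Parameters copied from the printed output above
w_fit = np.array([-0.39563, -0.4186175])
b_fit = 5.749394719514705

# Columns are (height, width); Mandarin = +1, Apple = -1
X_demo = np.array([[4.6, 7.0], [5.3, 6.2], [5.6, 7.3], [6.3, 6.0],
                   [6.9, 9.4], [7.5, 8.2], [8.3, 8.9], [8.6, 8.0]])
y_demo = np.array([1, 1, 1, 1, -1, -1, -1, -1])

def predict(X, w, b):
    """Return +1 (Mandarin) where w . x + b >= 0 and -1 (Apple) otherwise."""
    return np.where(X @ w + b >= 0, 1, -1)

y_pred = predict(X_demo, w_fit, b_fit)
accuracy = (y_pred == y_demo).mean()

# Distance between the two margin lines w . x + b = +1 and w . x + b = -1
margin_width = 2 / np.linalg.norm(w_fit)

print(f"predictions={y_pred}")
print(f"accuracy={accuracy}")
print(f"margin width={margin_width:.2f}")
```

With these parameters all eight training points land on the correct side of the line.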
