# Soft margin
The goal is no longer to make zero classification mistakes, but to make as few mistakes as possible.

To do so, the constraints of the optimization problem are modified by adding a slack variable $\zeta_{i}$ (zeta) for each example. So the constraint

$${y_{i}\left(\mathbf{w} \cdot \mathbf{x}_{i}+b\right) \geq 1} $$

becomes

$${y_{i}\left(\mathbf{w} \cdot \mathbf{x}_{i}+b\right) \geq 1-\zeta_{i}}$$

As a result, when minimizing the objective function, it is possible to satisfy the constraint even if the example does not meet the original constraint (that is, it is too close to the hyperplane, or it is not on the correct side of the hyperplane).

The problem with the hard margin is that it has no solution when the data is not linearly separable. The soft margin still has one, because it tolerates some violations.

In [1]:
import numpy as np 
 
w = np.array([0.4, 1]) 
b = -10 
 
x = np.array([6, 8]) 
y = -1 
 
def constraint(w, b, x, y):
    return y * (np.dot(w, x) + b) 

In [2]:
def hard_constraint_is_satisfied(w, b, x, y):
    return constraint(w, b, x, y) >= 1 
 
def soft_constraint_is_satisfied(w, b, x, y, zeta):
    return constraint(w, b, x, y) >= 1 - zeta 
 
    
# The hard constraint is not satisfied for the example (6, 8).
print(hard_constraint_is_satisfied(w, b, x, y))               # False

# With zeta = 2, the soft constraint is satisfied.
print(soft_constraint_is_satisfied(w, b, x, y, zeta=2))       # True

False
True


In [3]:
# We can pick a huge zeta for every point 
# to always satisfy the soft constraint. 
print(soft_constraint_is_satisfied(w, b, x, y, zeta=10))   # True 
print(soft_constraint_is_satisfied(w, b, x, y, zeta=1000)) # True

True
True


To avoid this, we need to modify the objective function to penalize large values of $\zeta_{i}$:

$$
\begin{array}{ll}{\underset{\mathbf{w}, b, \zeta}{\operatorname{minimize}}} & {\frac{1}{2}\|\mathbf{w}\|^{2}+\sum_{i=1}^{m} \zeta_{i}} \\ {\text { subject to }} & {y_{i}\left(\mathbf{w} \cdot \mathbf{x}_{i}+b\right) \geq 1-\zeta_{i} \quad \text { for any } i=1, \ldots, m}\end{array}
$$

We take the sum of all individual $\zeta_{i}$ and add it to the objective function. Adding such a penalty is called **regularization**. As a result, the solution will be the hyperplane that maximizes the margin while keeping the total error as small as possible.
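To make the effect of the penalty concrete, here is a minimal sketch that evaluates the penalized objective for a fixed hyperplane and two choices of $\zeta$. The hyperplane `w`, `b`, the toy data `X`, `y`, and the helper names are assumptions chosen for illustration; they are not taken from the example above.

```python
import numpy as np

# Hypothetical hyperplane and toy data, chosen so the arithmetic is exact.
w = np.array([1.0, 0.0])
b = 0.0
X = np.array([[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0], [0.5, 1.0]])
y = np.array([1, 1, -1, -1])


def soft_constraints_are_satisfied(w, b, X, y, zetas):
    # Checks y_i * (w . x_i + b) >= 1 - zeta_i for every example.
    return bool(np.all(y * (X @ w + b) >= 1 - zetas))


def penalized_objective(w, zetas):
    # 1/2 * ||w||^2 plus the sum of all slack values.
    return 0.5 * np.dot(w, w) + np.sum(zetas)


huge_zetas = np.full(4, 1000.0)
small_zetas = np.array([0.0, 0.5, 0.0, 1.5])

for name, zetas in [("huge", huge_zetas), ("small", small_zetas)]:
    print(name,
          soft_constraints_are_satisfied(w, b, X, y, zetas),
          penalized_objective(w, zetas))
```

Both choices satisfy every constraint, but the huge values cost 4000.5 against 2.5 for the small ones, so a minimizer no longer has any incentive to inflate $\zeta$.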

There is still a small problem. With this formulation, one can easily minimize the function by using negative values of $\zeta_{i}$. We add the constraint $\zeta_{i} \geq 0$ to prevent this. Moreover, we would like to keep some control over the soft margin. Sometimes we may still want to use the hard margin after all. That is why we add the parameter $C$, which determines how much weight the $\zeta_{i}$ terms carry (more on that later).
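The loophole with negative values can be seen on a single example. The sketch below uses a hypothetical point that lies far on the correct side of a hypothetical hyperplane; `objective` and `smallest_nonnegative_slack` are illustrative helper names, not part of the text.

```python
import numpy as np

# Hypothetical hyperplane and a point far on the correct side of it.
w = np.array([1.0, 0.0])
b = 0.0
x = np.array([5.0, 0.0])
y = 1

margin_value = y * (np.dot(w, x) + b)  # y * (w . x + b) = 5.0


def objective(w, zeta):
    # Penalized objective restricted to this single example.
    return 0.5 * np.dot(w, w) + zeta


def smallest_nonnegative_slack(margin_value):
    # Smallest zeta >= 0 such that margin_value >= 1 - zeta.
    return max(0.0, 1.0 - margin_value)


# Without zeta >= 0, zeta = -4 is feasible (5 >= 1 - (-4))
# and it lowers the objective below its value at zeta = 0.
negative_zeta = -4.0
print(margin_value >= 1 - negative_zeta)
print(objective(w, negative_zeta), objective(w, 0.0))

# With zeta >= 0, the best this example can do is zeta = 0.
print(smallest_nonnegative_slack(margin_value))
```

Once $\zeta_{i}$ is forced to be non-negative, an example that already satisfies the original constraint contributes nothing to the penalty, and only violating examples pay a cost.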

This leads us to the **soft margin formulation**:

$$
\begin{array}{cl}{\underset{\mathbf{w}, b, \zeta}{\operatorname{minimize}}} & {\frac{1}{2}\|\mathbf{w}\|^{2}+C \sum_{i=1}^{m} \zeta_{i}} \\ {\text { subject to }} & {y_{i}\left(\mathbf{w} \cdot \mathbf{x}_{i}+b\right) \geq 1-\zeta_{i}} \\ {} & {\zeta_{i} \geq 0 \quad \text { for any } i=1, \ldots, m}\end{array}
$$
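One way to see what $C$ controls is to compare two candidate hyperplanes under different values of $C$. The following is a sketch under stated assumptions: the toy data, the two candidates, and the helper `soft_margin_objective` are hypothetical, and each $\zeta_{i}$ is set to the smallest non-negative value that satisfies its constraint.

```python
import numpy as np

# Hypothetical toy data: one point of each class sits close to the boundary.
X = np.array([[3.0, 0.0], [0.5, 0.0], [-3.0, 0.0], [-0.5, 0.0]])
y = np.array([1, 1, -1, -1])


def soft_margin_objective(w, b, X, y, C):
    # Smallest non-negative slack that satisfies each constraint.
    zetas = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * np.dot(w, w) + C * np.sum(zetas)


# Candidate A: wide margin, but the two close points need slack.
w_a, b_a = np.array([1.0, 0.0]), 0.0
# Candidate B: narrow margin, no slack needed.
w_b, b_b = np.array([2.0, 0.0]), 0.0

for C in (0.1, 10.0):
    obj_a = soft_margin_objective(w_a, b_a, X, y, C)
    obj_b = soft_margin_objective(w_b, b_b, X, y, C)
    better = "A" if obj_a < obj_b else "B"
    print(f"C={C}: A={obj_a:.2f}, B={obj_b:.2f}, lower objective: {better}")
```

With a small $C$, the wide-margin candidate wins despite its violations. With a large $C$, the penalty dominates and the candidate without violations wins, which is closer to hard margin behavior.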

**The Wolfe dual problem:**

$$
\begin{array}{cl}{\underset{\alpha}{\operatorname{maximize}}} & {\sum_{i=1}^{m} \alpha_{i}-\frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_{i} \alpha_{j} y_{i} y_{j} \mathbf{x}_{i} \cdot \mathbf{x}_{j}} \\ {\text { subject to }} & {0 \leq \alpha_{i} \leq C, \text { for any } i=1, \ldots, m} \\ {} & {\sum_{i=1}^{m} \alpha_{i} y_{i}=0}\end{array}
$$
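For a given vector of multipliers $\alpha$, both the dual objective and its constraints are straightforward to evaluate. The sketch below does not solve the dual problem; it only checks a candidate $\alpha$. The function names and the two-point dataset are assumptions made for illustration.

```python
import numpy as np


def dual_objective(alpha, X, y):
    # sum_i alpha_i - 1/2 * sum_i sum_j alpha_i alpha_j y_i y_j (x_i . x_j)
    Z = y[:, None] * X          # row i is y_i * x_i
    gram = Z @ Z.T              # entry (i, j) is y_i y_j (x_i . x_j)
    return np.sum(alpha) - 0.5 * alpha @ gram @ alpha


def is_dual_feasible(alpha, y, C, tol=1e-9):
    # Box constraint 0 <= alpha_i <= C and equality sum_i alpha_i y_i = 0.
    in_box = np.all(alpha >= 0) and np.all(alpha <= C)
    balanced = abs(np.dot(alpha, y)) <= tol
    return bool(in_box and balanced)


# Hypothetical two-point dataset and candidate multipliers.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])

print(dual_objective(alpha, X, y))
print(is_dual_feasible(alpha, y, C=1.0))
print(is_dual_feasible(alpha, y, C=0.25))
```

The same `alpha` is feasible for $C = 1$ but not for $C = 0.25$: the upper bound $C$ caps how much weight any single example can receive, and this bound is where the soft margin shows up in the dual.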